Article

Binary Particle Swarm Optimization with Manta Ray Foraging Learning Strategies for High-Dimensional Feature Selection

1 School of Artificial Intelligence, Xiamen Institute of Technology, Xiamen 361021, China
2 School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
* Author to whom correspondence should be addressed.
Biomimetics 2025, 10(5), 315; https://doi.org/10.3390/biomimetics10050315
Submission received: 26 March 2025 / Revised: 26 April 2025 / Accepted: 29 April 2025 / Published: 13 May 2025

Abstract

High-dimensional feature selection is one of the key problems in big data analysis. When used for feature selection on high-dimensional data, the binary particle swarm optimization (BPSO) method can become stuck in local optima, leading to reduced search efficiency and inferior feature selection results. This paper proposes a novel BPSO method with manta ray foraging learning strategies (BPSO-MRFL) to address the challenges of high-dimensional feature selection tasks. The BPSO-MRFL algorithm draws inspiration from the manta ray foraging optimization (MRFO) algorithm and incorporates several distinctive search strategies to enhance its efficiency and effectiveness: chain learning, cyclone learning, and somersault learning. Chain learning allows particles to learn from each other and share information more effectively in order to improve the social learning ability of the population. Cyclone learning gradually shifts the search as iterations increase, which helps the BPSO-MRFL algorithm transition smoothly from exploratory searching to exploitative searching and creates a balance between exploration and exploitation. Somersault learning enables particles to adaptively search within a changing search range and allows the algorithm to fine-tune the selected features, which enhances the algorithm’s local search ability and improves the quality of the selected subset. The proposed BPSO-MRFL algorithm was evaluated on 10 high-dimensional, small-sample gene expression datasets. The results demonstrate that BPSO-MRFL achieves better classification accuracy and feature reduction than traditional feature selection methods, and it exhibits competitive performance compared to other advanced feature selection methods. The BPSO-MRFL algorithm therefore presents a promising approach to feature selection in high-dimensional data mining tasks.

1. Introduction

High-dimensional data refer to datasets characterized by a large number of features or attributes, wherein the dimensionality typically exceeds the number of observations. Such data present significant challenges, including the “curse of dimensionality”. This phenomenon adversely impacts the efficacy of conventional data analysis and modeling methodologies, as the distances between data points become increasingly diffuse and sparse in high-dimensional spaces. Consequently, this results in heightened susceptibility to overfitting and elevated computational expenses [1]. In the analysis of high-dimensional data, feature selection is an essential step for reducing model complexity and computational costs and enhancing the model’s generalization capability. By eliminating redundant and irrelevant features, feature selection minimizes noise interference, thereby improving the accuracy and robustness of predictive models. Furthermore, it facilitates a deeper understanding of the underlying data structure, enabling the identification of latent patterns and relationships within high-dimensional datasets. As such, feature selection constitutes a critical component in the preprocessing and modeling of high-dimensional data, contributing significantly to both interpretability and performance [2].
In fact, feature selection (FS) is a combinatorial optimization problem that aims to optimize two conflicting objectives: maximizing the accuracy of feature classification and minimizing the number of selected features [3]. Dealing with high-dimensional data remains a challenging task due to the fact that the search space grows exponentially with the number of features. Feature selection methods can broadly be classified into three categories: wrapper methods [4,5], filter methods [6], and embedded methods [7,8]. Filter methods evaluate the importance of features regardless of any specific learning algorithm, making them computationally efficient and capable of handling a large number of features. However, they may not capture dependencies between features or their interaction with the learning algorithm. Filter methods are also prone to overfitting, and the selected features may not necessarily lead to the best performance in a specific learning task. Wrapper methods use a specific learning algorithm to evaluate the usefulness of features. They select a subset of features by repeatedly training and testing the learning algorithm with different subsets of features. Wrapper methods are computationally expensive and may overfit the data, but they are often more effective than filter methods in selecting relevant features for a specific learning task. Embedded methods incorporate feature selection into the learning algorithm itself. These methods typically use regularization techniques, such as Lasso or Ridge regression, to penalize the coefficients of irrelevant features and encourage sparsity in the resulting model. Embedded methods are computationally efficient and can produce models with good predictive performance. However, they may require a large amount of data and may not work well with non-linear models.
The wrapper-based approach to feature subset selection constitutes an NP-hard problem. Although exhaustive search strategies that evaluate all possible subsets can theoretically identify the optimal solution, their computational complexity renders them impractical for most real-world applications. Consequently, efficient global search techniques are necessary to navigate the expansive solution space effectively. Meta-heuristic algorithms have emerged as particularly suitable methods for resolving such complex combinatorial optimization challenges. Empirical studies have demonstrated the efficacy of swarm intelligence algorithms—a prominent subclass of meta-heuristics—in addressing feature selection problems [9]. Notable swarm-intelligence-based optimization techniques applied to this domain include gray wolf optimization (GWO) [10], particle swarm optimization (PSO) [11], gravitational search algorithms (GSAs) [12], and genetic algorithms (GAs) [13], each offering distinct advantages in feature subset exploration and selection.
In recent years, binary particle swarm optimization (BPSO) has gained significant attention due to its conceptual simplicity, ease of implementation, and fast convergence, and it has been successfully applied to feature selection problems [14]. BPSO was proposed by Kennedy and Eberhart in 1997 [15]. PSO is converted into BPSO by using a transfer function that maps a continuous search space to a binary one, and the updating process switches the positions of particles between 0 and 1 in the binary search space. The goal of feature selection is to eliminate as many irrelevant and redundant features as possible. To achieve this goal, existing wrapper-based feature selection methods based on BPSO typically adopt an integrated fitness function that combines maximizing classification performance and minimizing feature subset size [16]. However, as the data dimensionality increases, the search space of the feature selection problem grows exponentially. As a result, satisfactory feature subsets containing key features may not be found, because the enormous number of feasible subsets poses significant challenges to BPSO, and BPSO often searches too slowly to obtain good feature subsets. Therefore, further improvements to the exploration and exploitation capabilities of the BPSO algorithm are needed.
Researchers have proposed various BPSO algorithms that use different mechanisms to enhance the search process. Mirjalili et al. [17] introduced six new transfer functions categorized into two types, S-shaped and V-shaped, and significantly enhanced the original BPSO algorithm. Tran et al. [18] proposed a variable-length representation method for PSO-based feature selection. Although these strategies allowed PSO to obtain better solutions in less time, the computational cost of the method was still high. Song et al. [19] proposed the Mutual Information-based Bare-Bones PSO algorithm (MIBBPSO). This algorithm employed an effective population initialization strategy based on label correlation, which utilized the correlation between features and labels to accelerate population convergence. However, when solving high-dimensional problems, this method required a significant amount of time, and due to the limitations of the initialization strategy, the diversity of particles was constrained, making the algorithm more likely to become trapped in local optima. Jain et al. [20] proposed a hybrid method combining the Correlation-based Feature Selection (CFS) technique with an improved BPSO algorithm; this method faces the same problems as MIBBPSO. Thaher et al. [21] proposed Boolean particle swarm optimization with evolutionary population dynamics. Six natural selection mechanisms, including the Best-Based, Tournament, Roulette Wheel, Stochastic Universal Sampling, Linear Rank, and Random-Based mechanisms, were employed to select better solutions. Because of the adoption of multiple selection mechanisms, this algorithm also had high computational costs. Cheng et al. [22] introduced competitive particle swarm optimization (CSO). CSO enabled particles to learn from better particles randomly selected from the swarm, leading to improved performance. Due to the introduction of a competitive mechanism, the algorithm still incurs significant computational costs when dealing with high-dimensional problems. In summary, the recently proposed algorithms have all improved the performance of BPSO in feature selection, but their ability to handle high-dimensional datasets is still limited. Therefore, further research is necessary.
In addition to BPSO, researchers have also investigated the application of other non-PSO-based metaheuristic algorithms to feature selection. Genetic algorithms (GAs) have emerged as one of the most popular metaheuristic algorithms for solving FS problems. Oh et al. [23] proposed a GA-based FS method that incorporated local search operations and genetic operations, but this method was prone to premature convergence and exhibited low search efficiency. Emary et al. [24] introduced a feature selection method based on the gray wolf optimization (GWO) algorithm, but this algorithm had poor population diversity and exhibited slow convergence in later stages. Nakamura et al. [25] proposed a binary bat algorithm for feature selection, which combined the exploration ability of the bat algorithm with the Optimum-Path Forest classifier. Majdi et al. [26] proposed two binary variants of the Whale Optimization Algorithm (WOA). The first variant explored the impact of using Tournament and Roulette Wheel selection mechanisms during the search process, and the second variant incorporated crossover and mutation operators to enhance the exploitation ability of the WOA algorithm. Hichem et al. [27] introduced a novel binary variant of the Grasshopper Optimization Algorithm (GOA). Their algorithm initialized the positions of grasshoppers with binary values and employed simple operators for position updates. Ibrahim et al. [28] introduced eight time-varying S-shaped and V-shaped transfer functions into the Binary Dragonfly Algorithm (BDA). These metaheuristic algorithms face challenges similar to those of the BPSO variants: high computational costs, slow convergence, and a tendency to become trapped in local optima when dealing with high-dimensional problems.
Furthermore, numerous hybrid metaheuristic algorithms have been proposed to deal with the feature selection problem. For instance, Al-Tashi et al. [29] presented a binary gray wolf optimization–particle swarm optimization hybrid algorithm, which combines GWO and PSO. Al-Wajih et al. [30] proposed a binary hybridization of GWO and Harris hawks optimization. This method employed an S-shaped transfer function to convert the continuous search space into a binary one. Sadeghian et al. [31] introduced a three-stage hybrid feature selection method called the information gain-based butterfly optimization algorithm (BOA), which combined the information gain technique with the BOA. Lu et al. [32] proposed a hybrid feature selection technique that integrated mutual information maximization with an adaptive genetic algorithm. Ma et al. [33] proposed a two-stage hybrid ant colony algorithm, which incorporated an interval strategy to determine the optimal subset size of the features searched in the additional stage. Overall, hybrid metaheuristic algorithms are also efficient and effective in finding the best subset of features for classification. However, when applied to high-dimensional feature selection problems, these algorithms still suffer from high computational overhead and low convergence accuracy. Thus, there is still room for improvement in the convergence speed and accuracy of these algorithms.
Manta ray foraging optimization (MRFO) [34] is a swarm intelligence algorithm inspired by the foraging behaviors of manta rays. Specifically, the optimization process of MRFO involves three foraging operators: chain foraging, cyclone foraging, and somersault foraging. The chain foraging strategy makes each individual update its position with respect to the one in front of it and the current global best solution. The cyclone foraging strategy makes each individual update its position with respect to both the one in front of it and a reference position, which can be either the best position obtained so far or a random position generated in the search space. The choice between the two depends on the ratio of the current iteration number to the maximum number of iterations; the gradual increase in this ratio encourages MRFO to transition smoothly from an exploratory search to an exploitative search. A random number determines whether MRFO performs chain foraging or cyclone foraging. Somersault foraging allows individuals to adaptively search within a changing search range.
MRFO has been widely applied. Abdel-Mawgoud et al. [35] proposed an improved manta ray foraging optimizer that introduces a simulated annealing operator to enhance the exploitation phase of MRFO and applied it to the integration of renewable energy in distribution networks. Jain et al. [36] combined MRFO with the rider optimization algorithm and proposed a generative adversarial network based on a rider manta ray foraging optimizer for effective glaucoma detection. Attiya et al. [37] proposed a hybrid algorithm combining an improved manta ray foraging optimization algorithm with the salp swarm algorithm to address IoT task scheduling in cloud computing. Duan et al. [38] proposed an enhanced elephant herding optimization algorithm that introduces the somersault foraging strategy of manta rays and Gaussian mutation. Sharma et al. [39] utilized MRFO to optimize multiple locally relevant embedding strength values (MESs), balancing invisibility and robustness, and proposed a novel image-adaptive watermarking scheme called MantaRayWmark.
In this paper, we propose a novel algorithm called BPSO-MRFL, which integrates the MRFO search mechanism into BPSO. BPSO-MRFL involves three search phases: chain learning, cyclone learning, and somersault learning. These phases enhance the algorithm’s search ability and lead to better performance.
The novelty of this work is as follows:
  • The algorithm introduces the chain learning strategy of the manta ray foraging optimization algorithm into the binary particle swarm optimization algorithm, which effectively improves the exploration ability of the algorithm;
  • The algorithm introduces the somersault learning strategy of the manta ray foraging optimization algorithm into the binary particle swarm optimization algorithm, which effectively improves the exploitation ability of the algorithm;
  • The algorithm introduces the cyclone learning strategy of the manta ray foraging optimization algorithm into the binary particle swarm optimization algorithm to balance the exploitation and exploration abilities of the algorithm;
  • We conduct experiments on 10 publicly available gene expression datasets with thousands of features to evaluate the performance of the proposed algorithm.
The remainder of this paper is organized as follows. Section 2 presents the background and related work. Section 3 introduces the proposed BPSO-MRFL algorithm. Section 4 presents the experimental datasets, the evaluation metrics, and the parameter settings of the compared algorithms. Section 5 provides the experimental results and comparisons. Finally, Section 6 presents the conclusions of this paper.

2. Related Work

2.1. Binary Particle Swarm Optimization

PSO is a swarm intelligence algorithm inspired by the flocking behavior of birds in nature. In the PSO algorithm, each particle is characterized by a position vector and a velocity vector, and it explores the solution space based on its current best-known position. Additionally, particles share information regarding the best positions discovered by the entire swarm. The velocity and position of each particle are updated according to Equations (1) and (2).
$v_i^{t+1} = w v_i^t + c_1 \times r_1 \times (pbest_i - x_i^t) + c_2 \times r_2 \times (gbest - x_i^t)$  (1)
$x_i^{t+1} = x_i^t + v_i^{t+1}$  (2)
where $v_i^t$ and $v_i^{t+1}$ denote the velocity of the i-th particle at the t-th and the (t + 1)-th generations, respectively; w is the inertia weight; c1 and c2 are two learning factors that control the particles’ learning rates; pbesti represents the personal best position of the i-th particle; gbest is the global best position found by the swarm; r1 and r2 are two random numbers uniformly distributed in the interval [0, 1]; and $x_i^t$ and $x_i^{t+1}$ represent the position of the i-th particle at the t-th and (t + 1)-th generations, respectively.
A sigmoid function, defined in Equation (3), is introduced to map real values into the interval [0, 1]. This function, referred to as the transfer function, plays a crucial role in BPSO.
$Tf = \frac{1}{1 + \exp(-v_i^{t+1})}$  (3)
The transfer function defined in Equation (3) generates a value within the range [0, 1], and the particle’s position in the binary search space is determined based on Equation (4).
$x_i^{t+1} = \begin{cases} 1, & \text{if } rand < Tf \\ 0, & \text{if } rand \geq Tf \end{cases}$  (4)
In PSO, the velocity represents the proximity of the current position to the global optimum. A small velocity indicates that the particle is close to the global solution, and thus, its next position should be updated with a minor adjustment. Conversely, a large velocity suggests that the particle requires larger movements. Unlike PSO, BPSO does not update a particle’s position based on its current position. Instead, in BPSO, a positive velocity increases the probability that a bit in the particle’s position will be one, while a negative velocity increases the likelihood that the bit will be zero. Additionally, when the velocity is zero, the value of the sigmoid function is 0.5, leading to an equal probability of the bit being zero or one. As a result, BPSO may exhibit divergence toward the end of the algorithm, since the particle position bits are determined independently of their previous states.
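For concreteness, a minimal NumPy sketch of one standard BPSO update, following Equations (1), (3), and (4), is given below; the function and variable names are our own and not part of the original formulation.

import numpy as np

def bpso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, rng=np.random.default_rng()):
    # x and pbest are (N, D) binary arrays; gbest is a (D,) binary array; v is (N, D) real-valued.
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update, Equation (1)
    tf = 1.0 / (1.0 + np.exp(-v))                               # sigmoid transfer, Equation (3)
    x_new = (rng.random(x.shape) < tf).astype(int)              # bit sampling, Equation (4)
    return x_new, v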

2.2. Manta Ray Foraging Optimization

Manta ray foraging optimization (MRFO) is inspired by the intelligent foraging behaviors of manta rays. To simulate these behaviors, the algorithm incorporates three distinct foraging strategies: chain foraging, cyclone foraging, and somersault foraging.
In chain foraging, individuals form a chain and move toward areas with higher plankton concentration, while also following the individual directly ahead of them. The mathematical modeling of chain foraging is described by Equations (5) and (6).
$q_i^{t+1} = \begin{cases} q_i^t + r_1 (q_b - q_i^t) + \alpha (q_b - q_i^t), & i = 1 \\ q_i^t + r_2 (q_{i-1}^t - q_i^t) + \alpha (q_b - q_i^t), & i = 2, \dots, N \end{cases}$  (5)
$\alpha = 2 r_3 \sqrt{|\log(r_4)|}$  (6)
where $q_i^t$ is the current position of the i-th manta ray in the t-th generation; $q_b$ is the global best position found by all manta rays; and $\alpha$ is a weight coefficient.
In cyclone foraging, manta rays in deep water form a spiral-shaped foraging chain toward a patch of plankton. In addition to spiraling toward the food source, each manta ray also moves toward the individual ahead of it. The mathematical modeling of the spiral movement of manta rays in a two-dimensional space is described by Equations (7) and (8).
$q_i^{t+1} = \begin{cases} q_b + r_5 (q_b - q_i^t) + \beta (q_b - q_i^t), & i = 1 \\ q_b + r_6 (q_{i-1}^t - q_i^t) + \beta (q_b - q_i^t), & i = 2, \dots, N \end{cases}$  (7)
$\beta = 2 e^{r_8 \frac{T - t + 1}{T}} \sin(2 \pi r_8)$  (8)
where β is a weight coefficient and T is the maximum number of iterations.
The MRFO algorithm employs cyclone foraging behavior to balance exploitation and exploration during the search process. Individuals move along a spiral path toward both the food source and the individual ahead of them. To enhance exploration capability, individuals are occasionally assigned a new random reference position within the search space. This mechanism enables MRFO to perform an extensive global search. The corresponding mathematical modeling is described by Equations (9) and (10).
$q_{rp}^t = Lw + r_9 (Up - Lw)$  (9)
$q_i^{t+1} = \begin{cases} q_{rp}^t + r_{10} (q_{rp}^t - q_i^t) + \beta (q_{rp}^t - q_i^t), & i = 1 \\ q_{rp}^t + r_{11} (q_{i-1}^t - q_i^t) + \beta (q_{rp}^t - q_i^t), & i = 2, \dots, N \end{cases}$  (10)
where $q_{rp}^t$ is a randomly generated position within the search space, and Lw and Up represent the lower and upper bounds of the search space, respectively.
In somersault foraging, manta rays treat the food’s position as a pivot point, swimming around it and updating their positions relative to the best solution found so far. The corresponding mathematical model is given in Equation (11).
$q_i^{t+1} = q_i^t + S \times (r_{12} \times q_b - r_{13} \times q_i^t)$  (11)
where S is the somersault factor and its value is set to 2.
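For reference, the three MRFO operators of Equations (5)–(11) could be sketched in NumPy roughly as follows for a continuous search space; this is our own illustrative code, with the coefficient formulas reconstructed from the MRFO paper, and the names are not from the original.

import numpy as np

rng = np.random.default_rng()

def chain_foraging(q, qb):
    # Chain foraging, Equations (5) and (6); q is (N, D), qb is the best position found so far.
    alpha = 2.0 * rng.random(q.shape) * np.sqrt(np.abs(np.log(rng.random(q.shape))))
    q_new = np.empty_like(q)
    q_new[0] = q[0] + rng.random(q.shape[1]) * (qb - q[0]) + alpha[0] * (qb - q[0])
    q_new[1:] = q[1:] + rng.random(q[1:].shape) * (q[:-1] - q[1:]) + alpha[1:] * (qb - q[1:])
    return q_new

def cyclone_foraging(q, ref, t, T):
    # Cyclone foraging around a reference point (qb or a random position), Equations (7)-(10).
    r = rng.random(q.shape)
    beta = 2.0 * np.exp(r * (T - t + 1) / T) * np.sin(2.0 * np.pi * r)   # Equation (8)
    q_new = np.empty_like(q)
    q_new[0] = ref + rng.random(q.shape[1]) * (ref - q[0]) + beta[0] * (ref - q[0])
    q_new[1:] = ref + rng.random(q[1:].shape) * (q[:-1] - q[1:]) + beta[1:] * (ref - q[1:])
    return q_new

def somersault_foraging(q, qb, S=2.0):
    # Somersault foraging around the best position, Equation (11).
    return q + S * (rng.random(q.shape) * qb - rng.random(q.shape) * q)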

3. Binary Particle Swarm Optimization with Manta Ray Foraging Learning Strategies (BPSO-MRFL) Algorithm

This section proposes the binary particle swarm optimization with manta ray foraging learning strategies (BPSO-MRFL) for feature selection. The proposed algorithm aims to enhance the performance of the BPSO algorithm in selecting features for high-dimensional datasets. Three different learning strategies of the MRFO algorithm are introduced into the BPSO algorithm, which include chain learning, cyclone learning, and somersault learning. The following subsections will provide a detailed explanation on each learning strategy.

3.1. Chain Learning

In the chain learning phase, the particles work like the manta rays of MRFO. They line up head-to-tail and form a foraging chain. The particles, except for the first one, move towards not only the best particle but also the one in front of them. The velocity update formula is shown in Equation (12).
$v_i^{t+1} = \begin{cases} \omega v_i^t + c_1 \times r_1 \times (pbest_i - x_i^t) + c_2 \times r_2 \times (gbest - x_i^t), & i = 1 \\ \omega v_i^t + c_1 \times r_1 \times (pbest_{i-1} - x_i^t) + c_2 \times r_2 \times (gbest - x_i^t), & i = 2, \dots, N \end{cases}$  (12)
where $v_i^t$ is the velocity of the i-th particle in the previous iteration; pbesti and gbest are the personal best position of the i-th particle and the global best position, respectively; r1 and r2 are two different random numbers in the interval [0, 1]; c1 and c2 are learning factors; and ω is a linearly decreasing inertia weight. The update formulas of c1 and c2 are shown in Equations (13) and (14):
$c_1 = 2 \times \mathrm{erf}\left(0.5 \times \frac{t}{T}\right)$  (13)
$c_2 = 3 \times \mathrm{erf}\left(1.7 \times \frac{t}{T}\right)$  (14)
where t is the current iteration number and T is the maximum number of iterations.
The MRFO algorithm uses the parameter α to adjust the learning rate of individuals during the chain foraging phase, with the α value in each iteration shown in Figure 1. As can be seen from the figure, the range of α values extends beyond that of the interval [0, 1], and there is a high probability that it will be greater than 1. Therefore, at this stage, each individual tends to learn from the globally optimal individual, indicating that the algorithm is inclined towards global exploration. To enable particles in BPSO to learn in a similar manner, this section adjusts the learning factors c1 and c2, with their iteration curve diagrams shown in Figure 2. As can be seen from the figure, the value of c2 is greater than that of c1, which is beneficial for the algorithm’s exploitation. Both of them will change with the number of iterations. With the increasing number of iterations, the algorithm’s exploitation ability gradually weakens while its exploration ability gradually strengthens.
Then, we use a V-shaped transfer function to convert the particle’s position to the binary search space, as shown in Equations (15) and (16):
$Tf\left(v_i^{t+1}\right) = \left|\tanh\left(v_i^{t+1}\right)\right|$  (15)
$x_i^{t+1} = \begin{cases} \neg x_i^t, & \text{if } rand < Tf \\ x_i^t, & \text{if } rand \geq Tf \end{cases}$  (16)
where $x_i^t$ is the position of the i-th particle in the previous iteration and $\neg x_i^t$ denotes its bitwise complement.
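A minimal sketch of one chain learning step (our own illustrative NumPy/SciPy code, using the c1 and c2 schedules as reconstructed in Equations (13) and (14)) is given below.

import numpy as np
from scipy.special import erf

def chain_learning_step(x, v, pbest, gbest, t, T, w, rng=np.random.default_rng()):
    # x and pbest are (N, D) binary arrays; particle i > 1 learns from the personal best of the particle ahead of it.
    N, D = x.shape
    c1 = 2.0 * erf(0.5 * t / T)          # Equation (13), as reconstructed
    c2 = 3.0 * erf(1.7 * t / T)          # Equation (14), as reconstructed
    r1, r2 = rng.random((N, D)), rng.random((N, D))
    leader = np.vstack([pbest[:1], pbest[:-1]])      # pbest_i for i = 1, pbest_(i-1) for i >= 2
    v = w * v + c1 * r1 * (leader - x) + c2 * r2 * (gbest - x)   # Equation (12)
    tf = np.abs(np.tanh(v))                                      # V-shaped transfer, Equation (15)
    flip = rng.random((N, D)) < tf
    return np.where(flip, 1 - x, x), v                           # Equation (16): flip the bit if rand < Tf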

3.2. Cyclone Learning

Cyclone learning includes two parts. In the first part, all particles except for the first one learn not only from the previous particle but also from the best particle. The velocity update formula is shown in Equation (17):
$v_i^{t+1} = \begin{cases} \omega v_i^t + c_3 \times r_1 \times (pbest_i - x_i^t) + c_4 \times r_2 \times (gbest - x_i^t), & i = 1 \\ \omega v_i^t + c_3 \times r_1 \times (pbest_{i-1} - x_i^t) + c_4 \times r_2 \times (gbest - x_i^t), & i = 2, \dots, N \end{cases}$  (17)
where c3 and c4 are learning factors. The updated formulas of c3 and c4 are shown in Equations (18) and (19):
$c_3 = 3 \times \mathrm{erf}\left(1.7 \times \frac{t}{T}\right)$  (18)
$c_4 = 2 \times \mathrm{erf}\left(0.5 \times \frac{t}{T}\right)$  (19)
Equations (18) and (19) show that c3 takes higher values than c4, which benefits the algorithm’s exploration. These two values change with the number of iterations: the algorithm’s exploration ability gradually diminishes as iterations increase, while its exploitation ability gradually strengthens.
In the second part, to facilitate a global search, we assign a new random position in the search space as the reference position for each particle, which forces them to seek a new position far from the current best one. Its mathematical equation is shown in Equations (20) and (21):
$x_r = Lw + rand \times (Up - Lw)$  (20)
$v_i^{t+1} = \begin{cases} \omega v_i^t + c_5 \times r_1 \times (pbest_i - x_i^t) + c_6 \times r_2 \times (x_r - x_i^t), & i = 1 \\ \omega v_i^t + c_5 \times r_1 \times (pbest_{i-1} - x_i^t) + c_6 \times r_2 \times (x_r - x_i^t), & i = 2, \dots, N \end{cases}$  (21)
where $x_r$ is a random position generated in the search space; Lw and Up are the lower and upper limits of the search space, respectively; c5 and c6 are constants, both set to 2; and rand is a random number in the interval [0, 1].
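The random-reference branch of cyclone learning could be sketched as follows (our own illustrative code; taking the search bounds as 0 and 1 for the binary-coded space is an assumption on our part), with positions subsequently binarized by Equations (15) and (16).

import numpy as np

def cyclone_random_reference_step(x, v, pbest, w, rng=np.random.default_rng()):
    # Second part of cyclone learning, Equations (20) and (21): particles are pulled toward
    # a random reference position x_r instead of gbest, which encourages global exploration.
    N, D = x.shape
    x_r = rng.random(D)                               # Equation (20) with Lw = 0, Up = 1 (assumed)
    c5 = c6 = 2.0
    r1, r2 = rng.random((N, D)), rng.random((N, D))
    leader = np.vstack([pbest[:1], pbest[:-1]])       # pbest_i for i = 1, pbest_(i-1) for i >= 2
    return w * v + c5 * r1 * (leader - x) + c6 * r2 * (x_r - x)   # Equation (21)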

3.3. Somersault Learning

In the somersault learning phase, the global best position is viewed as a pivot, and particles roll back and forth near the pivot and their personal best position. The position update formula for this phase can be expressed as Equations (22) and (23):
$x_i^{new} = pbest_i^t + 2 \times (r_1 \times gbest - r_2 \times pbest_i^t)$  (22)
$x_i^{t+1} = \begin{cases} 1, & x_i^{new} > 0.5 \\ 0, & x_i^{new} \leq 0.5 \end{cases}$  (23)
where $x_i^{new}$ is the new continuous-valued position of the i-th particle generated by somersault learning; it is converted into a binary value by comparison with the threshold 0.5.
Equation (22) allows each particle to move to a new position between its personal best position and the global best position found so far. As the distance between the two positions decreases, the perturbation on the current position also decreases, and all particles gradually approach the optimal solution. This adaptive reduction as iterations increase enhances the algorithm’s exploitation (local search) ability.
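A short sketch of the somersault learning update (our own illustrative code following Equations (22) and (23)) is shown below.

import numpy as np

def somersault_learning_step(pbest, gbest, rng=np.random.default_rng()):
    # Move each particle to a point between its personal best and the global best (Equation (22)),
    # then threshold the continuous result at 0.5 to obtain a binary position (Equation (23)).
    r1, r2 = rng.random(pbest.shape), rng.random(pbest.shape)
    x_new = pbest + 2.0 * (r1 * gbest - r2 * pbest)
    return (x_new > 0.5).astype(int)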

3.4. Description of the BPSO-MRFL Algorithm

The pseudocode of the BPSO-MRFL algorithm is presented in Algorithm 1. In Algorithm 1, lines 7–14 correspond to the cyclone learning phase, lines 15–18 correspond to the chain learning phase, and lines 22–24 correspond to the somersault learning phase. The flowchart of the BPSO-MRFL algorithm is shown in Figure 3.
Algorithm 1: Pseudocode of the BPSO-MRFL-based FS method.
Input: Maximum number of generations T, swarm size N, the dataset with D features;
Output: A set of selected features;
1:  Initialize each particle’s position and set t = 1;
2:  Evaluate the fitness value of each particle using Equation (24);
3:  Update the pbests and gbest;
4:  while t < T do
5:    for i = 1 to N do
6:      if rand < 0.5 then
7:        if t/T < rand then  // Cyclone learning
8:          Update x_r by Equation (20);
9:          Update v_i^{t+1} by Equation (21);
10:         Update x_i^{t+1} by Equations (15) and (16);
11:       else
12:         Update v_i^{t+1} by Equations (17)–(19);
13:         Update x_i^{t+1} by Equations (15) and (16);
14:       end
15:     else  // Chain learning
16:       Update v_i^{t+1} by Equations (12)–(14);
17:       Update x_i^{t+1} by Equations (15) and (16);
18:     end
19:   end
20:   Evaluate the fitness value of each particle using Equation (24);
21:   Update the pbests and gbest;
22:   for i = 1 to N do  // Somersault learning
23:     Update x_i^{t+1} by Equations (22) and (23);
24:   end
25:   Evaluate the fitness value of each particle using Equation (24);
26:   Update the pbests and gbest;
27:   t = t + 1;
28: end
29: return gbest;

4. Experiment Design

4.1. Datasets

This study used 10 gene expression datasets to rigorously evaluate the effectiveness of the proposed algorithm for feature selection [40]. A brief description of and statistics about the datasets, including the number of instances, features, and classes, are shown in Table 1. These datasets are commonly used in feature selection research due to their high dimensionality and small sample sizes, and they have been extensively studied and applied in various research works.

4.2. Evaluating the Fitness

The purpose of feature selection is to enhance classification accuracy and minimize the number of selected features, which constitutes a multi-objective optimization problem. To address these objectives, a fitness function is devised using the linear weighting method, which is shown in Equation (24):
$fitness = \omega \times E + (1 - \omega) \times \frac{d}{D}$  (24)
where E is the classification error rate of a certain classifier, and d and D represent the number of selected features and the total number of features, respectively. Additionally, ω is a constant and its value is usually set to 0.99.
The wrapper method is implemented in this study using the k-nearest neighbor (KNN) algorithm to evaluate the classification accuracy. The KNN algorithm is an instance-based machine learning algorithm that assigns a new sample to the majority class among its k nearest training data points, measured by distance. KNN is non-parametric, since it does not require assumptions about or prior modeling of the data. In this paper, each dataset was randomly divided into two parts: 80% was used as the training set and the remaining 20% as the test set. The value of k for the KNN algorithm was set to 5. The details of KNN can be found in [41].
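As an illustration, the fitness of Equation (24) could be evaluated with scikit-learn’s KNN roughly as follows (a sketch under the settings above; the function and variable names are our own and hypothetical).

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, omega=0.99, k=5, seed=0):
    # mask is a binary vector of length D marking the selected features.
    D = X.shape[1]
    d = int(mask.sum())
    if d == 0:                      # an empty subset cannot be classified; assign the worst fitness
        return 1.0
    X_sel = X[:, mask.astype(bool)]
    X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.2, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    error = 1.0 - knn.score(X_te, y_te)                  # classification error rate E
    return omega * error + (1.0 - omega) * d / D         # Equation (24)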

4.3. Parameter Settings

All the algorithms were executed with the same configurations on the different datasets. The BPSO-MRFL algorithm was used for all experiments with the following settings: T = 100 for the maximum number of iterations, N = 20 for the population size, and an inertia weight varying linearly between 0.9 and 0.4.

5. Experiments and Discussion

In this section, we comprehensively evaluate and analyze the proposed BPSO-MRFL through experiments. We compare the BPSO-MRFL algorithm with several feature selection methods reported in the literature, including three classical feature selection algorithms, four non-PSO-based feature selection methods, and four PSO variants reported in the recent literature. The experimental analysis and comparison are carried out in three aspects: classification accuracy, number of features, and fitness value. We use ten-fold cross-validation to construct training and test sets for our experiments. Specifically, one fold is reserved as unseen test data and is not used during the FS process; the remaining nine folds constitute the training data, which are exclusively used for FS. We evaluate the performance of the FS methods through KNN classification. In these experiments, the parameters of each compared method were set according to the values specified in the corresponding papers, and all algorithms were run 20 times to analyze the results.
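The evaluation protocol described above could be organized roughly as in the sketch below (our own illustrative code; fs_method and evaluate are hypothetical placeholders for the feature selection algorithm and the KNN evaluation, respectively).

from sklearn.model_selection import StratifiedKFold

def cross_validated_fs(X, y, fs_method, evaluate):
    # Ten-fold protocol: one fold is held out as unseen test data, and feature selection
    # is performed on the remaining nine folds only.
    scores = []
    for tr, te in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
        mask = fs_method(X[tr], y[tr])                    # run the FS algorithm on the training folds
        scores.append(evaluate(X[tr][:, mask], y[tr], X[te][:, mask], y[te]))
    return scores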

5.1. Comparative Analysis

5.1.1. Comparison with Classical Feature Selection Methods

We conduct experiments to compare the performance of BPSO-MRFL with three traditional feature selection methods, namely CFS [42], FCBF [43], and LFS [44]. Table 2, Table 3 and Table 4 show the results of the experiments on the ten datasets. These results are the average values over 20 runs of the algorithm.
After analyzing and comparing the results presented in Table 2, it can be concluded that BPSO-MRFL is the best feature selection method for improving the performance of the classifier. On all 10 datasets, BPSO-MRFL achieves a higher classification accuracy than the other 3 classical feature selection methods. On 40% of the datasets, BPSO-MRFL exceeds the maximum classification accuracy obtained by the other methods by more than 5%. The largest accuracy improvements are observed on the 9Tumor and Brain Tumor2 datasets. These results provide strong evidence that BPSO-MRFL is superior to the other three classical feature selection methods in improving classifier performance.
Based on the data presented in Table 3, it can be concluded that the BPSO-MRFL algorithm is effective in eliminating redundant features. On the majority of the datasets, the optimal feature subset obtained by BPSO-MRFL has fewer features than other methods. On 60% of the datasets, BPSO-MRFL significantly reduces the number of features compared to other methods. Although the LFS algorithm obtains relatively smaller feature subsets on some datasets, the classification accuracy is lower. On the other hand, BPSO-MRFL achieves the best classification accuracy while obtaining a smaller feature subset, indicating that it is a robust method for most datasets. Therefore, the results fully demonstrate that BPSO-MRFL is the most effective method for eliminating redundant features compared to the three classical feature selection methods.
The experimental results presented in Table 4 demonstrate that the BPSO-MRFL algorithm outperforms other comparison algorithms on all 10 datasets, achieving the best average fitness values. This indicates that the proposed algorithm exhibits superior performance compared to the other methods.

5.1.2. Comparison with Other Well-Known Optimizers

In this section, we compare the performance of BPSO-MRFL with other non-PSO well-known optimizers for feature selection tasks.
Specifically, we compare it with the GA [45], HLBDA [46], BGWO2 [24], and BGSA [47] optimizers, and the results are presented in Table 5, Table 6, Table 7 and Table 8. These results are the average values and variances over 20 runs of each algorithm.
The data in Table 5 indicate significant improvements in the performance of the classifier. On the 10 datasets, BPSO-MRFL outperforms the other optimization algorithms in terms of classification accuracy on 7 of them. Specifically, BPSO-MRFL demonstrates significantly higher classification accuracy than the other comparative algorithms on the Prostate Tumor and Brain Tumor2 datasets. On Leukemia1, DLBCL, and Leukemia3, the proposed algorithm and BGWO2 show the same average classification accuracy, but the average feature selection ratio of the proposed algorithm is significantly lower than that of BGWO2. These results clearly indicate that BPSO-MRFL is the most effective of the five feature selection methods in enhancing the classifier’s performance.
The results of the elimination of redundant features are shown in Table 6. On all datasets, BPSO-MRFL obtains an optimal feature subset with a number of features that is less than 50% of the minimum value achieved by other methods. Furthermore, on 80% of the datasets, the number of features in the optimal subset obtained by BPSO-MRFL is below 20% of the minimum value obtained by other methods. On 30% of the datasets, this number is even below 10% of the minimum value obtained by other methods. And, BPSO-MRFL obtains the fewest features on Brain Tumor2. These findings provide strong evidence that the BPSO-MRFL method outperforms the other four optimizers in eliminating redundant features.
Based on the data presented in Table 7 and Table 8, it is evident that the BPSO-MRFL method achieves superior fitness values compared to the other four methods. Therefore, we can draw the following conclusion: in comparison to these non-PSO-based methods, the BPSO-MRFL algorithm proposed in this study has significant competitive advantages. This further substantiates the advantage of the BPSO-MRFL algorithm in dealing with feature selection problems on high-dimensional data.

5.1.3. Comparison with Different Variants of PSO

In this part, we compare BPSO-MRFL with some different variants of PSO: SBPSO [17], VBPSO [17], QBPSO [48], and UTF-BPSO [49]. The performances of BPSO-MRFL and other PSO-based methods are compared in Table 9, Table 10, Table 11 and Table 12. By comparing the results with those of SBPSO, VBPSO, and QBPSO, it is evident that BPSO-MRFL outperforms these algorithms on all datasets. BPSO-MRFL achieves significantly higher classification accuracy while reducing the number of features on three datasets, namely Leukemia1, DLBCL, and Leukemia3. These findings demonstrate that BPSO-MRFL is a more effective method for feature selection than the other three PSO-based algorithms.
When compared with UTF-BPSO, BPSO-MRFL achieves better FS results on all datasets. Although UTF-BPSO selects fewer features than SBPSO, VBPSO, and QBPSO, the optimal feature subset obtained by BPSO-MRFL contains several times fewer features than that of UTF-BPSO, particularly on the Brain Tumor2 dataset, where the difference reaches a factor of 42.3. Moreover, the fitness values of BPSO-MRFL are better than those of the other BPSO variants on all datasets.

5.1.4. Overall Comparison on the Statistical Results and Radar Chart

Table 13 presents the results of the Friedman test [50], a non-parametric statistical test, based on the classification accuracy, number of selected features, and fitness values obtained by the algorithms; lower values in the table indicate better ranks. The algorithms are ranked by the test, and as seen in the table, BPSO-MRFL achieves the best rank. The Wilcoxon signed-rank test, as described in [51], is employed to perform pairwise comparisons between the methods. A p-value greater than 0.05 indicates similar classification performance between two methods, while a p-value below 0.05 signifies a significant difference. Table 14 presents the results of the Wilcoxon test conducted to evaluate BPSO-MRFL against the other competing methods. The obtained results demonstrate that, in the majority of cases, the proposed BPSO-MRFL exhibits significantly superior classification performance compared to the other methods. These results demonstrate that the introduced learning strategies significantly enhance the efficiency of BPSO-MRFL in binary search spaces.

5.2. Convergence Analysis

Figure 4 shows the convergence curves of BPSO-MRFL and several comparison algorithms. Each curve represents the result of a single run out of the 20 runs. We plot the convergence curves of HLBDA, BGWO2, BGSA, and UTF-BPSO and compare them with BPSO-MRFL; these algorithms were chosen because their average ranks are in the top five of the Friedman test. From Figure 4, it can be seen that the convergence curve of BPSO-MRFL significantly outperforms those of the other algorithms on most datasets, indicating that BPSO-MRFL has better convergence ability. On the Leukemia1, Brain Tumor1, Prostate Tumor, Leukemia3, and Lung datasets, BPSO-MRFL converges quickly and achieves higher or comparable accuracy compared to the other algorithms. On the DLBCL, Leukemia2, and Brain Tumor2 datasets, the convergence speed of BPSO-MRFL in the early iterations is slower than that of the other comparison algorithms, but it achieves the same classification accuracy as the other comparison algorithms in the later iterations. As shown in Table 6 and Table 10, the number of features selected by BPSO-MRFL on these three datasets is much lower than that of the other comparison algorithms. This indicates that the chain learning and cyclone learning strategies enhance the algorithm’s exploration and exploitation abilities. In addition, we can also observe from Figure 4 that the somersault learning strategy allows BPSO-MRFL to jump out of local optima in the later iterations.
On the other hand, Figure 5 presents the boxplot of SBPSO, UTF-BPSO, and BPSO-MRFL. From Figure 5, it can be observed that BPSO-MRFL outperforms SBPSO and UTF-BPSO in terms of both median and mean values. These results strongly support the effectiveness of the proposed learning strategy in achieving the highest prediction accuracy in this study. Moreover, the boxplots also indicate that the proposed algorithm exhibits high stability on most of the datasets, providing further evidence of its superior convergence ability compared to other algorithms.
Figure 6 presents the average classification accuracy of BPSO-MRFL compared with the other algorithms on the different datasets. As illustrated in Figure 6, BPSO-MRFL achieves superior classification accuracy over the other algorithms on all datasets except 9Tumor. On the 9Tumor dataset, while BPSO-MRFL is outperformed by BGWO2, it still maintains higher accuracy than the remaining algorithms. Figure 7 shows the sizes of the selected feature subsets obtained by BPSO-MRFL in comparison with the other algorithms. The results reveal that BPSO-MRFL selects significantly smaller feature subsets than six of the algorithms (VBPSO, SBPSO, QBPSO, BGSA, UTF-BPSO, and HLBDA), while yielding subset sizes comparable to those of the other two (GA and BGWO2). Therefore, these findings provide strong evidence for the effectiveness of the BPSO-MRFL algorithm.
Based on the above comparison, it can be concluded that the three proposed learning strategies are effective and BPSO-MRFL has competitive performance in terms of convergence.

6. Conclusions

In this paper, a new algorithm called BPSO-MRFL has been proposed. The algorithm is based on the manta ray foraging search mechanism and has three different search phases: chain learning, cyclone learning, and somersault learning. The chain learning phase enhances the social learning ability of the population. In the cyclone learning phase, the gradual increase in iterations encourages BPSO-MRFL to smoothly transition from an exploratory search to an exploitative search, balancing the algorithm’s exploration and exploitation abilities. The somersault learning phase allows particles to adaptively search in a changing search range, enhancing the local search ability of the algorithm. Therefore, all three search phases contribute to improving the search performance of BPSO-MRFL.
The proposed BPSO-MRFL algorithm has been comprehensively evaluated on gene expression datasets and compared with well-established methods, including three classical feature selection methods, four well-known optimizers, and four different variants of PSO. The experimental results demonstrate that BPSO-MRFL achieves competitive or even better performance on most datasets. The only classifier combined with the algorithm in this paper is KNN, which has few parameters and is easy to implement; many other classifiers could affect the classification performance of feature selection, and how to select the optimal classifier requires further research. In future work, we aim to conduct a deeper study of the BPSO-MRFL algorithm’s performance on ultra-high-dimensional datasets and to further improve its performance and efficiency.

Author Contributions

Conceptualization, J.L. and Y.C.; methodology, J.L. and Y.C.; software, Y.C. and J.L.; validation, J.L., Y.C., and S.L.; formal analysis, Y.C. and J.L.; investigation, Y.C. and J.L.; resources, J.L. and Y.C.; data curation, Y.C. and J.L.; writing—original draft preparation, Y.C.; writing—review and editing, J.L., Y.C., and S.L.; visualization, Y.C.; supervision, J.L.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Xiamen Municipal Natural Science Foundation, China (grant number: 3502Z20227332), and the Provincial Natural Science Foundation of Fujian, China (grant number: 2023J01349).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are contained within this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BDA: Binary Dragonfly Algorithm
BOA: Butterfly Optimization Algorithm
BPSO: Binary Particle Swarm Optimization
BPSO-MRFL: Binary Particle Swarm Optimization with Manta Ray Foraging Learning Strategies
CFS: Correlation-based Feature Selection
CSO: Competitive Particle Swarm Optimization
FS: Feature Selection
GA: Genetic Algorithm
GAs: Genetic Algorithms
GOA: Grasshopper Optimization Algorithm
GSA: Gravitational Search Algorithm
GWO: Gray Wolf Optimization
MIBBPSO: Mutual Information-based Bare-Bones PSO Algorithm
MRFO: Manta Ray Foraging Optimization
PSO: Particle Swarm Optimization
QBPSO: Quantum-Inspired Binary Particle Swarm Optimization
SBPSO: S-Shaped Transfer Function-based Binary Particle Swarm Optimization
UTF-BPSO: Upgrade Transfer Function for Binary Particle Swarm Optimization
VBPSO: V-Shaped Transfer Function-based Binary Particle Swarm Optimization
WOA: Whale Optimization Algorithm

References

  1. Askr, H.; Abdel-Salam, M.; Hassanien, A.E. Copula entropy-based golden jackal optimization algorithm for high-dimensional feature selection problems. Expert. Syst. Appl. 2024, 238, 121582. [Google Scholar] [CrossRef]
  2. Li, M.; Ma, H.; Lv, S.; Wang, L.; Deng, S. Enhanced NSGA-II-based feature selection method for high-dimensional classification. Inf. Sci. 2024, 663, 120269. [Google Scholar] [CrossRef]
  3. Pradip, D.; Chandrashekhar, A. A multi-objective feature selection method using Newton’s law based PSO with GWO. Appl. Soft Comput. 2021, 107, 107394. [Google Scholar] [CrossRef]
  4. Amukta, M.V.; Tirumala, K.B. A hybrid filter-wrapper feature selection using Fuzzy KNN based on Bonferroni mean for medical datasets classification: A COVID-19 case study. Expert Syst. Appl. 2023, 218, 119612. [Google Scholar] [CrossRef]
  5. Hu, P.; Zhu, J. A filter-wrapper model for high-dimensional feature selection based on evolutionary computation. Appl. Intell. 2025, 55, 581. [Google Scholar] [CrossRef]
  6. Qi, Z.; Liu, Y.; Song, Q.; Zhou, N. An improved greedy reduction algorithm based on neighborhood rough set model for sensors screening of exoskeleton. IEEE Sens. J. 2021, 21, 26964–26977. [Google Scholar] [CrossRef]
  7. Zhao, J.; Chen, L.; Pedrycz, W.; Wang, W. Variational inference-based automatic relevance determination kernel for embedded feature selection of noisy industrial data. IEEE Trans. Ind. Electron. 2019, 66, 416–428. [Google Scholar] [CrossRef]
  8. Li, J.; Qi, F.; Sun, X.; Zhang, B.; Xu, X.; Cai, H. Unsupervised feature selection via collaborative embedding learning. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2529–2540. [Google Scholar] [CrossRef]
  9. Kashef, S.; Nezamabadi-pour, H. A new feature selection algorithm based on binary ant colony optimization. In Proceedings of the 5th Conference on Information and Knowledge Technology, Shiraz, Iran, 28–30 May 2013. [Google Scholar] [CrossRef]
  10. Seyedali, M.; Seyed, M.M.; Andrew, L. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  11. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995. [Google Scholar] [CrossRef]
  12. Saryazdi, N.P. GSA: A gravitational search algorithm. Inf. Sci. 2009, 179, 2232–2248. [Google Scholar] [CrossRef]
  13. Aličković, Z.E.; Subasi, A. Breast cancer diagnosis using GA feature selection and rotation forest. Neural Comput. Appl. 2017, 28, 753–763. [Google Scholar] [CrossRef]
  14. Chuang, L.Y.; Tsai, S.W.; Yang, C.H. Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 2011, 38, 12699–12707. [Google Scholar] [CrossRef]
  15. Kennedy, J.; Eberhart, R. A discrete binary version of the particle swarm algorithm. In Proceedings of the IEEE International Conference on Computational Cybernetics and Simulation, Orlando, FL, USA, 12–15 October 1997. [Google Scholar] [CrossRef]
  16. Banka, H.; Dara, S. A hamming distance based binary particle swarm optimization (HDBPSO) algorithm for high dimensional feature selection, classification and validation. Pattern Recogn. 2015, 52, 94–100. [Google Scholar] [CrossRef]
  17. Seyedali, M.; Andrew, L. S-shaped versus V-shaped transfer functions for binary particle swarm optimization. Swarm Evol. Comput. 2013, 9, 1–14. [Google Scholar] [CrossRef]
  18. Tran, B.; Bing, X.; Zhang, M. Variable-Length particle swarm optimization for feature selection on high-dimensional classification. IEEE Trans. Evol. Comput. 2019, 23, 473–487. [Google Scholar] [CrossRef]
  19. Song, X.; Zhang, Y.; Gong, D.; Sun, K. Feature selection using bare-bones particle swarm optimization with mutual information. Pattern Recogn. 2021, 112, 107804. [Google Scholar] [CrossRef]
  20. Jain, I.; Jain, V.K.; Jain, R. Correlation feature selection based improved binary particle swarm optimization for gene selection and cancer classification. Appl. Soft Comput. 2018, 62, 203–215. [Google Scholar] [CrossRef]
  21. Thaher, T.; Chantar, H.; Too, J.; Mafarja, M.; Turabieh, H.; Houssein, E.H. Boolean Particle swarm optimization with various evolutionary population dynamics approaches for feature selection problems. Expert. Syst. Appl. 2022, 195, 116550. [Google Scholar] [CrossRef]
  22. Cheng, R.; Jin, Y. A competitive swarm optimizer for large scale optimization. IEEE Trans. Cybern. 2015, 45, 191–204. [Google Scholar] [CrossRef]
  23. Oh, I.; Lee, J.; Moon, B.R. Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1424–1437. [Google Scholar] [CrossRef]
  24. Emary, E.; Zawba, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381. [Google Scholar] [CrossRef]
  25. Nakamura, R.; Pereira, L.; Costa, K.A.; Rodrigues, D.; Papa, J.P.; Yang, X.S. BBA: A binary bat algorithm for feature selection. In Proceedings of the 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, Ouro Preto, Brazil, 22–25 August 2012; Swarm Intelligence and Bio-Inspired Computation 2013. pp. 225–237. [Google Scholar] [CrossRef]
  26. Majdi, M.; Seyedali, M. Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. 2018, 62, 441–453. [Google Scholar] [CrossRef]
  27. Hichem, H.; Elkamel, M.; Rafik, M.; Mesaaoud, M.T.; Ouahiba, C. A new binary grasshopper optimization algorithm for feature selection problem. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 316–328. [Google Scholar] [CrossRef]
  28. Majdi, M.; Ibrahim, A.; Ali, A.H.; Hossam, F.; Philippe, F.; Li, X.; Seyedali, M. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl.-Based Syst. 2018, 161, 185–204. [Google Scholar] [CrossRef]
  29. Al-Tashi, Q.; Abdul Kadir, S.J.; Rais, H.M.; Seyedali, M.; Alhussian, H. Binary optimization using hybrid grey wolf optimization for feature selection. IEEE Access 2019, 7, 39496–39508. [Google Scholar] [CrossRef]
  30. Al-Wajih, R.; Abdulkadir, S.J.; Aziz, N.; Al-Tashi, Q.; Talpur, N. Hybrid binary grey wolf with harris hawks optimizer for feature selection. IEEE Access 2021, 9, 31662–31677. [Google Scholar] [CrossRef]
  31. Sadeghian, Z.; Akbari, E.; Nematzadeh, H. A hybrid feature selection method based on information theory and binary butterfly optimization algorithm. Eng. Appl. Artif. Intel. 2021, 97, 104079. [Google Scholar] [CrossRef]
  32. Lu, H.; Chen, J.; Yan, K.; Jin, Q.; Xue, Y.; Gao, Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing 2017, 256, 56–62. [Google Scholar] [CrossRef]
  33. Ma, W.; Zhou, X.; Zhu, H.; Li, L.; Jiao, L. A two-stage hybrid ant colony optimization for high-dimensional feature selection. Pattern Recogn. 2021, 116, 107933. [Google Scholar] [CrossRef]
  34. Zhao, W.; Zhang, Z.; Wang, L. Manta ray foraging optimization: An effective bio-inspired optimizer for engineering applications. Eng. Appl. Artif. Intel. 2020, 87, 103300. [Google Scholar] [CrossRef]
  35. Abdel-Mawgoud, H.; Ali, A.; Kamel, S.; Rahmann, C.; Abdel-Moamen, M.A. A modified manta ray foraging optimizer for planning inverter-based photovoltaic with battery energy storage system and wind turbine in distribution networks. IEEE Access 2021, 9, 91062–91079. [Google Scholar] [CrossRef]
  36. Jain, S.; Indora, S.; Atal, D.K. Rider manta ray foraging optimization-based generative adversarial network and CNN feature for detecting glaucoma. Biomed. Signal Proces. 2022, 73, 103425. [Google Scholar] [CrossRef]
  37. Attiya, I.; Elaziz, M.A.; Abualigah, L.; Nguyen, T.N.; El-Latif, A.A.A. An improved hybrid swarm intelligence for scheduling IoT application tasks in the cloud. IEEE Trans. Ind. Inform. 2022, 18, 6264–6272. [Google Scholar] [CrossRef]
  38. Duan, Y.; Liu, C.; Li, S.; Guo, X.; Yang, C. Manta ray foraging and gaussian mutation-based elephant herding optimization for global optimization. Eng. Comput. 2023, 39, 1085–1125. [Google Scholar] [CrossRef]
  39. Sharma, N.K.; Kumar, S.; Rajpal, A.; Kumar, N. MantaRayWmark: An image adaptive multiple embedding strength optimization based watermarking using manta ray foraging and bi-directional ELM. Expert. Syst. Appl. 2022, 200, 116860. [Google Scholar] [CrossRef]
  40. Chen, K.; Xue, B.; Zhang, M.; Zhou, F. An evolutionary multitasking-based feature selection method for high-dimensional classification. IEEE Trans. Cybern. 2022, 52, 7172–7186. [Google Scholar] [CrossRef]
  41. Deng, Z.; Zhu, X.; Cheng, D.; Zong, M.; Zhang, S. Efficient kNN classification algorithm for big data. Neurocomputing 2016, 195, 143–148. [Google Scholar] [CrossRef]
  42. Chou, T.S.; Yen, K.K.; Luo, J.; Pissinou, N.; Makki, K. Correlation-based feature selection for intrusion detection design. In Proceedings of the MILCOM 2007—IEEE Military Communications Conference, Orlando, FL, USA, 29–31 October 2007. [Google Scholar] [CrossRef]
  43. Yu, L.; Liu, H. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; Available online: https://dl.acm.org/doi/10.5555/3041838.3041946 (accessed on 24 December 2024).
  44. Gutlein, M.; Frank, E.; Hall, M.; Karwath, A. Large-scale attribute selection using wrappers. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; IEEE: Piscataway, NJ, USA, 2009. [Google Scholar] [CrossRef]
  45. Siedlecki, W.; Sklansky, J. A note on genetic algorithms for large-scale feature selection. In Handbook of Pattern Recognition and Computer Vision; World Scientific: Singapore, 1993; pp. 88–107. [Google Scholar] [CrossRef]
  46. Too, J.; Mirjalili, S. A hyper learning binary dragonfly algorithm for feature selection: A COVID-19 case study. Knowl.-Based Syst. 2021, 212, 106553. [Google Scholar] [CrossRef]
  47. Rashedi, E.; Nezamabadi-pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat. Comput. 2010, 9, 727–745. [Google Scholar] [CrossRef]
  48. Jeong, Y.-W.; Park, J.-B.; Jang, S.-H.; Lee, K.Y. A new quantum-inspired binary PSO: Application to unit commitment problems for power systems. IEEE Trans. Power Syst. 2010, 25, 1486–1495. [Google Scholar] [CrossRef]
  49. Beheshti, Z. UTF: Upgrade transfer function for binary meta-heuristic algorithms. Appl. Soft Comput. 2021, 106, 107346. [Google Scholar] [CrossRef]
  50. Hu, J.; Chen, H.; Heidari, A.A.; Wang, M.; Zhang, X.; Chen, Y.; Pan, Z. Orthogonal learning covariance matrix for defects of grey wolf optimizer: Insights, balance, diversity, and feature selection. Knowl.-Based Syst. 2021, 213, 106684. [Google Scholar] [CrossRef]
  51. Carrasco, J.; García, S.; Rueda, M.M.; Das, S.; Herrera, F. Recent trends in the use of statistical tests for comparing swarm and evolutionary computing algorithms: Practical guidelines and a critical review. Swarm Evol. Comput. 2020, 54, 100665. [Google Scholar] [CrossRef]
Figure 1. The iteration curve of α.
Figure 2. Iteration curves of c1 and c2.
Figure 3. The flowchart of MRFL-BPSO.
Figure 4. Convergence curves of the MRFL-BPSO algorithm and the comparison algorithms on each dataset.
Figure 5. Boxplots of fitness values for MRFL-BPSO versus SBPSO and UTF-BPSO on the 10 datasets.
Figure 6. Comparison of MRFL-BPSO against other methods in terms of average accuracy.
Figure 7. Comparison of MRFL-BPSO against other methods in terms of average feature subset size.
Table 1. Summary of the experimental datasets.
ID   Datasets         Instances   Features   Classes
1    Leukemia1        72          5327       3
2    DLBCL            77          5469       2
3    9Tumor           60          5726       9
4    Brain Tumor1     90          5920       5
5    Prostate Tumor   102         10,509     2
6    Leukemia2        72          7129       3
7    Brain Tumor2     50          10,367     4
8    Leukemia3        72          11,225     3
9    11Tumor          174         12,533     11
10   Lung             203         12,600     5
Table 2. Comparison of classification accuracy with classical feature selection methods.
Datasets         CFS      FCBF     LFS      MRFL-BPSO
Leukemia1        0.9694   0.9516   0.9766   1.0000
DLBCL            0.9921   0.9843   0.9708   1.0000
9Tumor           0.6377   0.6230   0.5726   0.8504
Brain Tumor1     0.9148   0.9129   0.8841   0.9467
Prostate Tumor   0.9547   0.9511   0.9435   0.9914
Leukemia2        0.9256   0.9350   0.9247   0.9616
Brain Tumor2     0.8515   0.8764   0.8113   0.9900
Leukemia3        0.9979   0.9967   0.9468   1.0000
11Tumor          0.9024   0.9062   0.8260   0.9516
Lung             0.9677   0.9661   0.9190   0.9754
Note: Bold indicates the best results.
Table 3. Comparison of the number of features with classical feature selection methods.
Datasets         CFS   FCBF   LFS   MRFL-BPSO
Leukemia1        97    49     22    7.96
DLBCL            88    66     18    3.14
9Tumor           47    32     26    57.76
Brain Tumor1     142   106    22    7.45
Prostate Tumor   59    49     13    6.80
Leukemia2        119   71     21    17.19
Brain Tumor2     117   75     5     5.10
Leukemia3        138   80     15    5.13
11Tumor          379   394    25    200.53
Lung             550   453    10    20.31
Note: Bold indicates the best results.
Table 4. Comparison of fitness values with classical feature selection methods.
Datasets         CFS       FCBF      LFS       MRFL-BPSO
Leukemia1        0.03047   0.04800   0.02319   0.00002
DLBCL            0.00794   0.01571   0.02894   0.00001
9Tumor           0.35875   0.37330   0.42320   0.14448
Brain Tumor1     0.08454   0.08638   0.11474   0.05365
Prostate Tumor   0.04496   0.04850   0.05591   0.00949
Leukemia2        0.07382   0.06450   0.07456   0.03894
Brain Tumor2     0.14718   0.12242   0.18682   0.01074
Leukemia3        0.00219   0.00334   0.05264   0.00001
11Tumor          0.09695   0.09316   0.17228   0.04822
Lung             0.03240   0.03390   0.08016   0.02436
Note: Bold indicates the best results.
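The fitness values in Table 4 (and in the later fitness tables) can be read together with Tables 1–3. For the deterministic filter methods they are consistent with the widely used wrapper objective that weights the classification error by 0.99 and the selected-feature ratio by 0.01. This weighting is inferred from the tabulated numbers rather than quoted from the methodology section, so the short Python check below is only a consistency sketch, not the authors' code.

```python
# A consistency check (an inference from Tables 1-4, not the authors' stated definition):
# the fitness values reported for the filter methods agree with the common wrapper objective
#     fitness = 0.99 * (1 - accuracy) + 0.01 * (n_selected / n_total)
# Example: LFS on the 9Tumor dataset.
accuracy = 0.5726        # Table 2, LFS on 9Tumor
n_selected = 26          # Table 3, LFS on 9Tumor
n_total = 5726           # Table 1, number of features in 9Tumor

fitness = 0.99 * (1.0 - accuracy) + 0.01 * (n_selected / n_total)
print(round(fitness, 5))  # 0.42317, matching the 0.42320 reported in Table 4 up to rounding
```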
Table 5. Comparison of classification accuracy with other well-known optimizers.
Datasets         GA              HLBDA           BGWO2           BGSA            MRFL-BPSO
Leukemia1        0.9424 ± 0.01   0.9477 ± 0.01   1.0000 ± 0.00   0.9475 ± 0.01   1.0000 ± 0.00
DLBCL            0.9619 ± 0.01   0.9656 ± 0.01   1.0000 ± 0.00   0.9644 ± 0.01   1.0000 ± 0.00
9Tumor           0.5813 ± 0.03   0.6047 ± 0.02   0.8881 ± 0.02   0.6034 ± 0.04   0.8504 ± 0.02
Brain Tumor1     0.8650 ± 0.01   0.8677 ± 0.01   0.9448 ± 0.01   0.8685 ± 0.01   0.9467 ± 0.01
Prostate Tumor   0.9142 ± 0.01   0.9174 ± 0.01   0.9838 ± 0.01   0.9215 ± 0.01   0.9914 ± 0.01
Leukemia2        0.8381 ± 0.01   0.8442 ± 0.01   0.9716 ± 0.01   0.8508 ± 0.02   0.9914 ± 0.01
Brain Tumor2     0.8375 ± 0.02   0.8495 ± 0.02   0.9563 ± 0.02   0.8433 ± 0.03   0.9900 ± 0.01
Leukemia3        0.9496 ± 0.01   0.9538 ± 0.01   1.0000 ± 0.00   0.9553 ± 0.02   1.0000 ± 0.00
11Tumor          0.8536 ± 0.01   0.8618 ± 0.01   0.9622 ± 0.01   0.8681 ± 0.01   0.9516 ± 0.01
Lung             0.9105 ± 0.00   0.9114 ± 0.00   0.9742 ± 0.01   0.9138 ± 0.00   0.9754 ± 0.01
Note: Bold indicates the best results.
Table 6. Comparison of the number of features with other well-known optimizers.
Datasets         GA                HLBDA             BGWO2            BGSA              MRFL-BPSO
Leukemia1        2114.68 ± 7.17    2253.52 ± 25.08   54.07 ± 5.26     2037.37 ± 10.83   7.96 ± 3.06
DLBCL            2165.74 ± 8.40    2281.01 ± 26.62   36.54 ± 3.59     2076.37 ± 8.40    3.14 ± 0.66
9Tumor           2304.79 ± 11.19   2456.16 ± 36.92   146.44 ± 13.83   2241.27 ± 23.82   57.76 ± 19.06
Brain Tumor1     2366.83 ± 11.16   2481.96 ± 30.69   60.92 ± 6.19     2271.22 ± 11.68   7.45 ± 4.10
Prostate Tumor   2393.32 ± 7.88    2516.43 ± 31.38   66.97 ± 7.62     2297.80 ± 13.20   6.80 ± 3.17
Leukemia2        2926.30 ± 8.05    3030.57 ± 36.19   102.74 ± 8.68    2841.34 ± 15.43   17.19 ± 7.68
Brain Tumor2     4405.86 ± 11.78   4441.26 ± 46.54   100.90 ± 12.23   4289.51 ± 13.66   5.10 ± 1.39
Leukemia3        4805.86 ± 13.78   4832.15 ± 51.71   83.26 ± 8.70     4688.20 ± 22.83   5.13 ± 0.97
11Tumor          5470.28 ± 21.20   5605.70 ± 58.50   400.94 ± 23.75   5415.72 ± 26.71   200.53 ± 36.36
Lung             5427.71 ± 9.53    5345.34 ± 66.72   167.43 ± 21.27   5288.86 ± 18.85   20.31 ± 5.27
Note: Bold indicates the best results.
Table 7. Comparison of fitness values with other well-known optimizers.
Datasets         GA               HLBDA            BGWO2            BGSA             MRFL-BPSO
Leukemia1        0.06104 ± 0.01   0.05597 ± 0.01   0.00010 ± 0.00   0.05585 ± 0.01   0.00002 ± 0.00
DLBCL            0.04164 ± 0.01   0.03820 ± 0.01   0.00007 ± 0.00   0.03907 ± 0.01   0.00001 ± 0.00
9Tumor           0.41856 ± 0.03   0.39567 ± 0.02   0.11104 ± 0.02   0.39652 ± 0.04   0.14448 ± 0.02
Brain Tumor1     0.13766 ± 0.01   0.13522 ± 0.01   0.05472 ± 0.01   0.13406 ± 0.01   0.05365 ± 0.01
Prostate Tumor   0.08897 ± 0.01   0.08603 ± 0.01   0.01613 ± 0.01   0.08161 ± 0.01   0.00949 ± 0.01
Leukemia2        0.16439 ± 0.01   0.15852 ± 0.01   0.02826 ± 0.01   0.15166 ± 0.02   0.03894 ± 0.01
Brain Tumor2     0.16510 ± 0.02   0.15330 ± 0.02   0.04332 ± 0.02   0.15925 ± 0.03   0.01074 ± 0.01
Leukemia3        0.05422 ± 0.01   0.05006 ± 0.01   0.00007 ± 0.00   0.04846 ± 0.02   0.00001 ± 0.00
11Tumor          0.14926 ± 0.01   0.14131 ± 0.01   0.03770 ± 0.01   0.13487 ± 0.01   0.04822 ± 0.01
Lung             0.09291 ± 0.00   0.09195 ± 0.00   0.02565 ± 0.01   0.08957 ± 0.00   0.02436 ± 0.01
Note: Bold indicates the best results.
Table 8. Comparison of the statistical variance of fitness values with other well-known optimizers.
Datasets         GA            HLBDA         BGWO2         BGSA          MRFL-BPSO
Leukemia1        8.14 × 10−3   1.01 × 10−2   9.88 × 10−6   9.99 × 10−3   8.29 × 10−6
DLBCL            8.53 × 10−3   1.00 × 10−2   6.57 × 10−6   1.09 × 10−2   1.45 × 10−6
9Tumor           2.77 × 10−2   1.95 × 10−2   1.71 × 10−2   3.88 × 10−2   2.06 × 10−2
Brain Tumor1     1.09 × 10−2   5.79 × 10−3   1.12 × 10−2   8.42 × 10−3   1.35 × 10−2
Prostate Tumor   9.84 × 10−3   6.13 × 10−3   8.54 × 10−3   6.78 × 10−3   8.32 × 10−3
Leukemia2        1.35 × 10−2   1.47 × 10−2   8.20 × 10−3   1.74 × 10−2   1.24 × 10−2
Brain Tumor2     2.33 × 10−2   2.39 × 10−2   1.58 × 10−2   2.61 × 10−2   9.65 × 10−3
Leukemia3        1.41 × 10−2   1.39 × 10−2   7.75 × 10−6   1.56 × 10−2   9.76 × 10−7
11Tumor          1.31 × 10−2   1.16 × 10−2   1.06 × 10−2   1.19 × 10−2   1.11 × 10−2
Lung             4.91 × 10−3   4.32 × 10−3   5.90 × 10−3   4.32 × 10−3   7.20 × 10−3
Note: Bold indicates the best results.
Table 9. Comparison of classification accuracy with different variants of PSO.
Datasets         SBPSO           VBPSO           QBPSO           UTF-BPSO        MRFL-BPSO
Leukemia1        0.9484 ± 0.01   0.9491 ± 0.01   0.9433 ± 0.01   0.9911 ± 0.01   1.0000 ± 0.00
DLBCL            0.9629 ± 0.01   0.9615 ± 0.01   0.9663 ± 0.01   0.9961 ± 0.01   1.0000 ± 0.00
9Tumor           0.5964 ± 0.02   0.5841 ± 0.03   0.5983 ± 0.03   0.7548 ± 0.02   0.8504 ± 0.02
Brain Tumor1     0.8691 ± 0.01   0.8638 ± 0.01   0.8664 ± 0.01   0.9128 ± 0.01   0.9467 ± 0.01
Prostate Tumor   0.9225 ± 0.01   0.9210 ± 0.00   0.9180 ± 0.01   0.9578 ± 0.01   0.9914 ± 0.01
Leukemia2        0.8497 ± 0.02   0.8410 ± 0.02   0.8453 ± 0.01   0.9158 ± 0.02   0.9914 ± 0.01
Brain Tumor2     0.8448 ± 0.02   0.8402 ± 0.02   0.8462 ± 0.03   0.9128 ± 0.02   0.9900 ± 0.01
Leukemia3        0.9612 ± 0.02   0.9524 ± 0.02   0.9567 ± 0.02   0.9924 ± 0.01   1.0000 ± 0.00
11Tumor          0.8641 ± 0.01   0.8583 ± 0.01   0.8650 ± 0.01   0.9171 ± 0.01   0.9516 ± 0.01
Lung             0.9137 ± 0.01   0.9118 ± 0.01   0.9107 ± 0.00   0.9484 ± 0.00   0.9754 ± 0.01
Note: Bold indicates the best results.
Table 10. Comparison of the number of features with different variants of PSO.
Datasets         SBPSO             VBPSO             QBPSO             UTF-BPSO          MRFL-BPSO
Leukemia1        2482.87 ± 7.28    2410.29 ± 10.04   2253.32 ± 9.53    138.45 ± 24.35    7.96 ± 3.06
DLBCL            2538.26 ± 7.24    2463.62 ± 9.54    2296.30 ± 9.42    97.83 ± 13.42     3.14 ± 0.66
9Tumor           2694.09 ± 11.4    2613.56 ± 8.64    2463.75 ± 15.34   362.47 ± 65.11    57.76 ± 19.06
Brain Tumor1     2760.83 ± 8.01    2679.51 ± 12.46   2498.81 ± 10.76   135.60 ± 26.36    7.45 ± 4.10
Prostate Tumor   2786.19 ± 9.84    2711.89 ± 15.03   2531.60 ± 9.77    156.56 ± 30.77    6.80 ± 3.17
Leukemia2        3357.62 ± 15.29   3272.85 ± 12.98   3088.87 ± 17.92   240.89 ± 32.49    17.19 ± 7.68
Brain Tumor2     4926.52 ± 13.57   4823.65 ± 13.28   4600.55 ± 15.64   215.85 ± 24.67    5.10 ± 1.39
Leukemia3        5346.87 ± 12.62   5237.68 ± 17.4    5003.23 ± 14.94   208.33 ± 23.98    5.13 ± 0.97
11Tumor          6059.46 ± 16.99   5947.66 ± 19.14   5731.80 ± 21.88   828.62 ± 144.87   200.53 ± 36.36
Lung             6002.85 ± 11.37   5877.43 ± 8.84    5616.36 ± 14.22   344.80 ± 81.67    20.31 ± 5.27
Note: Bold indicates the best results.
Table 11. Comparison of fitness values with different variants of PSO.
Datasets         SBPSO            VBPSO            QBPSO            UTF-BPSO         MRFL-BPSO
Leukemia1        0.05578 ± 0.01   0.05494 ± 0.01   0.06036 ± 0.01   0.00903 ± 0.01   0.00002 ± 0.00
DLBCL            0.04135 ± 0.01   0.04263 ± 0.01   0.03755 ± 0.01   0.00407 ± 0.01   0.00001 ± 0.00
9Tumor           0.40426 ± 0.02   0.41632 ± 0.03   0.40194 ± 0.03   0.24335 ± 0.02   0.14448 ± 0.02
Brain Tumor1     0.13423 ± 0.01   0.13932 ± 0.01   0.13647 ± 0.01   0.08652 ± 0.01   0.05365 ± 0.01
Prostate Tumor   0.08144 ± 0.01   0.08276 ± 0.00   0.08538 ± 0.01   0.04202 ± 0.01   0.00949 ± 0.01
Leukemia2        0.15347 ± 0.02   0.16200 ± 0.02   0.15750 ± 0.01   0.08374 ± 0.02   0.03894 ± 0.01
Brain Tumor2     0.15838 ± 0.02   0.16285 ± 0.02   0.15670 ± 0.03   0.08657 ± 0.02   0.01074 ± 0.01
Leukemia3        0.04321 ± 0.02   0.05181 ± 0.02   0.04736 ± 0.02   0.00773 ± 0.01   0.00001 ± 0.00
11Tumor          0.13942 ± 0.01   0.14498 ± 0.01   0.13826 ± 0.01   0.08275 ± 0.01   0.04822 ± 0.01
Lung             0.09016 ± 0.01   0.09202 ± 0.01   0.09286 ± 0.00   0.05134 ± 0.00   0.02436 ± 0.01
Note: Bold indicates the best results.
Table 12. Comparison of the statistical variance of fitness values with different variants of PSO.
Datasets         SBPSO         VBPSO         QBPSO         UTF-BPSO      MRFL-BPSO
Leukemia1        1.03 × 10−2   1.13 × 10−2   8.13 × 10−3   1.15 × 10−2   8.29 × 10−6
DLBCL            7.49 × 10−3   1.01 × 10−2   9.97 × 10−3   6.11 × 10−3   1.45 × 10−6
9Tumor           2.47 × 10−2   2.86 × 10−2   3.04 × 10−2   1.91 × 10−2   2.06 × 10−2
Brain Tumor1     5.84 × 10−3   8.23 × 10−3   9.18 × 10−3   7.33 × 10−3   1.35 × 10−2
Prostate Tumor   5.86 × 10−3   4.00 × 10−3   9.43 × 10−3   9.70 × 10−3   8.32 × 10−3
Leukemia2        1.66 × 10−2   1.57 × 10−2   1.06 × 10−2   1.60 × 10−2   1.24 × 10−2
Brain Tumor2     2.45 × 10−2   1.98 × 10−2   2.66 × 10−2   1.70 × 10−2   9.65 × 10−3
Leukemia3        1.61 × 10−2   1.88 × 10−2   1.67 × 10−2   8.42 × 10−3   9.76 × 10−7
11Tumor          8.35 × 10−3   1.17 × 10−2   1.05 × 10−2   9.52 × 10−3   1.11 × 10−2
Lung             6.29 × 10−3   5.67 × 10−3   4.67 × 10−3   4.59 × 10−3   7.20 × 10−3
Note: Bold indicates the best results.
Table 13. Overall rank by the F-test for all algorithms based on accuracy, number of features, and fitness value.
Algorithm      GA     HLBDA   BGWO2   BGSA   SBPSO   VBPSO   QBPSO   UTF-BPSO   MRFL-BPSO
Accuracy       8.8    6.1     1.55    5.3    5.3     7.4     6.1     3          1.45
Features       5.1    6       2       4      9       8       6.9     3          1
Fitness        8.8    6.1     1.7     5.1    5.4     7.5     6.1     3          1.3
Average rank   7.57   6.07    1.75    4.8    6.57    7.63    6.37    3          1.25
Final rank     8      5       2       4      7       9       6       3          1
Note: Bold indicates the best results.
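The last two rows of Table 13 follow directly from the three criterion-wise mean ranks above them: the average rank is their arithmetic mean, and the final rank orders those averages. The Python sketch below is not the authors' code; it only reproduces that arithmetic from the values already listed in the table.

```python
# Minimal sketch reproducing the "Average rank" and "Final rank" rows of Table 13
# from the three criterion-wise mean ranks reported in the same table.
import numpy as np

algorithms = ["GA", "HLBDA", "BGWO2", "BGSA", "SBPSO",
              "VBPSO", "QBPSO", "UTF-BPSO", "MRFL-BPSO"]

# Mean ranks copied from Table 13 (accuracy, number of features, fitness).
rank_accuracy = np.array([8.8, 6.1, 1.55, 5.3, 5.3, 7.4, 6.1, 3.0, 1.45])
rank_features = np.array([5.1, 6.0, 2.0, 4.0, 9.0, 8.0, 6.9, 3.0, 1.0])
rank_fitness  = np.array([8.8, 6.1, 1.7, 5.1, 5.4, 7.5, 6.1, 3.0, 1.3])

# Average rank = arithmetic mean of the three criterion ranks.
average_rank = (rank_accuracy + rank_features + rank_fitness) / 3

# Final rank = position of each algorithm when sorted by average rank (1 = best).
order = np.argsort(average_rank)            # indices from best to worst
final_rank = np.empty_like(order)
final_rank[order] = np.arange(1, len(algorithms) + 1)

for name, avg, fin in zip(algorithms, np.round(average_rank, 2), final_rank):
    print(f"{name:10s} average rank = {avg:.2f}, final rank = {fin}")
```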
Table 14. The results of the Wilcoxon test of the proposed MRFL-BPSO against other methods.
Dataset          GA            HLBDA         BGWO2         BGSA          SBPSO         VBPSO         QBPSO         UTF-BPSO
Leukemia1        7.90 × 10−9   7.89 × 10−9   NaN           7.83 × 10−9   7.82 × 10−9   7.89 × 10−9   7.48 × 10−9   9.25 × 10−4
DLBCL            7.29 × 10−9   7.45 × 10−9   NaN           7.66 × 10−9   6.91 × 10−9   7.45 × 10−9   7.38 × 10−9   9.49 × 10−3
9Tumor           6.77 × 10−8   6.77 × 10−8   8.03 × 10−6   6.76 × 10−8   6.77 × 10−8   6.77 × 10−8   6.77 × 10−8   6.77 × 10−8
Brain Tumor1     6.54 × 10−8   6.57 × 10−8   7.75 × 10−1   6.55 × 10−8   6.53 × 10−8   6.56 × 10−8   6.46 × 10−8   1.24 × 10−7
Prostate Tumor   5.44 × 10−8   5.40 × 10−8   5.45 × 10−3   5.53 × 10−8   5.01 × 10−8   4.99 × 10−8   5.46 × 10−8   5.42 × 10−8
Leukemia2        6.73 × 10−8   6.73 × 10−8   6.78 × 10−3   6.74 × 10−8   6.73 × 10−8   6.73 × 10−8   6.71 × 10−8   1.89 × 10−7
Brain Tumor2     5.35 × 10−8   5.34 × 10−8   1.16 × 10−6   5.36 × 10−8   5.37 × 10−8   5.36 × 10−8   5.35 × 10−8   5.32 × 10−8
Leukemia3        7.70 × 10−9   7.88 × 10−9   NaN           7.85 × 10−9   7.68 × 10−9   7.90 × 10−9   7.68 × 10−9   3.93 × 10−4
11Tumor          6.79 × 10−8   6.79 × 10−8   8.35 × 10−3   6.79 × 10−8   6.79 × 10−8   6.79 × 10−8   6.79 × 10−8   1.23 × 10−7
Lung             6.76 × 10−8   6.75 × 10−8   4.40 × 10−1   6.76 × 10−8   6.75 × 10−8   6.75 × 10−8   6.75 × 10−8   6.75 × 10−8
Sum              10            10            5             10            10            10            10            10
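Each entry in Table 14 is the p-value of a pairwise Wilcoxon signed-rank test between MRFL-BPSO and the listed method on one dataset, and the Sum row counts the datasets on which the difference is significant at the 0.05 level (for example, 5 of 10 against BGWO2). The sketch below outlines an assumed workflow rather than the authors' code: the per-run fitness arrays, the number of runs, and the NaN convention for identical paired results are illustrative assumptions; scipy.stats.wilcoxon provides the test itself.

```python
# A minimal sketch (assumed workflow, not the authors' code) of how the p-values and the
# "Sum" row of Table 14 can be produced: one Wilcoxon signed-rank test per dataset between
# MRFL-BPSO and a competitor, then a count of datasets significant at alpha = 0.05.
import numpy as np
from scipy.stats import wilcoxon

ALPHA = 0.05

def count_significant(fitness_mrfl, fitness_other, alpha=ALPHA):
    """fitness_mrfl, fitness_other: arrays of shape (n_datasets, n_runs) of per-run fitness values."""
    p_values = []
    for a, b in zip(fitness_mrfl, fitness_other):
        if np.allclose(a, b):
            # Every paired difference is zero, so the signed-rank test is undefined;
            # such cases appear as NaN in Table 14.
            p_values.append(np.nan)
        else:
            p_values.append(wilcoxon(a, b).pvalue)
    p_values = np.array(p_values)
    wins = int(np.sum(p_values < alpha))  # NaN entries compare False, so they are not counted
    return p_values, wins

# Illustrative call with synthetic fitness values (10 datasets, 30 runs each):
rng = np.random.default_rng(42)
mrfl = rng.uniform(0.00, 0.05, size=(10, 30))
other = rng.uniform(0.05, 0.15, size=(10, 30))
p_values, wins = count_significant(mrfl, other)
print(np.round(p_values, 8), wins)
```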