Article

A Novel MBPSO–BDGWO Ensemble Feature Selection Method for High-Dimensional Classification Data

Department of Mathematics, Near East University, 99138 Nicosia, Turkey
Informatics 2026, 13(1), 7; https://doi.org/10.3390/informatics13010007
Submission received: 8 November 2025 / Revised: 14 December 2025 / Accepted: 8 January 2026 / Published: 12 January 2026

Abstract

In high-dimensional classification datasets, feature selection is crucial for improving classification performance and computational efficiency by identifying an informative subset of features while reducing noise, redundancy, and overfitting. This study proposes a novel metaheuristic-based ensemble feature selection approach that combines the complementary strengths of Modified Binary Particle Swarm Optimization (MBPSO) and Binary Dynamic Grey Wolf Optimization (BDGWO). The proposed MBPSO–BDGWO ensemble method is specifically designed for high-dimensional classification problems. Its performance was rigorously evaluated through an extensive simulation study under multiple high-dimensional scenarios with varying correlation structures, and it was further validated on several real datasets. Comparative analyses were conducted against single-stage feature selection methods, including BPSO, BGWO, MBPSO, and BDGWO, using evaluation metrics such as accuracy, the F1-score, the true positive rate (TPR), the false positive rate (FPR), the AUC, precision, and the Jaccard stability index. Simulation studies conducted under various dimensionality and correlation scenarios show that the proposed ensemble method achieves a low FPR, a high TPR, precision, F1-score, and AUC, and strong selection stability, clearly outperforming both classical and advanced single-stage methods, even as dimensionality and collinearity increase. In contrast, single-stage methods, particularly BPSO and BGWO, typically experience substantial performance degradation in high-correlation and high-dimensional settings. Moreover, on the real datasets, the ensemble method outperformed all compared single-stage methods and produced consistently low MAD values across repetitions, indicating robustness and stability even in ultra-high-dimensional genomic datasets.
Overall, the findings indicate that the proposed ensemble method demonstrates consistent performance across the evaluated scenarios and achieves higher selection stability compared with the single-stage methods.

1. Introduction

High-dimensional datasets, where the number of features (p) is large relative to the sample size (n), have become increasingly prevalent with advances in data acquisition and storage technologies across fields such as biomedical research, finance, bioinformatics, text mining, and computer vision [1,2,3]. Although such datasets provide rich information, they also introduce significant analytical challenges, including increased model complexity, the risk of overfitting, degraded generalization performance, and a substantial computational burden [4,5]. In particular, redundant and irrelevant features may mask meaningful patterns, inflate variance, and limit interpretability, ultimately reducing predictive reliability. Feature selection (FS) is a fundamental strategy for addressing these challenges in high-dimensional data by identifying a compact, informative subset S ⊆ {1, …, p} that maximizes predictive performance while reducing noise and redundancy [6]. FS plays a critical role in the machine learning pipeline by discarding irrelevant or weakly informative features and retaining the most discriminative ones, thereby improving generalization ability, enhancing computational efficiency, and strengthening interpretability [2,3]. With the exponential growth of data dimensionality, effective feature selection has become indispensable for building robust and scalable learning models, particularly in scenarios where p ≫ n. Its primary goal is to produce a minimal yet highly informative feature subset that improves prediction accuracy, reduces computational costs, and prevents overfitting [7].
According to their selection strategy, feature selection methods can be divided into two categories: single-stage feature selection methods and ensemble feature selection methods [8]. While traditional single-stage feature selection methods provide benefits in reducing data size and improving model performance, each has inherent limitations. Traditional single-stage feature selection methods, such as filter, wrapper, and embedded techniques, have been widely used in machine learning and data analysis [9]. Filter methods rely on statistical measures (e.g., correlation, mutual information) to rank features independently of the learning algorithm, making them computationally efficient; however, they can be suboptimal because they ignore feature interactions [9]. Wrapper methods, on the other hand, can select more successful feature subsets by interacting directly with the classifier; however, this process is both computationally expensive and carries the risk of overfitting [6]. Embedded methods, such as LASSO and decision tree-based techniques, integrate feature selection within the model training process, balancing efficiency and performance, but they are highly dependent on the chosen model [3]. Throughout this study, the term ‘single-stage methods’ refers to stand-alone feature selection methods that operate independently without combining multiple methods.
However, it is known that single-stage feature selection methods may face difficulties in high-dimensional settings, which include increased computational cost and the tendency to select redundant or noninformative features, which can adversely affect classification performance [4]. Additionally, single-stage feature selection methods, including both traditional and metaheuristic-based approaches, may suffer from challenges such as computational inefficiency and premature convergence under certain conditions, particularly when fixed parameter settings or limited update mechanisms restrict search adaptability [10,11,12]. Although metaheuristic algorithms are generally designed to balance exploration and exploitation, classical variants with static control parameters may still become trapped in local optima, especially in complex, high-dimensional search spaces. It is observed that, especially when a single variable selection method is used, the selected features often form a suboptimal subset [13].
Therefore, ensemble feature selection approaches have been developed to improve robustness, as these methods combine multiple selection strategies to yield a more stable and accurate feature set [10,11]. The stability of feature selection can be improved with the use of feature selection ensembles, which aggregate the results of multiple base feature selectors [14]. Because it combines multiple techniques, ensemble feature selection demonstrates superior consistency and resilience when applied to high-dimensional data where several optimal feature subsets may exist [15,16]. The ensemble feature selection method combines the strengths of multiple single-stage feature selection algorithms, which enable the selection of lower-dimensional and more meaningful features [8,17]. Such methods overcome the weaknesses of single-stage methods while providing advantages such as high accuracy, model stability, and generalizability. Especially in high-dimensional, imbalanced, or noisy datasets, ensemble-based methods demonstrate a significant superiority over classical approaches in producing more stable and reliable feature subsets. Thus, they provide the potential to improve classification performance and reduce overfitting [18,19].
Many ensemble approaches have been developed in the literature to perform efficient and balanced feature selection on high-dimensional datasets. Almomani [13] developed a model in which nature-inspired optimization algorithms such as PSO, GWO, FFA, and GA were used as an ensemble, thus increasing the accuracy rate in detecting network attacks and reducing the processing cost by reducing the number of features. Also, Wang et al. [15] presented SA-EFS, a ranking-based ensemble feature selection model, by utilizing the outputs of chi-square, maximum information coefficient, and XGBoost methods. They succeeded in obtaining more stable feature subsets on high-dimensional datasets. Tu et al. [20] proposed a multi-strategy ensemble GWO (MEGWO) algorithm that includes strategies such as global-best lead, cooperative hunting, and dispersed foraging by applying Grey Wolf Optimization (GWO). They showed that this method is superior to the compared methods in terms of both accuracy and convergence speed. Moreover, Singh and Singh [21] reported significant performance gains in metrics such as accuracy, sensitivity, and AUC in experiments on twenty different medical datasets using a hybrid ensemble feature selection model that combines filter and wrapper methods. Similarly, Robindro et al. [22] developed a hybrid ensemble method called HDFS (PSO-MI) that operates in a distributed structure using mutual information and three different objective functions. This approach determines the most influential features by re-evaluating the combination of selected sub-feature sets across data distributions and provides high classification accuracy. On the other hand, Mandal et al. [23] proposed an ensemble feature selection method that combines filter and wrapper-based approaches to address the classification challenges encountered in high-dimensional, low-sample (HDLSS) datasets. 
In that study, five filtering techniques, including Chi-square, Gini Index, F-score, Mutual Information, and Symmetric Uncertainty, were combined, and the resulting feature ranking was optimized using a wrapper search strategy implemented with the Differential Evolution metaheuristic algorithm. Also, Ab Hamid et al. [24] proposed an ensemble filter approach combining multiple filters such as IG, GR, Chi-squared, and Relief-F to address the shortcomings of filter-based feature selection algorithms. Furthermore, the goal was to simultaneously optimize feature selection and classifier parameters using a PSO-based optimized SVM classifier. The results demonstrated that the proposed Ensemble-PSO-SVM method significantly increased classification accuracy. Xu et al. [25], in a study on cancer diagnosis, proposed an ensemble feature selection method called NMICFS-PSO by combining Neighborhood Mutual Information (NMI)-based correlation-based feature selection (CFS) and Particle Swarm Optimization (PSO). In experiments conducted on various gene expression data, this method effectively eliminated unnecessary features and achieved high accuracy rates with a leave-one-out cross-validated SVM classifier.
As a result of the comprehensive literature review, it has been observed that metaheuristic optimization algorithms have gained significant attention in the feature selection problem, which is inherently combinatorial, non-convex, and characterized by a vast discrete search space, where exhaustive or deterministic methods become computationally infeasible [26]. This is primarily because metaheuristics provide flexible, population-based search mechanisms that can efficiently explore large, multimodal, and highly complex feature spaces in which deterministic optimization techniques become impractical. Their ability to balance exploration and exploitation, handle non-linear interactions between features, and operate effectively in discrete or binary search spaces makes them particularly well-suited for feature selection tasks [26]. Unlike traditional feature selection methods, which typically rely on strict mathematical models and assumptions, metaheuristic approaches avoid explicit formulations and instead employ intelligent, stochastic, nature-inspired strategies. By integrating mechanisms of exploration (global search) and exploitation (local refinement), these algorithms provide a more flexible and effective framework for feature subset selection [26,27,28].
Since the feature selection problem is inherently a combinatorial optimization task, where each feature is either selected (1) or excluded (0), the search space is discrete rather than continuous. Accordingly, binary metaheuristic algorithms have become a natural choice, as they represent candidate feature subsets using binary vectors and are explicitly designed to operate in discrete search spaces, which enables efficient exploration of relevant feature combinations while discarding irrelevant or redundant variables [27,29,30]. Among metaheuristic methods, swarm intelligence-based methods have recently attracted a lot of attention in the feature selection area due to their ease of use and promise for global search [28,29,31]. In this context, the literature review shows that Binary Particle Swarm Optimization (BPSO) and Binary Grey Wolf Optimization (BGWO) algorithms are widely used for feature selection due to their strong exploratory capabilities, effective convergence behavior, and ability to produce high-quality feature subsets that yield competitive classification accuracy [26,28,32]. BPSO, developed by Kennedy and Eberhart [33] as the binary version of the classical PSO and inspired by the social behavior of bird flocking, is known for its fast convergence and simplicity of implementation, making it efficient for discovering promising feature subsets in a discrete search space. Because of this structure, BPSO stands out with its small number of parameters, simple applicability, fast convergence, and global search ability [33,34,35,36]. Several studies have confirmed that BPSO variants are highly effective for high-dimensional feature selection, successfully identifying relevant features and improving learning performance [22,37,38,39,40,41]. On the other hand, the Binary Grey Wolf Optimization (BGWO) algorithm is a binary version of GWO, which was developed by Mirjalili et al. [42], and it balances exploration and exploitation through leadership behaviors based on an alpha–beta–delta–omega hierarchy [43]. Thanks to this structure, BGWO has a high local refinement capability and the capacity to produce more stable solutions, allowing strong exploitation abilities and diversity in search. Moreover, BGWO has been observed to yield quite successful results in feature selection problems, especially in high-dimensional data environments [44,45,46].
BPSO and BGWO are among the most outstanding swarm intelligence algorithms, demonstrating exceptional performance in high-dimensional feature selection problems due to their balance between exploration and exploitation [28]. BPSO’s probabilistic global exploration, rapid convergence, and discrete position-updating mechanism and BGWO’s hierarchical leadership structure with strong local refinement ability complement each other, making them ideal candidates for ensemble feature selection algorithms in high-dimensional data scenarios [28]. Used together, the two algorithms have been shown in the literature to produce effective results, providing both high accuracy and a reduction in the number of selected features by combining the fast global exploration power of BPSO with the precise local search capabilities of BGWO [28,47,48,49,50]. Based on these foundations, several modified binary variants have been introduced to further improve convergence stability and robustness. The Modified Binary PSO (MBPSO) incorporates dynamic inertia weights and adaptive cognitive–social coefficients, which provide a smoother transition between exploration and exploitation and improve convergence stability [51,52,53,54]. Likewise, the Binary Dynamic GWO (BDGWO), recently proposed by Erdoğan, Karakoyun, and Gülcü [55], improves the original BGWO through dynamic leadership adaptation and XOR-based bit-level updates, strengthening local refinement capability and reducing the risk of premature convergence. These modifications allow both MBPSO and BDGWO to achieve a stronger balance between global diversity and local search precision.
Therefore, combining MBPSO’s exploratory behavior with BDGWO’s exploitation and local optimization capability in an ensemble approach is motivated by their complementary search characteristics, which has the potential to facilitate the identification of more stable and compact feature subsets, particularly in high-dimensional data [28,47,48,49,50,54,55]. This formulation aims to balance global search diversity and local refinement within a unified framework by leveraging the complementary strengths of the two algorithms.
Motivated by these reviews, this study proposes a metaheuristic-based ensemble feature selection method that integrates the complementary strengths of MBPSO and BDGWO. The proposed method leverages MBPSO’s probabilistic global exploration and BDGWO’s dynamic bit-level exploitation to achieve effective, stable, and robust feature subset selection in high-dimensional data. In the first stage, both algorithms independently search the solution space using the same objective function to generate candidate feature subsets. In the second stage, these subsets are combined and refined through a hybrid scoring mechanism based on voting scores and mutual information, after which the most informative and non-redundant features are selected to construct the final feature set.
Accordingly, Section 2 first introduces the proposed ensemble feature selection method and its algorithmic stages, followed by the hyperparameter settings and implementation details. Next, a comprehensive simulation study design is presented to assess performance under various scenarios. Simulation results are then reported and discussed, after which the proposed method is evaluated on real datasets to validate its practical effectiveness. Finally, Section 4 summarizes the main findings and highlights future research directions.

2. Materials and Methods

This section introduces the proposed MBPSO–BDGWO ensemble feature selection method and its computational flow. It also gives details of the simulation design, performance evaluation metrics, benchmark methods, and real datasets used to assess performance, along with implementation details and experimental settings.

2.1. Binary PSO

Particle Swarm Optimization (PSO) was developed as a population-based, nature-inspired metaheuristic optimization method by Kennedy and Eberhart [34]. To adapt PSO for discrete problems such as feature selection, Binary PSO (BPSO) was proposed by Kennedy and Eberhart [33]. In BPSO, each particle’s position is represented as a binary vector, where 1 denotes that a feature is selected and 0 denotes that it is not. Velocities of the particles are updated using the same principle as continuous PSO, but a transfer function (commonly a sigmoid) maps velocities into probabilities, determining whether each bit becomes 0 or 1. The algorithm relies on three key parameters: the inertia weight (w), which helps particles maintain momentum and aims to balance exploration and exploitation; the cognitive coefficient (φ1), which steers particles toward their personal best solution (Pbest); and the social coefficient (φ2), which attracts particles toward the global best solution (Gbest).
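As a concrete illustration of this transfer step, the minimal sketch below (an illustrative example, not code from this article) maps a velocity to a selection probability and draws the corresponding bit:

```python
import math

def sigmoid(v):
    # Transfer function: maps a velocity to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

def binarize(v, r):
    # BPSO bit update: the bit becomes 1 when a uniform draw r in [0, 1]
    # falls below sigmoid(v); large positive velocities favor selection
    return 1 if r < sigmoid(v) else 0
```

A strongly positive velocity makes selection of the corresponding feature almost certain, while a strongly negative velocity effectively deselects it.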

Modified BPSO

Although the parameters φ1, φ2, and w are theoretically designed to balance exploration and exploitation, classical BPSO often suffers in practice from premature convergence and high sensitivity to fixed parameter settings such as the inertia weight and acceleration coefficients [56]. Consequently, it may become trapped in local optima and fail to maintain a stable exploration–exploitation balance [57]. To overcome these limitations, a modified version has been proposed, in which the inertia weight and the cognitive and social coefficients are dynamically updated as a function of the iteration number, allowing the search space to be explored more efficiently and comprehensively [51,52,53,54,58]. The steps of the optimization process in Modified Binary Particle Swarm Optimization (MBPSO) are as follows:
Step 1 (Parameter initialization): The tuning parameters of MBPSO are defined, where p is the number of particles in the swarm, t_max is the maximum number of iterations, v_max is the maximum velocity, d is the dimension of a particle, and (φ1i, φ1f), (φ2i, φ2f), and (w1, w2) are the lower and upper bounds of the cognitive coefficient, social coefficient, and inertia weight, respectively.
Step 2 (Particle initialization): Each particle is assigned a random position and velocity. Positions and velocities can be shown as follows.
X = (x_1, …, x_i, …, x_P)
V = (v_1, …, v_i, …, v_P)
Here, x_i is the position vector of particle i of the swarm, and v_i is the velocity of particle i.
Step 3 (Fitness evaluation): The fitness function (F) value is computed for each particle.
F(X) = (F(x_1), …, F(x_i), …, F(x_P))
Step 4 (Personal best position update): For each particle, its personal best position x_i^best is updated:
x_i^best = x_i if F(x_i) > F(x_i^best)
All personal bests form the matrix as follows:
X^best = (x_1^best, …, x_i^best, …, x_P^best)
Step 5 (Global best position update): The best particle among all personal bests is identified as the global best x_g^best, which denotes the swarm’s overall best position so far:
x_g^best = arg max_{x ∈ X^best} F(x)   (or arg min for minimization problems)
Step 6 (Dynamic parameter update): φ1, φ2, and w are updated as functions of the iteration number t, where φ1 and φ2 denote the cognitive and social coefficients and w is the inertia weight. The intervals (φ1i, φ1f), (φ2i, φ2f), and (w1, w2) in Equation (3) define the lower and upper bounds within which these coefficients are dynamically updated at each iteration; these ranges were selected based on commonly adopted PSO stability analyses to regulate the exploration–exploitation balance. The coefficients are computed at each iteration as follows:
φ1 = [(φ1f − φ1i)(t/t_max)] + φ1i
φ2 = [(φ2f − φ2i)(t/t_max)] + φ2i
w = [(w2 − w1)((t_max − t)/t_max)] + w1
where t is the current iteration number. Because the values of φ1, φ2, and w are recalculated at every iteration based on the current iteration index, these coefficients are time-varying (iteration-dependent), which provides an adaptive balance between exploration and exploitation.
Step 7 (Velocity update): Update the velocities of the particles in the swarm according to the personal best and global best positions:
v_{j+1} = w·v_j + φ1·r1·(x_j^best − x_j) + φ2·r2·(x_g^best − x_j)
where r1 and r2 are random numbers generated from a uniform distribution on the interval [0, 1]. After computing the velocity, velocity clamping is applied to ensure numerical stability:
v_{j+1} = max(−v_max, min(v_max, v_{j+1}))
Step 8 (Position update, binary transfer): Update the positions of the particles using the following equation:
x_{j+1} = 1 if r3 < sigm(v_{j+1}), and x_{j+1} = 0 otherwise
where sigm(v_{j+1}) = 1/(1 + e^(−v_{j+1})) is the sigmoid function and r3 is generated from a uniform distribution on the interval [0, 1].
Step 9: Steps 3 through 8 are repeated for the maximum number of iterations.
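Steps 1–9 can be sketched as follows. This is a minimal illustrative implementation, not the authors’ code: the default parameter values, the seeding, and the `fitness` signature (a function mapping a binary vector to a score to be maximized) are assumptions.

```python
import numpy as np

def mbpso(fitness, d, p=20, t_max=50, v_max=6.0,
          phi1=(2.5, 0.5), phi2=(0.5, 2.5), w=(0.4, 0.9), seed=0):
    """Sketch of MBPSO Steps 1-9; maximizes `fitness` over binary vectors."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(p, d))            # Step 2: random positions
    V = rng.uniform(-v_max, v_max, size=(p, d))    # and velocities
    F = np.array([fitness(x) for x in X])          # Step 3: fitness evaluation
    Xbest, Fbest = X.copy(), F.copy()              # Step 4: personal bests
    g = int(np.argmax(Fbest))                      # Step 5: global best
    xg, fg = Xbest[g].copy(), Fbest[g]
    for t in range(t_max):
        # Step 6: iteration-dependent coefficients (Equation (3))
        c1 = phi1[0] + (phi1[1] - phi1[0]) * t / t_max
        c2 = phi2[0] + (phi2[1] - phi2[0]) * t / t_max
        iw = w[0] + (w[1] - w[0]) * (t_max - t) / t_max
        r1, r2 = rng.random((p, d)), rng.random((p, d))
        # Step 7: velocity update with clamping
        V = iw * V + c1 * r1 * (Xbest - X) + c2 * r2 * (xg - X)
        V = np.clip(V, -v_max, v_max)
        # Step 8: sigmoid transfer and probabilistic bit update
        X = (rng.random((p, d)) < 1.0 / (1.0 + np.exp(-V))).astype(int)
        F = np.array([fitness(x) for x in X])
        improved = F > Fbest                       # Steps 4-5 repeated
        Xbest[improved], Fbest[improved] = X[improved], F[improved]
        g = int(np.argmax(Fbest))
        if Fbest[g] > fg:
            xg, fg = Xbest[g].copy(), Fbest[g]
    return xg, fg
```

For a toy fitness such as the number of selected bits, the swarm quickly converges toward the all-ones vector.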

2.2. Binary GWO

The Grey Wolf Optimization (GWO) is a population-based metaheuristic introduced by Mirjalili, Mirjalili, and Lewis [42] and inspired by the social hierarchy and hunting strategies of grey wolves in nature. To address discrete optimization problems such as feature selection, the Binary Grey Wolf Optimizer (BGWO) was proposed by Emary, Zawbaa, and Hassanien [43]. In BGWO, each wolf is represented as a binary vector, where 1 denotes a selected feature and 0 denotes an unselected feature. The algorithm mimics the leadership hierarchy of the pack, where the top three wolves (α, β, and δ) guide the search, while the rest of the wolves update their positions accordingly. Instead of continuous updates, BGWO uses transfer functions (e.g., sigmoid) to map the continuous position updates of wolves into probabilities, which are then defined as binary positions. The optimization process starts with the random initialization of the wolf pack, where each binary vector represents a candidate feature subset. In each iteration, the fitness of every wolf is evaluated based on a predefined objective function. The α, β, and δ wolves are updated as the best three solutions, and the remaining wolves adjust their binary positions relative to these leaders using the transfer function mechanism. This iterative process of evaluation, leadership update, and binary position adjustment continues until a stopping criterion, such as the maximum number of iterations, is reached, at which point the α wolf represents the optimal or near-optimal feature subset. Classical binary GWO mainly relies on transfer functions to map continuous updates into binary space, which can lead to inefficient exploration, instability in position updates, and premature convergence due to biased or unfair value distributions [55,59].

Binary Dynamic Grey Wolf Optimization Algorithm (BDGWO)

To address these shortcomings, a Binary Dynamic Grey Wolf Optimization Algorithm (BDGWO) has been proposed by Erdoğan, Karakoyun, & Gülcü [55]. This modified version improves classical BGWO by introducing a dynamic coefficient method (pDCM) and employing the XOR logical operator in the position-update phase [55,60]. The dynamic coefficient method adaptively adjusts the influence of alpha, beta, and delta wolves according to their solution quality, which allows better leaders to have a greater impact. XOR-based binarization expands the solution pool and maintains diversity, helping to prevent premature convergence. Experimental studies across numerous datasets show that the BDGWO consistently outperforms standard BGWO and other binary metaheuristics in terms of accuracy, robustness, and exploration–exploitation balance [55]. The steps of the Binary Dynamic Grey Wolf Optimization Algorithm (BDGWO) are as follows:
Step 1: Determine the hyperparameters of BDGWO: the population size (number of wolves, p), the dimension (d), and the maximum number of iterations (t_max).
Step 2: A population of p wolves is randomly initialized, each represented as a binary vector X_i ∈ {0, 1}^d.
Step 3: Determine the fitness function F and compute each wolf’s fitness.
Step 4: Determine the leaders α, β, and δ, where the best solution is α, the second best is β, and the third best is δ.
Step 5: Compute pDCM weights. Relative contributions of leaders are computed by Equations (6) and (7) as:
rate_α = 1/F_α, rate_β = 1/F_β, rate_δ = 1/F_δ
w_α = rate_α/(rate_α + rate_β + rate_δ), w_β = rate_β/(rate_α + rate_β + rate_δ), w_δ = rate_δ/(rate_α + rate_β + rate_δ)
Then, according to these weights, the indices are partitioned as:
a = w_α·d, b = w_β·d, c = d − a − b
Disjoint index sets S_α, S_β, and S_δ of sizes a, b, and c are then chosen at random. pDCM is a dynamic mechanism that decides which dimensions, and how many of them, will be influenced by the alpha, beta, and delta wolves when generating new candidate solutions. Here, d denotes the dimensionality of the binary search space, which corresponds to the number of features.
Although Equations (6) and (7) use the inverse of the fitness value (1/F), this transformation does not indicate a minimization process. The algorithm follows a maximization-based fitness formulation, and leader order (α, β, δ) is determined directly by the highest fitness values. The inversion is applied only for normalization and therefore does not change the optimization direction or the dominance of leaders.
Step 6: For each non-leader wolf (i.e., all wolves except α, β, and δ), the position is updated by Equation (8):
X_{i,j}^{t+1} = X_{i,j}^t ⊕ X_{α,j}^t, if j ∈ S_α
X_{i,j}^{t+1} = X_{i,j}^t ⊕ X_{β,j}^t, if j ∈ S_β
X_{i,j}^{t+1} = X_{i,j}^t ⊕ X_{δ,j}^t, if j ∈ S_δ
Here, ⊕ denotes the bitwise XOR operator. If bits are equal, the result is 0; if different, the result is 1. This mechanism provides discrete, directionally meaningful bit updates that strengthen local exploitation around leader wolves. Unlike sigmoid-based probabilistic binarization, XOR produces sharper and more informative transitions in the binary search space.
Step 7: New solutions are evaluated, fitness values are updated, and α, β, and δ are reassigned. The pDCM weights and index partitions (S_α, S_β, S_δ) are recalculated.
Step 8: Steps 5–7 are repeated until t_max is reached.
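The steps above can be sketched as follows. This is an illustrative reconstruction, not the authors’ implementation, and it assumes a strictly positive `fitness` (to be maximized) so that the 1/F normalization in the pDCM weights is well defined.

```python
import numpy as np

def bdgwo(fitness, d, p=20, t_max=50, seed=0):
    """Sketch of BDGWO Steps 1-8: pDCM leader weighting plus XOR updates."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(p, d))             # Step 2: random binary wolves
    F = np.array([fitness(x) for x in X])           # Step 3: fitness
    for _ in range(t_max):
        order = np.argsort(F)[::-1]                 # Step 4: alpha, beta, delta
        a_i, b_i, d_i = order[:3]
        rates = 1.0 / F[[a_i, b_i, d_i]]            # Step 5: pDCM (Eqs. 6-7)
        w = rates / rates.sum()
        a = int(w[0] * d); b = int(w[1] * d)        # index partition sizes
        idx = rng.permutation(d)                    # disjoint sets S_a, S_b, S_d
        S = [idx[:a], idx[a:a + b], idx[a + b:]]
        leaders = [X[a_i], X[b_i], X[d_i]]
        for i in range(p):                          # Step 6: XOR bit updates
            if i in (a_i, b_i, d_i):
                continue                            # leaders keep their positions
            new = X[i].copy()
            for Sk, lead in zip(S, leaders):
                new[Sk] = X[i][Sk] ^ lead[Sk]       # Equation (8)
            X[i], F[i] = new, fitness(new)          # Step 7: re-evaluate
    best = int(np.argmax(F))
    return X[best], F[best]
```

Note that, as stated in the text, the 1/F inversion is used only to normalize the leader weights; the leader order itself follows the highest fitness values.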

2.3. Proposed MBPSO-BDGWO-Based Ensemble Feature Selection Method

In this study, an ensemble feature selection method is proposed that combines the feature selection outputs of the MBPSO and BDGWO algorithms. In the first stage, both algorithms are run using an AUC-based objective function on the same dataset to generate candidate feature subsets. In MBPSO, the inertia weight (w) and the cognitive (φ1) and social (φ2) coefficients are dynamically updated as a function of the number of iterations [27]. BDGWO implements a dynamic coefficient mechanism (pDCM) and bitwise XOR-based position updating. The relative influences of the leader wolves α, β, and δ are adapted according to their solution quality; the index space is partitioned into subsets S_α, S_β, and S_δ according to these weights, and the wolf position in each dimension is updated by performing an XOR operation with the corresponding leader. In the second stage, the candidate subsets obtained from the MBPSO and BDGWO algorithms are combined using a hybrid feature importance score based on voting scores and mutual information. In the final stage, these combined scores are filtered using a robust median–MAD thresholding scheme and constrained within the [k_min, k_max] range, which yields a compact, non-redundant, and highly informative final feature subset.
In the proposed ensemble method, the combined use of two algorithms with different binarization strategies (sigmoid-based MBPSO and XOR-based BDGWO) increases information diversity and reduces the risk of similar search traces. While MBPSO’s probabilistic sigmoid transfer function explores the search space smoothly and gradually, BDGWO’s bitwise XOR update provides local optimization with efficient jumps. Therefore, this variety in the ensemble phase creates true information diversity and prevents being trapped in a uniform search trace. Consequently, MBPSO enables global exploration in a large space, while BDGWO performs more precise local optimizations in these areas. This synergy reduces the probability of missing informative variables and produces a more meaningful, smaller, but more effective feature set. Thus, the proposed ensemble method is expected to exhibit superior performance in terms of both accuracy and robustness on high-dimensional datasets [42,47,55,59].
Nevertheless, the performance of the proposed ensemble approach strongly depends on a well-defined objective function. The design of F(S) should carefully balance discriminative accuracy, subset compactness, and redundancy control to fully exploit the strengths of the ensemble method.
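One plausible sketch of the second-stage aggregation is given below. The mixing weight `alpha` between normalized vote counts and mutual information scores, as well as the exact form of the median–MAD threshold, are illustrative assumptions; the paper’s precise aggregation formula is not reproduced here.

```python
import numpy as np

def ensemble_select(votes, mi, k_min=5, k_max=50, alpha=0.5):
    """Combine vote counts and MI scores, then apply a median-MAD cut."""
    votes = np.asarray(votes, dtype=float)
    mi = np.asarray(mi, dtype=float)
    v = votes / (votes.max() + 1e-12)               # normalize both sources
    m = mi / (mi.max() + 1e-12)
    score = alpha * v + (1 - alpha) * m             # hybrid importance score
    med = np.median(score)
    mad = np.median(np.abs(score - med))            # robust spread estimate
    keep = np.where(score > med + mad)[0]           # median-MAD threshold
    order = keep[np.argsort(score[keep])[::-1]]     # rank retained features
    if len(order) < k_min:                          # enforce [k_min, k_max]
        order = np.argsort(score)[::-1][:k_min]
    return np.sort(order[:k_max])
```

With per-feature vote counts accumulated over repeated MBPSO and BDGWO runs and MI scores estimated against the class label, the function returns the indices of the retained features.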

Objective Function

In this study, the Area Under the Receiver Operating Characteristic Curve (AUC)-based objective function was designed as seen in Equation (9)
F(S) = AUC_RF(S) − β·(|S| / p) − λ·R_{p,u}(S)
with the redundancy term R_{p,u}(S) = (1 / (|S|(|S| − 1))) · Σ_{i≠j; i,j∈S} |ρ_ij| (1 − u_i)(1 − u_j).
Here, S is the selected feature set, |S| is the number of selected features, p is the total number of features, β ≥ 0 is the parsimony penalty weight, λ ≥ 0 is the correlation penalty weight, ρ_ij is the pairwise Pearson correlation between features i and j, and u_i ∈ [0, 1] is the unique contribution of feature i obtained from conditional permutation importance (CPI). This fitness function jointly optimizes three critical aspects of feature selection: discriminative performance via the AUC, subset compactness via a parsimony penalty, and adaptive redundancy control through a CPI-weighted correlation penalty. β ∈ [0.01, 0.05] represents mild parsimony regularization suitable for high-dimensional wrappers, and λ ∈ [0.05, 0.2] is the correlation penalty strength.
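As a concrete illustration, Equation (9) can be sketched in Python. The helper names `fitness` and `redundancy_term` are hypothetical, and the AUC value is assumed to be precomputed from a Random Forest evaluated on subset S; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def redundancy_term(corr, u, S):
    """CPI-weighted mean absolute pairwise correlation over subset S
    (last term of Eq. 9): averages |rho_ij| * (1-u_i)(1-u_j) over i != j."""
    S = list(S)
    if len(S) < 2:
        return 0.0
    total = 0.0
    for a in range(len(S)):
        for b in range(len(S)):
            if a != b:
                i, j = S[a], S[b]
                total += abs(corr[i, j]) * (1 - u[i]) * (1 - u[j])
    return total / (len(S) * (len(S) - 1))

def fitness(auc_rf, S, p, corr, u, beta=0.03, lam=0.1):
    """F(S) = AUC_RF(S) - beta * |S|/p - lambda * R_{p,u}(S) (Eq. 9),
    with beta = 0.03 and lambda = 0.1 as selected in the sensitivity analysis."""
    return auc_rf - beta * len(S) / p - lam * redundancy_term(corr, u, S)
```

For example, with two selected features of pairwise correlation 0.5 and zero CPI weights, the redundancy term equals 0.5, so the penalties are applied directly to the raw AUC.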
A sensitivity analysis was conducted for the penalty parameters β and λ, where β ∈ {0, 0.01, 0.02, 0.03, 0.04, 0.05} and λ = λ0·median(|ρ|) with λ0 ∈ {0, 0.05, 0.1, 0.2}. The results indicated that β = 0 and λ0 = 0 led to over-selection with higher FPR, whereas small penalty values (β = 0.01, λ0 = 0.05) improved sparsity and discriminative performance. The configuration β = 0.03 and λ0 = 0.1 achieved the most favorable balance between AUC, parsimony, and subset stability (highest Jaccard consistency with low FPR), while larger penalties (β = 0.05, λ0 = 0.2) caused slight performance degradation due to over-penalization. Accordingly, β = 0.03 and λ0 = 0.1 were selected for all experiments. All experiments were repeated 30 times with fixed seeds.
Employing AUC in the proposed ensemble feature selection method provides a robust and generalizable evaluation criterion that performs reliably for both balanced and imbalanced datasets. This makes it particularly suitable for high-dimensional feature selection tasks where both discriminative power and model compactness are desired. Unlike accuracy or error rate, AUC is robust to class imbalance and remains stable across varying class prevalence rates, an essential advantage in high-dimensional or skewed data [61,62,63]. AUC is also a consistent and discriminative performance measure because it jointly considers both sensitivity (true positive rate) and specificity (via the false positive rate) within a single scalar value. Previous studies have confirmed that AUC-based metrics outperform traditional accuracy-based criteria for feature selection, offering greater reliability even when only a few features are selected [64,65,66,67,68,69,70].
While maximizing AUC provides strong discriminative performance, optimizing it alone does not guarantee optimal feature selection in high-dimensional data. In such settings, high collinearity and grouped variable structures may lead to redundant or unstable selections, even if classification performance appears satisfactory. Therefore, additional components providing subset compactness and redundancy control are necessary to achieve optimal feature selection. The second term of the objective function introduces a parsimony regularizer that penalizes excessively large subsets and mitigates overfitting. This principle has been widely adopted across numerous metaheuristic feature selection formulations [26]. Finally, the last term of the fitness function is the redundancy term, which minimizes unnecessary feature overlap by penalizing pairwise correlations. However, unlike conventional correlation penalties, it is adaptively modulated through (1 − u_i)(1 − u_j), where u_i measures the unique predictive contribution of feature i derived from the Conditional Permutation Importance (CPI) [71,72]. CPI estimates a feature's conditional contribution to model performance while accounting for correlations among predictors, thereby isolating variables with a truly unique effect [71,72]. CPI was computed using 50 conditional permutations per feature, stratified sampling, random_state = 42, and n_jobs = −1. To ensure comparability across datasets and feature subsets, CPI scores were min–max normalized to [0, 1]. Thus, CPI prevents discarding relevant predictors that are correlated with other meaningful variables. After min–max normalization, u_i ∈ [0, 1], and the weighting (1 − u_i)(1 − u_j) reduces penalization for correlated but informative variables, while strongly penalizing redundant, low-information pairs.
By balancing discriminative performance, compactness, and adaptive redundancy control, the objective function steers the MBPSO and BDGWO algorithms toward selecting optimal non-redundant and highly discriminative feature subsets. This formulation provides robustness across low, moderate, and high-correlation environments, including those with grouped variable structures.
In this study, Random Forest (RF) [73] is employed as the classifier for fitness evaluation due to its robustness and suitability for high-dimensional data. RF is particularly advantageous because it can effectively handle multicollinearity, high-dimensional, and complex datasets, while providing stable accuracy even under varying training parameters [74,75,76].
The following stages have been introduced in the proposed method’s algorithm:
Preliminary Step: Data Splitting
Split the dataset into training (70%) and testing (30%) using a stratified split with a fixed random seed.
All feature selection, tuning, and model fitting are performed on the training set only. The test set was used for the performance evaluation of feature selection methods.
Stage 1: All hyperparameters for MBPSO, BDGWO, and Random Forest were tuned through empirical sensitivity analysis and preliminary cross-validated experiments.
For Random Forest:
n_estimators = 300, bootstrap = True, oob_score = True, criterion = 'gini', max_depth = None, min_samples_leaf = 2, min_samples_split = 6, max_features = ('sqrt' if p ≤ 100 else 'log2'), n_jobs = −1, random_state = 42
For Modified BPSO:
Population size P = 50, t_max = 400, d = p (the number of features), (φ1_i, φ1_f) = (2, 3), (φ2_i, φ2_f) = (2, 3), (w_1, w_2) = (0.9, 2), v_max = 4.
For BDGWO:
Population size P = 50, t_max = 400, d = p (the number of features).
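As one concrete reading of the Random Forest settings listed above, the configuration can be sketched with scikit-learn. The helper name `make_rf` is illustrative (not from the paper); the conditional `max_features` switch follows the p ≤ 100 rule stated in Stage 1.

```python
from sklearn.ensemble import RandomForestClassifier

def make_rf(p, seed=42):
    """Random Forest configured as in Stage 1; max_features switches
    from 'sqrt' to 'log2' once the feature count p exceeds 100."""
    return RandomForestClassifier(
        n_estimators=300, bootstrap=True, oob_score=True, criterion="gini",
        max_depth=None, min_samples_leaf=2, min_samples_split=6,
        max_features="sqrt" if p <= 100 else "log2",
        n_jobs=-1, random_state=seed)
```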
Stage 2: Initialize a population of size P (particles for MBPSO and wolves for BDGWO) with binary position vectors X_i ∈ {0, 1}^p. Each vector represents a candidate feature subset, where a value of 1 indicates that the corresponding feature is selected, and 0 indicates exclusion.
Stage 3: Run the MBPSO using the designed objective function F(S), where the fitness of each candidate subset is evaluated via Random Forest (RF) as the classifier.
Stage 4: Run BDGWO (on the same training set) with the designed objective function F(S) using RF as the classifier.
Stage 5: Calculate weighted voting score for each feature.
For each feature i, let v_i^MBPSO, v_i^BDGWO ∈ {0, 1} indicate whether feature i is selected by the respective algorithm. The weighted voting score is computed by Equation (10):
vote_i = (w_P · v_i^MBPSO + w_G · v_i^BDGWO) / (w_P + w_G)
where w_P and w_G are the median 5-fold cross-validated AUC values of the MBPSO- and BDGWO-based feature subsets, respectively.
Then, voting scores are rescaled to [0, 1] via min-max normalization by Equation (11):
vote_i^norm = (vote_i − min(vote)) / (max(vote) − min(vote))
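Equations (10) and (11) can be sketched as follows. The helper name `weighted_vote` is hypothetical, and the zero-range guard for a constant vote vector is an added assumption not stated in the text.

```python
import numpy as np

def weighted_vote(v_mbpso, v_bdgwo, w_p, w_g):
    """Eq. (10): AUC-weighted vote per feature, followed by
    Eq. (11): min-max rescaling of the vote vector to [0, 1]."""
    v_mbpso = np.asarray(v_mbpso, dtype=float)
    v_bdgwo = np.asarray(v_bdgwo, dtype=float)
    vote = (w_p * v_mbpso + w_g * v_bdgwo) / (w_p + w_g)
    rng = vote.max() - vote.min()
    # assumed guard: if all votes are identical, return zeros
    return (vote - vote.min()) / rng if rng > 0 else np.zeros_like(vote)
```

For instance, with w_P = 0.8 and w_G = 0.6, a feature chosen by both algorithms receives the maximal normalized score, while a feature chosen only by the weaker algorithm lands at the bottom of the scale.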
Stage 6: Compute mutual information (MI) for each feature.
For each feature X_i, the mutual information between the feature and the target variable y is computed by Equation (12):
MI_i = I(X_i; Y) = Σ_{x∈X_i} Σ_{y∈Y} p(x, y) · log( p(x, y) / (p(x) p(y)) )
Here, p(x, y) denotes the joint probability distribution of feature X_i and target y, while p(x) and p(y) are their marginal distributions. Mutual Information measures the reduction in uncertainty about y when X_i is known, that is, how much knowing the feature improves prediction of the target. Then, mutual information values are rescaled to [0, 1] via min–max normalization.
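In practice, Equation (12) must be estimated for continuous features rather than computed from the plug-in formula. A minimal sketch using scikit-learn's nearest-neighbor MI estimator (an assumed substitute for the exact sum; `normalized_mi` is an illustrative helper name) together with the min–max rescaling described above:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def normalized_mi(X, y, seed=42):
    """Estimate MI(X_i; y) per feature (Eq. 12) with a kNN-based
    estimator, then min-max rescale the scores to [0, 1]."""
    mi = mutual_info_classif(X, y, random_state=seed)
    rng = mi.max() - mi.min()
    # assumed guard: all-equal MI scores map to zeros
    return (mi - mi.min()) / rng if rng > 0 else np.zeros_like(mi)
```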
Stage 7: To adaptively balance the contributions of the voting score and Mutual Information, the weighting coefficient α_fuse is computed from the relative predictive performance of each criterion.
Two candidate subsets are obtained from the weighted voting score and MI by applying a robust threshold criterion, median + τ·MAD, where MAD is the median absolute deviation, with a constraint interval [k_min, k_max] that bounds the subset size, where k_min = ⌈log₂ p⌉ and k_max = max(k_min, ⌊√p⌋). The parameter τ was selected from a small, predefined grid τ ∈ {1.0, 1.25, 1.5} using inner 5-fold cross-validation based on AUC. For each criterion (voting score and Mutual Information), the predictive capability is quantified using 5-fold cross-validated AUC values obtained from Random Forest classifiers trained on the corresponding feature subsets:
AUC_vote = median(AUC(S_vote)),   AUC_MI = median(AUC(S_MI))
Then, the adaptive fusion weight α_fuse is calculated by Equation (13):
α_fuse = AUC_vote / (AUC_vote + AUC_MI)
If either subset is empty, α_fuse is fixed at 0.5.
Stage 8: The final feature importance score is calculated by combining both evidence sources using the data-driven weight α_fuse, by Equation (14):
Final_score_i = α_fuse · vote_i^norm + (1 − α_fuse) · MI_i^norm
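Equations (13) and (14) can be sketched together. The helper name `fused_scores` is illustrative, and the empty-subset fallback of Stage 7 is represented here by a zero-denominator guard, which is an implementation assumption.

```python
def fused_scores(vote_norm, mi_norm, auc_vote, auc_mi):
    """Eq. (13): alpha_fuse = AUC_vote / (AUC_vote + AUC_MI),
    then Eq. (14): fused final score per feature.
    A zero denominator falls back to alpha_fuse = 0.5 (assumed guard)."""
    denom = auc_vote + auc_mi
    alpha = 0.5 if denom == 0 else auc_vote / denom
    return [alpha * v + (1 - alpha) * m for v, m in zip(vote_norm, mi_norm)]
```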
Stage 9: To determine the final subset of relevant features, a robust median–MAD threshold, as seen in Equation (15), is applied to the final score distribution:
T = median(Final_score) + τ · MAD(Final_score)
The parameter τ is selected from a small, predefined grid τ ∈ {1.0, 1.25, 1.5} using 5-fold cross-validation based on AUC, which provides a data-driven yet computationally efficient threshold calibration. The lower and upper bounds of the selected feature subset size were set as [k_min, k_max], where k_min = ⌈log₂ p⌉ and k_max = max(k_min, ⌊√p⌋), following the sub-linear search-space bounding heuristics commonly adopted in high-dimensional feature selection [77,78,79]. Features satisfying Final_score_i ≥ T are retained, subject to the subset-size limits k ∈ [k_min, k_max], which constrain the selected subset to a reasonable search range, preventing both under-selection and over-selection while improving stability by reducing unnecessary variability in high-dimensional settings.
This median + MAD thresholding criterion provides a distribution-adaptive and outlier-resistant cutoff, automatically adjusting to the empirical variability of the scoring function. Unlike fixed or rank-based rules (e.g., top-k), it does not require an arbitrary k and remains stable across various dimensionalities and correlation structures.
Median and MAD (Median Absolute Deviation) are robust statistics that effectively handle heavy-tailed or skewed score distributions often encountered in high-dimensional feature selection [80].
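A minimal sketch of the median + MAD rule of Equation (15) with the [k_min, k_max] size constraint. The ⌊√p⌋ upper bound is an assumption consistent with the sub-linear bounding heuristic mentioned above, and `select_features` is a hypothetical helper name.

```python
import numpy as np

def select_features(scores, tau=1.25):
    """Eq. (15): T = median + tau * MAD over the final scores, then
    enforce a subset size in [k_min, k_max] by keeping the top-scoring
    features; k_min = ceil(log2 p), k_max = max(k_min, floor(sqrt(p)))."""
    scores = np.asarray(scores, dtype=float)
    p = scores.size
    k_min = int(np.ceil(np.log2(p)))
    k_max = max(k_min, int(np.floor(np.sqrt(p))))  # assumed sub-linear bound
    mad = np.median(np.abs(scores - np.median(scores)))
    T = np.median(scores) + tau * mad
    order = np.argsort(-scores)            # indices sorted by descending score
    k = int((scores >= T).sum())
    k = min(max(k, k_min), k_max)          # clip subset size to [k_min, k_max]
    return sorted(order[:k].tolist())
```

With sixteen scores of which four stand out, the MAD collapses to zero, the raw threshold would retain everything, and the k_max bound is what trims the subset back to the four top-scoring features.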
Stage 10: The final selected feature subset was evaluated using a Random Forest (RF) classifier on the test dataset to assess the feature selection performance of the proposed ensemble method.
The flow chart of the proposed ensemble feature selection method has been illustrated in Figure 1.

2.4. Simulation Study Design

The feature selection efficiency of the proposed MBPSO–BDGWO-based ensemble method was analyzed through a comprehensive simulation study. The proposed method was compared with the single-stage standalone methods, traditional BPSO [33], traditional BGWO [43], MBPSO [54], and BDGWO [55], under various simulated high-dimensional data scenarios. Random Forest (RF) was employed as the classifier to evaluate candidate feature subsets. All simulation procedures were implemented in R (version 4.3.2). The design matrix X ∈ ℝ^(n×p) was generated from multivariate normal distributions, X ~ N(0, Σ), under various correlation structures [81,82], where
Σ_jk = 1 if j = k, and Σ_jk = ρ if j ≠ k, with ρ ∈ {0.10, 0.40, 0.90}.
We set n = 50. The number of variables p was set to 1.2, 2, and 4 times the sample size, i.e., p = 60, 100, and 200, respectively. The number of informative features was set to 0.20 times the total number of features (p_inf = 12 for p = 60, p_inf = 20 for p = 100, and p_inf = 40 for p = 200). The true coefficients β_j were generated from a uniform distribution, β_j ~ Uniform(1.2, 1.5).
The binary outcome Y was generated using nonlinear functions of informative variables to reflect the complex relationships that RF can capture [71,73]. The nonlinear setting incorporates quadratic effects and interaction terms, which are particularly suitable to assess RF’s ability to model complex dependencies.
z = Σ_{j=1}^{p_inf/2} β_j X_j² + Σ_{j=(p_inf/2)+1}^{p_inf} β_j (X_j X_{j+1})
y is generated via the logistic function P(Y = 1 | X) = 1 / (1 + e^(−z)). The binary outcome y takes the value 1 if P(Y = 1 | X) > 0.5, and the value 0 otherwise.
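The data-generating process above can be sketched as follows. The one-factor construction of the equicorrelated covariance and the helper name `simulate` are illustrative choices, not the authors' R code; note that P(Y = 1 | X) > 0.5 is equivalent to z > 0.

```python
import numpy as np

def simulate(n=50, p=60, rho=0.4, seed=1):
    """Equicorrelated Gaussian design X ~ N(0, Sigma) with
    Sigma = (1 - rho) I + rho J (one-factor construction), nonlinear
    logit z with quadratic and interaction terms, thresholded at 0.5."""
    rng = np.random.default_rng(seed)
    p_inf = int(0.2 * p)                     # 20% informative features
    common = rng.standard_normal((n, 1))     # shared latent factor
    X = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.standard_normal((n, p))
    beta = rng.uniform(1.2, 1.5, size=p_inf)
    half = p_inf // 2
    z = (beta[:half] * X[:, :half] ** 2).sum(axis=1)
    z += (beta[half:] * X[:, half:p_inf] * X[:, half + 1:p_inf + 1]).sum(axis=1)
    y = (1 / (1 + np.exp(-z)) > 0.5).astype(int)
    return X, y, np.arange(p_inf)            # informative indices 0..p_inf-1
```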
In summary, the following simulation scenarios were considered in this study:
I—n = 50, p = 60, ρ = 0.10, 0.40, 0.90
II—n = 50, p = 100, ρ = 0.10, 0.40, 0.90
III—n = 50 p = 200, ρ = 0.10, 0.40, 0.90
IV—The variables are generated as containing grouped-variable scenarios for n = 50 and p = 60.
X_{i,gk} = Z_g + e_{i,gk},   g = 1, 2, 3,   k = 1, …, 4
with Z_g ~ N(0, 1) the latent factor of group g and e_{i,gk} ~ N(0, 0.04) independent and identically distributed. The remaining 48 variables were generated independently as X_j ~ N(0, 1) for j = 13, …, 60.
V—The variables are generated as containing grouped-variable scenarios for n = 50 and p = 100
X_{i,gk} = Z_g + e_{i,gk},   g = 1, …, 5,   k = 1, …, 4
with Z_g ~ N(0, 1) the latent factor of group g and e_{i,gk} ~ N(0, 0.04) independent and identically distributed. The remaining 80 variables were generated independently as X_j ~ N(0, 1) for j = 21, …, 100.
VI—The variables are generated as containing grouped-variable scenarios for n = 50 and p = 200.
X_{i,gk} = Z_g + e_{i,gk},   g = 1, …, 5,   k = 1, …, 8
with Z_g ~ N(0, 1) the latent factor of group g and e_{i,gk} ~ N(0, 0.04) independent and identically distributed. The remaining 160 variables were generated independently as X_j ~ N(0, 1) for j = 41, …, 200.
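The grouped-variable scenarios above can be sketched as follows. The helper name `grouped_X` is illustrative; defaults correspond to Scenario IV (3 groups of 4 correlated variables plus 48 independent ones), and the other scenarios follow by changing `G`, `group_size`, and `p`.

```python
import numpy as np

def grouped_X(n=50, p=60, G=3, group_size=4, seed=7):
    """Grouped design: within group g, X_{g,k} = Z_g + e with
    Z_g ~ N(0, 1) and e ~ N(0, 0.04); remaining columns are i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    cols = []
    for g in range(G):
        Z = rng.standard_normal((n, 1))                      # latent factor
        cols.append(Z + 0.2 * rng.standard_normal((n, group_size)))  # sd 0.2
    cols.append(rng.standard_normal((n, p - G * group_size)))  # independent rest
    return np.hstack(cols)
```

Because Var(Z_g) = 1 and Var(e) = 0.04, the within-group correlation is 1/1.04 ≈ 0.96, so each group forms a tightly correlated block.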

2.4.1. Performance Evaluation Metrics

True Positive Rate (TPR) (or Recall), False Positive Rate (FPR), Precision, F1-Score, Accuracy, and Area Under the ROC Curve (AUC) were employed to evaluate and compare the feature selection performance of the proposed ensemble method against the standalone single-stage methods—traditional BPSO, traditional BGWO, MBPSO, and BDGWO—based on the standard classification matrix as seen in Table 1.
Here, True Positives (TP) are features accurately identified as significant; False Positives (FP) are non-significant variables erroneously classified as significant; True Negatives (TN) are non-significant variables accurately classified as redundant; and False Negatives (FN) are significant variables erroneously classified as unimportant.
Accuracy = (TP + TN) / (TP + TN + FN + FP)
AUC = ∫₀¹ TPR(FPR⁻¹(x)) dx
Precision = TP / (TP + FP)
TPR (or Recall) = TP / (TP + FN)
FPR = FP / (TN + FP)
F1-score = (2 · Precision · Recall) / (Precision + Recall)
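The threshold-based metrics above can be computed directly from a selected subset and the set of truly informative features. A minimal sketch (the helper name `selection_metrics` is illustrative; the integral-form AUC is omitted since it is not a simple count ratio):

```python
def selection_metrics(selected, true_informative, p):
    """Confusion-matrix metrics for feature selection: Accuracy,
    Precision, TPR (Recall), FPR, and F1-score over p total features."""
    sel, true = set(selected), set(true_informative)
    tp = len(sel & true)          # informative features correctly selected
    fp = len(sel - true)          # irrelevant features selected
    fn = len(true - sel)          # informative features missed
    tn = p - tp - fp - fn         # irrelevant features correctly excluded
    prec = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    acc = (tp + tn) / p
    f1 = 2 * prec * tpr / (prec + tpr) if prec + tpr else 0.0
    return {"Accuracy": acc, "Precision": prec, "TPR": tpr,
            "FPR": fpr, "F1": f1}
```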
One hundred random repetitions of the simulation were performed. Each simulated dataset was split into a training set (70%) and a testing set (30%) in every repetition. The proposed method and the compared methods were applied to the training set, and their performance was analyzed on the testing set. Performance evaluation metrics were calculated on the testing set, and the median of each performance metric was reported. During training, Random Forest (RF) was employed as the classifier within the optimization process of each feature selection method (BPSO, BGWO, MBPSO, BDGWO, and the proposed ensemble method).
In addition to performance metrics such as AUC, Accuracy, and F1-Score, the stability of the proposed feature selection method was quantified using the average pairwise Jaccard index [83,84]. The Jaccard index measures the similarity between two feature subsets S i and S j obtained from different simulation runs and is defined as Equation (23):
J(S_i, S_j) = |S_i ∩ S_j| / |S_i ∪ S_j|
where |S_i ∩ S_j| denotes the number of features selected in both subsets, and |S_i ∪ S_j| denotes the total number of unique features across the two subsets.
The average Jaccard index across all pairs of runs (J̄) was used as a stability measure, as defined in Equation (24):
J̄ = (2 / (R(R − 1))) · Σ_{i<j} J(S_i, S_j)
where R is the number of simulation runs. Higher values of J̄ indicate more consistent feature selection across repetitions, reflecting good stability of the method. Low stability values (e.g., J̄ ≈ 0.18) indicate highly variable feature subsets [84]. While no universal threshold exists, values above approximately 0.75–0.90 are commonly interpreted as strong stability in high-dimensional feature selection studies [85].
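Equations (23) and (24) can be sketched as follows; `jaccard` and `mean_pairwise_jaccard` are illustrative helper names.

```python
def jaccard(a, b):
    """Eq. (23): |A ∩ B| / |A ∪ B| for two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_pairwise_jaccard(subsets):
    """Eq. (24): average Jaccard similarity over all pairs of R runs."""
    R = len(subsets)
    pairs = [(i, j) for i in range(R) for j in range(i + 1, R)]
    return sum(jaccard(subsets[i], subsets[j]) for i, j in pairs) / len(pairs)
```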
In the first stage of the proposed two-phase ensemble feature selection method, the MBPSO and BDGWO algorithms were executed in parallel to reduce computational time and improve efficiency. The simulated datasets were generated using fixed random seeds (set.seed) to maintain identical random conditions across runs. As a result, both algorithms were applied to the same simulated training dataset, which shows that performance differences arose solely from their respective search strategies rather than stochastic variation in the data. The implementation was carried out in the R programming environment by utilizing the parallel package for multi-core execution.
Shared data structures and functions were synchronized using the makeCluster(), clusterExport(), and parLapply() functions. This parallel setup ensured that both algorithms operated consistently under identical conditions while achieving a significant reduction in total computational time. The time complexity of the proposed ensemble method is O(P·T·(p + C_RF)), where P is the population size, T is the number of iterations, p is the number of features, and C_RF denotes the cost of evaluating the RF-based objective function. Since MBPSO and BDGWO are executed in parallel and the ensemble stage (MI computation and median–MAD thresholding) adds only O(p) overhead, the method remains computationally scalable in high-dimensional settings. All experiments were executed in the same computing environment, equipped with an Intel Core i7-8700K processor (3.7 GHz, 6 cores; Intel Corporation, Santa Clara, CA, USA), 32 GB RAM, and an NVIDIA GTX 1080 Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA).

2.4.2. Ablation Experiments

An ablation study was conducted to verify the contribution of each component in the proposed ensemble method. Each module (the AUC-based objective, the parsimony penalty in the objective function, the adaptive redundancy term in the objective function, the voting score and MI fusion, the robust median + MAD thresholding, the adaptive weighting α_fuse, and the bounded subset range [k_min, k_max]) was removed or replaced one at a time while keeping all other settings constant. For each configuration, performance was evaluated using AUC, False Positive Rate (FPR), stability (mean Jaccard index J̄), True Positive Rate (TPR), Precision, Accuracy, and F1-score. The results in Table 2 demonstrate that removing any component leads to a noticeable degradation in performance, confirming that each module plays an essential role in achieving balanced and stable feature selection. The ablation analysis was performed on the simulated dataset with n = 50, p = 100, and high correlation (ρ = 0.90); this setting was selected because it represents a balanced yet challenging scenario. The results demonstrated the importance of combining all proposed components. The full ensemble architecture provided the highest performance, offering the best balance of discrimination (AUC = 0.891) and stability (J̄ = 0.837). Optimizing based solely on AUC resulted in a high false positive rate and low stability because it included neither parsimony nor correlation control. Adding only parsimony provided limited improvement, while removing the Voting + MI fusion significantly reduced performance, demonstrating the need for information source integration. Using only the median instead of median + MAD, and applying a fixed weight instead of the adaptive α_fuse, weakened performance due to inaccurate thresholding and insufficient data fit, respectively.
Furthermore, removing the [kmin, kmax] size constraint resulted in over- or under-selection, making the results unstable. Finally, optimizing for accuracy instead of AUC yielded the lowest results by far. These findings demonstrate that the combined use of AUC-based evaluation, parsimony, adaptive fusion, robust thresholding, and size constraint is crucial for the success of the proposed method. Consequently, each module contributes meaningfully to performance; the full ensemble provides the highest AUC, lowest FPR, and greatest stability.

3. Results

The simulation study compared the proposed MBPSO–BDGWO-based ensemble method with single-stage metaheuristic approaches (BPSO, BGWO, MBPSO, and BDGWO) to evaluate their feature selection performance, as well as the effects of dimensionality (p) and correlation (ρ) on feature selection effectiveness and selection stability ( J ¯ ) on various high-dimensional data scenarios. The performance evaluation of the proposed and comparative feature selection methods was conducted using various evaluation metrics. The True Positive Rate (TPR/Recall) represented the proportion of truly significant features that were correctly identified, which indicated the model’s ability to detect meaningful features. The False Positive Rate (FPR) represented the proportion of redundant or non-significant features that were incorrectly selected, thus reflecting redundancy sensitivity. Precision measured the accuracy of the selected subset by expressing how many of the chosen features were genuinely relevant. The F1-score, as the harmonic mean of Precision and Recall, provided a balanced measure of selection accuracy. The Area Under the Curve (AUC) evaluated the overall discriminative power of each method independent of a specific threshold, which indicated robustness in distinguishing relevant and irrelevant features. Finally, the average Jaccard index ( J ¯ ) assessed the stability of feature selection across repeated runs, with higher values reflecting more consistent and stable subsets.
Table 3 shows the simulation results for each scenario, reporting the median and MAD values of the performance metrics obtained from 100 independent simulation runs. Particularly in classical algorithms (BPSO and BGWO), increasing correlation led to a decrease in TPR and an increase in FPR, which consequently lowered F1-score and AUC while significantly reducing J ¯ . For BPSO, the results changed within Accuracy = 0.471–0.709 (MAD = 0.164–0.278), TPR = 0.445–0.770 (MAD = 0.182–0.257), F1 = 0.403–0.695 (MAD = 0.171–0.256), and AUC = 0.457–0.715 (MAD = 0.144–0.247), while FPR remained high (0.301–0.533) (MAD = 0.081–0.145) and Precision was moderate (0.328–0.619) (MAD = 0.146–0.219), which yielded a low J ¯ = 0.209–0.456. These metrics indicated that BPSO frequently over-selected irrelevant variables, which shows the highest FPR and lowest stability, as reflected by its reduced J ¯ . Compared to BPSO, BGWO showed a slight improvement, with Accuracy = 0.496–0.724 (MAD = 0.147–0.267), TPR = 0.365–0.757 (MAD = 0.142–0.224), and AUC = 0.481–0.719 (MAD = 0.142–0.237), though still constrained by FPR = 0.236–0.499 (MAD = 0.064–0.122) and a moderate J ¯ = 0.274–0.491. In both classical methods, rising correlation substantially lowered TPR and F1-score, while inflated FPR confirmed their tendency toward redundant feature selection and unstable results. On the other hand, MBPSO introduced dynamic inertia and adaptive cognitive–social coefficients that effectively mitigated the performance decline observed in classical BPSO as well as BGWO. MBPSO achieved Accuracy = 0.581–0.799 (MAD = 0.102–0.175), TPR = 0.525–0.820 (MAD = 0.147–0.224), and AUC = 0.571–0.795 (MAD = 0.108–0.179), while reducing FPR to 0.237–0.393 (MAD = 0.062–0.115) and improving F1 = 0.511–0.759 (MAD = 0.104–0.171) and Precision = 0.498–0.700 (MAD = 0.091–0.197). 
This indicated better detection of meaningful variables and more reliable subset selection, thanks to its adaptive inertia and dynamic social–cognitive parameters, which improved robustness against correlation. The stability index also rose to J̄ = 0.355–0.583, indicating greater stability. BDGWO further improved feature selection through its pDCM and XOR-based bit-level updates, remaining effective even under high correlation compared with the other single-stage methods, reaching Accuracy = 0.606–0.842 (MAD = 0.085–0.139), TPR = 0.545–0.833 (MAD = 0.072–0.132), and AUC = 0.619–0.834 (MAD = 0.085–0.153), while maintaining low FPR = 0.186–0.379 (MAD = 0.027–0.057) and high Precision = 0.585–0.733 (MAD = 0.077–0.119). With J̄ = 0.460–0.697, BDGWO delivered the most balanced single-method performance, effectively minimizing redundant selections and providing more consistent results than the other one-stage methods. However, these results still fell short of the proposed ensemble method. Finally, the proposed ensemble method took advantage of the complementary strengths of MBPSO and BDGWO. By integrating MBPSO's global search capability with BDGWO's local refinement and employing Mutual Information–voting score fusion combined with median + MAD thresholding, it achieved compact, non-redundant, and highly reliable feature subsets. As a result, it consistently maintained Accuracy = 0.873–0.910 (MAD = 0.034–0.060), TPR = 0.940–0.982 (MAD = 0.042–0.066), F1-score = 0.867–0.898 (MAD = 0.024–0.074), and AUC = 0.889–0.919 (MAD = 0.036–0.089), while keeping FPR = 0.011–0.094 (MAD = 0.008–0.022), Precision = 0.862–0.905 (MAD = 0.039–0.073), and J̄ = 0.845–0.893. These results clearly demonstrated that the proposed ensemble model sustained high discriminative power and selection stability even under increasing correlation and dimensionality. As correlation rose, redundancy in BPSO and BGWO inflated FPR.
This inflation also reduced Precision and F1-score, whereas MBPSO's adaptive dynamics yielded smoother transitions between exploration and exploitation, resulting in lower FPR and more stable TPR/AUC values. BDGWO effectively removed correlated redundancies, delivering the best standalone balance of low FPR and high TPR by dynamically balancing leadership via pDCM and XOR updates.
The proposed ensemble method combined these complementary behaviors, MBPSO's probabilistic global exploration and BDGWO's bit-level local exploitation, under an AUC-based objective that penalized unnecessary features, which produced compact, highly relevant, and consistent subsets. Even with high correlation or large p, FPR remained ≤ 0.09, Precision remained ≥ 0.86, and J̄ stayed in the 0.845–0.893 range, with uniformly low MAD values, evidencing robust, consistent performance. In grouped structures with high intra-group correlation, classical algorithms tended to repeatedly select similar features within the same group, which increased FPR and reduced J̄, along with large dispersion across runs. While MBPSO partially overcame this issue through its adaptive parameter control, it still exhibited moderate redundancy under strong correlation, as indicated by non-negligible MAD values. In contrast, BDGWO performed better in representative feature selection by effectively distinguishing relevant group members through its dynamic leadership and bit-level update mechanism. The proposed ensemble method further advanced this improvement by achieving both the lowest median FPR and the smallest MAD values, systematically eliminating intra-group redundancy. By optimizing exploration and exploitation dynamics, the proposed method demonstrated superior feature selection performance, maintaining the highest discriminative power, stability, and reproducibility even under challenging high-dimensional conditions.
For each scenario and metric, Wilcoxon signed-rank tests were also applied to paired results from 100 independent simulations to compare the ensemble method with each single-stage feature selection method. As shown in Table 4, the proposed ensemble method significantly outperforms all compared single-stage feature selection approaches (p < 0.05). The proposed ensemble method significantly outperforms classical single-stage methods (BPSO and BGWO) across all scenarios and evaluation metrics (p ≈ 0.000). While the differences with advanced methods (MBPSO and BDGWO) remain statistically significant (p < 0.05), the magnitude of the p-values decreases as the feature dimensionality and correlation level increase. This result indicates the advantage of the ensemble approach in feature selection in complex high-dimensional settings. These results confirm that the proposed ensemble method provides statistically robust and consistent feature selection performance under high correlation and dimensionality.
Table 5 reports the average computational time (in seconds) required by each feature selection method across different dimensional settings. As expected, runtime increases with the number of features p, because the search space expands and the objective evaluation becomes more computationally demanding. MBPSO and BDGWO require slightly longer runtimes than their classical versions because of dynamic parameter adaptation and additional update operations, whereas the proposed ensemble method is the most time-consuming, since it combines both optimization processes and performs an additional multi-stage fusion step. Despite this overhead, and considering the successful results obtained, the ensemble method remains computationally feasible and scalable for high-dimensional datasets.

Real Data Set Applications

The Self-Care Activities Dataset (SCADI) [86], Toxicity [87], Lung [88], and Prostate [89] datasets were utilized to evaluate the performance of the proposed ensemble-based feature selection method that integrates Modified Binary Particle Swarm Optimization (MBPSO) and Binary Dynamic Grey Wolf Optimization (BDGWO), in comparison with classical metaheuristic-based standalone methods BPSO, BGWO, MBPSO, and BDGWO on real datasets.
The SCADI dataset comprises 70 observations and 205 features, including demographic factors such as age and gender, as well as self-care activity indicators derived from the International Classification of Functioning, Disability, and Health: Children and Youth Version (ICF-CY). The response feature reflects the level of self-care difficulty experienced by children with physical or motor impairments. The output categories were consolidated into two groups for binary classification: Class 1 (Having Problem) and Class 2 (No Problem), where the classification was determined by occupational therapists based on self-care performance evaluations (e.g., dressing, feeding, personal hygiene, health maintenance, and safety awareness). Among these observations, 16 had no problem, and the rest had problems. The purpose of applying feature selection to this dataset is to determine the most influential factors associated with self-care limitations among children.
The Toxicity dataset contains 171 molecules designed against functional domains of the core clock protein CRY1, which plays a key role in generating the circadian rhythm. Among these molecules, 56 are labeled as toxic, while the remaining 115 are classified as non-toxic. Each molecule is characterized by 1203 numerical descriptors capturing various structural and physicochemical properties. The primary task is binary classification (toxic vs. non-toxic), where the proposed method aims to identify the most informative molecular descriptors contributing to toxicity prediction.
The Lung Cancer dataset comprises 181 samples with 12,533 gene expression features, designed to distinguish between two histological subtypes of lung cancer: adenocarcinoma and malignant mesothelioma. The dataset is highly imbalanced, containing 150 adenocarcinoma samples and 31 mesothelioma samples, and represents an ultra-high-dimensional, low-sample-size (p ≫ n) classification problem. Feature selection is applied to identify a compact set of discriminative genes that can effectively differentiate between these two cancer subtypes while reducing redundancy and noise in the genomic data. The Prostate Cancer dataset, in turn, contains 102 samples with 12,600 gene expression features obtained from oligonucleotide microarray experiments. It includes 52 prostate tumor samples and 50 non-tumor samples, making it a relatively balanced binary classification problem. The primary objective of feature selection on this dataset is to identify the most informative genes associated with prostate cancer progression. The Lung Cancer and Prostate Cancer datasets were accessed from the datamicroarray GitHub repository (version 0.2.3), which provides high-dimensional microarray gene expression datasets commonly used in feature selection studies [90].
The proposed ensemble feature selection method, alongside the classical metaheuristic-based single-stage methods, was applied to the training set (70% of observations), and the classification performance of each method was assessed on the test set (30% of observations) using Random Forest as the base classifier on the real datasets. Feature selection performance was evaluated using several metrics, including FPR, TPR, Precision, Accuracy, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC). Each experiment was repeated 30 times, and the median of each performance measure, together with its MAD value, is reported for each method in Table 6.
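The median-with-MAD summary used here can be reproduced with a short helper. This is a generic sketch (the function name is ours, not taken from the paper's code):

```python
from statistics import median

def median_mad(values):
    """Return the median and the median absolute deviation (MAD)
    of a sequence of repeated performance measurements."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    return m, mad

# AUC values from five hypothetical repetitions
m, mad = median_mad([0.91, 0.89, 0.93, 0.90, 0.88])
# m is 0.90 and mad is 0.01 for this toy input
```

The MAD is preferred over the standard deviation here because it is robust to the occasional outlier repetition, which matters when only 30 repetitions are summarized.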
Results from the real datasets demonstrate that the proposed ensemble method consistently outperforms both classical and advanced single-stage methods (BPSO, BGWO, MBPSO, BDGWO) across all datasets, including the ultra-high-dimensional Lung and Prostate datasets. Specifically, the proposed ensemble method consistently achieved the highest Accuracy, F1-score, Precision, and AUC values while maintaining a low FPR, even under severe dimensionality and class imbalance. These results indicate that the proposed method retains strong discriminative capability while effectively suppressing false positive selections in both moderate- and ultra-high-dimensional settings.
In contrast, classical methods such as BPSO and BGWO exhibited notable performance degradation, particularly on the Lung and Prostate datasets, where high dimensionality led to redundant feature selection and unstable performance. MBPSO partially overcame this problem through dynamic parameter adaptation, while BDGWO, with its pDCM and XOR-based update mechanism, achieved the strongest performance among single-stage methods. Nevertheless, the ensemble method further improved upon these results by combining MBPSO’s global exploration ability with BDGWO’s effective local exploitation, resulting in consistently higher TPR and Precision values and a lower FPR value.
Furthermore, the very low MAD values observed for the ensemble method across all datasets indicate high stability and reproducibility of the selected feature subsets. In contrast, higher MAD values in classical methods reflect greater variability and instability in feature selection outcomes. These findings confirm that the proposed ensemble approach is particularly effective for high- and ultra-high-dimensional feature selection problems.
Additionally, the median and MAD values of the number of selected features for each method are reported in Table 7. Across all datasets, BPSO and BGWO tended to select large and highly variable subsets, especially in the Lung and Prostate datasets. MBPSO and BDGWO produced more compact and moderately stable subsets, while the proposed ensemble method consistently achieved the most parsimonious and stable feature sets, selecting a very small number of informative features even in ultra-high-dimensional genomic datasets.
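The subset-stability behavior discussed above is quantified by the average pairwise Jaccard index (J̄) over the feature subsets selected across repetitions. A minimal sketch, assuming subsets are represented as Python sets of feature indices:

```python
from itertools import combinations

def avg_pairwise_jaccard(subsets):
    """Average pairwise Jaccard index (J-bar) over feature subsets
    selected in repeated runs; 1.0 means identical selections."""
    scores = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return sum(scores) / len(scores)

# three hypothetical runs selecting feature indices
j_bar = avg_pairwise_jaccard([{1, 2, 3}, {1, 2, 4}, {1, 2, 3}])
# two pairs overlap at 0.5 and one pair fully overlaps, so j_bar = 2/3
```

Values near 1 indicate that a method repeatedly selects essentially the same features, which is the behavior the tables attribute to the ensemble method.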
These results demonstrate the effectiveness of the MI–voting fusion, adaptive α weighting, and median + MAD thresholding in preventing both under-selection and over-selection. Overall, they confirm that the proposed ensemble method performs effectively across a wide range of real-world datasets, from moderate to extremely high dimensionality, yielding superior predictive performance, strong stability, and compact, non-redundant feature subsets.
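As a rough illustration of the fusion stage, the sketch below combines two candidate binary masks by blending each feature's vote fraction with its normalized mutual-information score, then keeps features scoring above a median + MAD cut-off. The fixed `alpha`, the max-normalization of MI, and the function name are our simplifying assumptions; the paper uses an adaptive α weighting rather than a constant:

```python
from statistics import median

def fuse(masks, mi_scores, alpha=0.5):
    """Toy fusion of candidate subsets (0/1 masks, e.g. from MBPSO
    and BDGWO): blend vote fraction with normalized MI, then keep
    features whose score exceeds median + MAD of all scores."""
    p = len(mi_scores)
    votes = [sum(m[j] for m in masks) / len(masks) for j in range(p)]
    top = max(mi_scores) or 1.0          # guard against all-zero MI
    mi = [s / top for s in mi_scores]
    score = [alpha * votes[j] + (1 - alpha) * mi[j] for j in range(p)]
    med = median(score)
    mad = median(abs(s - med) for s in score)
    return [j for j in range(p) if score[j] > med + mad]

# feature 0 is chosen by both masks and has the highest MI,
# so it is the only one to clear the median + MAD threshold
selected = fuse([[1, 1, 0, 0], [1, 0, 1, 0]], [0.9, 0.2, 0.3, 0.05])
```

The data-driven median + MAD cut-off is what lets the subset size adapt to the score distribution instead of relying on a fixed top-k rule.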

4. Conclusions

In this study, an ensemble feature selection method is proposed for high-dimensional data. In the first stage, MBPSO and BDGWO are independently applied to the same training data, each exploring the feature space with an AUC-based objective function that integrates a parsimony and correlation penalty (CPI). Each method thus generates its own candidate feature subset, through probabilistic global exploration (MBPSO) and bit-level local optimization (BDGWO), respectively. In the second stage, these subsets are combined through a Mutual Information (MI)-weighted voting mechanism and a median + MAD-based thresholding process, systematically eliminating redundant or irrelevant variables. Ablation experiments further verified these design choices, demonstrating that each component contributes substantially to the overall performance of the proposed ensemble method. Simulation studies conducted under various dimensionality and correlation scenarios show that the proposed ensemble method achieves a low FPR, a high TPR/Precision/F1/AUC, and strong selection stability, clearly outperforming both classical and advanced single-stage methods (BPSO, BGWO, MBPSO, and BDGWO), even as dimensionality and collinearity increase. In contrast, single-stage methods, particularly BPSO and BGWO, typically experience substantial performance degradation in high-correlation and high-dimensional settings. Moreover, on the real datasets, the ensemble method outperformed all compared single-stage methods, achieving higher AUC/Accuracy/F1-scores and a lower FPR, along with consistently low MAD values across repetitions, indicating strong robustness and stability even in ultra-high-dimensional genomic datasets. Whereas the TPR declined and the FPR increased for the single-stage methods under high correlation, the proposed ensemble method consistently maintained the best balance across the evaluation metrics.
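The penalized objective described above can be sketched as an AUC score discounted for subset size and inter-feature correlation. The penalty weights `lam` and `mu`, the averaging of absolute pairwise correlations, and the function name are illustrative placeholders, not the paper's calibrated formulation:

```python
def penalized_fitness(auc, mask, corr, lam=0.01, mu=0.05):
    """Sketch of an AUC-based objective with parsimony and
    correlation penalties: reward discrimination (AUC), discourage
    large subsets, and discourage selecting correlated features."""
    sel = [j for j, bit in enumerate(mask) if bit]
    k, p = len(sel), len(mask)
    if k < 2:
        avg_corr = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(sel) for b in sel[i + 1:]]
        avg_corr = sum(abs(corr[a][b]) for a, b in pairs) / len(pairs)
    return auc - lam * (k / p) - mu * avg_corr

# toy 3-feature correlation matrix; features 0 and 1 are selected
corr = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.1], [0.2, 0.1, 1.0]]
score = penalized_fitness(0.9, [1, 1, 0], corr)
```

Because both algorithms optimize the same objective, their candidate subsets are directly comparable in the later fusion stage.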
On the other hand, unlike many existing studies that demonstrate feature selection performance solely on real datasets, our study complements real-data experiments with well-controlled simulation studies. Since the truly informative features are known in simulated environments, this evaluation objectively verifies the feature-selection capability of each method, yielding a more reliable and transparent assessment of actual performance.
This study has some limitations. First, the proposed ensemble framework was implemented with Random Forest as the base classifier, and its performance may vary with other learning algorithms. Second, due to the high computational cost of metaheuristic methods, all methods were evaluated using a 70/30 train–test split across 100 repeated simulations rather than cross-validation. Future research should apply the proposed MBPSO–BDGWO ensemble to different data structures and correlation scenarios, compare it with other ensemble-based or hybrid feature selection approaches, and extend the evaluation to k-fold cross-validation on additional datasets to further assess robustness.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were used in this study. SCADI: available from the UCI Machine Learning Repository (https://doi.org/10.24432/C5C89G, accessed on 17 October 2025); Toxicity: available from the UCI Machine Learning Repository (https://doi.org/10.24432/C59313, accessed on 17 October 2025); Lung and Prostate microarray datasets: available from the datamicroarray GitHub repository (https://github.com/ramhiser/datamicroarray, accessed on 10 December 2025). The simulation data generated during the study are available on request from the author.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC: Area Under the Receiver Operating Characteristic Curve
BDGWO: Binary Dynamic Grey Wolf Optimization
BGWO: Binary Grey Wolf Optimization
BPSO: Binary Particle Swarm Optimization
CPI: Conditional Permutation Importance
FS: Feature Selection
FPR: False Positive Rate
GWO: Grey Wolf Optimization
J̄: Average Pairwise Jaccard Index
MAD: Median Absolute Deviation
MBPSO: Modified Binary Particle Swarm Optimization
MI: Mutual Information
pDCM: Proposed Dynamic Coefficient Method
PSO: Particle Swarm Optimization
RF: Random Forest
TPR: True Positive Rate
XOR: Exclusive OR Logical Operator

References

  1. Pourahmadi, M. High-Dimensional Covariance Estimation: With High-Dimensional Data; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  2. Wang, L.; Jiang, S.; Jiang, S. A feature selection method via analysis of relevance, redundancy, and interaction. Expert Syst. Appl. 2021, 183, 115365. [Google Scholar] [CrossRef]
  3. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  4. Bellman, R. Dynamic programming. Math. Sci. Eng. 1967, 40, 101–137. [Google Scholar]
  5. Ayesha, S.; Hanif, M.K.; Talib, R. Overview and comparative study of dimensionality reduction techniques for high-dimensional data. Inf. Fusion. 2020, 59, 44–58. [Google Scholar] [CrossRef]
  6. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  7. Venkatesh, B.; Anuradha, J. A review of feature selection and its methods. Cybern. Inf. Technol. 2019, 19, 3–26. [Google Scholar] [CrossRef]
  8. Akhy, S.A.; Mia, M.B.; Mustafa, S.; Chakraborti, N.R.; Krishnachalitha, K.C.; Rabbany, G. A comprehensive study on ensemble feature selection techniques for classification. In Proceedings of the 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 28 February–1 March 2024; IEEE: New York, NY, USA, 2024; pp. 1319–1324. [Google Scholar]
  9. Gnana, D.A.A.; Balamurugan, S.A.A.; Leavline, E.J. Literature review on feature selection methods for high-dimensional data. Int. J. Comput. Appl. 2016, 136, 9–17. [Google Scholar] [CrossRef]
  10. Hijazi, N.M.; Faris, H.; Aljarah, I. A parallel metaheuristic approach for ensemble feature selection based on multi-core architectures. Expert. Syst. Appl. 2021, 15, 182. [Google Scholar] [CrossRef]
  11. Wu, T.; Hao, Y.; Yang, B.; Peng, L. ECM-EFS: An ensemble feature selection based on enhanced co-association matrix. Pattern Recognit 2023, 139, 109449. [Google Scholar] [CrossRef]
  12. Saeys, Y.; Inza, I.; Larranaga, P. A review of feature selection techniques in Bioinformatics. Bioinformatics 2007, 23, 2507–2517. [Google Scholar] [CrossRef]
  13. Almomani, O. A feature selection model for network intrusion detection system based on PSO, GWO, FFA, and GA algorithms. Symmetry 2020, 12, 1046. [Google Scholar] [CrossRef]
  14. Spooner, A.; Mohammadi, G.; Sachdev, P.S.; Brodaty, H.; Sowmya, A.; Sydney Memory and Ageing Study and the Alzheimer’s Disease Neuroimaging Initiative. Ensemble feature selection with data-driven thresholding for Alzheimer’s disease biomarker discovery. BMC Bioinform. 2023, 24, 9. [Google Scholar] [CrossRef]
  15. Wang, J.; Xu, J.; Zhao, C.; Peng, Y.; Wang, H. An ensemble feature selection method for high-dimensional data based on sort aggregation. Syst. Sci. Control Eng. 2019, 7, 32–39. [Google Scholar] [CrossRef]
  16. Manikandan, G.; Abirami, S. A Survey on Feature Selection and Extraction Techniques for High-Dimensional Microarray Datasets. In Knowledge Computing and its Applications; Margret Anouncia, S., Wiil, U., Eds.; Springer: Singapore, 2018. [Google Scholar] [CrossRef]
  17. Zhuang, Y.; Fan, Z.; Gou, J.; Huang, Y.; Feng, W. A importance-based ensemble method using an adaptive threshold searching for feature selection. Expert. Syst. Appl. 2025, 267, 126152. [Google Scholar] [CrossRef]
  18. Sumant, A.S.; Patil, D. Stability Investigation of Ensemble Feature Selection for High Dimensional Data Analytics. In Proceedings of the Third International Conference on Image Processing and Capsule Networks, ICIPCN 2022, Bangkok, Thailand, 20–21 May 2022; Chen, J.I.Z., Tavares, J.M.R.S., Shi, F., Eds.; Springer: Cham, Switzerland, 2022; Volume 514. [Google Scholar] [CrossRef]
  19. Guney, H.; Oztoprak, H. A robust ensemble feature selection technique for high-dimensional datasets based on minimum weight threshold method. Comput. Intell. 2022, 38, 1616–1658. [Google Scholar] [CrossRef]
  20. Tu, Q.; Chen, X.; Liu, X. Multi-strategy ensemble grey wolf optimizer and its application to feature selection. Appl. Soft Comput. 2019, 76, 16–30. [Google Scholar] [CrossRef]
  21. Singh, N.; Singh, P. A hybrid ensemble-filter wrapper feature selection approach for medical data classification. Chemom. Intell. Lab. Syst. 2021, 217, 104396. [Google Scholar] [CrossRef]
  22. Robindro, K.; Devi, S.S.; Clinton, U.B.; Takhellambam, L.; Singh, Y.R.; Hoque, N. Hybrid distributed feature selection using particle swarm optimization-mutual information. Data Sci. Manag. 2024, 7, 64–73. [Google Scholar] [CrossRef]
  23. Mandal, A.K.; Nadim, M.; Saha, H.; Sultana, T.; Hossain, M.D.; Huh, E.N. Feature subset selection for high-dimensional, low sampling size data classification using ensemble feature selection with a wrapper-based search. IEEE Access 2024, 12, 62341–62357. [Google Scholar] [CrossRef]
  24. Ab Hamid, T.M.T.; Sallehuddin, R.; Yunos, Z.M.; Ali, A. Ensemble based filter feature selection with harmonize particle swarm optimization and support vector machine for optimal cancer classification. Mach. Learn. Appl. 2021, 5, 100054. [Google Scholar] [CrossRef]
  25. Xu, J.; Sun, L.; Gao, Y.; Xu, T. An ensemble feature selection technique for cancer recognition. BioMed Mater. Eng. 2014, 24, 1001–1008. [Google Scholar] [CrossRef]
  26. Barrera-García, J.; Cisternas-Caneo, F.; Crawford, B.; Gómez Sánchez, M.; Soto, R. Feature selection problem and metaheuristics: A systematic literature review about its formulation, evaluation and applications. Biomimetics 2023, 9, 9. [Google Scholar] [CrossRef] [PubMed]
  27. Dokeroglu, T.; Deniz, A.; Kiziloz, H.E. A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 2022, 494, 269–296. [Google Scholar] [CrossRef]
  28. Piri, J.; Mohapatra, P.; Dey, R.; Acharya, B.; Gerogiannis, V.C.; Kanavos, A. Literature review on hybrid evolutionary approaches for feature selection. Algorithms 2023, 16, 167. [Google Scholar] [CrossRef]
  29. Nguyen, B.H.; Xue, B.; Zhang, M. A survey on swarm intelligence approaches to feature selection in data mining. Swarm Evol. Comput. 2020, 54, 100663. [Google Scholar] [CrossRef]
  30. Xue, B.; Zhang, M.; Browne, W.N.; Yao, X. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 2016, 20, 606–626. [Google Scholar] [CrossRef]
  31. Gu, S.; Cheng, R.; Jin, Y. Feature selection for high-dimensional classification using a competitive swarm optimizer. Soft Comput. 2018, 22, 811–822. [Google Scholar] [CrossRef]
  32. Pham, T.H.; Raahemi, B. Bio-inspired feature selection algorithms with their applications: A systematic literature review. IEEE Access 2023, 11, 43733–43758. [Google Scholar] [CrossRef]
  33. Kennedy, J.; Eberhart, R.C. A discrete binary version of the particle swarm algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics (SMC’97), Orlando, FL, USA, 12–15 October 1997; IEEE: New York, NY, USA, 1997; Volume 5, pp. 4104–4108. [Google Scholar] [CrossRef]
  34. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95 International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar] [CrossRef]
  35. Ünler, A.; Murat, A.E.; Chinnam, R.B. mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf. Sci. 2011, 181, 4625–4641. [Google Scholar] [CrossRef]
  36. Abdmouleh, Z.; Gastli, A.; Ben-Brahim, L.; Haouari, M.; Al-Emadi, N.A. Review of Optimization Techniques applied for the Integration of Distributed Generation from Renewable Energy Sources. Renew. Energy 2017, 113, 266–280. [Google Scholar] [CrossRef]
  37. Tran, B.; Zhang, M.; Xue, B. A PSO-based hybrid feature selection algorithm for high-dimensional classification. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; IEEE: New York, NY, USA, 2016; pp. 3801–3808. [Google Scholar]
  38. Gupta, D.K.; Reddy, K.S.; Shweta; Ekbal, A. PSO-ASENT: Feature selection using particle swarm optimization for aspect-based sentiment analysis. In Proceedings of the International Conference on Applications of Natural Language to Information Systems (NLDB 2015), Passau, Germany, 17–19 June 2015; Springer: Cham, Switzerland, 2015; pp. 220–233. [Google Scholar]
  39. Brezočnik, L. Feature selection for classification using particle swarm optimization. In Proceedings of the IEEE EUROCON 2017—17th International Conference on Smart Technologies, Ohrid, North Macedonia, 6–8 July 2017; IEEE: New York, NY, USA, 2017; pp. 966–971. [Google Scholar]
  40. Abualigah, L.M.; Khader, A.T.; Hanandeh, E.S. A new feature selection method to improve the document clustering using particle swarm optimization algorithm. J. Comput. Sci. 2018, 25, 456–466. [Google Scholar] [CrossRef]
  41. Chuang, L.; Yang, C.; Yang, C. Tabu Search and Binary Particle Swarm Optimization for Feature Selection Using Microarray Data. J. Comput. Biol. 2009, 16, 1689–1703. [Google Scholar] [CrossRef]
  42. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  43. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381. [Google Scholar] [CrossRef]
  44. Shen, C.; Zhang, K. Two-stage improved Grey Wolf optimization algorithm for feature selection on high-dimensional classification. Complex. Intell. Syst. 2022, 8, 2769–2789. [Google Scholar] [CrossRef]
  45. Yousef, J.; Youssef, A.; Keshk, A. A hybrid swarm intelligence based feature selection algorithm for high dimensional datasets. Int. J. Comput. Info 2021, 8, 67–86. [Google Scholar] [CrossRef]
  46. Too, J.; Abdullah, A.R.; Mohd Saad, N.; Mohd Ali, N.; Tee, W. A New Competitive Binary Grey Wolf Optimizer to Solve the Feature Selection Problem in EMG Signals Classification. Computers 2018, 7, 58. [Google Scholar] [CrossRef]
  47. El-Kenawy, E.S.; Eid, M. Hybrid Gray Wolf and Particle Swarm Optimization for Feature Selection. Int. J. Innov. Comput. Inf. Control 2020, 16, 831–844. [Google Scholar] [CrossRef]
  48. Al-Tashi, Q.; Abdulkadir, S.J.; Rais, H.M.; Mirjalili, S.; Alhussian, H. Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection. IEEE Access 2019, 7, 39496–39508. [Google Scholar] [CrossRef]
  49. El-Hasnony, I.M.; Barakat, S.I.; Elhoseny, M.; Mostafa, R.R. Improved feature selection model for big data analytics. IEEE Access 2020, 8, 66989–67004. [Google Scholar] [CrossRef]
  50. Abdo, M.A.; Mostafa, R.; Abdel-Hamid, L. An optimized hybrid approach for feature selection based on Chi-square and particle swarm optimization algorithms. Data 2024, 9, 20. [Google Scholar] [CrossRef]
  51. Qasim, O.S.; Algamal, Z.Y. Feature selection using particle swarm optimization-based logistic regression model. Chemom. Intell. Lab. Syst. 2018, 182, 41–46. [Google Scholar] [CrossRef]
  52. Cervante, L.; Xue, B.; Zhang, M.; Shang, L. Binary particle swarm optimisation for feature selection: A filter based approach. In Proceedings of the 2012 IEEE Congress on Evolutionary Computation, Brisbane, QLD, Australia, 10–15 June 2012; IEEE: New York, NY, USA, 2012; pp. 1–8. [Google Scholar] [CrossRef]
  53. Ma, Y.; Jiang, C.; Hou, Z.; Wang, C. The formulation of the optimal strategies for the electricity producers based on the particle swarm optimization algorithm. IEEE Trans. Power Syst. 2006, 21, 1663–1671. [Google Scholar] [CrossRef]
  54. Egrioglu, E.; Yolcu, U.; Aladag, C.H.; Kocak, C. An ARMA type fuzzy time series forecasting method based on particle swarm optimization. Math. Probl. Eng. 2013, 2013, 935815. [Google Scholar] [CrossRef]
  55. Erdoğan, F.; Karakoyun, M.; Gülcü, Ş. A novel binary Grey Wolf Optimizer algorithm with a new dynamic position update mechanism for feature selection problem. Soft Comput. 2024, 28, 12623–12654. [Google Scholar] [CrossRef]
  56. Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle swarm optimization: A comprehensive survey. IEEE Access 2022, 10, 10031–10061. [Google Scholar] [CrossRef]
  57. Abbes, W.; Kechaou, Z.; Hussain, A.; Qahtani, A.M.; Almutiry, O.; Dhahri, H.; Alimi, A.M. An Enhanced Binary Particle Swarm Optimization (E-BPSO) algorithm for service placement in hybrid cloud platforms. Neural Comput. Applic 2023, 35, 1343–1361. [Google Scholar] [CrossRef]
  58. Sancar, N.; Onakpojeruo, E.P.; Inan, D.; Ozsahin, D.U. Adaptive elastic net based on modified PSO for Variable selection in cox model with high-dimensional data: A comprehensive simulation study. IEEE Access 2023, 11, 127302–127316. [Google Scholar] [CrossRef]
  59. Pan, J.-S.; Hu, P.; Snášel, V.; Chu, S.-C. A survey on binary metaheuristic algorithms and their engineering applications. Artif. Intell. Rev. 2023, 56, 6101–6167. [Google Scholar] [CrossRef] [PubMed]
  60. Karakoyun, M.; Ozkis, A.; Kodaz, H. A new algorithm based on gray wolf optimizer and shuffled frog leaping algorithm to solve multi-objective optimization problems. Appl. Soft Comput. 2020, 96, 106560. [Google Scholar] [CrossRef]
  61. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef]
  62. Chawla, N.V. Data Mining for Imbalanced Datasets: An Overview. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2009; pp. 875–886. [Google Scholar] [CrossRef]
  63. Ling, C.X.; Huang, J.; Zhang, H. AUC: A better measure than accuracy in comparing learning algorithms. In Proceedings of the Canadian Conference on Artificial Intelligence (AI 2003), Halifax, NS, Canada, 11–13 June 2003; Springer: Berlin/Heidelberg, Germany; pp. 329–341. [Google Scholar]
  64. Sun, L. AVC: Selecting discriminative features on basis of AUC by feature ranking. BMC Bioinform. 2017, 18, 146. [Google Scholar] [CrossRef] [PubMed]
  65. Liu, H.; Motoda, H. Computational Methods of Feature Selection; Chapman and Hall/CRC: Boca Raton, FL, USA, 2007. [Google Scholar]
  66. Chen, X.W.; Wasikowski, M. FAST: A ROC-based feature selection metric for small samples and imbalanced data classification problems. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), Las Vegas, NV, USA, 24–27 August 2008; ACM: New York, NY, USA, 2008; pp. 124–132. [Google Scholar]
  67. Xu, J.W.; Suzuki, K. Max-AUC feature selection in computer-aided detection of polyps in CT colonography. IEEE J. Biomed. Health Inf. 2014, 18, 585–593. [Google Scholar] [CrossRef]
  68. Tian, Y.; Shi, Y.; Chen, X.; Chen, W. AUC Maximizing Support Vector Machines with Feature Selection. Procedia Comput. Sci. 2011, 4, 1691–1698. [Google Scholar] [CrossRef]
  69. Vivek, Y.; Ravi, V.; Krishna, P.R. Feature subset selection for big data via parallel chaotic binary differential evolution and feature-level elitism. Comput. Electr. Eng. 2025, 123, 110232. [Google Scholar] [CrossRef]
  70. Yang, T.; Ying, Y. AUC maximization in the era of big data and AI: A survey. ACM Comput. Surv. 2022, 55, 1–37. [Google Scholar] [CrossRef]
  71. Strobl, C.; Boulesteix, A.L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 2007, 8, 25. [Google Scholar] [CrossRef]
  72. Debeer, D.; Strobl, C. Conditional permutation importance revisited. BMC Bioinform. 2020, 21, 307. [Google Scholar] [CrossRef]
  73. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  74. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
  75. Billah, M.; Islam, A.K.M.S.; Bin Mamoon, W.; Rahman, M.R. Random forest classifications for landuse mapping to assess rapid flood damage using Sentinel-1 and Sentinel-2 data. Remote Sens. Appl. Soc. Environ. 2023, 30, 100947. [Google Scholar] [CrossRef]
  76. Maxwell, A.E.; Warner, T.A.; Fang, F. Implementation of machine-learning classification in remote sensing: An applied review. Int. J. Rem. Sens. 2018, 39, 2784–2817. [Google Scholar] [CrossRef]
  77. Seijo-Pardo, B.; Porto-Díaz, I.; Bolón-Canedo, V.; Alonso-Betanzos, A. Ensemble feature selection: Homogeneous and heterogeneous approaches. Knowl. Based Syst. 2017, 118, 124–139. [Google Scholar] [CrossRef]
  78. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 2013, 34, 483–519. [Google Scholar] [CrossRef]
  79. Maldonado, S.; López, J.; Vairetti, C. An alternative approach for feature selection using support vector machines. Inf. Sci. 2014, 279, 163–175. [Google Scholar] [CrossRef]
  80. Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 2013, 49, 764–766. [Google Scholar] [CrossRef]
  81. Díaz-Uriarte, R.; Álvarez de Andrés, S. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006, 7, 3. [Google Scholar] [CrossRef]
  82. Kursa, M.B.; Rudnicki, W.R. The all relevant feature selection using random forest. arXiv 2011, arXiv:1106.5112. [Google Scholar] [CrossRef]
  83. Kuncheva, L.I. A stability index for feature selection. In Proceedings of the Artificial Intelligence and Applications (AIAP07), Innsbruck, Austria, 12–14 February 2007; p. 390395. [Google Scholar]
  84. Nogueira, S.; Sechidis, K.; Brown, G. On the stability of feature selection algorithms. J. Mach. Learn. Res. 2018, 18, 1–54. Available online: http://jmlr.org/papers/v18/17-514.html (accessed on 10 October 2025).
  85. Somol, P.; Novovicová, J. Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1921–1939. [Google Scholar] [CrossRef]
  86. Bushehri, S.; Dehghanizadeh, M.; Kalantar, S.; Zarchi, M. SCADI; UCI Machine Learning Repository: Noida, India, 2018. [Google Scholar] [CrossRef]
  87. Gül, Ş.; Rahim, F. Toxicity; UCI Machine Learning Repository: Noida, India, 2021. [Google Scholar] [CrossRef]
  88. Gordon, G.J.; Jensen, R.V.; Hsiao, L.-L.; Gullans, S.R.; Blumenstock, J.E.; Ramaswamy, S.; Richards, W.G.; Sugarbaker, D.J.; Bueno, R. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 2002, 62, 4963–4967. [Google Scholar] [PubMed]
  89. Singh, D.; Febbo, P.G.; Ross, K.; Jackson, D.G.; Manola, J.; Ladd, C.; Tamayo, P.; Renshaw, A.A.; D’Amico, A.V.; Richie, J.P.; et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002, 1, 203–209. [Google Scholar] [CrossRef] [PubMed]
  90. Ramhiser, J. datamicroarray: Microarray Gene Expression Datasets for High-Dimensional Classification. GitHub Repository 2015. Available online: https://github.com/ramhiser/datamicroarray (accessed on 10 December 2025).
Figure 1. Flow chart of the proposed ensemble method.
Table 1. Classification matrix.

                 | Selected Significant | Selected Redundant
True Significant | TP                   | FN
True Redundant   | FP                   | TN
Table 2. Ablation results for the proposed ensemble.

Configuration | AUC | FPR | TPR | Precision | Accuracy | F1-Score | J̄
Full Ensemble (Proposed) | 0.891 | 0.110 | 0.887 | 0.851 | 0.905 | 0.883 | 0.837
AUC only | 0.645 | 0.425 | 0.653 | 0.587 | 0.629 | 0.624 | 0.363
AUC + parsimony penalty | 0.700 | 0.347 | 0.776 | 0.711 | 0.753 | 0.735 | 0.585
No voting score + MI fusion (only with voting score) | 0.603 | 0.498 | 0.614 | 0.574 | 0.620 | 0.591 | 0.312
No median + MAD threshold (only median threshold) | 0.741 | 0.314 | 0.769 | 0.722 | 0.753 | 0.748 | 0.598
No adaptive alpha (fixed as 0.5) | 0.718 | 0.385 | 0.743 | 0.705 | 0.734 | 0.719 | 0.574
No [kmin, kmax] restriction | 0.625 | 0.425 | 0.648 | 0.612 | 0.629 | 0.632 | 0.421
Accuracy objective instead of AUC | 0.508 | 0.523 | 0.531 | 0.499 | 0.514 | 0.517 | 0.400
Table 3. Simulation results with median (MAD) values.
Table 3. Simulation results with median (MAD) values.
| Scenario | n | p | Correlation | Method | Accuracy | TPR | F1 | FPR | AUC | Precision | J̄ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I-1 | 50 | 60 | ρ = 0.10 | BPSO | 0.709 (0.214) | 0.770 (0.226) | 0.695 (0.198) | 0.301 (0.107) | 0.715 (0.205) | 0.619 (0.188) | 0.456 |
| | | | | BGWO | 0.724 (0.206) | 0.757 (0.198) | 0.702 (0.157) | 0.236 (0.089) | 0.719 (0.191) | 0.640 (0.157) | 0.491 |
| | | | | MBPSO | 0.791 (0.143) | 0.815 (0.180) | 0.759 (0.141) | 0.237 (0.084) | 0.795 (0.143) | 0.700 (0.121) | 0.583 |
| | | | | BDGWO | 0.842 (0.107) | 0.833 (0.102) | 0.802 (0.093) | 0.226 (0.043) | 0.834 (0.099) | 0.733 (0.093) | 0.697 |
| | | | | Ensemble | 0.906 (0.036) | 0.962 (0.054) | 0.898 (0.032) | 0.077 (0.010) | 0.919 (0.036) | 0.896 (0.049) | 0.856 |
| I-2 | 50 | 60 | ρ = 0.40 | BPSO | 0.675 (0.232) | 0.674 (0.182) | 0.662 (0.171) | 0.340 (0.129) | 0.686 (0.234) | 0.573 (0.208) | 0.388 |
| | | | | BGWO | 0.709 (0.254) | 0.667 (0.169) | 0.673 (0.150) | 0.302 (0.064) | 0.700 (0.159) | 0.604 (0.158) | 0.446 |
| | | | | MBPSO | 0.799 (0.102) | 0.820 (0.147) | 0.744 (0.154) | 0.331 (0.086) | 0.781 (0.119) | 0.694 (0.127) | 0.504 |
| | | | | BDGWO | 0.830 (0.127) | 0.821 (0.072) | 0.785 (0.110) | 0.186 (0.027) | 0.828 (0.091) | 0.709 (0.100) | 0.663 |
| | | | | Ensemble | 0.905 (0.046) | 0.942 (0.049) | 0.895 (0.039) | 0.022 (0.008) | 0.917 (0.044) | 0.901 (0.052) | 0.893 |
| I-3 | 50 | 60 | ρ = 0.90 | BPSO | 0.610 (0.253) | 0.553 (0.257) | 0.574 (0.202) | 0.346 (0.137) | 0.617 (0.190) | 0.498 (0.194) | 0.286 |
| | | | | BGWO | 0.658 (0.247) | 0.595 (0.212) | 0.612 (0.191) | 0.364 (0.093) | 0.636 (0.214) | 0.514 (0.114) | 0.311 |
| | | | | MBPSO | 0.724 (0.120) | 0.567 (0.168) | 0.671 (0.141) | 0.271 (0.081) | 0.733 (0.108) | 0.566 (0.105) | 0.415 |
| | | | | BDGWO | 0.797 (0.116) | 0.645 (0.094) | 0.726 (0.132) | 0.221 (0.036) | 0.787 (0.123) | 0.657 (0.117) | 0.585 |
| | | | | Ensemble | 0.900 (0.041) | 0.982 (0.058) | 0.883 (0.066) | 0.012 (0.022) | 0.903 (0.089) | 0.889 (0.045) | 0.885 |
| IV | 50 | 60 | Grouped | BPSO | 0.574 (0.277) | 0.472 (0.245) | 0.515 (0.205) | 0.430 (0.119) | 0.563 (0.247) | 0.352 (0.219) | 0.237 |
| | | | | BGWO | 0.600 (0.173) | 0.390 (0.142) | 0.553 (0.140) | 0.399 (0.077) | 0.588 (0.158) | 0.328 (0.199) | 0.308 |
| | | | | MBPSO | 0.698 (0.175) | 0.553 (0.160) | 0.622 (0.154) | 0.295 (0.079) | 0.682 (0.179) | 0.514 (0.118) | 0.386 |
| | | | | BDGWO | 0.723 (0.092) | 0.570 (0.086) | 0.689 (0.096) | 0.281 (0.037) | 0.737 (0.104) | 0.608 (0.115) | 0.495 |
| | | | | Ensemble | 0.885 (0.050) | 0.964 (0.045) | 0.894 (0.042) | 0.018 (0.019) | 0.891 (0.077) | 0.905 (0.049) | 0.851 |
| II-1 | 50 | 100 | ρ = 0.10 | BPSO | 0.606 (0.164) | 0.745 (0.243) | 0.591 (0.233) | 0.396 (0.102) | 0.618 (0.151) | 0.607 (0.175) | 0.432 |
| | | | | BGWO | 0.621 (0.267) | 0.735 (0.201) | 0.601 (0.201) | 0.336 (0.108) | 0.632 (0.165) | 0.623 (0.178) | 0.471 |
| | | | | MBPSO | 0.691 (0.159) | 0.800 (0.184) | 0.656 (0.121) | 0.331 (0.091) | 0.696 (0.110) | 0.687 (0.121) | 0.560 |
| | | | | BDGWO | 0.741 (0.104) | 0.815 (0.130) | 0.701 (0.114) | 0.281 (0.057) | 0.741 (0.098) | 0.720 (0.077) | 0.675 |
| | | | | Ensemble | 0.902 (0.045) | 0.958 (0.066) | 0.893 (0.038) | 0.082 (0.021) | 0.915 (0.043) | 0.892 (0.039) | 0.862 |
| II-2 | 50 | 100 | ρ = 0.40 | BPSO | 0.552 (0.248) | 0.652 (0.233) | 0.553 (0.231) | 0.433 (0.109) | 0.584 (0.144) | 0.558 (0.168) | 0.373 |
| | | | | BGWO | 0.597 (0.147) | 0.640 (0.219) | 0.571 (0.193) | 0.391 (0.107) | 0.597 (0.169) | 0.592 (0.115) | 0.430 |
| | | | | MBPSO | 0.681 (0.175) | 0.785 (0.197) | 0.643 (0.116) | 0.366 (0.083) | 0.675 (0.116) | 0.681 (0.140) | 0.502 |
| | | | | BDGWO | 0.729 (0.124) | 0.805 (0.108) | 0.683 (0.092) | 0.271 (0.044) | 0.731 (0.085) | 0.703 (0.114) | 0.650 |
| | | | | Ensemble | 0.910 (0.034) | 0.960 (0.058) | 0.895 (0.039) | 0.079 (0.011) | 0.917 (0.070) | 0.896 (0.043) | 0.888 |
| II-3 | 50 | 100 | ρ = 0.90 | BPSO | 0.506 (0.278) | 0.540 (0.246) | 0.471 (0.191) | 0.449 (0.108) | 0.521 (0.158) | 0.492 (0.157) | 0.270 |
| | | | | BGWO | 0.523 (0.186) | 0.580 (0.208) | 0.511 (0.132) | 0.441 (0.074) | 0.539 (0.142) | 0.505 (0.169) | 0.305 |
| | | | | MBPSO | 0.619 (0.120) | 0.555 (0.224) | 0.566 (0.171) | 0.361 (0.062) | 0.631 (0.121) | 0.553 (0.133) | 0.402 |
| | | | | BDGWO | 0.696 (0.089) | 0.630 (0.100) | 0.623 (0.117) | 0.296 (0.045) | 0.686 (0.117) | 0.644 (0.119) | 0.565 |
| | | | | Ensemble | 0.898 (0.043) | 0.980 (0.044) | 0.881 (0.024) | 0.011 (0.009) | 0.902 (0.075) | 0.887 (0.068) | 0.882 |
| V | 50 | 100 | Grouped | BPSO | 0.479 (0.243) | 0.460 (0.253) | 0.416 (0.256) | 0.519 (0.081) | 0.472 (0.193) | 0.343 (0.170) | 0.223 |
| | | | | BGWO | 0.504 (0.251) | 0.380 (0.168) | 0.456 (0.128) | 0.486 (0.086) | 0.495 (0.192) | 0.320 (0.136) | 0.290 |
| | | | | MBPSO | 0.593 (0.134) | 0.540 (0.200) | 0.526 (0.136) | 0.386 (0.091) | 0.585 (0.147) | 0.508 (0.101) | 0.370 |
| | | | | BDGWO | 0.621 (0.139) | 0.560 (0.123) | 0.586 (0.128) | 0.373 (0.054) | 0.636 (0.124) | 0.595 (0.077) | 0.480 |
| | | | | Ensemble | 0.883 (0.036) | 0.962 (0.053) | 0.891 (0.037) | 0.018 (0.014) | 0.889 (0.063) | 0.903 (0.072) | 0.848 |
| III-1 | 50 | 200 | ρ = 0.10 | BPSO | 0.573 (0.255) | 0.728 (0.216) | 0.577 (0.199) | 0.411 (0.132) | 0.609 (0.198) | 0.593 (0.182) | 0.410 |
| | | | | BGWO | 0.596 (0.213) | 0.723 (0.224) | 0.591 (0.124) | 0.347 (0.118) | 0.623 (0.158) | 0.618 (0.192) | 0.455 |
| | | | | MBPSO | 0.679 (0.147) | 0.788 (0.176) | 0.645 (0.104) | 0.343 (0.094) | 0.683 (0.153) | 0.677 (0.197) | 0.545 |
| | | | | BDGWO | 0.727 (0.085) | 0.805 (0.132) | 0.691 (0.144) | 0.289 (0.045) | 0.733 (0.141) | 0.712 (0.113) | 0.661 |
| | | | | Ensemble | 0.895 (0.056) | 0.956 (0.042) | 0.889 (0.057) | 0.090 (0.012) | 0.911 (0.078) | 0.887 (0.073) | 0.857 |
| III-2 | 50 | 200 | ρ = 0.40 | BPSO | 0.561 (0.274) | 0.638 (0.200) | 0.537 (0.201) | 0.446 (0.145) | 0.571 (0.187) | 0.545 (0.205) | 0.355 |
| | | | | BGWO | 0.590 (0.204) | 0.622 (0.206) | 0.559 (0.143) | 0.401 (0.122) | 0.591 (0.170) | 0.582 (0.181) | 0.415 |
| | | | | MBPSO | 0.667 (0.106) | 0.775 (0.189) | 0.633 (0.123) | 0.379 (0.109) | 0.657 (0.162) | 0.668 (0.120) | 0.490 |
| | | | | BDGWO | 0.719 (0.092) | 0.792 (0.129) | 0.671 (0.123) | 0.283 (0.053) | 0.723 (0.153) | 0.695 (0.094) | 0.642 |
| | | | | Ensemble | 0.879 (0.060) | 0.940 (0.063) | 0.885 (0.074) | 0.094 (0.017) | 0.908 (0.070) | 0.892 (0.051) | 0.884 |
| III-3 | 50 | 200 | ρ = 0.90 | BPSO | 0.508 (0.255) | 0.522 (0.199) | 0.456 (0.217) | 0.461 (0.135) | 0.513 (0.160) | 0.484 (0.179) | 0.253 |
| | | | | BGWO | 0.517 (0.217) | 0.567 (0.204) | 0.496 (0.136) | 0.451 (0.095) | 0.531 (0.216) | 0.498 (0.129) | 0.287 |
| | | | | MBPSO | 0.606 (0.155) | 0.544 (0.155) | 0.551 (0.140) | 0.376 (0.104) | 0.619 (0.173) | 0.545 (0.091) | 0.380 |
| | | | | BDGWO | 0.683 (0.102) | 0.620 (0.088) | 0.611 (0.065) | 0.303 (0.050) | 0.673 (0.107) | 0.634 (0.080) | 0.555 |
| | | | | Ensemble | 0.884 (0.041) | 0.976 (0.056) | 0.879 (0.051) | 0.012 (0.013) | 0.903 (0.061) | 0.883 (0.061) | 0.880 |
| VI | 50 | 200 | Grouped | BPSO | 0.471 (0.223) | 0.445 (0.190) | 0.403 (0.238) | 0.533 (0.129) | 0.457 (0.152) | 0.328 (0.146) | 0.209 |
| | | | | BGWO | 0.496 (0.199) | 0.365 (0.187) | 0.439 (0.185) | 0.499 (0.108) | 0.481 (0.237) | 0.305 (0.117) | 0.274 |
| | | | | MBPSO | 0.581 (0.137) | 0.525 (0.176) | 0.511 (0.133) | 0.393 (0.115) | 0.571 (0.135) | 0.498 (0.128) | 0.355 |
| | | | | BDGWO | 0.606 (0.094) | 0.545 (0.097) | 0.571 (0.088) | 0.379 (0.055) | 0.619 (0.119) | 0.585 (0.114) | 0.460 |
| | | | | Ensemble | 0.873 (0.050) | 0.961 (0.062) | 0.867 (0.054) | 0.018 (0.019) | 0.892 (0.053) | 0.862 (0.057) | 0.845 |
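Each cell in Table 3 reports the median of a metric across repeated runs, with the median absolute deviation (MAD) in parentheses. A minimal NumPy sketch of that summary is shown below; the accuracy values are illustrative and not taken from the paper.

```python
import numpy as np

def median_mad(values):
    """Return the median and the median absolute deviation (MAD)
    of a metric collected over repeated runs."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return med, mad

# Illustrative accuracy values from 5 repetitions (not from the paper)
acc = [0.90, 0.92, 0.88, 0.91, 0.89]
med, mad = median_mad(acc)
print(f"{med:.3f} ({mad:.3f})")  # formatted like a Table 3 cell
```

The MAD is preferred over the standard deviation here because it is robust to the occasional poor run that metaheuristics can produce.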
Table 4. Wilcoxon signed-rank test p-values comparing the proposed ensemble method with single-stage feature selection methods across simulation scenarios.
| Scenario | Metric | Ensemble vs. BPSO | Ensemble vs. BGWO | Ensemble vs. MBPSO | Ensemble vs. BDGWO |
|---|---|---|---|---|---|
| I-1 | Accuracy | 0.0021 | 0.0016 | 0.0092 | 0.0120 |
| | TPR | 0.0017 | 0.0013 | 0.0103 | 0.0118 |
| | F1-Score | 0.0007 | 0.0009 | 0.0100 | 0.0107 |
| | FPR | 0.0000 | 0.0000 | 0.0089 | 0.0090 |
| | AUC | 0.0012 | 0.0015 | 0.0137 | 0.0149 |
| | Precision | 0.0001 | 0.0001 | 0.0117 | 0.0132 |
| I-2 | Accuracy | 0.0012 | 0.0012 | 0.0069 | 0.0081 |
| | TPR | 0.0009 | 0.0010 | 0.0082 | 0.0080 |
| | F1-Score | 0.0003 | 0.0005 | 0.0071 | 0.0078 |
| | FPR | 0.0000 | 0.0000 | 0.0038 | 0.0041 |
| | AUC | 0.0008 | 0.0009 | 0.0076 | 0.0085 |
| | Precision | 0.0000 | 0.0000 | 0.0070 | 0.0073 |
| I-3 | Accuracy | 0.0000 | 0.0000 | 0.0051 | 0.0057 |
| | TPR | 0.0000 | 0.0000 | 0.0045 | 0.0052 |
| | F1-Score | 0.0000 | 0.0000 | 0.0050 | 0.0062 |
| | FPR | 0.0000 | 0.0000 | 0.0016 | 0.0025 |
| | AUC | 0.0000 | 0.0000 | 0.0062 | 0.0069 |
| | Precision | 0.0000 | 0.0000 | 0.0025 | 0.0032 |
| IV | Accuracy | 0.0000 | 0.0000 | 0.0022 | 0.0033 |
| | TPR | 0.0000 | 0.0000 | 0.0010 | 0.0012 |
| | F1-Score | 0.0000 | 0.0000 | 0.0028 | 0.0035 |
| | FPR | 0.0000 | 0.0000 | 0.0019 | 0.0012 |
| | AUC | 0.0000 | 0.0000 | 0.0023 | 0.0041 |
| | Precision | 0.0000 | 0.0000 | 0.0009 | 0.0012 |
| II-1 | Accuracy | 0.0000 | 0.0000 | 0.0082 | 0.0098 |
| | TPR | 0.0000 | 0.0000 | 0.0092 | 0.0100 |
| | F1-Score | 0.0000 | 0.0000 | 0.0076 | 0.0093 |
| | FPR | 0.0000 | 0.0000 | 0.0031 | 0.0045 |
| | AUC | 0.0000 | 0.0000 | 0.0103 | 0.0119 |
| | Precision | 0.0000 | 0.0000 | 0.0079 | 0.0094 |
| II-2 | Accuracy | 0.0000 | 0.0000 | 0.0078 | 0.0092 |
| | TPR | 0.0000 | 0.0000 | 0.0074 | 0.0085 |
| | F1-Score | 0.0000 | 0.0000 | 0.0084 | 0.0087 |
| | FPR | 0.0000 | 0.0000 | 0.0042 | 0.0040 |
| | AUC | 0.0000 | 0.0000 | 0.0088 | 0.0092 |
| | Precision | 0.0000 | 0.0000 | 0.0068 | 0.0089 |
| II-3 | Accuracy | 0.0000 | 0.0000 | 0.0011 | 0.0019 |
| | TPR | 0.0000 | 0.0000 | 0.0002 | 0.0015 |
| | F1-Score | 0.0000 | 0.0000 | 0.0010 | 0.0018 |
| | FPR | 0.0000 | 0.0000 | 0.0005 | 0.0010 |
| | AUC | 0.0000 | 0.0000 | 0.0015 | 0.0020 |
| | Precision | 0.0000 | 0.0000 | 0.0001 | 0.0023 |
| V | Accuracy | 0.0000 | 0.0000 | 0.0009 | 0.0013 |
| | TPR | 0.0000 | 0.0000 | 0.0001 | 0.0001 |
| | F1-Score | 0.0000 | 0.0000 | 0.0007 | 0.0005 |
| | FPR | 0.0000 | 0.0000 | 0.0003 | 0.0003 |
| | AUC | 0.0000 | 0.0000 | 0.0011 | 0.0008 |
| | Precision | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
| III-1 | Accuracy | 0.0000 | 0.0000 | 0.0080 | 0.0086 |
| | TPR | 0.0000 | 0.0000 | 0.0085 | 0.0090 |
| | F1-Score | 0.0000 | 0.0000 | 0.0070 | 0.0068 |
| | FPR | 0.0000 | 0.0000 | 0.0024 | 0.0033 |
| | AUC | 0.0000 | 0.0000 | 0.0096 | 0.0099 |
| | Precision | 0.0000 | 0.0000 | 0.0073 | 0.0088 |
| III-2 | Accuracy | 0.0000 | 0.0000 | 0.0074 | 0.0082 |
| | TPR | 0.0000 | 0.0000 | 0.0077 | 0.0078 |
| | F1-Score | 0.0000 | 0.0000 | 0.0069 | 0.0074 |
| | FPR | 0.0000 | 0.0000 | 0.0019 | 0.0033 |
| | AUC | 0.0000 | 0.0000 | 0.0081 | 0.0087 |
| | Precision | 0.0000 | 0.0000 | 0.0062 | 0.0070 |
| III-3 | Accuracy | 0.0000 | 0.0000 | 0.0009 | 0.0019 |
| | TPR | 0.0000 | 0.0000 | 0.0001 | 0.0008 |
| | F1-Score | 0.0000 | 0.0000 | 0.0006 | 0.0007 |
| | FPR | 0.0000 | 0.0000 | 0.0003 | 0.0005 |
| | AUC | 0.0000 | 0.0000 | 0.0009 | 0.0018 |
| | Precision | 0.0000 | 0.0000 | 0.0000 | 0.0010 |
| VI | Accuracy | 0.0000 | 0.0000 | 0.0005 | 0.0008 |
| | TPR | 0.0000 | 0.0000 | 0.0000 | 0.0004 |
| | F1-Score | 0.0000 | 0.0000 | 0.0003 | 0.0005 |
| | FPR | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | AUC | 0.0000 | 0.0000 | 0.0009 | 0.0011 |
| | Precision | 0.0000 | 0.0000 | 0.0000 | 0.0007 |
Note: p-values reported as 0.0000 indicate p < 10⁻⁴.
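The p-values in Table 4 come from paired Wilcoxon signed-rank tests on per-repetition metric values for the ensemble versus each single-stage method. A minimal sketch using `scipy.stats.wilcoxon` follows; the accuracy vectors are synthetic placeholders, not the paper's results.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Illustrative per-repetition accuracies for two methods (not from the paper)
acc_ensemble = rng.normal(0.90, 0.02, size=30)
acc_bpso = rng.normal(0.70, 0.05, size=30)

# Two-sided paired test on the per-repetition differences
stat, p = wilcoxon(acc_ensemble, acc_bpso)
print(f"W = {stat:.1f}, p = {p:.4f}")
```

Because the same simulated datasets are reused by every method within a repetition, the paired (rather than independent-samples) test is the appropriate choice here.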
Table 5. Average runtime (seconds) for each method across different feature dimensions (n = 50).
| Method | p = 60 | p = 100 | p = 200 |
|---|---|---|---|
| BPSO | 20.4 | 33.8 | 56.9 |
| BGWO | 22.1 | 38.7 | 62.4 |
| MBPSO | 27.3 | 44.6 | 68.2 |
| BDGWO | 32.9 | 48.8 | 79.1 |
| MBPSO–BDGWO | 37.2 | 57.9 | 92.6 |
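Average runtimes like those in Table 5 can be collected with `time.perf_counter`. The sketch below is a generic timing harness, not the paper's benchmarking code; the workload passed to `timed` is a hypothetical stand-in for one feature-selection run.

```python
import time

def timed(fn, *args, repeats=5):
    """Return the mean wall-clock runtime (seconds) of fn over several repeats."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Illustrative stand-in workload (not the paper's optimizer)
mean_s = timed(lambda: sum(i * i for i in range(100_000)))
print(f"{mean_s:.4f} s")
```

Averaging over repeats smooths out scheduler noise, which matters when comparing methods whose runtimes differ by tens of seconds.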
Table 6. Real dataset results.
| Dataset | Measure | BPSO | BGWO | MBPSO | BDGWO | Ensemble |
|---|---|---|---|---|---|---|
| SCADI | Accuracy | 0.568 (0.089) | 0.618 (0.061) | 0.659 (0.058) | 0.790 (0.039) | 0.895 (0.011) |
| | TPR | 0.617 (0.082) | 0.675 (0.073) | 0.735 (0.068) | 0.805 (0.042) | 0.911 (0.014) |
| | F1-score | 0.575 (0.078) | 0.632 (0.066) | 0.673 (0.054) | 0.782 (0.038) | 0.886 (0.010) |
| | FPR | 0.412 (0.061) | 0.406 (0.052) | 0.385 (0.042) | 0.296 (0.032) | 0.094 (0.008) |
| | AUC | 0.559 (0.093) | 0.620 (0.065) | 0.686 (0.045) | 0.773 (0.040) | 0.903 (0.015) |
| | Precision | 0.564 (0.073) | 0.611 (0.067) | 0.646 (0.059) | 0.785 (0.042) | 0.876 (0.018) |
| Toxicity | Accuracy | 0.509 (0.085) | 0.553 (0.086) | 0.593 (0.055) | 0.686 (0.031) | 0.863 (0.014) |
| | TPR | 0.533 (0.079) | 0.567 (0.070) | 0.600 (0.043) | 0.663 (0.037) | 0.875 (0.011) |
| | F1-score | 0.506 (0.081) | 0.542 (0.089) | 0.587 (0.060) | 0.679 (0.029) | 0.857 (0.008) |
| | FPR | 0.423 (0.080) | 0.391 (0.072) | 0.375 (0.042) | 0.331 (0.021) | 0.107 (0.0009) |
| | AUC | 0.546 (0.085) | 0.563 (0.072) | 0.592 (0.047) | 0.668 (0.030) | 0.894 (0.016) |
| | Precision | 0.498 (0.090) | 0.530 (0.081) | 0.577 (0.058) | 0.646 (0.033) | 0.829 (0.010) |
| Lung | Accuracy | 0.468 (0.110) | 0.470 (0.103) | 0.473 (0.060) | 0.553 (0.066) | 0.726 (0.012) |
| | TPR | 0.405 (0.131) | 0.434 (0.126) | 0.446 (0.051) | 0.579 (0.054) | 0.781 (0.009) |
| | F1-score | 0.399 (0.119) | 0.422 (0.106) | 0.439 (0.048) | 0.573 (0.059) | 0.750 (0.008) |
| | FPR | 0.458 (0.198) | 0.441 (0.108) | 0.402 (0.044) | 0.351 (0.022) | 0.121 (0.0010) |
| | AUC | 0.461 (0.122) | 0.460 (0.114) | 0.462 (0.054) | 0.585 (0.062) | 0.787 (0.015) |
| | Precision | 0.384 (0.099) | 0.399 (0.109) | 0.428 (0.041) | 0.569 (0.058) | 0.742 (0.009) |
| Prostate | Accuracy | 0.482 (0.105) | 0.441 (0.115) | 0.496 (0.071) | 0.604 (0.052) | 0.796 (0.015) |
| | TPR | 0.451 (0.111) | 0.448 (0.120) | 0.469 (0.043) | 0.589 (0.042) | 0.769 (0.012) |
| | F1-score | 0.428 (0.114) | 0.421 (0.107) | 0.472 (0.053) | 0.598 (0.053) | 0.778 (0.007) |
| | FPR | 0.453 (0.157) | 0.456 (0.142) | 0.441 (0.038) | 0.342 (0.015) | 0.115 (0.0005) |
| | AUC | 0.477 (0.112) | 0.440 (0.103) | 0.481 (0.061) | 0.602 (0.049) | 0.793 (0.017) |
| | Precision | 0.410 (0.084) | 0.403 (0.096) | 0.459 (0.050) | 0.586 (0.033) | 0.762 (0.013) |
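The accuracy, TPR, FPR, precision, and F1 values in Tables 3 and 6 follow their standard confusion-matrix definitions. A minimal sketch of those formulas is given below; the counts are illustrative, not taken from any of the datasets.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics used throughout the tables."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)          # true positive rate (recall / sensitivity)
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr,
            "precision": precision, "f1": f1}

# Illustrative counts (not from the paper)
print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
```

In feature-selection evaluation these counts can refer either to classified samples or to selected features (a "true positive" being a truly relevant feature that was selected); the formulas are identical in both readings.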
Table 7. Median and MAD values of the number of selected features by each method.
| Dataset | Method | Median | MAD | Min | Max |
|---|---|---|---|---|---|
| SCADI (p = 205, n = 70) | BPSO | 146 | 14 | 128 | 171 |
| | BGWO | 152 | 11 | 137 | 175 |
| | MBPSO | 85 | 7 | 74 | 98 |
| | BDGWO | 74 | 5 | 66 | 88 |
| | Ensemble | 12 | 2 | 10 | 14 |
| Toxicity (p = 1203, n = 171) | BPSO | 851 | 42 | 793 | 974 |
| | BGWO | 798 | 36 | 745 | 872 |
| | MBPSO | 544 | 23 | 498 | 587 |
| | BDGWO | 323 | 18 | 295 | 352 |
| | Ensemble | 15 | 4 | 11 | 29 |
| Lung (p = 12,533, n = 181) | BPSO | 1243 | 96 | 1090 | 2047 |
| | BGWO | 985 | 84 | 754 | 1224 |
| | MBPSO | 587 | 55 | 498 | 675 |
| | BDGWO | 208 | 21 | 170 | 249 |
| | Ensemble | 29 | 5 | 17 | 42 |
| Prostate (p = 12,600, n = 102) | BPSO | 987 | 87 | 844 | 1653 |
| | BGWO | 1132 | 92 | 986 | 1758 |
| | MBPSO | 349 | 31 | 267 | 547 |
| | BDGWO | 135 | 19 | 107 | 180 |
| | Ensemble | 38 | 4 | 24 | 47 |
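The selection-stability column J̄ in Table 3 can be read as the average pairwise Jaccard similarity of the feature subsets selected across repetitions: higher values mean the method keeps choosing the same features. A pure-Python sketch of that computation follows; the example subsets are illustrative.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two selected-feature index sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of runs: a simple
    selection-stability index (higher = more stable selection)."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Feature subsets selected in three illustrative runs (not from the paper)
runs = [{1, 2, 3, 7}, {1, 2, 3, 9}, {1, 2, 5, 7}]
print(round(mean_pairwise_jaccard(runs), 3))
```

Read together with Table 7, a high J̄ and a small, tight selected-subset size are two views of the same behavior: the ensemble repeatedly converges on a compact, consistent feature set.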
Sancar, N. A Novel MBPSO–BDGWO Ensemble Feature Selection Method for High-Dimensional Classification Data. Informatics 2026, 13, 7. https://doi.org/10.3390/informatics13010007