Multi-Variable Evaluation via Position Binarization-Based Sparrow Search

Hua, Jiwei; Gu, Xin; Sun, Debing; Zhu, Jinqi; Wang, Shuqin

doi:10.3390/electronics14163312

by Jiwei Hua ¹, Xin Gu ¹, Debing Sun ¹, Jinqi Zhu ²,* and Shuqin Wang ¹,*

¹ College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
² Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
* Authors to whom correspondence should be addressed.

Electronics 2025, 14(16), 3312; https://doi.org/10.3390/electronics14163312

Submission received: 14 March 2025 / Revised: 11 August 2025 / Accepted: 14 August 2025 / Published: 20 August 2025

Abstract

The Sparrow Search Algorithm (SSA), a metaheuristic renowned for rapid convergence, good stability, and high search accuracy in continuous optimization, faces inherent limitations when applied to discrete multi-variable combinatorial optimization problems like feature selection. To enable effective multi-variable evaluation and discrete feature subset selection using SSA, a novel binary variant, Position Binarization-based Sparrow Search Algorithm (BSSA), is proposed. BSSA employs a sigmoid transformation function to convert the continuous position vectors generated by the standard SSA into binary solutions, representing feature inclusion or exclusion. Recognizing that the inherent exploitation bias of SSA and the complexity of high-dimensional feature spaces can lead to premature convergence and suboptimal solutions, we further enhance BSSA by introducing stochastic Gaussian noise (zero mean) into the sigmoid transformation. This strategic perturbation actively diversifies the search population, improves exploration capability, and bolsters the algorithm’s robustness against local optima stagnation during multi-variable evaluation. The fitness of each candidate feature subset (solution) is evaluated using the classification accuracy of a Support Vector Machine (SVM) classifier. The BSSA algorithm is compared with four high-performance optimization algorithms on 12 diverse benchmark datasets selected from the UCI repository, utilizing multiple performance metrics. Experimental results demonstrate that BSSA achieves superior performance in classification accuracy, computational efficiency, and optimal feature selection, significantly advancing multi-variable evaluation for feature selection tasks.

Keywords: feature selection; binary sparrow search algorithm; multi-variable evaluation; bionic optimization; stochastic perturbation

1. Introduction

Feature selection serves as a critical preprocessing step in machine learning, effectively reducing data dimensionality while enhancing model performance by eliminating redundant and irrelevant features. It has been successfully applied in areas such as text classification [1], image retrieval [2], and medical diagnostics [3]. However, selecting optimal feature subsets remains particularly challenging for high-dimensional datasets because the number of candidate subsets grows combinatorially with the number of features. The primary objective of this paper is to develop a novel optimization algorithm, specifically a position binarization-based Sparrow Search Algorithm (BSSA), to select the optimal feature subset while simultaneously improving classification accuracy.

The feature selection process identifies an optimal subset of M features (M ≤ N) from a dataset containing N features, thereby reducing dimensionality, noise, and computational complexity while improving classification performance. There are three types of feature selection methods: Filter [4], Wrapper [5,6], and Embedded [7,8] approaches. Filter methods select features independently of the learning algorithm [9], while Embedded methods integrate feature selection into model construction, combining advantages of both the Filter and Wrapper paradigms. Wrapper methods use a specific machine learning classifier (e.g., SVM) to assess the quality of each candidate feature subset, often achieving superior accuracy through classifier-specific optimization. Because selecting M out of N features is an optimization problem at its core, some Wrapper methods are based on stochastic optimization algorithms. In general, Wrapper methods obtain better classification accuracies than Filter methods because the selected subset is tuned to the classifier that is applied to the dataset. Consequently, Wrapper methods are widely adopted in heuristic-based feature selection.

To find a solution with reasonable computational cost, different metaheuristic algorithms are utilized for the same problem [10,11]. These algorithms can be classified into several groups, such as evolutionary, physics-based, human behavior-based, and population-based algorithms [12,13,14]. Among these metaheuristic algorithms, swarm intelligence algorithms [15] simulate animal behavior and focus on the collective behavior of multiple interacting agents adhering to a simple set of rules, which are often inspired by natural phenomena. Metaheuristic algorithms are also widely used in feature selection problems. The in-depth research on feature selection algorithms has led to an emergence of numerous new algorithms. Therefore, choosing an appropriate algorithm has become a crucial issue [16,17].

In recent years, randomized optimization algorithms, including the genetic algorithm [18,19], ant colony algorithm [20], artificial bee colony algorithm [21], and particle swarm optimization algorithm [22], have garnered significant attention from researchers. These algorithms have been studied systematically and applied to feature selection problems [23]. While each algorithm offers its own benefits, none is without limitations. For example, particle swarm optimization algorithms exhibit premature convergence in high-dimensional spaces due to limited search diversity, and ant colony algorithms can have slower search times.

The Sparrow Search Algorithm (SSA) is an optimization algorithm inspired by natural behavior [24]. It derives from the observation of sparrows’ foraging and anti-predation behaviors. Compared with other swarm intelligence algorithms, such as particle swarm optimization (PSO) and grey wolf optimizer (GWO), SSA offers rapid convergence, good stability, and strong global search capability [24]. Zhang and Ding used SSA to optimize the stochastic configuration network [25], and Li and Wang proposed a multi-objective sparrow search algorithm (MOSSA) to solve complex multi-objective optimization problems (MOPs) [26]. Sun et al. developed two types of binary SSA for feature selection [27]. As evidenced by studies [25,26,27], improved SSA variants demonstrate superior performance over traditional swarm intelligence algorithms in both engineering applications and feature selection problems, particularly in benchmark tests. Therefore, the SSA is a promising approach to address feature selection problems.

However, existing swarm intelligence-based feature selection methods (e.g., BPSO, BGWOA) suffer from premature convergence in high-dimensional spaces due to limited exploration diversity and sensitivity to local optima, which can adversely affect the performance of learning algorithms. There remains a critical need for algorithms that balance the exploration-exploitation trade-off while ensuring robust binarization. To address this limitation, we adopt the SSA, owing to its strong global search capabilities. Its native form, however, lacks mechanisms for discrete search spaces, necessitating a probabilistic binarization extension. Therefore, a novel Binary Sparrow Search Algorithm (BSSA) is proposed to select the optimal feature subset, thereby enhancing classification performance through efficient dimensionality reduction. An S-shaped transfer function is employed to binarize continuous positions by mapping each dimension of the SSA’s position vector to the interval (0, 1). To further increase the diversity of solutions and mitigate premature convergence, a stochastic perturbation in the form of zero-mean additive Gaussian noise (σ = 0.2) is applied to the values of the sigmoid transfer function. A random threshold is then used to determine the updated binary sparrow position. To determine the best solution, an SVM classifier is used to evaluate each feature subset.

The main contributions of this study are as follows:

(1): A novel stochastic binarization mechanism using Gaussian-perturbed sigmoid mapping to mitigate premature convergence;
(2): A unified framework integrating BSSA with SVM classifiers to simultaneously optimize feature cardinality and accuracy;
(3): Extensive validation across 12 UCI datasets demonstrating statistically significant improvements in accuracy, computation time, and feature reduction.

The rest of the paper is organized as follows: Section 2 introduces the theoretical background of the continuous Sparrow Search Algorithm; Section 3 presents the position binarization-based Sparrow Search Algorithm; Section 4 explains a feature selection method based on the proposed BSSA algorithm; and Section 5 describes and analyzes the experimental results. Lastly, Section 6 concludes the paper and highlights future research directions.

2. Background

2.1. Continuous Sparrow Search Algorithm

SSA is a swarm intelligence optimization algorithm that has emerged in recent years. Every position of a sparrow corresponds to a solution. The update mode of the algorithm can be divided into moving closer to the current optimal position and moving closer to the origin. Sparrows are social birds known for their intelligence and for having better memory than other birds [24]. The bionic principles that inspired this algorithm are as follows.

The algorithm formulates the sparrows’ foraging process as a “finder-participant” model, incorporating detection and warning mechanisms [25,26]. The finders in the model have high fitness and a wide search range, and they guide the population’s search and foraging efforts. Participants follow the finders to forage, aiming to improve their own fitness. At the same time, some participants may monitor the finders, compete with them for food, or forage around them to increase their predation rate. When threats from predators are detected, all sparrows immediately engage in anti-predatory behavior.

The Sparrow Search Algorithm optimizes parameters by simulating sparrows’ dual behaviors of food search and anti-predation. Suppose there are N sparrows within a D-dimensional search space, where the position of the ith sparrow is represented as $X_{i} = [X_{i1}, \dots, X_{id}, \dots, X_{iD}]$ for $i = 1, 2, 3, \dots, N$, with $X_{id}$ denoting the position of the ith sparrow in the dth dimension.

The population of sparrows consists of three distinct roles: producers, scroungers, and sparrows that become aware of danger. Producers are responsible for seeking food sources, while scroungers, who make up 10 to 20% of the population, follow producers to forage. All sparrows fly to a safe zone when the alarm value exceeds the safety threshold. During each iteration, producers update their positions according to Equation (1).

$$X_{i}^{t+1} = \begin{cases} X_{i}^{t} \cdot \exp\left(\frac{-t}{α \cdot iter_{max}}\right), & \text{if } AV < ST \\ X_{i}^{t} + Q \cdot L, & \text{if } AV \geq ST \end{cases}$$

(1)

where t denotes the current number of iterations, $X_{i}^{t}$ represents the current position of sparrow i at iteration t, and $iter_{max}$ is the maximum iteration count. Q is a random number following a normal distribution, and L is a 1 × D matrix in which each element is 1. $AV \in [0, 1]$ denotes the alarm value, and $ST \in [0.5, 1]$ represents the safety threshold. $α$ is a uniformly distributed random number in [0,1]. When $AV < ST$, producers maintain broad exploration without fear of being detected by the predator; conversely, $AV \geq ST$ triggers collective relocation to safer areas.
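The producer rule of Equation (1) can be sketched in a few lines of Python. This is only an illustrative reading of the formula, not the authors' MATLAB implementation: the function name `update_producer`, the list-based position, and the `rng` argument are hypothetical, Q is assumed to be a standard normal draw, and α is drawn from (0, 1] rather than [0, 1] so that the exponent is always defined.

```python
import math
import random


def update_producer(x, t, iter_max, av, st, rng):
    """Return the new position of one producer following Equation (1)."""
    if av < st:
        # No alarm: scale every coordinate by exp(-t / (alpha * iter_max)).
        alpha = 1.0 - rng.random()  # assumed range (0, 1], avoids dividing by zero
        factor = math.exp(-t / (alpha * iter_max))
        return [xd * factor for xd in x]
    # Alarm raised: add the same normal draw Q to every coordinate (Q * L).
    q = rng.gauss(0.0, 1.0)  # assumption: Q ~ N(0, 1)
    return [xd + q for xd in x]


rng = random.Random(0)
x = [1.5, -0.4, 2.0]
calm = update_producer(x, t=10, iter_max=100, av=0.3, st=0.8, rng=rng)
alarmed = update_producer(x, t=10, iter_max=100, av=0.9, st=0.8, rng=rng)
```

In the first call every coordinate shrinks toward the origin, while in the second call all coordinates are shifted by one common offset, which matches the two branches of the equation.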

Some scroungers keep a watchful eye on the producers throughout the foraging process, and once they find that the producers have found a better food source, they immediately compete for the food. If successful, they obtain the food directly. The scrounger’s position update is given by Equation (2).

$$X_{i}^{t+1} = \begin{cases} Q \cdot \exp\left(\frac{X_{Gworst}^{t} - X_{i}^{t}}{i^{2}}\right), & \text{if } i > N/2 \\ XP_{i}^{t+1} + |X_{i}^{t} - XP_{i}^{t+1}| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$

(2)

where $XP_{i}$ represents the best position occupied by the producer and $X_{Gworst}$ is the worst position among all sparrows. A is a 1 × D matrix whose elements are randomly assigned a value of 1 or −1, and $A^{+} = A^{T} (A A^{T})^{-1}$. If $i > N/2$, the scrounger has failed to obtain food and needs to fly elsewhere to gain more energy.
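A minimal Python sketch of Equation (2) follows. It rests on several assumptions that the text does not spell out: the index i is treated as a 1-based rank, Q is a standard normal draw, and the product of the absolute difference, $A^{+}$, and L is read as a scalar step applied to every dimension (for a row A of ±1 entries, $A A^{T} = D$, so $A^{+} = A^{T} / D$). The names `update_scrounger` and `n_sparrows` are hypothetical.

```python
import math
import random


def update_scrounger(i, n_sparrows, x, xp, x_worst, rng):
    """Return the new position of scrounger i (1-based rank) per Equation (2)."""
    d = len(x)
    if i > n_sparrows / 2:
        # Poorly ranked scrounger: relocate using a single normal draw Q.
        q = rng.gauss(0.0, 1.0)  # assumption: Q ~ N(0, 1)
        return [q * math.exp((wd - xd) / i ** 2) for wd, xd in zip(x_worst, x)]
    # Otherwise move next to the producers' best position XP.
    a = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    # A A^T equals D for a row of +-1 entries, hence A^+ = A^T / D.
    step = sum(abs(xd - pd) * ad for xd, pd, ad in zip(x, xp, a)) / d
    return [pd + step for pd in xp]  # the scalar step multiplies L (all ones)


rng = random.Random(1)
xp = [0.5, 0.5, 0.5]
near = update_scrounger(1, 4, [1.0, 0.0, 0.5], xp, [3.0, 3.0, 3.0], rng)
far = update_scrounger(4, 4, [1.0, 0.0, 0.5], xp, [1.0, 0.0, 0.5], rng)
```

Under this reading, a well-ranked scrounger lands within the mean absolute distance of XP, whereas a poorly ranked one is moved to a position that no longer depends on XP.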

While foraging, the sparrows situated at the periphery of the population frequently encounter danger and fly to a secure location once they detect it. We assume that between 10% and 20% of all sparrows in each generation become aware of danger. Their updated position is calculated as in Equation (3).

$$X_{i}^{t+1} = \begin{cases} X_{Gbest}^{t} + β \cdot (X_{i}^{t} - X_{Gbest}^{t}), & \text{if } f(X_{i}^{t}) \neq f(X_{Gbest}) \\ X_{i}^{t} + K \cdot \left(\frac{|X_{i}^{t} - X_{Gworst}^{t}|}{(f(X_{i}^{t}) - f(X_{Gworst})) + ε}\right), & \text{if } f(X_{i}^{t}) = f(X_{Gbest}) \end{cases}$$

(3)

where $β$ is a random number drawn from a normal distribution with a mean of 0 and a variance of 1, K is a random control factor, and ε is a very small constant that prevents division by zero. $X_{Gbest}$ is the present best position of the population. The fitness value of the current sparrow $X_{i}^{t}$ is represented by $f(X_{i}^{t})$, while $f(X_{Gworst})$ and $f(X_{Gbest})$ are the present global worst and best fitness values, respectively. If $f(X_{i}^{t}) \neq f(X_{Gbest})$, the sparrow is at the periphery of the population, where it is more likely to encounter predators. Conversely, if $f(X_{i}^{t}) = f(X_{Gbest})$, the sparrow is situated at the center of the population and therefore needs to approach other sparrows to lower its risk of predation.
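Equation (3) can be illustrated with the following Python sketch. The range of K is not stated in the text; drawing it uniformly from [−1, 1] is an assumption, as are the default value of `eps` and the function name `update_danger_aware`.

```python
import random


def update_danger_aware(x, f_x, x_best, f_best, x_worst, f_worst, rng, eps=1e-8):
    """Return the new position of a danger-aware sparrow per Equation (3)."""
    if f_x != f_best:
        # Periphery: jump to a random point on the line through X_Gbest and x.
        beta = rng.gauss(0.0, 1.0)
        return [bd + beta * (xd - bd) for xd, bd in zip(x, x_best)]
    # Center (the sparrow holds the best fitness): take a random step whose
    # size is scaled by the distance to the worst position.
    k = rng.uniform(-1.0, 1.0)  # assumption: K ~ U[-1, 1]
    denom = (f_x - f_worst) + eps
    return [xd + k * abs(xd - wd) / denom for xd, wd in zip(x, x_worst)]


rng = random.Random(2)
best, worst = [1.0, 1.0], [-2.0, 3.0]
edge = update_danger_aware([0.0, 2.0], 0.7, best, 0.9, worst, 0.4, rng)
center = update_danger_aware(best, 0.9, best, 0.9, worst, 0.4, rng)
```

The fitness values in the usage lines are made-up numbers chosen only to trigger each branch once.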

2.2. Related Work

Swarm intelligence algorithms have gained significant attention for feature selection problems due to their ability to efficiently explore large search spaces. Swarm intelligence algorithms mimic the decentralized and self-organized behavior of natural swarms to solve optimization problems [28,29]. In feature selection, these algorithms explore the feature space to identify the most relevant subset of features while discarding irrelevant or redundant ones [30]. The key advantage of swarm intelligence lies in its ability to efficiently search large and complex spaces, making it well-suited for feature selection tasks [31].

Particle Swarm Optimization (PSO) is one of the most widely used swarm intelligence algorithms for feature selection [32]. Inspired by the social behavior of bird flocking, PSO maintains a population of particles, each representing a potential feature subset. Unler et al. [33] developed a modified discrete PSO algorithm for the feature subset selection problem. This approach embodies an adaptive feature selection procedure that dynamically accounts for the relevance and dependence of the features included in the feature subset. Xiao et al. [32] proposed a novel binary particle swarm optimization (NBPSO) algorithm, in which a dynamic weight search strategy is introduced to meet the different requirements on solution performance at different stages. It is combined with a Lévy flight search strategy to reduce the possibility of falling into a local optimum.

The Magnetic Optimization Algorithm (MOA) is inspired by magnetic field theory. Mirjalili et al. [34] proposed a binary version of MOA for feature selection, and Mirjalili et al. [35] proposed the binary bat algorithm. The grey wolf optimizer (GWO) is a bio-inspired optimization technique that simulates the hunting process of grey wolves in nature. Emary et al. [36] proposed binary grey wolf optimization approaches to select optimal feature subsets for classification purposes, in which a sigmoidal function is used to squash the continuous updated position, and these values are then stochastically thresholded to find the updated binary grey wolf position. Cuckoo Search (CS) is a metaheuristic search algorithm inspired by the obligate brood parasitism of some cuckoo species, which lay their eggs in the nests of other host birds. Pereira et al. [37] proposed a binary version of the Cuckoo Search and evaluated it with different transfer functions that map continuous solutions to binary ones, using the accuracy of the Optimum-Path Forest classifier as the fitness function. Hussien et al. [38] proposed an S-shaped binary whale optimization algorithm to select the optimal feature subset for dimensionality reduction and classification problems. This approach applies a sigmoid (S-shaped) transfer function in every dimension, which defines the probability of changing the elements of the position vector from 0 to 1 and vice versa and hence forces the search agents to move in a binary space. Binary sparrow search algorithms have also been proposed to solve feature selection problems for classification efficiently, with excellent reported classification performance [27,39].

3. The Position Binarization-Based Binary Sparrow Search Algorithm

3.1. Method

SSA offers strong global search capabilities and adaptability, which improve both the exploration and the exploitation of the search space [39]. However, feature selection is essentially a binary optimization problem: it is formalized as finding a binary vector $X \in \{0, 1\}^{D}$ that maximizes classifier accuracy while minimizing the number of selected features, where $x_{i} = 1$ denotes that feature $f_{i}$ is selected. The original Sparrow Search Algorithm, which operates on continuous positions, cannot be used directly to solve such problems. Accordingly, it is necessary to develop a binary version of SSA to accommodate feature selection problems.

In wrapper-based feature selection methods, a classifier is usually used to evaluate each individual, and an intelligent optimization algorithm is employed to maximize the evaluation values. In other words, the optimization algorithm identifies the combinations of features that optimize classifier performance, with search agents corresponding to these feature combinations exploring the D-dimensional search space. Binarization operators are more direct than continuous operators because they limit the value of each dimension to 0 or 1 in this search space [35]. Therefore, we propose a position binarization-based sparrow optimization algorithm to solve the feature selection problem. The rationale behind developing a binary SSA is that the solution should be a string of 0s and 1s, where each value denotes whether the corresponding feature is selected or not. In this paper, the binary variant uses the same search mechanism as the original SSA.
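To make the binary encoding concrete, the following Python sketch applies a 0/1 position vector as a feature mask to a small data matrix. The helper `select_features` and the sample values are hypothetical and serve only as an illustration of the representation.

```python
def select_features(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    if any(len(row) != len(mask) for row in rows):
        raise ValueError("every row must have one value per mask bit")
    keep = [d for d, bit in enumerate(mask) if bit == 1]
    return [[row[d] for d in keep] for row in rows]


rows = [[5.1, 3.5, 1.4, 0.2],
        [6.2, 2.9, 4.3, 1.3]]
mask = [1, 0, 0, 1]  # features 1 and 4 are selected, features 2 and 3 are dropped
reduced = select_features(rows, mask)
```

A wrapper method would then train and score the classifier on `reduced` instead of `rows`.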

3.1.1. The Updated Position Binarization

Transfer functions are commonly used in binarization methods due to their effectiveness and simplicity [15]. They define the probability of converting a continuous value to a binary value of 0 or 1. Therefore, we also employ a transfer function to design the binary version of SSA, in which the transfer function provides the probability that each dimension of the sparrow position vector changes to 0 or 1. In BSSA, the new positions obtained from the position update equations above are continuous values in both the global and the local search, so these positions must be binarized to determine whether their corresponding features are selected or not. Compared to the Tanh transfer function, the sigmoidal (S-shaped) transfer function offers a more direct probabilistic interpretation and smoothly maps continuous positions to the interval (0, 1) without abrupt thresholds. This characteristic is crucial for preserving swarm diversity during early exploration [34]. Consequently, as in the PSO, GWO, and Cuckoo Search variants for feature selection [33,34,35,36,37,38], a sigmoidal transfer function is employed to determine the transition probability in each dimension of the continuous solution vector, which ensures that the sparrow moves in the binary search space. The S-shaped transfer function is given by Equation (4) [34,40].

$$S(X_{id}^{t+1}) = \frac{1}{1 + e^{-X_{id}^{t+1}}},$$

(4)

where $X_{id}^{t+1}$ is the dth dimension of the position vector of the ith sparrow in generation t + 1. To obtain binary values from the continuous output of the sigmoid transfer function, a threshold must be established. The probability of a feature being selected (changing to 1) or deselected (changing to 0) increases with the slope of the transfer function. As a result, the binary solution is obtained using a general random threshold, and the updated position binarization is computed as in Equation (5).

$$x_{id}^{t+1} = \begin{cases} 0, & \text{if } rd < S(x_{id}^{t+1}) \\ 1, & \text{if } rd \geq S(x_{id}^{t+1}) \end{cases}$$

(5)

where $x_{id}^{t+1}$ is the updated binarized position, and rd is a random number in [0,1]. If $rd < S(x_{id}^{t+1})$, the value of $x_{id}^{t+1}$ is changed to 0; otherwise, to 1.
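As a minimal sketch, Equations (4) and (5) can be written in Python as below. The code follows Equation (5) exactly as stated, so a large sigmoid value makes a 0 more likely; the function names and the numerically stable form of the sigmoid are illustrative choices, and rd is assumed to be uniform.

```python
import math
import random


def sigmoid(v):
    """Equation (4), evaluated in a form that avoids overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize(position, rng):
    """Equation (5): one random threshold rd per dimension."""
    bits = []
    for v in position:
        rd = rng.random()  # assumption: rd ~ U[0, 1)
        bits.append(0 if rd < sigmoid(v) else 1)
    return bits


rng = random.Random(3)
bits = binarize([50.0, -800.0, 0.0], rng)
```

The first coordinate is practically always mapped to 0 and the second to 1, whereas the third is a fair coin flip because the sigmoid of 0 is 0.5.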

3.1.2. The Updated Position Binarization Based on a Small Perturbation

Unlike in reference [27], a stochastic perturbation in the form of zero-mean additive Gaussian noise [41] is applied to the values of the sigmoid transfer function to obtain alternative probabilities, further increasing the diversity of solutions and the robustness of the algorithm. Gaussian noise is selected over uniform perturbations for its heavier tails, which enhance escape from local optima while maintaining population stability, as validated in prior feature selection studies [41]. Figure 1a compares the probability values before and after the perturbation is introduced. The figure indicates that any probability value may randomly increase or decrease because of the perturbation. Consequently, for positions close to the threshold, adding the perturbation may change the corresponding binarized positions, as Figure 1b demonstrates: positions that were originally 1 may become 0, and vice versa, while other binarized positions remain unaffected. By stochastically modifying the selection probabilities, especially for features near the threshold, this process increases the exploration of different feature subsets, leading to enhanced solution diversity. The perturbed probability is calculated as in Equation (6).

$$S'(x_{id}^{t+1}) = S(x_{id}^{t+1}) + N(μ, σ),$$

(6)

where $N(μ, σ)$ denotes a Gaussian distributed variable with mean $μ = 0$ and standard deviation σ, which is set to 0.2 in this paper.

The updated position binarization based on a small perturbation is computed as in Equation (7).

$$x_{id}^{t+1} = \begin{cases} 0, & \text{if } rd < S'(x_{id}^{t+1}) \\ 1, & \text{if } rd \geq S'(x_{id}^{t+1}) \end{cases}$$

(7)
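The perturbed binarization of Equations (6) and (7) differs from the plain rule only in the added noise term, as the following Python sketch shows. The function names are hypothetical, and one independent noise draw per dimension is an assumption; the text does not say whether the noise is shared across dimensions.

```python
import math
import random


def sigmoid(v):
    """Equation (4), evaluated in a form that avoids overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize_perturbed(position, sigma, rng):
    """Equations (6) and (7): threshold the noise-perturbed sigmoid value."""
    bits = []
    for v in position:
        s_prime = sigmoid(v) + rng.gauss(0.0, sigma)  # Equation (6)
        rd = rng.random()
        bits.append(0 if rd < s_prime else 1)  # Equation (7)
    return bits


rng = random.Random(7)
bits = binarize_perturbed([0.0] * 2000, sigma=0.2, rng=rng)
share_of_zeros = bits.count(0) / len(bits)
```

Note that the perturbed value can leave the interval [0, 1]; in that case the comparison with rd has a fixed outcome, so the noise mainly matters for dimensions whose sigmoid value is not already close to 0 or 1.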

3.1.3. The BSSA Framework

The framework of the position binarization-based Sparrow Search Algorithm is given by Algorithm 1, in which Gaussian noise ($N(0, 0.2)$) is applied per iteration to all agents to escape local optima, with bounded variance ensuring stability. A pivotal design choice in BSSA is the dual binarization mechanism, which evaluates both the original sigmoid-transformed position $X_{j}'$ (Equation (5)) and its perturbed counterpart $X_{j}''$ (Equation (7)). By competitively selecting the superior solution between $X_{j}'$ and $X_{j}''$ via fitness comparison, the algorithm achieves a balance between exploitation (preserving high-quality solutions derived from swarm dynamics) and exploration (probing alternative subspaces via Gaussian noise).

Algorithm 1: The position binarization-based Sparrow Search Algorithm

Input:
    N: the number of sparrows
    $iter_{max}$: maximum number of iterations
    AV: the alarm value
    SN: the number of sparrows observed to be at risk
    PN: the number of producers
    σ: perturbation
Output:
    $X_{Gbest}$: the optimal location of the population
    $f_{Gbest}$: the global optimal fitness value

Initialize a population of N sparrows $X = \{X_{1}, X_{2}, \dots, X_{N}\}$ and set the corresponding parameters;
Calculate the fitness value of each sparrow $f = \{f_{1}, f_{2}, \dots, f_{N}\}$;
$XP = \{XP_{1}, XP_{2}, \dots, XP_{N}\}$, $fP = \{fP_{1}, fP_{2}, \dots, fP_{N}\}$;
$XP = X$, $fP = f$;
t = 0;
While (t < $iter_{max}$)
    Sort the fitness values to identify the present best position $X_{Gbest}$, its fitness $f_{Gbest}$, and the worst position $X_{Gworst}$;
    AV = random (0,1);
    for j = 1: PN
        Compute the new position of the jth sparrow using Equation (1);
    end for
    for j = (PN + 1): N
        Compute the new position of the jth sparrow using Equation (2);
    end for
    for j = 1: SN
        Compute the new position of the jth sparrow using Equation (3);
    end for
    for j = 1: N
        Calculate the changing probability $S(X_{j})$ by using Equation (4);
        Using Equation (5), obtain the binary solution;
        Get the current new location $X_{j}'$;
        Calculate the changing probability with perturbation $S'(X_{j})$ by using Equation (6);
        Using Equation (7), obtain the binary solution;
        Get the current new location $X_{j}''$;
        Calculate the fitness value $f(X_{j}')$ of $X_{j}'$ and the fitness value $f(X_{j}'')$ of $X_{j}''$;
        If $f(X_{j}'') > f(X_{j}')$, update the new position $X_{j}' = X_{j}''$ and $f(X_{j}') = f(X_{j}'')$;
        If $f(X_{j}') > f(X_{j})$, update the new position $XP_{j} = X_{j} = X_{j}'$ and $f(X_{j}) = f(XP_{j}) = f(X_{j}')$;
    end for
    t = t + 1;
end while
return $f_{Gbest}$, $X_{Gbest}$
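The last inner loop of Algorithm 1, in which the plain and the perturbed binarizations compete, can be sketched for a single sparrow as follows. This is an illustration under stated assumptions rather than the authors' code: larger fitness is treated as better, following the ">" comparisons of Algorithm 1 (with a minimized fitness the comparisons would be flipped), and `toy_fitness` with its `target` mask is a made-up stand-in for the SVM-based evaluation.

```python
import math
import random


def sigmoid(v):
    """Equation (4), evaluated in a form that avoids overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def dual_binarization_step(x_cont, x_bin, f_bin, fitness, sigma, rng):
    """Return the (binary position, fitness) pair kept for one sparrow."""
    # X': plain random-threshold binarization, Equations (4) and (5).
    plain = [0 if rng.random() < sigmoid(v) else 1 for v in x_cont]
    # X'': binarization of the perturbed probability, Equations (6) and (7).
    pert = [0 if rng.random() < sigmoid(v) + rng.gauss(0.0, sigma) else 1
            for v in x_cont]
    f_plain, f_pert = fitness(plain), fitness(pert)
    if f_pert > f_plain:  # keep the better of the two candidates
        plain, f_plain = pert, f_pert
    if f_plain > f_bin:  # accept only if it beats the current position
        return plain, f_plain
    return x_bin, f_bin


target = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical "ideal" feature mask


def toy_fitness(bits):
    """Toy score: number of bits that agree with the target mask."""
    return sum(b == t for b, t in zip(bits, target))


rng = random.Random(4)
x_bin = [0] * len(target)
f_bin = toy_fitness(x_bin)
history = [f_bin]
for _ in range(30):
    x_cont = [rng.gauss(0.0, 1.0) for _ in target]  # stand-in for Equations (1)-(3)
    x_bin, f_bin = dual_binarization_step(x_cont, x_bin, f_bin, toy_fitness, 0.2, rng)
    history.append(f_bin)
```

Because a candidate is accepted only when it improves on the stored position, the recorded fitness never decreases from one step to the next.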

3.2. Computational Complexity Analysis

Suppose there are N sparrows in a D-dimensional search space. The computational complexity of the proposed BSSA is analyzed as follows, focusing on its core operations in the context of feature selection:

(1): The computational complexity of population initialization is O(N D);
(2): The computational complexity of the position update mechanisms, including the sigmoid transformation, Gaussian noise injection, and threshold-based binarization, is O(N D) per iteration;
(3): The SVM classifier is employed to evaluate the feature subsets selected by BSSA because its L2 regularization inherently penalizes model complexity, mitigating overfitting when sparse feature subsets are evaluated. Assuming that the SVM is trained on the selected feature subset (size k ≤ D) with M training samples, the computational complexity of fitness evaluation is O(N M² k) per iteration.

Therefore, the total complexity of the BSSA is O(N(D + M² k)) per iteration.
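As a rough worked example of how the two terms compare, consider the BreastEW dataset (D = 30 features, 569 samples) with the population size N = 40 from Table 2. The values M ≈ 455 (four fifths of the samples under 5-fold cross-validation) and k = D (worst case) are assumptions made for this illustration, and the results are growth terms rather than measured operation counts.

```latex
\begin{align*}
N D &= 40 \times 30 = 1200, \\
N M^{2} k &\le 40 \times 455^{2} \times 30 = 40 \times 207025 \times 30 \approx 2.5 \times 10^{8}.
\end{align*}
```

Under these assumptions the fitness evaluation term exceeds the position update term by about five orders of magnitude, so the classifier training dominates the running time.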

4. Multi-Variable Evaluation Using Position Binarization-Based Sparrow Optimization Algorithm

In this study, we employ the BSSA to select the best feature subset. The position of a sparrow denotes a solution of the feature selection problem and is represented as a one-dimensional binary vector whose length equals the number of features in the dataset. Each dimension of the position vector indicates whether the corresponding feature is selected or not. Specifically, the sigmoid output $S(x_{id}^{t+1})$ or its perturbed counterpart $S'(x_{id}^{t+1})$ serves as the threshold probability for feature d in Equations (5) and (7), with Gaussian noise diversifying search trajectories. If $S(x_{id}^{t+1})$ or $S'(x_{id}^{t+1})$ exceeds a randomly generated number, the d-th component of the position vector is assigned a value of 0, which indicates that the corresponding feature is not selected. Otherwise, the component is assigned a value of 1, indicating that the feature is selected.

To determine the optimal solution and maintain the balance of feature selection, the fitness function combines two objectives: minimizing the classification error rate and the number of selected features. Formally, the fitness function in this paper is expressed as

$$\mathit{Fitness} = α \, ER_{S} + β \frac{|S|}{|T|},$$

(8)

where |S| is the size of the selected feature subset S, |T| is the total number of features, and $ER_{S}$ denotes the classification error rate obtained on the selected features by the SVM classifier [42]. The SVM uses L2 regularization (hyperparameter C), which penalizes overly complex models and mitigates overfitting. The constants α and β weight the classification error rate and the size of the feature subset, respectively, and govern the balance between accuracy and feature reduction (they are unrelated to the random numbers α and β in Equations (1) and (3)). Specifically, $α \in [0, 1]$ and $β = 1 - α$. Because improving accuracy means reducing the error rate, and accuracy is the primary objective, this paper prioritizes minimizing the error rate by assigning a high weight of $α = 0.99$ and a correspondingly low weight of $β = 0.01$, as recommended by [43]. The small feature-count term still prunes redundant features, achieving sparsity without sacrificing generalization.
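The effect of the weights in Equation (8) is easy to see numerically. The Python sketch below evaluates the fitness for a few made-up error rates and masks; the function name `fitness` and all sample values are illustrative, and lower values are better because both terms are minimized.

```python
def fitness(error_rate, mask, alpha=0.99):
    """Equation (8): alpha * ER_S + beta * |S| / |T| with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * sum(mask) / len(mask)


few = fitness(0.05, [1] * 10 + [0] * 20)   # 5% error, 10 of 30 features
many = fitness(0.05, [1] * 20 + [0] * 10)  # same error, 20 of 30 features
accurate = fitness(0.04, [1] * 30)         # 4% error, all 30 features
sparse = fitness(0.05, [1] + [0] * 29)     # 5% error, a single feature
```

With α = 0.99, `few` is about 0.0528 and beats `many` at equal error, while `accurate` (0.0496) still beats `sparse` (about 0.0498): one percentage point of error outweighs removing 29 of 30 features.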

5. Experimental Results and Discussion

5.1. Data Description

To evaluate the effectiveness of our proposed position binarization-based algorithm, we conducted experiments and comparative analyses using 12 benchmark datasets obtained from the UCI machine learning repository [44]. These datasets were selected to cover a diverse range of characteristics, including variations in dimensionality (ranging from 9 to 6033 features), class distributions (spanning from 2 to 8 classes), and sample sizes (varying from 102 to 5000 instances), and they include several high-dimensional datasets. Details of these datasets are given in Table 1. To mitigate overfitting and reliably estimate generalization performance, we employed the K-fold cross-validation technique [45]. Each dataset was divided into K folds of approximately equal size. For each iteration i (i = 1 to K), fold i was held out as the test set. The remaining K − 1 folds constituted the training set on which the feature selection algorithm (BSSA or a comparator) was applied to select the optimal feature subset. The classifier (SVM) was then used to evaluate the selected best feature subset on the test set (fold i) using only those selected features. This process was repeated K times, with each fold serving as the test set exactly once. The average performance over all K test folds provides the estimate of the algorithm’s generalization ability on unseen data.
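The fold construction described above can be sketched with the Python standard library as follows. Shuffling the sample indices before splitting is an assumption (the text does not say how samples are assigned to folds), and `k_fold_splits` is a hypothetical helper, not part of the authors' MATLAB code.

```python
import random


def k_fold_splits(n_samples, k, rng):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)  # assumption: samples are shuffled before splitting
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Example: the PRO dataset has 102 samples; K = 5 as in the experiments.
splits = list(k_fold_splits(102, 5, random.Random(0)))
fold_sizes = [len(test) for _, test in splits]
```

In a full experiment, feature selection would be run on each `train` index set and the resulting subset scored on the matching `test` set, so that every sample is used for testing exactly once.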

We employed a wrapper feature selection approach based on the SVM classifier using K = 5-fold cross-validation for all datasets. During training, the position of each search agent represents a candidate subset of features. To evaluate the proposed feature selection algorithm, it is compared with existing optimization algorithms, namely the binary particle swarm optimization algorithm (BPSO) [33], binary grey wolf optimization algorithm (BGWOA) [36], binary cuckoo search algorithm (BCSA) [37], and binary whale optimization algorithm (BWOA) [38]. Parameter settings can significantly affect the performance of an algorithm, and in practical applications parameter tuning requires extensive experiments to explore this impact [46]. In this study, each algorithm was run 20 times independently, with a maximum of 100 iterations in each run. To ensure reproducibility, the rng function in MATLAB was used to set random seeds before each run. The parameter settings for BSSA and each comparative algorithm are provided in Table 2. All experiments were conducted in MATLAB R2017b on macOS 11.4 with a 6-core Intel Core i7 (3.2 GHz) (Intel Corporation, Santa Clara, CA, USA) and 32 GB of unified memory. All algorithms used canonical implementations in MATLAB R2017b.

Table 1. List of datasets.

No.	Dataset	Features	Samples	Classes
1	BreastEW	30	569	2
2	Clean1	166	476	2
3	forest	27	198	8
4	KrvskpEW	36	3196	2
5	WaveformEW	40	5000	3
6	glass	9	214	6
7	dermatology	33	366	6
8	lung-cancer	55	366	3
9	Z-Alizideh	49	140	2
10	sonarEW	60	208	2
11	LUNG2	3312	203	2
12	PRO	6033	102	2

Table 2. Initial values of algorithm control parameters.

Algorithm	Parameter	Value(s)
All algorithms	Population size	40
All algorithms	Number of iterations	100
BPSO	Learning factors c1 and c2	c1 = c2 = 2
	Constriction factor k	0.729
	Inertial factor	Dynamic
	Acceleration constants in PSO	[2,2]
	Inertia w in PSO	[0.9,0.6]
BGWOA	Control parameter (a)	Decreases linearly from 2 to 0
	Collaborative coefficient vector (A)	[−a,a]
	Collaborative coefficient vector (C)	Random value in [0,2]
BCSA	Step size scaling factor ($α$)	$α > 0$
	Probability of being discovered by the host ($p_{a}$)	$p_{a} \in [0, 1]$
BWOA	Number of search agents	8
	Search domain	[0,1]
	α parameter in the fitness function	0.99
	β parameter in the fitness function	0.01
BSSA	Proportion of discoverers	20%
	Proportion of sparrows that detect danger	10%
	Safe threshold	0.8
	Perturbation σ	0.2

5.2. Evaluation Criteria

The following measures are applied to assess the performance of each optimization algorithm on every dataset, where M is the number of the algorithm runs.

Averaged classification accuracy is calculated as specified in Equation (9).

$$\mathrm{AvgACC} = \frac{1}{M} \sum_{j=1}^{M} \frac{1}{N} \sum_{i=1}^{N} \mathrm{equal}(T_{i}, P_{i}) \qquad (9)$$

where N is the number of test samples, $T_{i}$ represents the true class of the ith sample, and $P_{i}$ represents its predicted class, which is obtained using the feature subset selected by an optimization algorithm. The function $\mathrm{equal}(T_{i}, P_{i})$ is 1 if $T_{i}$ equals $P_{i}$ and 0 otherwise.
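
As a small worked illustration of Equation (9), the following Python sketch computes the per-run accuracy and its mean over M runs. The function names and the toy labels are made up for the example and are not taken from the experiments.

```python
def accuracy(true_labels, predicted_labels):
    """Inner term of Equation (9): fraction of test samples with T_i == P_i."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)

def avg_acc(runs):
    """Outer mean of Equation (9); `runs` is a list of (T, P) pairs, one per run."""
    return sum(accuracy(t, p) for t, p in runs) / len(runs)

# Two toy runs (M = 2) on N = 4 test samples; the labels are illustrative only.
runs = [
    ([0, 1, 1, 0], [0, 1, 0, 0]),  # 3 of 4 correct
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # 4 of 4 correct
]
print(avg_acc(runs))  # 0.875
```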

The average number of selected features is computed as specified in Equation (10).

$$\mathrm{AvgFN} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{size}(X_{Gbest}) \qquad (10)$$

where $\mathrm{size}(X_{Gbest})$ denotes the number of features in the best feature subset.

Averaged fitness value is calculated as specified in Equation (11).

$$\mathrm{Avgfitness} = \frac{1}{M} \sum_{i=1}^{M} f_{Gbest}^{i} \qquad (11)$$

where $f_{Gbest}^{i}$ is the best fitness value in the ith run.

Optimal fitness value reflects peak solution quality. It is calculated as specified in Equation (12).

$$\mathrm{Bestfitness} = \min \{ f_{Gbest}^{i},\ i = 1, 2, \dots, M \} \qquad (12)$$

where $f_{Gbest}^{i}$ is the best fitness value, that is, the minimum value of the fitness function obtained in the ith run.

Worst fitness value indicates the robustness of the algorithm. It is calculated as specified in Equation (13).

$$\mathrm{Worstfitness} = \max \{ f_{Gworst}^{i},\ i = 1, 2, \dots, M \} \qquad (13)$$

where $f_{Gworst}^{i}$ is the worst fitness value, that is, the maximum value of the fitness function obtained in the ith run.

Averaged computational time is calculated as specified in Equation (14).

$$\mathrm{AvgCT} = \frac{1}{M} \sum_{i=1}^{M} {CT}_{i} \qquad (14)$$

where ${CT}_{i}$ is the computation time in the ith run.

Standard deviation of the optimal fitness values in M runs is calculated as specified in Equation (15).

$$\mathrm{SD} = \sqrt{\frac{1}{M - 1} \sum_{i=1}^{M} {(f_{Gbest}^{i} - \mathrm{Avgfitness})}^{2}} \qquad (15)$$
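
The fitness statistics in Equations (11), (12), (13) and (15) can be computed together from per-run values. The sketch below is illustrative only: the function name `summarize_fitness` and the three-run toy data are assumptions, and at least two runs are required because Equation (15) divides by M − 1.

```python
import math

def summarize_fitness(best_per_run, worst_per_run):
    """Summary statistics over M runs (lower fitness is better).

    best_per_run[i]  : best (minimum) fitness found in run i
    worst_per_run[i] : worst (maximum) fitness found in run i
    """
    m = len(best_per_run)
    if m < 2:
        raise ValueError("Equation (15) needs at least two runs")
    avg = sum(best_per_run) / m                                           # Equation (11)
    sd = math.sqrt(sum((f - avg) ** 2 for f in best_per_run) / (m - 1))   # Equation (15)
    return {
        "Avgfitness": avg,
        "Bestfitness": min(best_per_run),     # Equation (12)
        "Worstfitness": max(worst_per_run),   # Equation (13)
        "SD": sd,
    }

# Toy values for M = 3 runs; not taken from the reported experiments.
stats = summarize_fitness([0.02, 0.04, 0.03], [0.10, 0.12, 0.09])
print({name: round(value, 4) for name, value in stats.items()})
```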

The Wilcoxon signed rank test is a nonparametric statistical test that is often used to compare two algorithms in terms of their statistical difference [47,48,49].
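
For readers unfamiliar with the test, the following Python sketch shows one common way to compute a two-sided Wilcoxon signed rank test for paired samples using the large-sample normal approximation. It is not the routine used to produce the reported p-values: the function name is hypothetical, no tie or continuity correction is applied, and the paired accuracies in the example are invented.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed rank test (normal approximation).

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped and
    tied absolute differences share their average rank. Without tie or
    continuity corrections, p is only approximate for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1            # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail of the standard normal
    return w, p

# Hypothetical per-run accuracies (%) of two algorithms on one dataset.
algo_a = [97, 98, 96, 99, 97, 98, 95, 99, 98, 97]
algo_b = [90, 92, 93, 91, 95, 89, 94, 90, 96, 93]
w, p = wilcoxon_signed_rank(algo_a, algo_b)
print(w, p < 0.05)
```

A p-value below the chosen significance level (0.05 in this study) is read as a significant difference between the two paired samples.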

5.3. Experimental Results

Tables 3–10 show the experimental results of running the proposed BSSA algorithm and four other metaheuristic feature selection methods (BPSO, BGWOA, BCSA, and BWOA) on the 12 datasets for 100 iterations. The best result for each dataset is highlighted in bold. In addition, the Win/Loss/Tie (W/L/T) notation gives the number of datasets on which BSSA performs better than, worse than, or equal to each of the other four algorithms.
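
The W/L/T count can be reproduced directly from two columns of a results table. The sketch below uses the BSSA and BGWOA accuracy columns of Table 3; the helper name `win_loss_tie` is hypothetical, and the comparison assumes that higher values are better (for fitness or time the comparisons would be reversed).

```python
def win_loss_tie(proposed, other):
    """Count datasets where `proposed` is better, worse, or equal (higher is better)."""
    wins = sum(p > o for p, o in zip(proposed, other))
    losses = sum(p < o for p, o in zip(proposed, other))
    ties = len(proposed) - wins - losses
    return wins, losses, ties

# AvgACC columns copied from Table 3, in dataset order 1..12.
bssa = [97.97, 89.92, 96.98, 98.72, 99.32, 98.60,
        95.61, 89.91, 97.70, 98.99, 97.05, 95.80]
bgwoa = [78.81, 83.07, 98.99, 81.78, 76.50, 78.21,
         98.36, 80.95, 78.19, 94.20, 79.89, 60.02]
print(win_loss_tie(bssa, bgwoa))  # (10, 2, 0)
```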

5.3.1. Classification Performance and Statistical Significance

Table 3 compares all algorithms in terms of averaged classification accuracy (Equation (9)) and W/L/T. BSSA achieves the highest classification accuracy on 10 datasets, while BGWOA achieves the highest accuracy on the remaining two (forest and dermatology). BSSA also attains the highest average classification accuracy across the 12 datasets (96.11%), exceeding the second-ranked BWOA (83.99%) by 12.12 percentage points (about 14% in relative terms). These results indicate that BSSA explores the search space effectively and finds feature subsets that yield high classification accuracy. Notably, on the high-dimensional LUNG2 dataset (D = 3312), BSSA retains 11.8 features on average (Table 9) while achieving 97.05% accuracy (Table 3), which shows that it can discard redundant features without sacrificing discriminative power. A likely explanation is that the sigmoid-Gaussian hybrid binarization (Equations (6) and (7)) enhances exploration in sparse subspaces and thereby avoids overfitting to the noisy features that are prevalent in high dimensions.

Tables 4, 5 and 6 report the average (Equation (11)), best (Equation (12)), and worst (Equation (13)) fitness values obtained by all algorithms for each dataset. Averaged over the 12 datasets, BSSA has the lowest average, best, and worst fitness values. By average fitness (Table 4), BSSA is followed by BWOA, BCSA, BPSO, and BGWOA, in that order; BGWOA's average is dominated by its value on forest (24.16). The first two positions (BSSA, then BWOA) match the ranking by average classification accuracy in Table 3.

From Table 4, we can see that BSSA achieves the lowest average fitness (0.07) across datasets, indicating a good balance between feature reduction and classification performance. Its average fitness is 56% lower than that of BWOA (0.16) and 77% lower than that of BPSO (0.31). The exception is KrvskpEW, where BWOA (0.07) slightly outperforms BSSA (0.08), possibly due to dataset-specific characteristics; on Z-Alizideh, BSSA ties with BPSO (0.10). The consistently low fitness values support the efficacy of the dual binarization mechanism in maintaining solution quality throughout the iterations.

As shown in Table 5, BSSA achieves the lowest best fitness value on all 12 datasets, with particularly large margins on some of them: 96% lower than BWOA on WaveformEW (0.01 vs. 0.27) and 78% lower on Z-Alizideh (0.05 vs. 0.23). This indicates BSSA's ability to locate near-optimal solutions within the search space. The Gaussian perturbation mechanism appears important for escaping local optima: BSSA's average best fitness is 0.04, which is 71% lower than BWOA's 0.14.

Table 6 shows that BSSA has the lowest average worst-case fitness, 40% lower than BWOA (0.12 vs. 0.20) and 67% lower than BPSO (0.12 vs. 0.36). Its poorest value occurs on forest (0.27), the only dataset on which a competitor (BWOA, 0.06) has a lower worst-case fitness; on WaveformEW, BSSA ties with BGWOA (0.23). This robustness is attributed to the competitive selection between original and perturbed solutions, which limits severe performance degradation. Across datasets, BSSA's worst-case fitness ranges from 0.01 to 0.27.
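
The relative differences quoted for fitness values follow from a simple ratio, sketched below for the Table 4 averages. The helper name `percent_lower` is an assumption introduced for this example. For non-negative fitness values, a reduction computed this way cannot exceed 100%.

```python
def percent_lower(value, reference):
    """Relative reduction of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference

# Average fitness values from Table 4: BSSA 0.07, BWOA 0.16, BPSO 0.31.
print(round(percent_lower(0.07, 0.16)))  # 56
print(round(percent_lower(0.07, 0.31)))  # 77
```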

Table 4. Comparison between BSSA and other algorithms for average fitness value (Avgfitness).

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	0.11	0.23	0.20	0.18	0.09
2	Clean1	0.33	0.08	0.44	0.06	0.01
3	forest	0.68	24.16	0.32	0.06	0.04
4	KrvskpEW	0.16	0.43	0.12	0.07	0.08
5	WaveformEW	0.22	0.21	0.30	0.29	0.16
6	Glass	0.73	0.24	0.16	0.26	0.05
7	dermatology	0.47	0.03	0.32	0.03	0.01
8	lung-cancer	0.55	0.23	0.38	0.33	0.04
9	Z-Alizideh	0.10	0.23	0.29	0.24	0.10
10	sonarEW	0.06	0.08	0.11	0.08	0.03
11	LUNG2	0.21	0.24	0.23	0.24	0.19
12	PRO	0.08	0.28	0.25	0.04	0.03
	average	0.31	2.20	0.26	0.16	0.07

Table 5. Comparison between BSSA and other algorithms for best fitness value (Bestfitness).

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	0.11	0.18	0.19	0.12	0.02
2	Clean1	0.32	0.07	0.41	0.05	0.00
3	forest	0.61	23.80	0.23	0.04	0.01
4	KrvskpEW	0.03	0.09	0.08	0.05	0.02
5	WaveformEW	0.21	0.21	0.29	0.27	0.01
6	Glass	0.69	0.05	0.11	0.21	0.03
7	dermatology	0.41	0.02	0.29	0.02	0.01
8	lung-cancer	0.52	0.21	0.31	0.32	0.07
9	Z-Alizideh	0.18	0.18	0.22	0.23	0.05
10	sonarEW	0.05	0.06	0.10	0.07	0.02
11	LUNG2	0.20	0.21	0.21	0.21	0.17
12	PRO	0.06	0.24	0.22	0.03	0.02
	average	0.28	2.11	0.22	0.14	0.04

To evaluate the stability of BSSA, the standard deviation of the results over the independent runs (100 iterations each) was calculated using Equation (15); the values are listed in Table 7. A low standard deviation signifies stable performance. Table 7 indicates that BSSA has the smallest standard deviation on 8 of the 12 datasets and the lowest average (0.013), which is 55–70% lower than the averages of the competitors and 0.016 below the next-lowest average (BCSA, 0.029). This small fluctuation suggests that Gaussian noise (σ = 0.2) maintains population diversity without compromising convergence. The higher SD on WaveformEW (0.011 vs. BWOA's 0.003) suggests potential sensitivity to temporal patterns, while the very low SD on lung-cancer (0.001) indicates stable behavior on clinical data.

Table 6. Comparison between BSSA and other algorithms for worst fitness value (Worstfitness).

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	0.15	0.36	0.24	0.19	0.13
2	Clean1	0.38	0.12	0.45	0.07	0.06
3	forest	0.71	24.21	0.34	0.06	0.27
4	KrvskpEW	0.23	0.60	0.20	0.12	0.09
5	WaveformEW	0.24	0.23	0.31	0.31	0.23
6	Glass	0.77	0.37	0.17	0.30	0.07
7	dermatology	0.48	0.05	0.33	0.31	0.01
8	lung-cancer	0.59	0.50	0.40	0.35	0.07
9	Z-Alizideh	0.22	0.30	0.30	0.27	0.19
10	sonarEW	0.26	0.15	0.13	0.12	0.09
11	LUNG2	0.23	0.23	0.23	0.27	0.20
12	PRO	0.10	0.32	0.31	0.06	0.03
	average	0.36	2.29	0.28	0.20	0.12

Table 7. Standard deviation (SD) comparison between BSSA and other algorithms.

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	0.011	0.013	0.008	0.012	0.006
2	Clean1	0.01	0.005	0.005	0.032	0.003
3	forest	0.021	0.016	0.030	0.025	0.010
4	KrvskpEW	0.030	0.061	0.035	0.040	0.020
5	WaveformEW	0.220	0.058	0.041	0.003	0.011
6	Glass	0.003	0.006	0.004	0.023	0.004
7	dermatology	0.077	0.093	0.052	0.063	0.056
8	lung-cancer	0.003	0.002	0.004	0.052	0.001
9	Z-Alizideh	0.018	0.029	0.022	0.020	0.016
10	sonarEW	0.009	0.009	0.047	0.003	0.005
11	LUNG2	0.089	0.048	0.034	0.067	0.012
12	PRO	0.030	0.059	0.065	0.037	0.013
	average	0.043	0.033	0.029	0.031	0.013

Furthermore, because the algorithms are stochastic, statistical tests are needed to check that the observed differences are not due to chance. The Wilcoxon signed rank test is used to determine whether BSSA differs significantly from the other metaheuristic algorithms. Table 8 presents the p-values of the test, calculated from the mean accuracy results of BSSA and the other algorithms on the 12 datasets at a 5% significance level. A p-value below 0.05 indicates a significant difference between two algorithms. Taken together, Table 3 and Table 8 show that BSSA performs significantly better than BPSO, BGWOA, BCSA, and BWOA on 10, 10, 11, and 8 datasets, respectively. These results support the suitability of BSSA for feature selection.

Table 8. The p-values of the Wilcoxon test of the proposed BSSA vs. other algorithms (p < 0.05 is significant and denoted in bold).

No.	Dataset	BPSO	BGWOA	BCSA	BWOA
1	BreastEW	0.00593	0.000561	2.81 × 10⁻⁶	3.32 × 10⁻⁶
2	Clean1	1.82 × 10⁻¹⁰	0.000985	5.88 × 10⁻⁸	8.64 × 10⁻⁹
3	forest	0.0781	0.000515	0.00119	0.0000344
4	KrvskpEW	7.95 × 10⁻⁷	0.0431	0.000821	0.0796
5	WaveformEW	0.0431	0.04312	0.000655	0.0431
6	Glass	0.0171	0.00314	0.140	0.000801
7	dermatology	0.138	0.0431	0.0356	0.0461
8	lung-cancer	0.00377	0.000653	0.000647	0.231
9	Z-Alizideh	0.000431	0.000801	0.000650	0.00377
10	sonarEW	0.00356	0.00314	0.00759	0.0431
11	LUNG2	0.000650	0.000655	0.000803	0.0146
12	PRO	0.0109	0.00687	0.000623	0.114

5.3.2. Feature Selection Efficiency

Table 9 reports the average number of selected attributes (Equation (10)) for BSSA and the four alternative algorithms. The average selection size differs considerably among the algorithms. BSSA selects the fewest attributes on 10 datasets and ties with BCSA on PRO (10.8); the exception is Glass, where BWOA (3) and BGWOA (5.40) select fewer attributes than BSSA (5.60). For example, on Clean1 (D = 166), BSSA retains 66.1 features versus 77.24 for BPSO, indicating more effective redundancy elimination. Combined with the accuracy results in Table 3, these findings suggest that BSSA selects more effective features for the classification task.

Table 9. Comparison between BSSA and other algorithms for the number of selected features (AvgFN).

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	16	13.6	12	15	10
2	Clean1	77.24	97.2	94	67	66.10
3	forest	21.79	11	14.6	16	6.40
4	KrvskpEW	20.8	10.94	30.80	27.60	10.20
5	WaveformEW	22.7	32.54	34.40	36.40	18.67
6	Glass	30.27	5.40	8.30	3	5.60
7	dermatology	32.70	16.6	31.23	28	12.53
8	lung-cancer	21.13	25	29	29	10.67
9	Z-Alizideh	19.03	27	22.8	31	8.40
10	sonarEW	21.59	30.4	32.9	23	19.67
11	LUNG2	12.4	13.4	13.8	14.6	11.8
12	PRO	11.6	13.6	10.8	11	10.8
	average	25.6	24.72	27.89	25.13	15.9

5.3.3. Convergence

Figure 2 and Figure 3 show the convergence curves of the fitness values for each algorithm on the 12 datasets, with each iteration's value representing the average of 30 fitness values. BSSA converges fastest on the BreastEW, Clean1, KrvskpEW, lung-cancer, and sonarEW datasets, while BGWOA converges fastest on the forest, Glass, dermatology, Z-Alizideh, and PRO datasets. On the two remaining datasets (WaveformEW and LUNG2), BSSA has the second-fastest convergence. Among the five algorithms, BWOA has the slowest convergence rate. The observed convergence rates suggest that BSSA's search dynamics are efficient across datasets of varying dimensionality: it converges rapidly on low-dimensional datasets (e.g., BreastEW) and on some high-dimensional ones. In contrast, BWOA converges more slowly on certain datasets (e.g., Clean1), potentially indicating difficulty in navigating complex search spaces efficiently. Convergence behavior is also linked to solution quality, since stagnation at a high fitness value implies a suboptimal feature subset.

5.3.4. Computational Efficiency

Table 10 compares the average computation time (Equation (14)) of BSSA and the other optimization algorithms. For fairness, all algorithms were run for the same number of iterations. BSSA has the shortest computation time on five datasets and BWOA on five others, while BGWOA is fastest on BreastEW and BCSA on PRO. Nevertheless, the average computation time of BSSA over the 12 datasets is the smallest (3.35 s, versus 11.17 s for BWOA, the next fastest). Thus, on average, BSSA is computationally faster than the other four algorithms.

We selected datasets from the UCI repository to analyze the effect of the position binarization-based sparrow search algorithm on feature selection. Based on the evaluation indicators above, BSSA is an effective wrapper-based feature selection method suitable for a wide range of classification problems, and it achieves better overall results than the other feature selection methods considered.

Table 10. Comparison between BSSA and other algorithms for computational times (AvgCT, in seconds).

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	1.19	0.46	8.12	1.59	1.11
2	Clean1	2.91	2.58	7.98	1.93	1.72
3	forest	21.79	24.16	15.41	1.81	1.13
4	KrvskpEW	19.61	48.97	15.89	13.03	1.30
5	WaveformEW	33.53	305.29	43.72	86.64	1.12
6	Glass	30.27	1.46	19.98	0.55	1.20
7	dermatology	33.70	1.44	16.33	0.56	1.21
8	lung-cancer	21.13	1.50	10.19	0.49	1.06
9	Z-Alizideh	1.564	2.48	19.71	0.06	1.22
10	sonarEW	1.09	4.46	15.43	0.05	4.11
11	LUNG2	18.09	11.73	16.36	14.36	11.05
12	PRO	16.17	9.78	7.61	13.02	13.92
	average	16.75	34.53	16.39	11.17	3.35

The collective results demonstrate that BSSA’s innovative binarization approach (sigmoid mapping + Gaussian perturbation) simultaneously optimizes three critical objectives: (1) maximizing accuracy through enhanced exploration (96.11% avg), (2) minimizing feature count via aggressive yet intelligent pruning (15.9 avg features), and (3) maintaining computational efficiency (3.35 s avg runtime). The statistical validation and cross-dataset consistency confirm its robustness as a feature selection framework.
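
To make the binarization idea concrete, the following Python sketch shows one plausible reading of the mechanism summarized above. It is an illustration under stated assumptions, not the authors' implementation: the transfer function is taken to be the standard sigmoid, the perturbed version is assumed to add zero-mean Gaussian noise with standard deviation σ to the sigmoid output, the threshold rd is assumed to be drawn uniformly from [0, 1), and the competitive selection is assumed to keep whichever binary vector has the lower fitness. All function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function S(x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def binarize(position, sigma=0.0, rng=random):
    """Map a continuous position to bits; bit = 1 means the feature is selected."""
    bits = []
    for x in position:
        s = sigmoid(x)
        if sigma > 0:
            s += rng.gauss(0.0, sigma)   # assumed form of the perturbed sigmoid S'(x)
        rd = rng.random()                # assumed: uniform random threshold in [0, 1)
        bits.append(1 if s > rd else 0)
    return bits

def binarize_with_competition(position, fitness, sigma=0.2, rng=random):
    """Keep the better of the plain and perturbed binary solutions (lower fitness wins)."""
    plain = binarize(position, 0.0, rng)
    perturbed = binarize(position, sigma, rng)
    return min(plain, perturbed, key=fitness)

# Toy usage: the fitness simply counts selected features (illustrative only).
rng = random.Random(42)
solution = binarize_with_competition([0.5, -1.2, 3.0, 0.0], fitness=sum, rng=rng)
print(solution)
```

With σ = 0 the procedure reduces to plain sigmoid binarization, which corresponds to the σ = 0 setting in the sensitivity analysis.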

5.4. Sensitivity Analysis of Parameters

To investigate the effects of two key parameters on the classification accuracy of the proposed BSSA, namely the weight of the classification error rate α in the fitness function and the Gaussian-distributed stochastic perturbation σ, a series of experiments was conducted on seven datasets: two small datasets (Glass and forest), three medium datasets (sonarEW, Z-Alizideh, and Clean1), and two large datasets (LUNG2 and PRO). Each dataset was partitioned using 5-fold cross-validation. The parameter α was varied across ten values (0.90, 0.91, 0.92, …, 0.98, and 0.99), while σ was tested at six values: 0, 0.1, 0.2, 0.3, 0.4, and 0.5. The results of the sensitivity analysis are reported below.

Figure 4 and Figure 5, along with Tables S1 and S2, show the performance of BSSA in terms of classification accuracy, as the weight of the classification error rate α and the Gaussian-distributed stochastic perturbation σ are varied across selected datasets, respectively. As indicated in Figure 4 and Table S1, there is a positive correlation between the value of α and the classification accuracies obtained, with the optimal average performance across all datasets occurring at α = 0.99. Consequently, further evaluations of σ were conducted at this optimal value of α, as depicted in Figure 5 and Table S2. The results indicate that a σ value of 0.2 yields superior performance relative to the other tested values across the majority of datasets.
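
The two-stage sweep described above (first α, then σ at the best α) can be sketched as follows. The weighted fitness used here, α · error + β · |S| / D with β = 1 − α, is an assumption based on the common wrapper formulation and on the α = 0.99 and β = 0.01 entries in Table 2; the `sweep` helper and the toy `evaluate` callback are hypothetical stand-ins for a full BSSA run with 5-fold cross-validation.

```python
def fitness(error_rate, n_selected, n_features, alpha=0.99):
    """Assumed weighted fitness: alpha * error + (1 - alpha) * |S| / D (lower is better)."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_features

# Parameter grids as listed in the text.
alphas = [round(0.90 + 0.01 * i, 2) for i in range(10)]   # 0.90, 0.91, ..., 0.99
sigmas = [round(0.1 * i, 1) for i in range(6)]            # 0.0, 0.1, ..., 0.5

def sweep(values, evaluate):
    """Return (best_value, scores); `evaluate` maps a setting to mean accuracy."""
    scores = {v: evaluate(v) for v in values}
    return max(scores, key=scores.get), scores

# Toy evaluation peaking at sigma = 0.2; a real study would run BSSA here.
best_sigma, _ = sweep(sigmas, lambda s: 1.0 - (s - 0.2) ** 2)
print(best_sigma)  # 0.2
```

A larger α puts more weight on the classification error and less on subset size, which is one way to read the positive relationship between α and accuracy reported above.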

6. Conclusions

This study applies the sigmoid transfer function to binarize the sparrow search algorithm and proposes a position binarization-based sparrow search algorithm (BSSA) to address the feature selection optimization problem. The Support Vector Machine (SVM) classifier is used to evaluate the quality of the feature subsets selected by each feature selection algorithm. Four high-performance optimization algorithms are compared with BSSA using a set of evaluation criteria on 12 UCI datasets. The experimental results indicate that the proposed BSSA searches the feature space more efficiently. Compared with the other algorithms, BSSA obtains better average results on the evaluation criteria considered and converges towards the optimal solution. BSSA also converges faster on low-dimensional datasets than on high-dimensional datasets.

For future research, more thorough analysis, such as adaptive parameter tuning and comparisons with emerging methods [15,41], can be conducted on the position binarization-based sparrow optimization algorithm, which can be applied to a wider variety of classifiers to address practical issues. Although BSSA shows promise for static feature selection, its online adaptation to temporal drift (e.g., via sliding-window reinitialization) remains unexplored. Future work will integrate drift detection modules to activate vigilance resets, enabling lifelong learning in IoT sensor networks. Additionally, the algorithm can be integrated with other swarm intelligence algorithms to broaden its research applications. Further extension of the position binarization-based Sparrow Search Algorithm can be explored to address more practical problems.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/electronics14163312/s1, Table S1: Effects of different weight of classification error rate on classification accuracy. This table is associated with the analyses presented in Figure 4; Table S2: Effects of different Gaussian-distributed stochastic perturbation on classification accuracy. This table is associated with the results shown in Figure 5.

Author Contributions

Conceptualization: J.H., S.W. and J.Z.; Methodology: J.H.; Software: X.G. and D.S.; Validation: D.S., S.W., J.Z. and X.G.; Formal Analysis: J.H.; Investigation: J.H. and D.S.; Resources: D.S.; Data Curation: S.W.; Writing—Original Draft Preparation: J.H.; Writing—Review and Editing: S.W.; Visualization: S.W.; Supervision: S.W. and J.Z.; Project Administration: S.W. and J.Z.; Funding Acquisition: J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Municipal Government of Quzhou: Grant Nos. 2023D015, 2023D007, 2023D033 and 2024D058; Tianjin Science and Technology Program Projects: Grant Nos. 24YDTPJC00630, 15JCYBJC46600, and 19JCZDJC35100; Tianjin Municipal Education Commission Research Program Project: Grant No. 2022KJ012.

Data Availability Statement

The datasets used during the current study are available online: http://archive.ics.uci.edu/ml/index.php (accessed on 22 May 2021).

Acknowledgments

The authors wish to acknowledge the support of all team members and institutions involved in this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Khine, A.H.; Wettayaprasit, W.; Duangsuwan, J. A new word embedding model integrated with medical knowledge for deep learning-based sentiment classification. Artif. Intell. Med. 2024, 148, 102758.
2. Sun, Y.; Zhu, P. Online group streaming feature selection based on fuzzy neighborhood granular ball rough sets. Expert Syst. Appl. 2024, 249, 123778.
3. Wang, S.Q.; Wei, J.M. Feature selection based on measurement of ability to classify subproblems. Neurocomputing 2017, 224, 155–165.
4. Li, J.; Luo, T.; Zhang, B.; Chen, M.; Zhou, J. An efficient multi-objective filter–wrapper hybrid approach for high-dimensional feature selection. J. King Saud. Univ-Com. 2024, 36, 2024.
5. Gaugel, S.; Reichert, M. Data-driven multi-objective optimization of hydraulic pump test cycles via wrapper feature selection. Cirp. J. Manuf. Sci. Tec. 2024, 50, 14–25.
6. Oluwaseun, P.; Keng, H. Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification. Comput. Mod. Eng. Sci. 2024, 141, 1847–1865.
7. Yue, J.; Zhao, J.; Feng, L.; Zhao, C. A survey and experimental study for embedding-aware generative models: Features, models, and any-shot scenarios. J. Process Contr. 2024, 143, 103297.
8. An, H.; Yang, J.; Zhang, X.; Ruan, X.; Wu, Y.; Li, S.; Hu, J. A class-incremental learning approach for learning feature-compatible embeddings. Neural Netw. 2024, 180, 106685.
9. Sarkar, S.S.; Sheikh, K.H.; Mahanty, A.; Mali, K.; Ghosh, A.; Sarkar, R. A harmony search-based wrapper-filter feature selection approach for microstructural image classification. Integr. Mater. Manuf. Innov. 2021, 10, 1–19.
10. Shokooh, T.; Mohammad, H. A binary metaheuristic algorithm for wrapper feature selection. INT J. Comput. Sci. Eng. 2021, 8, 168–172.
11. Sancho, S.S. Modern meta-heuristics based on nonlinear physics processes: A review of models and design procedures. Phys. Rep. 2016, 655, 1–70.
12. Laith, A. Multi-verse optimizer algorithm: A comprehensive survey of its results, variants, and applications. Neural Comput. Appl. 2020, 32, 12381–12401.
13. Bolaji, A.L.A.; Al-Betar, M.A.; Awadallah, M.A.; Khader, A.T.; Abualigah, L.M. A comprehensive review: Krill herd algorithm and its applications. Appl. Soft Comput. 2016, 49, 437–446.
14. Akinola, O.O.; Ezugwu, A.E.; Agushaka, J.O.; Zitar, R.A.; Abualigah, L. Multiclass feature selection with metaheuristic optimization algorithms: A review. Neural. Comput. Applic. 2022, 34, 19751–19790.
15. Wang, Z.; Liu, Y.; Zhang, H. A Novel Binary Dragonfly Algorithm for Feature Selection. Pattern Recogn. Lett. 2023, 171, 276–283.
16. Zhang, Y.; Wang, X.; Liu, J. A novel multi-objective feature selection algorithm based on improved sparrow search and chaotic local search. Knowl.-Based Syst. 2023, 269, 110694.
17. Kumar, R.; Rai, B.; Samui, P. Prediction of mechanical properties of high-performance concrete and ultrahigh-performance concrete using soft computing techniques: A critical review. Struct. Concr. 2025, 26, 1309–1337.
18. Goldberg, D.E. Genetic Algorithm in Search, Optimization and Machine Learning; Addison-Wesley: London, UK, 1989; pp. 122–129.
19. Xue, Y.; Zhu, H.K.; Ferrante, N. A self-adaptive multi-objective feature selection approach for classification problems. Integr. Comput. Aid Eng. 2022, 29, 3–21.
20. Zheng, R.; Liu, M.Q.; Zhang, Y.; Wang, Y.L. An optimization method based on improved ant colony algorithm for complex product change propagation path. Intell. Syst. Appl. 2024, 23, 200412.
21. Cai, Q.; Zhou, X.; Jie, A.; Zhong, M.; Wang, M.; Wang, H.; Peng, H.; Gao, X.; Zhang, Y.; Wang, Y. Enhancing Artificial Bee Colony Algorithm with Dynamic Best Neighbor-guided Search Strategy. Proceedings of 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8.
22. Bangyal, W.H.; Nisar, K.; Soomro, T.R.; Ibrahim, A.A.; Mallah, G.A.; Hassan, N.U.; Rehman, N.U. An improved particle swarm optimization algorithm for data classification. Appl. Sci. 2023, 13, 283.
23. Ileberi, E.; Sun, Y. Machine Learning-Assisted Cervical Cancer Prediction Using Particle Swarm Optimization for Improved Feature Selection and Prediction. IEEE Access 2024, 12, 152684–152695.
24. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Contr. Eng. 2020, 8, 22–34.
25. Zhang, C.; Ding, S. A stochastic configuration network based on chaotic sparrow search algorithm. Knowl.-Based Syst. 2021, 220, 106924.
26. Li, B.; Wang, H. Multi-objective sparrow search algorithm: A novel algorithm for solving complex multi-objective optimisation problems. Expert Syst. Appl. 2022, 210, 118414.
27. Sun, L.; Si, S.; Ding, W. BSSFS: Binary sparrow search algorithm for feature selection. Int. J. Mach. Learn. Cybern. 2023, 14, 2633–2657.
28. Tang, J.; Liu, G.; Pan, Q. A Review on Representative Swarm Intelligence Algorithms for Solving Optimization Problems: Applications and Trends. IEEE/CAA J. Automatica Sin. 2021, 8, 1627–1643.
29. Ahmed Shaban, A.; Ibrahim, I.M. Swarm intelligence algorithms: A survey of modifications and applications. Int. J. Sci. World 2025, 11, 59–65.
30. Nguyen, B.H.; Xue, B.; Zhang, M. A survey on swarm intelligence approaches to feature selection in data mining. Swarm Evol. Comput. 2020, 54, 100663.
31. Gao, J.; Wang, Z.; Lei, Z.; Wang, R.L.; Wu, Z.; Gao, S. Feature selection with clustering probabilistic particle swarm optimization. Int. J. Mach. Learn. Cybern. 2024, 15, 3599–3617.
32. Xiao, X.; Na, X.; Zu, Z.; Ma, H.; Ren, W. A Novel Binary Particle Swarm Optimization Algorithm for Feature Selection. In Proceedings of the 2024 36th Chinese Control and Decision Conference (CCDC), Xi’an, China, 25–27 May 2024; pp. 4386–4391.
33. Unler, A.; Murat, A. A discrete particle swarm optimization method for feature selection in binary classification problems. Eur. J. Oper. Res. 2010, 206, 528–539.
34. Mirjalili, S.; Hashim, S.Z. BMOA: Binary magnetic optimization algorithm. Int. J. Mach. Learn. Comput. 2012, 2, 204–208.
35. Mirjalili, S.; Mirjalili, S.M.; Yang, X.S. Binary bat algorithm. Neural Comput. Appl. 2014, 25, 663–681.
36. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381.
37. Pereira, L.A.M.; Rodrigues, D.; Almeida, T.N.S.; Ramos, C.C.O.; Papa, J.P. A binary cuckoo search and its application for feature selection. In Cuckoo Search and Firefly Algorithm; Yang, X.S., Ed.; Springer: New York, NY, USA, 2014; pp. 141–154.
38. Hussien, A.G.; Hassanien, A.E.; Houssein, E.H. S-shaped binary whale optimization algorithm for feature selection. In Recent Trends in Signal and Image Processing; Bhattacharyya, S., Mukherjee, A., Bhaumik, H., Eds.; Springer: New York, NY, USA, 2019; pp. 79–87.
39. Liang, Q.; Chen, B.; Wu, H.; Ma, C.; Li, S. A novel modified sparrow search algorithm with application in side lobe level reduction of linear antenna array. Wirel. Commun. Mob. Comput. 2021, 1, 9915420.
40. Chang, D.; Rao, C.; Xiao, X.; Hu, F.; Goh, M. Multiple strategies based Grey Wolf Optimizer for feature selection in performance evaluation of open-ended funds. Swarm Evol. Comput. 2024, 86, 101518.
41. Mi, L.N.; Guo, Y.F.; Zhang, M.; Zhuo, X.J. Stochastic resonance in gene transcriptional regulatory system driven by Gaussian noise and Lévy noise. Chaos Solitons Fract. 2022, 167, 113096.
42. Mandal, A.K.; Nadim, M.; Saha, H.; Sultana, T.; Hossain, M.D.; Huh, E.N. Feature subset selection for high-dimensional, low sampling size data classification using ensemble feature selection with a wrapper-based search. IEEE Access 2024, 12, 62341–62357.
43. Tijjani, S.; Wahab, M.N.A.; Noor, M.H.M. An enhanced particle swarm optimization with position update for optimal feature selection. Expert Syst. Appl. 2024, 247, 123337.
44. Markelle, K.; Rachel, L.; Kolby, N. The UCI Machine Learning Repository. 2021. Available online: https://archive.ics.uci.edu (accessed on 13 August 2025).
45. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009; pp. 96–106.
46. Abdel-Basset, M.; Ding, W.; El-Shahat, D. A hybrid Harris Hawks optimization algorithm with simulated annealing for feature selection. Artif. Intell. Rev. 2021, 54, 593–637.
47. Bagkavos, D.; Patil, P.N. Improving the Wilcoxon signed rank test by a kernel smooth probability integral transformation. Stat. Probabil. Lett. 2021, 171, 109026.
48. Vierra, A.; Razzaq, A.; Andreadis, A. Continuous variable analyses: T-test, Mann–Whitney U, Wilcoxon sign rank. In Handbook for Designing and Conducting Clinical and Translational Research, Translational Surgery; Eltorai, A.E.M., Bakal, J.A., Newell, P.C., Osband, A.J., Eds.; Academic Press: San Diego, CA, USA, 2023; pp. 165–170.
49. Ohyver, M.; Moniaga, J.V.; Sungkawa, I.; Subagyo, B.E.; Chandra, I.A. The Comparison Firebase Realtime Database and MySQL Database Performance using Wilcoxon Signed-Rank Test. Procedia Comput. Sci. 2019, 157, 396–405.

Figure 1. Effects of perturbation on the sigmoid transfer function and the position binarization. (a) Comparison of the original sigmoid function $S(X)$ (red) and the perturbed sigmoid function $S'(X)$ (green); (b) the position binarization effect near the threshold rd with and without perturbation.

Figure 2. The convergence curves of the fitness function values for each algorithm on the first 6 datasets.

Figure 3. The convergence curves of the fitness function values for each algorithm on the last 6 datasets.

Figure 4. Effects of different weights of classification error rate on classification accuracy.

Figure 5. Effects of different Gaussian-distributed stochastic perturbations on classification accuracy.
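The position binarization illustrated in Figure 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the additive form of the zero-mean Gaussian noise (applied to the sigmoid output and clipped to [0, 1]), the noise scale `sigma`, and the uniform draw of the threshold rd are all illustrative choices.

```python
import math
import random


def sigmoid(x):
    """Logistic transfer function S(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng, sigma=0.0):
    """Map a continuous position vector to a 0/1 feature mask.

    sigma == 0 uses the plain sigmoid S(x). sigma > 0 perturbs it with
    zero-mean Gaussian noise (ASSUMPTION: noise is added to the sigmoid
    output and the result is clipped to [0, 1]).
    """
    mask = []
    for x in position:
        s = sigmoid(x)
        if sigma > 0.0:
            s = min(1.0, max(0.0, s + rng.gauss(0.0, sigma)))
        rd = rng.random()  # threshold rd (ASSUMPTION: uniform in [0, 1))
        mask.append(1 if s > rd else 0)  # 1 = feature selected
    return mask


rng = random.Random(0)  # fixed seed so the run is repeatable
position = [-6.0, -0.1, 0.0, 0.1, 6.0]
plain_mask = binarize(position, rng)
noisy_mask = binarize(position, rng, sigma=0.1)
```

Under these assumptions, the noise matters mostly for components whose sigmoid value lies close to rd, which is the region shown in panel (b); components far from the threshold are binarized the same way with or without perturbation.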

Table 3. Average classification accuracy (AvgACC, %) of BSSA and the compared algorithms.

No.	Dataset	BPSO	BGWOA	BCSA	BWOA	BSSA
1	BreastEW	88.87	78.81	80.67	79.49	97.97
2	Clean1	67.19	83.07	56.51	88.42	89.92
3	forest	90.33	98.99	68.37	94.87	96.98
4	KrvskpEW	94.19	81.78	85.71	90.13	98.72
5	WaveformEW	78.91	76.50	73.29	75.38	99.32
6	Glass	70.00	78.21	84.54	73.81	98.60
7	dermatology	77.97	98.36	68.37	97.26	95.61
8	lung-cancer	71.55	80.95	61.82	66.67	89.91
9	Z-Alizideh	90.28	78.19	71.32	76.67	97.70
10	sonarEW	93.57	94.20	89.18	92.68	98.99
11	LUNG2	93.12	79.89	81.84	89.24	97.05
12	PRO	80.33	60.02	64.09	83.23	95.80
	average	83.03	82.41	73.81	83.99	96.11
	W/L/T	12/0/0	10/2/0	12/0/0	12/0/0	-
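As an illustration of how the summary rows of Table 3 can be read, the sketch below recomputes the column average and a win/loss/tie tally for one pair of columns (BGWOA and BSSA), using the values from the table. It assumes that W/L/T counts the datasets on which BSSA's AvgACC is higher than, lower than, or equal to the competitor's; the helper name `win_loss_tie` is hypothetical.

```python
# AvgACC values (%) copied from Table 3, in dataset order 1..12.
bgwoa = [78.81, 83.07, 98.99, 81.78, 76.50, 78.21,
         98.36, 80.95, 78.19, 94.20, 79.89, 60.02]
bssa = [97.97, 89.92, 96.98, 98.72, 99.32, 98.60,
        95.61, 89.91, 97.70, 98.99, 97.05, 95.80]


def win_loss_tie(ours, theirs):
    """Count datasets where `ours` is higher, lower, or equal to `theirs`.

    ASSUMPTION: W/L/T is a plain per-dataset comparison of AvgACC.
    """
    if len(ours) != len(theirs):
        raise ValueError("both columns must cover the same datasets")
    wins = sum(a > b for a, b in zip(ours, theirs))
    losses = sum(a < b for a, b in zip(ours, theirs))
    ties = len(ours) - wins - losses
    return wins, losses, ties


bgwoa_mean = round(sum(bgwoa) / len(bgwoa), 2)
wins, losses, ties = win_loss_tie(bssa, bgwoa)
print(f"BGWOA average: {bgwoa_mean}")
print(f"BSSA vs BGWOA (W/L/T): {wins}/{losses}/{ties}")
```

For this pair of columns the tally is 10/2/0, with the two losses on forest and dermatology, and the BGWOA average is 82.41, both as listed in the table.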

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
