Article

Feature Selection Method Based on Simultaneous Perturbation Stochastic Approximation Technique Evaluated on Cancer Genome Data Classification

by Satya Dev Pasupuleti and Simone A. Ludwig *
Department of Computer Science, North Dakota State University, Fargo, ND 58105, USA
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(10), 622; https://doi.org/10.3390/a18100622
Submission received: 13 August 2025 / Revised: 18 September 2025 / Accepted: 29 September 2025 / Published: 1 October 2025
(This article belongs to the Special Issue Algorithms in Data Classification (3rd Edition))

Abstract

Cancer classification using high-dimensional genomic data presents significant challenges in feature selection, particularly when dealing with datasets containing tens of thousands of features. This study presents a new application of the Simultaneous Perturbation Stochastic Approximation (SPSA) method for feature selection on large-scale cancer datasets, representing the first investigation of the SPSA-based feature selection technique applied to cancer datasets of this magnitude. Our research extends beyond traditional SPSA applications, which have historically been limited to smaller datasets, by evaluating its effectiveness on datasets containing 35,924 to 44,894 features. Building upon established feature-ranking methodologies, we introduce a comprehensive evaluation framework that examines the impact of varying proportions of top-ranked features (5%, 10%, and 15%) on classification performance. This systematic approach enables the identification of optimal feature subsets most relevant to cancer detection across different selection thresholds. The key contributions of this work include the following: (1) the first application of SPSA-based feature selection to large-scale cancer datasets exceeding 35,000 features, (2) an evaluation methodology examining multiple feature proportion thresholds to optimize classification performance, (3) comprehensive experimental validation through comparison with ten state-of-the-art feature selection and classification methods, and (4) statistical significance testing to quantify the improvements achieved by the SPSA approach over benchmark methods. Our experimental evaluation demonstrates the effectiveness of the feature selection and ranking-based SPSA method in handling high-dimensional cancer data, providing insights into optimal feature selection strategies for genomic classification tasks.


1. Introduction

The information technology industry uses the buzzword Big Data for high-dimensional data with a large number of features. Big Data has three characteristics, simply called the 3Vs: volume, velocity, and variety. This means that Big Data has a large volume, a huge variety, and changes rapidly [1]. Data are divided into numerous categories based on size, and datasets of roughly 10 terabytes or more are commonly considered Big Data. Many domains, such as the internet, biomedicine, and astronomy, produce massive data with a great number of features [2]. Scaling large databases is a major issue in Big Data systems, as they usually contain a large amount of redundant and irrelevant data, which consumes computing resources and degrades performance. Thus, it is important to remove the unnecessary features and extract the necessary and valuable ones in order to build good models from such Big Data. Dimensionality reduction lowers the consumption of computing resources and also improves model performance [3].
There are algorithms that reduce the dimensionality of the data and can make the learning model more generalized and compact [4]. Dimensionality reduction is divided into two types: feature extraction and feature selection [5]. Feature extraction aims to convert high-dimensional data into a low-dimensional space, where the features of the low-dimensional data are linear or nonlinear combinations of the original features. Feature selection, in contrast, selects the best feature subsets from the original features using a defined process. Feature extraction is usually said to improve model performance, but it tends to compress and transform the original features, leading to data distortion and affecting the efficiency of data processing [6]. Feature selection, on the other hand, retains the semantic meaning of the original features and thus has better interpretability: the most relevant features are chosen from the original dataset, whereas feature extraction creates new features by transforming the existing ones. Feature selection also reduces the cost of feature collection [7].
Feature selection is divided into three categories: filter, wrapper, and embedded. Filter methods assume that the data are completely independent of the classifier algorithm and form the feature subset according to each feature's measured contribution to the class attribute [8]. Wrapper methods require domain knowledge; a performance metric of the classification algorithm is used to evaluate feature subsets, and based on the results, the method searches for an optimal feature subset [9]. Embedded methods incorporate feature selection into the learning process of the classifier and then search for a feature subset through a functional optimization designed in advance. The embedded technique thus removes features that have only a minor influence on the model outcome and retains only the features that are essential to it [10].
The rest of the paper is organized as follows: Section 2 discusses related work and highlights the shortcomings. Section 3 introduces and describes our proposed feature selection model as well as the comparison models, and also describes different classification models used in this research. Section 4 explains the experiment setup, statistical analysis, and results. Finally, Section 5 concludes the paper.

2. Related Work

Many researchers in the past have carried out experiments using feature selection techniques and applied classification methods to the reduced-dimensional data to improve model performance. The authors in [11] proposed the integration of Gradient Boosting (GB), Random Forest (RF), Logistic Regression with Lasso regularization, Logistic Regression with Ridge regularization, and SVM with a K-Means-clustering-based feature selection method. They applied the proposed model to the Coimbra breast cancer dataset. The authors in [12] developed a gene expression-based cancer classification network. In this network, they used AlexNet-based transfer learning to extract the features, a hybrid fuzzy ranking network to rank and select the features, and finally a multi-kernel Support Vector Machine for multiclass classification on colon, ovarian, and lymphography cancer data.
Traditional sequential selection methods such as Forward Selection, Backward Elimination, and Stepwise Regression represent classical approaches to feature selection [13,14]. However, these methods often struggle with high-dimensional datasets due to computational complexity and local optima issues [15]. Derivative-free optimization methods, including Random Search [16] and Bayesian Optimization [17], have gained attention for their ability to handle the discrete and non-convex optimization landscapes typical of feature selection problems.
The Boruta method [18] represents a notable wrapper-based approach that utilizes Random Forest as a base classifier to identify relevant features. Boruta addresses the feature selection problem by comparing the importance of original features with randomly permuted shadow features, providing statistical significance testing for feature relevance. While Boruta has demonstrated effectiveness in identifying truly relevant features and handling feature interactions [19,20], the method is computationally heavy, particularly when applied to high-dimensional datasets with tens of thousands of features. The computational burden stems from the need to repeatedly train Random Forest models with augmented feature sets, including shadow features, making it less practical for large-scale genomic datasets [21].
Other ensemble-based methods include Recursive Feature Elimination (RFE) with various base classifiers [22], stability selection [23], and bootstrap-based feature ranking [24]. These methods often provide robust feature selection but at increased computational cost.
In [25], the authors integrated Binary Particle Swarm Optimization (BPSO) and Grey Wolf Optimizer (GWO) algorithm for feature selection on the Breast Cancer Wisconsin dataset. Another approach, introducing a guided PSO approach, was presented in [26]. In [27], the authors used the Krill Herd (KH) optimization algorithm to address problems in feature selection methods. They incorporated adaptive genetic operators to enhance the KH algorithm.
A Genetic Algorithm-based feature selection model (GA-FS) is proposed in [28] and was applied to a breast cancer dataset. The authors combined GA-FS with different classification models and compared the Accuracy before and after GA-FS. The authors in [29] proposed a two-stage feature selection method to classify colon cancer. In the filtering phase, they used ReliefF for feature ranking and selected the best gene expression subset from 2000 features; they then applied a Support Vector Machine classifier to classify colon cancer.
The authors in [30] applied Recursive Feature Elimination (RFE) to different classification models to compare their performance with regard to Accuracy, Precision, and F-measure. The authors in [31] used five types of feature selection methods to classify gene expression datasets for ovarian, leukemia, and central nervous system (CNS) cancer and, after discovering the minimal feature sets, applied five classifiers to the data. In [32], the authors proposed the Gradient Boosting Deep Feature Selection (GBDFS) algorithm to reduce the feature dimension of omics data and thus improve classifier Accuracy for gastric cancer subtype classification.
James Spall introduced the Simultaneous Perturbation Stochastic Approximation (SPSA), a pseudo-gradient descent stochastic optimization algorithm, in [33]. Initially, Spall introduced the SPSA method in the control area to tune the large number of neurons of a neural network controller, with applications in a water treatment plant. Early on, SPSA was used in many successful applications in control problems, such as traffic signal control [34] and robot arm control [35].
In [36], the authors adopted Spall's SPSA approach for the first time to perform feature selection for a Nearest Neighbor classifier with the Minkowski distance metric on the Artificial Nose and Golub Gene datasets. Later, in [37], the authors introduced the concept of Binary SPSA (BSPSA), in which the feature selection problem is treated as a stochastic optimization problem with the features represented as binary variables. BSPSA is used for feature selection on both small and large datasets. In [38], the authors proposed an SPSA-based feature selection and feature ranking algorithm that mitigates the slow convergence issue of BSPSA. They compared SPSA with four wrapper methods on eight datasets (the largest containing 2400 features) and further applied classification to the datasets, reporting the mean classification rates of four classifiers.
In [39], the authors also used SPSA with Barzilai and Borwein (BB) non-monotone gains on various public datasets with Nearest Neighbors and Naïve Bayes classifiers as wrappers. They compared the proposed method, as well as the full feature set, against seven popular meta-heuristics-based FS algorithms. SPSA-BB converges to a good feature set in about 50 iterations on average, regardless of the number of features (the largest dataset contains 1000 features). The authors in [40] generated subsets using Simultaneous Perturbation Stochastic Approximation (SPSA), Migrating Birds Optimization, and Simulated Annealing algorithms. The subsets generated by the algorithms were evaluated using correlation-based FS, and the performance of the algorithms was measured using a Decision Tree (C4.5) as the classifier. The computational experiments were conducted on 15 datasets taken from the UCI machine learning repository. The authors concluded that the SPSA algorithm outperforms the other algorithms in terms of Accuracy values, and that all algorithms reduce the number of features by more than 50%.
The authors in [41] present SPFSR, a novel stochastic approximation approach for performing simultaneous k-best feature ranking (FR) and feature selection (FS) based on Simultaneous Perturbation Stochastic Approximation (SPSA) with Barzilai and Borwein (BB) non-monotone gains. The proposed method is performed on 47 public datasets, which contain both classification and regression problems, with the mean Accuracy reported from four different classifiers and four different regressors, respectively. The authors concluded that for over 80% of classification experiments and over 85% of regression experiments, SPFSR provided a statistically significant improvement or equivalent performance compared to existing, well-known FR techniques.
As seen by the related work in the paragraphs above, the SPSA method for feature selection has traditionally been applied to smaller datasets. In this research, we investigate its effectiveness on large-scale datasets used for cancer classification. Our approach builds on prior work, particularly [41], which employed feature ranking; however, we extended this by evaluating the impact of using varying proportions of the top-ranked features (5%, 10%, and 15%). Specifically, we apply feature selection and ranking via the SPSA method to datasets containing over 35,000 features (ranging from 35,924 to 44,894), with the goal of identifying features most relevant to cancer detection. To the best of our knowledge, this is the first study to apply the SPSA-based feature selection technique to such large cancer datasets. We conducted a comprehensive experimental evaluation and analysis, including comparisons with state-of-the-art feature selection and classification methods. Additionally, we assessed whether SPSA yields statistically significant improvements over ten benchmark methods.

3. Proposed Approach and Comparison Methods

In this section, we discuss our proposed feature selection methodology based on the SPSA algorithm. We then describe the other popular feature selection models (RelChaNet, ReliefF, Genetic Algorithm, Mutual Information, Simulated Annealing, and Minimum Redundancy Maximum Relevance) against which we compare our SPSA feature selection method. Further, we explain all the classification models used in this research: Decision Tree, K-Nearest Neighbors, Light Gradient Boosting Machine, Logistic Regression, Support Vector Machine, and Extreme Gradient Boosting.
As illustrated in Figure 1, all ten cancer datasets were first divided into training (80%) and testing (20%) subsets. Feature selection was performed only on the training data using all seven feature selection methods. From each training set, the top 5%, 10%, and 15% of features were selected, resulting in 30 reduced feature subsets across all datasets. These selected feature subsets were then applied to the corresponding test sets. Next, classification models were trained on the reduced training sets and evaluated on the held-out test sets. Performance was assessed using Accuracy, Precision, Recall, F1 Score, and Balanced Accuracy.

3.1. Proposed Methodology

3.1.1. Simultaneous Perturbation Stochastic Approximation (SPSA) Algorithm as Feature Selection (FS) Method

Spall introduced the Simultaneous Perturbation Stochastic Approximation (SPSA) [33], a pseudo-gradient descent stochastic optimization algorithm. The algorithm starts from a random solution vector and gradually moves towards the optimal solution over the iterations; at each iteration, all components of the current solution are perturbed simultaneously by random offsets generated from a specific probability distribution.
Let $L : \mathbb{R}^p \rightarrow \mathbb{R}$ be a real-valued objective function. Gradient descent search starts from an arbitrary initial solution and iteratively moves toward a local minimum of the objective function $L$. At each step, the gradient of the objective function is evaluated, and the algorithm updates the solution in the direction of the negative gradient $-\nabla L$. The process continues until it converges to a local minimum, where the gradient is zero. In the language of machine learning, $L$ can be called the loss function of the minimization problem. This gradient descent method cannot be applied when the loss function and its gradient are unknown. Therefore, stochastic pseudo-gradient descent algorithms such as SPSA are used: they approximate the gradient from noisy measurements of the loss function and do not require an analytical form of the loss function.
At each iteration $k$, SPSA evaluates three noisy measurements of the loss function: $Y_k^{+}$ and $Y_k^{-}$, which are used for the gradient approximation, and $Y(\hat{w}_{k+1})$, which is used to measure the performance of the next iterate $\hat{w}_{k+1}$.
As per [33], the gain sequences used to tune the step size and the perturbation magnitude are shown in Equations (1) and (2):
$$ a_k := \frac{a}{(A + k)^{\alpha}} \qquad (1) $$
$$ c_k := \frac{c}{k^{\gamma}} \qquad (2) $$
where $a$, $A$, $\alpha$, $c$, and $\gamma$ are algorithmic hyperparameters of SPSA. Here, $a$ is the initial scaling constant for the step size, $A$ is the stability constant that shifts the denominator to reduce large updates in early iterations, $\alpha$ is the decay rate of the step size, typically chosen in $(0.5, 1]$ for convergence guarantees, $c$ is the initial perturbation constant controlling the magnitude of the random offsets, and $\gamma$ is the decay factor controlling how quickly the perturbations decrease across iterations.
These parameters are dimensionless and require tuning for the problem at hand. Following the SPSA literature [33,42], we set the initial values via preliminary experiments, and then performed a sensitivity analysis by varying one parameter at a time within a reasonable range while keeping the others fixed. For each configuration, we ran 10 independent trials and reported the mean results.
SPSA does not have an automatic stopping rule; thus, we specify a maximum number of iterations as the stopping criterion. The gain sequence $\{a_k\}$ must be monotonically decreasing and satisfy the condition $\lim_{k \rightarrow \infty} a_k = 0$.
Let us illustrate how the SPSA algorithm is used as a feature selection technique. Assume $X$ is a data matrix of dimension $n \times p$, where $n$ is the number of observations and $p$ the number of features, and $Y$ is a response vector of dimension $n \times 1$; the pair $\{X, Y\}$ constitutes a dataset. Let $X = \{X_1, X_2, \ldots, X_p\}$ denote the feature set, where the $j$th feature is represented by $X_j$. For a non-empty subset $X' \subseteq X$, we define $L_C(X', Y)$ as the true value of the performance criterion of a wrapper classifier $C$ on the dataset. We train the classifier $C$, albeit with $L_C$ unknown, and compute the error rate denoted by $y_C(X', Y)$.
In this study, the wrapper classifier $C$ is implemented as a linear Support Vector Machine (SVM) with a class-weighted loss to handle imbalance across datasets. This choice is motivated by the small-$n$, large-$p$ nature of our datasets ($p \approx$ 36,000–45,000 features vs. $n \approx 600$ samples), where linear models with regularization provide stable estimates and reduce the risk of overfitting. For robustness, we also verified results using logistic regression with an elastic net penalty, which promotes sparsity while maintaining stability under correlated features.
Since the true error $L_C$ is unknown, we instead compute the empirical error rate $y_C(X', Y)$, which can be expressed as $y_C = L_C + \varepsilon$, where $\varepsilon$ represents the noise arising from finite-sample estimation, variability in cross-validation splits, and stochastic elements of training. Thus, $y_C$ serves as a noisy but unbiased estimate of $L_C$, and SPSA leverages this noisy feedback when approximating the gradient. The feature selection problem can therefore be defined as finding the non-empty feature subset $X^*$ determined by Equation (3):
$$ X^* = \underset{X' \subseteq X}{\arg\min} \; y_C(X', Y) \qquad (3) $$
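To make the wrapper-based procedure concrete, the following minimal Python sketch illustrates an SPSA feature-ranking loop with a class-weighted linear SVM as the wrapper classifier. It is an illustrative sketch rather than our exact implementation: the gain constants, the number of iterations, the 5-fold cross-validation error, and the function names (wrapper_error, spsa_feature_ranking) are placeholder choices.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def wrapper_error(X, y, weights, top_k):
    """Noisy loss y_C: cross-validated error of a class-weighted linear SVM
    trained on the top_k features with the largest current weights."""
    selected = np.argsort(weights)[::-1][:top_k]
    clf = LinearSVC(class_weight="balanced", max_iter=5000)
    return 1.0 - cross_val_score(clf, X[:, selected], y, cv=5).mean()

def spsa_feature_ranking(X, y, top_k, n_iter=50, a=0.75, A=10,
                         alpha=0.602, c=0.1, gamma=0.101, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    w = np.full(p, 0.5)                        # initial feature weights in [0, 1]
    for k in range(1, n_iter + 1):
        a_k = a / (A + k) ** alpha             # step size, Eq. (1)
        c_k = c / k ** gamma                   # perturbation size, Eq. (2)
        delta = rng.choice([-1.0, 1.0], p)     # simultaneous random perturbation
        y_plus = wrapper_error(X, y, np.clip(w + c_k * delta, 0, 1), top_k)
        y_minus = wrapper_error(X, y, np.clip(w - c_k * delta, 0, 1), top_k)
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)   # pseudo-gradient estimate
        w = np.clip(w - a_k * g_hat, 0.0, 1.0)
    return np.argsort(w)[::-1]                 # feature ranking by final weight
```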

3.1.2. Barzilai–Borwein (BB) Method

Following the non-monotone methods concept, a non-monotone method retains information provided by previous iterations. One of the first non-monotone search methods, the Barzilai–Borwein (BB) method, is described as a gradient method with a two-point step size [43]. Motivated by Newton's method, the BB method aims to approximate the Hessian matrix instead of computing it directly. It does not force the sequence of objective values to decrease monotonically, and as a result the BB method performs better than classical steepest-descent methods in terms of both performance and computational cost.
Considerable research has been devoted to steepest-descent-type methods such as the BB method and the Cauchy method [44], and convergence analyses have shown that the BB method converges linearly for convex quadratic problems [45,46]. Well-studied BB variants include Cauchy BB and cyclic BB. Cauchy BB, a combination of the BB and Cauchy methods, performs better than the original BB and reduces the computational complexity by half, but it still involves the steepest-descent step, whereas cyclic BB requires an extra procedure to determine an appropriate cycle length. Due to these shortcomings, we use the original BB method with a smoothing effect in our SPSA feature selection (SPSA-FS) algorithm.

3.1.3. Using the BB Method in SPSA-FS

In the SPSA feature selection algorithm, we improve the speed of convergence by adopting a non-monotone BB step-size strategy. Let $\hat{w}_k \in \mathbb{R}^p$ denote the estimated parameter vector, i.e., the feature weights, at iteration $k$, and let $\hat{g}_k = \hat{g}(\hat{w}_k)$ denote the estimated gradient of the objective function at $\hat{w}_k$. The gradient estimates are noisy because of the stochastic nature of the optimization; thus, we apply smoothing to stabilize the updates.
The BB step size at iteration $k$, denoted $\hat{a}_k$, is computed as shown in Equation (4). It approximates the inverse Hessian using the differences between consecutive gradients and consecutive parameter vectors, without computing second derivatives.
$$ \hat{a}_k = \frac{(\hat{w}_k - \hat{w}_{k-1})^{\top} (\hat{g}_k - \hat{g}_{k-1})}{(\hat{g}_k - \hat{g}_{k-1})^{\top} (\hat{g}_k - \hat{g}_{k-1})} \qquad (4) $$
To reduce fluctuations in the step size, we smooth it by averaging over a window of the last $\tau$ iterations, as shown in Equation (5):
$$ \hat{b}_k = \frac{1}{\tau} \sum_{n=k-\tau+1}^{k} \hat{a}_n \qquad (5) $$
where $\hat{b}_k$ is the smoothed step size at iteration $k$.
Likewise, to stabilize the gradient estimates, we average the current gradient with the previous $m$ gradients, as shown in Equation (6), where $\bar{g}_k$ is the smoothed gradient used to update $\hat{w}_k$:
$$ \bar{g}_k = \frac{1}{m+1} \sum_{n=k-m}^{k} \hat{g}_n \qquad (6) $$
By using the smoothed step size and gradient estimates, the SPSA algorithm achieves more stable estimates and converges faster, especially in the cases of optimizing complex or noisy functions.
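As a small illustration of Equations (4)–(6), the helper functions below compute the BB step size and the smoothed step and gradient estimates; the window sizes tau and m are placeholder values, and the history arguments are assumed to be plain Python lists of scalars and NumPy arrays, respectively.

```python
import numpy as np

def bb_step(w_k, w_prev, g_k, g_prev, eps=1e-12):
    """BB step size, Eq. (4): parameter difference projected on the gradient
    difference, divided by the squared norm of the gradient difference."""
    dw, dg = w_k - w_prev, g_k - g_prev
    return float(dw @ dg) / (float(dg @ dg) + eps)

def smoothed_step(step_history, tau=3):
    """Smoothed step size, Eq. (5): mean of the last tau BB step sizes."""
    window = step_history[-tau:]
    return sum(window) / len(window)

def smoothed_gradient(grad_history, m=2):
    """Smoothed gradient, Eq. (6): mean of the current gradient estimate
    and the previous m gradient estimates."""
    window = grad_history[-(m + 1):]
    return np.mean(window, axis=0)
```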
As explained above, the SPSA algorithm is an iterative stochastic optimization algorithm that, regardless of the number of features, approximates the gradients of the objective function with only a few function evaluations per iteration, which gives SPSA scalability, noise tolerance, a global search tendency, and computational efficiency. SPSA works well for high-dimensional feature spaces without exponential cost. It handles noisy evaluation metrics better than deterministic methods. Also, SPSA perturbs multiple features simultaneously, which helps it avoid getting trapped in local optima. Finally, SPSA requires fewer evaluations than methods that compute gradients explicitly or evaluate fitness scores feature by feature. The hyperparameters used for SPSA are described in Table 1.

3.2. Feature Selection Algorithms for Comparison

3.2.1. Neural Network Feature Selection Using Relative Change Scores (RelChaNet)

Network pruning is a technique that identifies less relevant features or neurons and removes them; it has been studied extensively since 1988 [47]. Among recent advances in pruning, the Neural Network Feature Selection Using Relative Change Scores (RelChaNet) method builds upon these foundational concepts by measuring the induced change in network parameters to guide feature selection [48]. The authors introduced a lightweight feature selection algorithm that uses neuron pruning and input-layer regrowth in a dense neural network. Input neurons are pruned based on a gradient-sum metric that measures the relative change induced in the network once a feature enters, while pruned neurons are regrown randomly in the meantime. Figure 2 illustrates the relative change score calculation that is embedded in RelChaNet.
Consider a neural network whose input layer size equals the number of features to be selected ($K$) plus a number of candidate features. Multiple mini-batches are processed, their number determined by the $n_{mb}$ hyperparameter. The first-layer gradients $G_1$ are accumulated in the matrix $S$. In the next step, this sum of gradients is normalized by the $L_1$ norm with regard to each input neuron. Z-standardization is then applied to the resulting vector, which produces the score vector $s$. The candidate scores are then used to update the high scores $h$.
Ultimately, the $K$ features with the highest scores remain in the network, while the other features are redrawn randomly. Before training, the first-layer weights are reinitialized, and the two hyperparameters used in RelChaNet adapt to the characteristics of the dataset fed to the network. RelChaNet overcomes common drawbacks of pruning-based selection by giving candidate features multiple mini-batches to demonstrate their relevance potential in the network and by measuring that relevance as an induced change rather than as absolute weights. The algorithm considers a multi-layer perceptron with a feed-forward architecture, trained with back-propagation using the Adam optimizer.
Let us consider a dataset with $N$ features, $K$ selected features, and $n_{hidden}$ hidden-layer neurons. The hyperparameters of the algorithm are the ratio of candidate features $c_{ratio}$ considered at each iteration and the total number of mini-batches $n_{mb}$. The number of candidate features $K_c$ is initialized with Equation (7).
$$ K_c = \mathrm{round}\left( c_{ratio} \cdot (N - K) \right) \qquad (7) $$
The input layer size is calculated as the number of selected features $K$ plus $K_c$. First, we choose the features randomly to populate the input layer; then, we start training the neural network, which runs for $n_{mb}$ mini-batches. The first-layer gradients are aggregated by addition. These gradient sums are then normalized, which results in a relative change score $s_i$, calculated by Equation (8).
$$ s_i = \sum_{j=1}^{n_{hidden}} S_{ij} \quad \text{for } i \in \{1, \ldots, K + K_c\} \qquad (8) $$
These scores are used to update the high scores $h$ of the candidate features. The features with high scores remain, and new candidate features are then drawn randomly. This cycle is repeated so that features with a high score $h$ accumulate and are compared against the new candidate features added in subsequent iterations. The hyperparameters used for RelChaNet are described in Table 2.
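The score computation of Equations (7) and (8) can be sketched as follows; the L1 normalization and z-standardization details reflect our reading of [48] and should be taken as an approximation rather than the reference implementation.

```python
import numpy as np

def candidate_pool_size(n_features, k_select, c_ratio):
    """K_c, Eq. (7): number of candidate input features drawn each round."""
    return round(c_ratio * (n_features - k_select))

def relative_change_scores(S):
    """Score vector s, Eq. (8): first-layer gradient sums per input neuron,
    L1-normalized and then z-standardized.
    S: array of shape (n_inputs, n_hidden) holding the gradient sums
    accumulated over the n_mb mini-batches."""
    s = S.sum(axis=1)                          # Eq. (8): sum over hidden units
    s = s / (np.abs(s).sum() + 1e-12)          # L1 normalization (our interpretation)
    return (s - s.mean()) / (s.std() + 1e-12)  # z-standardization
```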

3.2.2. ReliefF

ReliefF is a popular feature selection algorithm that is widely used in many industry data applications. It is a filter-based feature selection method that selects the best features via feature weight calculations [49]. Relief was proposed by Kira in a 1992 paper [50], and the proposed algorithm is limited to two-class classification problems. The Relief algorithm assigns different weights to the data features based on the correlation between the classes and the features. Features whose weights exceed a set threshold are selected as important features.
The main limitation of the Relief algorithm is that it only handles two-class classification problems. In 1994, Kononenko proposed the ReliefF algorithm, which is an extension of the Relief algorithm [51]. ReliefF can handle multiclass classification problems.
Let us assume a training dataset with class labels $C = \{c_1, c_2, \ldots, c_l\}$. A sample $R_i$ is selected randomly from this training dataset; ReliefF then searches for its $k$ nearest neighbors from the same class, called hits $H_j$ ($j = 1, 2, \ldots, k$), and its $k$ nearest neighbors from each different class $c$, called misses $M_j(c)$ ($j = 1, 2, \ldots, k$). This procedure is repeated $m$ times. The weight of feature $A$ is updated using Equation (9):
$$ W(A) = W(A) - \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, H_j)}{m \times k} + \sum_{c \neq \mathrm{class}(R_i)} \frac{p(c)}{1 - p(\mathrm{class}(R_i))} \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, M_j(c))}{m \times k} \qquad (9) $$
where $m$ is the total number of iterations and $\mathrm{diff}(A, R_1, R_2)$ denotes the difference between samples $R_1$ and $R_2$ in feature $A$. This difference is calculated using Equation (10).
$$ \mathrm{diff}(A, R_1, R_2) = \begin{cases} \dfrac{|R_{1,A} - R_{2,A}|}{\max(A) - \min(A)}, & \text{if } A \text{ is continuous} \\ 0, & \text{if } A \text{ is discrete and } R_{1,A} = R_{2,A} \\ 1, & \text{if } A \text{ is discrete and } R_{1,A} \neq R_{2,A} \end{cases} \qquad (10) $$
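For illustration, a minimal NumPy sketch of the ReliefF weight update in Equations (9) and (10), restricted to continuous features, is shown below; the neighbor count and the number of sampled instances are placeholder values.

```python
import numpy as np

def relieff_weights(X, y, n_neighbors=10, n_samples=100, seed=0):
    """Minimal ReliefF weight estimation for continuous features (Eqs. (9)-(10))."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12       # max(A) - min(A) per feature
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    W = np.zeros(p)
    for _ in range(n_samples):
        i = rng.integers(n)
        Ri, ci = X[i], y[i]
        diff = np.abs(X - Ri) / span                   # diff(A, R_i, .) for all samples
        dist = diff.sum(axis=1)
        same = np.where(y == ci)[0]                    # k nearest hits (skip R_i itself)
        hits = same[np.argsort(dist[same])[1:n_neighbors + 1]]
        W -= diff[hits].mean(axis=0) / n_samples
        for c in classes:                              # k nearest misses per other class
            if c == ci:
                continue
            other = np.where(y == c)[0]
            misses = other[np.argsort(dist[other])[:n_neighbors]]
            W += (prior[c] / (1 - prior[ci])) * diff[misses].mean(axis=0) / n_samples
    return W
```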
The hyperparameters used for ReliefF are described in Table 3.

3.2.3. Genetic Algorithm Based Feature Selection Method (GA)

The Genetic Algorithm is an evolutionary algorithm inspired by the process of natural selection and genetics for finding optimal solutions in a vast solution space. According to the theory of natural selection, the fittest individuals are selected and then used to produce offspring. The fittest parents' characteristics are passed on to the offspring using crossover and mutation for a better chance of survival. The GA contains two types of components. The first defines the meta-parameters: the fitness function, selection strategy, crossover and mutation rates, and population size. The second is an iterative evolutionary loop that applies the first component repeatedly to improve the population [52]. In this loop, the algorithm performs the following steps: (1) evaluate the fitness of each individual in the current population, (2) select parents based on fitness values, (3) generate offspring through crossover and mutation, and (4) produce the next generation. This evolutionary loop continues until a stopping criterion, such as a maximum number of generations, is met.
Initial Population Generation
This is the first step in the GA implementation. The initial population consists of 50 chromosomes, each representing a randomly generated feature subset. For each chromosome, the genes are assigned randomly as 0 or 1, indicating the exclusion or inclusion of the corresponding feature. To avoid redundancy, duplicates in the initial population are minimized. This results in diverse candidate feature subsets for the evolutionary algorithm to explore. The length of each chromosome equals the total number of features in the dataset.
Fitness Function
The fitness function evaluates the quality of each chromosome's feature subset based on classification performance. In this step, we use the accuracy of a KNN (K-Nearest Neighbor) classifier trained and tested on the selected features, with the parameters set to k = 5 and the Euclidean distance metric. Before classification, the features are normalized using z-score scaling for comparability across dimensions. The fitness score of each chromosome is calculated as the average classification accuracy from 5-fold cross-validation on the training data.
Selection
This is an important step in the process in which parent chromosomes are selected from the current population based on their fitness scores using tournament selection. In the selection method, a subset of individuals is selected randomly, and the fittest individual among them is selected as a parent. We selected tournament selection as a selection criterion, which ensures that the fittest individuals have more chances to get selected while also maintaining diversity. This selected parent then proceeds to the next step.
Crossover and Mutation
In this step, offspring chromosomes that form the next generation are generated by the crossover and mutation process. Here, we implement two-point crossover, where two crossover points are randomly selected along the parent chromosomes. The segments between points are swapped to produce offspring. Here, we set the crossover rate as 0.5, which means fifty percent of parents that are selected go through the crossover process, and the rest of the parents are unchanged. During the mutation process, random alterations are introduced to offspring for genetic diversity and to explore the search space. A mutation flips the individual genes with a mutation rate of 0.01 per gene. For example, if there are 40,000 features, then due to this mutation rate, approximately 400 mutations per chromosome per generation happen, which results in balancing the exploration.
Creating Next Generation
The next generation is created by replacing the whole current population with the newly created offspring. This replacement ensures that only new chromosomes survive to the next iteration. Among all the individuals in the final generation, the fittest chromosome (the feature subset that yields the best classification accuracy) is selected as the optimal feature set.
Stopping Criterion
A general stopping criterion is required to terminate the GA. Here, we used a fixed number of iterations as the stopping criterion, which we set to 20. Once the limit is reached, the GA execution terminates, and the best-performing chromosome from the last generation is returned as the selected feature subset.
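The sketch below puts these steps together using the configuration described above (a population of 50, KNN fitness with 5-fold cross-validation, tournament selection, two-point crossover at rate 0.5, a per-gene mutation rate of 0.01, and 20 generations). It is an illustrative outline rather than our exact implementation; in particular, the tournament size of 3 is a placeholder.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def fitness(X, y, mask):
    """Mean 5-fold CV accuracy of KNN (k=5) on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

def ga_feature_selection(X, y, pop_size=50, generations=20, crossover_rate=0.5,
                         mutation_rate=0.01, tour_size=3, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_features))    # random 0/1 chromosomes
    for _ in range(generations):
        scores = np.array([fitness(X, y, ind) for ind in pop])
        def tournament():
            idx = rng.choice(pop_size, tour_size, replace=False)
            return pop[idx[np.argmax(scores[idx])]].copy()
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:                 # two-point crossover
                a, b = sorted(rng.choice(n_features, 2, replace=False))
                p1[a:b], p2[a:b] = p2[a:b].copy(), p1[a:b].copy()
            for child in (p1, p2):
                flip = rng.random(n_features) < mutation_rate # per-gene mutation
                child[flip] ^= 1
                children.append(child)
        pop = np.array(children[:pop_size])                   # generational replacement
    scores = np.array([fitness(X, y, ind) for ind in pop])
    return pop[np.argmax(scores)]          # best chromosome = selected feature mask
```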
The hyperparameters used for GA are described in Table 4.

3.2.4. Mutual Information-Based Feature Selection (MI)

Filter-based feature selection methods rank features based on their association with the target class. In simple filter approaches, the features are scored individually, and those with high scores are selected. Greedy methods, in contrast, take dependencies between features into account by selecting features iteratively: at each step, the feature that provides the highest incremental contribution, given the features already selected, is added. This process continues until the desired number of features is selected. In this research, the MI feature selection uses a simple ranking approach in which features are ranked by their MI score with the class label, without considering dependencies among features.
The MI of two random variables is a quantitative measure of the dependency between the variables [53]. It is defined in terms of the probability density functions (PDFs) of the variables $X$ and $Y$ and their joint PDF, denoted $f_X$, $f_Y$, and $f_{X,Y}$, respectively [54]:
$$ MI(X; Y) = \int \int f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)} \, dx \, dy \qquad (11) $$
If the variables $X$ and $Y$ are completely independent, then the joint PDF equals the product of the PDF of $X$ and the PDF of $Y$, that is, $f_{X,Y} = f_X f_Y$, and the MI becomes zero:
$$ MI(X; Y) = 0 \qquad (12) $$
Entropy is a measure of the uncertainty or randomness in a random variable. For a variable X, entropy is defined as:
$$ h(X) = -\int f_X(x) \log f_X(x) \, dx \qquad (13) $$
MI is expressed in terms of entropy as:
$$ MI(X; Y) = h(Y) - h(Y \mid X) \qquad (14) $$
where $h(Y \mid X)$ is the uncertainty of $Y$ when $X$ is known. If $Y$ and $X$ are independent, then $h(Y \mid X) = h(Y)$ and $MI(X; Y) = 0$.
In feature selection problems, including ours, the features $X$ are usually continuous and the class label $Y$ is discrete. We therefore estimate the MI between continuous features and discrete labels, which requires computing the conditional PDF of the feature given each class; this can be carried out using techniques such as kernel density estimation or histogram binning [55]. The MI between $X$ and a discrete $Y$ is calculated by Equation (15):
$$ MI(X; Y) = \sum_{y \in Y} P(Y = y) \int f_{X \mid Y}(x \mid y) \log \frac{f_{X \mid Y}(x \mid y)}{f_X(x)} \, dx \qquad (15) $$
where $P(Y = y)$ is the prior probability of class $y$, $f_{X \mid Y}(x \mid y)$ is the conditional PDF of $X$ given $Y = y$, and $f_X(x)$ is the marginal PDF of $X$. Using this estimation, features are ranked based on their MI scores with the class label. The hyperparameters used for MI are described in Table 5.
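As one possible realization, scikit-learn's mutual_info_classif (which uses a nearest-neighbor MI estimator rather than explicit density estimation) can be used to rank features by their MI with the class label; the sketch below is an illustration and not necessarily our exact implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_ranking(X, y, top_fraction=0.05, random_state=0):
    """Rank features by mutual information with the class label and
    return the indices of the top fraction (e.g., the top 5%)."""
    mi_scores = mutual_info_classif(X, y, random_state=random_state)
    order = np.argsort(mi_scores)[::-1]
    n_keep = max(1, int(top_fraction * X.shape[1]))
    return order[:n_keep]
```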

3.2.5. Simulated Annealing (SA)

Simulated Annealing (SA) is a stochastic technique inspired by statistical mechanics. SA is used for finding globally optimal solutions to large optimization problems. The SA algorithm works on the assumption that some parts of the current solution belong to a potentially better one, and thus these parts should be retained by exploring the neighbors of the current solution. Under the assumption of minimizing the objective function, SA jumps from hill to hill and thus escapes or avoids sub-optimal solutions. When a system $S$ containing a set of possible states is in thermal equilibrium at a temperature $T$, the probability that it is in a certain state $s$ is denoted $P_T(s)$. $P_T(s)$ depends on $T$ and on the energy $E(s)$ of state $s$, and it follows the Boltzmann distribution:
$$ P_T(s) = \frac{\exp\left( -\frac{E(s)}{kT} \right)}{Z} \qquad (16) $$
where $k$ is the Boltzmann constant and $Z$ is a normalization factor defined as:
$$ Z = \sum_{s \in S} \exp\left( -\frac{E(s)}{kT} \right) \qquad (17) $$
Consider $s$ the current state, as described above, and $s'$ a neighboring state. The probability of the transition from $s$ to $s'$ can be formulated as:
$$ P_T(s \rightarrow s') = \frac{P_T(s')}{P_T(s)} = \exp\left( -\frac{\Delta E}{kT} \right) \qquad (18) $$
where $\Delta E = E(s') - E(s)$.
If $P_T(s') \geq P_T(s)$, the move is accepted; if $P_T(s') < P_T(s)$, the move is accepted with probability $P_T(s \rightarrow s') < 1$. This acceptance probability depends on the current temperature $T$ and decreases as $T$ does. Toward the end, $T$ becomes low enough to reach what is called the freezing point; at this state, transitions are unlikely and the system is considered frozen. At this state, to maximize the probability of finding the minimal-energy state, thermal equilibrium should be reached. To reach equilibrium, annealing is scheduled so as to escape becoming stuck at a local minimum. Hence, the SA algorithm is introduced: in SA, $T$ is initially set to a high value to approximate thermal equilibrium, then small decrements of $T$ are performed, and the process is iterated until the system is considered frozen. Reaching a near-optimal solution depends on how well the cooling schedule is designed, but the thermal-equilibrium requirement at every temperature $T$ makes the process inherently slow.
Performing SA requires four components: a configuration, a move set, an objective function, and a cooling schedule. The configuration represents all the possible solutions that the system can take and is used to find a near-optimal solution. The move set comprises the computations needed to move from one state to another as part of the annealing process. The objective function measures how good and how close to optimal a given current state is. The cooling schedule anneals the problem from a random solution to a good solution; it determines when to reduce the current temperature and when to stop the annealing process.
At the beginning of the SA process, an initial solution is selected randomly and is assumed to be an optimal solution. If T does not satisfy the termination condition, then the neighboring solution is selected and the cost is calculated for that solution. If the cost of the newly selected neighbor solution is less than or equal to the current optimal solution, then the current optimal solution is replaced by the newly selected neighbor solution [56].
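A minimal sketch of this acceptance rule and a geometric cooling schedule is given below; the neighbour and cost functions are problem-specific inputs, and the initial temperature, freezing threshold, and cooling factor are placeholder values.

```python
import math
import random

def accept(delta_cost, temperature):
    """Metropolis rule: always accept improvements; accept worse neighbours
    with probability exp(-delta_cost / T)."""
    if delta_cost <= 0:
        return True
    return random.random() < math.exp(-delta_cost / temperature)

def simulated_annealing(initial, neighbour, cost, t0=1.0, t_min=1e-3, alpha=0.95):
    """Generic SA loop with geometric cooling from t0 down to the freezing point t_min."""
    current = best = initial
    t = t0
    while t > t_min:
        candidate = neighbour(current)
        if accept(cost(candidate) - cost(current), t):
            current = candidate
            if cost(current) < cost(best):
                best = current
        t *= alpha                          # cooling schedule
    return best
```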
The hyperparameters used for SA are described in Table 6.

3.2.6. Minimum Redundancy Maximum Relevance (MRMR)

The Minimum Redundancy Maximum Relevance (MRMR) method was first introduced by Ding and Peng in [57] to address redundancy problems in high-dimensional, high-throughput datasets related to cancer. The MRMR method helps identify the features that are most relevant to the class labels and least redundant with respect to each other, and thus improves classification performance.
The MRMR algorithm works in a filter-based framework using Mutual Information (MI) to evaluate two criteria. The first criterion is relevance, which is quantified as the Mutual Information $I(f; c)$ between a candidate feature $f$ and the target class label $c$; features with high Mutual Information are favored. The second criterion is redundancy, which is defined as the average MI $I(f_i; f_j)$ between the candidate feature and each feature that has already been selected; features with a minimum average MI are favored to avoid highly correlated (redundant) features [55].
At each step, a new feature is selected from the unselected features by maximizing the following condition:
$$ \mathrm{Score}(f) = I(f; c) - \frac{1}{|S|} \sum_{s \in S} I(f; s) \qquad (19) $$
where $S$ is the set of currently selected features and $|S|$ is its size. The MRMR optimization is formulated as:
$$ \mathrm{MRMR}(S) = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c) - \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j) \qquad (20) $$
The greedy incremental process continues until a predefined number of features is selected. The greedy incremental algorithm follows the steps below (a minimal sketch is given after the list):
  • Initialize S as empty;
  • At each step, evaluate the remaining candidate features using the above score condition;
  • Add the feature that maximizes the score;
  • Repeat the process until the desired number of features is selected.
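The following sketch illustrates this greedy loop with scikit-learn MI estimators for relevance and redundancy. Because the pairwise MI computations are expensive, such an implementation is only practical on a pre-filtered candidate pool; it is an illustration rather than our exact implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, n_select, random_state=0):
    """Greedy MRMR: start from the most relevant feature, then repeatedly add
    the feature maximizing relevance minus mean redundancy, Eq. (19)."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    remaining = set(range(X.shape[1])) - set(selected)
    while len(selected) < n_select and remaining:
        best_f, best_score = None, -np.inf
        for f in remaining:
            redundancy = np.mean([
                mutual_info_regression(X[:, [s]], X[:, f],
                                       random_state=random_state)[0]
                for s in selected])
            score = relevance[f] - redundancy
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```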
The hyperparameters used for MRMR are described in Table 7.

3.3. Classification Models

After generating the feature subsets with the seven feature selection methods defined in the sections above, we pass those feature subsets to different classification models in order to see how the features generated by the different feature selection algorithms affect classification performance. The classifier's objective is to learn to classify objects by analyzing a dataset for which the class of each instance is known. We used six different classification models in this research and briefly describe each of them in the subsections below.

3.3.1. Decision Tree (DT)

Instances are represented as attribute-value vectors, and the input data fed to a classifier consists of such vectors, each belonging to a class. The output typically consists of a mapping from attribute values to classes, and through learning, the model becomes able to classify both known and unseen instances [58].
The Decision Tree is one model for representing such mappings: it contains attribute nodes linked to multiple sub-trees, leaves, or decision nodes that are labeled with classes. A decision node computes an outcome or decision based on an attribute value, and each decision is associated with one sub-tree. In a DT, an instance is classified starting at the root node, and the outcome of that node leads to a sub-tree; this process continues until the class of the instance is determined. The depth of the Decision Tree depends on how many sub-trees it is divided into, which determines the total number of conditions used in the decision rules and is not a fixed number [59]. The hyperparameters used for DT are described in Table 8.

3.3.2. K-Nearest Neighbors (KNN)

KNN is one of the popular machine learning classification methods; it works on the principle of classifying unlabeled data based on its Nearest Neighbors. The concept of the KNN method was first proposed by Fix and Hodges in 1951 [60] and later developed by Cover and Hart in 1967 [61]. KNN is also used for prediction problems, in which the label of a sample is predicted as the majority label among its Nearest Neighbors.
KNN classifies the objects according to the distance between two samples. In general, the Euclidean distance formula is used to calculate the distance between two training or testing objects [62]. The formula is given in Equation (21).
$$ d_{xy} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (21) $$
The hyperparameters used for KNN are described in Table 9.

3.3.3. Extreme Gradient Boosting (XGB)

The XGBoost algorithm, an ensemble learning method introduced by Chen and Guestrin [63] in 2016, improved Gradient Boosting by optimizing computational efficiency and scalability. XGB has been implemented in several programming languages and software libraries since its introduction, making it accessible for both regression and classification tasks.
This method is a hybrid model of multiple base learners. It explores different base learners and picks a learning function that reduces the loss. The idea of ’ensembling’ the additive models is to train the predictors sequentially, and correct the predecessor by fitting the new predictor to the residual errors made by the previous predictor. In each step, the model optimizes the parameters. The inference and training of this learning method can be expressed by Equations (22) and (23):
$$ \theta^*, f^* = \underset{\theta, f}{\arg\min} \; \frac{1}{N} \sum_{i=1}^{N} L\left( f^{(i)}(x^{(i)}; \theta), y^{(i)} \right) \qquad (22) $$
$$ \hat{p}(y \mid x; \theta^*) = f^*(\theta, x) \qquad (23) $$
where $\theta$ and $f$ represent the parameter set and the model set, respectively, $L$ is the training loss function, and $\hat{p}(y \mid x; \theta^*)$ is the predicted conditional probability of the output $y$ given the input $x$ and the optimized parameters $\theta^*$. The hyperparameters used for XGB are described in Table 10.

3.3.4. Logistic Regression (LR)

LR is used frequently for binary and linear classification tasks [64]. LR is used for estimating class probabilities because it models the associations with the logistic data distribution. LR performs well with linearly separable classes, and the method is best used to identify class decision boundaries [65]. The method focuses on the relationship between the independent variables ($x_0$, $x_1$, ..., $x_n$) and a dependent variable $Y$. The logistic function, which is a sigmoid function, is used in the logistic model calculation: it maps any input value between negative infinity and positive infinity to an output value between 0 and 1 [66].
Logistic regression is able to interpret the vector variables in the data and to evaluate a coefficient or weight for each of those input variables, and in turn is able to predict the class to which the vector belongs. LR is used for datasets in which the independent variables that determine the outcome are known. The hyperparameters used for LR are described in Table 11.

3.3.5. Support Vector Machine (SVM)

SVM determines an optimal hyperplane that maximizes the margin between the hyperplane and the data points closest to it. The data points closest to the optimal hyperplane are called support vectors. The identification of an optimal hyperplane using SVM is illustrated in Figure 3.
The SVM hyperplane is the set of points x satisfying Equation (24):
$$ g(x) = w^{T} x + b = 0 \qquad (24) $$
where $w$ is the weight vector orthogonal to the hyperplane and $b$ is an offset from the origin. In the case of linear SVMs, points with $g(x) \geq 0$ are assigned to one class and points with $g(x) < 0$ to the other class [68].
The distance between two support vectors on opposite sides of the hyperplane (the margin) is defined by Equation (25):
$$ d = \frac{2}{\| w \|_2} \qquad (25) $$
where $w$ is the optimal weight vector orthogonal to the separating hyperplane, which is obtained through the SVM optimization.
For better separation, $d$ should be increased, and proportionally $\| w \|$ should be reduced, which is achieved using the Lagrange function in Equation (26):
$$ L_p(w, b, \alpha) = \frac{1}{2} \| w \|^2 - \sum_{l=1}^{n} \alpha_l \left\{ y_l (w^{T} x_l + b) - 1 \right\} \qquad (26) $$
where $y_l (w^{T} x_l + b) \geq 1$ for $l$ = 1, 2, …, n, $y_l \in \{+1, -1\}$ represents the class labels, and $\alpha_l$ is the Lagrange multiplier. $L_p$ should be minimized for the computation of the optimal $w$ and $b$. The hyperparameters used for SVM are described in Table 12.
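For illustration, the short snippet below fits a linear SVM on synthetic data with scikit-learn and reads off the hyperplane parameters, support vectors, and margin width; the toy data and settings are placeholders rather than our experimental configuration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Fit a linear SVM and inspect the support vectors that define the
# maximum-margin hyperplane w^T x + b = 0 (Eq. (24)).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = SVC(kernel="linear", class_weight="balanced").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]       # hyperplane parameters
margin = 2.0 / (w @ w) ** 0.5                # margin width 2 / ||w||, Eq. (25)
print(f"{clf.support_vectors_.shape[0]} support vectors, margin = {margin:.3f}")
```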

3.3.6. Light Gradient Boosting Machine (LGBM)

LGBM is a variant method based on the Gradient Boosting Decision Tree (GBDT) and has better optimization than the XGB method. GBDT is an ensemble algorithm that combines multiple Decision Trees as base learners [69]. Each newly added tree pays increased attention to the samples that are misclassified by the previous trees. Through repetitive training of new Decision Trees and increasing their weights, GBDT gradually reduces model error and improves classification accuracy [70].
LGBM uses the GBDT concept in its method; its core idea involves sorting and bucketing the attribute values so that candidate split points can be checked efficiently. In training, LGBM selects leaf nodes for splitting and growing, and thus reduces the loss function. LGBM also introduces Gradient-Based One-Side Sampling (GOSS) to improve the effectiveness of model training. GOSS concentrates on the samples that have larger gradients, randomly samples from those with small gradients, and amplifies the weight of the retained small-gradient data. This process allows for effective utilization of large-gradient samples while still retaining some information from the small-gradient samples that would otherwise be disregarded [71].
The hyperparameters used for LGBM are described in Table 13.

3.4. Description of Datasets

The datasets used in this study were obtained from The Cancer Genome Atlas (TCGA) repository. TCGA is a cancer genomics program that characterized 20,000 primary cancer and matched normal samples across 33 cancer types in total. The program started in 2006 as a joint effort of the National Cancer Institute and the National Human Genome Research Institute in the United States of America and brought together researchers from several institutions. Through these efforts, TCGA generated 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data [72].
From this repository, we collected a total of 10 cancer genomic datasets for use in this paper: Colon and Rectal Adenocarcinomas (COAD), Head and Neck Squamous Cell Carcinoma (HNSC), Kidney Chromophobe (KICH), Kidney Renal Papillary (KIRP), Liver Hepatocellular Carcinoma (LIHC), Lung Squamous Cell (LUSC), Prostate Adenocarcinoma (PRAD), Stomach Adenocarcinoma (STAD), Thyroid Cancer (THCA), and Uterine Corpus Endometrioid Carcinoma (UCEC). All datasets are high-dimensional, with between 35,924 and 44,878 features. We used the datasets without applying additional preprocessing or normalization steps. This decision was made to ensure that all feature selection methods were evaluated on identical input data, thereby isolating the effect of feature selection from any influence of preprocessing techniques. While data normalization and transformation are often applied in research studies, our focus was on the comparative evaluation of feature selection algorithms under a consistent setting.
The summary of each dataset is shown in Table 14.

3.5. Experiment Setup

We fed all ten cancer datasets as input to the seven feature selection methods considered in this research: SPSA (our proposed model), RelChaNet, ReliefF, GA, MI, SA, and MRMR. We selected the top 5%, 10%, and 15% of features from each of the ten datasets. We then passed all 30 dimensionally reduced datasets (the top 5%, 10%, and 15% feature subsets for each of the 10 cancer datasets) as input to the six classification models and calculated the performance metrics. Each dataset was divided into training and testing sets with a split of 80% and 20%, respectively.
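The sketch below outlines this protocol for a single dataset, a single feature selection method, and a single classifier. MI ranking and Logistic Regression stand in for any of the seven selectors and six classifiers; the stratified split is an assumption on our part and not stated above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def run_experiment(X, y, top_fraction=0.05, random_state=0):
    # 80/20 train/test split; feature selection uses the training split only
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)
    scores = mutual_info_classif(X_tr, y_tr, random_state=random_state)
    n_keep = max(1, int(top_fraction * X.shape[1]))
    top = np.argsort(scores)[::-1][:n_keep]           # top 5%/10%/15% of features
    clf = LogisticRegression(max_iter=5000).fit(X_tr[:, top], y_tr)
    return balanced_accuracy_score(y_te, clf.predict(X_te[:, top]))
```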

3.6. Evaluation Metrics

To compare the feature subsets produced by the different feature selection algorithms and how they perform with the classification models, we considered the performance metrics Accuracy, Precision, Balanced Accuracy, Recall, and F1 Score [73].
Accuracy is the ratio of all classifications that are correct, whether they are negative or positive. It is calculated using Equation (27).
$$ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (27) $$
Recall is also known as the true positive rate, which calculates all positives that are classified as positives. It is calculated using Equation (28).
$$ Recall = \frac{TP}{TP + FN} \qquad (28) $$
Precision is the proportion of all positive classifications that are actually positive. It is calculated using Equation (29).
$$ Precision = \frac{TP}{TP + FP} \qquad (29) $$
F1 Score is a harmonic mean of both Precision and Recall, which balances both. This metric is preferred over Accuracy for class-imbalanced datasets. It is calculated using Equation (30).
$$ F1 = \frac{2TP}{2TP + FP + FN} \qquad (30) $$
Sensitivity is the same as Recall explained above, and Specificity is used to measure the proportion of True Negatives over the Total Negatives. It is calculated using Equation (31).
$$ Specificity = \frac{TN}{TN + FP} \qquad (31) $$
Balanced Accuracy is the arithmetic mean of sensitivity and specificity. It is used in cases of imbalanced data. It is calculated using Equation (32).
$$ Balanced\ Accuracy = \frac{Sensitivity + Specificity}{2} \qquad (32) $$
In all the above evaluation metric equations, T P is True Positive, T N is True Negative, F P is False Positive, and F N is False Negative.
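All five reported metrics are available directly in scikit-learn; the sketch below computes them for a vector of test predictions. The weighted averaging used for Precision, Recall, and F1 Score is our assumption for the multiclass-safe case and is not specified above.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, balanced_accuracy_score)

def evaluate(y_true, y_pred):
    """Compute the five performance metrics reported in this study."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, average="weighted"),
        "Recall": recall_score(y_true, y_pred, average="weighted"),
        "F1 Score": f1_score(y_true, y_pred, average="weighted"),
        "Balanced Accuracy": balanced_accuracy_score(y_true, y_pred),
    }
```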

3.7. Computational Resource Consumption Measurement

We used the Python (version 3.12.11) programming language to re-implement all the feature selection methods and classification models. The experiments were run at the Center for Computationally Assisted Science and Technology (CCAST), an advanced computing infrastructure at North Dakota State University, on a JupyterLab setup with a 16-core CPU, 128 GB of memory, and a GPU allocation. Table 15 provides the average execution runtime (in seconds) of the feature selection algorithms across all ten datasets, as well as the average execution runtime of the classifiers combined with the feature selection algorithms across the same datasets.

4. Results

We applied the seven feature selection methods to the cancer datasets and extracted the top attributes that help improve the performance of the classification models. As shown in the tables in Appendix A, our SPSA model mostly achieves higher Balanced Accuracy than ReliefF, RelChaNet, GA, and MI for all of the top 5%, 10%, and 15% feature subsets.
For the DT classification, SPSA achieved 100% Accuracy for COAD's and KICH's top 5, 10, and 15 percent feature sets, as shown in Table A1, Table A2, and Table A3, respectively. SPSA often achieves near-perfect or perfect results across the datasets and different feature selection percentages, suggesting that SPSA is effective and robust in selecting features across different datasets. As for RelChaNet, its performance is generally good but varies more significantly between datasets; for example, Accuracy and F1 Score drop notably for certain datasets (e.g., THCA and UCEC). Regarding ReliefF, it shows competitive results, generally close to SPSA's performance, although there is some variation, such as slightly lower F1 Scores and Recall for certain datasets. GA scored average to very good performance metrics with the small feature set (5%) on all the datasets, but it did not do as well with the 10% and 15% feature sets. With MI, the feature selection performance is good with the 10% feature set for most of the datasets, but with the smaller and larger feature sets, it did not perform as well as the other feature selection methods and, in some cases, performed worse on most of the datasets. SA shows gradual improvement with the higher feature subsets (10% and 15%); this feature selection method is slightly less stable in performance compared to the SPSA or MI feature selection methods. MRMR is more consistent and has better values for Accuracy, Precision, F1 Score, and Balanced Accuracy, but lower and less consistent Recall scores for the 10% feature subset, and average Recall values for the 15% feature subset. In particular, SPSA consistently ranks among the top methods for most datasets and subsets, with its strongest performance on COAD, KICH, LIHC, STAD, and THCA, while in a few cases (e.g., PRAD and UCEC), MRMR or SA slightly surpass its results. This shows that SPSA provides stable and reliable feature selection across subsets, often matching or outperforming traditional methods.
For the KNN classifier, SPSA achieved 100% Accuracy for COAD's and KICH's top 5% and 10% feature sets, as shown in Tables A4–A6. ReliefF also achieved 100% Accuracy for the same feature sets on COAD. SPSA shows consistently good values across the different datasets, maintaining high classification performance even with a reduced number of features, whereas ReliefF shows slightly less consistent results but still offers competitive performance and high Accuracy for many datasets. RelChaNet, on the other hand, tends to underperform, particularly with fewer features, and appears less robust across datasets than the other two methods. The GA feature selection method did not work well with KNN and scored low on all datasets except COAD, where it performed best. MI worked well with most of the small feature sets across the datasets, but its performance was occasionally only average and, most of the time, worse with the 15% feature set. The results for SA indicate that the generated feature subsets have lower Accuracy and Precision than the other methods in most cases, except for a very few datasets where the Precision is high; its Balanced Accuracy is the lowest across all feature subset sizes and datasets. The MRMR feature subset performance is more variable: it shows high Accuracy and Recall for the KIRP and PRAD datasets but performs poorly in Balanced Accuracy and Recall for the other datasets, with Balanced Accuracy frequently lower than that of SPSA and ReliefF. Overall, SPSA shows strong and consistent performance across datasets, especially at 5% features, where it often achieves near-perfect Accuracy and Recall compared to the other methods. At 10% and 15%, it remains competitive, though in some datasets MRMR or GA slightly surpass it, indicating that SPSA's advantage is most pronounced with smaller feature subsets.
For the LGBM classifier, the SPSA, RelChaNet, and ReliefF feature selection methods achieved 100% Accuracy for all of COAD's feature sets, as shown in Table A7, Table A8, and Table A9, respectively. SPSA tends to be the most stable and highest-performing method across the different datasets and feature levels. RelChaNet occasionally showed variability and lower performance, suggesting potential dataset-specific challenges. ReliefF was comparable to SPSA in many cases but occasionally showed variability, particularly for smaller feature sets. GA did not perform well on most datasets, except for COAD and KICH, where it performed on par with SPSA on the 10% feature set. MI also worked well on COAD, but it performed poorly on the other datasets compared to the other feature selection methods. The SA feature subsets performed poorly for some datasets, such as COAD and LUSC at 15%, but were excellent in a few cases; SA's Balanced Accuracy is the lowest among all feature selection methods, especially for the 15% feature subsets across the ten datasets. The MRMR feature subsets produced mixed results, with the lowest Accuracy and Balanced Accuracy, especially for the 15% subset; some datasets show decent Recall but poor F1 Score and Balanced Accuracy. Overall, SPSA demonstrates consistently strong performance across the 5%, 10%, and 15% feature subsets, often matching or surpassing the other methods in Accuracy and F1 Score while maintaining higher Balanced Accuracy. Unlike methods such as SA or MRMR, which show variability, SPSA remains stable and reliable across all datasets.
For the LR classifier, SPSA achieved 100% Accuracy for KICH's top 5% and 10% feature sets, as shown in Table A10, Table A11, and Table A12, respectively. SPSA emerges as the most robust method, maintaining high performance with fewer features. RelChaNet offers a good balance, with strengths at mid-level feature percentages but some sensitivity to feature set size. ReliefF shows potential on specific datasets but lacks the broad consistency demonstrated by the other two methods. GA is effective on the LIHC dataset and showed good results with the 10% and 15% feature sets for the majority of the datasets. MI also achieved good scores on the KICH dataset; however, its performance was worse for most of the 10% and 15% feature sets. The SA feature subsets produced high Accuracy, Precision, Recall, and F1 Score results across most datasets; SA frequently achieved perfect scores for the UCEC, KIRP, and HNSC datasets and has strong Balanced Accuracy across all datasets. The MRMR feature subsets show very strong performance with perfect or near-perfect Precision, Recall, and F1 Scores, and this feature selection method generated excellent results for the THCA, UCEC, HNSC, and PRAD datasets across all feature subsets. Overall, SPSA shows stable and competitive performance across all subsets. At 5%, it secures strong results, often surpassing RelChaNet, ReliefF, and MI, while remaining close to GA. At 10%, it maintains reliable Accuracy and Recall across datasets, outperforming the weaker methods though occasionally trailing SA and MRMR. At 15%, SPSA continues to deliver high scores, particularly for THCA and UCEC, confirming its robustness and consistent competitiveness with the other leading methods.
For the SVM classifier, SPSA achieved 100% Accuracy only for COAD's top 5% feature set, but it outperformed ReliefF and RelChaNet on all the other datasets' feature sets, as shown in Table A13, Table A14, and Table A15, respectively. SPSA often leads in Precision and Recall across most datasets, showcasing its ability to identify high-importance features that strongly influence classification. RelChaNet provides more stable but generally moderate performance, sometimes approaching SPSA's results but also varying as the feature set changes. ReliefF's performance suggests it is less effective at identifying critical features, leading to consistently lower results. GA did not perform well overall on most datasets, but it scored well on the PRAD dataset and on the 15% feature set of the STAD dataset. MI performed well with the 15% feature set on most datasets but produced low Accuracy with the 5% and 10% feature sets on all datasets. The SA feature subsets are among the best performers, especially for the 15% subset, with an Accuracy above 0.99 on the KIRP, PRAD, STAD, and UCEC datasets; its Precision, F1 Score, and Recall are high for many datasets, but its Balanced Accuracy is slightly behind the SPSA and MRMR feature selection methods. The MRMR feature subsets have variable Accuracy and F1 Scores and are less consistent across the datasets; the Recall drops significantly, and the Balanced Accuracy is mixed, good for some datasets but poor for others. Overall, SPSA shows strong and consistent performance across the 5%, 10%, and 15% feature subsets. At 5%, it achieves near-perfect results on several datasets, clearly outperforming ReliefF, RelChaNet, and GA. With 10% features, SPSA maintains robust Accuracy and balanced results, especially for LIHC, THCA, and UCEC. At 15%, it remains competitive, particularly for THCA and UCEC, though SA and MRMR occasionally surpass it.
For the XGB classifier, SPSA achieved 100% Accuracy for COAD's and KICH's top 5%, 10%, and 15% feature sets, and for THCA's top 5% feature set, as shown in Table A16, Table A17, and Table A18, respectively. SPSA generally achieves very high Accuracy and consistent metrics across the different datasets. RelChaNet displays lower Accuracy and performance metrics than SPSA and ReliefF in most cases. ReliefF generally performs on par with SPSA, often showing similarly high Accuracy and metrics across most datasets and feature percentages. GA did not perform well across most of the datasets, but it performed well in Accuracy and Precision for the UCEC and COAD datasets. MI performed below par across all datasets except KIRP, where it achieved a score as good as SPSA's. The SA feature subsets performed decently on occasion and achieved perfect values for the 15% feature subset on the LUSC dataset, but they are highly inconsistent, often with poor Balanced Accuracy and Recall, and struggle with most of the datasets, which we interpret as a lack of robustness across the datasets. The MRMR feature subsets performed well for the PRAD dataset at 5% and for STAD at 10%, but they show poor Balanced Accuracy and Precision, especially for the KICH and KIRP datasets. Overall, SPSA consistently delivered strong and stable results across the 5%, 10%, and 15% subsets, often matching or outperforming ReliefF, GA, and MI, with perfect scores on datasets such as COAD, KICH, and THCA. While MRMR occasionally surpassed SPSA at the higher subsets (e.g., PRAD, STAD), SPSA generally proved more reliable and robust than SA and RelChaNet, maintaining high Accuracy and balanced performance across datasets.
A colored heat map of the Balanced Accuracy scores for each classification model used in this research, across all ten cancer datasets, is shown in Figure 4 for the 5% feature set. Note that we opted to display only the 5% feature set results, which yielded the best values.
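A heat map of this kind can be rendered with standard plotting libraries. The sketch below is an assumption about how a figure such as Figure 4 could be produced with seaborn (the original plotting code is not shown in the paper), and it uses randomly generated Balanced Accuracy values purely as placeholders.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical Balanced Accuracy scores (rows: classifiers, columns: datasets).
classifiers = ["DT", "KNN", "LGBM", "LR", "SVM", "XGB"]
datasets = ["COAD", "HNSC", "KICH", "KIRP", "LIHC",
            "LUSC", "PRAD", "STAD", "THCA", "UCEC"]
rng = np.random.default_rng(0)
scores = pd.DataFrame(rng.uniform(0.75, 1.0, size=(len(classifiers), len(datasets))),
                      index=classifiers, columns=datasets)

# Annotated heat map of the Balanced Accuracy scores for the 5% feature set.
fig, ax = plt.subplots(figsize=(10, 4))
sns.heatmap(scores, annot=True, fmt=".2f", vmin=0.5, vmax=1.0,
            cmap="viridis", cbar_kws={"label": "Balanced Accuracy"}, ax=ax)
ax.set_xlabel("Cancer dataset")
ax.set_ylabel("Classifier")
plt.tight_layout()
plt.show()
```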

4.1. Statistical Analysis

Next, we study the effect of the features selected by the seven feature selection methods on Accuracy across the ten cancer datasets. We used the Friedman test, a non-parametric statistical test, to determine whether there is a statistically significant difference between paired treatments arranged in a randomized repeated-measures design.
We perform this statistical test only on the top 5% feature selection results. The Friedman test in this research uses the following null and alternative hypotheses:
The null hypothesis ($H_0$): The seven feature selection methods used in this research have an equal effect on Accuracy among the ten cancer datasets.
The alternative hypothesis ($H_a$): At least one feature selection method used in this research has a different effect from the others on Accuracy among the ten cancer datasets.
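A minimal sketch of the Friedman test as used here, assuming a matrix of Accuracy scores with one row per dataset (block) and one column per feature selection method (treatment); the scores below are randomly generated placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import friedmanchisquare

methods = ["SPSA", "RelChaNet", "ReliefF", "GA", "MI", "SA", "MRMR"]

# Hypothetical Accuracy scores: 10 datasets (rows) x 7 feature selection methods (columns).
rng = np.random.default_rng(1)
accuracy = rng.uniform(0.7, 1.0, size=(10, len(methods)))

# The Friedman test compares the methods across the paired (repeated-measures) datasets.
statistic, p_value = friedmanchisquare(*[accuracy[:, j] for j in range(accuracy.shape[1])])
print(f"Friedman statistic = {statistic:.2f}, p-value = {p_value:.5f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: at least one feature selection method differs in its effect on Accuracy.")
else:
    print("Fail to reject H0: no significant difference detected among the methods.")
```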

4.1.1. DT

First, we calculated the summary statistics of the Balanced Accuracy scores of the DT classifier for all ten datasets after applying the feature selection algorithms. This summary is visualized as a violin plot in Figure 5.
We then applied the Friedman test and obtained a test statistic of 32.61 and a p-value of 0.00001. Since the p-value is less than 0.05, the result is statistically significant and the null hypothesis is rejected. Therefore, we have sufficient evidence to conclude that the type of feature selection method used leads to statistically significant differences in Accuracy scores among the ten cancer datasets.
Next, we performed the Nemenyi post hoc test to identify which feature selection methods have different effects on Accuracy. The p-values returned by the Nemenyi post hoc test for each pairwise comparison of means are shown in Table 16.
At α = 0.05, the pairs below have statistically significant differences in the Accuracy scores among the ten cancer datasets.
  • SPSA vs. GA;
  • SPSA vs. MI;
  • SPSA vs. SA;
  • ReliefF vs. SA.
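The pairwise p-values reported in Table 16 (and in the analogous tables for the other classifiers) can be obtained with a Nemenyi post hoc test. The sketch below assumes the scikit-posthocs package and reuses the hypothetical accuracy matrix from the Friedman sketch above.

```python
import numpy as np
import pandas as pd
import scikit_posthocs as sp

methods = ["SPSA", "RelChaNet", "ReliefF", "GA", "MI", "SA", "MRMR"]

# Hypothetical Accuracy matrix: 10 datasets (rows) x 7 methods (columns).
rng = np.random.default_rng(1)
accuracy = pd.DataFrame(rng.uniform(0.7, 1.0, size=(10, len(methods))), columns=methods)

# Nemenyi post hoc test for a Friedman (block) design: returns a matrix of pairwise p-values.
pairwise_p = sp.posthoc_nemenyi_friedman(accuracy)

# Report the method pairs whose difference is significant at alpha = 0.05.
alpha = 0.05
for i, a in enumerate(methods):
    for b in methods[i + 1:]:
        if pairwise_p.loc[a, b] < alpha:
            print(f"{a} vs. {b}: p = {pairwise_p.loc[a, b]:.4f}")
```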

4.1.2. KNN

The summary statistics of the Balanced Accuracy scores of the KNN classifier for all ten datasets after applying the feature selection algorithms are visualized as a violin plot in Figure 6.
For the Friedman test, we obtained a test statistic of 34.37 and a p-value of 0.000006. Since the p-value is less than 0.05, the result is statistically significant and the null hypothesis is rejected. Therefore, we have sufficient evidence to conclude that the type of feature selection method used leads to statistically significant differences in Accuracy scores among the ten cancer datasets.
Next, the Nemenyi post hoc test was performed; the p-values for each pairwise comparison of means are shown in Table 17.
At α = 0.05, the pairs below have statistically significant differences in the Accuracy scores among the ten cancer datasets.
  • SPSA vs. RelChaNet;
  • SPSA vs. GA;
  • SPSA vs. MI;
  • SPSA vs. SA;
  • SPSA vs. MRMR;
  • ReliefF vs. MI.

4.1.3. LGBM

The summary statistics of the Balanced Accuracy scores of the LGBM classifier for all ten datasets after applying the feature selection algorithms are visualized as a violin plot in Figure 7.
For the Friedman test, we obtained a test statistic of 47.06 and a p-value of 0.00000001. Since the p-value is less than 0.05, the result is statistically significant and the null hypothesis is rejected. Therefore, we have sufficient evidence to conclude that the type of feature selection method used leads to statistically significant differences in Accuracy scores among the ten cancer datasets.
Next, the Nemenyi post hoc test was performed; the p-values for each pairwise comparison of means are shown in Table 18.
At α = 0.05, the pairs below have statistically significant differences in Accuracy scores among the ten cancer datasets.
  • SPSA vs. GA;
  • SPSA vs. MI;
  • SPSA vs. SA;
  • SPSA vs. MRMR;
  • RelChaNet vs. SA;
  • RelChaNet vs. MRMR;
  • ReliefF vs. GA;
  • ReliefF vs. SA;
  • ReliefF vs. MRMR.

4.1.4. LR

The summary statistics of the Balanced Accuracy scores of the LR classifier for all ten datasets after applying the feature selection algorithms are visualized as a violin plot in Figure 8.
For the Friedman test, we obtained a test statistic of 30.59 and a p-value of 0.00003. Since the p-value is less than 0.05, the result is statistically significant and the null hypothesis is rejected. Therefore, we have sufficient evidence to conclude that the type of feature selection method used leads to statistically significant differences in Accuracy scores among the ten cancer datasets.
Next, the Nemenyi post hoc test was performed; the p-values for each pairwise comparison of means are shown in Table 19.
At α = 0.05, the pairs below have statistically significant differences in Accuracy scores among the ten cancer datasets.
  • SPSA vs. GA;
  • SPSA vs. MI;
  • SPSA vs. SA;
  • SPSA vs. MRMR.

4.1.5. SVM

The summary statistics of the Balanced Accuracy scores of the SVM classifier for all ten datasets after applying the feature selection algorithms are visualized as a violin plot in Figure 9.
For the Friedman test, we obtained a test statistic of 28.51 and a p-value of 0.000075. Since the p-value is less than 0.05, the result is statistically significant and the null hypothesis is rejected. Therefore, we have sufficient evidence to conclude that the type of feature selection method used leads to statistically significant differences in Accuracy scores among the ten cancer datasets.
Next, the Nemenyi post hoc test was performed; the p-values for each pairwise comparison of means are shown in Table 20.
At α = 0.05, the pairs below have statistically significant differences in Accuracy scores among the ten cancer datasets.
  • SPSA vs. GA;
  • SPSA vs. MI;
  • SPSA vs. SA;
  • SPSA vs. MRMR.

4.1.6. XGB

The summary statistics of the Balanced Accuracy scores of the XGB classifier for all ten datasets after applying the feature selection algorithms are visualized as a violin plot in Figure 10.
For the Friedman test, we obtained a test statistic of 36.31 and a p-value of 0.000002. Since the p-value is less than 0.05, the result is statistically significant and the null hypothesis is rejected. Therefore, we have sufficient evidence to conclude that the type of feature selection method used leads to statistically significant differences in Accuracy scores among the ten cancer datasets.
Next, the Nemenyi post hoc test was performed; the p-values for each pairwise comparison of means are shown in Table 21.
At α = 0.05, the pairs below have statistically significant differences in Accuracy scores among the ten cancer datasets.
  • SPSA vs. GA;
  • SPSA vs. MI;
  • SPSA vs. SA;
  • SPSA vs. MRMR;
  • ReliefF vs. GA;
  • ReliefF vs. SA.

5. Conclusions

This research successfully demonstrated the effectiveness of the Simultaneous Perturbation Stochastic Approximation (SPSA) method for feature selection in large-scale cancer classification tasks, advancing the application of the SPSA technique to high-dimensional genomic datasets. Our comprehensive experimental evaluation across datasets containing over 35,000 features establishes SPSA as a viable and superior alternative to existing feature selection methodologies for cancer detection applications. The experimental results provide compelling evidence for the efficacy of the SPSA-based approach. Through systematic evaluation using six diverse classification algorithms (Decision Trees, K-Nearest Neighbors, LightGBM, Logistic Regression, XGBoost, and Support Vector Machines), we demonstrated that SPSA-generated feature subsets consistently achieve superior classification performance compared to six state-of-the-art feature selection methods. Our approach yielded mostly higher, and often perfect, classification Accuracy across nearly all ten reduced-dimensional datasets, while maintaining competitive computational efficiency, with computation times that were on average comparable to, and frequently lower than, those of the benchmark methods.
The robustness of our findings is underscored by the comprehensive evaluation framework employing multiple performance metrics, including Accuracy, Balanced Accuracy, Precision, Recall, and F1 Score. The consistent advantage of SPSA-based feature selection across these diverse metrics and multiple classifier architectures validates the method’s reliability and generalizability for high-dimensional cancer classification tasks.
Our investigation revealed that while SPSA maintains consistently high performance across most classifier combinations, there are isolated instances of reduced performance when integrated with certain classifiers. However, these cases represent minimal exceptions rather than systematic limitations, and the overall performance profile strongly favors the SPSA approach.
The successful application of SPSA to datasets exceeding 35,000 features establishes a new benchmark for feature selection in high-dimensional biomedical data analysis. We believe that researchers working with high-dimensional genomic, proteomic, or other biomedical datasets can leverage the SPSA-based feature selection method to significantly improve the Accuracy and reliability of their machine learning models.
This work opens several avenues for future research, including the exploration of hybrid approaches combining SPSA with other optimization techniques, the investigation of adaptive parameter tuning for different dataset characteristics, and the extension to multiclass cancer classification problems.

Author Contributions

Conceptualization, S.D.P.; Methodology, S.D.P.; Software, S.D.P.; Validation, S.D.P.; Formal analysis, S.D.P.; Investigation, S.D.P.; Resources, S.A.L.; Writing – original draft, S.D.P.; Writing – review and editing, S.D.P. and S.A.L.; Visualization, S.D.P.; Supervision, S.A.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available at https://www.nature.com/articles/ng.2764#rightslink (accessed on 18 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Decision Trees with 5% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00001.0000
RelChaNet1.00001.00001.00001.00001.0000
ReliefF1.00001.00001.00001.00001.0000
GA1.00001.00001.00001.00000.9385
MI1.00001.00001.00001.00000.9722
SA0.97030.96640.93980.95240.8605
MRMR1.00001.00001.00001.00001.0000
HNSCSPSA0.97640.98790.77780.85100.9289
RelChaNet0.95880.78530.87340.82240.9413
ReliefF0.97050.83020.93200.87310.7920
GA0.97050.83020.93200.87310.8575
MI0.96470.81190.87650.84060.8130
SA0.96060.95020.93970.94480.8394
MRMR0.98230.98210.97750.97970.8647
KICHSPSA1.00001.00001.00001.00000.9722
RelChaNet0.92850.91670.94440.92510.9444
ReliefF0.92850.91670.94440.92510.7569
GA1.00001.00001.00001.00000.9275
MI1.00001.00001.00001.00001.0000
SA0.98230.98210.97750.97970.8654
MRMR0.98230.97780.98220.97990.8749
KIRPSPSA0.92780.89910.92630.91120.8039
RelChaNet0.82470.78030.75840.76790.7920
ReliefF0.91750.88290.93150.90160.8535
GA0.92780.89910.92630.91120.8075
MI0.82470.78030.75840.76790.7974
SA1.00001.00001.00001.00001.0000
MRMR0.98230.98210.97750.97970.8654
LIHCSPSA0.92120.89090.89090.89090.8575
RelChaNet0.90550.87540.85760.86590.8678
ReliefF0.89760.84970.88690.86560.8175
GA0.90550.87540.85760.86590.7635
MI0.90550.87540.85760.86590.8157
SA0.94810.92400.91200.91780.8278
MRMR0.92850.95000.90000.91810.9053
LUSCSPSA0.92770.83080.76430.79250.7541
RelChaNet0.90960.77020.72970.74760.8130
ReliefF0.91560.78190.78190.78190.8280
GA0.92770.83080.76430.79250.8459
MI0.92770.83080.76430.79250.8459
SA0.94480.92790.91790.92270.8185
MRMR0.98230.93130.88580.90710.8195
PRADSPSA0.94570.94370.82980.87480.8503
RelChaNet0.92770.87260.81920.84280.8609
ReliefF0.88550.76810.77730.77260.8375
GA0.92770.87260.81920.84280.8839
MI0.94570.94370.82980.87480.8849
SA0.92850.95000.90000.91810.8495
MRMR1.00001.00001.00001.00000.9862
STADSPSA0.94810.91460.92590.92010.8981
RelChaNet0.89620.82890.87960.85000.7239
ReliefF0.94070.90740.90740.90740.8852
GA0.94810.91460.92590.92010.9305
MI0.94810.91460.92590.92010.8823
SA0.98230.98210.97750.97970.8629
MRMR1.00001.00001.00001.00000.9721
THCASPSA0.96470.95970.95970.95970.9197
RelChaNet0.54070.48910.48960.48890.4375
ReliefF0.95880.96500.94110.95180.8621
GA0.96470.95970.95970.95970.8943
MI0.96470.95970.95970.95970.8746
SA0.94570.92870.77440.83000.8294
MRMR0.92850.95000.90000.91810.9053
UCECSPSA0.91520.91200.91360.91270.8086
RelChaNet0.84740.84230.84370.84290.7424
ReliefF0.88130.88150.87250.87620.7943
GA0.91520.91200.91360.91270.8462
MI0.88130.88150.87250.87620.7460
SA0.96900.96610.95450.96010.8293
MRMR0.99410.99690.94440.96900.9586
Table A2. Decision Trees with 10% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00001.0000
RelChaNet0.98720.99350.75000.83011.0000
ReliefF1.00001.00001.00001.00000.9722
GA0.98720.99350.75000.83010.8949
MI0.98720.99350.75000.83010.8949
SA0.94480.91990.92940.92450.8517
MRMR0.96380.98050.83330.89010.8493
HNSCSPSA0.97640.98790.77780.85100.8764
RelChaNet0.96470.81190.87650.84060.7746
ReliefF0.96470.80450.92890.85420.8357
GA0.97050.83020.93200.87310.8678
MI0.97050.83020.93200.87310.8057
SA0.92850.95000.90000.91810.8926
MRMR0.98230.93130.88580.90710.8195
KICHSPSA1.00001.00001.00001.00000.9444
RelChaNet0.96420.95450.97220.96190.5000
ReliefF0.89280.88460.91670.88930.8295
GA0.97050.83020.93200.87310.8742
MI0.96470.80450.92890.85420.9257
SA0.94910.95050.94450.94720.9017
MRMR0.98820.99390.88890.93440.9273
KIRPSPSA0.93810.91410.93340.92300.8231
RelChaNet0.85560.81610.81610.81610.8130
ReliefF0.90720.87450.90010.88580.7954
GA0.96470.80450.92890.85420.7658
MI0.85560.81610.81610.81610.8549
SA1.00001.00001.00001.00000.9893
MRMR0.98230.93130.88580.90710.8836
LIHCSPSA0.94480.93800.90640.92080.9075
RelChaNet0.92910.89850.90760.90290.9063
ReliefF0.89760.85220.87540.86280.8175
GA0.88650.84990.87380.86040.8450
MI0.92910.89850.90760.90290.9592
SA0.98820.99390.88890.93440.9273
MRMR1.00001.00001.00001.00000.9483
LUSCSPSA0.93370.84350.79200.81490.6261
RelChaNet0.92770.81310.81310.81310.7920
ReliefF0.84330.60890.61940.61370.7585
GA0.94480.93800.90640.92080.8459
MI0.92770.81310.81310.81310.7850
SA1.00001.00001.00001.00000.9721
MRMR1.00001.00001.00001.00001.0000
PRADSPSA0.93970.87820.87820.87820.8644
RelChaNet0.92160.83870.85040.84440.8711
ReliefF0.90960.81990.80870.81410.7936
GA0.89750.73590.74740.74150.8238
MI0.92160.83870.85040.84440.8936
SA1.00001.00001.00001.00001.0000
MRMR1.00001.00001.00001.00001.0000
STADSPSA0.95550.92160.94440.93240.9166
RelChaNet0.92590.87770.89810.88730.8540
ReliefF0.93330.88570.91670.89990.8239
GA0.92160.83870.85040.84440.8346
MI0.92590.87770.89810.88730.8923
SA1.00001.00001.00001.00000.9382
MRMR1.00001.00001.00001.00001.0000
THCASPSA0.96470.97520.94550.95840.9600
RelChaNet0.94700.94140.93720.93920.9259
ReliefF0.95290.94010.95570.94720.8852
GA0.95550.92160.94440.93240.7832
MI0.95290.95030.94150.94570.8861
SA0.94840.93920.92820.93350.8296
MRMR0.98230.93130.88580.90710.8195
UCECSPSA0.89260.88870.89030.88950.8519
RelChaNet0.86440.86120.85810.85950.7210
ReliefF0.85870.85770.84920.85270.6954
GA0.85870.85770.84920.85270.7349
MI0.85870.85770.84920.85270.7239
SA0.96290.96110.92130.93960.8475
MRMR0.94910.95050.94450.94720.9017
Table A3. Decision Trees with 15% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9967
RelChaNet0.99360.99680.87500.92690.9289
ReliefF0.99360.99680.87500.92690.5000
GA0.99360.99680.8750.92690.9500
MI0.96470.81190.87650.84060.9367
SA0.98820.99390.88890.93440.8593
MRMR0.94840.93920.92820.93350.8296
HNSCSPSA0.97640.88270.88270.88270.9289
RelChaNet0.96470.80450.92890.85420.8826
ReliefF0.97050.83020.93200.87310.8245
GA0.96470.80450.92890.85420.9063
MI0.96470.81190.87650.84060.9256
SA0.94480.91990.92940.92450.8517
MRMR1.00001.00001.00001.00000.9364
KICHSPSA1.00001.00001.00001.00001.0000
RelChaNet0.92850.95000.90000.91810.4972
ReliefF1.00001.00001.00001.00001.0000
GA0.92850.95000.90000.91810.8043
MI0.92850.95000.90000.91810.8623
SA0.94480.91990.92940.92450.8385
MRMR0.95270.94410.92300.93300.8496
KIRPSPSA0.91750.89490.89490.89490.8423
RelChaNet0.85560.82020.80390.81140.8130
ReliefF0.88650.84990.87380.86040.7423
GA0.85560.81610.81610.81610.8249
MI0.88650.84990.87380.86040.8256
SA0.95550.93060.93060.93060.8364
MRMR1.00001.00001.00001.00000.9591
LIHCSPSA0.94480.92790.91790.92270.9127
RelChaNet0.92120.88520.90240.89330.8575
ReliefF0.89760.85220.87540.86280.8349
GA0.92120.88520.90240.89330.8395
MI0.94480.93800.90640.92080.8409
SA0.98230.93130.88580.90710.8195
MRMR1.00001.00001.00001.00001.0000
LUSCSPSA0.91560.77950.80630.79200.8340
RelChaNet0.90960.76510.82730.79130.8374
ReliefF0.89750.73590.74740.74150.7836
GA0.91560.78190.78190.78190.8035
MI0.92770.83080.76430.79250.7829
SA0.99410.99690.94440.96900.9586
MRMR0.96980.98300.89580.93320.8196
PRADSPSA0.93970.87820.87820.87820.8538
RelChaNet0.93370.87030.85740.86370.8782
ReliefF0.92160.83870.85040.84440.8579
GA0.92160.83870.85040.84440.8592
MI0.92160.83870.85040.84440.8936
SA0.98230.93130.88580.90710.8836
MRMR0.96290.96110.92130.93960.8475
STADSPSA0.95550.92160.94440.93240.9351
RelChaNet0.94070.89960.92130.90990.9259
ReliefF0.94810.92400.91200.91780.8720
GA0.94810.92400.91200.91780.8823
MI0.94810.92400.91200.91780.8843
SA0.94480.91990.92940.92450.8385
MRMR0.98820.99390.88890.93440.9656
THCASPSA0.95290.95030.94150.94570.9411
RelChaNet0.93520.92450.92850.92640.7943
ReliefF0.93520.92450.92850.92640.8345
GA0.93520.92450.92850.92640.8289
MI0.95290.95030.94150.94570.8340
SA0.98510.97690.97690.97690.8296
MRMR1.00001.00001.00001.00001.0000
UCECSPSA0.88700.88870.87730.88180.8416
RelChaNet0.85870.85420.86350.85650.7214
ReliefF0.88700.88210.88750.88430.7529
GA0.88700.88210.88750.88430.7560
MI0.88700.88210.88750.88430.7540
SA0.96380.98050.83330.89010.8493
MRMR0.98230.93130.88580.90710.8195
Table A4. K-Nearest Neighbors with 5% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9275
RelChaNet0.99360.99680.87500.92690.8647
ReliefF1.00001.00001.00001.00000.9179
GA1.00001.00001.00001.00000.8954
MI1.00001.00001.00001.00000.8754
SA0.97450.48730.50000.49350.8596
MRMR0.95880.97920.61110.67120.8691
HNSCSPSA0.98820.99390.88890.93440.8562
RelChaNet0.97050.98490.72220.80000.8346
ReliefF0.98820.99390.88890.93440.8296
GA0.97050.98490.72220.80000.8523
MI0.97050.98490.72220.80000.8426
SA0.96470.96920.95020.95890.8821
MRMR0.84740.84770.83550.84010.7195
KICHSPSA0.89280.89180.87220.88050.7653
RelChaNet0.78570.76670.76670.76670.6589
ReliefF0.85710.86250.82220.83630.7346
GA0.78570.76670.76670.76670.6587
MI0.92850.95000.90000.91810.8076
SA0.97170.97180.96980.97080.8936
MRMR0.95480.95770.94930.95300.8563
KIRPSPSA0.89690.88520.84430.86160.7594
RelChaNet0.81440.77880.71480.73550.6946
ReliefF0.77310.70950.69880.70360.6358
GA0.81440.77880.71480.73550.6595
MI0.91750.90330.88270.89220.8023
SA0.80740.90300.51850.48200.6743
MRMR0.96420.97370.95000.96020.8745
LIHCSPSA0.92910.93980.86150.89270.8056
RelChaNet0.87400.90120.74480.78760.7590
ReliefF0.90550.92360.81150.85060.8236
GA0.85030.88260.69480.73400.7295
MI0.87400.90120.74480.78760.7369
SA0.89750.94850.52780.52550.7395
MRMR0.85540.42770.50000.46100.7295
LUSCSPSA0.92160.89080.66330.71880.8167
RelChaNet0.89750.75340.57660.60290.7640
ReliefF0.91560.83700.65990.70790.7934
GA0.92160.89080.66330.71880.8158
MI0.90960.85650.60770.64960.7798
SA0.96610.96500.96500.96500.8295
MRMR0.96470.98200.66670.74090.8736
PRADSPSA0.93970.90100.84360.86900.8276
RelChaNet0.89750.82360.71510.75300.7649
ReliefF0.91560.88910.74300.79200.7956
GA0.91560.88910.74300.79200.7677
MI0.92160.95810.72920.79240.8064
SA0.89750.94850.52780.52550.8384
MRMR0.89750.94650.64580.69760.7854
STADSPSA0.96290.93710.94910.94290.8746
RelChaNet0.92590.90350.85650.87730.8195
ReliefF0.94810.93580.89810.91540.8257
GA0.88800.86540.76390.80000.7727
MI0.96290.93710.94910.94290.8403
SA0.92940.95280.89090.9140.8547
MRMR0.92940.94490.89570.9150.8469
THCASPSA0.98820.99150.98180.98640.8953
RelChaNet0.61480.55370.54240.53820.5043
ReliefF0.96470.96920.95020.95890.8459
GA0.98820.98660.98660.98660.8678
MI0.96470.96920.95020.95890.8496
SA0.88970.93780.76670.80670.7934
MRMR0.96470.96920.95020.95890.8821
UCECSPSA0.93220.94830.91780.92800.8497
RelChaNet0.71750.83770.65750.64270.6195
ReliefF0.82480.88520.78770.80040.7058
GA0.90960.93330.89040.90270.7698
MI0.87570.91270.84930.86350.7596
SA0.88140.90010.71760.76520.7397
MRMR0.97450.48730.50000.49350.8596
Table A5. K-Nearest Neighbors with 10% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9275
RelChaNet0.99360.99680.87500.92690.8974
ReliefF1.00001.00001.00001.00000.9195
GA1.00001.00001.00001.00000.8678
MI1.00001.00001.00001.00000.8659
SA0.83700.75110.68980.71180.7937
MRMR0.89750.94850.52780.52550.8384
HNSCSPSA0.98820.99390.88890.93440.8434
RelChaNet0.98230.99090.83330.89540.8534
ReliefF0.97640.98790.77780.85100.8436
GA0.98230.99090.83330.89540.8734
MI0.98820.99390.88890.93440.8256
SA0.92940.94490.89570.91500.8075
MRMR0.95880.97920.61110.67120.8296
KICHSPSA0.92850.95000.90000.91810.8076
RelChaNet0.78570.76670.76670.76670.6589
ReliefF0.82140.80990.79440.80090.7068
GA0.89280.89180.87220.88050.7824
MI0.85710.84440.84440.84440.7246
SA0.97450.48730.50000.49350.8396
MRMR0.85560.83510.77950.80050.7847
KIRPSPSA0.87620.90080.78140.81770.7329
RelChaNet0.84530.85410.73590.76810.7267
ReliefF0.81440.77880.71480.73550.6926
GA0.77310.70950.69880.70360.6106
MI0.81440.80250.69040.71660.6967
SA0.73190.36600.50000.42260.6057
MRMR0.95270.97090.90000.92940.8295
LIHCSPSA0.92120.91950.85640.88240.8396
RelChaNet0.85030.88260.69480.73400.7496
ReliefF0.90550.92360.81150.85060.7560
GA0.88970.91260.77820.82020.7649
MI0.85030.88260.69480.73400.7296
SA0.96470.98200.66670.74090.8594
MRMR0.96470.98200.66670.74090.8736
LUSCSPSA0.91560.83700.65990.70790.7967
RelChaNet0.89150.69820.52440.52120.7814
ReliefF0.90360.77140.62880.66620.7745
GA0.91560.83700.65990.70790.7893
MI0.92160.89080.66330.71880.7935
SA0.95880.97920.61110.67120.8691
MRMR0.85040.92120.68340.67840.7694
PRADSPSA0.93970.90100.84360.86900.8315
RelChaNet0.91560.95510.70830.77060.8064
ReliefF0.90360.87120.70130.75080.7697
GA0.93970.90100.84360.86900.7957
MI0.93970.90100.84360.86900.8069
SA0.89750.94850.52780.52550.8175
MRMR0.97170.97180.96980.97080.8936
STADSPSA0.94070.89960.92130.90990.8694
RelChaNet0.88800.86540.76390.80000.7695
ReliefF0.90370.88150.80090.83260.7946
GA0.94070.89960.92130.90990.8248
MI0.88800.86540.76390.80000.7610
SA0.89750.94850.52780.52550.7395
MRMR0.95880.97920.61110.67120.8691
THCASPSA0.99410.99110.99570.99330.9017
RelChaNet0.96470.96920.95020.95890.8470
ReliefF0.99410.99110.99570.99330.8756
GA0.98820.99150.98180.98640.8678
MI0.61480.55370.54240.53820.4975
SA0.95880.96510.94110.95180.6257
MRMR0.94700.47350.50000.48640.8905
UCECSPSA0.90960.93330.89040.90270.8560
RelChaNet0.75140.85140.69860.69700.6396
ReliefF0.80790.87680.76710.77800.6849
GA0.80220.87410.76030.77030.6978
MI0.71750.83770.65750.64270.5949
SA0.97450.48730.50000.49350.8492
MRMR0.85030.91810.68330.72370.7493
Table A6. K-Nearest Neighbors with 15% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA0.99360.99680.87500.92690.8974
RelChaNet0.99360.99680.87500.92690.8974
ReliefF0.99360.99680.87500.92690.8753
GA0.99360.99680.87500.92690.8652
MI0.99360.99680.87500.92690.8935
SA0.71420.68750.66670.67250.5945
MRMR0.97640.96930.97790.97340.8417
HNSCSPSA0.98230.99090.83330.89540.8567
RelChaNet0.97050.98490.72220.80000.8325
ReliefF0.97640.98790.77780.85100.8436
GA0.97640.98790.77780.85100.8236
MI0.98230.99090.83330.89540.8167
SA0.92090.92780.91020.91690.8536
MRMR0.89150.44580.50000.47130.7804
KICHSPSA0.89280.92860.85000.87330.7549
RelChaNet0.78570.76670.76670.76670.6395
ReliefF0.85710.84440.84440.84440.7368
GA0.89280.89180.87220.88050.7368
MI0.78570.76670.76670.76670.6634
SA0.72160.36460.49300.41920.6295
MRMR0.76370.38190.50000.43300.6349
KIRPSPSA0.91750.90330.88270.89220.7950
RelChaNet0.81440.80250.69040.71660.6754
ReliefF0.86590.85700.78660.81190.7387
GA0.77310.70950.69880.70360.6495
MI0.87620.90080.78140.81770.7523
SA0.89750.94650.64580.69760.7854
MRMR0.97640.96930.97790.97340.8417
LIHCSPSA0.92120.93440.84480.87910.8265
RelChaNet0.88970.91260.77820.82020.7694
ReliefF0.89760.91810.79480.83560.7694
GA0.92120.91950.85640.88240.8059
MI0.90550.92360.81150.85060.7567
SA0.89750.94850.52780.52550.8275
MRMR0.97450.48730.50000.49350.8596
LUSCSPSA0.92160.89080.66330.71880.8192
RelChaNet0.89750.75340.57660.60290.7943
ReliefF0.90960.85650.60770.64960.7983
GA0.91560.83700.65990.70790.7893
MI0.92160.89080.66330.71880.8084
SA0.95270.97090.90000.92940.8295
MRMR0.77160.88490.51670.46720.6493
PRADSPSA0.93970.91740.82630.86370.8315
RelChaNet0.92160.95810.72920.79240.8186
ReliefF0.89150.84870.65960.70510.7694
GA0.90360.87120.70130.75080.7594
MI0.91560.88910.74300.79200.7939
SA0.96470.98200.66670.74090.8594
MRMR0.95290.96090.93200.94460.8327
STADSPSA0.92590.91730.84260.87330.8075
RelChaNet0.88800.86540.76390.80000.7695
ReliefF0.89620.85110.81020.82820.7843
GA0.90370.88150.80090.83260.7850
MI0.89620.85110.81020.82820.7740
SA0.75000.86000.65000.64940.6987
MRMR0.92910.95750.85000.88960.8528
THCASPSA0.99410.99110.99570.99330.8794
RelChaNet0.98820.98660.98660.98660.8653
ReliefF0.98230.98210.97750.97970.8657
GA0.98820.98660.98660.98660.8594
MI0.98820.98660.98660.98660.8693
SA0.95270.97090.90000.92940.8639
MRMR0.89750.94650.64580.69760.7854
UCECSPSA0.87570.91270.84930.86350.7957
RelChaNet0.80220.87410.76030.77030.7594
ReliefF0.79090.86880.74660.75480.6594
GA0.87570.91270.84930.86350.7496
MI0.79090.86880.74660.75480.6597
SA0.85540.42770.50000.46100.7295
MRMR0.86140.93030.52080.50250.7964
Table A7. LightGBM with 5% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9265
RelChaNet1.00001.00001.00001.00000.8769
ReliefF1.00001.00001.00001.00000.8976
GA1.00001.00001.00001.00000.8597
MI1.00001.00001.00001.00000.8695
SA0.72280.64060.84460.62750.6289
MRMR0.84530.81710.89440.82900.7257
HNSCSPSA0.99410.99690.94440.96900.9056
RelChaNet0.98230.89690.93820.91640.8658
ReliefF0.98820.94130.94130.94130.8368
GA0.98230.89690.93820.91640.8459
MI0.98820.94130.94130.94130.8495
SA0.84330.74930.91640.77390.7296
MRMR0.83730.74500.90370.77290.7504
KICHSPSA0.96420.95450.97220.96190.9670
RelChaNet0.92850.95000.90000.91810.8047
ReliefF0.96420.95450.97220.96190.8486
GA0.92850.95000.90000.91810.8076
MI0.96420.95450.97220.96190.8175
SA1.00001.00001.00001.00000.8691
MRMR0.83730.74500.90370.77290.7504
KIRPSPSA0.96900.96610.95450.96010.8898
RelChaNet0.93810.93090.90900.91920.8296
ReliefF0.93810.92120.92120.92120.8285
GA0.91750.90330.88270.89220.7986
MI0.97930.97370.97370.97370.8538
SA0.84330.74930.91640.77390.7296
MRMR0.97170.97000.97190.97090.8318
LIHCSPSA0.96060.94120.95120.94610.9284
RelChaNet0.95270.94410.92300.93300.8429
ReliefF0.92910.89310.91910.90500.8158
GA0.96850.96700.94480.95530.8487
MI0.96060.94120.95120.94610.8478
SA0.72280.64060.84460.62750.6289
MRMR0.96420.95450.97220.96190.8219
LUSCSPSA0.95780.97740.80560.86780.8743
RelChaNet0.93970.92120.74660.80500.8157
ReliefF0.93970.92120.74660.80500.8168
GA0.95180.93560.80220.85340.8185
MI0.95180.97440.77780.84400.8276
SA0.84530.81710.89440.82900.7285
MRMR0.92780.89530.95720.91760.7968
PRADSPSA0.98190.98970.93750.96140.9076
RelChaNet0.97590.96690.93400.94950.8645
ReliefF0.96980.96240.91310.93570.8196
GA0.96980.94600.93050.93800.8594
MI0.97590.96690.93400.94950.8438
SA0.92590.86490.95370.89760.8196
MRMR0.83730.74500.90370.77290.7504
STADSPSA0.97770.97160.95830.96480.8896
RelChaNet0.97770.97160.95830.96480.8494
ReliefF0.97770.97160.95830.96480.8494
GA0.97770.97160.95830.96480.8396
MI0.98510.97690.97690.97690.8754
SA0.84330.74930.91640.77390.7296
MRMR0.92120.87500.94850.90140.8659
THCASPSA0.99410.99570.99090.99320.9048
RelChaNet0.65180.60020.56180.55090.5296
ReliefF0.99410.99570.99090.99320.8694
GA0.99410.99570.99090.99320.8589
MI0.97640.97310.97310.97310.8965
SA0.84330.74930.91640.77390.7296
MRMR1.00001.00001.00001.00000.8695
UCECSPSA0.96610.96500.96500.96500.8959
RelChaNet0.94910.95050.94450.94720.8195
ReliefF0.94910.95320.94240.94700.8195
GA0.96610.96500.96500.96500.8789
MI0.94910.95320.94240.94700.8478
SA0.98870.99520.99590.99630.8591
MRMR0.83520.62560.91470.65930.7089
Table A8. LightGBM with 10% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9527
RelChaNet1.00001.00001.00001.00000.8850
ReliefF1.00001.00001.00001.00000.8858
GA1.00001.00001.00001.00000.8965
MI1.00001.00001.00001.00000.8958
SA0.83730.74500.90370.77290.7504
MRMR0.92590.86490.95370.89760.7846
HNSCSPSA0.99410.99690.94440.96900.9168
RelChaNet0.98230.89690.93820.91640.8749
ReliefF0.98820.94130.94130.94130.8068
GA0.98820.94130.94130.94130.8345
MI0.98820.94130.94130.94130.8697
SA0.83730.74500.90370.77290.7504
MRMR0.98230.97410.98700.98010.8569
KICHSPSA0.96420.95450.97220.96190.8948
RelChaNet0.92850.95000.90000.91810.7968
ReliefF0.96420.95450.97220.96190.8368
GA0.96420.95450.97220.96190.8486
MI0.92850.95000.90000.91810.7924
SA0.98230.97410.98700.98010.8694
MRMR1.00001.00001.00001.00001.0000
KIRPSPSA0.97930.97370.97370.97370.8947
RelChaNet0.95870.94750.94750.94750.8267
ReliefF0.92780.91220.90200.90690.8176
GA0.96900.96610.95450.96010.8496
MI0.93810.92120.92120.92120.8196
SA0.92120.87500.94850.90140.8659
MRMR0.98870.99520.99590.99630.8689
LIHCSPSA0.96850.96700.94480.95530.9059
RelChaNet0.95270.92700.94600.93600.8385
ReliefF0.96060.94120.95120.94610.8489
GA0.96060.94120.95120.94610.8657
MI0.96060.95020.93970.94480.8594
SA0.83730.74500.90370.77290.7504
MRMR0.81920.68750.89860.71630.7286
LUSCSPSA0.95180.93560.80220.85340.8497
RelChaNet0.93370.91260.71880.77830.8106
ReliefF0.95180.97440.77780.84400.8286
GA0.95780.97740.80560.86780.7945
MI0.93970.92120.74660.80500.8109
SA0.92120.88540.95250.90360.8091
MRMR0.92120.87500.94850.90140.7056
PRADSPSA0.98190.98970.93750.96140.9184
RelChaNet0.96980.94600.93050.93800.8496
ReliefF0.93370.87030.85740.86370.8286
GA0.93370.87030.85740.86370.8265
MI0.96980.96240.91310.93570.8296
SA0.96810.72340.98540.80950.8494
MRMR0.77100.66070.87160.66960.7056
STADSPSA0.98510.97690.97690.97690.8946
RelChaNet0.97770.97160.95830.96480.8494
ReliefF0.97770.97160.95830.96480.7956
GA0.97030.96640.93980.95240.8596
MI0.97770.95960.97220.96580.8496
SA0.83730.74500.90370.77290.7504
MRMR0.85560.82500.90140.83930.7195
THCASPSA0.99410.99570.99090.99320.8960
RelChaNet0.97640.97310.97310.97310.8659
ReliefF0.97640.97310.97310.97310.8567
GA0.98230.98210.97750.97970.8675
MI0.99410.99570.99090.99320.8392
SA1.00001.00001.00001.00000.9067
MRMR0.96420.95450.97220.96190.8219
UCECSPSA0.96610.96700.96300.96490.8469
RelChaNet0.96040.96010.95820.95910.8496
ReliefF0.96040.96010.95820.95910.8754
GA0.96040.96230.95610.95890.8493
MI0.96610.96500.96500.96500.8156
SA0.76470.59830.88240.58350.6496
MRMR0.98230.97410.98700.98010.8536
Table A9. LightGBM with 15% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.8796
RelChaNet1.00001.00001.00001.00000.8694
ReliefF1.00001.00001.00001.00000.8793
GA0.98230.89690.93820.91640.8549
MI0.98820.94130.94130.94130.8437
SA0.72280.64060.84460.62750.6289
MRMR0.92120.88540.95250.90360.8091
HNSCSPSA0.98820.94130.94130.94130.8978
RelChaNet0.98230.99090.83330.89540.8569
ReliefF0.98820.94130.94130.94130.8386
GA0.98230.89690.93820.91640.8695
MI0.98820.94130.94130.94130.8694
SA1.00001.00001.00001.00000.8974
MRMR0.83730.74500.90370.77290.7504
KICHSPSA0.96420.95450.97220.96190.8948
RelChaNet0.92850.95000.90000.91810.8285
ReliefF0.96420.95450.97220.96190.8564
GA0.96420.95450.97220.96190.8275
MI0.92850.95000.90000.91810.7850
SA0.96040.95650.96430.95960.8402
MRMR0.72280.64060.84460.62750.6289
KIRPSPSA0.96900.96610.95450.96010.8865
RelChaNet0.91750.90330.88270.89220.7952
ReliefF0.95870.94750.94750.94750.8406
GA0.93810.92120.92120.92120.8186
MI0.91750.90330.88270.89220.8058
SA0.98870.99520.99590.99630.8506
MRMR0.87260.58330.93460.60790.7495
LIHCSPSA0.96850.96700.94480.95530.8697
RelChaNet0.96060.94120.95120.94610.8549
ReliefF0.96060.95020.93970.94480.8539
GA0.92910.89310.91910.90500.8076
MI0.96060.94120.95120.94610.8386
SA0.96610.96250.96910.96530.8405
MRMR0.96810.72340.98540.80950.8494
LUSCSPSA0.95180.93560.80220.85340.8478
RelChaNet0.95180.97440.77780.84400.8368
ReliefF0.95180.97440.77780.84400.8205
GA0.93370.91260.71880.77830.8429
MI0.95780.97740.80560.86780.8438
SA0.70000.57500.84160.53630.5876
MRMR0.54210.59570.74320.48800.4109
PRADSPSA0.97590.96690.93400.94950.8869
RelChaNet0.97590.96690.93400.94950.8329
ReliefF0.96980.96240.91310.93570.8798
GA0.98190.98970.93750.96140.8569
MI0.96980.96240.91310.93570.8769
SA0.92120.87500.94850.90140.8659
MRMR0.54210.59570.74320.48800.4076
STADSPSA0.98510.97690.97690.97690.8759
RelChaNet0.97770.95960.97220.96580.8295
ReliefF0.97030.96640.93980.95240.8459
GA0.97030.96640.93980.95240.8503
MI0.97770.95960.97220.96580.8396
SA0.56020.59890.75340.50150.4892
MRMR0.82530.72640.89790.75480.7195
THCASPSA0.99410.99570.99090.99320.8869
RelChaNet0.98230.98210.97750.97970.8295
ReliefF0.98820.98660.98660.98660.8439
GA0.98820.98660.98660.98660.8439
MI0.98820.98660.98660.98660.8495
SA0.83520.62560.91470.65930.7295
MRMR0.88820.66070.94100.71190.8056
UCECSPSA0.96610.96500.96500.96500.8374
RelChaNet0.96040.96230.95610.95890.8358
ReliefF0.94910.94680.94860.94760.8295
GA0.94910.95050.94450.94720.8292
MI0.94910.95050.94450.94720.8295
SA0.98230.97410.98700.98010.8749
MRMR0.98820.98470.99280.99250.8967
Table A10. Logistic Regression with 5% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA0.98080.78570.99020.85870.8940
RelChaNet0.96810.72340.98540.80950.8494
ReliefF0.87260.58330.93460.60790.7495
GA0.97450.75640.99760.83230.8539
MI0.98080.78570.99020.85870.8945
SA0.98230.98210.97750.97970.8295
MRMR0.95780.97740.80560.86780.7945
HNSCSPSA0.88820.66070.94100.71190.8056
RelChaNet0.83520.62560.91470.65930.7089
ReliefF0.70000.57500.84160.53630.5849
GA0.83520.62560.91470.65930.7057
MI0.76470.59830.88240.58350.6329
SA0.95780.97740.80560.86780.7945
MRMR0.91750.90330.88270.89220.8058
KICHSPSA1.00001.00001.00001.00000.9058
RelChaNet0.96420.95450.97220.96190.8592
ReliefF0.96420.95450.97220.96190.8305
GA1.00001.00001.00001.00000.8596
MI1.00001.00001.00001.00000.8974
SA0.93970.92120.74660.80500.8168
MRMR0.96060.94120.95120.94610.8657
KIRPSPSA0.91750.88540.94870.90320.8495
RelChaNet0.84530.81710.89440.82900.7285
ReliefF0.90720.87140.93660.89240.8694
GA0.85560.82500.90140.83930.7195
MI0.84530.81710.89440.82900.7098
SA0.94910.95050.94450.94720.8195
MRMR0.97590.96690.93400.94950.8438
LIHCSPSA0.92120.88540.95250.90360.8596
RelChaNet0.83460.79410.89180.80970.7290
ReliefF0.92120.87500.94850.90140.7056
GA0.83460.79410.89180.80970.7098
MI0.83460.79410.89180.80970.7109
SA0.96420.95450.97220.96190.8175
MRMR0.98820.94130.94130.94130.8068
LUSCSPSA0.81920.68750.89860.71630.7598
RelChaNet0.77100.66540.87320.67200.6459
ReliefF0.61440.59120.73500.53110.4986
GA0.72280.64060.84460.62750.6289
MI0.81920.68750.89860.71630.7286
SA0.98820.94130.94130.94130.8495
MRMR0.96420.95450.97220.96190.9670
PRADSPSA0.85540.75000.91550.78720.7698
RelChaNet0.83730.74500.90370.77290.7209
ReliefF0.79510.70690.88030.72470.6792
GA0.84330.74930.91640.77390.7296
MI0.79510.70690.88030.72470.6589
SA0.96980.96240.91310.93570.8296
MRMR0.95180.93560.80220.85340.8478
STADSPSA0.94810.89710.96760.92590.8578
RelChaNet0.94070.89520.96930.92840.8395
ReliefF0.92590.86490.95370.89760.8196
GA0.92590.86490.95370.89760.8056
MI0.94810.90530.97390.93950.8194
SA0.96060.95020.93970.94480.8594
MRMR0.98230.98210.97750.97970.8675
THCASPSA0.98820.98250.99130.98670.8958
RelChaNet0.57770.59680.60490.57270.4896
ReliefF0.98230.97410.98700.98010.8749
GA0.98820.98470.99280.99250.8695
MI0.98820.98470.99280.99250.8967
SA0.97770.97160.95830.96480.8494
MRMR0.97770.97160.95830.96480.8896
UCECSPSA0.98870.99520.99590.99630.8768
RelChaNet0.96610.96200.97120.96540.8589
ReliefF0.97170.97000.97190.97090.8318
GA0.96610.96200.97120.96540.8478
MI0.96610.96200.97120.96540.8496
SA0.91750.90330.88270.89220.8058
MRMR0.92910.89310.91910.90500.8158
Table A11. Logistic Regression with 10% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA0.98080.78570.99020.85870.9285
RelChaNet0.96810.72340.98540.80950.8265
ReliefF0.87260.58330.93460.60790.6589
GA0.97450.75640.99760.83230.8156
MI0.97450.75640.99760.83230.8356
SA0.95270.94410.92300.93300.8429
MRMR0.96060.94120.95120.94610.8549
HNSCSPSA0.88820.66070.94100.71190.7596
RelChaNet0.83520.62560.91470.65930.6854
ReliefF0.70000.57500.84160.53630.4596
GA0.70000.57500.84160.53630.5876
MI0.79410.60340.89230.61750.6840
SA0.97030.96640.93980.95240.8503
MRMR1.00001.00001.00001.00000.8976
KICHSPSA1.00001.00001.00001.00000.8691
RelChaNet0.96420.95450.97220.96190.8596
ReliefF0.96420.95450.97220.96190.8195
GA1.00001.00001.00001.00000.8695
MI1.00001.00001.00001.00000.9067
SA0.96850.96700.94480.95530.8487
MRMR0.93970.92120.74660.80500.8109
KIRPSPSA0.91750.88540.94870.90320.8267
RelChaNet0.84530.81710.89440.82900.7285
ReliefF0.90720.87140.93660.89240.8075
GA0.90720.87140.93660.89240.7567
MI0.91750.88540.94870.90320.8056
SA0.97640.97310.97310.97310.8965
MRMR0.96850.96700.94480.95530.9059
LIHCSPSA0.92120.88540.95250.90360.8749
RelChaNet0.83460.79410.89180.80970.7594
ReliefF0.92120.87500.94850.90140.7748
GA0.92120.87500.94850.90140.8295
MI0.92120.87500.94850.90140.8659
SA0.98820.94130.94130.94130.8697
MRMR1.00001.00001.00001.00000.8850
LUSCSPSA0.81920.68750.89860.71630.7056
RelChaNet0.77100.66540.87320.67200.5987
ReliefF0.61440.59120.73500.53110.4892
GA0.61440.59120.73500.53110.5186
MI0.70480.63930.83940.61930.5984
SA0.99410.99690.94440.96900.9056
MRMR0.96060.94120.95120.94610.8549
PRADSPSA0.85540.75000.91550.78720.7504
RelChaNet0.83730.74500.90370.77290.6594
ReliefF0.79510.70690.88030.72470.7297
GA0.72890.67390.84150.66390.6087
MI0.84330.74930.91640.77390.7285
SA0.96420.95450.97220.96190.8275
MRMR0.95180.93560.80220.85340.8478
STADSPSA0.94810.89710.96760.92590.8239
RelChaNet0.94070.89520.96930.92840.8168
ReliefF0.92590.86490.95370.89760.8372
GA0.94810.90530.97390.93950.8195
MI0.92590.86490.95370.89760.7846
SA0.93810.92120.92120.92120.8196
MRMR0.97590.96690.93400.94950.8438
THCASPSA0.98820.98250.99130.98670.9284
RelChaNet0.57770.59680.60490.57270.8567
ReliefF0.98230.97410.98700.98010.8694
GA0.57770.59680.60490.57270.4489
MI0.98230.97410.98700.98010.8569
SA0.97770.97160.95830.96480.8494
MRMR0.93970.92120.74660.80500.8109
UCECSPSA0.98870.99520.99590.99630.8914
RelChaNet0.96610.96200.97120.96540.8402
ReliefF0.97170.97000.97190.97090.8405
GA0.95480.95060.95950.95390.8285
MI0.97170.97000.97190.97090.8568
SA0.96060.94120.95120.94610.8549
MRMR1.00001.00001.00001.00000.8597
Table A12. Logistic Regression with 15% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA0.97450.75640.99760.83230.8867
RelChaNet0.76430.54880.87910.52010.6496
ReliefF0.97450.75000.98690.82670.8539
GA0.76430.54880.87910.52010.6696
MI0.97450.75640.99760.83230.8495
SA0.96610.96700.96300.96490.8469
MRMR0.93970.92120.74660.80500.8157
HNSCSPSA0.77050.59380.87890.58900.6984
RelChaNet0.76470.59830.88240.58350.6496
ReliefF0.54700.55230.76090.43760.4285
GA0.83520.62560.91470.65930.7295
MI0.58820.55700.78260.46340.4395
SA0.93370.91260.71880.77830.8106
MRMR0.96060.94120.95120.94610.8386
KICHSPSA0.96420.95450.97220.96190.8596
RelChaNet0.96420.95450.97220.96190.8493
ReliefF0.96420.95450.97220.96190.8957
GA0.96420.95450.97220.96190.8219
MI0.96420.95450.97220.96190.8489
SA0.92850.95000.90000.91810.7968
MRMR0.96040.96230.95610.95890.8358
KIRPSPSA0.91750.88540.94870.90320.8478
RelChaNet0.84530.81710.89440.82900.7078
ReliefF0.91750.88240.94370.90350.7984
GA0.84530.81710.89440.82900.7257
MI0.92780.89530.95720.91760.7968
SA0.95180.97440.77780.84400.8276
MRMR0.99410.99570.99090.99320.8960
LIHCSPSA0.92120.88540.95250.90360.8296
RelChaNet0.84250.80000.89690.81750.7694
ReliefF0.89760.84880.93300.87500.7639
GA0.92120.88540.95250.90360.8091
MI0.83460.79410.89180.80970.7257
SA0.97590.96690.93400.94950.8438
MRMR0.96610.96500.96500.96500.8789
LUSCSPSA0.72280.64060.84460.62750.6289
RelChaNet0.70480.63930.83940.61930.6176
ReliefF0.54210.59570.74320.48800.4076
GA0.72280.64060.84460.62750.6096
MI0.54210.59570.74320.48800.4109
SA0.97770.97160.95830.96480.8896
MRMR0.98820.94130.94130.94130.8694
PRADSPSA0.84330.74930.91640.77390.7296
RelChaNet0.72890.67390.84150.66390.6086
ReliefF0.82530.72640.89790.75480.7195
GA0.85540.75000.91550.78720.7349
MI0.85540.75000.91550.78720.7195
SA0.97030.96640.93980.95240.8596
MRMR0.96980.96240.91310.93570.8798
STADSPSA0.94810.90530.97390.93950.8239
RelChaNet0.86660.80000.91670.82950.7395
ReliefF0.92590.86490.95370.89760.7946
GA0.86660.80000.91670.82950.7439
MI0.94810.90530.97390.93950.8594
SA0.96420.95450.97220.96190.8948
MRMR0.91750.90330.88270.89220.7952
THCASPSA0.98820.98470.99280.99250.8597
RelChaNet0.97050.95830.97830.96710.8540
ReliefF0.98230.97410.98700.98010.8375
GA0.98230.97410.98700.98010.8536
MI0.98230.97410.98700.98010.8569
SA0.96610.96500.96500.96500.8789
MRMR1.00001.00001.00001.00000.8796
UCECSPSA0.98870.99520.99590.99630.8506
RelChaNet0.95480.95060.95950.95390.8346
ReliefF0.97170.96870.97390.97100.8589
GA0.98870.99520.99590.99630.8689
MI0.98870.99520.99590.99630.8591
SA1.00001.00001.00001.00000.8796
MRMR0.96610.96700.96300.96490.8469
Table A13. SVM with 5% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9298
RelChaNet0.97450.48730.50000.49350.8958
ReliefF0.97450.48730.50000.49350.8386
GA0.98080.99040.62500.69510.8756
MI0.97450.48730.50000.49350.8396
SA0.84530.85410.73590.76810.7267
MRMR0.90960.93330.89040.90270.8560
HNSCSPSA0.96470.98200.66670.74090.8594
RelChaNet0.96470.98200.66670.74090.8417
ReliefF0.94700.47350.50000.48640.8398
GA0.95880.97920.61110.67120.8682
MI0.95880.97920.61110.67120.8692
SA0.80220.87410.76030.77030.6978
MRMR0.98820.98660.98660.98660.8594
KICHSPSA0.96420.97370.95000.96020.8745
RelChaNet0.71420.68750.66670.67250.6095
ReliefF0.75000.86000.65000.64940.6285
GA0.89280.92860.85000.87330.7598
MI0.75000.86000.65000.64940.6894
SA0.91750.90330.88270.89220.7950
MRMR0.99360.99680.87500.92690.8974
KIRPSPSA0.74220.65850.56800.56420.6849
RelChaNet0.72160.36460.49300.41920.6295
ReliefF0.73190.51160.53050.49170.6257
GA0.74220.86980.51920.46220.6295
MI0.74220.65850.56800.56420.6197
SA0.91560.88910.74300.79200.7956
MRMR0.87620.90080.78140.81770.7329
LIHCSPSA0.95270.97090.90000.92940.8639
RelChaNet0.85820.67640.70000.68120.7349
ReliefF0.76370.38190.50000.43300.6349
GA0.95270.97090.90000.92940.8193
MI0.95270.97090.90000.92940.8295
SA0.77310.70950.69880.70360.6495
MRMR0.94070.89960.92130.90990.8248
LUSCSPSA0.89750.94850.52780.52550.8275
RelChaNet0.89750.94850.52780.52550.7594
ReliefF0.89150.44580.50000.47130.7697
GA0.89750.94850.52780.52550.7395
MI0.89150.70570.59760.62440.7941
SA0.90360.87120.70130.75080.7594
MRMR0.92120.93440.84480.87910.8265
PRADSPSA0.89150.94370.62500.67020.8467
RelChaNet0.86740.77080.57630.59690.7548
ReliefF0.85540.42770.50000.46100.7386
GA0.89750.94650.64580.69760.7632
MI0.87950.93830.58330.61000.7943
SA0.94810.93580.89810.91540.8257
MRMR0.92910.93980.86150.89270.8056
STADSPSA0.95550.93060.93060.93060.8943
RelChaNet0.90370.94630.75930.81310.7689
ReliefF0.74810.57890.56480.56930.6278
GA0.90370.94630.75930.81310.7496
MI0.74810.57890.56480.56930.6295
SA0.99360.99680.87500.92690.8652
MRMR0.98820.99390.88890.93440.8256
THCASPSA0.93520.91670.95220.92940.8593
RelChaNet0.64440.32220.50000.39190.5096
ReliefF0.92940.94490.89570.91500.8469
GA0.64440.32220.50000.39190.5837
MI0.92940.94490.89570.91500.7975
SA0.85710.84440.84440.84440.7246
MRMR0.89280.92860.85000.87330.7549
UCECSPSA0.96610.96500.96500.96500.8823
RelChaNet0.92090.92780.91020.91690.8536
ReliefF0.84740.84770.83550.84010.7195
GA0.95480.95770.94930.95300.8727
MI0.97170.97180.96980.97080.8273
SA0.91750.90330.88270.89220.7950
MRMR0.92160.89080.66330.71880.8084
Table A14. SVM with 10% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA0.98720.99350.75000.83010.8750
RelChaNet0.98080.99040.62500.69510.8697
ReliefF0.97450.48730.50000.49350.8296
GA0.97450.48730.50000.49350.8492
MI0.97450.48730.50000.49350.8290
SA0.90550.92360.81150.85060.7560
MRMR0.89620.85110.81020.82820.7740
HNSCSPSA0.96470.98200.66670.74090.8736
RelChaNet0.95880.97920.61110.67120.8358
ReliefF0.94700.47350.50000.48640.8295
GA0.96470.98200.66670.74090.8385
MI0.94700.47350.50000.48640.7958
SA0.78570.76670.76670.76670.6587
MRMR0.98820.98660.98660.98660.8594
KICHSPSA0.92850.95000.90000.91810.8594
RelChaNet0.71420.68750.66670.67250.5945
ReliefF0.75000.86000.65000.64940.6987
GA0.75000.86000.65000.64940.6295
MI0.89280.92860.85000.87330.7256
SA0.91560.83700.65990.70790.7893
MRMR0.90960.93330.89040.90270.8560
KIRPSPSA0.85560.83510.77950.80050.7847
RelChaNet0.74220.86980.51920.46220.6395
ReliefF0.79890.85250.64940.63140.6845
GA0.85560.83510.77950.80050.7395
MI0.90420.90300.83980.86500.7689
SA0.99410.99110.99570.99330.8794
MRMR0.61480.55370.54240.53820.4975
LIHCSPSA0.92910.95750.85000.88960.8496
RelChaNet0.85040.92120.68340.67840.7694
ReliefF0.77160.88490.51670.46720.6493
GA0.85040.92120.68340.67840.7268
MI0.92910.95750.85000.88960.8195
SA0.81440.77880.71480.73550.6595
MRMR0.91750.90330.88270.89220.8023
LUSCSPSA0.89750.94850.52780.52550.8843
RelChaNet0.89150.44580.50000.47130.7804
ReliefF0.89150.44580.50000.47130.7956
GA0.89750.94850.52780.52550.8384
MI0.89150.70570.59760.62440.7304
SA0.89280.89180.87220.88050.7368
MRMR0.93970.90100.84360.86900.8069
PRADSPSA0.89750.94650.64580.69760.8495
RelChaNet0.89750.94650.64580.69760.7794
ReliefF0.86140.93030.52080.50250.7496
GA0.85540.42770.50000.46100.7295
MI0.86140.93030.52080.50250.7857
SA0.91560.88910.74300.79200.7677
MRMR0.91560.88910.74300.79200.7677
STADSPSA0.94070.94550.86570.89860.8495
RelChaNet0.88140.90010.71760.76520.7945
ReliefF0.78510.65210.62960.63850.6495
GA0.74810.57890.56480.56930.6829
MI0.83700.75110.68980.71180.7937
SA0.91560.83700.65990.79200.7677
MRMR0.91560.88910.74300.79200.7677
THCASPSA0.96470.96920.95020.95890.8821
RelChaNet0.92940.95280.89090.91400.8547
ReliefF0.95290.93650.96520.94810.8367
GA0.95290.96090.93200.94460.8735
MI0.96470.96920.95020.95890.8628
SA0.90960.85650.60770.64960.7983
MRMR0.89750.82360.71510.75300.7649
UCECSPSA0.97170.97180.96980.97080.8936
RelChaNet0.93220.94000.92190.92870.8195
ReliefF0.90390.91490.88970.89820.8287
GA0.97170.97180.96980.97080.8472
MI0.96610.96500.96500.96500.8295
SA0.81440.77880.71480.73550.6595
MRMR0.88800.86540.76390.80000.7695
Table A15. SVM with 15% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA0.97450.48730.50000.49350.9295
RelChaNet0.97450.48730.50000.49350.8596
ReliefF0.97450.48730.50000.49350.8976
GA0.97450.48730.50000.49350.8597
MI0.98080.99040.62500.69510.9056
SA0.61480.55370.54240.53820.4975
MRMR0.92120.93440.84480.87910.8265
HNSCSPSA0.95880.97920.61110.67120.8612
RelChaNet0.95880.97920.61110.67120.8296
ReliefF0.94700.47350.50000.48640.8905
GA0.94700.47350.50000.48640.8295
MI0.95880.97920.61110.67120.8691
SA0.99410.99110.99570.99330.8794
MRMR0.77310.70950.69880.70360.6106
KICHSPSA0.89280.92860.85000.87330.8154
RelChaNet0.71420.68750.66670.67250.5896
ReliefF0.75000.86000.65000.64940.6289
GA0.71420.68750.66670.67250.5986
MI0.75000.86000.65000.64940.6384
SA0.81440.77880.71480.73550.6595
MRMR0.89750.82360.71510.75300.7649
KIRPSPSA0.74220.86980.51920.46220.6891
RelChaNet0.73710.61790.50960.44240.6086
ReliefF0.73190.36600.50000.42260.6057
GA0.73710.61790.50960.44240.6296
MI0.74220.86980.51920.46220.6285
SA0.99410.99110.99570.99330.8756
MRMR0.92160.95810.72920.79240.8186
LIHCSPSA0.92910.95750.85000.88960.8528
RelChaNet0.88970.93780.76670.80670.7934
ReliefF0.85030.91810.68330.72370.7493
GA0.81100.90150.60000.59550.6967
MI0.77160.88490.51670.46720.6239
SA0.90360.87120.70130.75080.7697
MRMR0.89150.84870.65960.70510.7694
LUSCSPSA0.89750.94850.52780.52550.8175
RelChaNet0.89150.70570.59760.62440.8640
ReliefF0.89150.44580.50000.47130.7834
GA0.89150.70570.59760.62440.7594
MI0.89750.94850.52780.52550.7495
SA0.87570.91270.84930.86350.7596
MRMR0.97640.98790.77780.85100.8436
PRADSPSA0.89750.94650.64580.69760.7854
RelChaNet0.87950.93830.58330.61000.7478
ReliefF0.86140.93030.52080.50250.7964
GA0.89750.94650.64580.69760.7854
MI0.86740.77080.57630.59690.7395
SA0.99410.99110.99570.99330.8756
MRMR0.75140.85140.69860.69700.6396
STADSPSA0.88800.93900.72220.77520.7593
RelChaNet0.80740.90300.51850.48200.6743
ReliefF0.83700.75110.68980.71180.7937
GA0.95550.93060.93060.93060.8789
MI0.88140.90010.71760.76520.7397
SA0.98820.98660.98660.98660.8594
MRMR0.81440.77880.71480.73550.6926
THCASPSA0.97640.96930.97790.97340.8689
RelChaNet0.95290.96090.93200.94460.8327
ReliefF0.97050.95830.97830.96710.8526
GA0.97640.96930.97790.97340.8417
MI0.92940.94490.89570.91500.8075
SA0.92160.95810.72920.79240.8064
MRMR0.90550.92360.81150.85060.7567
UCECSPSA0.97170.97180.96980.97080.8893
RelChaNet0.95480.95770.94930.95300.8388
ReliefF0.92650.93220.91710.92300.8495
GA0.95480.95770.94930.95300.8563
MI0.97170.97180.96980.97080.8354
SA0.99410.99110.99570.99330.8794
MRMR0.98230.99090.83330.89540.8567
Table A16. XGBoost with 5% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00001.0000
RelChaNet0.99360.99680.87500.92690.9256
ReliefF1.00001.00001.00001.00000.9862
GA1.00001.00001.00001.00001.0000
MI1.00001.00001.00001.00000.9589
SA0.97050.83020.93200.87310.8678
MRMR0.95550.92160.94440.93240.9166
HNSCSPSA0.99410.99690.94440.96900.9586
RelChaNet0.98230.93130.88580.90710.8932
ReliefF0.98820.99390.88890.93440.9273
GA0.98230.93130.88580.90710.8596
MI0.98820.99390.88890.93440.9285
SA0.95550.92160.94440.93240.8923
MRMR0.92910.89850.90760.90290.8450
KICHSPSA1.00001.00001.00001.00001.0000
RelChaNet0.92850.95000.90000.91810.8495
ReliefF1.00001.00001.00001.00001.0000
GA0.92850.95000.90000.91810.8175
MI1.00001.00001.00001.00000.9486
SA0.94810.92400.91200.91780.8843
MRMR0.91520.91200.91360.91270.8462
KIRPSPSA0.96900.96610.95450.96010.8365
RelChaNet0.92780.91220.90200.90690.7947
ReliefF0.94840.93920.92820.93350.8672
GA0.92780.91220.90200.90690.7648
MI0.95870.95860.93530.94610.8725
SA0.93370.87030.85740.86370.8782
MRMR0.88650.84990.87380.86040.8256
LIHCSPSA0.95270.94410.92300.93300.8496
RelChaNet0.94480.92790.91790.92270.8185
ReliefF0.91330.87270.89730.88390.8564
GA0.94480.91990.92940.92450.8395
MI0.91330.87270.89730.88390.7963
SA0.91520.91200.91360.91270.8086
MRMR0.89760.85220.87540.86280.8349
LUSCSPSA0.93970.92120.74660.80500.8946
RelChaNet0.93970.92120.74660.80500.8175
ReliefF0.93970.92120.74660.80500.8305
GA0.93970.92120.74660.80500.8175
MI0.95780.97740.80560.86780.8283
SA0.96470.80450.92890.85420.9063
MRMR0.96470.97520.94550.95840.9600
PRADSPSA0.97590.96690.93400.94950.8826
RelChaNet0.95180.92950.86800.89520.8284
ReliefF0.96980.96240.91310.93570.8594
GA0.97590.98630.91670.94760.8485
MI0.97590.96690.93400.94950.8593
SA0.88130.88150.87250.87620.7943
MRMR0.95550.92160.94440.93240.9166
STADSPSA0.98510.97690.97690.97690.9286
RelChaNet0.95550.94180.91670.92850.8391
ReliefF0.96290.96110.92130.93960.8475
GA0.95550.93060.93060.93060.8238
MI0.94810.92400.91200.91780.8645
SA0.89260.88870.89030.88950.8519
MRMR0.91560.78190.78190.78190.8280
THCASPSA1.00001.00001.00001.00000.9893
RelChaNet0.65920.61430.58620.58530.5973
ReliefF0.98230.98210.97750.97970.8647
GA0.98230.98210.97750.97970.8465
MI0.98230.97780.98220.97990.8392
SA0.92770.83080.76430.79250.7829
MRMR0.96470.81190.87650.84060.8130
UCECSPSA0.97170.97420.96780.97070.8891
RelChaNet0.94350.94870.93560.94090.8592
ReliefF0.93220.93400.92600.92940.8197
GA0.94350.94870.93560.94090.8625
MI0.97170.97420.96780.97070.8939
SA0.94810.92400.91200.91780.8823
MRMR0.95290.95030.94150.94570.9411
Table A17. XGBoost with 10% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00000.9735
RelChaNet0.99360.99680.87500.92690.9624
ReliefF1.00001.00001.00001.00000.9625
GA1.00001.00001.00001.00000.9382
MI1.00001.00001.00001.00000.9483
SA0.93970.87820.87820.87820.8538
MRMR0.96470.80450.92890.85420.8826
HNSCSPSA0.98820.99390.88890.93440.9629
RelChaNet0.98230.93130.88580.90710.9074
ReliefF0.98820.99390.88890.93440.9627
GA0.98230.93130.88580.90710.8825
MI0.98230.93130.88580.90710.8836
SA0.89760.85220.87540.86280.8349
MRMR0.88130.88150.87250.87620.7943
KICHSPSA1.00001.00001.00001.00001.0000
RelChaNet0.92850.95000.90000.91810.8829
ReliefF1.00001.00001.00001.00000.9572
GA1.00001.00001.00001.00001.0000
MI1.00001.00001.00001.00001.0000
SA0.92770.83080.76430.79250.8459
MRMR0.93370.84350.79200.81490.6261
KIRPSPSA0.94840.93920.92820.93350.8296
RelChaNet0.93810.93090.90900.91920.8173
ReliefF0.93810.93090.90900.91920.8243
GA0.94840.92370.95260.93660.8395
MI0.96900.96610.95450.96010.8293
SA0.92850.91670.94440.92510.9444
MRMR0.96470.95970.95970.95970.8746
LIHCSPSA0.96060.95020.93970.94480.8394
RelChaNet0.95270.93450.93450.93450.8439
ReliefF0.96060.95020.93970.94480.8475
GA0.96060.95020.93970.94480.8158
MI0.94480.91990.92940.92450.8385
SA0.95550.92160.94440.93240.9166
MRMR0.94810.92400.91200.91780.8823
LUSCSPSA0.95780.97740.80560.86780.8720
RelChaNet0.94570.90020.79880.83990.8285
ReliefF0.93370.88010.74320.79220.8147
GA0.94570.92870.77440.83000.8294
MI0.94570.90020.79880.83990.8167
SA0.89760.85220.87540.86280.8175
MRMR0.92160.83870.85040.84440.8711
PRADSPSA0.98190.98970.93750.96140.8794
RelChaNet0.96980.98300.89580.93320.8294
ReliefF0.97590.98630.91670.94760.8837
GA0.96980.96240.91310.93570.8836
MI0.96980.96240.91310.93570.8276
SA0.94810.92400.91200.91780.8843
MRMR1.00001.00001.00001.00001.0000
STADSPSA0.97770.97160.95830.96480.8936
RelChaNet0.97030.96640.93980.95240.8594
ReliefF0.94810.92400.91200.91780.8278
GA0.95550.94180.91670.92850.8187
MI0.97030.96640.93980.95240.8468
SA0.93370.84350.79200.81490.6261
MRMR1.00001.00001.00001.00001.0000
THCASPSA0.98230.97780.98220.97990.8946
RelChaNet0.98230.98210.97750.97970.8629
ReliefF0.97640.97770.96840.97290.8593
GA0.98230.98210.97750.97970.8583
MI0.98230.98210.97750.97970.8654
SA0.88650.84990.87380.86040.8256
MRMR0.94570.94370.82980.87480.8849
UCECSPSA0.95480.95770.94930.95300.9381
RelChaNet0.93220.93180.92800.92980.8754
ReliefF0.95480.95530.95130.95320.8574
GA0.95480.95770.94930.95300.8729
MI0.93780.93870.93280.93550.8692
SA0.88650.84990.87380.86040.8256
MRMR0.95550.92160.94440.93240.9166
Table A18. XGBoost with 15% feature selection.
Dataset | Feature Selection Method | Accuracy | Precision | Recall | F1 Score | Balanced Accuracy
COADSPSA1.00001.00001.00001.00001.0000
RelChaNet0.99360.99680.87500.92690.8745
ReliefF1.00001.00001.00001.00001.0000
GA1.00001.00001.00001.00000.9018
MI1.00001.00001.00001.00000.9721
SA0.92160.83870.85040.84440.8711
MRMR0.94810.92400.91200.91780.8720
HNSCSPSA0.98820.99390.88890.93440.9385
RelChaNet0.98230.93130.88580.90710.8195
ReliefF0.98820.99390.88890.93440.9656
GA0.99410.99690.94440.96900.9057
MI0.98820.99390.88890.93440.8593
SA0.96470.80450.92890.85420.8826
MRMR0.93370.87030.85740.86370.8782
KICHSPSA1.00001.00001.00001.00000.9591
RelChaNet0.92850.95000.90000.91810.8926
ReliefF1.00001.00001.00001.00000.9364
GA0.92850.95000.90000.91810.9053
MI1.00001.00001.00001.00000.9582
SA0.99360.99680.87500.92690.9500
MRMR0.95550.92160.94440.93240.9351
KIRPSPSA0.96900.96610.95450.96010.8794
RelChaNet0.94840.92370.95260.93660.8793
ReliefF0.95870.95860.93530.94610.8729
GA0.94840.93920.92820.93350.8462
MI0.96900.96610.95450.96010.8521
SA0.96470.80450.92890.85420.8742
MRMR0.88700.88210.88750.88430.7560
LIHCSPSA0.96060.95020.93970.94480.8295
RelChaNet0.94480.91990.92940.92450.8517
ReliefF0.95270.94410.92300.93300.8294
GA0.95270.94410.92300.93300.8275
MI0.96060.95020.93970.94480.8395
SA0.88700.88210.88750.88430.7540
MRMR0.96470.80450.92890.85420.9257
LUSCSPSA0.96380.98050.83330.89010.8493
RelChaNet0.94570.92870.77440.83000.8592
ReliefF0.94570.97130.75000.81860.8692
GA0.93370.88010.74320.79220.7849
MI0.93370.88010.74320.79220.8692
SA1.00001.00001.00001.00001.0000
MRMR0.97640.98790.77780.85100.9289
PRADSPSA0.97590.98630.91670.94760.8639
RelChaNet0.96980.98300.89580.93320.8196
ReliefF0.96980.96240.91310.93570.8526
GA0.95180.92950.86800.89520.8683
MI0.96980.96240.91310.93570.8481
SA0.90960.76510.82730.79130.8374
MRMR0.91520.91200.91360.91270.8462
STADSPSA0.97030.96640.93980.95240.8605
RelChaNet0.95550.93060.93060.93060.8364
ReliefF0.97030.96640.93980.95240.8493
GA0.98510.97690.97690.97690.8296
MI0.97030.96640.93980.95240.8468
SA0.92160.83870.85040.84440.8711
MRMR0.93970.87820.87820.87820.8644
THCASPSA0.98230.98210.97750.97970.9528
RelChaNet0.97050.97340.95930.96590.8504
ReliefF0.98230.97780.98220.97990.8749
GA0.65920.61430.58620.58530.5295
MI0.97050.97340.95930.96590.8939
SA0.82470.78030.75840.76790.7920
MRMR0.91520.91200.91360.91270.8462
UCECSPSA0.94910.95050.94450.94720.9017
RelChaNet0.93780.93870.93280.93550.8429
ReliefF0.94910.94840.94650.94740.8826
GA0.93220.93400.92600.92940.8597
MI0.94350.94870.93560.94090.8197
SA1.00001.00001.00001.00000.9385
MRMR0.98720.99350.75000.83010.8949

References

  1. Elshawi, R.; Sakr, S.; Talia, D.; Trunfio, P. Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service. Big Data Res. 2018, 14, 1–11. [Google Scholar] [CrossRef]
  2. Rawat, D.B.; Doku, R.; Garuba, M. Cybersecurity in Big Data Era: From Securing Big Data to Data-Driven Security. IEEE Trans. Serv. Comput. 2021, 14, 2055–2072. [Google Scholar] [CrossRef]
  3. Parsajoo, M.; Armaghani, D.J.; Asteris, P.G. A precise neuro-fuzzy model enhanced by artificial bee colony techniques for assessment of rock brittleness index. Neural Comput. Appl. 2021, 34, 3263–3281. [Google Scholar] [CrossRef]
  4. Hu, H.; Feng, D.; Yang, F. A Promising Nonlinear Dimensionality Reduction Method: Kernel-Based Within Class Collaborative Preserving Discriminant Projection. IEEE Signal Process. Lett. 2020, 27, 2034–2038. [Google Scholar] [CrossRef]
  5. Parsajoo, M.; Mohammed, A.; Yagiz, S.; Armaghani, D.; Khandelwal, M. An evolutionary adaptive neuro-fuzzy inference system for estimating field penetration index of tunnel boring machine in rock mass. J. Rock Mech. Geotech. Eng. 2021, 13, 1290–1299. [Google Scholar] [CrossRef]
  6. Wang, J.; Wang, L.; Nie, F.; Li, X. Joint Feature Selection and Extraction With Sparse Unsupervised Projection. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 3071–3081. [Google Scholar] [CrossRef]
  7. Wang, S.; Gao, C.; Zhang, Q.; Dakulagi, V.; Zeng, H.; Zheng, G.; Bai, J.; Song, Y.; Cai, J.; Zong, B. Research and Experiment of Radar Signal Support Vector Clustering Sorting Based on Feature Extraction and Feature Selection. IEEE Access 2020, 8, 93322–93334. [Google Scholar] [CrossRef]
  8. Li, J.; Chen, J.; Qi, F.; Dan, T.; Weng, W.; Zhang, B.; Yuan, H.; Cai, H.; Zhong, C. Two-Dimensional Unsupervised Feature Selection via Sparse Feature Filter. IEEE Trans. Cybern. 2023, 53, 5605–5617. [Google Scholar] [CrossRef]
  9. Thejas, G.S.; Garg, R.; Iyengar, S.S.; Sunitha, N.R.; Badrinath, P.; Chennupati, S. Metric and Accuracy Ranked Feature Inclusion: Hybrids of Filter and Wrapper Feature Selection Approaches. IEEE Access 2021, 9, 128687–128701. [Google Scholar] [CrossRef]
  10. Mandal, A.K.; Nadim, M.; Saha, H.; Sultana, T.; Hossain, M.D.; Huh, E.N. Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search. IEEE Access 2024, 12, 62341–62357. [Google Scholar] [CrossRef]
  11. Yoshino, E.; Juarto, B.; Kurniadi, F.I. Hybrid Machine Learning Model for Breast Cancer Classification with K-Means Clustering Feature Selection Techniques. In Proceedings of the 2023 International Seminar on Application for Technology of Information and Communication (iSemantic), Semarang, Indonesia, 16–17 September 2023; pp. 182–186. [Google Scholar]
  12. Nethala, T.R.; Sahoo, B.K.; Srinivasulu, P. GECC-Net: Gene Expression-Based Cancer Classification using Hybrid Fuzzy Ranking Network with Multi-kernel SVM. In Proceedings of the 2022 International Conference on Industry 4.0 Technology (I4Tech), Pune, India, 23–24 September 2022; pp. 1–6. [Google Scholar]
  13. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324. [Google Scholar] [CrossRef]
  14. Miller, A. Subset Selection in Regression; Chapman and Hall/CRC: Boca Raton, FL, USA, 2002. [Google Scholar]
  15. Reunanen, J. Overfitting in making comparisons between variable selection methods. J. Mach. Learn. Res. 2003, 3, 1371–1382. [Google Scholar]
  16. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  17. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175. [Google Scholar] [CrossRef]
  18. Kursa, M.B.; Rudnicki, W.R. Feature Selection with the Boruta Package. J. Stat. Softw. 2010, 36, 1–13. [Google Scholar] [CrossRef]
  19. Degenhardt, F.; Seifert, S.; Szymczak, S. Evaluation of variable selection methods for random forests and omics data sets. Briefings Bioinform. 2017, 20, 492–503. [Google Scholar] [CrossRef] [PubMed]
  20. Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101. [Google Scholar] [CrossRef] [PubMed]
  21. Nilsson, R.; Peña, J.M.; Björkegren, J.; Tegnér, J. Consistent Feature Selection for Pattern Recognition in Polynomial Time. J. Mach. Learn. Res. 2007, 8, 589–612. [Google Scholar]
  22. Granitto, P.M.; Furlanello, C.; Biasioli, F.; Gasperi, F. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemom. Intell. Lab. Syst. 2006, 83, 83–90. [Google Scholar] [CrossRef]
  23. Meinshausen, N.; Bühlmann, P. Stability Selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 2010, 72, 417–473. [Google Scholar] [CrossRef]
  24. Austin, P.C.; Tu, J.V. Bootstrap Methods for Developing Predictive Models. Am. Stat. 2004, 58, 131–137. [Google Scholar] [CrossRef]
  25. Kartini, D.; Mazdadi, M.I.; Budiman, I.; Indriani, F.; Hidayat, R. Binary PSO-GWO for Feature Selection in Binary Classification Using K-Nearest Neighbor. In Proceedings of the 2023 Eighth International Conference on Informatics and Computing (ICIC), Manado, Indonesia, 8–9 December 2023; pp. 1–6. [Google Scholar]
  26. Ludwig, S.A. Guided Particle Swarm Optimization for Feature Selection: Application to Cancer Genome Data. Algorithms 2025, 18, 220. [Google Scholar] [CrossRef]
  27. Singh, S.N.; Mishra, S.; Satapathy, S.K.; Cho, S.B.; Mallick, P.K. Efficient Feature Selection Techniques for Accurate Cancer Analysis Using Krill Heard Optimization. In Proceedings of the 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC), Bandung, Indonesia, 17–18 September 2024; pp. 41–45. [Google Scholar]
  28. Singh, U.K.; Rout, M. Genetic Algorithm based Feature Selection to Enhance Breast Cancer Classification. In Proceedings of the 2023 IEEE International Conference on Contemporary Computing and Communications (InC4), Bangalore, India, 21–22 April 2023; Volume 1, pp. 1–5. [Google Scholar]
  29. Touchanti, K.; Ezzazi, I.; Bekkali, M.E.; Maser, S. A 2-stages feature selection framework for colon cancer classification using SVM. In Proceedings of the 2022 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 18–20 May 2022; pp. 1–5. [Google Scholar]
  30. Sachdeva, R.K.; Bathla, P.; Rani, P.; Kukreja, V.; Ahuja, R. A Systematic Method for Breast Cancer Classification using RFE Feature Selection. In Proceedings of the 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 28–29 April 2022; pp. 1673–1676. [Google Scholar]
  31. Mohiuddin, T.; Naznin, S.; Upama, P.B. Classification and Performance Analysis of Cancer Microarrays Using Relevant Genes. In Proceedings of the 2021 5th International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka, Bangladesh, 18–20 November 2021; pp. 1–6. [Google Scholar]
  32. Si, C.; Zhao, L.; Liu, J. Deep Feature Selection Algorithm for Classification of Gastric Cancer Subtypes. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; pp. 698–705. [Google Scholar]
  33. Spall, J. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control 1992, 37, 332–341. [Google Scholar] [CrossRef]
  34. Spall, J.C.; Chin, D.C. Traffic-responsive signal timing for system-wide traffic control. Transp. Res. Part C Emerg. Technol. 1997, 5, 153–163. [Google Scholar] [CrossRef]
  35. Maeda, Y. Real-time control and learning using neuro-controller via simultaneous perturbation for flexible arm system. In Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301), Anchorage, AK, USA, 8–10 May 2002; Volume 4, pp. 2583–2588. [Google Scholar]
  36. Johannsen, D.A.; Wegman, E.J.; Solka, J.L.; Priebe, C.E. Simultaneous Selection of Features and Metric for Optimal Nearest Neighbor Classification. Commun. Stat.—Theory Methods 2004, 33, 2137–2157. [Google Scholar] [CrossRef]
  37. Aksakalli, V.; Malekipirbazari, M. Feature selection via binary simultaneous perturbation stochastic approximation. Pattern Recognit. Lett. 2016, 75, 41–47. [Google Scholar] [CrossRef]
  38. Yenice, Z.; Adhikari, N.; Wong, Y.; Aksakalli, V.; Taskin, A.; Abbasi, B. SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking. arXiv 2018, arXiv:1804.05589. [Google Scholar] [CrossRef]
  39. Aksakalli, V.; Yenice, Z.D.; Malekipirbazari, M.; Kargar, K. Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains. Comput. Oper. Res. 2021, 132, 105334. [Google Scholar] [CrossRef]
  40. Algin, R.; Alkaya, A.F.; Agaoglu, M. Performance of Simultaneous Perturbation Stochastic Approximation for Feature Selection. In Proceedings of the Intelligent and Fuzzy Systems, Bornova, Turkey, 19–21 July 2022; pp. 348–354. [Google Scholar]
  41. Akman, D.V.; Malekipirbazari, M.; Yenice, Z.D.; Yeo, A.; Adhikari, N.; Wong, Y.K.; Abbasi, B.; Gumus, A.T. k-best feature selection and ranking via stochastic approximation. Expert Syst. Appl. 2023, 213, 118864. [Google Scholar] [CrossRef]
  42. Spall, J.C. Introduction to Stochastic Search and Optimization, 1st ed.; Wiley-Interscience: New York, NY, USA, 2003. [Google Scholar]
  43. Barzilai, J.; Borwein, J.M. Two-Point Step Size Gradient Methods. IMA J. Numer. Anal. 1988, 8, 141–148. [Google Scholar] [CrossRef]
  44. Raydan, M. The Barzilai and Borwein Gradient Method for the Large Scale Unconstrained Minimization Problem. SIAM J. Optim. 1997, 7, 26–33. [Google Scholar] [CrossRef]
  45. Molina, B.; Raydan, M. Preconditioned Barzilai-Borwein method for the numerical solution of partial differential equations. Numer. Algorithms 1996, 13, 45–60. [Google Scholar] [CrossRef]
  46. Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 2002, 22, 1–10. [Google Scholar] [CrossRef]
  47. Sietsma, J.; Dow, R.J.F. Neural net pruning-why and how. In Proceedings of the IEEE 1988 International Conference on Neural Networks, San Diego, CA, USA, 24–27 July 1988; Volume 1, pp. 325–333. [Google Scholar] [CrossRef]
  48. Zimmer, F. RelChaNet: Neural Network Feature Selection using Relative Change Scores. arXiv 2024, arXiv:2410.02344. [Google Scholar]
  49. Fan, H.; Xue, L.; Song, Y.; Li, M. A repetitive feature selection method based on improved ReliefF for missing data. Appl. Intell. 2022, 52, 16265–16280. [Google Scholar] [CrossRef]
  50. Kira, K.; Rendell, L. The feature selection problem: Traditional methods and a new algorithm. In Proceedings of the AAAI-92: Proceedings of the 10th National Conference on Artificial Intelligence, San Jose, CA, USA, 12–16 July 1992; Swartout, W., Ed.; AAAI Press: Menlo Park, CA, USA; The MIT Press: Cambridge, MA, USA, August 1992; pp. 129–134. [Google Scholar]
  51. Kononenko, I.; Šimec, E.; Robnik-Šikonja, M. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Appl. Intell. 1997, 7, 39–55. [Google Scholar] [CrossRef]
  52. Halim, Z.; Yousaf, M.N.; Waqas, M.; Sulaiman, M.; Abbas, G.; Hussain, M.; Ahmad, I.; Hanif, M. An effective genetic algorithm-based feature selection method for intrusion detection systems. Comput. Secur. 2021, 110, 102448. [Google Scholar] [CrossRef]
  53. Macedo, F.; Valadas, R.; Carrasquinha, E.; Oliveira, M.R.; Pacheco, A. Feature selection using Decomposed Mutual Information Maximization. Neurocomputing 2022, 513, 215–232. [Google Scholar] [CrossRef]
  54. Sulaiman, M.A.; Labadin, J. Feature selection based on mutual information. In Proceedings of the 2015 9th International Conference on IT in Asia (CITA), Kuching, Malaysia, 4–5 August 2015; pp. 1–6. [Google Scholar]
  55. Peng, H.; Long, F.; Ding, C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
  56. Syaiful, A.; Sartono, B.; Afendi, F.M.; Anisa, R.; Salim, A. Feature Selection using Simulated Annealing with Optimal Neighborhood Approach. J. Phys. Conf. Ser. 2021, 1752, 012030. [Google Scholar] [CrossRef]
  57. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. In Proceedings of the 2003 IEEE Bioinformatics Conference, Stanford, CA, USA, 11–14 August 2003; pp. 523–528. [Google Scholar] [CrossRef]
  58. Seeja, G.; Doss, A.S.A.; Hency, V.B. A Novel Approach for Disaster Victim Detection Under Debris Environments Using Decision Tree Algorithms with Deep Learning Features. IEEE Access 2023, 11, 54760–54772. [Google Scholar] [CrossRef]
  59. El Zein, Y.; Lemay, M.; Huguenin, K. PrivaTree: Collaborative Privacy-Preserving Training of Decision Trees on Biomedical Data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2024, 21, 1–13. [Google Scholar] [CrossRef] [PubMed]
  60. Fix, E.; Hodges, J.L., Jr. Discriminatory Analysis—Nonparametric Discrimination: Consistency Properties; Technical Report; California Univ Berkeley: Berkeley, CA, USA, 1951. [Google Scholar]
  61. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  62. Xing, W.; Bei, Y. Medical Health Big Data Classification Based on KNN Classification Algorithm. IEEE Access 2020, 8, 28808–28819. [Google Scholar] [CrossRef]
  63. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; KDD ’16. pp. 785–794. [Google Scholar]
  64. Ashenden, S.K. (Ed.) The Era of Artificial Intelligence, Machine Learning, and Data Science in the Pharmaceutical Industry; Elsevier: Amsterdam, The Netherlands, 2021. [Google Scholar]
  65. Li, T.H.S.; Chiu, H.J.; Kuo, P.H. Hepatitis C Virus Detection Model by Using Random Forest, Logistic-Regression and ABC Algorithm. IEEE Access 2022, 10, 91045–91058. [Google Scholar] [CrossRef]
  66. Teddy, M.; Muhammad, A.; Media, A. Crime Index Based on Text Mining on Social Media using Multi Classifier Neural-Net Algorithm. TELKOMNIKA Telecommun. Comput. Electron. Control 2022, 20, 570–579. [Google Scholar]
  67. García-Gonzalo, E.; Fernández-Muñiz, Z.; García Nieto, P.J.; Bernardo Sánchez, A.; Menéndez Fernández, M. Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers. Materials 2016, 9, 531. [Google Scholar] [CrossRef]
  68. Asaly, S.; Gottlieb, L.-A.; Reuveni, Y. Using Support Vector Machine (SVM) and Ionospheric Total Electron Content (TEC) Data for Solar Flare Predictions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1469–1481. [Google Scholar] [CrossRef]
  69. Shamreen Ahamed, B. LGBM Classifier based Technique for Predicting Type-2 Diabetes. Eur. J. Mol. Clin. Med. 2021, 8, 454–467. [Google Scholar]
  70. Algarni, A.; Ahmad, Z.; Alaa Ala’Anzy, M. An Edge Computing-Based and Threat Behavior-Aware Smart Prioritization Framework for Cybersecurity Intrusion Detection and Prevention of IEDs in Smart Grids with Integration of Modified LGBM and One Class-SVM Models. IEEE Access 2024, 12, 104948–104963. [Google Scholar] [CrossRef]
  71. Alzamzami, F.; Hoda, M.; Saddik, A.E. Light Gradient Boosting Machine for General Sentiment Classification on Short Texts: A Comparative Evaluation. IEEE Access 2020, 8, 101840–101858. [Google Scholar] [CrossRef]
  72. Weinstein, J.; Collisson, E.; Mills, G.; Shaw, K.; Ozenberger, B.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013, 45, 1113–1120. [Google Scholar] [CrossRef]
  73. Yin, H. Enhancing Ionospheric Radar Returns Classification with Feature Engineering-Based Light Gradient Boosting Machine Algorithm. In Proceedings of the 2023 3rd International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Wuhan, China, 15–17 December 2023; pp. 528–532. [Google Scholar]
Figure 1. Overview of the proposed methodology workflow.
Figure 2. RelChaNet relative change scores calculation illustration [48].
Figure 3. Support Vector Machine (SVM) data classification [67].
Figure 4. Heat map table representing Balanced Accuracy scores for all the classifiers on 5% Feature Subset.
Figure 5. Representation of DT's Accuracy scores as a violin plot.
Figure 6. Representation of KNN's Accuracy scores as a violin plot.
Figure 7. Representation of LGBM's Accuracy scores as a violin plot.
Figure 8. Representation of LR's Accuracy scores as a violin plot.
Figure 9. Representation of SVM's Accuracy scores as a violin plot.
Figure 10. Representation of XGB's Accuracy scores as a violin plot.
Table 1. Hyperparameters for SPSA.
Hyperparameter | Values
Number of iterations (max_iters) | 500
Initial step size (a) | 0.01
Perturbation size (c) | 0.05
Learning rate decay | 0.602 (alpha), 0.101 (gamma)
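To connect these settings to the SPSA update itself, the sketch below forms the gain sequences from a, c, alpha, and gamma in the standard Spall form and applies one perturbation step to a feature-importance vector; the stability constant A and the toy loss are our assumptions, not details taken from the paper.

```python
import numpy as np

# Gain sequences in the standard SPSA form (Spall, 1992); the paper lists
# a = 0.01, c = 0.05, alpha = 0.602, gamma = 0.101, max_iters = 500.
a, c, alpha, gamma, max_iters = 0.01, 0.05, 0.602, 0.101, 500
A = 0.1 * max_iters  # stability constant, an assumption rather than a value from the paper

def spsa_step(w, loss, k, rng):
    """One SPSA iteration on an importance vector w, given a scalar loss(w) callable."""
    a_k = a / (k + 1 + A) ** alpha          # step size at iteration k
    c_k = c / (k + 1) ** gamma              # perturbation size at iteration k
    delta = rng.choice([-1.0, 1.0], size=w.shape)   # Rademacher perturbation
    g_hat = (loss(w + c_k * delta) - loss(w - c_k * delta)) / (2 * c_k * delta)
    return w - a_k * g_hat

rng = np.random.default_rng(0)
w = np.full(20, 0.5)                        # toy importance vector for 20 features
w = spsa_step(w, lambda v: np.sum((v - 0.2) ** 2), k=0, rng=rng)
```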
Table 2. Hyperparameters for RelChaNet.
Hyperparameter | Values
Number of layers | 2
Hidden units per layer | 50
Learning rate | 0.01
Batch size | 128
Number of epochs | 200
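The settings above describe a small multilayer perceptron. The PyTorch sketch below builds only that backbone with placeholder input and output sizes; the relative-change scoring that RelChaNet [48] computes on top of the network is not reproduced here.

```python
import torch
from torch import nn

n_features, n_classes = 1000, 2   # placeholders; set to the dataset at hand

# Two hidden layers of 50 units, matching Table 2. RelChaNet's feature scores
# would be derived from changes in the first-layer weights during training.
net = nn.Sequential(
    nn.Linear(n_features, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, n_classes),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)   # learning rate 0.01
criterion = nn.CrossEntropyLoss()
# Training would then loop for 200 epochs over mini-batches of size 128.
```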
Table 3. Hyperparameters for ReliefF.
Hyperparameter | Values
Number of neighbors (k) | 20
Sample size | full dataset
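As a usage illustration only, a ReliefF ranking with k = 20 neighbors over the full dataset could be obtained as follows; the skrebate package and the top-5% cutoff are our assumptions about tooling, not the authors' implementation.

```python
import numpy as np
from skrebate import ReliefF   # assumed implementation choice

X = np.random.rand(100, 200)               # toy data: 100 samples, 200 features
y = np.random.randint(0, 2, size=100)

top_k = int(0.05 * X.shape[1])             # e.g., keep the top 5% of features
relieff = ReliefF(n_neighbors=20, n_features_to_select=top_k)
relieff.fit(X, y)                          # scores every feature on the full data
selected = relieff.top_features_[:top_k]   # indices of the highest-ranked features
X_reduced = X[:, selected]
```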
Table 4. Hyperparameters for the Genetic Algorithm.
Hyperparameter | Values
Population size | 50
Number of generations | 50
Crossover rate | 0.5
Mutation rate | 0.01
Selection method | Tournament Selection
Total Iterations | 20
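To make these settings concrete, the following is a minimal sketch of a GA wrapper over binary feature masks with tournament selection, a 0.5 crossover rate, and a 0.01 mutation rate; the cross-validated fitness, the tournament size of 3, and the toy data are assumptions, and the 20 total iterations (possibly independent restarts) are not modeled.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 60)); y = rng.integers(0, 2, 100)   # toy data

def fitness(mask):                 # assumed fitness: mean 3-fold CV accuracy
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(DecisionTreeClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, (50, X.shape[1]))               # population size 50
for _ in range(50):                                      # 50 generations
    scores = np.array([fitness(ind) for ind in pop])
    idx = rng.integers(0, len(pop), (len(pop), 3))       # tournament selection (size 3 assumed)
    parents = pop[idx[np.arange(len(pop)), scores[idx].argmax(axis=1)]]
    mates = parents[rng.permutation(len(parents))]
    cross = rng.random(pop.shape) < 0.5                  # uniform crossover, rate 0.5
    children = np.where(cross, parents, mates)
    children ^= (rng.random(pop.shape) < 0.01).astype(children.dtype)   # bit-flip mutation, rate 0.01
    pop = children
best_mask = pop[np.array([fitness(ind) for ind in pop]).argmax()]
```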
Table 5. Hyperparameters for Mutual Information.
Hyperparameter | Values
Number of bins (discretization) | 10
Estimator type | K-Nearest Neighbor
k-neighbors | 10
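For orientation, the snippet below ranks features with scikit-learn's k-nearest-neighbor MI estimator using k = 10; scikit-learn does not expose the 10-bin discretization listed above, so that step is omitted here (an assumption about tooling rather than the authors' code).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X = np.random.rand(100, 200); y = np.random.randint(0, 2, 100)   # toy data

top_k = int(0.10 * X.shape[1])   # e.g., the 10% threshold used in the study design
selector = SelectKBest(
    score_func=lambda X, y: mutual_info_classif(X, y, n_neighbors=10),  # k-NN MI estimator, k = 10
    k=top_k,
)
X_reduced = selector.fit_transform(X, y)
ranked = np.argsort(selector.scores_)[::-1]   # feature indices ordered by MI score
```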
Table 6. Hyperparameters for Simulated Annealing.
Hyperparameter | Values
Initial temperature | 300
Cooling schedule | Exponential decay
Number of iterations | 1000
Neighborhood size | 5
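A minimal sketch of such an annealing search over feature subsets, assuming a cross-validated fitness and a 0.995 cooling factor, is given below; the move operator flips 5 mask bits per iteration to match the neighborhood size above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 60)); y = rng.integers(0, 2, 100)          # toy data

def score(mask):                                                # assumed fitness: 3-fold CV accuracy
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(DecisionTreeClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()

mask = rng.integers(0, 2, X.shape[1])
cur = score(mask)
best, best_score, temp = mask.copy(), cur, 300.0                # initial temperature 300
for _ in range(1000):                                           # 1000 iterations
    cand = mask.copy()
    cand[rng.choice(X.shape[1], size=5, replace=False)] ^= 1    # flip 5 bits (neighborhood size)
    cand_score = score(cand)
    if cand_score > cur or rng.random() < np.exp((cand_score - cur) / temp):
        mask, cur = cand, cand_score
        if cur > best_score:
            best, best_score = mask.copy(), cur
    temp *= 0.995                                               # exponential cooling (assumed factor)
```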
Table 7. Hyperparameters for MRMR.
Hyperparameter | Values
Number of features to select | 5%, 10%, 15% of total features
Feature scoring function | Mutual Information (MI)
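Since only the selection size and the MI scoring function are specified, the sketch below shows one common greedy MRMR variant (relevance minus mean redundancy); the difference criterion and the use of scikit-learn's MI estimators for the redundancy term are our assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    """Greedy MRMR: maximize MI(feature; label) minus mean MI with already-selected features."""
    relevance = mutual_info_classif(X, y)              # MI between each feature and the label
    selected = [int(np.argmax(relevance))]
    redundancy = np.zeros(X.shape[1])
    while len(selected) < k:
        last = selected[-1]
        redundancy += mutual_info_regression(X, X[:, last])   # MI with the newest pick
        score = relevance - redundancy / len(selected)
        score[selected] = -np.inf                      # never re-select a feature
        selected.append(int(np.argmax(score)))
    return selected

X = np.random.rand(100, 200); y = np.random.randint(0, 2, 100)   # toy data
top = mrmr(X, y, k=int(0.05 * X.shape[1]))             # e.g., the 5% threshold
```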
Table 8. Hyperparameters for Decision Tree.
Hyperparameter | Values
max_depth | 20
min_samples_split | 10
min_samples_leaf | 5
criterion | gini
max_features | sqrt
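Expressed as scikit-learn arguments (an assumed library choice), Table 8 corresponds roughly to the following:

```python
from sklearn.tree import DecisionTreeClassifier

# Values taken from Table 8; random_state is added only for reproducibility.
dt = DecisionTreeClassifier(
    max_depth=20,
    min_samples_split=10,
    min_samples_leaf=5,
    criterion="gini",
    max_features="sqrt",
    random_state=0,
)
# dt.fit(X_train, y_train); dt.predict(X_test)
```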
Table 9. Hyperparameters for K-Nearest Neighbors.
Hyperparameter | Values
n_neighbors | 10
weights | distance
metric | minkowski (p = 2 default)
leaf_size | 20
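The equivalent scikit-learn configuration (assumed tooling) would look like this:

```python
from sklearn.neighbors import KNeighborsClassifier

# Values from Table 9; minkowski distance with p = 2 is the Euclidean distance.
knn = KNeighborsClassifier(
    n_neighbors=10,
    weights="distance",
    metric="minkowski",
    p=2,
    leaf_size=20,
)
```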
Table 10. Hyperparameters for Extreme Gradient Boosting (XGB).
Hyperparameter | Value
Booster | gbtree
Learning rate | 0.1
Maximum depth | 6
Number of estimators | 100
Subsample | 0.8
Column sample by tree | 0.8
Gamma | 0
Minimum child weight | 1
Regularization parameter (L2) | 1
Regularization parameter (L1) | 0
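Mapped onto the scikit-learn style XGBoost wrapper (an assumed interface), Table 10 reads roughly as follows:

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(
    booster="gbtree",
    learning_rate=0.1,
    max_depth=6,
    n_estimators=100,
    subsample=0.8,
    colsample_bytree=0.8,   # "column sample by tree"
    gamma=0,
    min_child_weight=1,
    reg_lambda=1,           # L2 regularization
    reg_alpha=0,            # L1 regularization
)
```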
Table 11. Hyperparameters for Logistic Regression.
Hyperparameter | Values
penalty | lasso
C | 5
solver | lbfgs
max_iter | 200
class_weight | balanced
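One caveat when reproducing Table 11 in scikit-learn: an L1 (lasso) penalty is not supported by the lbfgs solver, so the sketch below substitutes saga; treat that substitution, and the library choice itself, as our assumptions.

```python
from sklearn.linear_model import LogisticRegression

# Values from Table 11; scikit-learn spells the lasso penalty "l1", and an L1
# penalty requires a solver such as "saga" or "liblinear" (lbfgs supports only l2).
lr = LogisticRegression(
    penalty="l1",
    C=5,
    solver="saga",
    max_iter=200,
    class_weight="balanced",
)
```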
Table 12. Hyperparameters for Support Vector Machine.
Hyperparameter | Values
C | 5
kernel | rbf
gamma | scale
class_weight | balanced
max_iter | 1
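In scikit-learn terms (assumed tooling), Table 12 maps onto SVC roughly as below; the max_iter entry is printed as 1 in the table, and the sketch uses the library default of -1 (no iteration cap) on the assumption that a minus sign was lost.

```python
from sklearn.svm import SVC

svm = SVC(
    C=5,
    kernel="rbf",
    gamma="scale",
    class_weight="balanced",
    max_iter=-1,   # no iteration cap; Table 12 prints "1", possibly a dropped minus sign
)
```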
Table 13. Hyperparameters for LightGBM classifier.
Hyperparameter | Values
num_leaves | 128
max_depth | −1
learning_rate | 0.01
n_estimators | 100
min_data_in_leaf | 50
feature_fraction | 1.0
bagging_fraction | 1.0
bagging_freq | 1
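For the LightGBM settings, the native parameter names in Table 13 map onto the scikit-learn wrapper as shown below; the wrapper choice is our assumption, while the name mapping follows standard LightGBM usage.

```python
from lightgbm import LGBMClassifier

lgbm = LGBMClassifier(
    num_leaves=128,
    max_depth=-1,            # -1 means no depth limit
    learning_rate=0.01,
    n_estimators=100,
    min_child_samples=50,    # native name: min_data_in_leaf
    colsample_bytree=1.0,    # native name: feature_fraction
    subsample=1.0,           # native name: bagging_fraction
    subsample_freq=1,        # native name: bagging_freq
)
```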
Table 14. Summary of Cancer datasets.
Dataset | Samples | Features | Class 1 | Class 2
COAD | 522 | 37,677 | 509 | 13
HNSC | 564 | 35,958 | 535 | 29
KICH | 91 | 43,806 | 60 | 31
KIRP | 322 | 44,874 | 236 | 86
LIHC | 421 | 35,924 | 322 | 99
LUSC | 553 | 44,894 | 494 | 59
PRAD | 553 | 44,824 | 472 | 81
STAD | 448 | 44,878 | 358 | 80
THCA | 564 | 36,120 | 380 | 184
UCEC | 588 | 36,086 | 345 | 243
Table 15. Average total runtime (in seconds) for feature selection algorithms and classifiers, averaged across all ten datasets.
Task | Feature Selection Algorithm | Total Runtime (in Seconds)
Feature Selection Algorithms | SPSA | 2108
Feature Selection Algorithms | RelChaNet | 2529
Feature Selection Algorithms | ReliefF | 2646
Feature Selection Algorithms | GA | 2123
Feature Selection Algorithms | MI | 1963
Feature Selection Algorithms | SA | 4986
Feature Selection Algorithms | MRMR | 7395
Feature Selection Algorithm + Classifiers | SPSA | 3609
Feature Selection Algorithm + Classifiers | RelChaNet | 4050
Feature Selection Algorithm + Classifiers | ReliefF | 4171
Feature Selection Algorithm + Classifiers | GA | 3643
Feature Selection Algorithm + Classifiers | MI | 3460
Feature Selection Algorithm + Classifiers | SA | 8496
Feature Selection Algorithm + Classifiers | MRMR | 11,974
Table 16. p-values for each pairwise comparison of means for DT classifier.
Method | SPSA | RelChaNet | ReliefF | GA | MI | SA | MRMR
SPSA | 1.000000 | 0.255021 | 0.830311 | 0.003006 | 0.001625 | 0.000173 | 0.679535
RelChaNet | 0.255021 | 1.000000 | 0.967552 | 0.744477 | 0.645473 | 0.309844 | 0.994057
ReliefF | 0.830311 | 0.967552 | 1.000000 | 0.184945 | 0.129608 | 0.031322 | 0.999975
GA | 0.003006 | 0.744477 | 0.184945 | 1.000000 | 0.999999 | 0.994057 | 0.309844
MI | 0.001625 | 0.645473 | 0.129608 | 0.999999 | 1.000000 | 0.998617 | 0.230008
SA | 0.000173 | 0.309844 | 0.031322 | 0.994057 | 0.998617 | 1.000000 | 0.066581
MRMR | 0.679535 | 0.994057 | 0.999975 | 0.309844 | 0.230008 | 0.066581 | 1.000000
Table 17. p-values for each pairwise comparison of means for KNN classifier.
Method | SPSA | RelChaNet | ReliefF | GA | MI | SA | MRMR
SPSA | 1.000000 | 0.042787 | 0.877672 | 0.000051 | 0.000439 | 0.005411 | 0.009475
RelChaNet | 0.042787 | 1.000000 | 0.575532 | 0.610712 | 0.877672 | 0.996166 | 0.999241
ReliefF | 0.877672 | 0.575532 | 1.000000 | 0.009475 | 0.042787 | 0.206646 | 0.281653
GA | 0.000051 | 0.610712 | 0.009475 | 1.000000 | 0.999241 | 0.932179 | 0.877672
MI | 0.000439 | 0.877672 | 0.042787 | 0.999241 | 1.000000 | 0.996166 | 0.987221
SA | 0.005411 | 0.996166 | 0.206646 | 0.932179 | 0.996166 | 1.000000 | 0.999999
MRMR | 0.009475 | 0.999241 | 0.281653 | 0.877672 | 0.987221 | 0.999999 | 1.000000
Table 18. p-values for each pairwise comparison of means for LGBM classifier.
Method | SPSA | RelChaNet | ReliefF | GA | MI | SA | MRMR
SPSA | 1.000000 | 0.679535 | 0.957723 | 0.001625 | 0.022624 | 0.000014 | 0.000024
RelChaNet | 0.679535 | 1.000000 | 0.996166 | 0.230008 | 0.679535 | 0.013551 | 0.019133
ReliefF | 0.957723 | 0.996166 | 1.000000 | 0.049754 | 0.281653 | 0.001316 | 0.002001
GA | 0.001625 | 0.230008 | 0.049754 | 1.000000 | 0.991137 | 0.945976 | 0.967552
MI | 0.022624 | 0.679535 | 0.281653 | 0.991137 | 1.000000 | 0.575532 | 0.645473
SA | 0.000014 | 0.013551 | 0.001316 | 0.945976 | 0.575532 | 1.000000 | 1.000000
MRMR | 0.000024 | 0.019133 | 0.002001 | 0.967552 | 0.645473 | 1.000000 | 1.000000
Table 19. p-values for each pairwise comparison of means for the LR classifier.
Method | SPSA | RelChaNet | ReliefF | GA | MI | SA | MRMR
SPSA | 1.000000 | 0.436086 | 0.230008 | 0.000051 | 0.000856 | 0.002001 | 0.049754
RelChaNet | 0.436086 | 1.000000 | 0.999823 | 0.087853 | 0.339510 | 0.470241 | 0.957723
ReliefF | 0.230008 | 0.999823 | 1.000000 | 0.206646 | 0.575532 | 0.712624 | 0.996166
GA | 0.000051 | 0.087853 | 0.206646 | 1.000000 | 0.996166 | 0.982115 | 0.575532
MI | 0.000856 | 0.339510 | 0.575532 | 0.996166 | 1.000000 | 0.999993 | 0.916230
SA | 0.002001 | 0.470241 | 0.712624 | 0.982115 | 0.999993 | 1.000000 | 0.967552
MRMR | 0.049754 | 0.957723 | 0.996166 | 0.575532 | 0.916230 | 0.967552 | 1.000000
Table 20. p-values for each pairwise comparison of means for SVM classifier.
Method | SPSA | RelChaNet | ReliefF | GA | MI | SA | MRMR
SPSA | 1.000000 | 0.470241 | 0.087853 | 0.000439 | 0.000439 | 0.001625 | 0.022624
RelChaNet | 0.470241 | 1.000000 | 0.982115 | 0.230008 | 0.230008 | 0.402786 | 0.855064
ReliefF | 0.087853 | 0.982115 | 1.000000 | 0.744477 | 0.744477 | 0.898068 | 0.999241
GA | 0.000439 | 0.230008 | 0.744477 | 1.000000 | 1.000000 | 0.999928 | 0.945976
MI | 0.000439 | 0.230008 | 0.744477 | 1.000000 | 1.000000 | 0.999928 | 0.945976
SA | 0.001625 | 0.402786 | 0.898068 | 0.999928 | 0.999928 | 1.000000 | 0.991137
MRMR | 0.022624 | 0.855064 | 0.999241 | 0.945976 | 0.945976 | 0.991137 | 1.000000
Table 21. p-values for each pairwise comparison of means for the XGB classifier.
Method | SPSA | RelChaNet | ReliefF | GA | MI | SA | MRMR
SPSA | 1.000000 | 0.339510 | 0.945976 | 0.000220 | 0.006543 | 0.000220 | 0.004462
RelChaNet | 0.339510 | 1.000000 | 0.932179 | 0.255021 | 0.774849 | 0.255021 | 0.712624
ReliefF | 0.945976 | 0.932179 | 1.000000 | 0.013551 | 0.146463 | 0.013551 | 0.114268
GA | 0.000220 | 0.255021 | 0.013551 | 1.000000 | 0.982115 | 1.000000 | 0.991137
MI | 0.006543 | 0.774849 | 0.146463 | 0.982115 | 1.000000 | 0.982115 | 1.000000
SA | 0.000220 | 0.255021 | 0.013551 | 1.000000 | 0.982115 | 1.000000 | 0.991137
MRMR | 0.004462 | 0.712624 | 0.114268 | 0.991137 | 1.000000 | 0.991137 | 1.000000
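For readers who want to reproduce tables of this form, pairwise p-values over the per-dataset scores of the seven feature selection methods can be generated with a Tukey-HSD-style procedure; the statsmodels call and the toy scores below are our assumptions about tooling, not a statement of the authors' exact test.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Toy example: accuracy scores of 7 methods over 10 datasets (random placeholders).
methods = ["SPSA", "RelChaNet", "ReliefF", "GA", "MI", "SA", "MRMR"]
rng = np.random.default_rng(0)
scores = pd.DataFrame({m: rng.uniform(0.7, 1.0, 10) for m in methods})

long = scores.melt(var_name="method", value_name="accuracy")
result = pairwise_tukeyhsd(endog=long["accuracy"], groups=long["method"], alpha=0.05)
print(result.summary())   # adjusted p-value for every method pair
```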