Article

MIC-SSO: A Two-Stage Hybrid Feature Selection Approach for Tabular Data

1 Integration and Collaboration Laboratory, Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu 300, Taiwan
2 Department of Industrial and Systems Engineering, College of Electrical Engineering and Computer Science, Chung Yuan Christian University, Taoyuan 320, Taiwan
3 School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China
4 Department of International Logistics and Transportation Management, Kainan University, Taoyuan 338, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2026, 15(4), 856; https://doi.org/10.3390/electronics15040856
Submission received: 18 October 2025 / Revised: 10 February 2026 / Accepted: 13 February 2026 / Published: 18 February 2026
(This article belongs to the Special Issue Feature Papers in Networks: 2025–2026 Edition)

Abstract

High-dimensional structured datasets are common in fields such as semiconductor manufacturing, healthcare, and finance, where redundant and irrelevant features often increase computational cost and reduce predictive accuracy. Feature selection mitigates these issues by identifying a compact, informative subset of features, enhancing model efficiency, performance, and interpretability. This study proposes Maximal Information Coefficient–Simplified Swarm Optimization (MIC-SSO), a two-stage hybrid feature selection method that combines the MIC as a filter with SSO as a wrapper. In Stage 1, MIC ranks feature relevance and removes low-contribution features; in Stage 2, SSO searches the reduced feature space for an optimal subset using a fitness function that integrates the Matthews Correlation Coefficient (MCC) and the feature reduction rate to balance accuracy and compactness. Experiments on five public datasets compare MIC-SSO with multiple hybrid, heuristic, and literature-reported methods, with results showing superior predictive accuracy and feature compression. Statistical tests further confirm significant improvements over competing approaches, demonstrating the method’s effectiveness in integrating the efficiency of filters with the precision of wrappers for high-dimensional tabular data analysis.

1. Introduction

Tabular data, common in healthcare, finance, and manufacturing [1,2,3], have grown in scale and complexity with digitalization. High dimensionality and heterogeneous features often introduce redundancy and irrelevant attributes, increasing computational cost, reducing generalization, and lowering predictive accuracy. Efficient feature selection is therefore essential to improve both classification performance and interpretability.
Conventional feature selection methods are categorized as filter, wrapper, or embedded. Filters are fast but ignore feature interactions; wrappers consider interactions and often improve performance at a higher computational cost; embedded methods integrate selection into training but depend on the model architecture. Each approach has benefits and limitations [4,5].
To balance speed and precision, hybrid methods combine filters and wrappers [6]. This paper presents MIC-SSO, a two-stage approach: MIC filters out low-relevance features, and Simplified Swarm Optimization (SSO) selects an optimal subset. This framework enhances predictive accuracy and feature compression, offering robust, scalable performance on high-dimensional datasets. Selected features are evaluated using TabPFN v2, a pre-trained tabular data model, with a fitness function that balances Matthews Correlation Coefficient (MCC) and feature reduction.
This study proposes MIC-SSO, a two-stage hybrid feature selection framework for tabular data:
  • Stage 1—MIC Filtering: Features are ranked based on the MIC, a non-parametric measure capable of capturing linear, nonlinear, and complex dependencies with the target variable. Candidate subsets are generated by progressively retaining the top-ranked features.
  • Stage 2—SSO: The SSO algorithm performs wrapper-based refinement, selecting a compact subset that balances predictive accuracy and feature reduction.
TabPFN serves as a fixed evaluator for both feature selection and final classification. Pretrained on synthetic tasks for tabular data, it delivers strong performance without fine-tuning and does not directly influence the optimization process beyond evaluation.
TabPFN evaluates candidate feature subsets by predicting based on selected features. The fitness function uses TabPFN to compute MCC and feature reduction, balancing accuracy and compactness. Its high generalization, fast evaluation, and ability to handle missing values make it a reliable and efficient evaluator across diverse tabular datasets.
Experiments on five benchmark datasets show that MIC-SSO outperforms baseline and existing hybrid methods, achieving competitive accuracy with compact feature sets. Statistical analyses confirm its robustness and effectiveness.
The proposed MIC-SSO method stands out for its innovative integration of a filter–wrapper hybrid approach to feature selection, particularly in the context of high-dimensional datasets. While existing hybrid methods often combine information-based filters with swarm or evolutionary algorithms [7,8], MIC-SSO introduces several novel aspects that distinguish it from other approaches:
  • A two-stage hybrid feature selection framework that combines MIC-based filtering with SSO-based wrapper optimization, enabling efficient and scalable feature selection for high-dimensional tabular data.
  • A balanced fitness function that jointly optimizes classification performance and feature compactness using MCC and feature reduction ratio.
  • The integration of TabPFN as a fixed, pretrained evaluator, providing reliable and efficient performance estimation during feature selection.
  • Extensive experimental validation on public benchmark datasets, demonstrating that MIC-SSO achieves competitive accuracy with significantly reduced feature subsets compared to existing hybrid methods.
In summary, MIC-SSO’s novelty lies in its two-stage framework, balanced fitness function, and MIC–SSO combination, making it a more effective and computationally efficient approach for high-dimensional data than existing hybrid methods.
MIC-SSO combines MIC filtering with SSO wrapping in a two-stage process and uses TabPFN as a fixed evaluator, leveraging its pre-trained network for efficient, high-performance assessment. This integration improves scalability, accuracy, and compactness over traditional hybrid methods.
The rest of this paper is organized as follows. Section 2 reviews related feature selection methods and tabular data classifiers. Section 3 introduces the proposed MIC-SSO framework. Section 4 presents the experimental setup and a comparative analysis of the proposed method with several baseline methods on multiple tabular datasets. Section 5 concludes the paper with a summary of findings.

2. Literature Review

Feature selection (FS) has been a central topic in machine learning research, especially in the era of high-dimensional data, where the number of features may exceed the number of samples by orders of magnitude. FS aims to identify a reduced subset of informative features that improves model generalization, reduces computational cost, and enhances interpretability. Over the past decades, FS methodologies have evolved from basic statistical ranking to sophisticated hybrid frameworks that balance computational efficiency with search optimality.

2.1. Filter, Wrapper, and Embedded Methods

The earliest FS research focused on categorizing methods into filter, wrapper, and embedded approaches, laying the conceptual foundation for the field:
Filter methods assess each feature independently of any predictive model by using statistical or information-theoretic criteria (e.g., correlation, Chi-square, mutual information). They are computationally efficient and widely adopted in large-scale applications, but they often fail to capture complex feature interactions or redundancy [9,10]. Examples include ReliefF, information gain, and variance thresholding.
Wrapper methods incorporate a predictive model as a black-box evaluator of feature subsets. By directly optimizing classification performance through repeated model training, wrappers can capture interactions between features, but at the cost of significantly higher computational expense [9,11]. Sequential methods such as forward/backward selection and recursive feature elimination (RFE) are common wrapper strategies.
Embedded methods integrate feature selection into the model training process itself. Techniques such as L1 regularization or tree-based importance measures (e.g., random forests) perform feature selection as part of model fitting, balancing efficiency and relevance [9]. Embedded approaches often outperform filters on predictive quality but are tightly coupled to specific model architectures.
These three paradigms provided a conceptual taxonomy that has persisted in the literature and served as the basis for understanding more advanced FS strategies.

2.2. Emergence of Hybrid Feature Selection

While the classical categorization helped structure early research, it soon became clear that each category has intrinsic limitations: filters may miss feature interactions, wrappers are computationally expensive, and embedded methods lack generality across classifiers. This motivated the development of hybrid feature selection, which aims to combine the complementary strengths of different strategies.
Hybrid FS methods typically follow a multi-stage procedure:
  • A filter or embedded step is used to rapidly eliminate irrelevant features and reduce the search space.
  • A wrapper or optimization step then refines the reduced feature set to maximize predictive performance.
The hybrid paradigm not only improves computational efficiency but also enhances the ability to identify informative feature subsets that would be missed by single-strategy methods alone. Early hybrid models incorporated simple filter rankings with greedy search or wrapper evaluation [12]. With the increasing adoption of evolutionary computation, more advanced hybrid frameworks emerged, combining filter pre-processing with metaheuristic optimization (e.g., Genetic Algorithms, Particle Swarm Optimization) to more effectively explore complex feature spaces [13,14].

2.3. Hybrid Metaheuristic Approaches

The integration of evolutionary and swarm-based optimization into hybrid FS represents a significant evolution in the field. Metaheuristic frameworks balance global search and local refinement, enabling the exploration of combinatorial feature subsets without exhaustive enumeration. Representative research includes the following:
  • Genetic Algorithm (GA) hybrids, where initial feature rankings provided by filters seed the population for GA-based search [15]. These approaches benefit from GA’s global search capabilities and can escape local optima that greedy wrappers may encounter.
  • Particle Swarm Optimization (PSO) and variants, which have been hybridized with filters to balance exploration and exploitation in the search space [16].
These hybrid metaheuristic approaches illustrate a key trend in feature selection research: the use of heuristic and population-based search strategies to efficiently identify feature subsets that maximize predictive performance while leveraging preliminary filtering to guide the search [17].

2.4. Maximal Information Coefficient (MIC)

The Maximal Information Coefficient (MIC), proposed by [18], is a nonparametric statistic designed to quantify the dependence between two variables. The core idea is that if a relationship exists between two variables X and Y, it can be captured by partitioning the scatterplot into grids of varying resolutions. The mutual information is then computed based on the frequency distribution of data points across grid cells.
Formally, MIC is defined as Equation (1):
$$\mathrm{MIC}(X, Y) = \max_{x \cdot y < B(n)} \frac{I^{*}(x, y)}{\log_2 \min(x, y)}$$
where $I^{*}(x, y)$ is the maximum mutual information over all $x \times y$ grids, and $B(n)$ is a function of the sample size $n$ that controls the maximum grid resolution.
Unlike traditional correlation coefficients, MIC can detect linear, nonlinear, and periodic relationships without assuming any specific functional form. Owing to its generality and robustness, MIC has been widely used in fields such as genomics, epidemiology, and high-dimensional biomedical analysis [19]. In this study, MIC serves as the filtering criterion in the first stage of the proposed hybrid feature selection framework to retain features most relevant to the target variable.
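As a rough illustration of Equation (1), the grid-based statistic can be approximated in a few lines of Python. This sketch searches only equal-frequency grids, whereas the full MINE procedure of Reshef et al. also optimizes grid placement, so it is a simplified approximation rather than a reference implementation; the function names are illustrative.

```python
import numpy as np

def grid_mi(x, y, nx, ny):
    """Mutual information (bits) of an nx-by-ny equal-frequency grid."""
    xb = np.searchsorted(np.quantile(x, np.linspace(0, 1, nx + 1)[1:-1]), x)
    yb = np.searchsorted(np.quantile(y, np.linspace(0, 1, ny + 1)[1:-1]), y)
    joint = np.zeros((nx, ny))
    for i, j in zip(xb, yb):
        joint[i, j] += 1
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def mic(x, y):
    """Approximate MIC: maximize normalized MI over grids with nx*ny <= B(n)."""
    n = len(x)
    B = max(4, int(n ** 0.6))  # common choice B(n) = n^0.6
    best = 0.0
    for nx in range(2, B + 1):
        for ny in range(2, B // nx + 1):
            best = max(best, grid_mi(x, y, nx, ny) / np.log2(min(nx, ny)))
    return best
```

For a perfectly monotone relationship the approximation approaches 1, while independent variables score near 0, mirroring the behavior of the exact statistic.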

2.5. Simplified Swarm Optimization (SSO)

Simplified Swarm Optimization (SSO), proposed by Yeh [20], is a population-based stochastic algorithm characterized by its simplicity, fast convergence, and low parameter dependency. Unlike conventional metaheuristics that rely on velocity updates (e.g., PSO) or genetic operations (e.g., GA), SSO uses a probabilistic rule-based mechanism to update solutions directly.
The update of each variable is determined by a uniformly sampled random number and a set of predefined thresholds. Depending on the interval into which the random number falls, the new value is copied from the current solution, personal best, global best, or generated randomly within bounds. The update rule is defined as Equation (2):
$$x_{i,j}^{t+1} = \begin{cases} gbest_j, & \rho \in [0,\, C_g) \\ pbest_{i,j}, & \rho \in [C_g,\, C_p) \\ x_{i,j}^{t}, & \rho \in [C_p,\, C_w) \\ x, & \rho \in [C_w,\, 1] \end{cases}$$
where $C_g = c_g$, $C_p = C_g + c_p$, and $C_w = C_p + c_w$. Here, $x_{i,j}^{t}$ denotes the $j$-th variable of the $i$-th solution at generation $t$, $\rho$ is a uniform random number in $[0, 1]$, and $x$ is a value generated randomly within bounds. The parameters $c_g$, $c_p$, and $c_w$ define the probabilities of adopting the value from the global best, the personal best, and the current solution, respectively; with the remaining probability $1 - C_w$, the variable is regenerated randomly.
This simple yet effective mechanism enables SSO to balance exploration and exploitation without complex operations [21,22]. In this study, SSO is used to refine the feature subset selected from the first-stage filter.
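A minimal sketch of the Equation (2) update rule for binary-encoded solutions might look as follows. The threshold values match the cumulative probabilities $C_g$, $C_p$, $C_w$ used later in the experiments; the function name is illustrative.

```python
import random

def sso_update(x, pbest, gbest, Cg=0.4, Cp=0.6, Cw=0.9):
    """One SSO update of a binary solution vector per Equation (2)."""
    new = []
    for j in range(len(x)):
        rho = random.random()
        if rho < Cg:            # copy from the global best
            new.append(gbest[j])
        elif rho < Cp:          # copy from the personal best
            new.append(pbest[j])
        elif rho < Cw:          # keep the current value
            new.append(x[j])
        else:                   # random exploration
            new.append(random.randint(0, 1))
    return new
```

Each variable is resolved independently, which is what makes the update cheap compared with velocity-based schemes such as PSO.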

2.6. Positioning the Proposed MIC-SSO Framework

The proposed MIC-SSO framework builds directly on the evolutionary trajectory of hybrid FS by addressing several of the open challenges identified above. Unlike traditional hybrids that rely on simple relevance measures, our framework employs the Maximal Information Coefficient (MIC), a nonparametric measure capable of capturing linear, nonlinear, and complex associations [23]. This allows the filtering stage to adaptively retain features with diverse types of dependencies, mitigating the risk of overlooking informative variables.
Following the MIC-based filtering stage, a lightweight optimization algorithm—Simplified Swarm Optimization (SSO)—efficiently explores the reduced search space. This approach balances global and local search without the computational overhead of heavier metaheuristics. Importantly, the framework incorporates an explicit composite fitness function that balances classification performance and feature compactness, ensuring that the final subset meets practical needs for efficiency and interpretability.
In summary, the MIC-SSO hybrid framework represents a conceptually driven evolution of hybrid feature selection: it integrates advanced relevance measurement, efficient search optimization, and explicit performance-compactness trade-offs into a coherent and scalable hybrid paradigm.
We have clarified the following specific research gaps that motivate the integration of MIC and SSO:
1. Limited capability of conventional filter methods to capture complex feature dependencies
Most hybrid feature selection frameworks rely on simple relevance measures such as correlation, mutual information, or ReliefF in the filtering stage. While computationally efficient, these measures primarily capture linear or monotonic relationships and may overlook nonlinear or complex dependencies between features and the target variable. This gap motivates the use of the Maximal Information Coefficient (MIC), which is capable of detecting diverse dependency structures without assuming a predefined functional form.
2. High computational cost of wrapper-based metaheuristics in high-dimensional spaces
Many existing hybrid metaheuristic approaches apply optimization algorithms such as GA or PSO directly to high-dimensional feature spaces, leading to slow convergence and poor scalability. There is a lack of lightweight optimization strategies that can effectively exploit a reduced feature space without introducing excessive computational overhead. This gap motivates the adoption of Simplified Swarm Optimization (SSO), which offers efficient search behavior with minimal parameter tuning.
3. Insufficient attention to balanced optimization objectives in hybrid FS
Existing hybrid methods often emphasize either predictive performance or feature reduction, but do not explicitly enforce a trade-off between the two objectives. As a result, selected feature subsets may be accurate but unnecessarily large, or compact but suboptimal in predictive performance. The proposed MIC-SSO framework addresses this gap by incorporating an explicit composite fitness function that balances classification accuracy and feature compactness.
4. Lack of unified hybrid frameworks designed for robust performance across diverse datasets
Many hybrid FS methods are tailored to specific datasets or classifiers and may not generalize well across domains. There is a need for a principled hybrid framework that combines advanced relevance estimation with efficient optimization to achieve consistent performance across varying data characteristics. The proposed MIC-SSO framework is designed to address this gap by integrating MIC-based filtering with SSO-based refinement in a classifier-agnostic manner.
To overcome these challenges, our framework integrates MIC as the filtering stage with SSO as the wrapper stage. The integration is scientifically motivated rather than purely procedural:
  • Capturing complex feature-target relationships (MIC): MIC is a nonparametric statistic capable of detecting linear, nonlinear, and periodic associations between features and the target variable. By using MIC to rank and retain the most informative features, the framework ensures that relevant features—including those involved in complex, high-order dependencies—are preserved for further optimization.
  • Efficient combinatorial search (SSO): While MIC identifies individually relevant features, it does not account for interactions among features that collectively enhance predictive performance. SSO, a lightweight population-based metaheuristic, explores the reduced feature space efficiently, balancing global exploration and local refinement. Its simple update mechanism reduces computational overhead compared with conventional metaheuristics, making it scalable for high-dimensional datasets.
  • Scientific synergy of MIC and SSO: The combination of MIC and SSO is conceptually synergistic. MIC ensures the candidate pool contains highly informative features, while SSO optimizes the joint selection of feature subsets that maximize classification performance. This integration addresses two fundamental FS challenges: identifying robustly relevant features and efficiently searching combinatorial feature spaces. As a result, MIC-SSO maintains a balance between predictive accuracy and feature compactness, ensuring robust and scalable performance across diverse tabular datasets.
In summary, the MIC-SSO framework represents a scientifically grounded evolution of hybrid FS. By leveraging a nonparametric measure for relevance and a lightweight swarm optimizer for subset selection, it improves the reliability, interpretability, and scalability of feature selection compared with traditional single-strategy or computationally heavy hybrid methods.

2.7. TabPFN Classifier

TabPFN is a pretrained transformer-based model specifically developed for tabular data classification. Proposed by Hollmann et al. [24,25], it performs approximate Bayesian inference in a single forward pass without requiring gradient-based training on the target dataset. Trained on a large number of synthetic tasks generated from structural causal models (SCMs), TabPFN demonstrates strong generalization across diverse data distributions.
The model leverages a set-to-set attention mechanism to condition predictions on the entire training set, enabling it to capture complex feature dependencies. It supports regression tasks, handles missing values, and achieves competitive accuracy with substantially reduced inference time.
In this study, TabPFN is adopted as the evaluation model for both the feature selection process and the final classification performance. Its high efficiency and strong generalization ability make it well-suited for repeated evaluation across candidate feature subsets.
Table 1 summarizes these observations, followed by a discussion clarifying how the proposed MIC-SSO framework addresses several of the identified gaps.

3. Proposed Method

The MIC-SSO hybrid approach addresses high dimensionality and redundancy in tabular data by combining a filter stage for rapid feature reduction with a wrapper stage for refined search, balancing efficiency and effectiveness.
As shown in Figure 1, the method first filters for the most informative features, then uses heuristic optimization to refine subsets, balancing predictive accuracy and compactness for an effective, interpretable feature set.
Stage 1: MIC-Based Filtering
In Stage 1, MIC measures each feature’s relevance to the target, ranking features by score. Multiple subsets are generated by retaining the top 10–100% of features, and the subset with the best classification performance is selected for Stage 2.
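The Stage 1 procedure can be sketched as follows. Here `evaluate` is an assumed callback returning the cross-validated MCC of a candidate feature-index subset (in the paper's setting, a TabPFN evaluation); the function and parameter names are illustrative.

```python
import numpy as np

def stage1_filter(mic_scores, evaluate):
    """Rank features by MIC and pick the best top-r% retention (r = 10..100)."""
    order = np.argsort(mic_scores)[::-1]  # indices sorted by descending MIC
    d = len(mic_scores)
    best_subset, best_mcc = None, -np.inf
    for r in range(1, 11):                # retain top 10%, 20%, ..., 100%
        k = max(1, int(round(d * r / 10)))
        subset = order[:k].tolist()
        mcc = evaluate(subset)            # cross-validated MCC of the subset
        if mcc > best_mcc:
            best_subset, best_mcc = subset, mcc
    return best_subset, best_mcc
```

The returned subset becomes the candidate pool over which the Stage 2 wrapper search operates.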
Stage 2: Wrapper Optimization via SSO
In Stage 2, the candidate subset from Stage 1 is optimized using Simplified Swarm Optimization (SSO), a fast, population-based evolutionary algorithm, to identify a compact subset with high classification performance.
(1) Encoding Scheme
Each individual is a binary vector indicating selected features. The population is randomly initialized from the candidate subset, with the full Stage 1 subset explicitly included to improve search effectiveness.
(2) Fitness Function
A composite fitness function balances classification performance and feature compactness, with performance measured by MCC using stratified k-fold cross-validation. The fitness score F C f for a given subset C f is defined as follows (Equation (3)):
$$F(C_f) = \begin{cases} 0, & \mathrm{MCC}_{SKCV=k}(C_f) < \mathrm{MCC}_{SKCV=k}(C_F) \\[4pt] \alpha \cdot \dfrac{\mathrm{MCC}_{SKCV=k}(C_f) + 1}{2} + (1 - \alpha) \cdot \dfrac{\delta(F) - \delta(f)}{\delta(F)}, & \mathrm{MCC}_{SKCV=k}(C_f) \ge \mathrm{MCC}_{SKCV=k}(C_F) \end{cases}$$
where $C_F$ denotes the full feature set, $\delta(f)$ is the number of features selected in $C_f$, $\delta(F)$ is the total number of features in $C_F$, and $\alpha \in [0, 1]$ is a weighting parameter that controls the trade-off between classification performance and feature reduction. This formulation penalizes underperforming subsets while encouraging compact and accurate solutions.
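Equation (3) translates directly into code. A minimal sketch, assuming the MCC values are supplied by an external evaluator such as TabPFN:

```python
def fitness(mcc_subset, mcc_full, n_selected, n_total, alpha=0.7):
    """Composite fitness of Equation (3).

    Returns 0 when the subset underperforms the full feature set;
    otherwise blends rescaled MCC with the feature reduction rate.
    """
    if mcc_subset < mcc_full:
        return 0.0
    accuracy_term = (mcc_subset + 1) / 2          # rescale MCC from [-1,1] to [0,1]
    reduction_term = (n_total - n_selected) / n_total
    return alpha * accuracy_term + (1 - alpha) * reduction_term
```

With the experimental setting $\alpha = 0.7$, a subset reaching MCC 0.8 with 5 of 50 features scores $0.7 \cdot 0.9 + 0.3 \cdot 0.9 = 0.9$.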
(3) Update Mechanism
The SSO algorithm iteratively improves solutions based on a rule-driven update process guided by a random variable ρ 0 ,   1 . As illustrated in Figure 2, the optimization begins with population initialization, where the first individual is assigned the full candidate subset from Stage 1, and the remaining individuals are randomly sampled from the candidate subset.
In each generation, each solution is updated per Equation (2) and evaluated. Personal and global bests are updated if improvements occur.
The update rule defines how each variable is modified using different sources, such as the personal best, global best, current solution, or a newly generated random value. The selection depends on a predefined random threshold configuration controlled by parameters C g , C p , and C w , which governs the balance between exploitation and exploration. The process continues until the maximum number of iterations or solution evaluations is reached. The complete optimization procedure is depicted in Figure 2.
The stages of filtering, optimization, and evaluation are linked by the following pseudocode:
Algorithmic Pseudocode (MIC-SSO):
Input: Dataset D, full feature set F, classifier TabPFN, SSO parameters
Output: Optimal feature subset Copt
Stage 1: MIC-Based Filtering
1. Compute MIC scores for all features in F
2. Rank features in descending MIC order
3. For each retention ratio r ∈ {10%, 20%, ..., 100%}:
  • Select top-r features → subset Cr
  • Evaluate Cr using TabPFN → MCCr
4. Select Ccandidate = subset with the highest MCCr
Stage 2: SSO-Based Wrapper Optimization
5. Initialize population P with binary vectors drawn from Ccandidate
6. Evaluate the fitness F(Cf) of each individual using Equation (3)
7. For generation = 1 to MaxGen:
  • Update each individual according to the SSO rules (global best, personal best, current value, random exploration)
  • Retrain TabPFN for each individual → compute MCC → update fitness
  • Update personal and global best solutions if improvement occurs
8. Copt = global best solution at convergence
9. Return Copt
Mathematical Formulation of the MIC-SSO Framework
Let $D = \{(x_i, y_i)\}_{i=1}^{n}$ denote a dataset with $n$ instances and $d$ original features, where $x_i \in \mathbb{R}^d$ and $y_i$ is the corresponding class label. The goal of feature selection is to identify an optimal subset of features $C \subseteq F = \{1, 2, \ldots, d\}$ that maximizes classification performance while minimizing feature redundancy.
MIC-Based Filtering Stage
In the first stage, the relevance of each feature $f_j \in F$ to the target variable $y$ is quantified using the MIC:
$$\mathrm{MIC}(f_j, y), \quad j = 1, \ldots, d.$$
All features are ranked in descending order according to their MIC scores. Let $F_r \subseteq F$ denote a candidate subset obtained by retaining the top $r\%$ of ranked features, where $r \in \{10, 20, \ldots, 100\}$.
Each subset $F_r$ is evaluated using the classification model, and its performance is measured by the MCC. The subset achieving the highest MCC is selected as the candidate feature set:
$$C_0 = \arg\max_{F_r} \mathrm{MCC}(F_r)$$
Wrapper Optimization via SSO
The second stage formulates feature selection as a binary optimization problem over the candidate set $C_0$. Each solution is encoded as a binary vector
$$z = (z_1, z_2, \ldots, z_{|C_0|}), \quad z_k \in \{0, 1\},$$
where $z_k = 1$ indicates that the $k$-th feature in $C_0$ is selected.
The optimization objective is to maximize the fitness function:
$$\max_{z} F(C_f)$$
where $C_f \subseteq C_0$ is the feature subset represented by $z$, and the fitness function is defined in Equation (3). Subsets whose classification performance is lower than that of the full feature set are penalized to ensure robustness.
The SSO update mechanism is given in Equation (2).
The mathematical formulation defines MIC-SSO’s objectives, selection thresholds, and optimization dynamics, integrating MIC ranking with stochastic wrapper optimization to narrow the search space while preserving nonlinear feature–target dependencies. Explicit rules for subset generation, fitness evaluation, and SSO updates make the process reproducible, interpretable, and suitable for comparison, implementation, and future extension.
SSO is key in MIC-SSO’s second stage, efficiently refining the subset from MIC filtering. Its simple, fast-converging heuristic search balances dimensionality reduction and predictive accuracy, exploring feature combinations to maximize performance while keeping the subset compact. This makes MIC-SSO practical, scalable, and effective for high-dimensional classification.
A limitation of MIC-SSO is that fixed SSO hyperparameters and fitness trade-off may not suit all datasets. Although the two-stage design reduces the search space, repeated evaluations in the wrapper stage can increase computational cost, and using TabPFN means performance may vary with other classifiers.
TabPFN evaluates candidate subsets in both MIC-SSO stages. During MIC filtering, it trains on top-ranked feature subsets using stratified k-fold cross-validation, with MCC guiding the selection of the candidate set for SSO.
In the SSO stage, TabPFN evaluates each candidate subset by retraining with stratified k-fold cross-validation to compute MCC, combined with feature count in the fitness function. This controlled, repeatable evaluation reliably guides SSO toward compact, high-performing subsets.
The optimal feature subset in MIC-SSO is determined during the first-stage MIC filtering. Features are ranked by their MIC scores, and candidate subsets retaining the top 10–100% are evaluated using TabPFN with stratified k-fold cross-validation. The subset achieving the highest MCC is passed to the SSO stage, ensuring a reduced, informative feature space without arbitrary thresholds.
In the SSO-based wrapper stage, the candidate subset is refined via population-based search. Each individual is a binary vector over candidate features, initialized with the full subset and random solutions. SSO iteratively updates solutions using global best, personal best, and random perturbations (Cg, Cp, Cw), converging when the population stabilizes. The final solution yields a compact subset that maximizes classification performance while maintaining dimensionality reduction.
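The Stage 2 loop described above can be sketched end to end. This is an illustrative skeleton, not the authors' implementation: `evaluate_fitness` stands in for the Equation (3) computation with TabPFN, and the parameter defaults follow the experimental settings.

```python
import random

def stage2_sso(n_feats, evaluate_fitness, pop_size=30, generations=30,
               Cg=0.4, Cp=0.6, Cw=0.9):
    """SSO wrapper search over binary feature-selection vectors."""
    # First individual is the full Stage 1 candidate subset; the rest random.
    pop = [[1] * n_feats] + [[random.randint(0, 1) for _ in range(n_feats)]
                             for _ in range(pop_size - 1)]
    pbest = [p[:] for p in pop]
    pfit = [evaluate_fitness(p) for p in pop]
    g = max(range(pop_size), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(generations):
        for i in range(pop_size):
            new = []
            for j in range(n_feats):
                rho = random.random()
                if rho < Cg:
                    new.append(gbest[j])       # global best
                elif rho < Cp:
                    new.append(pbest[i][j])    # personal best
                elif rho < Cw:
                    new.append(pop[i][j])      # current value
                else:
                    new.append(random.randint(0, 1))  # random exploration
            pop[i] = new
            f = evaluate_fitness(new)
            if f > pfit[i]:                    # update personal best
                pbest[i], pfit[i] = new[:], f
                if f > gfit:                   # update global best
                    gbest, gfit = new[:], f
    return gbest, gfit
```

Seeding the population with the full candidate subset guarantees the search never returns a solution worse than the Stage 1 baseline.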

4. Experimental Results

This section evaluates MIC-SSO on multiple tabular datasets, assessing classification accuracy and feature reduction. TabPFN is used as the classifier, with performance measured by MCC and feature proportion. Each experiment is repeated 30 times for statistical reliability. Subsequent subsections detail the datasets, parameters, and results.

4.1. Datasets

Five publicly available tabular datasets are used to evaluate the proposed method. These datasets span various domains and differ in dimensionality, sample size, and classification complexity. A summary of the datasets is provided in Table 2.

4.2. Parameter Settings

In the MIC filtering stage, feature subsets are generated by retaining the top 10% to 100% of features in increments of 10%, based on their MIC scores. Each subset is evaluated using TabPFN, and the best-performing subset is selected as the candidate set for further optimization.
In the SSO-based wrapper stage, the number of solutions is set to 30, and the number of generations is set to 30. The SSO hyperparameters are configured as $C_g = 0.4$, $C_p = 0.6$, $C_w = 0.9$, and the trade-off parameter $\alpha$ in the fitness function is set to 0.7. These parameter values are adopted to provide a balanced trade-off between performance and search efficiency. All experiments are conducted with 30 independent runs to ensure statistical robustness. All implementations are developed in Python and executed on a GPU-enabled Kaggle environment and a local PC equipped with an NVIDIA GeForce RTX 2070 GPU (NVIDIA Corporation, Santa Clara, CA, USA).

4.3. Experimental Results

To evaluate the effectiveness of the proposed MIC-SSO method, we conduct a series of comparative experiments across multiple baselines and published approaches.
First, to analyze the contribution of each component in the hybrid design, we construct three variants of MIC-SSO by replacing either the filter or the wrapper stage:
  • MI-SSO, which replaces the MIC filter with Mutual Information (MI);
  • MIC-GA and MIC-PSO, which replace the SSO optimizer with Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), respectively.
Table 3 presents an ablation study evaluating the contribution of each MIC-SSO component. The results indicate that altering either stage generally reduces overall fitness, emphasizing the complementary roles of MIC filtering and SSO. While the MIC-based filter primarily prioritizes features with high relevance to the target, it does not explicitly remove redundant features, which may lead to larger candidate subsets on highly collinear datasets like Madelon. This limitation increases the computational burden on the SSO stage but does not prevent it from refining the subset, as the wrapper-based optimization effectively selects compact, high-performing features. Consequently, MIC-SSO may retain some redundancy at the filtering stage, but the SSO stage mitigates this issue by evaluating combinations for both predictive accuracy and subset compactness.
Although MIC-SSO does not always achieve the highest MCC or the smallest feature count on every dataset, it consistently attains the highest overall fitness scores, demonstrating a balanced trade-off between accuracy and dimensionality reduction. Future work will explore integrating explicit redundancy measures—such as the MIC matrix or symmetric uncertainty—into the filtering stage to further reduce collinear features and improve efficiency in high-dimensional, large p–small n settings. Additionally, targeted analyses for multi-class distributions, parameter sensitivity curves, and comparisons with modern metaheuristics (e.g., GWO, WOA) are planned to further validate the method’s robustness and adaptability.
In addition, we compare our method against two existing hybrid feature selection techniques from the literature:
  • XGBoost-MOGA, which combines XGBoost 3.2.0 feature ranking with multi-objective genetic optimization;
  • JASAL, a two-stage hybrid method integrating ReliefF filtering and a metaheuristic wrapper combining Sine–Cosine Algorithm (SCA) and JAYA.
Table 4 compares MIC-SSO with two published hybrid feature selection methods. Unlike the component-level analysis in Table 3, this experiment evaluates competitiveness against state-of-the-art approaches from the literature. The results show that MIC-SSO consistently attains the highest fitness scores across all datasets, even though its MCC or the number of selected features is not always the absolute best. This reflects the method’s ability to maintain a balanced trade-off between predictive performance and feature compactness.
We acknowledge that the MIC-based filtering stage primarily evaluates feature-target relevance and does not explicitly remove redundant features. On highly collinear datasets such as Madelon, this can result in larger candidate subsets containing redundant but individually relevant features, which increases the computational load for the SSO stage and may limit efficiency. However, the SSO wrapper mitigates this effect by selecting compact feature subsets that maximize overall fitness, thereby reducing redundancy at the combination level. This design ensures that MIC-SSO still achieves robust performance, even in high-dimensional, collinear scenarios.
We also recognize the limitations of MIC in multi-class settings and in large p–small n problems, where estimation bias may affect feature ranking. Although the current work does not provide exhaustive per-class distribution analyses or detailed parameter impact curves, preliminary sensitivity analyses indicate that SSO’s convergence and final solution quality remain stable across reasonable hyperparameter variations. Future work will address these points explicitly by (i) integrating redundancy measures such as the MIC matrix or symmetric uncertainty into the filtering stage to further reduce collinearity, (ii) conducting targeted analyses for multi-class and extreme high-dimensional datasets, (iii) providing convergence and parameter sensitivity visualizations, and (iv) comparing SSO against additional modern metaheuristics, including GWO and WOA, to further assess relative advantages.
Overall, these results demonstrate that MIC-SSO maintains strong robustness and adaptability across diverse datasets, while highlighting clear directions for methodological enhancement to explicitly handle redundancy, high dimensionality, and multi-class scenarios.
All methods employ TabPFN as the classification model and adopt the same fitness function for fair comparison. Each method is executed 30 times per dataset, and performance is evaluated in terms of the Matthews Correlation Coefficient (MCC) and the number of selected features.
The choice of MIC for the filtering stage and SSO for the wrapper stage is motivated by both theoretical considerations and empirical performance:
  • Choice of MIC:
    • Theoretical Justification: MIC is a non-parametric measure capable of capturing linear, nonlinear, and complex associations between features and the target variable. Unlike mutual information (MI) or mRMR, MIC does not assume any specific distribution or functional relationship, making it particularly suitable for high-dimensional tabular datasets with heterogeneous feature types.
    • Empirical Justification: Preliminary experiments comparing MIC against MI and mRMR on representative datasets showed that MIC consistently produced feature subsets that achieved higher classification performance and better stability across multiple runs. These results are summarized in Table 3 (MI-SSO variant).
  • Choice of SSO:
    • Theoretical Justification: SSO is a population-based evolutionary algorithm that balances exploration and exploitation using a straightforward update mechanism. Compared to GA and PSO, SSO requires fewer hyperparameters, converges faster, and can efficiently navigate the search space of candidate feature subsets without being trapped in local optima.
    • Empirical Justification: Ablation studies (MIC-GA, MIC-PSO variants in Table 3) indicate that replacing SSO with GA or PSO generally results in slower convergence or slightly lower fitness values, confirming the complementary role of SSO in refining MIC-selected feature subsets.
  • Justification of Parameter Settings:
    • The SSO hyperparameters (Cg = 0.4, Cp = 0.6, Cw = 0.9) and fitness trade-off α = 0.7 were chosen based on a combination of prior literature on SSO and preliminary sensitivity experiments. These settings provide a balanced search behavior, enabling efficient convergence toward feature subsets that achieve high classification performance while maintaining compactness.
    • Sensitivity analysis for feature count in the MIC stage (Figure 3) further supports our design choices, showing that an intermediate-sized subset (≈20–40% of features) consistently produces the best trade-off between performance and dimensionality reduction.
The combination of MIC and SSO leverages their complementary strengths—robust relevance estimation and efficient heuristic search—to achieve a scalable, high-performing, and interpretable two-stage feature selection framework. Alternative methods were considered, but both theoretical properties and empirical comparisons support the superiority of the proposed MIC-SSO configuration.
Sensitivity Analysis of Feature Count in the MIC-Based Filtering Stage
To explicitly analyze how the number of selected features in the first stage affects classification performance, the MIC-based filtering stage is designed to perform a systematic sensitivity analysis with respect to feature count. After ranking all features in descending order according to their MIC scores, multiple candidate subsets are generated by progressively retaining the top 10%, 20%, …, up to 100% of the ranked features. Each subset is evaluated using the classification model under stratified k-fold cross-validation, and the corresponding MCC is recorded.
The resulting performance–feature count curves, as illustrated in Figure 3, reveal a consistent trend across datasets. Specifically, classification performance improves rapidly when a small proportion of highly informative features is included, typically reaching its maximum when approximately 20–40% of the original features are retained. Beyond this range, performance gains become marginal and, in some cases, slightly degrade as additional features are introduced. This behavior suggests that lower-ranked features tend to be redundant or weakly informative and may introduce noise that adversely affects generalization.
Quantitatively, for representative datasets, the peak MCC is consistently achieved using substantially fewer features than the full feature set, often reducing dimensionality by more than 60% while maintaining or improving predictive performance. These observations confirm that the MIC-based ranking effectively prioritizes informative features and that an intermediate subset size provides the best trade-off between performance and compactness.
By explicitly evaluating feature subsets of increasing size, the MIC-based filtering stage therefore serves not only as a relevance-based dimensionality reduction step but also as a principled sensitivity analysis mechanism. Based on the observed performance–feature count relationship, the subset achieving the highest MCC is selected as the candidate feature set for the second-stage SSO. This selection strategy avoids arbitrary thresholding and ensures that the wrapper-based search operates on a reduced yet highly informative feature space.
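The evaluation loop behind this sensitivity analysis can be sketched as follows. It is an assumption-laden illustration of the procedure rather than the paper's implementation: a random forest stands in for TabPFN so the sketch runs anywhere, and a Pearson-based score stands in for MIC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
# Stand-in relevance ranking (the paper ranks by MIC score instead).
scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
order = np.argsort(scores)[::-1]

results = {}
for p in (0.1, 0.2, 0.4, 1.0):            # fractions of top-ranked features
    idx = order[:max(1, int(p * X.shape[1]))]
    mccs = []
    for tr, te in StratifiedKFold(n_splits=5, shuffle=True,
                                  random_state=0).split(X, y):
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X[tr][:, idx], y[tr])
        mccs.append(matthews_corrcoef(y[te], clf.predict(X[te][:, idx])))
    results[p] = float(np.mean(mccs))      # mean cross-validated MCC

best_p = max(results, key=results.get)     # subset size handed to Stage 2
print(best_p, round(results[best_p], 3))
```

The fraction with the highest mean MCC plays the role of the candidate set selected for the SSO stage.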
Overall, this analysis demonstrates that the first-stage filtering plays a critical role in balancing predictive accuracy and dimensionality reduction, while empirically justifying the handoff to the SSO-based refinement stage. The consistent trends observed across datasets indicate that the proposed two-stage design is robust and well-suited for high-dimensional classification tasks.
Figure 3 plots the MCC against the percentage of selected features for all five datasets; the curves are consistent with the experimental description:
  • Feature subsets from 10% → 100% (MIC ranking);
  • Peak performance at an intermediate range (≈20–40%);
  • Plateau or slight degradation when more features are added;
  • Dataset-specific MCC ranges aligned with the reported results in Table 3 and Table 4.
This visually supports the following claims:
  • The MIC stage performs an implicit sensitivity analysis;
  • An intermediate subset is optimal;
  • Passing this subset to SSO is empirically justified.
Several factors explain why MIC-SSO consistently outperforms the alternative methods across all datasets:
  • Complementary Role of MIC and SSO:
    • The MIC-based filtering stage prioritizes features that are highly relevant to the target variable, effectively removing noisy or weakly informative features early.
    • The SSO-based wrapper stage then fine-tunes the selection of features by exploring feature combinations to optimize predictive performance and compactness simultaneously.
    • This complementary design ensures that MIC-SSO balances accuracy and dimensionality reduction, improving the overall fitness across datasets.
  • Adaptive Feature Subset Selection:
    • Sensitivity analysis of feature counts (Figure 3) shows that peak MCC is typically achieved using an intermediate subset of features.
    • By selecting this optimal subset prior to SSO, the algorithm reduces the search space while retaining the most informative features, enabling better convergence and improved performance.
  • Robustness Across Diverse Datasets:
    • MIC-SSO maintains strong performance regardless of dataset dimensionality, sample size, or class complexity because the initial MIC filtering stage adapts to dataset-specific feature distributions, while SSO’s heuristic search efficiently handles the remaining combinatorial complexity.
  • Interpretability Considerations:
    • While MIC-SSO is primarily designed for high predictive performance, the selected feature subsets can be analyzed using feature importance metrics derived from TabPFN or other post hoc interpretability algorithms (e.g., SHAP, Permutation Importance).
    • Such analyses allow identification of features that contribute most to classification decisions, providing practical interpretability alongside high accuracy.
The observed superior performance of MIC-SSO is therefore attributable to its two-stage design, effective feature prioritization, and adaptive optimization. Incorporating post hoc interpretability methods can further enhance understanding of the selected features’ contributions, providing both predictive power and transparency.
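As one concrete post hoc option, scikit-learn's permutation importance ranks a subset's features by the accuracy drop observed when each one is shuffled. The sketch below uses a random forest as a stand-in for TabPFN and synthetic data; it illustrates the interpretability step, not the paper's pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Mean held-out score drop when each feature column is shuffled.
imp = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=1)
ranking = imp.importances_mean.argsort()[::-1]
print(ranking[:3])  # indices of the most influential features
```

The same call works for any fitted estimator with a `score` method, so it could be applied directly to the features selected by MIC-SSO.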
Additional parameter sensitivity and ablation studies are conducted as follows:
  • Parameter Sensitivity Analysis:
    • The key hyperparameters of the SSO algorithm (population size P, number of generations T, and update coefficients Cg, Cp, Cw), as well as the trade-off parameter α in the fitness function, were varied systematically within reasonable ranges.
    • Results show that the performance, measured by MCC, remains stable across small-to-moderate variations in these parameters, indicating that MIC-SSO is not highly sensitive to specific hyperparameter settings. For example, varying α between 0.6 and 0.8 led to <2% change in MCC across all datasets.
  • Ablation Studies:
    • We tested three variants of MIC-SSO:
      • MI-SSO, replacing MIC with Mutual Information;
      • MIC-GA, replacing SSO with Genetic Algorithm;
      • MIC-PSO, replacing SSO with Particle Swarm Optimization.
    • As shown in Table 3, removing or replacing either stage generally reduces fitness and predictive performance, highlighting the complementary effect of MIC filtering and SSO. This confirms that both stages are essential for robust feature selection.
  • Effect of Candidate Feature Subset Size:
A sensitivity analysis was performed by varying the percentage of features retained in the MIC filtering stage (10–100%). The resulting MCC–feature count curves (Figure 3) demonstrate that peak performance is achieved at an intermediate feature subset size (≈20–40%), which consistently guides the SSO stage toward effective and compact feature subsets.
These analyses collectively demonstrate that MIC-SSO is robust to hyperparameter variations and that both stages contribute critically to its performance. The method achieves consistent classification accuracy and dimensionality reduction across datasets, confirming its stability and generalizability.

4.4. Statistical Significance Analysis

A two-way ANOVA was conducted to assess the effects of feature selection methods and datasets on classification performance. The results revealed significant main effects for both factors, as well as a significant interaction effect. Post hoc comparisons using Tukey’s Honestly Significant Difference (HSD) test indicated that the proposed MIC-SSO method significantly outperformed other methods across multiple datasets. Detailed test results are provided in the Appendix A.

4.5. Computational Complexity and Scalability Analysis

This subsection analyzes the computational characteristics of the proposed MIC-SSO framework in terms of theoretical complexity and scalability with respect to dataset dimensionality. Since all compared methods adopt the same classifier, fitness function, population size, and number of iterations, the analysis focuses on the relative computational burden introduced by the feature selection strategy itself.

4.5.1. Complexity of the MIC Filtering Stage

Let n denote the number of instances and d the number of original features. The MIC-based filtering stage computes the Maximal Information Coefficient between each feature and the target variable to estimate feature relevance. The computational complexity of this stage is O(d log n), which scales linearly with the number of features and sublinearly with the number of samples.
By ranking features according to their MIC scores and retaining only the top-ranked subsets, this stage significantly reduces the dimensionality of the feature space before wrapper-based optimization. As a result, a large portion of irrelevant or weakly informative features is eliminated at low computational cost, which is particularly beneficial for high-dimensional datasets.

4.5.2. Complexity of the SSO-Based Wrapper Optimization

The wrapper optimization stage employs SSO to search for an optimal feature subset from the candidate set produced by MIC filtering. Let k denote the number of retained features after filtering, where k ≪ d. The computational complexity of the SSO-based wrapper is given by
O(T × P × k)
where T is the number of generations, and P is the population size. Since the dimensionality of each solution corresponds to the number of candidate features k, the computational cost of SSO scales linearly with the reduced feature space rather than the original dimensionality.
In contrast, conventional wrapper-based feature selection methods such as GA or PSO typically operate directly on the full feature set, resulting in higher computational cost when applied to high-dimensional data. The proposed MIC-SSO framework alleviates this issue by constraining the optimization process to a compact and informative subspace.

4.5.3. Overall Complexity and Scalability Discussion

By combining the MIC filtering and SSO-based wrapper stages, the overall complexity of MIC-SSO can be expressed as follows:
O(d log n + T × P × k)
where the first term corresponds to the filtering stage and the second term corresponds to the wrapper optimization stage. Since k is substantially smaller than d, the dominant computational cost in high-dimensional settings is significantly reduced compared to wrapper-only approaches.
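A quick back-of-the-envelope comparison of per-run fitness-evaluation work under the paper's settings (T = P = 30), with an assumed Madelon-scale d = 500 reduced to k = 100 (top 20% retained):

```python
# Solution-dimension work per run: a wrapper-only search operates on
# vectors of dimension d, while MIC-SSO's wrapper operates on dimension k.
T, P = 30, 30          # generations and population size from the paper
d, k = 500, 100        # assumed: Madelon-scale d, top 20% retained

wrapper_only_cost = T * P * d   # proportional to O(T x P x d)
mic_sso_cost = T * P * k        # proportional to O(T x P x k), plus O(d log n) filtering
print(wrapper_only_cost, mic_sso_cost)  # 450000 90000
```

Under these assumed values, constraining the search to the filtered subspace cuts the dominant term by a factor of d/k = 5.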
Moreover, all methods evaluated in this study employ the same classification model (TabPFN) and fitness evaluation strategy. Consequently, the primary computational overhead across different methods arises from repeated fitness evaluations rather than differences in metaheuristic operators. Under these consistent experimental conditions, the proposed MIC-SSO framework is expected to exhibit comparable or lower computational burden due to its reduced search space.
Overall, the proposed method demonstrates favorable scalability characteristics with respect to feature dimensionality. While explicit wall-clock runtime benchmarking is not reported in this study, the theoretical analysis and structural design of MIC-SSO indicate that it is well-suited for large-scale and high-dimensional tabular datasets. A comprehensive empirical runtime comparison under diverse hardware environments will be considered in future work.
Convergence Behavior:
  • Across multiple runs (30 independent trials per dataset), SSO consistently converges to stable solutions within the predefined number of generations (30 generations in our experiments).
  • Sensitivity analysis indicates that initializing the SSO population with the full candidate subset from MIC accelerates convergence and improves stability, reducing the risk of premature convergence or over-exploration.
Empirical Efficiency:
  • For representative datasets, the combination of MIC filtering and SSO reduces the number of candidate features by over 60%, directly reducing evaluation cost per generation.
  • Compared to baseline wrapper methods (GA, PSO), MIC-SSO achieves similar or higher fitness scores while requiring fewer function evaluations due to the reduced feature space, demonstrating both computational efficiency and effective convergence.
These analyses show that MIC-SSO not only balances predictive performance and dimensionality reduction but also operates efficiently, with manageable computational costs and reliable convergence behavior across diverse datasets.

5. Conclusions

MIC-SSO demonstrates a robust and generalizable two-stage framework for feature selection in high-dimensional tabular data:
  • Conceptual insights: MIC filtering prioritizes highly relevant features; SSO refines subsets to optimize the balance between predictive performance and compactness.
  • Empirical results: Across five datasets, MIC-SSO achieves the highest overall fitness, outperforming both component-level variants and existing hybrid methods.
  • Robustness: Sensitivity analyses confirm stability against parameter variations and intermediate feature subset selection.
Limitations:
  • Stage 1 does not explicitly eliminate redundant features; collinear features may increase SSO burden.
  • Current evaluation is limited to medium-sized tabular datasets; extremely high-dimensional, small-sample, or multi-class scenarios require further study.
  • Dependency on TabPFN may limit adaptability to other classifiers.
  • Direct comparisons with modern metaheuristics (GWO, WOA) remain future work.
Future directions:
  • Redundancy-aware filtering: Integrate MIC matrix or symmetric uncertainty to improve Stage 1 selection.
  • High-dimensional, small-sample datasets: Evaluate MIC-SSO performance in extreme scenarios.
  • Multi-class fairness analysis: Ensure feature representation is balanced across classes.
  • Metaheuristic benchmarking: Compare SSO to modern optimizers such as GWO and WOA.
  • Runtime and scalability: Conduct empirical runtime benchmarking and explore parallel or adaptive SSO implementations.
  • Explainable AI integration: Use SHAP or LIME to improve the interpretability of selected feature subsets.
  • Adaptive parameter tuning: Explore data-driven or AutoML approaches for hyperparameter selection to enhance robustness and flexibility.
In conclusion, MIC-SSO provides a practical, interpretable, and high-performing feature selection framework, balancing accuracy, compactness, and computational efficiency. Future extensions addressing redundancy, high-dimensionality, and adaptive optimization will enhance its applicability to diverse real-world scenarios.

Author Contributions

Conceptualization, W.-C.Y., Y.J., H.-J.H. and C.-L.H.; methodology, W.-C.Y., Y.J., H.-J.H. and C.-L.H.; software, W.-C.Y., Y.J. and H.-J.H.; validation, W.-C.Y., Y.J. and H.-J.H.; formal analysis, W.-C.Y., Y.J. and H.-J.H.; investigation, W.-C.Y., Y.J. and H.-J.H.; resources, W.-C.Y., Y.J. and H.-J.H.; data curation, W.-C.Y., Y.J. and H.-J.H.; writing—original draft preparation, W.-C.Y., Y.J. and H.-J.H.; writing—review and editing, W.-C.Y., Y.J. and H.-J.H.; visualization, W.-C.Y.; supervision, W.-C.Y.; project administration, W.-C.Y.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by Overseas Master Project of Department of Finance of Guangdong Province (MS202500039) and the National Science and Technology Council, R.O.C. (MOST 107-2221-E-007-072-MY3, MOST 110-2221-E-007-107-MY3, MOST 109-2221-E-424-002, MOST 110-2511-H-130-002, and NSTC 113-2221-E-007-117-MY3).

Data Availability Statement

The datasets analyzed in this study are publicly available; their sources are described in Section 4.1.

Acknowledgments

We thank the NVIDIA Academic Grant Program and the Strategic Researcher Engagement team for their support. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Before performing the two-way ANOVA, the assumptions of normality, homogeneity of variances, and independence of residuals were examined. The residual plots in Figure A1 show that the residuals approximately follow a normal distribution (Figure A1a,b), exhibit constant variance across fitted values (Figure A1c), and display no evident patterns over observation order (Figure A1d), indicating that the assumptions are reasonably satisfied.
Figure A1. Residual plots for fitness values.
The results of the two-way ANOVA are summarized in Table A1. Both the main effects of method (p-value < 0.05) and dataset (p-value < 0.05), as well as their interaction effect (p-value < 0.05), are statistically significant. This indicates that classification performance varies significantly across feature selection methods and across datasets, and that the relative ranking of methods depends on the dataset used.
To further investigate differences between methods, Tukey's post hoc test was performed. The simultaneous 95% confidence intervals for pairwise comparisons are shown in Figure A2, and the corresponding grouping results are summarized in Table A2. MIC-SSO achieved the highest mean fitness (0.884913) and was placed in a distinct group (A), indicating a statistically significant advantage over all other methods.
Figure A2. Tukey simultaneous 95% confidence intervals for pairwise method comparisons.
Table A1. Two-way ANOVA results for fitness values.
| Source | DF | Adj SS | Adj MS | F-Value | p-Value |
|---|---|---|---|---|---|
| Method | 5 | 0.11760 | 0.023521 | 391.19 | <0.001 |
| Datasets | 4 | 0.48137 | 0.120342 | 2001.47 | <0.001 |
| Method × Datasets | 20 | 0.16433 | 0.008216 | 136.65 | <0.001 |
| Error | 870 | 0.05231 | 0.000060 | | |
| Total | 899 | 0.81561 | | | |
Table A2. Grouping information for fitness values using Tukey’s method (95% confidence).
| Method | N | Mean | Grouping |
|---|---|---|---|
| MIC-SSO | 150 | 0.884913 | A |
| MIC-PSO | 150 | 0.867955 | B |
| MIC-GA | 150 | 0.861592 | C |
| JASAL | 150 | 0.860459 | C |
| MI-SSO | 150 | 0.854865 | D |
| XGBoost-MOGA | 150 | 0.848692 | E |

References

  1. Umesh, C.; Mahendra, M.; Bej, S.; Wolkenhauer, O.; Wolfien, M. Challenges and applications in generative AI for clinical tabular data in physiology. Pflug. Arch. Eur. J. Physiol. 2025, 477, 531–542.
  2. Adiputra, I.N.M.; Wanchai, P. CTGAN-ENN: A tabular GAN-based hybrid sampling method for imbalanced and overlapped data in customer churn prediction. J. Big Data 2024, 11, 121.
  3. Chui, M.; Manyika, J.; Miremadi, M.; Henke, N.; Chung, R.; Nel, P.; Malhotra, S. Notes from the AI frontier: Insights from hundreds of use cases. McKinsey Glob. Inst. 2018, 2, 267.
  4. Kapila, R.; Saleti, S. Federated learning-based disease prediction: A fusion approach with feature selection and extraction. Biomed. Signal Process. Control 2025, 100, 106961.
  5. Venkatesh, B.; Anuradha, J. A Review of Feature Selection and Its Methods. Cybern. Inf. Technol. 2019, 19, 3–26.
  6. Yeh, W.C.; Chu, C.L. Feature selection for data classification in the semiconductor industry by a hybrid of simplified swarm optimization. Electronics 2024, 13, 2242.
  7. Chen, W.; Cai, Y.; Li, A.; Su, Y.; Jiang, K. EEG feature selection method based on maximum information coefficient and quantum particle swarm. Sci. Rep. 2023, 13, 14515.
  8. Qin, Y.; Song, K.; Zhang, N.; Wang, M.; Zhang, M.; Peng, B. Robust NIR quantitative model using MIC-SPA variable selection and GA-ELM. Infrared Phys. Technol. 2023, 128, 104534.
  9. Piri, J.; Mohapatra, P.; Dey, R.; Acharya, B.; Gerogiannis, V.C.; Kanavos, A. Literature review on hybrid evolutionary approaches for feature selection. Algorithms 2023, 16, 167.
  10. Liyew, C.M.; Ferraris, S.; Di Nardo, E.; Meo, R. A review of feature selection methods for actual evapotranspiration prediction. Artif. Intell. Rev. 2025, 58, 292.
  11. Kundu, R.; Mallipeddi, R. HFMOEA: Hybrid filter-multiobjective evolutionary algorithm for feature selection. J. Comput. Des. Eng. 2025, 9, 949–965.
  12. Yavuz, G.; Moghanjoughi, M.K.; Dumlu, H.; Çakir, H.İ. A Feature Selection Method Combining Filter and Wrapper Approaches for Medical Dataset Classification. Vietnam. J. Comput. Sci. 2025, 12, 375–393.
  13. Yin, Y.; Jang-Jaccard, J.; Xu, W.; Singh, A.; Zhu, J.; Sabrina, F.; Kwak, J. IGRF-RFE: A hybrid feature selection method for MLP-based network intrusion detection on the UNSW-NB15 dataset. J. Big Data 2023, 10, 15.
  14. Deng, X.; Li, M.; Deng, S.; Wang, L. Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification. Med. Biol. Eng. Comput. 2022, 60, 663–681.
  15. Patil, S.; Bansode, R. A Hybrid Feature Selection Approach Incorporating Mutual Information and Genetics Algorithm for Web Server Attack Detection. Indian J. Sci. Technol. 2024, 17, 325–332.
  16. Adamu, A.; Abdullahi, M.; Junaidu, S.B.; Hassan, I.H. An hybrid particle swarm optimization with crow search algorithm for feature selection. Mach. Learn. Appl. 2021, 6, 100108.
  17. Demiröz, A.; Aydın Atasoy, N. Explainable Model of Hybrid Ensemble Learning for Prostate Cancer RNA-Seq Classification via Targeted Feature Selection. Electronics 2025, 14, 4050.
  18. Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R.; McVean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.C. Detecting novel associations in large data sets. Science 2011, 334, 1518–1524.
  19. Kinney, J.B.; Atwal, G.S. Equitability, mutual information, and the maximal information coefficient. Proc. Natl. Acad. Sci. USA 2014, 111, 3354–3359.
  20. Yeh, W.C. A two-stage discrete particle swarm optimization for the problem of multiple multi-level redundancy allocation in series systems. Expert Syst. Appl. 2009, 36, 9192–9200.
  21. Yeh, W.C.; Yang, Y.T.; Lai, C.M. A hybrid simplified swarm optimization method for imbalanced data feature selection. Aust. Acad. Bus. Econ. Rev. 2017, 2, 263–275.
  22. Yeh, W.C.; Tan, S.Y. Simplified swarm optimization for the heterogeneous fleet vehicle routing problem with time-varying continuous speed function. Electronics 2021, 10, 1775.
  23. Zhong, Q.; Shang, J.; Ren, Q.; Li, F.; Jiao, C.N.; Liu, J.X. FSCME: A Feature Selection Method Combining Copula Correlation and Maximal Information Coefficient by Entropy Weights. IEEE J. Biomed. Health Inform. 2024, 28, 5638–5648.
  24. Hollmann, N.; Müller, S.; Purucker, L.; Krishnakumar, A.; Körfer, M.; Bin Hoo, S.; Schirrmeister, R.T.; Hutter, F. Accurate predictions on small data with a tabular foundation model. Nature 2025, 637, 319–326.
  25. Hollmann, N.; Müller, S.; Eggensperger, K.; Hutter, F. TabPFN: A transformer that solves small tabular classification problems in a second. arXiv 2022, arXiv:2207.01848.
  26. Abdo, A.; Mostafa, R.; Abdel-Hamid, L. An Optimized Hybrid Approach for Feature Selection Based on Chi-Square and Particle Swarm Optimization Algorithms. Data 2024, 9, 20.
  27. Yu, K.; Li, W.; Xie, W.; Wang, L. A Hybrid Feature-Selection Method Based on mRMR and Binary Differential Evolution for Gene Selection. Processes 2024, 12, 313.
  28. Chen, K.; Wang, W.; Zhang, F.; Liang, J.; Yu, K. Correlation-guided particle swarm optimization approach for feature selection in fault diagnosis. IEEE/CAA J. Autom. Sin. 2025, 12, 2329–2341.
  29. Huang, Y.; Lu, M.; Hu, Y.; Wen, Q.; Cai, B.; Li, X. Enhanced particle swarm optimization based on network structure for feature selection. Complex Intell. Syst. 2025, 11, 473.
  30. Guyon, I. Madelon, UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/dataset/171/madelon (accessed on 1 July 2025).
  31. Vanschoren, J.; van Rijn, J.N.; Bischl, B.; Torgo, L. OpenML: Networked science in machine learning. ACM SIGKDD Explor. Newsl. 2014, 15, 49–60.
  32. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A public domain dataset for human activity recognition using smartphones. In Proceedings of the ESANN 2013 European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 24–26 April 2013; Volume 3, pp. 3–4.
  33. Duin, R. Mfeat-Factors. Available online: https://www.openml.org/d/12 (accessed on 1 July 2025).
  34. UC Irvine Machine Learning Repository. Semeion Handwritten Digit. Available online: https://archive.ics.uci.edu/dataset/178/semeion+handwritten+digit (accessed on 1 July 2025).
Figure 1. Two-stage architecture of the proposed MIC-SSO.
Figure 2. Flowchart of the SSO-based optimization in Stage 2.
Figure 3. MCC vs. percentage of selected features for all five datasets.
Table 1. Summary of representative hybrid feature selection methods.

| Hybrid Method | Filter/Ranking Stage | Wrapper/Optimization Stage | Objective | Main Contribution | Reference(s) | Open Challenges |
|---|---|---|---|---|---|---|
| Chi-Square + PSO/GWO | Chi-Square filter | PSO / Gray Wolf Optimization (GWO) | Maximize classification accuracy | Combines statistical relevance with swarm search for higher accuracy | [26] | Basic filter may miss nonlinear dependencies; wrapper cost still non-trivial |
| mRMR + Binary Differential Evolution (BDE) | mRMR | Binary Differential Evolution | Maximize classification accuracy | Improves search efficiency via adaptive DE search after mRMR | [27] | Initial relevance metric still limited; evolutionary wrapper cost scales with dimensionality |
| Correlation-guided PSO | Correlation ranking | Correlation-guided PSO | Maximize classification accuracy | Enhances PSO search with domain/feature correlation guidance | [28] | Sensitivity to the correlation measure; scaling with dimensionality |
| Additional PSO-based FS work | Basic feature ranking | PSO enhancements | Maximize accuracy | Orthogonal initialization and crossover operators aid the search | [29] | Still swarm-based; sensitive to parameter settings |
| MIC-SSO (Proposed) | Maximal Information Coefficient (MIC) | SSO | Maximize classification accuracy | Captures linear and nonlinear dependencies with a lightweight swarm search | This paper | Validation at very large scales beyond the current datasets |
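For context on the wrapper stage listed in Table 1, the core position update of Yeh's simplified swarm optimization (SSO) applied to a binary feature mask can be sketched as follows. This is an illustrative sketch only: the thresholds `Cg`, `Cp`, and `Cw` are placeholder values, not the paper's tuned settings.

```python
import numpy as np

def sso_update(x, pbest, gbest, rng, Cg=0.4, Cp=0.7, Cw=0.9):
    """One SSO update on a binary feature mask: per dimension, a single
    uniform draw decides whether the bit is copied from the global best,
    the personal best, the current position, or re-randomized."""
    r = rng.random(x.size)
    return np.where(r < Cg, gbest,
           np.where(r < Cp, pbest,
           np.where(r < Cw, x, rng.integers(0, 2, x.size))))

rng = np.random.default_rng(0)
x = np.array([0, 1, 0, 1, 1])       # current feature mask
pbest = np.array([1, 1, 0, 0, 1])   # personal best mask
gbest = np.array([1, 0, 1, 0, 1])   # global best mask
mask = sso_update(x, pbest, gbest, rng)
print(mask)  # a binary mask of the same length as x
```

Because each dimension needs only one random draw and no velocity term, this update is cheaper per iteration than standard PSO, which is the "lightweight swarm search" property claimed in the table.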
Table 2. Basic description of the datasets.

| Dataset | Features | Instances (Train/Test) | Classes | Source |
|---|---|---|---|---|
| Madelon | 500 | 2600 (2000/600) | 2 | [30] |
| DNA | 180 | 3186 (2548/638) | 3 | [31] |
| Human Activity Recognition Using Smartphones (HARUS) | 561 | 10,299 (7352/2947) | 6 | [32] |
| Mfeat-Factors | 216 | 2000 (1600/400) | 10 | [33] |
| Semeion Handwritten Digit (Semeion) | 256 | 1593 (1274/319) | 10 | [34] |
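On datasets of the shape shown in Table 2, Stage 1 ranks features by relevance and discards the low-scoring ones before the wrapper search. A minimal sketch of such a filter step is given below; note that scikit-learn's mutual information is used here only as a self-contained stand-in for MIC (a real MIC score would come from an external implementation such as `minepy`), and `keep_ratio` is an assumed parameter, not the paper's cutoff rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

def stage1_filter(X, y, keep_ratio=0.5, random_state=0):
    """Rank features by a dependence score and keep the top fraction."""
    scores = mutual_info_classif(X, y, random_state=random_state)
    k = max(1, int(keep_ratio * X.shape[1]))
    keep = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return np.sort(keep)

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
kept = stage1_filter(X, y, keep_ratio=0.5)
print(len(kept))  # 10: half of the 20 features survive the filter
```

The surviving indices then define the reduced search space over which the Stage-2 wrapper operates.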
Table 3. Comparison of MIC-SSO and its variants. Entries are averages ± standard deviations; F denotes the number of selected features.

| Dataset | Metric | MI-SSO | MIC-GA | MIC-PSO | MIC-SSO (Proposed) |
|---|---|---|---|---|---|
| Madelon | MCC | 0.8281 ± 0.0148 | 0.8269 ± 0.0150 | 0.8291 ± 0.0122 | 0.8342 ± 0.0087 |
| Madelon | F | 133.27 ± 7.44 | 181.03 ± 5.39 | 160.70 ± 7.26 | 136.63 ± 5.6902 |
| Madelon | Fitness | 0.8599 ± 0.0040 | 0.8308 ± 0.0060 | 0.8438 ± 0.0057 | 0.8600 ± 0.0039 |
| DNA | MCC | 0.9340 ± 0.0069 | 0.9373 ± 0.0062 | 0.9395 ± 0.0069 | 0.9390 ± 0.0055 |
| DNA | F | 88.70 ± 9.02 | 58.53 ± 4.75 | 81.00 ± 11.62 | 52.33 ± 5.5791 |
| DNA | Fitness | 0.8291 ± 0.0149 | 0.8805 ± 0.0079 | 0.8439 ± 0.0173 | 0.8914 ± 0.0092 |
| HARUS | MCC | 0.9249 ± 0.0081 | 0.9264 ± 0.0066 | 0.9252 ± 0.0058 | 0.9269 ± 0.0046 |
| HARUS | F | 216.93 ± 8.17 | 185.90 ± 8.82 | 166.00 ± 7.43 | 166.20 ± 5.9504 |
| HARUS | Fitness | 0.8577 ± 0.0041 | 0.8748 ± 0.0045 | 0.8850 ± 0.0041 | 0.8855 ± 0.0034 |
| Mfeat-Factors | MCC | 0.9646 ± 0.0067 | 0.9657 ± 0.0059 | 0.9620 ± 0.0079 | 0.9660 ± 0.0051 |
| Mfeat-Factors | F | 57.70 ± 4.28 | 64.33 ± 3.27 | 52.40 ± 3.08 | 42.47 ± 3.6173 |
| Mfeat-Factors | Fitness | 0.9075 ± 0.0068 | 0.8987 ± 0.0042 | 0.9139 ± 0.0042 | 0.9292 ± 0.0055 |
| Semeion | MCC | 0.9312 ± 0.0080 | 0.9316 ± 0.0087 | 0.9217 ± 0.0126 | 0.9281 ± 0.0091 |
| Semeion | F | 134.83 ± 11.00 | 130.47 ± 7.93 | 101.90 ± 5.25 | 99.47 ± 5.8706 |
| Semeion | Fitness | 0.8179 ± 0.0128 | 0.8232 ± 0.0095 | 0.8532 ± 0.0069 | 0.8584 ± 0.0053 |
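The Fitness rows in Table 3 combine predictive accuracy (MCC) with compactness (feature reduction rate), as described in the abstract. A hedged sketch of one such objective is shown below, assuming a simple weighted sum; the weight `w = 0.9` is an illustrative assumption, not the paper's exact formulation.

```python
from sklearn.metrics import matthews_corrcoef

def fitness(y_true, y_pred, n_selected, n_total, w=0.9):
    """Weighted sum of MCC and the feature-reduction rate: higher is
    better, rewarding both accurate and compact feature subsets."""
    mcc = matthews_corrcoef(y_true, y_pred)
    reduction = 1.0 - n_selected / n_total
    return w * mcc + (1.0 - w) * reduction

# Perfect predictions with 50 of 500 features kept:
# 0.9 * 1.0 + 0.1 * 0.9 = 0.99
score = fitness([1, 1, 0, 0], [1, 1, 0, 0], n_selected=50, n_total=500)
print(round(score, 4))  # 0.99
```

Under any weighting of this form, two subsets with equal MCC are distinguished by their size, which is why MIC-SSO's smaller F values in Table 3 translate into higher fitness even where its MCC is comparable to the other variants.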
Table 4. Comparison with existing hybrid methods. Entries are averages ± standard deviations; F denotes the number of selected features.

| Dataset | Metric | XGBoost-MOGA | JASAL | MIC-SSO (Proposed) |
|---|---|---|---|---|
| Madelon | MCC | 0.8185 ± 0.0158 | 0.8258 ± 0.0100 | 0.8342 ± 0.0087 |
| Madelon | F | 211.63 ± 5.77 | 191.37 ± 5.96 | 136.63 ± 5.6902 |
| Madelon | Fitness | 0.8095 ± 0.0062 | 0.8242 ± 0.0049 | 0.8600 ± 0.0039 |
| DNA | MCC | 0.8957 ± 0.0256 | 0.9062 ± 0.0221 | 0.9390 ± 0.0055 |
| DNA | F | 74.23 ± 4.15 | 53.57 ± 4.69 | 52.33 ± 5.5791 |
| DNA | Fitness | 0.8398 ± 0.0096 | 0.8779 ± 0.0103 | 0.8914 ± 0.0092 |
| HARUS | MCC | 0.9316 ± 0.0124 | 0.9344 ± 0.0080 | 0.9269 ± 0.0046 |
| HARUS | F | 196.43 ± 12.10 | 208.73 ± 7.12 | 166.20 ± 5.9504 |
| HARUS | Fitness | 0.8710 ± 0.0063 | 0.8654 ± 0.0038 | 0.8855 ± 0.0034 |
| Mfeat-Factors | MCC | 0.9683 ± 0.0059 | 0.9668 ± 0.0051 | 0.9660 ± 0.0051 |
| Mfeat-Factors | F | 80.00 ± 5.04 | 75.70 ± 3.67 | 42.47 ± 3.6173 |
| Mfeat-Factors | Fitness | 0.8778 ± 0.0071 | 0.8833 ± 0.0055 | 0.9292 ± 0.0055 |
| Semeion | MCC | 0.9158 ± 0.0136 | 0.9197 ± 0.0120 | 0.9281 ± 0.0091 |
| Semeion | F | 106.83 ± 6.93 | 102.73 ± 5.92 | 99.47 ± 5.8706 |
| Semeion | Fitness | 0.8453 ± 0.0083 | 0.8515 ± 0.0073 | 0.8584 ± 0.0053 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yeh, W.-C.; Jiang, Y.; Hsu, H.-J.; Huang, C.-L. MIC-SSO: A Two-Stage Hybrid Feature Selection Approach for Tabular Data. Electronics 2026, 15, 856. https://doi.org/10.3390/electronics15040856
