Article

A Two-Stage Feature Selection Approach Based on Artificial Bee Colony and Adaptive LASSO in High-Dimensional Data

by Efe Precious Onakpojeruo 1,2,* and Nuriye Sancar 3,*
1 Operational Research Center in Healthcare, Near East University, TRNC Mersin 10, Nicosia 99138, Turkey
2 Department of Biomedical Engineering, Near East University, TRNC Mersin 10, Nicosia 99138, Turkey
3 Department of Mathematics, Near East University, TRNC Mersin 10, Nicosia 99138, Turkey
* Authors to whom correspondence should be addressed.
AppliedMath 2024, 4(4), 1522-1538; https://doi.org/10.3390/appliedmath4040081
Submission received: 28 October 2024 / Revised: 5 December 2024 / Accepted: 10 December 2024 / Published: 12 December 2024
(This article belongs to the Special Issue Optimization and Machine Learning)

Abstract

High-dimensional datasets, where the number of features far exceeds the number of observations, present significant challenges in feature selection and model performance. This study proposes a novel two-stage feature-selection approach that integrates Artificial Bee Colony (ABC) optimization with Adaptive Least Absolute Shrinkage and Selection Operator (AD_LASSO). The initial stage reduces dimensionality while effectively dealing with complex, high-dimensional search spaces by using ABC to conduct a global search for the ideal subset of features. The second stage applies AD_LASSO, refining the selected features by eliminating redundant features and enhancing model interpretability. The proposed ABC-ADLASSO method was compared with the AD_LASSO, LASSO, stepwise, and LARS methods under different simulation settings in high-dimensional data and various real datasets. According to the results obtained from simulations and applications on various real datasets, ABC-ADLASSO has shown significantly superior performance in terms of accuracy, precision, and overall model performance, particularly in scenarios with high correlation and a large number of features compared to the other methods evaluated. This two-stage approach offers robust feature selection and improves predictive accuracy, making it an effective tool for analyzing high-dimensional data.

1. Introduction

High-dimensional data are data in which the number of features (p) is significantly larger than the number of observations (n), i.e., p >> n [1]. High-dimensional datasets complicate the feature-selection process because of the sheer number of features, multicollinearity, and the presence of irrelevant or redundant features, which also makes such data harder to interpret and evaluate. In high-dimensional data, redundant features can reduce model performance, increase computational time, and cause overfitting in the modeling process. These challenges require advanced methods that can effectively identify the most relevant predictors while maintaining computational efficiency. The growing prevalence of high-dimensional data in fields such as genetics and bioinformatics has made effective feature-selection methods increasingly important. Existing approaches, such as the High Dimensional Selection with Interactions (HDSI) algorithm, which integrates bootstrapping and random subspace sampling with classical statistical techniques, represent a significant advancement in addressing feature-selection challenges in high-dimensional data [2]. Furthermore, high-dimensional analysis of semidefinite relaxations for sparse principal component analysis highlights the trade-offs between statistical and computational efficiency [3], and the Greedy Anytime Algorithm for sparse PCA provides an efficient solution for high-dimensional sparse PCA problems [4]. For a given model, feature selection can be formulated as an optimization problem.
Given a dataset $X = [x_1, x_2, \ldots, x_p]$, where $x_i$ denotes the i-th feature, the aim of feature selection is to identify a subset of features $X_S \subseteq X$ that maximizes the model's performance according to a chosen evaluation criterion. Feature selection reduces the number of input features by eliminating the least significant or redundant ones, thereby enhancing model interpretability and decreasing computational effort [5]. Depending on how the features are evaluated, feature-selection methods can be broadly divided into three categories: filter, wrapper, and embedded [6,7]. Filter methods use statistical measures to rank each feature independently of the learning algorithm; they are simple and fast but ignore interactions between features. Wrapper methods use a machine learning model to evaluate candidate feature subsets, which yields a more precise selection at the cost of greater computation time [6,7,8]. Embedded methods, such as LASSO or Elastic Net, incorporate feature selection directly into model fitting, balancing accuracy against model complexity and sample size. Despite the strengths of existing selection methods, the complexity of high-dimensional data still calls for hybrid approaches that leverage the advantages of multiple feature-selection techniques [9].
There are many methods available for optimizing the feature-selection process, each with its own advantages and disadvantages. Among swarm intelligence algorithms, the nature-inspired Artificial Bee Colony (ABC) has emerged as a powerful optimization algorithm [10,11]. ABC was originally designed around the foraging pattern of honeybees: it mimics the behavior of a bee colony searching for food and has been applied to a wide range of optimization problems [12,13]. When applied to feature selection, it seeks the best subsets of features by balancing exploitation and exploration. The algorithm is attractive because of its simplicity and its ability to avoid becoming trapped in local optima, and it is more flexible than conventional approaches that break down computationally on high-dimensional data [12,14,15]. This study proposes a two-stage feature-selection approach combining ABC and the Adaptive Least Absolute Shrinkage and Selection Operator (AD_LASSO) to increase model accuracy. The motivation for using ABC in this framework stems from its demonstrated effectiveness in global optimization tasks and its capability to handle the intricacies of high-dimensional data.

2. Related Studies

Feature selection plays an important role in machine learning, especially for high-dimensional datasets in which the number of features exceeds the number of observations. Different feature-selection techniques have been proposed in the literature, broadly categorized into filter methods, wrapper methods, embedded methods, and combinations of these, known as hybrid methods. In this section, we review studies and advances in hybrid feature-selection techniques and discuss the potential of ABC relative to other metaheuristic approaches.

2.1. Filtering Method, Wrapper Technique, and Embedded Algorithm

Filter methods are among the simplest selection techniques: each feature is scored individually, independently of the learning algorithm, using criteria such as mutual information, correlation, or statistical tests [16]. While these methods are computationally efficient, they do not account for interactions between features, so the resulting feature subsets can be far from ideal. Wrapper methods, on the other hand, evaluate candidate feature subsets with a machine learning model, selecting features according to their contribution to the model's performance [17]. Although wrapper methods offer better accuracy, they are time-consuming and computationally expensive, particularly for high-dimensional data, because each candidate subset must be assessed by training a model [6,7]. Embedded approaches such as LASSO, Adaptive LASSO, and Elastic Net integrate feature selection directly into the model training process, so that variables are selected as part of model building [18]. The LASSO method adds an L1-norm penalty to the objective function, which shrinks the coefficients of features that are not useful for prediction to exactly zero, thereby providing a built-in selection of features [19,20]. Because of this, LASSO is especially helpful when the data contain many unimportant factors. However, LASSO often chooses only one predictor from a set of correlated features, which may not be desirable when predictors are highly correlated. Adaptive LASSO builds on LASSO's strengths and applies a better penalty strategy for feature selection. It overcomes LASSO's limitations by offering better feature selection, less bias, and better consistency in high-dimensional data [21,22].

2.2. Metaheuristic-Based Feature Selection

Metaheuristic algorithms, such as ABC, the genetic algorithm (GA), Ant Colony Optimization (ACO), and particle swarm optimization (PSO), have been widely used for feature selection in recent years because of their efficiency in searching large solution spaces. These algorithms mimic natural processes to search for good feature subsets without evaluating all possible combinations, which is especially useful for high-dimensional problems. ABC has received increasing attention in the literature as an alternative to GA and PSO. ABC, first developed by Karaboga in 2005 [12], mimics the intelligent foraging behavior of honey bees. The colony comprises employed bees, which exploit particular regions of the solution space; onlooker bees, which concentrate the search on the most promising regions; and scout bees, which discover new regions. This division of labor enables ABC to balance exploration and exploitation much as crossover and mutation do in GA. Since feature selection is one of the most crucial stages of a pattern-recognition pipeline, the efficiency of ABC at this step has been verified experimentally in several studies. For instance, ref. [23] applied ABC to large-scale gene expression profiles and showed that ABC could achieve a considerable decrease in dimensionality while maintaining high performance. Also, ref. [24] applied ABC to feature selection in a medical diagnosis system and reported better results than GA and Simulated Annealing. Comparative studies suggest that ABC offers better exploration in large, multifactorial search spaces. For instance, refs. [25,26] used ABC to select features and attributed its better global search to its resistance to premature convergence.

2.3. Hybrid Feature-Selection Approaches

Because of the shortcomings of individual feature-selection strategies, several attempts have been made to integrate different methods into a single framework. Hybrid methods typically combine filter or wrapper methods with metaheuristic algorithms, with the aim of increasing computational efficiency and predictive accuracy. ABC itself has inspired several such approaches, and combinations of ABC with other optimization techniques have attracted particular attention. Refs. [27,28] proposed an integrated ABC approach incorporating Tabu Search to improve both local and global searching; their experiments showed that the hybrid method converged faster and provided better solutions than more conventional feature-selection methods for bioinformatics problems. Similarly, ref. [29] combined ABC with a support vector machine for feature selection in high dimensions and demonstrated good classification performance at relatively low computation times. ABC has thus become an effective tool for feature-selection problems [12,13,14,15]. Its biologically inspired balance of exploration and exploitation has been used successfully to find optimal feature subsets in areas ranging from medical diagnosis to text classification. In this study, we adopt an ABC-based method because of its well-developed global search abilities and its greater robustness against becoming trapped in local optima compared with other metaheuristic optimization methods. The primary aim of this study is to develop a two-stage feature-selection technique that uses the ABC optimization method alongside AD_LASSO for high-dimensional datasets. This hybrid framework seeks to maximize feature-selection performance while minimizing model complexity.

3. Materials and Methods

3.1. Linear Regression Model

Linear regression describes the association between a dependent feature (y) and one or more independent features (predictors). The goal is to predict the values of y using the predictors by estimating the coefficients β in the following linear equation:
y = X\beta + \epsilon   (1)
where $X \in \mathbb{R}^{n \times p}$ is the data (design) matrix of independent features (predictors); the i-th row of X is the vector $x_i = (x_{i1}, \ldots, x_{ip})^T$. $y \in \mathbb{R}^n$ is the vector of observed dependent features, with n the number of observations and p the number of independent features. $\beta = (\beta_1, \ldots, \beta_p)^T \in \mathbb{R}^p$ is the vector of unknown coefficients (parameters to be estimated), and $\epsilon \in \mathbb{R}^n$ is the error term, assumed to be normally distributed with mean zero and constant variance (i.e., homoscedastic): $E(\epsilon) = 0$, $\epsilon \sim N(0, \sigma^2 I_n)$, where $\sigma^2$ is the variance.
Suppose that $\hat{\beta}$ is an estimator of β. The residuals are then $r_i = r_i(\hat{\beta}) = y_i - x_i^T\hat{\beta}$, where $\hat{y} = X\hat{\beta}$. Because the residuals measure the error of the model fit, they should be kept small. The estimate $\hat{\beta}$ is obtained by minimizing the sum of squared residuals:
\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^T\beta)^2   (2)
where $\|y - X\beta\|_2$ is the L2 (Euclidean) norm of the residual vector $y - X\beta$. This minimization problem has a closed-form solution, the ordinary least squares (OLS) estimator, defined as
\hat{\beta} = (X^T X)^{-1} X^T y   (3)
Like most statistical models, computation of the OLS estimator rests on certain assumptions [30]; in particular, the matrix $X^T X$ must have full rank, i.e., rank(X) = p. When p >> n, $X^T X$ becomes singular and non-invertible, making the OLS solution undefined. Overfitting and high correlation among independent features are further challenges in high-dimensional data. Stepwise regression [31] and LARS (Least Angle Regression) [32] are widely used methods for feature selection in high-dimensional data; they help manage model complexity while selecting important features. Regularization methods in regression, such as LASSO and Adaptive LASSO, instead apply penalties to the size of the coefficients, shrinking less relevant feature coefficients toward zero to avoid overfitting and enhance model performance in high-dimensional data.
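The rank problem can be illustrated numerically. The following minimal sketch (not part of the original study; the dimensions are arbitrary) shows that $X^T X$ is rank-deficient whenever p > n, which is why penalized estimators are needed in this regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                       # p >> n, the high-dimensional setting
X = rng.standard_normal((n, p))

gram = X.T @ X                       # the p x p matrix X^T X
print(np.linalg.matrix_rank(gram))   # at most n = 50 < p, so X^T X is singular
# np.linalg.inv(gram) is therefore undefined/meaningless here;
# penalized estimators such as LASSO remain well defined in this regime.
```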

3.2. Stepwise Regression

Stepwise regression [31] is a feature-selection method that iteratively adds or removes features based on a selection criterion, such as the extended BIC (ExBIC) for high-dimensional data. The process can follow a forward selection approach, which starts with no features and adds the most significant features step-by-step, or a backward elimination approach, which starts with all features and removes the least significant ones. Alternatively, a combination of the two, called stepwise selection, evaluates both adding and removing variables at each step.
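As an illustration of the forward variant, the sketch below greedily adds the feature that most improves a BIC-type criterion for a linear model; it uses the ordinary BIC rather than the ExBIC of Section 3.6, and the stopping rule and function names are illustrative assumptions rather than the exact procedure used in this study.

```python
import numpy as np

def bic(y, X_sub):
    """BIC of an OLS fit of y on the columns in X_sub (Gaussian errors assumed)."""
    n, d = X_sub.shape
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    return n * np.log(rss / n) + d * np.log(n)

def forward_stepwise(y, X, max_features=10):
    """Greedy forward selection: repeatedly add the feature that most lowers the BIC."""
    selected, remaining, best_score = [], list(range(X.shape[1])), np.inf
    while remaining and len(selected) < max_features:
        score, j = min((bic(y, X[:, selected + [j]]), j) for j in remaining)
        if score >= best_score:        # no further improvement: stop
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected
```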

3.3. Least Angle Regression (LARS)

The Least Angle Regression (LARS) method [32] is a stagewise selection procedure closely related to the LASSO and used for feature selection in regression. LARS is similar to forward stepwise regression: at each step, it identifies the feature most correlated with the current residual. When multiple features are equally correlated with the residual, instead of continuing along a single feature, it proceeds in a direction equiangular between those features. The LARS method proceeds as follows (a brief scikit-learn sketch is given after the list):
  • Start with $\hat{\beta} = 0$ and r = y, where r is the residual.
  • Identify the feature $x_j$ that has the highest correlation with the residual (equivalently, the feature that forms the least angle with the residual).
  • Move along this feature's direction until another feature $x_k$ has an equal correlation with the residual.
  • Then move in the direction equiangular between the two features.
  • Repeat steps until all the features are included in the model.
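A hedged sketch of LARS-based selection using scikit-learn's implementation is shown below; the data-generation step and the choice of 10 non-zero coefficients are illustrative assumptions, not the datasets used in this study.

```python
import numpy as np
from sklearn.linear_model import Lars
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.5                          # assumed sparse ground truth
y = X @ beta_true + rng.normal(scale=1.5, size=n)

Xs = StandardScaler().fit_transform(X)        # standardize so correlations are comparable
lars = Lars(n_nonzero_coefs=10).fit(Xs, y)    # stop once 10 features are active
selected = np.flatnonzero(lars.coef_)
print(selected)
```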

3.4. Regularization Methods

Regularization methods such as LASSO and Adaptive LASSO are frequently used in regression analysis; they help overcome overfitting by adding penalty terms to the loss function, thereby improving model generalization on high-dimensional data. Equation (4) shows the general mechanics of regularization:
\underbrace{\mathcal{L}(\beta)}_{\text{total loss function}} = \underbrace{L(\beta)}_{\text{loss function for the linear model}} + \underbrace{\phi(\beta)}_{\text{regularization penalty}}   (4)
where $L(\beta) = \|y - X\beta\|_2^2$.

3.4.1. Least Absolute Shrinkage and Selection Operator (LASSO)

LASSO (Least Absolute Shrinkage and Selection Operator) is a regularization method that adds an L1 penalty (absolute norm) to the loss function [20]. The optimization problem can be formulated, for some t > 0, as
\min_{\beta} \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t   (5)
where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$ is the L1 norm of the coefficient vector. Solving this minimization as an unconstrained problem by incorporating a Lagrange multiplier λ > 0, the LASSO estimator is given by
\hat{\beta}_{LASSO} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1   (6)
where λ ≥ 0 is a tuning parameter that controls the shrinkage of the LASSO coefficients. LASSO stands out because it performs feature selection by shrinking some coefficients to exactly zero [33]. Because of this, LASSO is especially helpful when the data contain many unimportant factors. However, LASSO often chooses only one predictor from a set of correlated features, which may not be desirable when predictors are highly correlated.
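For reference, a minimal scikit-learn sketch of LASSO with a cross-validated tuning parameter is given below (the simulated data are illustrative; λ is called alpha in scikit-learn).

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.5                              # assumed sparse ground truth
y = X @ beta_true + rng.normal(scale=1.5, size=n)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)   # cross-validated choice of lambda
selected = np.flatnonzero(lasso.coef_)            # features with non-zero coefficients
print(lasso.alpha_, selected)
```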

3.4.2. Adaptive LASSO

Adaptive LASSO builds on LASSO's strengths and applies a better penalty strategy for feature selection [21,22]. It overcomes LASSO's limitations by offering better feature selection, less bias, and better consistency in high-dimensional data. The Adaptive LASSO estimator is defined as
\hat{\beta}_{AD\_LASSO} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j|   (7)
where $w_j$ is the weight for each coefficient, generally set to $w_j = 1/|\hat{\beta}_j|$ with an initial (e.g., LASSO) estimate $\hat{\beta}_j$. These weights penalize small coefficients more and large coefficients less, thus keeping the necessary features in the model.
Adaptive LASSO handles high-dimensional settings better than LASSO because of these data-driven weights. Since the weights $w_j$ are inversely proportional to the initial coefficient estimates $\hat{\beta}_j$, features with large initial coefficients receive lighter penalties, mitigating the bias that a uniform penalty induces on them, while features with small initial coefficients are penalized more heavily and more readily eliminated. This promotes significant features with larger coefficient values and demotes or removes unimportant ones, improving feature-selection precision. However, Adaptive LASSO may still have limitations, especially with very large datasets or data exhibiting strong correlation or nonlinearity.
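A minimal sketch of this two-step weighting idea is given below, assuming the initial estimates come from a cross-validated LASSO (the paper does not prescribe a particular implementation). The weighted L1 penalty in Equation (7) is obtained by rescaling each column of X by $1/w_j$ and fitting an ordinary LASSO; the small constant eps is an assumption used to avoid division by zero.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def adaptive_lasso(X, y, eps=1e-6, cv=5):
    """Adaptive LASSO via column rescaling: coefficient j is penalized by w_j = 1/|beta_init_j|."""
    beta_init = LassoCV(cv=cv, random_state=0).fit(X, y).coef_   # initial estimates
    w = 1.0 / (np.abs(beta_init) + eps)                          # adaptive weights
    X_scaled = X / w                                             # column j divided by w_j
    fit = LassoCV(cv=cv, random_state=0).fit(X_scaled, y)
    return fit.coef_ / w                                         # map back to the original scale

# Selected features are those with non-zero Adaptive LASSO coefficients:
#   selected = np.flatnonzero(adaptive_lasso(X, y))
```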
One-stage selection methods like Adaptive LASSO may perform poorly when there are complex interdependencies among features. This is where two-stage techniques become valuable: a two-stage approach enhances one-stage methods by adding a refinement step, here combining a metaheuristic optimization algorithm with Adaptive LASSO. Two-stage techniques allow feature importance to be reconsidered and fine-tuned, reducing variability and increasing stability. They first perform an initial selection and then apply a secondary process to refine the feature subset, ensuring that only the most significant features are retained in the model. This enhances the reliability of feature selection in high-dimensional data, where traditional methods may fall short. In this study, a two-stage ABC-Adaptive LASSO hybrid variable-selection method is proposed.

3.5. Artificial Bee Colony Optimization (ABC)

The ABC algorithm was introduced by Karaboga [12] for solving continuous optimization problems. It mimics the foraging behavior of honey bee colonies, with the search process divided among three roles: employed bees, onlooker bees, and scout bees. These roles work together to search the decision space for the best possible solution. In the ABC algorithm, the first half of the colony consists of employed bees and the second half of onlooker bees. Each employed bee is associated with one food source, which it exploits before returning to the hive to share information about it with the other bees. Onlooker bees choose food sources based on the information communicated by the employed bees. In this formulation, the position of a food source represents a candidate solution to the problem, and its nectar amount represents the quality (fitness) of that solution. The number of solutions in the swarm equals the number of employed (or onlooker) bees and food sources [12,13].
The ABC algorithm has seven steps, which include the following:
Step 1: Initialization: ABC is initialized with SN food sources, where SN is the number of food sources (employed bees). Each food source $X_i$, i = 1, 2, …, SN, is a vector of dimension D, the number of parameters to be optimized. The initial food-source locations are generated randomly by Equation (8):
X_{ij} = X_{min,j} + \text{rand}(0,1) \cdot (X_{max,j} - X_{min,j})   (8)
where j = 1, 2, …, D; $X_{max,j}$ and $X_{min,j}$ are the upper and lower bounds of the j-th parameter, and rand(0,1) is a random number between 0 and 1.
Step 2: This step involves evaluating the food sources by objective function. In this context, we determine the nectar amount (or objective value) associated with each food source.
Step 3: Employed bee phase: After initialization, each employed bee visits its food source and searches for a neighboring food source with superior nectar quality. The candidate neighbor $V_i$ of a food source $X_i$ is generated by Equation (9):
V_{i,j_{rand}} = X_{i,j_{rand}} + \text{rand}[-1,1] \cdot (X_{i,j_{rand}} - X_{k,j_{rand}})   (9)
where $X_k$ is a randomly selected food source with k ∈ {1, 2, …, SN} and k ≠ i, $j_{rand} \in \{1, 2, \ldots, D\}$ is a randomly chosen integer index, and rand[−1, 1] is a random value between −1 and 1.
Step 4: Selection and assessment of quality: The quality of the new food source is assessed after the identification of the new food source. Bees will abandon their current food source in favor of a new one if the latter exhibits superior quality.
Step 5: Onlooker bee phase: After all employed bees complete their foraging operations, the onlooker bees receive information about the food sources from them. For each food source $X_i$, an onlooker bee computes a selection probability $\pi_i$ from the quality reported by the employed bees, as given by Equation (10):
\pi_i = \frac{f_i}{\sum_{n=1}^{SN} f_n}   (10)
where $f_i$ is the fitness value associated with the objective function at food source i. This probability $\pi_i$ is then compared with a randomly generated number between 0 and 1; if $\pi_i$ exceeds the random value, an onlooker bee is assigned to that food source and searches its neighborhood.
Step 6: This stage involves preserving the best food source with the best quality.
Step 7: Scout bee phase: In the scout bee process, a bee replaces an abandoned food source with a newly discovered one. Each bee in the swarm keeps its own counter for this purpose: when the counter reaches a specified limit, the bee abandons its food source (solution) and begins searching for an alternative, generated according to Equation (8).
The procedure persists until a specified termination criterion is met by repeating steps 3 to 7.
Since the feature-selection problem is a discrete optimization problem, a binary version of ABC is needed. Generally, the sigmoid function is applied to convert continuous values into binary ones:
S(V_i) = \frac{1}{1 + e^{-V_i}}   (11)
If $S(V_i) > \text{rand}(0,1)$, set $V_i = 1$; otherwise, set $V_i = 0$.
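To make the search loop concrete, the sketch below gives a compact binary ABC for subset selection, assuming a generic fitness function to be minimized (for example, the ExBIC of Section 3.6). The function name, the [-1, 1] position bounds, and the conversion of Equation (10) to a minimization setting are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def binary_abc(fitness, D, SN=20, max_iter=100, limit=10, seed=0):
    """Minimize fitness(mask) over binary vectors of length D (feature subsets)."""
    rng = np.random.default_rng(seed)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    decode = lambda row: sigmoid(row) > rng.random(D)            # Eq. (11)

    pos = rng.uniform(-1, 1, size=(SN, D))                       # Eq. (8) with bounds [-1, 1]
    foods = np.array([decode(row) for row in pos])
    fits = np.array([fitness(f) for f in foods])
    trials = np.zeros(SN, dtype=int)
    best_idx = int(fits.argmin())
    best_food, best_fit = foods[best_idx].copy(), fits[best_idx]

    def try_neighbor(i):
        nonlocal best_food, best_fit
        k = rng.choice([s for s in range(SN) if s != i])         # random partner
        j = rng.integers(D)                                      # random dimension
        cand_pos = pos[i].copy()
        cand_pos[j] += rng.uniform(-1, 1) * (pos[i, j] - pos[k, j])   # Eq. (9)
        cand = decode(cand_pos)
        f = fitness(cand)
        if f < fits[i]:                                          # greedy replacement (Step 4)
            pos[i], foods[i], fits[i], trials[i] = cand_pos, cand, f, 0
            if f < best_fit:
                best_food, best_fit = cand.copy(), f
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(SN):                                      # employed bee phase
            try_neighbor(i)
        probs = fits.max() - fits + 1e-12                        # lower fitness -> higher probability
        probs = probs / probs.sum()                              # minimization analogue of Eq. (10)
        for i in rng.choice(SN, size=SN, p=probs):               # onlooker bee phase
            try_neighbor(i)
        for i in np.flatnonzero(trials > limit):                 # scout bee phase
            pos[i] = rng.uniform(-1, 1, size=D)
            foods[i] = decode(pos[i])
            fits[i], trials[i] = fitness(foods[i]), 0
    return best_food, best_fit
```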

3.6. The Proposed ABC-ADLASSO Method for Feature Selection

The proposed feature-selection method consists of two phases. In the first phase, ABC optimization is applied to narrow the search space and identify a subset of relevant features, reducing computing costs and improving model accuracy. In the second phase, the AD_LASSO method further refines the selected features, eliminating any remaining irrelevant ones. By first reducing the dimensionality with ABC, the complexity of the AD_LASSO step is minimized, improving its accuracy and helping to prevent overfitting. The advantages of ABC over other heuristic methods have been widely discussed in the literature. For example, ref. [34] highlights that ABC requires fewer control parameters and balances exploration and exploitation more effectively. Ref. [35] emphasizes that, despite its simplicity, ABC is highly effective in global optimization problems. Additionally, ABC is easily adapted to various optimization problems and often converges quickly, which gives it a significant advantage over other algorithms and makes it a suitable choice for high-dimensional data. To use the ABC algorithm efficiently and reap its benefits, a few crucial factors must be taken into account:
  • Representation of Bees
    Each bee in the ABC algorithm represents a potential solution, encoded as a binary vector corresponding to a subset of features. For example, for a dataset with 100 features, a bee might be represented as the vector [1, 0, 0, 1, 0, 1, 0, …, 1], where a 1 indicates a selected feature.
  • Objective Function
    Choosing an appropriate objective function is critical to the accuracy and effectiveness of the optimization. The Extended Bayesian Information Criterion (ExBIC) was used as the fitness function for the proposed feature-selection method. ExBIC is a model-selection criterion developed specifically for high-dimensional data and is commonly used for feature selection [36]. It is also effective in controlling false positives while balancing model fit and complexity, and it is defined by Equation (12):
    ExBIC = -2\log L + d\log n + 2\gamma \log(p)   (12)
    where d denotes the number of selected features, n is the total number of observations, p is the number of all features in the data matrix, and γ is a parameter ranging between 0 and 1. A better model has a lower ExBIC value, reflecting an improved trade-off between model accuracy and complexity. Here, γ is a fixed parameter, commonly set to 0.5 as suggested by [36], and $\log L$ is the log-likelihood of the model (related to the residual sum of squares in linear regression). In the proposed method, the ExBIC fitness function is minimized in the first stage using the ABC algorithm for feature selection; a sketch of this fitness computation is given after this list. In the second stage, the remaining unnecessary features are eliminated using Adaptive LASSO, which therefore operates on an already refined feature set. This strategy seeks to integrate the advantages of both approaches and to give more precise and efficient results.
  • The Control Parameters for ABC
    By trial and error, the following parameters were chosen for the ABC-based proposed method:
    Number of food sources: SN, set equal to the number of features in the data.
    Maximum number of iterations: 100.
    Limit: 10, the number of times a food source can be visited without improvement before it is abandoned.
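For concreteness, a hedged sketch of the ExBIC fitness of Equation (12) for a linear model is given below, expressing the Gaussian log-likelihood through the residual sum of squares; the function signature, the invalid-subset guard, and the way the two stages are chained are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def exbic_fitness(mask, X, y, gamma=0.5):
    """ExBIC (Eq. 12) of the OLS fit on the features flagged by the boolean mask."""
    n, p = X.shape
    d = int(mask.sum())
    if d == 0 or d >= n:                      # guard: empty or over-saturated subsets
        return 1e12                           # large finite penalty for invalid subsets
    X_sub = X[:, mask]
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)   # Gaussian log-likelihood
    return -2 * log_lik + d * np.log(n) + 2 * gamma * np.log(p)

# Stage 1 (sketch): minimize this fitness with the binary ABC of Section 3.5, e.g.
#   best_mask, _ = binary_abc(lambda m: exbic_fitness(m, X, y), D=X.shape[1])
# Stage 2: apply Adaptive LASSO (Section 3.4.2) to the retained columns X[:, best_mask].
```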
The flow chart of the proposed two-stage ABC-ADLASSO method is presented in Figure 1.

4. Simulation Study

The simulation study was conducted to assess the feature-selection performance of the developed ABC-ADLASSO method by comparing it with the AD_LASSO, LASSO, stepwise, and LARS methods under different simulation settings in high-dimensional data. All computations were carried out in RStudio. The linear model was used for data generation:
y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2)   (13)
Six simulation scenarios with high-dimensional settings were considered, each with sample size n = 50.
The six scenarios are as follows [18]; a small data-generation sketch for Scenario 1 follows the list:
  • Scenario1: p = 60 and σ = 1.5. The rows of the data matrix X are independent. In the j-th row, the first 10 features $x_{j1}, \ldots, x_{j10}$ and the remaining 50 features $x_{j11}, \ldots, x_{j60}$ are independent of each other. The pairwise correlation between the r-th and d-th components within $x_{j1}, \ldots, x_{j10}$ is $\rho^{|r-d|}$ with ρ = 0.5, r, d = 1, …, 10. Likewise, the pairwise correlation between the r-th and d-th components within $x_{j11}, \ldots, x_{j60}$ is $\rho^{|r-d|}$ with ρ = 0.5, r, d = 11, …, 60.
  • Scenario2: This is identical to Scenario1, with the exception that ρ = 0.90.
  • Scenario3: This is identical to Scenario1, with the exception that p = 100.
  • Scenario4: This is identical to Scenario2, with the exception that p = 100.
  • Scenario5: p = 60 and σ = 1.5. The features are generated as $x_{ji} = Z_{1i} + e_{ji}$ for j = 1, 2, …, 5 and $x_{ji} = Z_{2i} + e_{ji}$ for j = 6, 7, …, 10, where $Z_{1i}, Z_{2i} \sim N(0,1)$ and $e_{ji} \sim N(0, 1/100)$. The coefficients β are 1.5 for the first 10 components and 0 for the rest.
  • Scenario6: This is identical to Scenario5, with the exception that p = 100.
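The simulations themselves were run in R; purely as an illustration of the design, the following Python sketch generates one Scenario-1-style dataset with the AR(1)-type within-block correlation. The non-zero coefficient pattern (first 10 coefficients equal to 1.5) is an assumption carried over from Scenario 5, not a detail stated for Scenario 1.

```python
import numpy as np

def scenario1_data(n=50, p=60, rho=0.5, sigma=1.5, seed=0):
    """One Scenario-1-style dataset: two independent feature blocks, correlation rho^|r-d| within each."""
    rng = np.random.default_rng(seed)

    def ar1_block(size):
        idx = np.arange(size)
        cov = rho ** np.abs(idx[:, None] - idx[None, :])   # rho^{|r-d|}
        return rng.multivariate_normal(np.zeros(size), cov, size=n)

    X = np.hstack([ar1_block(10), ar1_block(p - 10)])      # rows are independent
    beta = np.zeros(p)
    beta[:10] = 1.5                                        # assumed non-zero coefficients
    y = X @ beta + rng.normal(scale=sigma, size=n)         # Eq. (13)
    return X, y, beta
```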
The confusion matrix was used to evaluate the feature-selection performance of the developed and the compared traditional methods. In this matrix, True Positives (TP) are features correctly identified as relevant (significant, non-zero coefficients correctly determined), and False Positives (FP) are irrelevant features incorrectly identified as relevant (zero coefficients incorrectly determined as significant or non-zero). True Negatives (TN) are irrelevant features correctly identified as irrelevant (zero coefficients correctly determined), and False Negatives (FN) are relevant features incorrectly identified as irrelevant (non-zero coefficients incorrectly determined as non-significant or zero). Based on the confusion matrix, accuracy, sensitivity, and specificity were computed for each method:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (14)
\text{Specificity} = \frac{TN}{TN + FP}   (15)
\text{Sensitivity} = \frac{TP}{TP + FN}   (16)
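The sketch below computes these three measures from a selected feature set and the true support (the index sets and the helper name are illustrative assumptions).

```python
import numpy as np

def selection_metrics(selected, true_support, p):
    """Accuracy, specificity, and sensitivity of a feature-selection result (Eqs. 14-16)."""
    sel = np.zeros(p, dtype=bool); sel[list(selected)] = True
    true = np.zeros(p, dtype=bool); true[list(true_support)] = True
    tp = np.sum(sel & true)
    fp = np.sum(sel & ~true)
    tn = np.sum(~sel & ~true)
    fn = np.sum(~sel & true)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }

# Example: selection_metrics(selected=[0, 2, 3], true_support=range(10), p=60)
```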
A total of 300 random repetitions of the simulations are performed. Every simulated dataset is split into a training set (80%) and a test set (20%) for each iteration of the simulation. The proposed ABC-ADLASSO, AD_LASSO, LASSO, stepwise, and LARS methods were implemented on the training set, and the performances of the methods were analyzed on the testing set.

5. Simulation Results

The simulation study has been performed to demonstrate the impact of increasing the number of features (p) and the correlation among the features on feature-selection performance in high-dimensional data. As p increases, the complexity of the data grows, making feature selection more challenging due to the higher likelihood of including irrelevant or redundant features. Among the feature-selection methods compared, the standard AD_LASSO consistently outperforms traditional one-stage feature-selection methods, LASSO, LARS, and stepwise, across all scenarios, particularly when dimensionality increases. The simulation study demonstrates that the proposed two-stage ABC-ADLASSO method enhances AD_LASSO’s feature-selection performance, achieving superior sensitivity, specificity, and accuracy values, resulting in more successful outcomes than AD_LASSO and other compared methods across all scenarios.
In scenarios with lower correlation and lower dimension (p = 60, ρ = 0.50), LASSO performs similarly to stepwise and LARS, achieving acceptable feature selection. As the correlation among features rises (ρ = 0.90), stepwise shows the most pronounced decline in performance, struggling to handle multicollinearity effectively. LASSO and LARS also experience noticeable performance decreases, but LASSO generally outperforms LARS, providing slightly better feature selection in high-correlation settings. In these difficult scenarios, however, neither approach is as effective as AD_LASSO.
Our proposed ABC-ADLASSO feature-selection method consistently outperforms AD_LASSO, LASSO, stepwise, and LARS across all simulation scenarios, particularly as p increases and the correlation between features increases. This improvement is attributed to the ABC’s ability to explore a broader solution space, enabling it to handle multicollinearity more effectively and avoid local minima. As the correlation among features increases, traditional one-stage methods like AD_LASSO, LASSO, LARS, and stepwise (BICP) struggle with bias and selection accuracy, while the proposed two-stage approach achieves feature selection more robustly. Table 1 shows the performance results for all methods across different simulation scenarios.
a. Real dataset application
The proposed ABC-ADLASSO, AD_LASSO, LASSO, stepwise, and LARS methods were implemented in the Communities and Crime [37], Large-scale Wave Energy Farm [38], Insurance Company Benchmark (COIL 2000) [39], and Federal Reserve Economic Data (FRED) [40] real datasets for feature selection to identify significant features from the training set, and their efficacy was assessed on the test set.
The Communities and Crime dataset contains a significant amount of missing data; the response feature is the per capita violent crime rate. The predictors describe the community, such as the percentage of the population considered urban and the median family income, as well as law enforcement, such as the per capita number of police officers and the percentage of officers assigned to drug units. After data cleaning, a dataset with 101 explanatory features and 1996 observations was obtained. Since the goal is to select important features in a high-dimensional setting, a random index was created to select 80 observations for the training set and 20 observations for the test set.
The Large-scale Wave Energy Farm dataset includes 99 WECs, or wave energy converters, with 6300 observations based on Perth and Sydney wave scenarios as predictors and total power output as the response variable. The main goal is to predict the total power output of the wave farm based on the coordination of WECs. Since the goal is to select significant features on a high-dimensional dataset, a random index was created to select 80 observations for the training set and 20 observations for the test set.
The Insurance Company Benchmark (COIL 2000) dataset includes 5000 customer records, each with 86 features. Among these, 85 are independent variables: 43 sociodemographic features and 42 product ownership features. The target variable is number of mobile home policies, which indicates the number of mobile home insurance policies. Since the study aims to perform feature selection on a high-dimensional dataset, a random index was used to partition the data into an 80-observation training set and a 20-observation test set.
The FRED data used in this study consist of 115 macroeconomic variables obtained from the Federal Reserve Economic Data (FRED) database of the St. Louis Federal Reserve Bank. For this analysis, we focus on the period between 2008 and 2016 to evaluate high-dimensional regression models with 102 observations. The goal is to perform variable selection on 114 predictors with one output variable, “Personal Consumption Expenditures Price Index” (PCEPI), using Adaptive-LASSO and the proposed method and to compare the performance of these approaches using 82 observations for the training set and 20 observations for the test set.
Each method was applied to every dataset 10 times for feature selection, and the mean, standard deviation, median, interquartile range (IQR), minimum, and maximum values for Adjusted R2, MAE, and RMSE are presented in Table 2, Table 3, Table 4 and Table 5.
b. Real dataset application results
The results from the real datasets are generally consistent with the simulation findings. As in the simulation scenarios, the proposed method demonstrated significantly superior performance compared to the other methods on each real dataset. Among the traditional methods, AD_LASSO outperformed LASSO, LARS, and stepwise, but the proposed method consistently achieved the best performance across all real datasets. On every real dataset, the ABC-ADLASSO approach surpasses the AD_LASSO, LASSO, stepwise, and LARS methods in terms of Adjusted R2, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). ABC-ADLASSO attained higher Adjusted R2 values, indicating an enhanced capacity to explain the variance in the response feature, and lower RMSE and MAE values, signifying better predictive accuracy. These findings show that the proposed two-stage method is well suited to feature selection in high-dimensional data. All methods also show stable results within their min–max ranges; for ABC-ADLASSO, which is based on a heuristic optimization method, this stability is a particularly important property.

6. Discussion

The findings from both the simulation study and the empirical data application validate the benefits of the proposed two-stage feature-selection method utilizing ABC and Adaptive LASSO. In high-dimensional contexts, feature selection is essential to prevent overfitting and enhance model interpretability. Our results indicate that ABC-ADLASSO provides enhanced feature selection and predictive accuracy relative to one-stage approaches such as AD_LASSO, LASSO, stepwise, and LARS. The initial stage utilizes the ABC method to effectively reduce the search space, select the most promising features, and address issues related to high multicollinearity. The second stage employs Adaptive LASSO to further improve the selected features, guaranteeing that the final model is both concise and precise. The proposed ABC-ADLASSO method offers advantages such as improved feature-selection accuracy by combining global exploration (ABC) with AD_LASSO. However, a potential drawback is the need for careful tuning of hyperparameters. The performance of both ABC and AD_LASSO depends on the chosen hyperparameter settings, and incorrect tuning may reduce the method’s effectiveness and impact the accuracy of the results. In the simulation analysis, the suggested method performs well in every situation, especially as the correlation rises. This approach reduces the complexity of the model and increases its performance in high-dimensional data.

7. Conclusions

This study has introduced an innovative two-stage feature-selection method that integrates the ABC metaheuristic optimization method with Adaptive LASSO for high-dimensional data. The proposed ABC-ADLASSO method was compared with the AD_LASSO, LASSO, stepwise, and LARS methods under different simulation settings in high-dimensional data and various real datasets to show the feature-selection performance of the proposed method. The ABC-ADLASSO method has overcome the shortcomings of single-stage feature-selection methods by integrating a global optimization algorithm (ABC) in the first stage and enhancing feature selection using a penalization technique (Adaptive LASSO) in the second stage. According to the results obtained from simulations and applications on various real datasets, ABC-ADLASSO has shown significantly superior performance in terms of accuracy, precision, and overall model performance, particularly in scenarios with high correlation and a large number of features compared to the other methods evaluated. This two-stage methodology offers a robust and adaptable solution to handling high-dimensional data, rendering it particularly relevant in domains such as genetics, bioinformatics, and intricate predictive modeling. Future research may investigate the integration of this methodology with alternative machine learning classifiers and its use across different datasets from various fields. Also, in future studies, a comprehensive comparative analysis of the proposed method with other optimization-based feature-selection techniques can be performed.

Author Contributions

Conceptualization, E.P.O. and N.S.; methodology, E.P.O. and N.S.; software, E.P.O. and N.S.; validation, E.P.O. and N.S.; formal analysis, E.P.O. and N.S.; investigation, E.P.O. and N.S.; resources, E.P.O. and N.S.; data curation, E.P.O. and N.S.; writing—original draft preparation; writing—review and editing, E.P.O. and N.S.; visualization, E.P.O. and N.S.; supervision, E.P.O. and N.S.; project administration, E.P.O. and N.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sancar, N.; Onakpojeruo, E.P.; Inan, D.; Uzun, O.D. Adaptive Elastic Net Based on Modified PSO for Variable Selection in Cox Model with High-Dimensional Data: A Comprehensive Simulation Study. IEEE Access 2023, 11, 127302–127316. [Google Scholar] [CrossRef]
  2. Jain, R.; Xu, W. HDSI: High dimensional selection with interactions algorithm on feature selection and testing. PLoS ONE 2021, 16, e0246159. [Google Scholar] [CrossRef] [PubMed]
  3. Amini, A.A.; Wainwright, M.J. High-dimensional analysis of semidefinite relaxations for sparse principal components. In Proceedings of the IEEE International Symposium on Information Theory ISIT 2008, Toronto, ON, Canada, 6–11 July 2008; pp. 2454–2458. [Google Scholar]
  4. Holtzman, G.; Soffer, A.; Vilenchik, D. A greedy anytime algorithm for sparse PCA. In Proceedings of the 33rd Conference on Learning Theory (COLT 2020), Graz, Austria, 9–12 July 2020; pp. 1939–1956. [Google Scholar]
  5. Rouhi, A.; Nezamabadi-Pour, H. Feature Selection in High-Dimensional Data. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2020; Volume 1123, pp. 85–128. Available online: https://link.springer.com/chapter/10.1007/978-3-030-34094-0_5 (accessed on 7 October 2024).
  6. Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinform. 2022, 2, 927312. Available online: www.frontiersin.org (accessed on 25 October 2024). [CrossRef] [PubMed]
  7. Curreri, F.; Fiumara, G.; Xibilia, M.G. Input Selection Methods for Soft Sensor Design: A Survey. Future Internet 2020, 12, 97. Available online: https://www.mdpi.com/1999-5903/12/6/97/htm (accessed on 25 October 2024). [CrossRef]
  8. Maseno, E.M.; Wang, Z. Hybrid Wrapper Feature Selection Method Based on Genetic Algorithm and Extreme Learning Machine for Intrusion Detection. J. Big Data 2024, 11, 24. [Google Scholar] [CrossRef]
  9. Bohrer, J.S.; Dorn, M. Enhancing Classification with Hybrid Feature Selection: A Multi-Objective Genetic Algorithm for High-Dimensional Data. Expert Syst. Appl. 2024, 255, 124518. [Google Scholar] [CrossRef]
  10. Owoc, M.L. Usability of Honeybee Algorithms in Practice. In Towards Nature-Inspired Sustainable Development; IFIP Advances in Information and Communication Technology; Springer: Cham, Switzerland, 2024; Volume 693, pp. 161–176. Available online: https://link.springer.com/chapter/10.1007/978-3-031-61069-1_12 (accessed on 15 October 2024).
  11. Stamadianos, T.; Taxidou, A.; Marinaki, M.; Marinakis, Y. Swarm Intelligence and Nature-Inspired Algorithms for Solving Vehicle Routing Problems: A Survey. Oper. Res. 2024, 24, 47. Available online: https://link.springer.com/article/10.1007/s12351-024-00862-5 (accessed on 15 October 2024). [CrossRef]
  12. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report TR06; Computer Engineering Department, Engineering Faculty, Erciyes University: Kayseri, Türkiye, 2005. [Google Scholar]
  13. Karaboga, D.; Kaya, E. An Adaptive and Hybrid Artificial Bee Colony Algorithm (aABC) for ANFIS Training. Appl. Soft Comput. 2016, 49, 423–436. [Google Scholar] [CrossRef]
  14. Nozohour-Leilabady, B.; Fazelabdolabadi, B. On the Application of Artificial Bee Colony (ABC) Algorithm for Optimization of Well Placements in Fractured Reservoirs: Efficiency Comparison with the Particle Swarm Optimization (PSO) Methodology. Petroleum 2016, 2, 79–89. [Google Scholar] [CrossRef]
  15. Yarat, S.; Senan, S.; Orman, Z. A Comparative Study on PSO with Other Metaheuristic Methods. In International Series in Operations Research and Management Science; Springer: Cham, Switzerland, 2021; Volume 306, pp. 49–72. Available online: https://link.springer.com/chapter/10.1007/978-3-030-70281-6_4 (accessed on 15 October 2024).
  16. Theng, D.; Bhoyar, K.K. Feature Selection Techniques for Machine Learning: A Survey of More Than Two Decades of Research. Knowl. Inf. Syst. 2024, 66, 1575–1637. Available online: https://link.springer.com/article/10.1007/s10115-023-02010-5 (accessed on 7 October 2024). [CrossRef]
  17. Liu, X.Y.; Liang, Y.; Wang, S.; Yang, Z.Y.; Ye, H.S. A Hybrid Genetic Algorithm with Wrapper-Embedded Approaches for Feature Selection. IEEE Access 2018, 6, 22863–22874. [Google Scholar] [CrossRef]
  18. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  19. Yerlikaya-Özkurt, F.; Taylan, P. Enhancing Classification Modeling Through Feature Selection and Smoothness: A Conic-Fused Lasso Approach Integrated with Mean Shift Outlier Modelling. J. Dyn. Games 2024, 12, 1–23. [CrossRef]
  20. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  21. Huang, J.; Ma, S.; Zhang, C.H. Adaptive Lasso for Sparse High-Dimensional Regression Models. Stat. Sin. 2008, 18, 1603–1618. [Google Scholar]
  22. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. Available online: https://www.tandfonline.com/doi/abs/10.1198/016214506000000735 (accessed on 25 October 2024). [CrossRef]
  23. Zhang, Z.; Tong, T.; Fang, Y.; Zheng, J.; Zhang, X.; Niu, C.; Li, J.; Zhang, X.; Xue, D. Genome-Wide Identification of Barley ABC Genes and Their Expression in Response to Abiotic Stress Treatment. Plants 2020, 9, 1281. [Google Scholar] [CrossRef]
  24. Garg, S.; Kaur, K.; Batra, S.; Aujla, G.S.; Morgan, G.; Kumar, N.; Zomaya, A.Y.; Ranjan, R. En-ABC: An Ensemble Artificial Bee Colony Based Anomaly Detection Scheme for Cloud Environment. J. Parallel Distrib. Comput. 2020, 135, 219–233. [Google Scholar] [CrossRef]
  25. Hancer, E.; Xue, B.; Karaboga, D.; Zhang, M. A Binary ABC Algorithm Based on Advanced Similarity Scheme for Feature Selection. Appl. Soft Comput. 2015, 36, 334–348. [Google Scholar] [CrossRef]
  26. Chamchuen, S.; Siritaratiwat, A.; Fuangfoo, P.; Suthisopapan, P.; Khunkitti, P. High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm. Energies 2021, 14, 1238. [Google Scholar] [CrossRef]
  27. Guo, Y.; Zhang, C. A Hybrid Artificial Bee Colony Algorithm for Satisfiability Problems Based on Tabu Search. In Proceedings of the 3rd IEEE International Conference on Computer and Communications (ICCC 2017), Chengdu, China, 13–16 October 2017; IEEE: New York, NY, USA, 2018; pp. 2226–2230. [Google Scholar]
  28. Gu, T.; Chen, H.; Chang, L.; Li, L. Intrusion Detection System Based on Improved ABC Algorithm with Tabu Search. IEEJ Trans. Electr. Electron. Eng. 2019, 14, 1652–1660. Available online: https://onlinelibrary.wiley.com/doi/full/10.1002/tee.22987 (accessed on 15 October 2024). [CrossRef]
  29. Kiliçarslan, S.; Dönmez, E. Improved Multi-Layer Hybrid Adaptive Particle Swarm Optimization Based Artificial Bee Colony for Optimizing Feature Selection and Classification of Microarray Data. Multimed. Tools Appl. 2024, 83, 67259–67281. Available online: https://link.springer.com/article/10.1007/s11042-023-17234-4 (accessed on 7 October 2024). [CrossRef]
  30. Kumar, H. Decision Making for Hotel Selection Using Rough Set Theory: A Case Study of Indian Hotels. Int. J. Appl. Eng. Res. 2018, 13, 3988–3998. [Google Scholar]
  31. Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill: New York, NY, USA, 2005. [Google Scholar]
  32. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Ann. Statist. 2004, 32, 407–499. [Google Scholar] [CrossRef]
  33. Sirimongkolkasem, T.; Drikvandi, R. On Regularisation Methods for Analysis of High Dimensional Data. Ann. Data Sci. 2019, 6, 737–763. Available online: https://link.springer.com/article/10.1007/s40745-019-00209-4 (accessed on 26 November 2024). [CrossRef]
  34. Akay, B.; Karaboga, D.; Gorkemli, B.; Kaya, E. A Survey on the Artificial Bee Colony Algorithm Variants for Binary, Integer, and Mixed Integer Programming Problems. Appl. Soft Comput. 2021, 106, 107351. [Google Scholar] [CrossRef]
  35. Bansal, J.C.; Joshi, S.K.; Sharma, H. Modified Global Best Artificial Bee Colony for Constrained Optimization Problems. Comput. Electr. Eng. 2018, 67, 365–382. [Google Scholar] [CrossRef]
  36. Chen, J.; Chen, Z. Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika 2008, 95, 759–771. [Google Scholar] [CrossRef]
  37. Communities and Crime—UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/dataset/183/communities+and+crime (accessed on 24 October 2024).
  38. Neshat, M.; Alexander, B.; Sergiienko, N.Y.; Wagner, M. Optimization of Large Wave Farms Using a Multi-Strategy Evolutionary Framework. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, Cancún, Mexico, 8–12 July 2020. [Google Scholar]
  39. Putten, P. Insurance Company Benchmark (COIL 2000) [Dataset]. UCI Machine Learning Repository. [CrossRef]
  40. Federal Reserve Bank of St. Louis. Federal Reserve Economic Data (FRED). Available online: https://fred.stlouisfed.org (accessed on 27 November 2024).
Figure 1. Flow chart of the proposed method.
Table 1. Simulation results.
Scenario | Method | Sensitivity | Specificity | Accuracy
Scenario1 (n = 50, p = 60, ρ = 0.50) | ABC-ADLASSO | 0.879 | 0.958 | 0.914
 | AD_LASSO | 0.715 | 0.892 | 0.817
 | LASSO | 0.686 | 0.887 | 0.799
 | STEPWISE | 0.652 | 0.857 | 0.795
 | LARS | 0.633 | 0.861 | 0.789
Scenario2 (n = 50, p = 60, ρ = 0.90) | ABC-ADLASSO | 0.911 | 0.961 | 0.924
 | AD_LASSO | 0.742 | 0.919 | 0.846
 | LASSO | 0.674 | 0.893 | 0.802
 | STEPWISE | 0.564 | 0.731 | 0.633
 | LARS | 0.643 | 0.890 | 0.786
Scenario3 (n = 50, p = 100, ρ = 0.50) | ABC-ADLASSO | 0.928 | 0.973 | 0.938
 | AD_LASSO | 0.784 | 0.925 | 0.909
 | LASSO | 0.627 | 0.849 | 0.742
 | STEPWISE | 0.577 | 0.721 | 0.609
 | LARS | 0.615 | 0.856 | 0.743
Scenario4 (n = 50, p = 100, ρ = 0.90) | ABC-ADLASSO | 0.905 | 0.961 | 0.922
 | AD_LASSO | 0.734 | 0.905 | 0.825
 | LASSO | 0.600 | 0.868 | 0.734
 | STEPWISE | 0.482 | 0.659 | 0.527
 | LARS | 0.617 | 0.872 | 0.755
Scenario5 (n = 50, p = 60, grouping effect) | ABC-ADLASSO | 0.923 | 0.974 | 0.931
 | AD_LASSO | 0.739 | 0.916 | 0.859
 | LASSO | 0.573 | 0.831 | 0.744
 | STEPWISE | 0.432 | 0.630 | 0.508
 | LARS | 0.548 | 0.862 | 0.783
Scenario6 (n = 50, p = 100, grouping effect) | ABC-ADLASSO | 0.935 | 0.984 | 0.943
 | AD_LASSO | 0.793 | 0.933 | 0.850
 | LASSO | 0.552 | 0.824 | 0.735
 | STEPWISE | 0.337 | 0.442 | 0.411
 | LARS | 0.524 | 0.846 | 0.731
Table 2. Real dataset results on Communities and Crime Dataset.
Method | Metric | Mean | Standard Deviation | Median | Min | Max | IQR (25th–75th Percentile)
ABC-ADLASSO | Adjusted R2 | 0.769 | 0.023 | 0.764 | 0.739 | 0.796 | 0.750–0.791
 | RMSE | 0.100 | 0.014 | 0.091 | 0.073 | 0.116 | 0.084–0.113
 | MAE | 0.076 | 0.013 | 0.080 | 0.057 | 0.095 | 0.065–0.084
AD_LASSO | Adjusted R2 | 0.575 | 0.047 | 0.580 | 0.517 | 0.643 | 0.532–0.607
 | RMSE | 0.202 | 0.021 | 0.199 | 0.175 | 0.247 | 0.186–0.211
 | MAE | 0.161 | 0.022 | 0.166 | 0.123 | 0.190 | 0.146–0.174
LASSO | Adjusted R2 | 0.496 | 0.014 | 0.497 | 0.472 | 0.515 | 0.486–0.507
 | RMSE | 0.239 | 0.007 | 0.243 | 0.232 | 0.268 | 0.225–0.249
 | MAE | 0.192 | 0.006 | 0.190 | 0.184 | 0.202 | 0.187–0.197
STEPWISE | Adjusted R2 | 0.320 | 0.006 | 0.318 | 0.311 | 0.330 | 0.316–0.324
 | RMSE | 0.321 | 0.005 | 0.317 | 0.312 | 0.330 | 0.318–0.324
 | MAE | 0.227 | 0.002 | 0.227 | 0.224 | 0.230 | 0.225–0.228
LARS | Adjusted R2 | 0.460 | 0.007 | 0.461 | 0.448 | 0.470 | 0.456–0.464
 | RMSE | 0.249 | 0.006 | 0.247 | 0.240 | 0.260 | 0.245–0.252
 | MAE | 0.206 | 0.005 | 0.210 | 0.198 | 0.224 | 0.203–0.217
Table 3. Real dataset results on Large-scale Wave Energy Farm dataset.
Method | Metric | Mean | Standard Deviation | Median | Min | Max | IQR (25th–75th Percentile)
ABC-ADLASSO | Adjusted R2 | 0.695 | 0.010 | 0.690 | 0.684 | 0.708 | 0.689–0.700
 | RMSE | 0.110 | 0.002 | 0.113 | 0.108 | 0.119 | 0.109–0.115
 | MAE | 0.084 | 0.002 | 0.084 | 0.082 | 0.087 | 0.083–0.085
AD_LASSO | Adjusted R2 | 0.616 | 0.006 | 0.614 | 0.608 | 0.623 | 0.610–0.619
 | RMSE | 0.187 | 0.005 | 0.187 | 0.180 | 0.192 | 0.185–0.190
 | MAE | 0.147 | 0.005 | 0.147 | 0.140 | 0.152 | 0.145–0.150
LASSO | Adjusted R2 | 0.527 | 0.008 | 0.529 | 0.517 | 0.536 | 0.521–0.530
 | RMSE | 0.279 | 0.006 | 0.274 | 0.270 | 0.285 | 0.277–0.282
 | MAE | 0.205 | 0.007 | 0.204 | 0.197 | 0.214 | 0.200–0.209
STEPWISE | Adjusted R2 | 0.369 | 0.006 | 0.370 | 0.361 | 0.379 | 0.365–0.372
 | RMSE | 0.523 | 0.005 | 0.520 | 0.508 | 0.535 | 0.522–0.530
 | MAE | 0.447 | 0.008 | 0.445 | 0.438 | 0.460 | 0.442–0.449
LARS | Adjusted R2 | 0.425 | 0.005 | 0.428 | 0.418 | 0.432 | 0.421–0.429
 | RMSE | 0.355 | 0.006 | 0.356 | 0.347 | 0.364 | 0.349–0.360
 | MAE | 0.273 | 0.007 | 0.272 | 0.265 | 0.283 | 0.267–0.278
Table 4. Real dataset results on Insurance Company Benchmark (COIL 2000) dataset.
Method | Metric | Mean | Standard Deviation | Median | Min | Max | IQR (25th–75th Percentile)
ABC-ADLASSO | Adjusted R2 | 0.636 | 0.012 | 0.628 | 0.620 | 0.653 | 0.624–0.647
 | RMSE | 0.113 | 0.002 | 0.113 | 0.110 | 0.118 | 0.112–0.115
 | MAE | 0.087 | 0.002 | 0.087 | 0.084 | 0.092 | 0.085–0.089
AD_LASSO | Adjusted R2 | 0.611 | 0.005 | 0.612 | 0.600 | 0.619 | 0.608–0.615
 | RMSE | 0.170 | 0.003 | 0.171 | 0.165 | 0.176 | 0.168–0.172
 | MAE | 0.140 | 0.003 | 0.140 | 0.135 | 0.146 | 0.138–0.142
LASSO | Adjusted R2 | 0.566 | 0.004 | 0.568 | 0.559 | 0.574 | 0.563–0.570
 | RMSE | 0.228 | 0.003 | 0.228 | 0.223 | 0.235 | 0.225–0.230
 | MAE | 0.183 | 0.003 | 0.183 | 0.180 | 0.191 | 0.181–0.186
STEPWISE | Adjusted R2 | 0.476 | 0.004 | 0.477 | 0.470 | 0.484 | 0.473–0.479
 | RMSE | 0.522 | 0.003 | 0.523 | 0.514 | 0.527 | 0.520–0.525
 | MAE | 0.428 | 0.003 | 0.428 | 0.421 | 0.432 | 0.426–0.430
LARS | Adjusted R2 | 0.514 | 0.005 | 0.514 | 0.505 | 0.522 | 0.510–0.518
 | RMSE | 0.468 | 0.004 | 0.470 | 0.462 | 0.475 | 0.466–0.471
 | MAE | 0.378 | 0.004 | 0.379 | 0.372 | 0.385 | 0.376–0.381
Table 5. Real dataset results on Federal Reserve Economic Data (FRED) dataset.
Method | Metric | Mean | Standard Deviation | Median | Min | Max | IQR (25th–75th Percentile)
ABC-ADLASSO | Adjusted R2 | 0.758 | 0.013 | 0.754 | 0.736 | 0.776 | 0.746–0.766
 | RMSE | 0.112 | 0.002 | 0.117 | 0.093 | 0.133 | 0.103–0.123
 | MAE | 0.089 | 0.002 | 0.092 | 0.067 | 0.107 | 0.077–0.097
AD_LASSO | Adjusted R2 | 0.703 | 0.009 | 0.700 | 0.680 | 0.720 | 0.690–0.710
 | RMSE | 0.169 | 0.003 | 0.168 | 0.148 | 0.188 | 0.158–0.178
 | MAE | 0.139 | 0.003 | 0.140 | 0.120 | 0.160 | 0.130–0.150
LASSO | Adjusted R2 | 0.625 | 0.017 | 0.620 | 0.603 | 0.643 | 0.613–0.633
 | RMSE | 0.228 | 0.006 | 0.231 | 0.210 | 0.250 | 0.220–0.240
 | MAE | 0.188 | 0.006 | 0.185 | 0.165 | 0.205 | 0.175–0.195
STEPWISE | Adjusted R2 | 0.483 | 0.008 | 0.487 | 0.465 | 0.505 | 0.475–0.495
 | RMSE | 0.414 | 0.006 | 0.410 | 0.392 | 0.434 | 0.402–0.422
 | MAE | 0.325 | 0.007 | 0.324 | 0.303 | 0.345 | 0.313–0.333
LARS | Adjusted R2 | 0.555 | 0.016 | 0.552 | 0.532 | 0.572 | 0.542–0.562
 | RMSE | 0.416 | 0.005 | 0.418 | 0.398 | 0.438 | 0.408–0.428
 | MAE | 0.317 | 0.005 | 0.319 | 0.299 | 0.339 | 0.309–0.329
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
