Novel Improved Salp Swarm Algorithm: An Application for Feature Selection

We live in a period when smart devices gather large amounts of data from a variety of sensors, and decisions are often taken based on these data in a more or less autonomous manner. Still, many of the inputs do not prove to be essential in the decision-making process; hence, it is of utmost importance to find means of eliminating the noise and concentrating on the most influential attributes. In this sense, we put forward a method based on the swarm intelligence paradigm for extracting the most important features from several datasets. The subject of this paper is a novel implementation of an algorithm from the swarm intelligence branch of the machine learning domain for improving feature selection. The combination of machine learning with metaheuristic approaches has recently created a new branch of artificial intelligence called learnheuristics. This approach benefits both from the capability of feature selection to find the solutions that most affect accuracy and performance, and from the well-known ability of swarm intelligence algorithms to efficiently comb through a large search space of solutions. The latter is used as a wrapper method in feature selection and the improvements are significant. In this paper, a modified version of the salp swarm algorithm for feature selection is proposed. This solution is validated on 21 datasets with the K-nearest neighbors classification model. Furthermore, the performance of the algorithm is compared to the best algorithms under the same test setup, and the proposed solution achieves a smaller number of selected features and better classification accuracy. Therefore, the proposed method tackles feature selection and demonstrates its success on many benchmark datasets.


Introduction
The fields of big data, cryptography, and computer science in general are all influenced by the domain of optimization and to some extent rely on it. The field of optimization is broad and employs a large variety of techniques. Although there is a large number of optimization solutions, in most cases there is room for further improvement and new algorithms can lead to better results. What is more, some optimization methods prove to be suitable for a certain class of problems, while others perform better for other types. Consequently, a newly proposed optimization technique needs to be thoroughly tested in order to identify its strengths and weaknesses with respect to the solutions' quality when dealing with different types of problems.
Nature-inspired algorithms have been widely applied in recent years to a broad range of non-deterministic polynomial hard (NP-hard) mathematical and engineering optimization problems [1], owing to their robustness and efficiency in exploring and exploiting vast search spaces. Of all nature-inspired approaches, evolutionary algorithms (EA) and swarm intelligence metaheuristics stand out the most, and they have been effectively applied to different NP-hard real-world challenges [2][3][4]. The EA approaches conduct a search process by adopting reproduction, crossover and mutation operators from natural evolution, while swarm intelligence mimics the collective intelligent behavior of groups of organisms from nature, such as flocks of birds, schools of fish, colonies of ants and bees, and so forth. Both families of methods belong to the group of artificial intelligence optimization techniques. Various metaheuristics were reviewed as candidates for improvement. The most recent from the reviewed set are the grey wolf optimizer (GWO), red deer algorithm (RDA) [5], ant lion optimizer (ALO) [6], grasshopper optimization algorithm (GOA) [7], multi-verse optimizer (MVO) [8], moth-flame optimization algorithm (MFO) [9], social engineering optimizer (SEO) [10], dragonfly algorithm (DA) [11], whale optimization algorithm (WOA) [12], Harris hawks optimization (HHO) [13] and sine cosine algorithm (SCA) [14]. While the mentioned algorithms have all shown notable performance, none is without shortcomings. In the field of swarm metaheuristics, the basic versions tend to favor either the exploration or the exploitation phase. There have been attempts to design a solution that performs equally well in both phases from the outset, like the elaborate SCA. Nevertheless, even the SCA has undergone modifications and achieved better performance than its original version [15]. Hence, the potential of swarm metaheuristics is often best realized through hybridization.
This modification method relies on the principle of fusing the original algorithm with another. This is usually achieved by incorporating a principle from an algorithm that performs better in the phase that is disfavored by the solution being improved. The dynamics of the field dictate constant improvement and a search for new solutions and new ways to improve the existing ones. The authors opted to improve the SSA because of its robustness combined with simplicity. The algorithm is easy to implement, and fine-tuning modifications are even suggested by its author.
The expansion of data availability and computer processing power in recent decades has led to interaction between the fields of nature-inspired metaheuristics and machine learning, which is an artificial intelligence subdomain and a crucial tool for data science. Machine learning models can be efficiently utilized to find patterns and make predictions from huge amounts of data that may appear uncorrelated at first glance. However, large datasets are usually packed with inessential and redundant data that negatively influence machine learning performance in terms of computational complexity and accuracy. Such datasets are usually described as "high-dimensional" and this phenomenon is known as the curse of dimensionality [16].
Therefore, finding relevant information (features) from large datasets is crucial for tackling the above mentioned issue and it is known as the dimensionality reduction challenge in the modern computer science literature [17]. The process of dimensionality reduction is usually employed in the data pre-processing phase of machine learning and it encompasses two approaches: feature extraction and feature selection. By using feature extraction, new variables are derived from the primary dataset [18], while feature selection chooses a subset of significant variables for further use [19].
The aim of feature selection is to find the most informative subset from high-dimensional datasets by removing redundant and irrelevant features, thereby improving the classification and prediction accuracy of a machine learning model. According to G. Chandrashekar et al. [20], all feature selection methods can be split into three groups: filter, wrapper and embedded. Wrapper methods utilize learning algorithms to evaluate a feature subset by training a model; they are the most effective, but also the most computationally demanding. Filter methods do not rely on a training system, but apply a measure to assign a score to feature subsets. This group is generally less computationally expensive than the wrapper family, but generates a universal set (not tuned to a particular predictive model) since it does not include model training. Finally, the embedded methods use feature selection as a part of the model construction procedure, that is, algorithms execute feature selection during the model training. The embedded methods are as fast as the filter ones, but more precise.
Regarding the computational difficulty, embedded methods are in the middle of wrappers and filters.
Nature-inspired algorithms, especially swarm intelligence metaheuristics [21,22], have been successfully applied as wrapper methods for feature selection in machine learning and this is one point where machine learning and optimization metaheuristics intersect.
If there are $n_f$ features in a dataset, a total of $2^{n_f}$ subsets exist and, since for high-dimensional datasets $n_f$ is typically a large number, this challenge is considered NP-hard. Consequently, given that swarm intelligence has proved to be a robust and efficient optimizer for NP-hard challenges, its application as a wrapper feature selection method is a natural choice.
Although many swarm intelligence applications for feature selection can be found in recent literature, the no free lunch (NFL) theorem [23] implies that there is still room for improvement in this domain. The NFL theorem states that no universal algorithm exists that can solve all optimization problems. Accordingly, an approach that efficiently solves feature selection for all datasets does not exist. The NFL theorem motivates researchers to improve and adjust current algorithms, or propose new ones, to solve various problems, including the feature selection challenge.
Therefore, the motivation behind the proposed study is to try to further enhance feature selection in machine learning by employing an improved salp swarm algorithm (SSA), which was developed and evaluated for the purpose of this research. The SSA belongs to the family of swarm intelligence metaheuristics and it was proposed in 2017 by Mirjalili et al. [24]. The basic SSA is enhanced by including an additional mechanism and by hybridization with another well-known swarm intelligence metaheuristic.
Guided by established practice from the modern literature, before its application to feature selection, the proposed enhanced SSA is first tested and evaluated on a recognized test-bed with challenging instances of functions having 30 dimensions from the Congress on Evolutionary Computation 2013 (CEC2013) benchmark suite [25]. This also allows a direct comparison of the obtained results with the outputs of a large variety of state-of-the-art (SOTA) metaheuristics. Afterwards, it is adapted as a wrapper-based approach for feature selection and validated against 21 well-known datasets retrieved from the University of California, Irvine (UCI) repository [26].
The scientific contributions of the proposed study can be summed up as follows:
• the proposed improved SSA algorithm overcomes some observed deficiencies and establishes better performance than the original SSA;
• the proposed method proves to be promising and competitive with other SOTA metaheuristics according to CEC2013 testing results; and
• compared to other SOTA approaches, improvements in addressing the feature selection issue in machine learning, in terms of classification accuracy and number of selected features, are established.
Based on that stated above, the method proposed in this study tackles the feature selection challenge and demonstrates its success with many benchmark datasets.
The organization of the manuscript is as follows. Section 2 covers some of the most notable SOTA approaches from the domain of swarm intelligence, as well as from the area of hybrid methods between swarm algorithms and machine learning. In Section 3, the original SSA is presented first, then its drawbacks are indicated and finally details of the proposed algorithm are provided. Sections 4 and 5 present simulations with standard CEC2013 instances along with feature selection experiments including comparative analysis and discussion with other recent SOTA algorithms. Finally, a summary and future research plans are examined in Section 6.

Related Works
There are several recent good survey studies that present the challenges that appear within feature selection in various fields of machine learning, as well as indicate the most prolific methods to achieve the task. Some very inspiring reads are [20,27], as well as the more recent work [28]. These also thoroughly present the complexity of the feature selection task, the manner in which the dimensionality reduction can be achieved for various datasets, ideas that are also marginally discussed in the introduction section of the current article. Another work that also presents a survey for the same problem is [29]. This study especially concentrates on evolutionary computation approaches for achieving the goal, so it is better linked with the current work. A review of studies for feature selection that is further narrowed only on methodologies involving swarm intelligence algorithms is found in [30].
The two most popular evolutionary computation approaches in feature selection are genetic algorithms (GAs) and particle swarm optimization (PSO), and for both there is an increasing trend in the number of studies using them in the last couple of decades [29]. They are both applied in wrapper approaches beside various classification algorithms, like support vector machines [31][32][33], K-nearest neighbor [34][35][36], artificial neural networks [37,38], decision tree [39] and so forth.
In [31], a real-world regression task regarding combustion processes in industry is considered, where support vector regression is employed for obtaining an optimal carbon monoxide concentration in the exhaust gases based on other characteristics. Besides a GA for feature selection, two more methods from Bayesian statistics are tested, but the GA approach proves to be superior. Another case of a successful combination of a GA and an SVM for classification is presented in [32], where the GA is used both for feature selection and for fine-tuning the parameters of the SVM. In [33], a dataset with medical microscopic images is considered: features are first extracted from the images, then reduced by feature selection, and eventually an SVM is applied for achieving automated diagnosis.
In [34], a bee-inspired optimization algorithm is used as the metaheuristic that takes care of optimization. Several benchmark datasets are used and the results are compared to cases when a GA, a PSO or an ant colony optimization is used. The approach in [36] integrates an evolutionary algorithm with a local search technique and the authors claim very good performance for medium- to large-sized datasets.
In [37], a real-world credit dataset is collected at a Croatian bank and the GA combined with ANN is applied to it and then further tested on a UCI database. Applications to medical data are presented in [38], where various classifiers (SVM, artificial neural networks, K-nearest neighbor, linear regression) are optimized via a genetic algorithm as concerns both parameter optimization and feature selection.
Finally, in [39] an application to medical images performs, as in [33] above, feature extraction followed by feature selection using a GA. Various classifiers like SVM, ANN and decision tree are used for the final prediction. Another example of feature selection tackled by swarm intelligence is [40], where the PSO algorithm is improved with an innovative mechanism for initialization and for the update process of solutions, and is validated on 20 popular datasets.
SSA has also been used to address the feature selection problem. Some of the successfully improved variants of the basic SSA include the solution of feature weighting with the minimum distance problem [41], feature selection through hybridization with the opposition based learning heuristics [42], and the improvement of accuracy, reliability and convergence time for feature selection with the introduction of the inertia weight control parameter [43]. SSA has also been successfully modified and applied in other application domains recently, such as the green home health care routing problem [44], health care supply chain [45], crop disease detection [46] and the power systems unit commitment task [47], to name a few.
Nature is the source of inspiration in the case of swarm intelligence algorithms. The benefit for the machine learning techniques derives from the good compatibility with the main principle of swarm intelligence of employing an immense number of units that are individually incapable of solving the problem. Algorithms of this sort are often applied on their own because of their well-known strong performance. Furthermore, their full potential is reached by incorporating hybridization techniques. The real-world applications of swarm intelligence solutions are vast, ranging from clustering, node localization, and energy preservation in wireless sensor networks [48][49][50][51], through the scheduling problem with cloud tasks [2,52], the prediction of COVID-19 cases based on machine learning [53,54], MRI classification optimization [55,56] and text document clustering [57], to the optimization of artificial neural networks [58][59][60][61].

Proposed Method
This section first introduces basic details of the original SSA metaheuristics. Afterwards, the observed drawbacks of the basic version are elaborated and mechanisms that are able to overcome its deficiencies are proposed. Finally, solutions for improving SSA are put forward.

Basic Salp Swarm Algorithm
The SSA [24] algorithm was motivated by the group of animals called salp, which are aquatic, small, barrel-shaped and transparent. The individual units of this specimen bind together with the goal of finding the safest paths in finding food sources. These interesting creatures link up one behind another forming a chain.
The first unit in the chain is the leader and its behavior models exploration and exploitation of the optimization algorithm search process. The leader decides where the group will go in search for paths and food in its area. The leader's position is changed towards the direction of the food source, that represents the current best solution.
The units' positions in the D-dimensional search space are mathematically described as a two-dimensional matrix labeled X, while the food source (current best solution) is labeled as F. The following function updates the leader's position in the j-th dimension [24]: where $x^1_j$ denotes the leader's position in the j-th dimension, $F_j$ represents the position of the current best solution (food source) in that dimension, the upper and lower search space boundaries in the j-th dimension are, respectively, $ub_j$ and $lb_j$, while $c_1$, $c_2$ and $c_3$ denote pseudo-random numbers drawn from the interval [0, 1]. The parameters $c_2$ and $c_3$ determine the step size and dictate whether the position of the new solution will be generated towards negative or positive infinity. However, the most important parameter is considered to be $c_1$, because it directly influences the exploration and exploitation balance, which is one of the most important factors that influence search process efficiency. The $c_1$ is calculated as [24]: where the current iteration is represented as l and the maximum number of iterations in a run is denoted as L.
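The leader update and the $c_1$ schedule described above can be sketched as follows. This is a minimal illustration that assumes the form commonly given for the SSA in [24], namely $x^1_j = F_j \pm c_1((ub_j - lb_j)c_2 + lb_j)$ with $c_1 = 2e^{-(4l/L)^2}$. The function names, the 0.5 threshold on $c_3$ (the value used in common SSA implementations) and the clipping to the bounds are assumptions made for this example and are not taken from this text.

```python
import math
import random


def c1_schedule(l, L):
    """c1 = 2 * exp(-(4 * l / L) ** 2): equal to 2 at l = 0, decays towards 0."""
    return 2.0 * math.exp(-((4.0 * l / L) ** 2))


def update_leader(food, lb, ub, l, L, rng=random, threshold=0.5):
    """Return a new leader position sampled around the food source F.

    `threshold` decides the sign of the step; 0.5 is an assumption.
    """
    c1 = c1_schedule(l, L)
    leader = []
    for f_j, lb_j, ub_j in zip(food, lb, ub):
        c2, c3 = rng.random(), rng.random()
        step = c1 * ((ub_j - lb_j) * c2 + lb_j)
        x_j = f_j + step if c3 >= threshold else f_j - step
        # Keep the leader inside the search space (assumed boundary handling).
        leader.append(min(max(x_j, lb_j), ub_j))
    return leader
```

Because $c_1$ shrinks as l approaches L, the leader takes large steps around F early in a run and increasingly small ones later, which is the exploration-to-exploitation shift described in the text.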
The position of followers is updated with the following equation that represents Newton's law of motion [24]: where $x^i_j$ denotes the position of the i-th follower in the j-th dimension and i ≥ 2. The symbol t represents time, a is the acceleration, and the initial speed is $V_0$. Because time in any optimization process is modeled as iterations, the difference between consecutive iterations is 1, and $V_0 = 0$ at the beginning, Equation (3) can be reformulated as:
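For reference, the two follower equations are restated below in the form in which they are commonly written for the SSA in [24]. They are reproduced from that general formulation rather than recovered from this text, so the exact typesetting is an assumption.

```latex
% Equation (3): follower position from Newton's law of motion
x^{i}_{j} = \tfrac{1}{2}\, a\, t^{2} + V_{0}\, t, \qquad i \ge 2

% Equation (4): unit time step between iterations and V_0 = 0
x^{i}_{j} = \tfrac{1}{2}\left( x^{i}_{j} + x^{i-1}_{j} \right)
```

On the right-hand side of Equation (4), both positions are the values from the previous iteration, so each follower moves to the midpoint between itself and the salp ahead of it in the chain.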

Cons of the Original Algorithm and Proposed Improved Approach
It is common for basic optimization algorithms to have certain deficiencies, and that is also the case with the SSA. The observed drawbacks of the basic SSA can be summarized as follows: insufficient exploration, average exploitation power (a conditional drawback) and an inadequate intensification-diversification trade-off.
In general, any optimization algorithm can be improved by applying small modifications, for example, minor changes made to the search equation or additional mechanisms, and/or significant changes by hybridization with another algorithm. For the purpose of this study, the basic SSA was improved by including a novel mechanism, as well as by hybridization with another well-known optimization metaheuristic.
Based on the findings from previous research [62,63], as well as on extensive simulations with challenging CEC2013 benchmark instances [25] that were conducted for the purpose of this study, it was discovered that the diversification process of the basic SSA exhibits some deficiencies, which lead to an inappropriate intensification-diversification balance that is, on average, biased towards exploitation.
First of all, the SSA exploration is controlled only by the dynamic parameter $c_1$ according to Equation (2): at the beginning of a run the search is shifted towards exploration, while at later iterations it slides towards exploitation. However, this mechanism is applied only to the leader, which moves around F (the current best solution), and the whole search process depends to some extent on luck. Followers are updated according to Equation (4), which is essentially exploitation between their previous and current positions. If the algorithm is lucky and manages to find a region of the search space where the optimum solution resides, then the search process will eventually converge and solutions of satisfying quality will be obtained. Otherwise, the search will get stuck in sub-optimal regions and the best solutions will be located far from the global optimum at the end of a run.
Therefore, a solution for the above mentioned issue would be to improve exploration in early iterations. To achieve this goal, an exploration replacement mechanism is incorporated into the basic SSA in the following way: in the first rmp iterations, the wrs worst solutions from the population are rejected and renewed with randomly generated solutions within the upper and lower bounds of the search space according to the expression $x_j = lb_j + rnd \cdot (ub_j - lb_j)$ (5), where rnd is a pseudo-random number drawn from a uniform distribution. The same expression is utilized in the initialization phase, where a starting random population is generated. This mechanism introduces two additional control parameters: the replacement mechanism point (rmp), which determines when (in terms of l) the replacement mechanism will be triggered, and the worst replaced solutions (wrs), which controls the number of worst solutions that will be replaced with random ones. If rmp = L, then the enhanced exploration will be performed throughout the whole run; similarly, if rmp = 0, then the SSA search will be executed as in its basic version.
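The replacement mechanism described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, minimization is assumed, iterations are assumed to be counted from l = 1, and rnd is assumed to be uniform on [0, 1].

```python
import random


def random_solution(lb, ub, rng=random):
    """Uniform sampling within the bounds: x_j = lb_j + rnd * (ub_j - lb_j)."""
    return [lb_j + rng.random() * (ub_j - lb_j) for lb_j, ub_j in zip(lb, ub)]


def replace_worst(population, fitness, lb, ub, l, rmp, wrs, rng=random):
    """Renew the wrs worst solutions during the first rmp iterations.

    Assumes minimization (larger fitness is worse) and l counted from 1.
    Returns a new list; the input population is not modified.
    """
    if l > rmp or wrs <= 0:
        return list(population)
    # Indices ordered from the worst (largest fitness) to the best solution.
    worst_first = sorted(range(len(population)),
                         key=lambda i: fitness[i], reverse=True)
    renewed = list(population)
    for i in worst_first[:wrs]:
        renewed[i] = random_solution(lb, ub, rng)
    return renewed
```

With rmp = 0 the condition `l > rmp` always holds, so the population is returned unchanged and the search behaves as the basic SSA; with rmp = L the replacement is active in every iteration.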
Further analysis of the original SSA also showed that the exploitation procedure of the followers (Equation (4)) is relatively simple, as it depends only on their current and previous positions. To overcome this, hybridization with another recently proposed metaheuristic, the SCA [14], is performed. In each iteration, the followers are updated either by using the basic SSA equation (Equation (4)) or the SCA search expression for an individual i and component j: where $r_1$, $r_2$, $r_3$ and $r_4$ are four randomly generated values from the interval [0, 1], $P_j$ represents the j-th component of a random individual from the population, || indicates the absolute value, and sin and cos are standard trigonometric functions.
Similarly to the original SSA, the SCA employs the following formula to adjust the intensification-diversification balance: where the parameter a represents a constant.
To control whether the followers' position will be updated using basic SSA or SCA search, pseudo-random number φ is used, as it is shown in Algorithm 1.
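A minimal sketch of the hybrid follower update is given below. It assumes the standard SCA search form from [14], $x_j + r_1 \sin(r_2)\,|r_3 P_j - x_j|$ when $r_4 < 0.5$ and the cosine variant otherwise, together with the linear schedule $r_1 = a - l \cdot a / L$. The constant a = 2, the sampling ranges of $r_2$ and $r_3$ (the ones usual for the SCA), the 0.5 threshold on φ and all function names are assumptions for illustration only.

```python
import math
import random


def r1_schedule(l, L, a=2.0):
    """SCA control parameter: decreases linearly from a to 0 over the run."""
    return a - l * a / L


def ssa_follower(x_i, x_prev):
    """SSA step: midpoint between a salp and the one ahead of it in the chain."""
    return [0.5 * (u + v) for u, v in zip(x_i, x_prev)]


def sca_follower(x_i, p, r1, rng=random):
    """SCA step around a reference solution p (ranges of r2, r3 are assumed)."""
    new = []
    for x_j, p_j in zip(x_i, p):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new.append(x_j + r1 * trig * abs(r3 * p_j - x_j))
    return new


def update_follower(x_i, x_prev, p, l, L, phi_threshold=0.5, rng=random):
    """Choose the SSA or the SCA step with a pseudo-random number phi."""
    phi = rng.random()
    if phi < phi_threshold:
        return ssa_follower(x_i, x_prev)
    return sca_follower(x_i, p, r1_schedule(l, L), rng)
```

Since $r_1$ reaches 0 at l = L, the SCA step shrinks to zero towards the end of a run, which mirrors the role of $c_1$ in the leader update.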
Reflecting the introduced modifications, the proposed enhanced SSA is named SSA with replacement mechanism and SCA search (SSARM-SCA). Its pseudo-code is shown in Algorithm 1. The flowchart of the algorithm is shown in Figure 1.
Initialize population X by using Equation (5)
repeat
    Compute the objective function for each solution $x_i$
    Update the best salp (solution) (F = Xb)

Complexity and Limitations of Proposed Method
The most computationally expensive operation during the execution of a metaheuristic algorithm is the fitness function evaluation (FFE). Accordingly, as established in the most relevant and contemporary computer science publications, the complexity of the algorithm is measured in terms of utilized FFEs [64].
Complexity of both basic SSA and the proposed SSARM-SCA algorithms is the same: where NP denotes the number of solutions in the population, while T represents the number of iterations. The proposed algorithm in each iteration performs the search either by utilizing the SSA or SCA search equations. In the first rmp iterations the wrs solutions are replaced by pseudo-random solutions, however this does not add additional costs in terms of FFE, as all solutions in the population are being evaluated at the beginning of each iteration.
When the FFE is being considered, the proposed SSARM-SCA algorithm is not more complex than the basic SSA metaheuristics. The algorithm is slightly more complex if the number of floating point operations is taken into account, however this can be disregarded in comparison to FFE, and therefore it is not relevant for the algorithm's complexity.

Validation of the Proposed Method for Standard CEC2013 Benchmarks
Following good practice from modern literature, the proposed SSARM-SCA is first tested on challenging CEC2013 benchmark instances [25] with 30 dimensions (D = 30) before being adapted for the practical feature selection challenge. With the goal of making a comparative analysis with other SOTA approaches whose results are published in recent papers, the same experimental conditions in terms of control parameters as in [65] are kept.
The CEC2013 benchmark suite contains 28 functions that are split into three groups based on their characteristics. Test instances from 1 to 5 are unimodal, benchmarks from 6 to 20 are multimodal, and finally, test instances from 21 to 28 belong to the category of composite functions. Details of the functions employed in simulations are given in Table 1.
Besides the proposed method and the original SSA, for the purpose of comparative analysis, all methods shown in [65] are also implemented and evaluated. All algorithms are tested with 50 individuals in the population (N = 50), and the number of fitness function evaluations maxFFEs = $3 \times 10^5$ is set as the termination condition, as in [65].
The SSARM-SCA is compared to practical genetic algorithm (RGA) [66], gravitational search algorithm (GSA) [67], disruption GSA (D-GSA) [68], black hole GSA (BH-GSA) [69], clustered GSA (C-GSA) [70] and attractive repulsive GSA (AR-GSA) [65]. Specific SSARM-SCA control parameters are set as follows: rmp = $3 \times 10^2$ according to the expression maxFFEs/1000 and wrs = 10 by using the formula N/5. Values for these parameters are determined empirically. The dynamic parameter $c_1$ for the original SSA and SSARM-SCA is adjusted according to Equation (2), and parameter $r_1$ of SSARM-SCA is adjusted throughout the run by expression (7). It is noted that in those expressions, FFEs and maxFFEs are used instead of l and L, respectively. Other methods implemented for the purpose of comparison are tested with the control parameters suggested in [65].
All algorithms are executed in 51 independent runs and the following metrics in terms of objective function values are captured: best, median, worst, mean and standard deviation. Comparative analysis results are split into three tables based on the function types as follows: Table 2 shows results for unimodal, Table 3 presents metrics for multimodal and Table 4 depicts results for composite CEC2013 instances. The best results for each metric are marked bold in all tables.
First of all, the results obtained for all methods in this study are similar to those in [67]; therefore, this research validates the results reported in [67]. From the comparative analysis, the superiority of the proposed SSARM-SCA can be clearly determined. For most of the benchmarks, across all three types (unimodal, multimodal and composite), the SSARM-SCA on average obtains the best results for all four indicators among the SOTA metaheuristics. Specifically, when compared to the original SSA, improvements in terms of convergence speed and results' quality are substantial.
More insights regarding the convergence speed can be obtained from Figure 2. In the presented figure, convergence speed graphs are generated for some of the methods included in the analysis, for 2 unimodal (F1 and F4), 4 multimodal (F7, F12, F14 and F18) and 2 composite (F24 and F28) benchmarks. The provided graphs show clear improvements of the proposed SSARM-SCA over the original SSA and other SOTA methods in terms of convergence.
However, to more objectively determine the robustness and efficiency of one approach over others, results should also be compared in terms of statistical tests. For that reason, the Friedman test [71,72], a two-way analysis of variances by ranks, was conducted on the proposed method and the other methods implemented for this research.
The results achieved by the 8 implemented algorithms over the 28 functions from the CEC2013 benchmark set, including the Friedman and the aligned Friedman test, are presented in Tables 5 and 6, respectively.
As observed in Table 6, the proposed SSARM-SCA outperformed all of the other candidates, including the basic SSA, which achieved an average ranking of 133.463. The proposed SSARM-SCA obtained an average ranking of 56.838.
Furthermore, the research in [73] provides grounds for a possible improvement in terms of performance in comparison with the $\chi^2$ value. Hence, Iman and Davenport's test [74] is used as well. The results of this test are summarized in Table 7.
The results show a value of $2.230 \times 10^{1}$, which is significantly larger than the F-distribution critical value ($F(9, 9 \times 10) = 2.058 \times 10^{0}$). Consequently, the null hypothesis is rejected by Iman and Davenport's test. The Friedman statistic ($\chi^2_r = 1.407 \times 10^{1}$) is likewise larger than the critical value at the significance level α = 0.05.
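To make the two statistics concrete, the sketch below computes average Friedman ranks, the Friedman statistic $\chi^2_r = \frac{12n}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ and the Iman and Davenport statistic $F_F = \frac{(n-1)\chi^2_r}{n(k-1) - \chi^2_r}$, which follows an F-distribution with k − 1 and (k − 1)(n − 1) degrees of freedom. Here n is the number of benchmark functions and k the number of algorithms. The function names are hypothetical, no tie correction is applied to the statistic, and the code is not meant to reproduce the values reported in the tables.

```python
def average_ranks(results):
    """results[i][j]: score of algorithm j on problem i (lower is better).

    Returns the mean rank of each algorithm; tied scores share their average rank.
    """
    n, k = len(results), len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2.0 + 1.0  # mean of the 1-based ranks pos+1..end+1
            for t in range(pos, end + 1):
                totals[order[t]] += shared
            pos = end + 1
    return [t / n for t in totals]


def friedman_statistic(results):
    """Friedman chi-square statistic computed from the average ranks."""
    n, k = len(results), len(results[0])
    ranks = average_ranks(results)
    return 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4.0
    )


def iman_davenport(chi2, n, k):
    """Iman and Davenport's F statistic derived from the Friedman statistic."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)
```

The null hypothesis of equal performance is rejected when the computed statistic exceeds the corresponding critical value at the chosen significance level.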
The final conclusion is that the null hypothesis can be rejected and that the proposed SSARM-SCA is clearly the best of its competitors. The rejection of the null hypothesis by both statistical tests performed is followed by the next type of test, Holm's step-down method which is a non-parametric post-hoc method. The findings of such experiments are displayed in Table 8.
The p values are the main sorting reference for all the methods and they are compared against α/(k − i), where k denotes the degrees of freedom and i the index of the algorithm in the sorted order. This research utilized the α parameter at the levels of 0.05 and 0.1. It should be noted that the p values are displayed in scientific notation. The results of Holm's method provided in Table 8 show that the proposed solution achieves a significant improvement at both levels of significance.
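The step-down comparison described above can be sketched as follows. The sketch assumes that k counts the control method plus the compared methods, so that the i-th smallest p value is tested against α/(k − i). The function name and the example labels are hypothetical.

```python
def holm_step_down(p_values, alpha=0.05):
    """Apply Holm's step-down procedure.

    p_values: dict mapping a compared algorithm's name to its unadjusted p value.
    Returns a list of (name, p, threshold, rejected) sorted by ascending p.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    k = len(ordered) + 1  # control method plus the compared ones (assumption)
    outcome = []
    still_rejecting = True
    for i, (name, p) in enumerate(ordered, start=1):
        threshold = alpha / (k - i)
        # Once one hypothesis is retained, all remaining ones are retained too.
        still_rejecting = still_rejecting and p < threshold
        outcome.append((name, p, threshold, still_rejecting))
    return outcome
```

For example, `holm_step_down({"A": 0.001, "B": 0.03, "C": 0.04})` rejects only the hypothesis for the hypothetical method A: the p value of B exceeds its threshold of 0.025, so C is retained as well even though 0.04 is below 0.05.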

Feature Selection Experiments
Feature selection belongs to the group of binary problems; hence, the well-known V-shaped transfer function was used for mapping continuous search space variables to the discrete values 0 and 1. Therefore, if a dataset consists of $n_f$ features, one solution is represented as a binary array of length $n_f$. This is how the proposed SSARM-SCA was adapted for this problem, and to distinguish the binary version from its continuous counterpart it is referred to as the bSSARM-SCA.
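A sketch of how a V-shaped transfer function can map a continuous position to a binary feature mask is shown below. The specific function $|\tanh(x)|$ and the rule of flipping a bit with the resulting probability are common choices for V-shaped transfer functions and are assumptions here, since the text does not state which variant was used. The function names are hypothetical.

```python
import math
import random


def v_transfer(x):
    """V-shaped transfer function |tanh(x)|; maps any real value to [0, 1]."""
    return abs(math.tanh(x))


def binarize(position, previous_bits, rng=random):
    """Flip bit j with probability v_transfer(position[j]); otherwise keep it."""
    bits = []
    for x_j, b_j in zip(position, previous_bits):
        if rng.random() < v_transfer(x_j):
            bits.append(1 - b_j)  # complement the previous bit
        else:
            bits.append(b_j)
    return bits
```

The function is symmetric around zero, so a component with a large magnitude in either direction is likely to flip the corresponding bit, while a component close to zero tends to keep it.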
Efficiency of proposed method for feature selection challenge was compared to SOTA metaheuristics presented in [22]. For that reason similar experimental conditions as in [22] were established. However, instead of using L = 70 with N = 8 as in [22], the maxFFEs was used as termination condition and it was set to 560 (N · L). This approach is more reasonable since different optimization algorithms consume different number of FFEs in each iteration and respecting the fact that the FFE is the most expensive calculation in optimization process. The other SSARM-SCA control parameters were as follows: rms = 56 according to formula maxFFEs/10 and wrs = 2 by using expression round(N/3).
The performance of bSSARM-SCA was tested on 21 UCI datasets that are often used for benchmarking (Table 9). All datasets are split into training and testing sets using the train_test_split rule in the proportion 80%:20%. The fitness of each solution is calculated on the training set by using the K-nearest neighbors (KNN) classifier and the following fitness function F, as in [22]:

$$F = \alpha E_R(D) + \beta \frac{R}{C} \qquad (8)$$

where E_R(D) represents the classification error rate, R is the number of selected features, and C is the total number of features. The parameters α and β establish the relative influence of E_R(D) and R on the fitness function, and they sum to 1 (α = 1 − β). The fitness function takes into consideration both the classification error rate and the number of selected features, and the problem is formulated as a minimization problem. In this study, α is set to 0.9 and β to 0.1.
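The weighted fitness described above can be illustrated with a short sketch. The classifier itself is left out: the function simply takes an error rate that is assumed to come from a KNN model trained on the selected features. The names `error_rate` and `fitness` are illustrative and do not come from the original implementation.

```python
def error_rate(predicted, actual):
    """Fraction of instances whose predicted label differs from the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def fitness(err, mask, alpha=0.9):
    """Weighted sum of classification error and selection ratio (to be minimized).

    err: classification error rate in [0, 1].
    mask: binary list, 1 where a feature is selected.
    alpha: weight of the error term; beta is derived as 1 - alpha.
    """
    beta = 1.0 - alpha
    n_selected = sum(mask)   # number of selected features
    n_total = len(mask)      # total number of features
    return alpha * err + beta * n_selected / n_total
```

As a worked example, a solution that selects 5 of 20 features and yields an error rate of 0.1 scores 0.9 · 0.1 + 0.1 · 5/20 = 0.115, so under these weights the error term dominates the fitness.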
At the end of a run, the solution with the best fitness is determined and the results of its evaluation on the testing set are reported. All experiments were conducted in 20 independent runs. All methods, including the SOTA metaheuristics used in the comparative analysis in [22], along with the original bSSA and bSSARM-SCA, are implemented in Python using the numpy, pandas, scikit-learn and matplotlib libraries. Moreover, the same performance metrics as in [22] are shown, and for all implemented methods a V-shaped transfer function is used for mapping the continuous search space to the binary one. The algorithms proposed in [22] were tested with the control parameters suggested in the original papers. Finally, as proposed in [40], three different initialization methods were employed in order to evaluate the proposed method more objectively: small, mixed and large. In small initialization, all individuals are generated at the beginning of a run with a small number of selected features (about 1/3), while in large initialization individuals employ most of the features ([2/3,1]). In mixed initialization experiments, generated solutions take into account about 2/3 of all features in the dataset.
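The initialization strategies can be sketched as follows. The proportions are taken from the description above (about 1/3 for small, about 2/3 for mixed, and a share drawn from [2/3, 1] for large). How the original code picks the individual features is not stated, so uniform sampling without replacement is an assumption here, and the function names are hypothetical.

```python
import random


def init_individual(n_features, strategy, rng):
    """Create one binary individual with a strategy-dependent share of selected features."""
    if strategy == "small":
        fraction = 1.0 / 3.0
    elif strategy == "mixed":
        fraction = 2.0 / 3.0
    elif strategy == "large":
        fraction = rng.uniform(2.0 / 3.0, 1.0)
    else:
        raise ValueError("unknown strategy: " + str(strategy))

    # Select at least one feature so that a classifier can always be trained.
    n_selected = max(1, round(fraction * n_features))
    chosen = set(rng.sample(range(n_features), n_selected))
    return [1 if j in chosen else 0 for j in range(n_features)]


def init_population(n_individuals, n_features, strategy, seed=0):
    """Create a population of binary individuals for the given strategy."""
    rng = random.Random(seed)
    return [init_individual(n_features, strategy, rng) for _ in range(n_individuals)]
```

With N = 8 individuals and a dataset of 30 features, `init_population(8, 30, "small")` produces eight masks with 10 selected features each.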
In all three experiments, the mean fitness and accuracy obtained over 20 runs are used as performance metrics, and the expressions used for their calculation are given in Equation (9) and Equation (10), respectively.

$$Avg(f) = \frac{1}{Run} \sum_{i=1}^{Run} f^{*}_{i} \qquad (9)$$

where the average fitness is denoted as Avg(f), f*_i designates the fitness of the best individual in run i, and Run represents the total number of runs.

$$Avg(c) = \frac{1}{Run} \sum_{j=1}^{Run} \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[C_i = L_i] \qquad (10)$$

where Avg(c) represents the average classification accuracy, N marks the number of instances in the test set, C_i represents the classifier output for instance i, and L_i denotes the reference class corresponding to the given instance i. The indicator term equals 1 when C_i and L_i agree and 0 otherwise.
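The two averages described above can be computed as in the following sketch. It assumes that every run is evaluated on the same test labels, which the text implies but does not state explicitly. The function names are illustrative.

```python
def mean_fitness(best_fitness_per_run):
    """Average of the best fitness value found in each independent run."""
    return sum(best_fitness_per_run) / len(best_fitness_per_run)


def mean_accuracy(predictions_per_run, labels):
    """Average classification accuracy over all runs.

    predictions_per_run: list of prediction lists, one list per run.
    labels: reference classes of the test instances (assumed shared by all runs).
    """
    per_run = []
    for predicted in predictions_per_run:
        # Count the test instances where the classifier output matches the label.
        matches = sum(1 for c, l in zip(predicted, labels) if c == l)
        per_run.append(matches / len(labels))
    return sum(per_run) / len(per_run)
```

For instance, two runs with accuracies of 0.75 and 1.0 on the same four test instances give an average accuracy of 0.875.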
Mean fitness and classification accuracy for all three initialization strategies and 21 UCI datasets are shown in Tables 10-15. In all tables, the best results are marked in bold. From the experimental results, a few important remarks can be made. First, the results obtained for WOA, bWOA-S, bWOA-v, BALO1, BALO2, BALO3, PSO, bGWO and bDA are similar to those reported in [22], which confirms the validity of the previous study (due to the stochastic nature of metaheuristics, exactly the same results could not be reproduced). Second, the proposed hybrid bSSARM-SCA outscores the original SSA for most datasets and benchmark instances; hence, the performance improvements over the basic implementation are clear. Finally, when compared to all other SOTA approaches in the comparative analysis, the proposed bSSARM-SCA obtained the best results on average and proved to be a robust method for feature selection in terms of both the employed fitness function and classification accuracy.
The formulated fitness function takes into account the number of selected features, but only with a weighting coefficient of 0.1 (parameter β = 0.1 in expression (8)). For that reason, to further validate the proposed method, the average proportion of selected features (selection size) over 20 runs for all three initialization strategies is shown in Table 16. As with the average fitness and classification accuracy, Table 16 shows that, on average, the proposed bSSARM-SCA metaheuristic significantly reduces the number of selected features, which in turn improves the classifier's computational efficiency. Therefore, performing feature selection with bSSARM-SCA can substantially reduce the classification time. In terms of average selection size, only bDA managed to outscore the method proposed in this study, and only on some test instances.
A box-and-whisker diagram of the average classification error (E_R) for all datasets and the three initialization strategies is shown in Figure 3. The diagram demonstrates the stability of the proposed bSSARM-SCA. For example, its advantage over the basic SSA, which in some runs misses promising regions of the search space, is evident.
Finally, to compare the performance of the proposed bSSARM-SCA algorithm with other SOTA SSA versions, the authors have implemented binary versions of three recent SSA modifications. The accuracy of bSSARM-SCA over 21 datasets was compared to the opposition-based learning and inertia weight ISSA (bISSA1) proposed in [41], the opposition-based learning and local search ISSA (bISSA2) proposed in [42], and the inertia weight ISSA (bISSA3) given in [43]. Again, it is worth noting that the authors have independently implemented all three binary ISSA variants and executed the experiments with the 21 observed datasets. The obtained results are shown in Table 17, where the best result is marked in bold for each category (small, large or mixed initialization). The simulation findings clearly show the superiority of the proposed bSSARM-SCA method, which obtained the best results on 15 out of 21 observed datasets. The second best method was bISSA2 [42], which obtained the best results on four datasets, while the bISSA1 method [41] achieved the best accuracy on two datasets.

Conclusions
The research presented in this study proposes a novel SSA algorithm that addresses the observed deficiencies of the original implementation. By hybridizing the basic algorithm with the well-known SCA metaheuristic and by incorporating a guided replacement mechanism, a novel SSARM-SCA metaheuristic is devised.
Following established practice in the modern literature, before its application to feature selection, the proposed enhanced SSA is first tested and evaluated on a recognized test-bed of challenging 30-dimensional functions from the CEC2013 benchmark suite. Afterwards, it is adapted as a wrapper-based approach for feature selection and validated against 21 well-known datasets retrieved from UCI.
According to the experimental findings and a rigorous comparative analysis with other recent SOTA approaches, the proposed SSARM-SCA proves to be an efficient optimizer that significantly improves on the convergence speed and result quality of the basic SSA and of other SOTA algorithms. Moreover, the obtained results show that the proposed method achieves better classification accuracy while using fewer features; therefore, it also improves on existing solutions to the feature selection challenge.
The proposed SSARM-SCA algorithm does not increase the complexity of the basic SSA implementation in terms of FFEs, while offering significantly better performance for this particular problem. However, according to the no free lunch theorem, the limitation of the proposed solution is that there are no guarantees that it would perform well for other optimization problems.
Possible directions for future research include testing the devised SSARM-SCA algorithm on other practical datasets from different application domains, and applying it to other optimization problems, such as wireless sensor network optimization and task scheduling in cloud-based systems.