A Hybrid Feature Selection Framework Using Improved Sine Cosine Algorithm with Metaheuristic Techniques

Abstract: Feature selection is the procedure of extracting the optimal subset of features from an elementary feature set, to reduce the dimensionality of the data. It is an important part of improving the classification accuracy of classification algorithms for big data. Hybrid metaheuristics is one of the most popular methods for dealing with optimization issues. This article proposes a novel feature selection technique called MetaSCA, derived from the standard sine cosine algorithm (SCA). Building on the SCA, the golden section coefficient is added, to diminish the search area for feature selection. In addition, a multi-level adjustment factor strategy is adopted to obtain an equilibrium between exploration and exploitation. The performance of MetaSCA was assessed using the following evaluation indicators: average fitness, worst fitness, optimal fitness, classification accuracy, average proportion of optimal feature subsets, feature selection time, and standard deviation. The performance was measured on the UCI data sets and then compared with three algorithms: the sine cosine algorithm (SCA), particle swarm optimization (PSO), and the whale optimization algorithm (WOA). The simulation results demonstrated that, in most cases, the MetaSCA technique achieved the best accuracy and the optimal feature subset in feature selection on the UCI data sets.


Introduction
With the explosive growth of data resources in modern society, data mining not only has a crucial effect on various industries, but has also become the key to their core competitiveness [1][2][3]. Data mining is the operation of extracting hidden patterns from extensive data under conditions of incomplete, noisy, and random raw data, which helps people to make decisions [4][5][6]. However, due to the huge amount of data and the existence of redundant data, it is difficult to obtain information directly from big data that can help in decision making. Based on the above statements, we can see that data pre-processing has a meaningful impact on the success of big data mining [7][8][9]. In addition, feature selection is a key step in data pre-processing [10][11][12].
For deterministic datasets, since the presence of irrelevant features does not improve the accuracy of the classification algorithm, training the classifier on the primary feature set will increase the computational overhead of the classifier without enhancing its classification performance [13][14][15]. Therefore, feature selection enables reducing the feature dimensionality of the original dataset [16,17]. This can not only improve the efficiency of the classifier, but also save computational resources [18][19][20].
There are three main models for feature selection [21,22]: the filtering model, the embedding model, and the packing model [23]. The main idea of the filtering model is to score each feature using a proxy measurement, then obtain the importance ranking of all features, and finally choose the optimal feature subset from the ranked sequence, according to a set threshold on the number of features. In other words, the filtering model first chooses the optimal subset of features, and then trains the learner. Common proxy metrics include the chi-square test, information gain, the correlation coefficient, and so on [24][25][26]. The main idea of the embedded model is to complete the feature selection operation jointly with the learner fitting operation, to obtain the feature subset with higher classification accuracy [27,28]. The packing model treats the search for a feature subset as an optimization problem [29,30]. It first generates several different feature subsets, then calculates the fitness of each subset using the fitness function, and, finally, takes the feature subset with the highest fitness value as the best feature subset. Many common metaheuristic optimization algorithms can address this problem [31]; examples include particle swarm optimization (PSO) [32], the genetic algorithm (GA) [33], and the sine cosine algorithm (SCA) [34].
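As a concrete illustration of the filtering model described above, the sketch below scores each feature with the absolute Pearson correlation coefficient (one of the proxy metrics listed) and keeps the top-k ranked features; the function name and the toy data are our own, not from the paper.

```python
import numpy as np

def filter_select(X, y, k):
    # Filtering model: score each feature with a proxy measure (here the
    # absolute Pearson correlation with the label), rank all features, and
    # keep the top-k as the feature subset.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    ranking = np.argsort(-scores)         # feature indices, best score first
    return ranking[:k], scores

# toy data: feature 0 tracks the label, feature 1 is noise
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100).astype(float)
X = np.column_stack([y + 0.1 * rng.normal(size=100), rng.normal(size=100)])
subset, scores = filter_select(X, y, k=1)
```

Note that, unlike the packing model, this scoring never consults a classifier, which is why it is cheap but can miss feature interactions.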
Meta-heuristic optimization is a common approach to solving global optimization issues. Unlike traditional optimization techniques, such as simulated annealing and gradient descent, a metaheuristic algorithm is a flexible optimization method that can ignore gradient variations. The SCA is a population-intelligence-based optimization method among the metaheuristic algorithms [35]. However, like other population-intelligence-based algorithms, the optimization strategy of the SCA is prone to falling into local optima and to unbalanced exploration [36], which makes it difficult to find the optimal subset in feature selection using the SCA. For the purpose of settling the above problem, this article puts forward an improved SCA (MetaSCA) feature selection model. The principal contributions of the work are listed below.
(1) We propose a hybrid feature selection framework, using an improved SCA with metaheuristic techniques, to reduce the dimensionality of data in the face of the curse of dimensionality caused by the large number of features in a dataset.
(2) We analyze the optimization performance of the standard SCA and point out that the algorithm has difficulty in selecting the best feature subset during feature selection. An improved SCA (MetaSCA), based on the multilevel regulatory factor strategy and the golden section coefficient strategy, is proposed to enhance the optimum-seeking ability of the SCA and is applied to the solution of the optimal feature subset.
(3) We tested the method on several datasets, to explore its performance in feature selection. The simulation results show that the MetaSCA technique achieved good results in seeking the best feature subset.
The remainder of the article is organized as follows: Section 2 lists some related works in the literature. Section 3 describes the feature selection model and the problems to be solved in this article. Section 4 investigates the SCA and analyzes its optimization performance; based on this, the improved MetaSCA is proposed to solve the feature selection problem. Section 5 gives the simulation results of the MetaSCA in feature selection and discusses the potential of the scheme. Section 6 concludes the paper and outlines future work.

Related Works
In the past few years of investigation on big data classification and artificial intelligence, the number of features employed in applications has expanded from a few dozen to hundreds [37]. However, this excess of features not only causes the curse of dimensionality, but also has a negative impact on the problems explored. Therefore, after obtaining a large number of data features, we need to select, from all the features, the relevant ones that are helpful to the problem. Feature selection is a method for achieving this goal [19,22,38]. Moreover, feature selection is broadly classified into filtering models, embedding models, and packing models. As one of the packing models, heuristic search is a method that uses heuristic information to continuously reduce the search space, which reduces the impact of excessive computation when finding the best feature subset in a high-dimensional feature space [39]. Therefore, metaheuristic search methods are combined with classifier models to acquire the best feature subset. Several metaheuristic models that handle feature selection are given below.
Two feature selection models founded on the whale optimization algorithm (WOA) have been proposed. The first model embedded the simulated annealing (SA) algorithm into the WOA, while the second one used the SA to refine the optimal solution obtained after each iteration of the WOA [40]. Based on the WOA, two packing feature selection methods have been proposed. In the first method, the tournament and roulette selection mechanisms were used to replace the random operator in the conventional WOA. In the other method, the crossover and mutation operators were utilized to refine the WOA [41]. Feature selection founded on accelerated particle swarm optimization was created to enhance the accuracy of classification for high-dimensional data, when processing big data [32]. The method was applied to a set of high-dimensional data for feature selection and experimental evaluation. The simulation indicated that the lightweight feature selection acquired good results.
A new algorithm (UFSACO) contingent on the ant colony optimization algorithm was proposed and applied in feature selection [42]. The UFSACO did not use a learning algorithm and found the optimal feature subset through multiple iterations. This method had the advantage of low computational complexity and could solve feature selection in high-dimensional data sets. The simulation results demonstrated the effectiveness of the UFSACO. An evolutionary crossover and mutation operator algorithm founded on the gravitational search algorithm (GSA) was presented in [43], to complete the task of feature selection. Simulation results showed the superiority of the algorithm in feature selection [43]. A fresh competitive binary gray wolf optimization method was put forward to accomplish the feature selection task in electromyography (EMG) signal classification. This method was compared with the binary gray wolf algorithm, the binary particle swarm optimization algorithm, and the GA. The simulation data illustrated that the method [44] not only had a better classification performance when using the selected optimal feature subset, but also had big advantages in feature reduction.
For the subset of feature selection, an enhanced Harris hawks optimization (IHHO) contingent on Harris hawks optimization (HHO) was put forward to select the optimal feature subset in a feature selection problem [45]. IHHO embedded the salp swarm algorithm (SSA) into conventional HHO, and this method was compared with other feature selection methods. The simulation results indicated that using the optimal feature subset selected by IHHO to train the classifier could obtain a better classification. Integrating improved binary particle swarm optimization (iBPSO) with correlation-based feature selection (CFS) for cancer dataset classification resulted in better classification accuracy [46].
For feature selection, an improved sine cosine algorithm (ISCA) with an elitism strategy and a new solution update mechanism was presented in [47]. The ISCA was compared with the GA, PSO, and the SCA for its feature selection performance on several data sets. Experimental results revealed that the algorithm advanced in [47] not only decreased the number of features, but also improved the classification performance. An improved salp swarm optimizer, named ISSAFD, was proposed for feature selection in [48], based on the SCA and an interrupt operator. The sinusoidal functions of the SCA were used to update the follower positions, to overcome the disadvantage of falling into the local optimum in the exploration stage. In addition, interruption strategies were added to strengthen the population diversity and to maintain a balance between global and local searches [48].
A novel hybrid optimization algorithm (SCAGA) based on the SCA and GA was advanced to handle the task of feature selection [49]. Fitness was evaluated using the classification accuracy of the k-nearest neighbor algorithm. The SCAGA was compared with the conventional SCA, ant colony optimization (ACO), and PSO on University of California Irvine (UCI) data sets for feature selection proficiency. The experimental comparison showed that the SCAGA achieved the best performance on the test set. Based on binary particle swarm optimization (BPSO) and the SCA, a hybrid optimization (HBPSOSCA) was proposed to select feature subsets with rich information from high-dimensional features for cluster analysis [50]. In [51], two binary metaheuristic algorithms based on the SCA, called the SBSCA and the VBSCA, were presented for feature selection in medical data. These algorithms generated each solution using two transfer functions. The simulation comparison results showed that the two methods proposed in [51] had a higher classification accuracy than the other four compared binary algorithms on the five medical datasets of the UCI repository.
In this study, an enhanced sine cosine algorithm (MetaSCA) is advanced for feature selection, using the multi-level regulatory factor strategy and the golden sine strategy. First, we transformed the continuous solution of the SCA into binary form, by mapping, to determine feature selection and dropout. Then, the influence of the multi-level regulatory factor on the subset of features during SCA optimization was investigated from a diversity perspective. For the purpose of a more balanced exploration and exploitation in the optimization process, a multi-level regulatory factor strategy is proposed. Finally, the golden sine strategy is introduced to narrow down the feature solution space through golden partitioning and to search only the space that yields good results.

Feature Selection Model
An object often contains multiple features when dealing with classification problems. These characteristics fall into the following three broad categories.

• Relevant features: such features help to complete the classification task and improve the fit of an algorithm.
• Irrelevant features: such features are not relevant to the task at hand and do not help to improve the fit of an algorithm.
• Redundant features: the improvement in classification performance brought by such features can also be obtained from other features.
Feature selection is the process of selecting all relevant features and discarding the redundant and irrelevant ones, to maximize the classification rate of the classifier and diminish the complexity of the original dataset when faced with all the features of the dataset. The feature selection framework put forward in this article is exhibited in Figure 1. The MetaSCA will be introduced in Section 4.

Problem Formulation
To decrease the difficulty of the learning task and increase the classification efficiency of the classification model, only relevant features need to be selected for training and fitting a classification algorithm. However, we cannot determine in advance which features are relevant. Therefore, we specify that the subset of features that contains fewer features, but that yields better classification accuracy from the classifier, is the optimal feature subset.
Assuming that the total quantity of features in the dataset is denoted as m, the original set of features is expressed as F and the feature subset is represented as F*. F_i = 1 indicates that the i-th feature in F is selected. The feature selection problem can be formulated as follows:

f(F*) = ω · error(F*) + (1 − ω) · (∑_{i=1}^{m} F_i)/m    (1)

where f(F*) represents the fitness of feature subset F*; error(F*) stands for the error rate after classifying the dataset using the feature subset; ∑_{i=1}^{m} F_i represents the total count of features in the feature subset F*; and ω is a constant, which determines the respective influence of error(F*) and (∑_{i=1}^{m} F_i)/m on the fitness function f(F*). The feature subset F* contains all the features that are marked 1 in the original feature set.
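The fitness in Equation (1) can be sketched directly; the weighted-sum form below, with ω weighting the error term and 1 − ω weighting the selected-feature ratio, is the common wrapper-fitness convention and is assumed here (the paper's actual ω value is given in its Table 3).

```python
def fitness(mask, error_rate, omega=0.9):
    # Fitness of a 0/1 feature mask, assuming the weighted-sum form of
    # Equation (1): omega weighs the classification error and (1 - omega)
    # weighs the fraction of selected features. omega = 0.9 is illustrative.
    m = len(mask)
    return omega * error_rate + (1 - omega) * sum(mask) / m

# 3 of 10 features with 5% error beats all 10 features with the same error
f_small = fitness([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], error_rate=0.05)
f_full = fitness([1] * 10, error_rate=0.05)
```

Minimizing this fitness therefore rewards both accuracy and compactness of the subset, with ω setting the trade-off.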

Sine Cosine Algorithm

In this section, we give the optimization process of the conventional SCA. Table 1 contains some symbols, and their corresponding meanings, that are used in this paper. The SCA has the advantages of simple parameters and clear results [35,52,53]. At the same time, the disadvantages of the SCA are obvious, such as the tendency to fall into the local optimum and a slow convergence rate [54]. The position update is mainly determined by the sine or cosine function, as shown in Equation (2):

X_{t+1}(i, j) = X_t(i, j) + r_1(t) · sin(r_2) · |r_3 · X_t^best(i, j) − X_t(i, j)|,  if r_4 < 0.5
X_{t+1}(i, j) = X_t(i, j) + r_1(t) · cos(r_2) · |r_3 · X_t^best(i, j) − X_t(i, j)|,  if r_4 ≥ 0.5    (2)
where X_{t+1}(i, j) represents the position of individual i in dimension j in round t + 1; X_t(i, j) represents the position of individual i in dimension j in round t; X_t^best(i, j) represents the position of the global optimal solution over the previous t rounds; r_1(t) represents the amplitude factor; and the parameters r_2, r_3, r_4 are uniformly distributed random numbers.
The parameter r_1 determines the moving direction of X_{t+1}(i, j); this direction could be either between the space of X_t(i, j) and X_t^best(i, j) or outside it. Moreover, r_1 also governs the exploration and exploitation of the SCA update process. The parameter r_2 defines how far X_t(i, j) moves toward or away from X_t^best(i, j). The parameter r_3 defines the degree of influence of the optimal solution X_t^best(i, j) on the current solution X_t(i, j): when r_3 > 1, this influence is stochastically emphasized, and when r_3 < 1 it is weakened. The parameter r_4 controls the switch of the SCA between the sine transform and the cosine transform.
Figure 2 shows the conceptual schematic of the influence of the sine and cosine functions in the scope of [−2, 2]. The parameter r_2 determines whether the updated position of the solution appears between or outside the current solution and the optimal solution. When the value of r_1(t)sin(r_2) or r_1(t)cos(r_2) is in the range [−2, −1) ∪ (1, 2], the SCA conducts exploration. When the value is in the range [−1, 1], the SCA conducts exploitation. Algorithm 1 shows the specific process of the SCA optimization.

Algorithm 1 The SCA optimization process
1. Input: the number of solutions pop, the fitness function f(x), the maximum number of iterations T.
2. Initialize the solution set.
3. Calculate the fitness value of each solution according to f(x) and determine the current optimal solution X_t^best.
4. do
5. Update parameters r_1(t), r_2, r_3, r_4;
6. Update the position of each solution X_t in the solution set according to Equation (2);
7. Calculate the fitness function value of each solution according to f(x);
8. Update the current optimal solution X_t^best;
9. while (t < T)
10. Output: the global optimal solution X_t^best after iteration.
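Algorithm 1 and the update rule of Equation (2) can be sketched as follows; the search bounds, population size, and test function are illustrative choices of ours, not the paper's settings.

```python
import numpy as np

def sca(f, dim, pop=20, T=100, seed=0):
    # Minimal SCA per Algorithm 1: each solution steps toward (or past) the
    # best-so-far by a sine or cosine move, Equation (2); the amplitude
    # factor r1 decays linearly from 2 to 0, shifting the search from
    # exploration to exploitation.
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (pop, dim))
    fits = [f(x) for x in X]
    best_val = min(fits)
    best = X[int(np.argmin(fits))].copy()
    for t in range(T):
        r1 = 2 - 2 * t / T                        # amplitude factor r1(t)
        for i in range(pop):
            r2 = rng.uniform(0, 2 * np.pi, dim)
            r3 = rng.uniform(0, 2, dim)
            r4 = rng.uniform(0, 1, dim)
            trig = np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
            X[i] = X[i] + r1 * trig * np.abs(r3 * best - X[i])
            v = f(X[i])
            if v < best_val:                      # keep the best-so-far solution
                best_val, best = v, X[i].copy()
    return best

best = sca(lambda x: float(np.sum(x ** 2)), dim=5)  # minimize the sphere function
```

With the linearly decaying r1, late iterations take only small refining steps around the incumbent optimum, which is exactly the exploitation bias the paper's analysis criticizes.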


Analysis of the SCA in Feature Selection
In the initial stage of optimization using the SCA in feature selection, the SCA randomly selects multiple sets of features from the original feature set to create pop feature subsets. Then, each feature subset is scored using the evaluation function, and the feature subset with the highest score is determined as the best feature subset. Next, the initialized feature subsets are perturbed according to Equation (2) to form pop new feature subsets. An evaluation function is again used to assess the score of each new feature subset, and the feature subset with the best score is compared with the best feature subset of the previous round, to obtain the current best feature subset. The above process is repeated to obtain the best feature subset after the maximum round of iterations.
A key point in the SCA feature selection process is that the optimization strategy of the SCA affects the diversity of feature subsets. Next, we analyze the change of feature subset diversity during the process of the SCA updating the best feature subset. Assuming the initial feature subsets are uniformly distributed, they have the probability density function F ∼ U(a, b), f(F) = 1/(b − a), a ≤ F ≤ b, where F is the value of the feature subset, and a and b are the lower and upper boundaries of the range of values of the feature subset, respectively.
The expected value E(F_1) and variance D(F_1) of the first-generation feature subset are as follows:

E(F_1) = (a + b)/2,  D(F_1) = (b − a)^2/12    (3)

In this paper, the center of gravity is applied to portray the diversity of feature subsets. The diversity of the (t + 1)-th generation feature subset, I(F_{t+1}), is defined in Equation (4), and E(I(F_{t+1})) represents the expected value of I(F_{t+1}).

Theorem 1. During the pursuit of the best feature subset by the SCA, the expected value of the diversity of the feature subset in round t + 1 varies with the adjustment factor r_1^2(t) and the random number r_3.
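The center-of-gravity diversity used in this analysis can be computed directly. The exact normalization of Equation (4) is not legible in our copy, so the sketch below assumes the common form: the mean distance of each subset vector to the population centroid.

```python
import numpy as np

def diversity(F):
    # Center-of-gravity diversity of a population of feature-subset vectors
    # (one row per subset): mean distance of each row to the population
    # centroid. This form is an assumption; the paper's exact definition
    # is Equation (4).
    centroid = F.mean(axis=0)
    return float(np.mean(np.linalg.norm(F - centroid, axis=1)))

rng = np.random.default_rng(1)
spread = diversity(rng.uniform(0, 1, (30, 10)))   # scattered population
collapsed = diversity(np.ones((30, 10)))          # population collapsed to one point
```

A collapsed population scores zero diversity, matching the intuition behind Theorem 1: once the subsets concentrate, the search can no longer explore.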
Second, we expand this by Equation (5), then transform term A in Equation (6), expand term B in Equation (6), and expand term C in Equation (6). In order to study the sine and cosine transformations in Equation (2) at the same time, the relation f(r_2, r_4) is introduced in Equation (10). Using Equation (10) to rewrite Equation (2), we obtain Equation (11). Equation (12) gives the expectation and variance of the relation f(r_2, r_4). According to Equations (11) and (12), the expected value of the feature subset in the (t + 1)-th round is given by Equation (13). It can be obtained from Equation (13) that the expected value of the (t + 1)-th round feature subset is the same as that of the t-th round feature subset. From Equations (3) and (13), it can be seen that the expected value of the population is constant with respect to the value range of the population.
Substituting Equations (7)-(9) and (14) into Equation (6), we obtain Equation (15). After determining the number of feature subsets, the number of features in the original dataset, and the range of values for the feature subsets, the expected value of the (t + 1)-th round of feature subset diversity can be obtained from Equation (15), which is determined by r_1^2(t), r_3, and I(E[F_t(i, j)]). In addition, it can be seen from Equations (3) and (13) that I(E[F_t(i, j)]) does not change. Thus, from Equations (4) and (14), the expected value of the diversity of the (t + 1)-th round feature subset is determined by the adjustment factor r_1^2(t) and the random number r_3. This completes the proof. According to Theorem 1, when pursuing the best feature subset using the SCA, the diversity of the feature subset in the t-th round of iterative optimization is determined by the control factor r_1(t) and the random number r_3, provided that the number of initial feature subsets, the total number of features in the original data set, and the range of values of the features have been previously determined. In addition, a higher population diversity facilitates the global search but makes convergence slower. In contrast, a low population diversity facilitates the local search but tends to fall into the local optimum. In the conventional SCA, the control factor r_1(t) decreases linearly from 2 to 0 with an increasing number of iterations. When t ∈ [0, T/2], r_1(t) > 1, so r_1^2(t) > r_1(t) and the algorithm is biased towards a global search. When t ∈ [T/2, T], r_1(t) ≤ 1, so r_1^2(t) ≤ r_1(t), which accelerates the convergence of the population.

Feature Selection with Metaheuristic
According to the expectation in Equation (15) of feature subset diversity in conventional SCA optimization, the SCA focuses on exploration in the first half of the iterations and on exploitation in the second half. Additionally, the contribution of the optimal feature subset F_t^best(i) to updating the other feature subsets is biased, due to the uncertainty of the random number r_3. To address these problems, a multilevel golden mean SCA (MetaSCA) is proposed.

Multilevel Regulatory Factor Strategy
The value of the regulatory factor in the conventional SCA decreases linearly with the number of iterations. When r_1(t) < 1, the convergence of the algorithm is accelerated. If the algorithm gets stuck in a local optimum at this stage, the optimization stagnates. To address this drawback, a multilevel regulatory factor strategy is put forward in this article, as exhibited in Figure 3, which divides the regulatory factor into four levels according to the number of iteration rounds. The total number of iteration rounds is T, which is divided into four segments: T_1 ∈ [0, T/4), T_2 ∈ [T/4, T/2), T_3 ∈ [T/2, 3T/4), and T_4 ∈ [3T/4, T]. In the T_1 and T_3 time periods, the regulatory factor is set to strengthen the global exploration capability of the SCA. In the T_2 and T_4 time periods, the regulatory factor is changed to strengthen the local exploitation capability. The multilevel regulatory factor r_1^*(t) is defined in Equation (16). Substituting a = 2 into tanh(a × t/T) and tanh(a × (1 − t/T)) to find the limits, we obtain Equation (17). From Equations (16) and (17), when t ∈ [0, T/4) and t ∈ [T/2, 3T/4), the multilevel regulatory factor r_1^*(t) increases from 1 to 2, so that the algorithm focuses on global search in this stage. When t ∈ [T/4, T/2) and t ∈ [3T/4, T], the multilevel regulatory factor r_1^*(t) decreases from tanh(2) to 0, so that the process focuses on local search in this stage. After the above improvements, the algorithm alternates between global and local search during the iterative process, to avoid falling into the local optimum.
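The multilevel schedule described above can be sketched as follows. The endpoint behavior (rising from 1 to 2 in the first and third quarters, falling from tanh(2) to 0 in the second and fourth) follows the text; the exact within-quarter shape of Equation (16) is an assumption of ours.

```python
import math

def r1_multilevel(t, T):
    # Multilevel regulatory factor (a sketch of Equation (16)): the run is
    # split into four quarters. In the 1st and 3rd quarters the factor rises
    # from 1 to 2 (global search); in the 2nd and 4th it falls from tanh(2)
    # to 0 (local search).
    q = T / 4
    phase = int(t // q) % 4        # which quarter of the run we are in
    s = (t % q) / q                # progress within the quarter, in [0, 1)
    if phase in (0, 2):            # exploration quarters: 1 -> 2
        return 1 + s
    return math.tanh(2 * (1 - s))  # exploitation quarters: tanh(2) -> 0
```

Unlike the conventional linear decay, this schedule re-enters an exploration phase at t = T/2, which is what lets the algorithm escape a local optimum found in the first half of the run.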

Golden Selection Coefficient Strategy
The golden section is inspired by the unit circle scan of the sine function. Similarly to the spatial search of the problem solution, the search area is reduced by the golden mean, to approximate the optimal solution [55]. The golden partition coefficients do not require gradient information, and the contraction step is fixed. The golden partition coefficients x_1 and x_2 are applied to the position update process to accomplish a good balance between global and local search, as shown in Figure 4. The improvement with this strategy reduces the search space and allows the individual to approach the optimal value quickly during the position update process.
The expressions of x_1 and x_2 are shown in Equations (18) and (19):

x_1 = αλ + β(1 − λ)    (18)
x_2 = α(1 − λ) + βλ    (19)

where α, β are the initial values of the golden section ratio search; α = −π, β = π; and λ is the golden section ratio, λ = (√5 − 1)/2. After adding the golden section coefficient, the feature subset is updated as shown in Equation (20).
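The coefficients of Equations (18) and (19) can be computed directly. The interval endpoints α = −π, β = π follow the text; the combination below is the standard golden-section placement of interior points and is assumed where the source is garbled.

```python
import math

# Golden-section coefficients of Equations (18) and (19).
lam = (math.sqrt(5) - 1) / 2          # golden section ratio, ~0.618
alpha, beta = -math.pi, math.pi       # initial values of the search interval
x1 = alpha * lam + beta * (1 - lam)   # left interior point of the interval
x2 = alpha * (1 - lam) + beta * lam   # right interior point of the interval
```

Because the contraction ratio λ is fixed, each update shrinks the search interval by the same factor without any gradient information, which is the property the strategy relies on.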


Metaheuristic Process
In this section, the MetaSCA, an improved version of the SCA that handles feature selection based on the multilevel regulatory factor strategy and the golden section coefficient strategy proposed in the previous sections, is detailed. The feature selection process of the MetaSCA is given in Algorithm 2.

Algorithm 2 The MetaSCA process for feature selection
1. Input: the number of feature subsets pop, the number of features in the training set dim, the fitness function f(x), and the maximum number of iterations T.
2. Initialize pop feature sets, and select the features marked "1" from the dim features of each feature set to form a feature subset F.
3. Calculate the fitness value of each feature subset according to f(x). Determine the minimum fitness value G_best and the corresponding optimal feature subset F_best.
4. for t < T do
5. for each feature subset do
6. Update the feature subset
9. else
10. Update the feature subset
11. end if
12. Discretize F_{t+1} according to Equation (23) to obtain a new feature subset
13. Calculate the fitness value f(F_{t+1}) of the new feature subset according to f(x)

Step 1: Input dataset. The dataset is input and split into a 70% training set and a 30% test set.
Step 2: Initialize feature subsets. pop feature sets are generated based on the given number pop. Each feature set contains multiple one-dimensional binary vectors; the number of vectors is the same as the number of features in the original dataset. One cell of each vector stores the feature sequence number, and the other cell stores either a 0 or a 1. The features marked with 1 are selected for the feature subset and used to classify the data. The specific process is shown in Figure 5. The numbers 0 and 1 in the cells are assigned according to Equations (21) and (22), where F_t(i, j) represents the j-th feature in the i-th feature set, and rand represents a random number within the range [0, 1]. We use Equations (21) and (22) to convert the numbers in the original solution into 0 and 1, to realize discretization.
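A sketch of this discretization step follows. The sigmoid transfer function used to turn a coordinate into a selection probability is a common choice for binarizing continuous metaheuristics and is an assumption here; the paper's exact Equations (21) and (22) may differ.

```python
import numpy as np

def binarize(position, rng):
    # Map a continuous solution vector to a 0/1 feature mask (the role of
    # Equations (21) and (22)): a sigmoid turns each coordinate into a
    # selection probability, and a uniform random draw (rand) decides
    # whether the feature is selected.
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.uniform(size=position.shape) < prob).astype(int)

rng = np.random.default_rng(0)
mask = binarize(np.array([-8.0, 8.0, 0.0]), rng)  # feature j is kept where mask[j] == 1
```

Coordinates far below zero are almost never selected and coordinates far above zero almost always are, so the continuous SCA dynamics translate directly into inclusion probabilities.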

Datasets and Parameters
Seven datasets were collected from the Center for Machine Learning and Intelligent Systems at the UCI. The names of the datasets are Sonar, Ionosphere, Vehicle, Cancer, Wine, WDBC, and Diabetes. All details of the utilized UCI datasets are enumerated in Table 2. The size of the dimension in the MetaSCA model is equal to the quantity of features in the datasets. Furthermore, the number of features in these datasets takes a variety of different values, from 8 to 60. Therefore, the effect of the MetaSCA in feature selection can be demonstrated from a few features to very many features. Each dataset was randomly split into 70% for the training set and 30% for the test set, before using the dataset to test the results of the MetaSCA_FS.

Step 3: Calculate the fitness value of the feature subset. The fitness values of all feature subsets are calculated according to Equation (1). In addition, the feature subset corresponding to the minimum fitness value is found, and the minimum fitness value at this point is recorded.
Step 4: Transform to get new feature subsets and update the optimal feature subset. The feature subsets are transformed according to Equations (20)-(22); the number and type of feature subsets can be changed. After transforming all feature subsets, the fitness value of each new feature subset is recalculated using the fitness function. If the best fitness value obtained in this step is smaller than the previous one, the best fitness value is updated and the best feature subset is obtained.
Step 5: Repeat iteration to acquire the final best feature subset. Repeat step 4 until the maximum number of iterations is reached. The global minimum fitness value and the optimal feature subset are obtained.
Step 6: Classify datasets using the optimal feature subset. The classifier is selected, the best subset of features from step 5 is applied to fit the classifier, and the fitted classifier is applied to classify the test set, to obtain the classification accuracy.
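Step 6 can be sketched self-containedly with a plain k-NN vote standing in for the paper's KNN (n = 3) classifier. `knn_predict` and `evaluate_subset` are illustrative names; the distance-and-vote details are textbook k-NN, not code from the paper:

```python
def knn_predict(train_X, train_y, x, k=3):
    """Plain k-NN majority vote using squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def evaluate_subset(mask, train_X, train_y, test_X, test_y, k=3):
    """Keep only the columns flagged 1 in `mask`, classify the test
    set with k-NN, and return the classification accuracy."""
    cols = [j for j, bit in enumerate(mask) if bit == 1]
    sel = lambda X: [[row[j] for j in cols] for row in X]
    hits = sum(
        knn_predict(sel(train_X), train_y, x, k) == y
        for x, y in zip(sel(test_X), test_y)
    )
    return hits / len(test_y)
```

The mask-then-fit order matters: the classifier only ever sees the selected columns, on both the training and the test side.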

Datasets and Parameters
Seven datasets were collected from the Center for Machine Learning and Intelligent Systems at UCI. The names of the datasets are Sonar, Ionosphere, Vehicle, Cancer, Wine, WDBC, and Diabetes. All details of the utilized UCI datasets are enumerated in Table 2. The size of the dimension in the MetaSCA model is equal to the number of features in the dataset. Furthermore, the number of features in these datasets spans a variety of values, from 8 to 60; therefore, the effect of the MetaSCA in feature selection can be demonstrated from a few features to very many features. Each dataset was randomly split into 70% for the training set and 30% for the test set before using it to test the results of the MetaSCA_FS.
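The 70/30 split described above can be sketched as follows; the helper name and the fixed seed are illustrative only:

```python
import random

def split_70_30(rows, labels, seed=42):
    """Randomly split a dataset into a 70% training set and a 30% test
    set, as done before running the feature selection on each dataset."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

Shuffling indices rather than the rows themselves keeps each row aligned with its label.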

Evaluation Setup
The proposed MetaSCA was run 10 times, with 300 iterations each time, when selecting features for each dataset. Part of the specific parameters and values of the model are listed in Table 3. The following indexes were used on each dataset to evaluate the best feature subset found by each algorithm: average fitness, optimal fitness, worst fitness, standard deviation, classification accuracy, proportion of the selected feature subset in the total features, and average running time. The evaluation criteria were as follows:

•
Average fitness. This index is the mean of the best fitness values over the 10 experimental runs on each dataset. Its calculation formula is shown in Equation (23), where N denotes the number of experimental runs and G_best^i denotes the optimal fitness resulting from run number i.
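Equation (23) itself is not reproduced in this excerpt; based on the description above, it is presumably the arithmetic mean of the per-run optima:

```latex
\bar{G} \;=\; \frac{1}{N}\sum_{i=1}^{N} G_{best}^{\,i}
```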

•
Optimal fitness. This represents the smallest fitness value in the fitness set acquired after the 10 experiments on each dataset. • Worst fitness. This index is the worst (largest) fitness value in the fitness set obtained over the experiments. • Standard deviation. This shows the degree of dispersion between the fitness values obtained over the experiments; the smaller the standard deviation, the more stable the optimization. A smaller fitness value means a superior optimization of feature selection, and the smaller the worst fitness, the better the worst case obtained by the feature selection model when optimizing the fitness function. A comparison of the results of the MetaSCA with the SCA, PSO, and the WOA in Table 4 (bold indicates the best value) reveals that the MetaSCA achieved better performance with respect to the average fitness and the worst fitness. In the comparison of the standard deviation of fitness, although the MetaSCA did not obtain the best value on the Ionosphere, Wine, and Vehicle datasets, it obtained a smaller standard deviation on more than half of the datasets. The results for the optimal fitness criterion show that the MetaSCA obtained the best optimal fitness on all datasets except WDBC. From these experimental results, the MetaSCA feature selection model proposed in this paper, with KNN (n = 3) as the classifier, performed best in terms of the fitness function.
Figure 6 indicates the average running times of the four algorithms after 10 optimizations of the fitness function with KNN (n = 3) as the classifier. As the results in Figure 6 demonstrate, the mean running times of the four methods were almost the same when finding the best feature subset on the same dataset. However, the MetaSCA had a shorter running time on all datasets compared to the conventional SCA. An important factor is that the MetaSCA introduces the golden section coefficient strategy to reduce the search space when discovering the optimal feature subset, so the search can be completed faster and the running time of the feature selection model is shortened. In addition, compared with the average running times of PSO and the WOA, only the latter had a slightly shorter running time than the MetaSCA, on the Wine and WDBC datasets, while the MetaSCA ran faster than both the WOA and PSO on the other datasets. This indicates that, in most cases, the MetaSCA surpassed the three other optimizers regarding the speed of selecting the best subset of features.
Figure 7a shows the average classification accuracy of the four feature selection models over 10 runs on the seven datasets. From Figure 7a, it can be seen that the MetaSCA achieved the best classification accuracy on all datasets. Figure 7b shows the ratio between the selected optimal feature subset and the total features. As can be seen from Figure 7b, the number of optimal features selected by the MetaSCA feature selection model on the Wine dataset was significantly smaller than those of the SCA, PSO, and the WOA, and the results for the feature selection ratio on the other six datasets were similar. Moreover, the MetaSCA did not obtain the worst feature selection ratio on any of the datasets. Combining the results in Figure 7a,b, it can be concluded that the MetaSCA feature selection model achieved a higher classification accuracy with the same or a smaller proportion of optimal feature subsets, compared with the SCA, PSO, and the WOA.
Seven datasets, namely Sonar, Ionosphere, Vehicle, Cancer, Wine, WDBC, and Diabetes, were selected to compare the stability of the feature selection models of the MetaSCA, the SCA, the WOA, and PSO. Figure 8a-g exhibit the stability comparison results of the four algorithms for feature selection on these datasets, respectively. The red, black, blue, and green boxes represent the results obtained from the MetaSCA, SCA, PSO, and WOA, respectively. The horizontal line above the box in each comparison figure represents the maximum classification accuracy obtained after the method was run on the dataset 10 times; accordingly, the horizontal line at the bottom of the box corresponds to the minimum classification accuracy, and the horizontal line inside the box represents the median of the 10 results. In addition, the larger the box, the greater the degree of dispersion, and the more unstable the result of the optimization method. First, as shown in Figure 8a-g, after 10 repetitions of classification, the highest classification accuracy obtained by the MetaSCA was superior to those of the other three optimization algorithms, except on the Wine dataset (Figure 8e), where all four feature selection methods achieved 100% accuracy. Furthermore, combining the results of the seven comparison graphs in Figure 8a-g, the worst classification accuracy obtained by the MetaSCA model was also higher than those of the SCA, PSO, and the WOA across the different datasets. Finally, the dispersion of the classification accuracy of the four optimizers on the different datasets was compared. Among the three datasets Sonar (Figure 8a), Ionosphere (Figure 8b), and Wine (Figure 8e), the MetaSCA model had the least degree of dispersion. Moreover, the dispersion of the classification accuracy with the MetaSCA feature selection model was better than the results of the SCA model on all datasets.
With the aim of testing the influence of different classifiers in the feature selection model, the classifier was changed from the original KNN (n = 3) to an SVM (c = 1, gamma = 1) model. The MetaSCA and the SCA were chosen to deal with feature selection and classification for the abovementioned seven datasets. The MetaSCA and the SCA were run 10 times on each dataset, to acquire the mean of the 10 classification accuracies and the average proportion of the optimal feature subset selected by the model out of the total number of features. The experimental comparison data are shown in Figure 9. First, the proportion of optimal feature subsets selected by the MetaSCA was less than that of the SCA on all used datasets, except for equaling it on the Vehicle dataset. In addition, on the premise of the same or a smaller proportion of optimal feature subsets, the classification accuracy acquired by the MetaSCA was higher than that of the SCA on all datasets. Among them, on the Vehicle dataset, the classification accuracy of the MetaSCA was significantly higher than that of the SCA model. Although the SCA achieved the same number of optimal feature subsets as the MetaSCA on the Ionosphere dataset, the classification accuracy of the SCA was still lower than that of the MetaSCA. The comparisons above show that when the classifier was an SVM (c = 1, gamma = 1), the performance of the MetaSCA was also superior to that of the conventional SCA. This further confirms the effectiveness of the MetaSCA advanced in this article.


Conclusions
The goal of this work was to propose an improved sine cosine algorithm for feature selection. The objective was to select the optimal subset of features for a given dataset and then train the classifier using the optimal feature subset, to obtain a better classification accuracy. We demonstrated the effect of the regulatory factor r1(t) and the parameter r3 on the expected value of feature subset diversity. A hybrid metaheuristic optimization (MetaSCA) was proposed, based on a multi-level regulatory factor strategy and a golden sine strategy for feature selection. First, the multi-level regulatory factor strategy r1*(t) was introduced to improve the balance between exploration and exploitation of the SCA, in order to prevent the SCA from sinking into a local optimum when dealing with the feature selection issue. Then, the golden sine algorithm was used to diminish the feature search area, so that the MetaSCA need only find the best feature subset in the more promising feature region. The MetaSCA was applied to feature selection on seven common UCI datasets for performance evaluation, and the evaluation results were contrasted with the conventional SCA and other metaheuristic optimizers, such as PSO and the WOA. From the comparison of the results, the MetaSCA feature selection method had a better performance than the other metaheuristic optimizers and selected the optimal feature subset quickly, with a higher classification accuracy. In future work, considering the need to extract the best feature subset from a multitude of features, the metaheuristic can be improved further, to enable a significant increase in the speed of feature selection.

Algorithm 1. Standard sine cosine algorithm
1. Input: number of solutions pop, dimension of solutions dim, maximum number of iterations T, objective fitness function f(x).
2. Initialize a solution set of pop solutions with dimension dim.
3. Calculate the fitness function value of each solution X_t in the solution set according to f(x), and find the solution X_best_t with the smallest fitness value.
4. Do (for each iteration)
5.
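The per-iteration update applied inside the loop of Algorithm 1 is the standard SCA position equation, X := X + r1·sin(r2)·|r3·X_best − X| (or the cosine form, each chosen with probability 1/2), where r1 = a − a·t/T decays linearly. A minimal sketch of one step, with the conventional choice a = 2 (the draws of r2, r3, r4 follow the usual SCA formulation, not text reproduced in this excerpt):

```python
import math
import random

def sca_step(X, X_best, t, T, a=2.0, rng=None):
    """One standard SCA position update for a single solution vector X
    toward the current best X_best at iteration t of T."""
    rng = rng or random.Random(1)
    r1 = a - a * t / T                     # linearly decaying amplitude
    new_X = []
    for xi, bi in zip(X, X_best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()                  # picks sine or cosine branch
        wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_X.append(xi + r1 * wave * abs(r3 * bi - xi))
    return new_X
```

As r1 shrinks over the run, the oscillation amplitude around X_best decreases, shifting the search from exploration to exploitation.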

Figure 2.
Figure 2. The optimization process of the SCA.

Figure 5.
Figure 5. Selecting a feature subset from the total features.


Figure 6.
Figure 6. Average running time of the algorithms.

Figure 7.
Figure 7. Classification accuracy and feature selection ratio (KNN, neighbors = 3). (a) comparison results of classification accuracy; (b) comparison results of feature selection rate.


Figure 8.
Figure 8. Stability comparison of feature selection models (KNN, neighbors = 3). (a) stability comparison on Sonar dataset; (b) stability comparison on Ionosphere dataset; (c) stability comparison on Vehicle dataset; (d) stability comparison on Cancer dataset; (e) stability comparison on Wine dataset; (f) stability comparison on WDBC dataset; (g) stability comparison on Diabetes dataset.

Table 1.
Symbols and their meanings.

end if
18. end for
19. end for
20. Select the classifier and employ the best subset of features to fit the training set.
21. The trained classifier is applied to classify the test set, and the classification accuracy (acc) is calculated.
22. Output: optimal feature subset F_best, optimal fitness value G_best, classification accuracy acc.

Table 3.
Parameters in the MetaSCA model.

Table 4.
Comparison of the MetaSCA and other methods in optimization of the fitness function.