Boosting Atomic Orbit Search Using Dynamic-Based Learning for Feature Selection

Abstract: Feature selection (FS) is a well-known preprocessing step in soft computing and machine learning algorithms. It plays a critical role in different real-world applications since it aims to determine the relevant features and remove the irrelevant ones. This process (i.e., FS) reduces the time and space complexity of the learning technique used to handle the collected data. FS methods based on metaheuristic (MH) techniques have established their superiority over conventional FS methods. Therefore, in this paper, we present a modified version of a new MH technique, named Atomic Orbital Search (AOS), as an FS technique. The modification uses the dynamic opposite-based learning (DOL) strategy to enhance the ability of AOS to explore the search domain, by increasing the diversity of the solutions during the search process and by updating the search domain. A set of eighteen datasets has been used to evaluate the efficiency of the developed FS approach, named AOSD, and the results of AOSD are compared with those of other MH methods. The results show that AOSD can reduce the number of features while preserving or increasing the classification accuracy better than the other MH techniques.


Introduction
Data has become the backbone of different fields and domains in recent decades, such as artificial intelligence, data science, data mining, and other related fields, driven by the vast increase in data volumes produced by the web, sensors, and other sources. The main contributions of this paper are as follows:

1. We propose an alternative feature selection method to improve the behavior of Atomic Orbital Search (AOS).

2. We use dynamic opposite-based learning to enhance the exploration and maintain the diversity of solutions during the search process.

3. We compare the performance of the developed AOSD with other MH techniques using different datasets.
The other sections of this study are organized as follows. Section 2 presents the related works and Section 3 introduces the background of AOS and DOL. The developed method is introduced in Section 4. Section 5 introduces the experiment results and the discussion of the experiments using different FS datasets. The conclusion and future works are presented in Section 6.

Related Works
In recent years, many nature-inspired MH optimization algorithms have been used in the field of feature selection [33][34][35][36]. This section presents a brief review of the latest MH optimization techniques used for FS applications. Hu et al. [37] proposed a modified binary gray wolf optimizer (BGWO) for FS applications. They developed five transfer functions to enhance the BGWO. The authors evaluated the developed approach using different datasets. They concluded that the extended transfer functions improved the performance of the developed BGWO, and it outperformed the traditional BGWO and GWO. In [38], an FS approach was developed based on multi-objective Particle Swarm Optimization (PSO) with fuzzy cost. The main idea of this approach is a simple technique, called the fuzzy dominance relationship, which is employed to compare the performance of the candidate particles. In addition, it is used to define a fuzzy crowding distance measure to determine the global leader of the particles. This method, called PSOMOFS, was evaluated on UCI datasets and compared to several FS techniques to confirm its competitive performance. Gao et al. [39] developed two variants of the binary equilibrium optimizer (BEO) using two techniques. The first technique maps the continuous equilibrium optimizer into discrete form with S- and V-shaped transfer functions (BEO-S and BEO-V). The second technique depends on the current target (solution) and the position vector (BEO-T). The two variants of the BEO were evaluated on nineteen UCI datasets, and they obtained good results. Al-tashi et al. [40] proposed a new variant of the GWO for FS applications. The proposed method, called binary multi-objective GWO, is developed using the sigmoid transfer function (BMGWO-S). It was tested on fifteen UCI datasets, and it outperformed the traditional multi-objective GWO (MGWO) and several well-known optimization algorithms. Alazam et al.
[41] proposed a wrapper-based FS method using a pigeon-inspired optimizer. The proposed FS method was applied to intrusion detection systems (IDS) in cloud computing environments. It was evaluated using three well-known IDS datasets, and it improved the classification accuracy of the IDS. Zhang et al. [42] developed a binary version of differential evolution (BDE) for FS. They used several newly developed operators, such as the mutation operator and the One-bit Purifying Search operator, to enhance the performance of the BDE. The evaluation outcomes showed that the developed operators improved the performance of the BDE.
Additionally, different MH optimization algorithms have been developed and utilized for FS applications, such as the binary emperor penguin optimizer proposed by Dhiman et al. [43]. Three modified binary versions of the dragonfly algorithm (BDA), called linear, quadratic, and sinusoidal BDA, were presented in [44] for FS. The experimental outcomes showed that the sinusoidal BDA achieved the best performance among the modified versions of the BDA. A modified binary Harris hawks optimizer (HHO) was proposed by Zhang et al. [45] for FS applications; the salp swarm algorithm was used to boost the search process of the original HHO and overcome its shortcomings. Sahlol et al. [46] proposed a modified marine predators algorithm (MPA) using the fractional-order technique. The developed method, called FO-MPA, was applied to enhance the classification accuracy of COVID-19 CT images. Abdel-Basset et al. [47] proposed an FS approach using four binary versions of the slime mould algorithm (SMA).

Atomic Orbital Search
The AOS is a newly developed optimization method [28], inspired by the principles of quantum mechanics, where the typical arrangement of electrons around the nucleus is considered. The mathematical representation of the AOS is given as follows.
The AOS algorithm uses several solutions (X), as shown in Equation (1), and each solution (X_i) holds several decision variables (x_i,j):

X = [X_1, X_2, ..., X_N]^T, X_i = [x_i,1, x_i,2, ..., x_i,D] (1)

where N represents the number of used solutions and D indicates the dimension of the tested problem. The first solutions are randomly initialized using Equation (2):

x_i,j = L_j + rand × (U_j − L_j), i = 1, ..., N, j = 1, ..., D (2)

where U_j and L_j denote the upper and lower limits of the search domain in dimension j, and rand is a uniform random number in [0, 1].
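For illustration, the random initialization of Equation (2) can be sketched in Python as follows; the function and parameter names are ours, not the authors':

```python
import numpy as np

def init_population(n_solutions, n_dims, lower, upper, seed=None):
    """Randomly initialize N solutions inside [L, U] per dimension,
    following Equation (2): x_ij = L_j + rand * (U_j - L_j)."""
    rng = np.random.default_rng(seed)
    lower = np.broadcast_to(np.asarray(lower, dtype=float), (n_dims,))
    upper = np.broadcast_to(np.asarray(upper, dtype=float), (n_dims,))
    return lower + rng.random((n_solutions, n_dims)) * (upper - lower)

# Example: 5 solutions in a 3-dimensional [0, 1] search domain.
X = init_population(5, 3, 0.0, 1.0, seed=42)
```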
The objective values of all solutions are collected in a vector E = [E_1, E_2, ..., E_N]^T, where E_i refers to the energy level of solution i. The electron probability density chart defines the solution positions, estimated using the Probability Density Function (PDF). According to this description of the individuals by the PDF, each imaginarily formulated layer includes several solutions. In this respect, the positions X^k and energies E^k of the individuals in the imaginary layers are given as below:
where X_i^k is solution i in imaginary layer (IL) k, n represents the number of produced ILs, p indicates the number of solutions in IL k, and E_i^k represents the objective value of solution i in IL k.
In this respect, the binding state and binding energy are defined for each supposed IL by averaging the positions and objective values of all solutions in that layer:

BS^k = (1/p) Σ_{i=1}^{p} X_i^k, BE^k = (1/p) Σ_{i=1}^{p} E_i^k (7)

In Equation (7), BS^k and BE^k denote the binding state and binding energy of layer k, respectively, while X_i^k and E_i^k stand for the position and fitness value of solution i in the k-th layer.
Depending on the given items, the binding energy and binding state of the atom are defined by averaging the positions and objective values of all used solutions:

BS = (1/N) Σ_{i=1}^{N} X_i, BE = (1/N) Σ_{i=1}^{N} E_i

where BS and BE are the binding state and binding energy of the atom. The energy level E_i^k of each solution X_i^k in an IL is compared with the binding energy of its layer, BE^k. If the energy level of the current solution is greater than or equal to the binding energy (i.e., E_i^k ≥ BE^k), photon emission is considered. In this rule, the individual emits a photon and moves, at a cost of energy controlled by γ and β, toward the binding state of the atom (BS) and the position of the electron with the lowest energy level (LE) in the atom. The updating process of individuals is formulated as:

X_{i+1}^k = X_i^k + α_i × (β_i × LE − γ_i × BS) / k (10)

In Equation (10), X_i^k and X_{i+1}^k denote the current and updated positions of individual i in the k-th layer, and α_i, β_i, and γ_i refer to random vectors.
If the energy level of a solution in a particular layer is smaller than the binding energy (E_i^k < BE^k), photon absorption is considered, and the position is updated toward the binding state and lowest-energy electron of that layer:

X_{i+1}^k = X_i^k + α_i × (β_i × LE^k − γ_i × BS^k)

If a random number ∅ generated for an individual is smaller than the photon rate PR (i.e., ∅ < PR), no photon interaction with the solution takes place. Instead, the motion of particles between the various layers around the nucleus is modeled as a random movement:

X_{i+1}^k = X_i^k + r_i

where r_i is a vector of random numbers.
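The three update rules above can be sketched as follows. This is our reading of the description, not the authors' reference implementation; BS/LE and BS_k/LE_k denote the binding state and lowest-energy solution at the atom and layer levels, respectively:

```python
import numpy as np

def aos_update(x_i, k, BS, LE, BS_k, LE_k, E_i, BE_k, PR, rng):
    """One AOS position update for solution x_i in imaginary layer k.

    E_i is the solution's energy (fitness), BE_k the layer's binding
    energy, and PR the photon rate. A sketch of the rules above.
    """
    if rng.random() < PR:
        # No photon interaction: random motion between layers.
        return x_i + rng.random(x_i.size)
    alpha = rng.random(x_i.size)
    beta = rng.random(x_i.size)
    gamma = rng.random(x_i.size)
    if E_i >= BE_k:
        # Photon emission: move toward the atom's binding state BS and
        # lowest-energy electron LE, damped by the layer index k.
        return x_i + alpha * (beta * LE - gamma * BS) / k
    # Photon absorption: move toward the layer's binding state and
    # lowest-energy electron.
    return x_i + alpha * (beta * LE_k - gamma * BS_k)
```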

Dynamic-Opposite Learning
The primary steps of the Dynamic-Opposite Learning (DOL) approach are presented in this section. First, the conventional Opposition-Based Learning (OBL) approach is described [48]. This approach is used in this paper to enhance the performance of the proposed method. The OBL approach is employed to create an opposite solution to the existing solution; it attempts to determine better solutions and thus increase the convergence rate.
The opposite X^o of a given real number X ∈ [L, U] is calculated as:

X^o = U + L − X
Opposite point [49]: Suppose that X = [X_1, X_2, ..., X_Dim] is a point in a Dim-dimensional search space, with X_1, X_2, ..., X_Dim ∈ R and X_j ∈ [L_j, U_j]. The opposite point X^o of X is defined component-wise as:

X^o_j = U_j + L_j − X_j

The better of the two points (X^o and X) according to the fitness function is kept, and the other is neglected.
Related to the opposite point, the dynamic opposite value X^DO of X is defined as:

X^DO = X + w × r8 × (r9 × X^o − X) (16)

where r8 and r9 are random values in the range [0, 1] and w is a weighting factor. Consequently, the dynamic opposite point of X = [X_1, X_2, ..., X_Dim] is obtained by applying Equation (16) to each component X_j. Accordingly, DOL-based optimization begins by creating the first solutions X = (X_1, ..., X_Dim) and calculating their dynamic opposite values X^DO using Equation (16). Next, based on the fitness values, the better of the two solutions (X^DO and X) is kept, and the other is excluded.
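A minimal sketch of the dynamic opposite computation of Equation (16); clipping the result back into the search domain is a common practice and is our assumption here:

```python
import numpy as np

def dynamic_opposite(X, lower, upper, w, seed=None):
    """Dynamic opposite of each solution (Equation (16)):
    X_DO = X + w * r8 * (r9 * X_o - X), with X_o = U + L - X.
    w is the weighting factor; r8, r9 are uniform in [0, 1]."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X_o = upper + lower - X            # classical OBL opposite point
    r8 = rng.random(X.shape)
    r9 = rng.random(X.shape)
    X_do = X + w * r8 * (r9 * X_o - X)
    # Clip back into [lower, upper]; this step is our assumption.
    return np.clip(X_do, lower, upper)
```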

Developed AOSD Feature Selection Algorithm
To improve the performance of the traditional AOS algorithm and use it as an FS method, we use dynamic opposite-based learning. The steps of the developed AOS-based DOL are given in Figure 1. These steps can be classified into two phases: the first trains the developed method on the training set, while the second assesses the method's performance on the testing set.

Learning Phase
In this phase, the training set, representing 70% of the input data, is used to train the model by selecting the optimal subset of relevant features. The developed AOSD begins by constructing the initial population using the following formula:

x_i,j = L + rand × (U − L), i = 1, ..., N, j = 1, ..., N_F (17)

In Equation (17), N_F is the number of features (which also represents the dimension of each solution), and U and L are the upper and lower limits of the search domain. The next process in AOSD is to convert each agent X_i into a binary form BX_i, as defined in Equation (20), where each x_i,j above a threshold (typically 0.5) maps to 1 and to 0 otherwise.
Thereafter, the fitness value of each X_i, which represents its quality, is computed. The fitness value depends on the features selected from the training set:

Fit_i = λ × γ_i + (1 − λ) × |BX_i| / N_F

where |BX_i| is the number of features corresponding to ones in BX_i, and γ_i refers to the classification error obtained from the KNN classifier trained on the training set reduced to the features in BX_i. λ balances the two objectives of reducing the classification error and reducing the number of selected features.
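The binarization and fitness evaluation above can be sketched as follows. The 0.5 threshold, the λ value, and the minimal Euclidean KNN are our assumptions; the paper uses a KNN classifier but does not bind it to this exact code:

```python
import numpy as np

def to_binary(x_i, threshold=0.5):
    """Convert a continuous agent into a feature mask BX_i
    (the 0.5 threshold is a common choice, assumed here)."""
    return (x_i > threshold).astype(int)

def knn_error(X_tr, y_tr, X_va, y_va, k=5):
    """Classification error of a minimal Euclidean KNN classifier."""
    errors = 0
    for x, y in zip(X_va, y_va):
        dist = np.linalg.norm(X_tr - x, axis=1)
        votes = y_tr[np.argsort(dist)[:k]]
        errors += int(np.bincount(votes).argmax() != y)
    return errors / len(y_va)

def fitness(bx_i, X_tr, y_tr, X_va, y_va, lam=0.99):
    """Fitness = lam * error + (1 - lam) * |BX_i| / N_F, as described
    above (the value of lam here is hypothetical)."""
    if bx_i.sum() == 0:
        return 1.0                 # no feature selected: worst case
    mask = bx_i.astype(bool)
    err = knn_error(X_tr[:, mask], y_tr, X_va[:, mask], y_va)
    return lam * err + (1 - lam) * bx_i.sum() / bx_i.size
```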
The next step is to apply the DOL, as defined in Equation (16), to each X_i to find X^DO_i, and then to select from X ∪ X^DO the N solutions with the smallest fitness values. In addition, the best solution X_b, with the best fitness Fit_b, is determined. After that, AOSD updates the solutions X using the operators of AOS, as discussed in Section 3.1. To maintain the diversity of the solutions X, their dynamic opposite values are computed with a random probability Pr_DO, which switches between keeping X and replacing it by X_N, where X_N represents the N solutions chosen from X ∪ X^DO based on their fitness values and X^DO_{i,j} for each X_i at dimension j is computed as in Equation (16). In the developed AOSD, the limits of the search space are updated dynamically. Thereafter, the terminal conditions are checked; if they are met, X_b is returned. Otherwise, the updating steps of AOSD are repeated.
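Selecting the N best solutions from the union X ∪ X^DO, as described above, can be sketched as (function and variable names are ours):

```python
import numpy as np

def select_best_n(X, X_do, fit_x, fit_do, n):
    """Keep the n solutions with the smallest fitness values from the
    union of the population X and its dynamic opposites X_do."""
    pool = np.vstack([X, X_do])
    fits = np.concatenate([fit_x, fit_do])
    order = np.argsort(fits)[:n]
    return pool[order], fits[order]
```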

Evaluation Phase
In this phase, the best solution X_b is employed to reduce the number of features of the testing set, which represents 30% of the given data. This is performed by selecting only the features corresponding to ones inside its binary version BX_b (computed using Equation (20)). Then, the KNN classifier is applied to the reduced testing set, and its predictions are assessed using the performance measures.
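The evaluation phase can be sketched as follows, again with a minimal Euclidean KNN standing in for the classifier (a sketch, not the authors' code):

```python
import numpy as np

def evaluate_accuracy(bx_b, X_tr, y_tr, X_te, y_te, k=5):
    """Reduce the training and testing sets to the features selected
    in BX_b, then score a minimal Euclidean KNN on the testing set
    and return its accuracy."""
    mask = bx_b.astype(bool)
    X_tr_r, X_te_r = X_tr[:, mask], X_te[:, mask]
    correct = 0
    for x, y in zip(X_te_r, y_te):
        dist = np.linalg.norm(X_tr_r - x, axis=1)
        pred = np.bincount(y_tr[np.argsort(dist)[:k]]).argmax()
        correct += int(pred == y)
    return correct / len(y_te)
```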

Experimental Results
This section introduces the experimental evaluation of the developed AOSD method. Additionally, extensive comparisons to several existing optimization methods are carried out to verify the performance of the developed AOSD method.

Experimental Datasets and Parameter Settings
We evaluated the proposed AOSD method on twenty datasets of different categories, including low and high dimensionality. The low-dimensionality datasets are the well-known UCI datasets [50]. The properties of the used datasets are given in Table 1, including the number of classes, number of features, and number of samples. It is worth mentioning that the used datasets cover several domains, such as games, biology, biomedicine, and physics. Furthermore, we set up essential parameters and strategies to evaluate the proposed AOSD method. For example, we use the hold-out strategy for classification, with 80% and 20% for the training and testing sets, respectively. Moreover, we repeat each experiment over 30 independent runs. The K-nearest neighbor (KNN) classifier is adopted with the Euclidean distance metric (K = 5).
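The hold-out strategy described above can be sketched as follows; the shuffling and the function name are our own illustration:

```python
import numpy as np

def hold_out_split(X, y, train_ratio=0.8, seed=None):
    """Random hold-out split of a dataset into training and testing
    parts (e.g., 80%/20% as in the experiments above)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_ratio * len(y))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]
```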
In addition, a number of well-known optimization algorithms were considered for comparison: Atomic Orbital Search (AOS), the arithmetic optimization algorithm (AOA) [51], the Marine Predators Algorithm (MPA) [46], the manta ray foraging optimizer (MRFO) [52], Harris hawks optimization (HHO), the Henry gas solubility optimization algorithm (HGSO), the whale optimization algorithm (WOA), grey wolf optimization (GWO) [53], GA, and BPSO. The initial populations of all methods are uniformly distributed, the maximum number of iterations is set to 100, and the population size is 10. In addition, the dimension for each method is fixed to the number of features in the corresponding dataset.

Performance Measures
We used several evaluation measures to test the proposed AOSD method. The confusion matrix (CM) is described in Table 2. As is well known, it is used to assess a classifier's performance in terms of Accuracy, Specificity, and Sensitivity [54].

• Average accuracy (AVG_Acc): the rate of correctly classified data [22,55-57]. Each method is run 30 times (N_r = 30); thus, AVG_Acc is computed as:

AVG_Acc = (1/N_r) Σ_{k=1}^{N_r} Acc_k

• Average fitness value (AVG_Fit): used to assess the performance of an applied algorithm; it combines the classification error rate and the feature selection ratio [22,55-57]:

AVG_Fit = (1/N_r) Σ_{k=1}^{N_r} Fit_k

• Average number of selected features (AVG_|BX_Best|): measures the ability of the applied method to reduce the number of features over all runs [22,55-57]:

AVG_|BX_Best| = (1/N_r) Σ_{k=1}^{N_r} |BX^k_Best|

in which |BX^k_Best| represents the number of selected features in the k-th run.
• Standard deviation (STD): employed to assess the stability of each applied method and analyze the results achieved over different runs [22,55-57]:

STD_Y = sqrt( (1/(N_r − 1)) Σ_{k=1}^{N_r} (Y_k − AVG_Y)^2 )

where STD_Y is computed for each metric Y: accuracy, fitness, time, number of selected features, sensitivity, and specificity.
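The AVG and STD measures above can be computed over the N_r runs as in this short sketch:

```python
import numpy as np

def run_statistics(values):
    """Average and sample standard deviation of a metric over the N_r
    independent runs, matching the AVG and STD measures above."""
    values = np.asarray(values, dtype=float)
    avg = values.mean()
    std = np.sqrt(((values - avg) ** 2).sum() / (len(values) - 1))
    return avg, std
```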

Comparisons
In this section, the developed AOSD is evaluated over eighteen well-known datasets. Ten algorithms are used for comparison with the developed AOSD, namely AOS, AOA, MPA, MRFO, HHO, HGSO, WOA, bGWO, GA, and BPSO. Six measures are used: the maximum fitness value (MAX), the average fitness value, the minimum fitness value (MIN), the classification accuracy (Acc), the standard deviation (Std), and the number of selected features. The values obtained by the compared algorithms are recorded in Tables 3-9, where a smaller value means a better result, except in Table 8, where a higher value is better; the best values in the tables are in boldface.
The fitness values are listed in Table 3, where a smaller fitness value means a better result. This table contains the average fitness of the developed AOSD method and the comparison methods for all datasets. From these results, AOSD obtained the best results in 6 out of 18 datasets (i.e., S2, S4, S7, S9, S15, and S16) and therefore ranked first. AOA obtained the best values in three datasets (i.e., S3, S8, and S18) and ranked second, followed by MPA, MRFO, BPSO, and HHO, respectively; GA showed the worst results. The average makes it possible to analyze the typical behavior of each algorithm on each dataset across the experiments. Figure 2 shows the performance of AOSD using the average of the fitness functions.

Table 4 shows the standard deviation (Std) for all methods. The Std here is used to verify the dispersion of the results across the experiments with different datasets. A low Std value represents low dispersion, which means the algorithm is more stable across the experiments. AOSD showed good stability compared to the other methods, achieving the lowest Std value in 6 out of 18 datasets (i.e., S6, S7, S9, S13, S17, and S18); it ranked first, followed by BPSO, which showed good stability on the S14, S8, S10, S11, and S15 datasets. In addition, AOA, MPA, and MRFO also showed good stability, while bGWO and WOA showed the worst Std values. Furthermore, the best fitness values are listed in Table 5; by analyzing the best values obtained by the compared algorithms over all runs for each dataset, one can see which algorithm provides the best solution in the best case (i.e., in the best run).
This table shows that AOSD obtained the best Min values in 33% of all datasets; it obtained the best Min results on S14, S2, S4, S7, S8, S16, and S18. HHO and MPA obtained the best values in this measure on two datasets each, ranking second and third, respectively. All methods obtained the same results on the S4 dataset except HGSO and GA. GA recorded the worst performance in this measure. The worst fitness values are shown in Table 6. Studying the worst values of the compared algorithms helps verify that, even in the worst case, some algorithms still provide reasonable solutions; it also shows which algorithm performs worst in the worst case. The developed AOSD showed good results compared to the other methods, achieving the best results in 7 out of 18 datasets (i.e., S14, S3, S9, S12, S13, S16, and S18) and competitive results on the other datasets. AOA achieved the second rank by obtaining the best results in six datasets (i.e., S4, S6, S7, S8, S10, and S17), followed by MPA. The other compared methods were ordered as MRFO, BPSO, AOS, HGSO, HHO, bGWO, WOA, and GA. Moreover, the number of features selected by each method is recorded in Table 7. On this measure, the best method selects the fewest features while achieving high accuracy. As shown in Table 7, AOSD reached the second rank by obtaining the lowest number of features in 7 out of 18 datasets, whereas the first rank was obtained by WOA, which selected the lowest number of features in 8 datasets. The third rank was obtained by HGSO, followed by MRFO, HHO, AOS, bGWO, MPA, AOA, and BPSO, whereas GA showed the worst performance on all datasets.
In addition, Table 8 illustrates the results of all compared methods in terms of classification accuracy. Accuracy evaluates the proportion of correctly predicted data points out of all data points, which makes it possible to identify whether an algorithm excels at classification; accuracy is essential in many real applications, so its use is mandatory. In this measure, the developed AOSD showed the best results in 17% of the datasets, classifying them with higher accuracy than the other methods, and it obtained the same accuracy as the other methods in 22% of the datasets. MRFO ranked second, followed by MPA, AOA, BPSO, HHO, AOS, HGSO, GA, and bGWO, whereas the lowest accuracy was shown by WOA. Figure 3 illustrates the performance of AOSD based on the average classification accuracy for all datasets.

Moreover, Table 9 records the statistical results of the Friedman rank test, which ranks all methods using both the classification accuracy and the fitness values. This test studies the statistical differences between the algorithms considering the results obtained over the 30 independent runs on all datasets. From Table 9, we can see that the developed AOSD achieved the first rank in classification accuracy, followed by MRFO, MPA, AOA, BPSO, AOS, and HHO, with WOA ranked last. In the fitness values, AOSD showed an excellent rank, coming second after AOA with a slight difference, followed by MPA, BPSO, MRFO, HHO, and HGSO, with GA ranked last. From these results, we can notice that AOSD showed the best results in accuracy and the second best in the fitness values. These results indicate the superiority of AOSD, since the classification accuracy can be more important than the fitness value in solving classification problems.
In general, the aforementioned results show that the developed AOSD method provides a noticeable enhancement in solving classification problems by selecting the essential features. The DOL approach improves the performance of the AOS by increasing its ability to explore the search domain and by preventing it from getting stuck at a local optimum.
Furthermore, the results of AOSD showed its advantages over the compared algorithms by achieving the best fitness values in 33% of all datasets, whereas the second-ranked HHO method achieved the best values in 16% of the datasets. This result was also observed for the rest of the measures. In addition, comparing the proposed AOSD with its original version AOS on the accuracy measure, the proposed method outperformed the original version in 16 out of 18 datasets and showed similar accuracies in the other two cases. Moreover, the proposed method ranked first according to the statistical test (i.e., the Friedman test) for the accuracy measure, which indicates a significant difference between AOSD and the compared methods at a significance level of 0.05. Based on these results, we will work in the future on increasing the performance of the proposed method by improving its exploitation phase and on applying it to different optimization problems.

Conclusions
This paper developed a modified Atomic Orbital Search (AOS) and used it as a feature selection (FS) approach. The modification has been performed using dynamic opposite-based learning (DOL) to enhance the exploration and diversity of solutions. This improves the convergence rate toward the feasible region that contains the optimal solution (the relevant features). To justify the performance of AOSD as an FS approach, a set of twenty datasets collected from different real-life applications has been used. In addition, the results of AOSD have been compared with other well-known MH-based FS approaches, such as AOS, AOA, MPA, MRFO, HHO, HGSO, WOA, GWO, GA, and PSO. The obtained results show that the developed AOSD provides higher efficiency than the other FS approaches.
Beyond the obtained results, the developed AOSD can be extended to other real-life applications, including medical images, superpixel-based clustering, the Internet of Things (IoT), security, and other fields.