Article

Feature Selection for High Dimensional Datasets Based on Quantum-Based Dwarf Mongoose Optimization

by Mohamed Abd Elaziz 1,2,3,4,*, Ahmed A. Ewees 5, Mohammed A. A. Al-qaness 6, Samah Alshathri 7,* and Rehab Ali Ibrahim 2
1 Faculty of Computer Science & Engineering, Galala University, Suez 435611, Egypt
2 Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
3 Artificial Intelligence Research Center (AIRC), Ajman University, Ajman 346, United Arab Emirates
4 Department of Electrical and Computer Engineering, Lebanese American University, Byblos 13518, Lebanon
5 Department of Computer, Damietta University, Damietta 34517, Egypt
6 College of Physics and Electronic Information Engineering, Zhejiang Normal University, Jinhua 321004, China
7 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4565; https://doi.org/10.3390/math10234565
Submission received: 20 October 2022 / Revised: 22 November 2022 / Accepted: 29 November 2022 / Published: 2 December 2022
(This article belongs to the Section Mathematics and Computer Science)

Abstract

Feature selection (FS) methods play essential roles in different machine learning applications. Several FS methods have been developed; however, those that depend on metaheuristic (MH) algorithms have shown impressive performance in various domains. Thus, in this paper, based on recent advances in MH algorithms, we introduce a new FS technique that improves the performance of the Dwarf Mongoose Optimization (DMO) Algorithm using quantum-based optimization (QBO). The main idea is to utilize QBO as a local search within the traditional DMO to avoid its search limitations. Thus, the developed method, named DMOAQ, benefits from the advantages of both the DMO and QBO. It is tested on well-known benchmark and high-dimensional datasets, with comprehensive comparisons to several optimization methods, including the original DMO. The evaluation outcomes verify that the DMOAQ significantly enhances the search capability of the traditional DMO and outperforms the compared methods in the evaluation experiments.

1. Introduction

Recent advances in meta-heuristic (MH) algorithms have been widely employed in different applications, including feature selection (FS) problems [1]. FS is essential for machine learning-based classification methods. In the domains of data mining and machine learning, the FS process is one of the most crucial preprocessing procedures for evaluating high-dimensional data, since classification accuracy relies heavily on the chosen characteristics of a dataset. The basic goal of feature selection is to improve the performance of algorithms by removing extraneous and irrelevant characteristics from the dataset [2]. It is clear that FS methods play significant roles in enhancing classification accuracy as well as reducing computational costs. In general, FS methods can be adopted in various applications, such as wireless sensing [3], human activity recognition [4], medical applications [5], text classification [6], image classification [7], remote sensing images [8], fault detection and diagnosis [9], intrusion detection systems [10], and other complex engineering problems [11,12,13].
In recent years, MH optimization algorithms have shown significant performance in FS applications. For example, Xu et al. [1] developed a modified version of the grasshopper optimization algorithm (GOA) for FS applications. They used a bare-bones Gaussian strategy and elite opposition-based learning to boost the local and global search mechanisms of the GOA and to balance the exploitation and exploration mechanisms. The improved method, called EGOA, was tested with different numerical functions, and it showed superior performance compared to the original GOA. In [14], three binary versions of the differential evolution (DE), Harris Hawks optimization (HHO), and grey wolf optimization (GWO) algorithms were developed to select features from electroencephalogram (EEG) data for cognitive load detection. They tested the developed methods with several classifiers, including the support vector machine (SVM) and the K-nearest neighbor (KNN). The binary HHO with the KNN classifier achieved the best results. Başaran [15] applied three MH algorithms, namely the genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony, to select features from magnetic resonance imaging (MRI) to classify brain tumors. The three MH methods improved the classification accuracy of the SVM classifier. Rashno et al. [16] proposed an efficient FS method based on a multi-objective PSO algorithm. They used the feature ranks and particle ranks to update the position of particles and their velocity during each iteration. Additionally, 16 datasets were utilized to assess the performance of the improved PSO method, which achieved significant performance compared to 11 existing methods. Nadimi-Shahraki et al. [17] developed a new FS approach using a modified whale optimization algorithm (WOA). They used a pooling technique and three search mechanisms to boost the search capability of the traditional WOA. This method was utilized to improve the classification of COVID-19 medical images. Varzaneh et al. [2] proposed an FS approach using a modified equilibrium optimization algorithm. They used an entropy-based operator to boost the search performance of the traditional equilibrium optimization method. Moreover, they used a binary version of the modified equilibrium optimization method, which was evaluated with eight datasets as an FS method and recorded better results compared to several optimization methods. Hassan et al. [18] suggested a binary version of the manta ray foraging (MRF) algorithm as an FS method, which was applied to enhance the classification of intrusion detection systems. The developed version of the MRF method was applied with the random forest classifier and the adaptive S-shape function, and it was tested with well-known IDS datasets, such as CIC-IDS2017 and NSL-KDD. Eluri and Devarakonda [19] used a binary version of the golden eagle optimizer (GEO) as an FS approach. They utilized a technique called Time Varying Flight Length along with the binary version of the GEO to balance the exploitation and exploration processes of the GEO. It was compared to a number of well-known MH optimization algorithms as well as feature selection methods, and the outcomes showed that the developed GEO achieved competitive performance. Balasubramanian and Ananthamoorthy [20] applied the salp swarm optimizer (SSA) as an FS method to enhance glaucoma diagnosis.
The SSA was utilized with the Kernel-Extreme Learning Machine (KELM) classifier, and it significantly improved the classification accuracy. Long et al. [21] proposed a new FS approach based on an enhanced version of the butterfly optimization algorithm. The modification was inspired by the PSO algorithm to update positions based on the velocity and memory items in the local search stage. It was tested with complex problems and FS benchmark datasets and was compared with different optimization methods.
Utilizing the high performance of MH algorithms in FS applications, in this paper we propose a new FS approach using a modified version of the Dwarf Mongoose Optimization (DMO) algorithm [22]. The DMO was recently developed based on the foraging behavior of dwarf mongooses in nature. Three social groups are used in the algorithm design, called the alpha, babysitter, and scout groups. The DMO has been assessed using different complex optimization and engineering problems, and the outcomes confirmed its competitive performance. Like other MH methods, an individual MH may face some limitations during the search process. The main limitations of the original DMO are its convergence speed and its tendency to become trapped at local optima, which occur when the balance between exploration and exploitation is weak. In general, the exploration phase aims to discover the feasible regions that contain the optimal solutions, whereas the exploitation phase aims to discover the optimal solution inside the feasible region. To solve this issue, in this paper we use a search technique called quantum-based optimization (QBO) [23]. The main goal of the QBO is to provide a better balance between the exploration and exploitation phases to produce efficient solutions. The QBO has already been applied to enhance the performance of several MH techniques, for example, the quantum Henry gas solubility optimization (QHGSO) algorithm [24], the quantum salp swarm algorithm (QSSA) [25], the quantum GA (QGA) [26], and the quantum marine predators algorithm (QMPA) [27].
The proposed method, called DMOAQ, starts by dividing the dataset into training and testing samples. This is followed by setting the initial values for a set of individuals that represent solutions to the tested problem, computing the fitness value of each of them using the training sample, and allocating the best of them. After that, it uses the operators of DMO to update the current solutions. This process of enhancing the solutions is conducted until the stop condition is reached. Then, the best solution is used to evaluate the learned model on the reduced testing set. The developed DMOAQ is tested on different datasets and compared to a number of MH optimization methods to verify its performance.
In short, the main objectives and contributions of this study are listed as follows:
  • To develop a new variant of the Dwarf Mongoose Optimization (DMO) algorithm as a feature selection method.
  • To achieve an optimal balance between exploitation and exploration for the DMO algorithm, we utilized a quantum-based optimization technique. Thus, a new version of the DMO, called DMOAQ, was developed and applied as an FS approach.
  • To evaluate the performance of DMOAQ using a set of different UCI datasets and high dimensional datasets. In addition, we compare it with other well-known FS methods.
The structure of the rest of this study is given as follows: Section 2 introduces the background of the Dwarf Mongoose Optimization Algorithm and quantum-based optimization. Section 3 shows the steps of the developed method. Section 4 presents the experimental results and their discussion. The conclusion and future work are presented in Section 5.

2. Background

2.1. Dwarf Mongoose Optimization Algorithm

The Dwarf Mongoose Optimization (DMO) Algorithm was introduced in [22] as an MH technique that simulates the behavior of the dwarf mongoose in nature while searching for food. In general, DMO contains five phases, beginning with the initialization phase, followed by the alpha group phase, in which the female alpha directs the exploration of new locations. In the scout group phase, new sleeping mounds (i.e., food sources) are explored based on old sleeping mounds; a greater variety of terrain is explored as the foraging intensity increases. The babysitting group phase is only carried out when the timer value exceeds the value of the babysitting exchange parameter. In the termination phase, the algorithm finally comes to a halt. A thorough explanation is given in the following phases.

2.1.1. Phase 1: Initialization

The Dwarf Mongoose Optimization Algorithm begins by creating a matrix of size (n × d) and initializing the population of mongooses (X) as in Equation (1), where the problem dimensions (d) are represented in columns and the population size (n) is displayed in rows.
$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & x_{i,j} & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$ (1)
An element ($x_{i,j}$) is the component of the problem's jth dimension found in the population's ith solution. Typically, its value is determined by using Equation (2) as a uniformly distributed random value that is constrained by the problem's upper ($UB$) and lower ($LB$) limits.
$x_i = \mathrm{unifrnd}(LB, UB, D)$ (2)
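For concreteness, the initialization of Equations (1) and (2) can be sketched in Python/NumPy as follows; the function name and signature are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def init_population(n, d, lb, ub, seed=None):
    """Equations (1)-(2): n uniformly random solutions inside [lb, ub]^d.

    Rows are candidate solutions; columns are problem dimensions."""
    rng = np.random.default_rng(seed)
    return lb + rng.random((n, d)) * (ub - lb)
```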

2.1.2. Phase 2: Alpha Group

The next step in DMO is to compute the fitness value ($fit$) of each solution and then compute its probability; this process is formulated as:
$\alpha = \frac{fit_i}{\sum_{i=1}^{n} fit_i}$ (3)
where the number of mongooses n is updated using the following formula.
$n = n - bs$ (4)
In Equation (4), $bs$ refers to the number of babysitters.
The female alpha of the group uses a distinctive vocalisation ($peep$) to communicate with the others. This is used to coordinate the movement of the group in the large foraging area. So, the DMO uses the following equation to update the value of solution $X_i$.
$X_{i+1} = X_i + phi \times peep$ (5)
where $phi$ is a value produced randomly from the range [−1, 1] at each iteration. In addition, the sleeping mound is updated using Equation (6).
$sm_i = \frac{fit_{i+1} - fit_i}{\max\{|fit_{i+1}|, |fit_i|\}}$ (6)
This is followed by calculating the average sleeping mound ($\varphi$), as in Equation (7):
$\varphi = \frac{\sum_{i=1}^{n} sm_i}{n}$ (7)
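As a rough illustration, the alpha-group computations of Equations (3)–(7) can be sketched as below; `fit_fn` (the problem's objective), the variable names, and the reading of the denominator in Equation (6) as the larger magnitude of the two fitness values are our assumptions.

```python
import numpy as np

def alpha_group_step(X, fit, fit_fn, peep, rng):
    """One alpha-group update following Equations (3)-(7)."""
    n = len(X)
    alpha = fit / fit.sum()              # selection probabilities, Equation (3)
    X_new = np.empty_like(X)
    sm = np.empty(n)
    for i in range(n):
        phi = rng.uniform(-1.0, 1.0)     # random factor in [-1, 1]
        X_new[i] = X[i] + phi * peep     # candidate food position, Equation (5)
        f_new = fit_fn(X_new[i])
        # Sleeping-mound value, Equation (6).
        sm[i] = (f_new - fit[i]) / max(abs(f_new), abs(fit[i]), 1e-12)
    varphi = sm.mean()                   # average sleeping mound, Equation (7)
    return X_new, sm, varphi, alpha
```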

2.1.3. Phase 3: Scouting Group

The fresh candidate places for food or a sleeping mound are scouted during the algorithm's scouting phase, while existing sleeping mounds are ignored in accordance with nomadic customs. Scouting and foraging take place simultaneously, with the scouted areas only being visited after the babysitters' exchange criterion is met. The current location of the sleeping mound, a movement vector ($M$), a stochastic multiplier ($rand$), and a movement regulating parameter ($CF$) are used to steer the scouted places for the food and sleeping mounds, as shown in Equation (8).
$X_{i+1} = \begin{cases} X_i - CF \times phi \times rand \times [X_i - M], & \text{if } \varphi_{i+1} > \varphi_i \\ X_i + CF \times phi \times rand \times [X_i - M], & \text{otherwise} \end{cases}$ (8)
The future movement depends on the performance of the mongoose group, and the new scouted position ($X_{i+1}$) is simulated with both success and failure of improvement of the overall performance in mind.
The movement of the mongoose group ($M$) described in Equation (9) is regulated via the collective-volitive parameter ($CF$), which is computed in Equation (10).
At the start of the search phase, the collective-volitive parameter allows for rapid exploration, but, with each iteration, the focus gradually shifts from discovering new regions to exploiting productive ones.
$M = \sum_{i=1}^{n} \frac{X_i \times sm_i}{X_i}$ (9)
$CF = \left(1 - \frac{iter}{Max_{iter}}\right)^{2 \times \frac{iter}{Max_{iter}}}$ (10)
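A sketch of the scout-group move of Equations (8)–(10), taking the printed form of Equation (9) literally; the small `eps` guard on the division and all names are our additions.

```python
import numpy as np

def scout_step(X, sm, phi_prev, phi_curr, it, max_it, rng):
    """Scout-group move following Equations (8)-(10)."""
    eps = 1e-12
    M = np.sum(X * sm[:, None] / (X + eps), axis=0)  # movement vector, Equation (9)
    CF = (1.0 - it / max_it) ** (2.0 * it / max_it)  # volitive parameter, Equation (10)
    phi = rng.uniform(-1.0, 1.0)
    step = CF * phi * rng.random(X.shape) * (X - M)
    # Equation (8): move away from M when the average mound improved,
    # and toward it otherwise.
    return X - step if phi_curr > phi_prev else X + step
```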

2.1.4. Phase 4: Babysitting Group

The alpha group switches positions with babysitters in the late afternoon and early evening, providing the alpha female an opportunity to look after the colony’s young. The population size affects the ratio of caregivers to foraging mongooses. The following Equation (11) is modeled to imitate the exchange process after midday or in the evening.
$phase = \begin{cases} \text{Scout}, & \text{if } C < L \\ \text{Babysitting}, & \text{if } C \geq L \end{cases}$ (11)
The alpha group remains in the scout phase until the counter ($C$) exceeds the exchange criterion parameter ($L$); at that point, the information collected by the preceding foraging group is re-initialized and the counter is reset to zero. The babysitters' initial weight is set to zero in order to ensure that the average weight of the alpha group is decreased in the following iteration, which promotes exploitation.
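The exchange rule of Equation (11) reduces to a simple comparison; a minimal sketch (the function name is ours):

```python
def next_phase(C, L):
    """Equation (11): keep scouting while the time counter C is below the
    babysitter-exchange criterion L; otherwise switch to babysitting.
    Re-initializing the foraging information and resetting C to zero
    is left to the caller, as described in the text."""
    return "Scout" if C < L else "Babysitting"
```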

2.1.5. Phase 5: Termination

The dwarf mongoose algorithm stops once the maximum number of defined iterations has been reached and returns the best result obtained during execution.

2.2. Quantum-Based Optimization

Within this section, the basic information for quantum-based optimization (QBO) is introduced. In QBO, a binary number represents whether a feature is selected (1) or eliminated (0). Each feature in QBO is represented by a quantum bit (Q-bit, $q$), where $q$ denotes a superposition of the binary values '1' and '0'. The following equation [23] can be used to establish the mathematical formulation of the Q-bit $q$.
$q = \alpha + i\beta = e^{i\theta}, \quad |\alpha|^2 + |\beta|^2 = 1$ (12)
where $\alpha$ and $\beta$ determine the probabilities of the Q-bit being '0' and '1', respectively. The parameter $\theta$ denotes the angle of $q$, and it is updated using $\tan^{-1}(\alpha/\beta)$.
The process of finding the change in the value of $q$ is the main objective of QBO, and it is determined by calculating $\Delta\theta$ as:
$q(t+1) = R(\Delta\theta) \times q(t) = R(\Delta\theta) \times \begin{bmatrix} \alpha(t) \\ \beta(t) \end{bmatrix}$ (13)
$R(\Delta\theta) = \begin{bmatrix} \cos(\Delta\theta) & -\sin(\Delta\theta) \\ \sin(\Delta\theta) & \cos(\Delta\theta) \end{bmatrix}$ (14)
In Equation (14), $\Delta\theta$ is the rotation angle of the ith Q-bit of the jth Q-solution. The value of $\Delta\theta$ is predefined based on $X_b$, as in Table 1, following the experimental tests conducted on the knapsack problems [28].
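Because the Q-bit stores $[\alpha, \beta] = [\cos\theta, \sin\theta]$, applying the rotation gate of Equations (13) and (14) amounts to adding $\Delta\theta$ to the stored angle. A minimal sketch, with the Table 1 lookup encoded as a dictionary (the boolean flag stands for the fitness comparison in Table 1; all names are ours):

```python
# Rotation angles of Table 1, keyed by (x_ij, x_b, f(X_i) >= f(X_b)).
DELTA_THETA = {
    (0, 0, False): 0.0, (0, 1, False): 0.01, (1, 0, False): -0.01, (1, 1, False): 0.0,
    (0, 0, True): 0.0,  (0, 1, True): 0.0,   (1, 0, True): 0.0,    (1, 1, True): 0.0,
}

def rotate_qbit(theta, x_ij, x_b, cond):
    """Apply the rotation gate of Equations (13)-(14) to one Q-bit.

    With [alpha, beta] = [cos(theta), sin(theta)], multiplying by the 2x2
    rotation matrix R(d_theta) is the same as adding d_theta to the angle,
    so only theta needs to be stored and updated."""
    return theta + DELTA_THETA[(x_ij, x_b, cond)]
```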

3. Proposed Method

In order to improve the ability to strike a better balance between the exploration and exploitation of DMO while looking for a workable solution, QBO is used. The training and testing sets of the newly created FS method, DMOAQ, are composed of 70% and 30% of the total data, respectively. Then, using the training samples, the fitness values for each population member are calculated. After that, the best agent, which has the lowest fitness value, is allocated. The solutions are modified using the operators of DMO during the exploitation phase. The updating of each individual continues until the stop criteria are met.
After that, the testing set’s dimension is reduced depending on the best solution, and the implemented DMOAQ as FS is evaluated using a variety of metrics. The DMOAQ (Figure 1) is thoroughly covered in the following sections.

3.1. First Stage

At this point, the N agents representing the population are created. In this study, each solution contains D Q-bits, one for each of the D features. As a result, the solution $X_i$ is formulated as in Equation (15).
$X_i = [q_{i1} \mid q_{i2} \mid \cdots \mid q_{iD}] = [\theta_{i1} \mid \theta_{i2} \mid \cdots \mid \theta_{iD}], \quad i = 1, 2, \ldots, N$ (15)
In this equation, $X_i$ refers to a collection of superpositions of the probabilities that each feature is either selected or not.

3.2. Second Stage

Updating the agents until they meet the stop criteria is the main goal of this stage of the DMOAQ. This is conducted through a number of steps, the first of which is to use Equation (16) to obtain the binary form of each individual $X_i$:
$BX_{i,j} = \begin{cases} 1, & \text{if } rand < \beta^2 \\ 0, & \text{otherwise} \end{cases}$ (16)
where $\beta$ is defined in Equation (12) and $rand \in [0, 1]$ is a random value. The next step is to train the classifier using the training features that correspond to the ones selected in $BX_{i,j}$ and to compute the fitness value, which is defined as:
$Fit_i = \rho \times \gamma + (1 - \rho) \times \frac{|BX_{i,j}|}{D}$ (17)
In Equation (17), $|BX_{i,j}|$ is the total number of selected features, $\gamma$ is the classification error of the classifier (i.e., on the relevant features), and $\rho \in [0, 1]$ is the factor that balances the two parts of the fitness value.
Finding the best agent $X_b$ with the smallest $Fit_b$ is the next process. After that, we apply the DMO operators as described in Equations (4)–(11).
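A sketch of the binarization and fitness evaluation of Equations (16) and (17); the KNN classifier and the value ρ = 0.99 are illustrative assumptions, as this section does not fix them.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def binarize(theta, rng):
    """Equation (16): feature j is selected when rand < beta_j^2, where
    beta_j = sin(theta_j) follows from the Q-bit representation."""
    return (rng.random(theta.shape) < np.sin(theta) ** 2).astype(int)

def subset_fitness(mask, X_train, y_train, X_val, y_val, rho=0.99):
    """Equation (17): rho * gamma + (1 - rho) * |BX| / D, where gamma is
    the classification error on the selected features."""
    if mask.sum() == 0:
        return 1.0                        # penalize an empty feature subset
    cols = mask.astype(bool)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, cols], y_train)
    gamma = 1.0 - clf.score(X_val[:, cols], y_val)
    return rho * gamma + (1.0 - rho) * mask.sum() / mask.size
```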

3.3. Third Stage

At this point, the testing set is reduced by only choosing features that match those in the binary version of X b . The output of the testing set is then predicted using the trained classifier on the testing set’s reduced dimension. The output’s quality is then evaluated using a variety of metrics. Algorithm 1 details the DMOAQ algorithm’s steps.
Algorithm 1 The DMOAQ method.
1: Input: a dataset with D features, as well as the number of solutions (N), iterations (t_max), and DMOAQ parameters.
2: First Stage
3: Divide the data into two sets (i.e., testing and training).
4: Using Equation (15), create the population X.
5: Second Stage
6: Set t = 1.
7: while (t < t_max) do
8:     Using Equation (16), obtain the Boolean form of X_i.
9:     Compute the fitness value of X_i using the training set as in Equation (17).
10:    Allocate the best agent X_b.
11:    Enhance X using Equations (4)–(11).
12:    t = t + 1.
13: Third Stage
14: Reduce the testing set according to the selected features using X_b.
15: Evaluate the quality using different metrics.
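Putting the stages together, a high-level sketch of Algorithm 1 could look as follows; it reuses the `binarize` and `subset_fitness` sketches above and abbreviates the DMO/QBO update to a comment, so it is an outline under our assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def dmoaq(X_data, y, n=20, t_max=50, seed=0):
    """High-level outline of Algorithm 1."""
    rng = np.random.default_rng(seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_data, y, test_size=0.3, random_state=seed)   # 70/30 split
    D = X_data.shape[1]
    theta = rng.uniform(0.0, np.pi / 2, size=(n, D))   # Q-angle population, Equation (15)
    best_mask, best_fit = np.ones(D, dtype=int), np.inf
    for _ in range(t_max):
        for i in range(n):
            mask = binarize(theta[i], rng)                      # Equation (16)
            fit = subset_fitness(mask, X_tr, y_tr, X_tr, y_tr)  # Equation (17)
            if fit < best_fit:
                best_fit, best_mask = fit, mask
        # Update theta here with the DMO operators (Equations (4)-(11)) and
        # the Q-bit rotations of Table 1; omitted in this outline.
    # Third stage: evaluate on the reduced testing set.
    return best_mask, subset_fitness(best_mask, X_tr, y_tr, X_te, y_te)
```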

4. Experimental Setup and Dataset

In this section, the quality of the proposed DMOAQ is computed through two experimental series. The first experiment uses a set of eighteen UCI datasets, whereas the second experiment assesses the performance of DMOAQ using eight high-dimensional datasets collected from different domains.

4.1. Performance Measures

This study uses six performance metrics to evaluate the effectiveness of the developed DMOAQ: the averages of the accuracy, the standard deviation, the number of selected attributes, and the fitness value, as well as the minimum and maximum fitness values. They are defined as follows.
$accuracy = \frac{TP + TN}{TP + FN + FP + TN}$ (18)
where FN, TN, FP, and TP denote false negative, true negative, false positive, and true positive, respectively.
The maximum ($Max$) of the fitness value ($Fit$, defined in Equation (17)) is:
$Max = \max_{1 \leq i \leq N_r} Fit_b^i$ (19)
The minimum ($Min$) of the fitness value is:
$Min = \min_{1 \leq i \leq N_r} Fit_b^i$ (20)
The standard deviation ($Std$) of the fitness value is:
$Std = \sqrt{\frac{1}{N_r} \sum_{i=1}^{N_r} \left(Fit_i - Fit_a\right)^2}$ (21)
where $N_r$ is the number of runs and the average of $Fit$ is given by $Fit_a$.
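Since each algorithm is run $N_r$ times, the reported statistics are simple aggregates of the per-run best fitness values; a small sketch (names ours):

```python
import numpy as np

def summarize_runs(best_fits):
    """Aggregate the best fitness of each of the N_r independent runs into
    the reported Avg, Max, Min, and Std measures (Equations (19)-(21))."""
    f = np.asarray(best_fits)
    return {"Avg": f.mean(), "Max": f.max(), "Min": f.min(), "Std": f.std()}
```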
To validate the performance of the developed DMOAQ, it is compared to other methods, including the traditional DMOA, the grey wolf optimization algorithm (GWO) [29], the Chameleon Swarm Algorithm [30], electric fish optimization (EFO) [31], atomic orbital search (AOS) [32], Arithmetic Optimization (AO) [33], the Reptile Search Algorithm (RSA) [34], LSHADE [35], sinusoidal parameter adaptation incorporated with L-SHADE (LSEpSin) [36], L-SHADE with Semi Parameter Adaptation (LSPA) [37], and the chaotic heterogeneous comprehensive learning particle swarm optimizer (CHCLPSO) [38]. The parameters of each technique are set according to its original implementation, whereas the common parameters t_max and N are set to 50 and 20, respectively. To obtain the average of the performance measures, each algorithm is run 25 times.

4.2. Experimental Series 1: FS Using UCI Datasets

In this section, the proposed DMOAQ is evaluated in selecting the relevant features using eighteen datasets [39]. These datasets were collected from different fields, and they consist of a varying number of instances, features, and classes. The descriptions of these datasets are given in Table 2.
The results of the DMOAQ are compared to eleven algorithms: DMOA, bGWO, Chameleon, EFO, AOS, AO, RSA, LSHADE, LSHADE-cnEpSin (LcnE), LSHADE-SPACMA (LSPA), and CHCLPSO. In this regard, six performance measures are used, namely the average (Avg), maximum (Max), minimum (Min), and standard deviation (Std) of the fitness function values, as well as the accuracy (Acc) and the number of selected features. All results are presented in Table 3, Table 4, Table 5, Table 6 and Table 7.
Table 3 shows the average fitness function values for the DMOAQ and the compared algorithms over all datasets. The DMOAQ achieved the best results in 7 out of 18 datasets (i.e., Exactly, KrvskpEW, M-of-n, Tic-tac-toe3, WaveformEW, WineEW, and Zoo). The AO obtained the second rank by achieving the best fitness function values in 4 datasets (i.e., Breastcancer4, IonosphereEW, PenglungEW, and SonarEW). The Chameleon and bGWO were ranked third and fourth, respectively. The AOS and RSA showed good results and came in the fifth and sixth ranks, whereas the rest of the algorithms were ranked as follows: DMOA, EFO, LSHADE, and LSHADE-SPACMA, respectively. The worst values were obtained by the LSHADE-cnEpSin. Figure 2 illustrates the average fitness function values for all datasets.
Table 4 presents the standard deviation values for the algorithms. The proposed DMOAQ showed acceptable standard deviation values, whereas the CHCLPSO showed the smallest Std values, followed by LSHADE and LSHADE-SPACMA. The worst results were shown by the bGWO, Chameleon, and LSHADE-SPACMA. The rest of the algorithms showed broadly similar results. Figure 3 illustrates the average standard deviation of the fitness functions for all datasets.
Moreover, the minimum fitness function values are recorded in Table 5. This table indicates that the proposed DMOAQ obtained the minimum fitness values in 8 out of 18 datasets; it was ranked first, obtaining the best Min results in Exactly, KrvskpEW, M-of-n, Tic-tac-toe3, Vote, WaveformEW, WineEW, and Zoo. The Chameleon showed the second-best results and obtained the best values in 6 out of 18 datasets. The bGWO, AOS, and AO were ranked third, fourth, and fifth, respectively. The worst performances were shown by LSHADE-cnEpSin, LSHADE, and LSHADE-SPACMA. Figure 4 illustrates the average minimum fitness function values for all datasets.
Table 6 records the worst (maximum) fitness function values for the compared methods. Based on these, the proposed DMOAQ showed good Max values compared to the other methods; it achieved the best values in 44% of all datasets, namely Breastcancer4, Exactly, KrvskpEW, M-of-n, Tic-tac-toe3, WaveformEW, WineEW, and Zoo, whereas it provided competitive results on the rest of the datasets. The DMOA obtained the best results in three datasets, namely IonosphereEW, Lymphography, and WineEW, and was ranked second, followed by AO, AOS, and Chameleon. Although LSHADE-SPACMA obtained the best results in two datasets (Exactly2 and SpectEW), its average value was worse than that of most algorithms. The worst performances were shown by the LSHADE, LSHADE-SPACMA, and CHCLPSO algorithms. Figure 5 illustrates the average maximum fitness function values for all datasets.
In terms of the accuracy measure, Table 7 shows the average classification accuracy for all methods. This table indicates the best performance of the DMOAQ in 44% of all datasets, namely Exactly, Exactly2, IonosphereEW, KrvskpEW, M-of-n, WaveformEW, WineEW, and Zoo, which indicates the good ability of the DMOAQ to correctly classify the datasets. The AOS and AO obtained the second and third ranks, followed by Chameleon and bGWO, respectively. The lowest accuracy results were obtained by the LSHADE and LSHADE-cnEpSin algorithms. Figure 6 illustrates the average classification accuracy for all datasets.

4.3. Experimental Series 2: FS for High Dimensional Datasets

This section evaluates the proposed DMOAQ on a set of high dimensional datasets (https://archive.ics.uci.edu/ml/datasets.php, accessed on 20 October 2022), as described in Table 8. These datasets were gathered from the UCI machine learning repository and additional sources, and they span a variety of uses, such as event detection, sentiment analysis, and sensor-based human activity recognition. Eight datasets in total were acquired, including five from the UCI repository. The GPS trajectories dataset records the GPS coordinates, trajectory identifier, and duration of moving cars and buses in a city. The GAS sensors dataset is a collection of 100 records containing temperature, humidity, and data from 8 MOX gas sensors in a home environment. The MovementAAL (Indoor User Mobility Prediction using RSS) dataset predicts an indoor user's movement using the radio signal strength (RSS) from multiple Wireless Sensor Network (WSN) nodes and user movement patterns. The Hepatitis dataset includes 155 records of hepatitis C patients and is used to determine whether a patient will live or die. The UCI-HAR dataset for human activity recognition includes a total of 30 people performing six different activities, such as walking, sitting, standing, and lying, while carrying a smartphone around the waist. Both the SemEval2017 Task4 [40] dataset and the STS-Gold [41] dataset are English textual datasets for sentiment analysis that were gathered from Twitter, where each tweet is categorized as positive, negative, or neutral. The C6 dataset is an English crisis event detection dataset used to forecast crises, including hurricanes, floods, earthquakes, tornadoes, and wildfires [42].
The experimental results of the DMOAQ method are compared to nine algorithms, namely: DMOA, bGWO, Chameleon, EFO, AOS, AO, RSA, LSHADE, and LSHADE-cnEpSin (LcnE). The performance of the proposed DMOAQ is evaluated using five metrics, namely the average values of the fitness function and the number of selected features, as well as the classification accuracy, sensitivity, and specificity. The results are recorded in Table 9, Table 10, Table 11, Table 12 and Table 13.
The average fitness function values for the high dimensional datasets are recorded in Table 9. From this table, the DMOAQ achieved the best average in 63% of the datasets (i.e., STS-Gold, sensors, Movm, UCI, and C6) and obtained the same result as the RSA on the Trajectory dataset. Based on these results, the DMOAQ method was ranked first. The RSA algorithm was ranked second, obtaining the best average in two datasets (i.e., Trajectory and hepatitis), followed by the AO, which obtained the best result on the Sem dataset. The rest of the methods were ordered as follows: Chameleon, bGWO, DMOA, and AOS, respectively. The worst result was shown by the EFO. Figure 7 illustrates the average fitness function values for all datasets.
Moreover, the numbers of features selected by all methods are recorded in Table 10. In this measure, the best method is the one that can determine the relevant features with the highest accuracy value. As shown in Table 10, the DMOAQ method selected the smallest number of features and is considered the best method; in detail, it obtained the lowest number of features in 7 out of 8 datasets. The Chameleon was the second-best algorithm, followed by AO, bGWO, RSA, and DMOA. Figure 8 shows the ratio of the selected features for all datasets.
Furthermore, the classification accuracy results are presented in Table 11. As shown in this table, the DMOAQ method obtained the highest accuracy in 3 out of 8 datasets, namely the sensors, Movm, and UCI datasets, whereas it obtained the same results as the RSA on Trajectory and the EFO on C6. The second-best algorithm was the RSA. The AO came in the third rank, followed by DMOA, LSHADE, and Chameleon, respectively. The worst accuracy results were recorded by the AOS. Figure 9 illustrates the average classification accuracy for all datasets.
In terms of the sensitivity measure, Table 12 reports the results for all datasets. From Table 12, the proposed DMOAQ showed good sensitivity results in all datasets: it obtained the best sensitivity value on the C6 dataset and the same values as AO and RSA on both the sensors and hepatitis datasets; however, it was ranked second after the RSA algorithm. The AO came in the third rank, followed by DMOA and LSHADE. The AOS was ranked last.
Moreover, Table 13 records the values of the specificity measure. From this table, the proposed DMOAQ obtained the best specificity results in 3 out of 8 datasets and showed the same results as AO and RSA on the sensors dataset. In addition, the DMOAQ obtained the best specificity average over all datasets, equal to 0.9313, whereas the second-best algorithm (i.e., DMOA) obtained a specificity average equal to 0.9188. The third and fourth algorithms were Chameleon and LSHADE. The remaining algorithms showed broadly similar results.
Although the proposed method showed good results in most cases, it failed to reach the optimal values in some datasets because it is sensitive to the initial population; this should be addressed in future studies with different methods, such as chaotic maps.

5. Conclusions and Future Work

In this paper, a new feature selection method, called DMOAQ, has been presented. It improves the performance of the dwarf mongoose optimization (DMO) algorithm using quantum-based optimization (QBO). The main idea of the proposed method is to improve the balance between the exploration and exploitation of the traditional DMO during the search process, using the QBO to avoid the search limitations of the DMO. The performance of the DMOAQ has been evaluated over 18 benchmark feature selection datasets and eight high dimensional datasets, and the results were compared to well-known metaheuristic algorithms. The evaluation outcomes demonstrated that the QBO significantly enhanced the search capability of the traditional DMO: the DMOAQ obtained the best accuracy in 44% of the benchmark datasets and in 62% of the high dimensional datasets. In the future, the proposed method will be evaluated in different applications, such as image segmentation, parameter estimation, and solving real engineering problems. In addition, it can be applied to solve multi-objective optimization problems.

Author Contributions

M.A.E. and R.A.I., conceptualization, supervision, methodology, formal analysis, resources, data curation, and writing—original draft preparation. M.A.A.A.-q., formal analysis, validation, and writing—review and editing. S.A., writing—review and editing, project administration, and funding acquisition. A.A.E., supervision, resources, formal analysis, methodology, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R197), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The data are available upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. Xu, Z.; Heidari, A.A.; Kuang, F.; Khalil, A.; Mafarja, M.; Zhang, S.; Chen, H.; Pan, Z. Enhanced Gaussian Bare-Bones Grasshopper Optimization: Mitigating the Performance Concerns for Feature Selection. Expert Syst. Appl. 2022, 212, 118642.
  2. Varzaneh, Z.A.; Hossein, S.; Mood, S.E.; Javidi, M.M. A new hybrid feature selection based on Improved Equilibrium Optimization. Chemom. Intell. Lab. Syst. 2022, 228, 104618.
  3. Al-qaness, M.A. Device-free human micro-activity recognition method using WiFi signals. Geo-Spat. Inf. Sci. 2019, 22, 128–137.
  4. Dahou, A.; Al-qaness, M.A.; Abd Elaziz, M.; Helmi, A. Human activity recognition in IoHT applications using Arithmetic Optimization Algorithm and deep learning. Measurement 2022, 199, 111445.
  5. Remeseiro, B.; Bolon-Canedo, V. A review of feature selection methods in medical applications. Comput. Biol. Med. 2019, 112, 103375.
  6. Pintas, J.T.; Fernandes, L.A.; Garcia, A.C.B. Feature selection methods for text classification: A systematic literature review. Artif. Intell. Rev. 2021, 54, 6149–6200.
  7. Raj, R.J.S.; Shobana, S.J.; Pustokhina, I.V.; Pustokhin, D.A.; Gupta, D.; Shankar, K. Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 2020, 8, 58006–58017.
  8. AL-Alimi, D.; Al-qaness, M.A.; Cai, Z.; Dahou, A.; Shao, Y.; Issaka, S. Meta-Learner Hybrid Models to Classify Hyperspectral Images. Remote Sens. 2022, 14, 1038.
  9. Onel, M.; Kieslich, C.A.; Guzman, Y.A.; Floudas, C.A.; Pistikopoulos, E.N. Big data approach to batch process monitoring: Simultaneous fault detection and diagnosis using nonlinear support vector machine-based feature selection. Comput. Chem. Eng. 2018, 115, 46–63.
  10. Dahou, A.; Abd Elaziz, M.; Chelloug, S.A.; Awadallah, M.A.; Al-Betar, M.A.; Al-qaness, M.A.; Forestiero, A. Intrusion Detection System for IoT Based on Deep Learning and Modified Reptile Search Algorithm. Comput. Intell. Neurosci. 2022, 2022, 6473507.
  11. Anter, A.M.; Ali, M. Feature selection strategy based on hybrid crow search optimization algorithm integrated with chaos theory and fuzzy c-means algorithm for medical diagnosis problems. Soft Comput. 2020, 24, 1565–1584.
  12. Al-qaness, M.A.; Ewees, A.A.; Fan, H.; AlRassas, A.M.; Abd Elaziz, M. Modified aquila optimizer for forecasting oil production. Geo-Spat. Inf. Sci. 2022, 1–17.
  13. Bashir, S.; Khan, Z.S.; Khan, F.H.; Anjum, A.; Bashir, K. Improving heart disease prediction using feature selection approaches. In Proceedings of the 16th IEEE International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 8–12 January 2019; pp. 619–623.
  14. Yedukondalu, J.; Sharma, L.D. Cognitive load detection using circulant singular spectrum analysis and Binary Harris Hawks Optimization based feature selection. Biomed. Signal Process. Control 2022, 79, 104006.
  15. Başaran, E. A new brain tumor diagnostic model: Selection of textural feature extraction algorithms and convolution neural network features with optimization algorithms. Comput. Biol. Med. 2022, 148, 105857.
  16. Rashno, A.; Shafipour, M.; Fadaei, S. Particle ranking: An Efficient Method for Multi-Objective Particle Swarm Optimization Feature Selection. Knowl.-Based Syst. 2022, 245, 108640.
  17. Nadimi-Shahraki, M.H.; Zamani, H.; Mirjalili, S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput. Biol. Med. 2022, 148, 105858.
  18. Hassan, I.H.; Mohammed, A.; Masama, M.A.; Ali, Y.S.; Abdulrahim, A. An Improved Binary Manta Ray Foraging Optimization Algorithm based feature selection and Random Forest Classifier for Network Intrusion Detection. Intell. Syst. Appl. 2022, 16, 200114.
  19. Eluri, R.K.; Devarakonda, N. Binary Golden Eagle Optimizer with Time-Varying Flight Length for feature selection. Knowl.-Based Syst. 2022, 247, 108771.
  20. Balasubramanian, K.; Ananthamoorthy, N. Correlation-based feature selection using bio-inspired algorithms and optimized KELM classifier for glaucoma diagnosis. Appl. Soft Comput. 2022, 128, 109432.
  21. Long, W.; Xu, M.; Jiao, J.; Wu, T.; Tang, M.; Cai, S. A velocity-based butterfly optimization algorithm for high-dimensional optimization and feature selection. Expert Syst. Appl. 2022, 201, 117217.
  22. Agushaka, J.O.; Ezugwu, A.E.; Abualigah, L. Dwarf mongoose optimization algorithm. Comput. Methods Appl. Mech. Eng. 2022, 391, 114570.
  23. Xing, H.; Ji, Y.; Bai, L.; Sun, Y. An improved quantum-inspired evolutionary algorithm for coding resource optimization based network coding multicast scheme. AEU Int. J. Electron. Commun. 2010, 64, 1105–1113.
  24. Mohammadi, D.; Abd Elaziz, M.; Moghdani, R.; Demir, E.; Mirjalili, S. Quantum Henry gas solubility optimization algorithm for global optimization. Eng. Comput. 2021, 38, 2329–2348.
  25. Chen, R.; Dong, C.; Ye, Y.; Chen, Z.; Liu, Y. QSSA: Quantum evolutionary salp swarm algorithm for mechanical design. IEEE Access 2019, 7, 145582–145595.
  26. SaiToh, A.; Rahimi, R.; Nakahara, M. A quantum genetic algorithm with quantum crossover and mutation operations. Quantum Inf. Process. 2014, 13, 737–755.
  27. Abd Elaziz, M.; Mohammadi, D.; Oliva, D.; Salimifard, K. Quantum marine predators algorithm for addressing multilevel image segmentation. Appl. Soft Comput. 2021, 110, 107598.
  28. Srikanth, K.; Panwar, L.K.; Panigrahi, B.K.; Herrera-Viedma, E.; Sangaiah, A.K.; Wang, G.G. Meta-heuristic framework: Quantum inspired binary grey wolf optimizer for unit commitment problem. Comput. Electr. Eng. 2018, 70, 243–260.
  29. Ibrahim, R.A.; Elaziz, M.A.; Lu, S. Chaotic opposition-based grey-wolf optimization algorithm based on differential evolution and disruption operator for global optimization. Expert Syst. Appl. 2018, 108, 1–27.
  30. Braik, M.S. Chameleon Swarm Algorithm: A bio-inspired optimizer for solving engineering design problems. Expert Syst. Appl. 2021, 174, 114685.
  31. Yilmaz, S.; Sen, S. Electric fish optimization: A new heuristic algorithm inspired by electrolocation. Neural Comput. Appl. 2020, 32, 11543–11578.
  32. Azizi, M. Atomic orbital search: A novel metaheuristic algorithm. Appl. Math. Model. 2021, 93, 657–683.
  33. Abualigah, L.; Diabat, A.; Mirjalili, S.; Abd Elaziz, M.; Gandomi, A.H. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609.
  34. Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158.
  35. Tanabe, R.; Fukunaga, A.S. Improving the search performance of SHADE using linear population size reduction. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 6–11 July 2014; pp. 1658–1665.
  36. Awad, N.H.; Ali, M.Z.; Suganthan, P.N.; Reynolds, R.G. An ensemble sinusoidal parameter adaptation incorporated with L-SHADE for solving CEC2014 benchmark problems. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; pp. 2958–2965.
  37. Mohamed, A.W.; Hadi, A.A.; Fattouh, A.M.; Jambi, K.M. LSHADE with semi-parameter adaptation hybrid with CMA-ES for solving CEC 2017 benchmark problems. In Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 5–8 June 2017; pp. 145–152.
  38. Yousri, D.; Allam, D.; Eteiba, M.; Suganthan, P.N. Chaotic heterogeneous comprehensive learning particle swarm optimizer variants for permanent magnet synchronous motor models parameters estimation. Iran. J. Sci. Technol. Trans. Electr. Eng. 2020, 44, 1299–1318.
  39. Dua, D.; Graff, C. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA, 2017.
  40. Rosenthal, S.; Farra, N.; Nakov, P. SemEval-2017 task 4: Sentiment analysis in Twitter. arXiv 2019, arXiv:1912.00741.
  41. Ahuja, R.; Sharma, S. Sentiment Analysis on Different Domains Using Machine Learning Algorithms. In Advances in Data and Information Sciences; Springer: Berlin/Heidelberg, Germany, 2022; pp. 143–153.
  42. Liu, J.; Singhal, T.; Blessing, L.T.; Wood, K.L.; Lim, K.H. Crisisbert: A robust transformer for crisis classification and contextual crisis embedding. In Proceedings of the 32nd ACM Conference on Hypertext and Social Media, Virtual, 30 August–2 September 2021; pp. 133–141.
Figure 1. The main work flow of the proposed DMOAQ feature selection method.
Figure 2. Average of values of the fitness function.
Figure 3. Average of the standard deviation of the fitness function.
Figure 4. Average of minimum values of the fitness function.
Figure 5. Average of maximum values of the fitness function.
Figure 6. Average of the classification accuracy for UCI datasets.
Figure 7. Average of the fitness function values.
Figure 8. Ratio of the selected features for all datasets.
Figure 9. Average of the classification accuracy for all datasets.
Table 1. Predefined value of Δθ.

x_ij | x_b | f(X_i) ≥ f(X_b) | Δθ
0 | 0 | False | 0
0 | 1 | False | 0.01
1 | 0 | False | −0.01
1 | 1 | False | 0
0 | 0 | True | 0
0 | 1 | True | 0
1 | 0 | True | 0
1 | 1 | True | 0
Table 2. Description of UCI datasets.

DS | Number of Features | Number of Instances | Number of Classes | Data Category
Breastcancer (S1) | 9 | 699 | 2 | Biology
BreastEW (S2) | 30 | 569 | 2 | Biology
CongressEW (S3) | 16 | 435 | 2 | Politics
Exactly (S4) | 13 | 1000 | 2 | Biology
Exactly2 (S5) | 13 | 1000 | 2 | Biology
HeartEW (S6) | 13 | 270 | 2 | Biology
IonosphereEW (S7) | 34 | 351 | 2 | Electromagnetic
KrvskpEW (S8) | 36 | 3196 | 2 | Game
Lymphography (S9) | 18 | 148 | 2 | Biology
M-of-n (S10) | 13 | 1000 | 2 | Biology
PenglungEW (S11) | 325 | 73 | 2 | Biology
SonarEW (S12) | 60 | 208 | 2 | Biology
SpectEW (S13) | 22 | 267 | 2 | Biology
Tic-tac-toe (S14) | 9 | 958 | 2 | Game
Vote (S15) | 16 | 300 | 2 | Politics
WaveformEW (S16) | 40 | 5000 | 3 | Physics
WineEW (S17) | 13 | 178 | 3 | Chemistry
Zoo (S18) | 16 | 101 | 6 | Artificial
Table 3. Average of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0655 | 0.0665 | 0.0627 | 0.0669 | 0.0823 | 0.0710 | 0.0620 | 0.0824 | 0.0833 | 0.1325 | 0.1067 | 0.1325
BreastEW | 0.0619 | 0.0912 | 0.0796 | 0.0680 | 0.0884 | 0.0486 | 0.0569 | 0.0752 | 0.1254 | 0.1996 | 0.1342 | 0.1921
CongressEW | 0.0445 | 0.0635 | 0.0473 | 0.0613 | 0.0989 | 0.0340 | 0.0619 | 0.0191 | 0.0655 | 0.1609 | 0.0575 | 0.1790
Exactly | 0.0462 | 0.0593 | 0.0989 | 0.0842 | 0.0793 | 0.0725 | 0.0650 | 0.1103 | 0.2504 | 0.2942 | 0.2977 | 0.2754
Exactly2 | 0.2543 | 0.2877 | 0.2152 | 0.2809 | 0.3134 | 0.2934 | 0.2467 | 0.2868 | 0.2258 | 0.3748 | 0.2223 | 0.2454
HeartEW | 0.1823 | 0.1992 | 0.1920 | 0.1617 | 0.1662 | 0.1923 | 0.1767 | 0.1733 | 0.2019 | 0.3583 | 0.2519 | 0.2080
IonosphereEW | 0.0663 | 0.0686 | 0.0973 | 0.0594 | 0.1457 | 0.0641 | 0.0591 | 0.0887 | 0.1160 | 0.1660 | 0.1561 | 0.1740
KrvskpEW | 0.0688 | 0.0920 | 0.1024 | 0.0908 | 0.0992 | 0.0793 | 0.0815 | 0.0953 | 0.3904 | 0.4010 | 0.3584 | 0.3168
Lymphography | 0.1514 | 0.1023 | 0.1050 | 0.0929 | 0.2242 | 0.1007 | 0.2072 | 0.1424 | 0.2567 | 0.2667 | 0.2167 | 0.3167
M-of-n | 0.0462 | 0.0535 | 0.0766 | 0.0666 | 0.0808 | 0.0684 | 0.0523 | 0.1181 | 0.2118 | 0.3503 | 0.3197 | 0.2504
PenglungEW | 0.0998 | 0.1021 | 0.0810 | 0.1471 | 0.2006 | 0.0547 | 0.0187 | 0.1308 | 0.3200 | 0.3330 | 0.2474 | 0.2837
SonarEW | 0.0841 | 0.1276 | 0.0840 | 0.0886 | 0.1374 | 0.0823 | 0.0741 | 0.1213 | 0.2833 | 0.4167 | 0.3917 | 0.2381
SpectEW | 0.1611 | 0.1845 | 0.1235 | 0.1661 | 0.2345 | 0.1248 | 0.1342 | 0.2058 | 0.1630 | 0.2731 | 0.1370 | 0.2247
Tic-tac-toe3 | 0.1821 | 0.2307 | 0.2678 | 0.2462 | 0.2420 | 0.2290 | 0.2292 | 0.2276 | 0.2635 | 0.3234 | 0.3208 | 0.2934
Vote | 0.0358 | 0.0752 | 0.1114 | 0.0909 | 0.1043 | 0.0495 | 0.0893 | 0.0353 | 0.0567 | 0.1450 | 0.1142 | 0.1563
WaveformEW | 0.2616 | 0.2788 | 0.2914 | 0.2843 | 0.3115 | 0.2878 | 0.2922 | 0.2969 | 0.3574 | 0.4506 | 0.4381 | 0.3075
WineEW | 0.0385 | 0.0415 | 0.0634 | 0.0477 | 0.0796 | 0.0446 | 0.0577 | 0.0692 | 0.1833 | 0.1819 | 0.1597 | 0.1709
Zoo | 0.0069 | 0.0388 | 0.0481 | 0.0234 | 0.0425 | 0.0413 | 0.0113 | 0.0338 | 0.3333 | 0.2133 | 0.2333 | 0.2063
Table 4. Standard deviation of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0000 | 0.0062 | 0.0088 | 0.0045 | 0.0037 | 0.0075 | 0.0063 | 0.0073 | 0.0000 | 0.0106 | 0.0141 | 0.0000
BreastEW | 0.0055 | 0.0046 | 0.0112 | 0.0117 | 0.0105 | 0.0077 | 0.0086 | 0.0122 | 0.0000 | 0.0192 | 0.0000 | 0.0000
CongressEW | 0.0021 | 0.0044 | 0.0102 | 0.0167 | 0.0138 | 0.0018 | 0.0223 | 0.0056 | 0.0000 | 0.0293 | 0.0000 | 0.0000
Exactly | 0.0000 | 0.0174 | 0.0476 | 0.0564 | 0.0282 | 0.0249 | 0.0172 | 0.0521 | 0.0645 | 0.0000 | 0.0000 | 0.0000
Exactly2 | 0.0021 | 0.0049 | 0.0242 | 0.0128 | 0.0095 | 0.0159 | 0.0212 | 0.0073 | 0.0000 | 0.0148 | 0.0000 | 0.0000
HeartEW | 0.0099 | 0.0122 | 0.0315 | 0.0114 | 0.0340 | 0.0250 | 0.0272 | 0.0143 | 0.0000 | 0.0223 | 0.0340 | 0.0000
IonosphereEW | 0.0108 | 0.0066 | 0.0107 | 0.0137 | 0.0033 | 0.0100 | 0.0150 | 0.0140 | 0.0000 | 0.0236 | 0.0096 | 0.0000
KrvskpEW | 0.0061 | 0.0060 | 0.0107 | 0.0098 | 0.0109 | 0.0107 | 0.0131 | 0.0119 | 0.0054 | 0.0089 | 0.0468 | 0.0000
Lymphography | 0.0092 | 0.0166 | 0.0224 | 0.0226 | 0.0094 | 0.0366 | 0.0603 | 0.0070 | 0.0094 | 0.0236 | 0.0000 | 0.0000
M-of-n | 0.0000 | 0.0039 | 0.0308 | 0.0160 | 0.0208 | 0.0289 | 0.0064 | 0.0565 | 0.0000 | 0.0422 | 0.0087 | 0.0000
PenglungEW | 0.0013 | 0.0023 | 0.0346 | 0.0454 | 0.0007 | 0.0291 | 0.0087 | 0.0399 | 0.0000 | 0.0123 | 0.0852 | 0.0000
SonarEW | 0.0141 | 0.0164 | 0.0190 | 0.0220 | 0.0113 | 0.0196 | 0.0095 | 0.0097 | 0.0000 | 0.0000 | 0.0825 | 0.0000
SpectEW | 0.0124 | 0.0115 | 0.0215 | 0.0290 | 0.0138 | 0.0243 | 0.0342 | 0.0155 | 0.0000 | 0.0144 | 0.0000 | 0.0000
Tic-tac-toe3 | 0.0000 | 0.0054 | 0.0121 | 0.0164 | 0.0119 | 0.0078 | 0.0071 | 0.0084 | 0.0000 | 0.0206 | 0.0000 | 0.0000
Vote | 0.0061 | 0.0052 | 0.0133 | 0.0115 | 0.0106 | 0.0109 | 0.0065 | 0.0014 | 0.0000 | 0.0471 | 0.0318 | 0.0000
WaveformEW | 0.0078 | 0.0075 | 0.0094 | 0.0166 | 0.0077 | 0.0206 | 0.0085 | 0.0080 | 0.0000 | 0.0032 | 0.0129 | 0.0100
WineEW | 0.0050 | 0.0046 | 0.0112 | 0.0116 | 0.0172 | 0.0064 | 0.0176 | 0.0061 | 0.0000 | 0.0137 | 0.0570 | 0.0000
Zoo | 0.0019 | 0.0043 | 0.0081 | 0.0060 | 0.0112 | 0.0095 | 0.0081 | 0.0163 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Table 5. Minimum of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0655 | 0.0608 | 0.0462 | 0.0590 | 0.0766 | 0.0655 | 0.0526 | 0.0766 | 0.0833 | 0.1250 | 0.0967 | 0.1325
BreastEW | 0.0537 | 0.0825 | 0.0616 | 0.0470 | 0.0782 | 0.0391 | 0.0470 | 0.0595 | 0.1254 | 0.1860 | 0.1342 | 0.1921
CongressEW | 0.0435 | 0.0560 | 0.0291 | 0.0476 | 0.0791 | 0.0332 | 0.0394 | 0.0166 | 0.0655 | 0.1402 | 0.0575 | 0.1790
Exactly | 0.0462 | 0.0462 | 0.0538 | 0.0462 | 0.0538 | 0.0462 | 0.0538 | 0.0538 | 0.2048 | 0.2942 | 0.2977 | 0.2754
Exactly2 | 0.2537 | 0.2852 | 0.2057 | 0.2558 | 0.2968 | 0.2693 | 0.2372 | 0.2794 | 0.2258 | 0.3643 | 0.2223 | 0.2454
HeartEW | 0.1782 | 0.1641 | 0.1321 | 0.1462 | 0.1282 | 0.1551 | 0.1385 | 0.1551 | 0.2019 | 0.3426 | 0.2278 | 0.2080
IonosphereEW | 0.0480 | 0.0509 | 0.0781 | 0.0362 | 0.1408 | 0.0527 | 0.0401 | 0.0751 | 0.1160 | 0.1493 | 0.1493 | 0.1740
KrvskpEW | 0.0558 | 0.0795 | 0.0853 | 0.0781 | 0.0864 | 0.0685 | 0.0683 | 0.0758 | 0.3866 | 0.3947 | 0.3254 | 0.3168
Lymphography | 0.1400 | 0.0689 | 0.0800 | 0.0556 | 0.2163 | 0.0744 | 0.1233 | 0.1322 | 0.2500 | 0.2500 | 0.2167 | 0.3167
M-of-n | 0.0462 | 0.0462 | 0.0462 | 0.0462 | 0.0660 | 0.0462 | 0.0462 | 0.0538 | 0.2118 | 0.3205 | 0.3135 | 0.2504
PenglungEW | 0.0975 | 0.0997 | 0.0246 | 0.0714 | 0.1997 | 0.0028 | 0.0080 | 0.0760 | 0.3200 | 0.3242 | 0.1872 | 0.2837
SonarEW | 0.0533 | 0.1062 | 0.0548 | 0.0233 | 0.1262 | 0.0664 | 0.0631 | 0.1057 | 0.2833 | 0.4167 | 0.3333 | 0.2381
SpectEW | 0.1364 | 0.1712 | 0.0848 | 0.1121 | 0.2212 | 0.1045 | 0.1030 | 0.1818 | 0.1630 | 0.2630 | 0.1370 | 0.2247
Tic-tac-toe3 | 0.1821 | 0.2290 | 0.2524 | 0.2243 | 0.2243 | 0.2149 | 0.2260 | 0.2196 | 0.2635 | 0.3089 | 0.3208 | 0.2934
Vote | 0.0275 | 0.0675 | 0.0888 | 0.0700 | 0.0950 | 0.0400 | 0.0825 | 0.0338 | 0.0567 | 0.1117 | 0.0917 | 0.1563
WaveformEW | 0.2484 | 0.2616 | 0.2711 | 0.2562 | 0.3031 | 0.2569 | 0.2835 | 0.2877 | 0.3574 | 0.4483 | 0.4290 | 0.3075
WineEW | 0.0308 | 0.0308 | 0.0462 | 0.0308 | 0.0635 | 0.0385 | 0.0308 | 0.0615 | 0.1833 | 0.1722 | 0.1194 | 0.1709
Zoo | 0.0063 | 0.0313 | 0.0250 | 0.0125 | 0.0250 | 0.0313 | 0.0063 | 0.0188 | 0.3333 | 0.2133 | 0.2333 | 0.2063
Table 6. Maximum of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0655 | 0.0830 | 0.0766 | 0.0719 | 0.0860 | 0.0801 | 0.0684 | 0.0941 | 0.0833 | 0.1400 | 0.1167 | 0.1325
BreastEW | 0.0704 | 0.0982 | 0.1032 | 0.0895 | 0.1049 | 0.0570 | 0.0695 | 0.0907 | 0.1254 | 0.2132 | 0.1342 | 0.1921
CongressEW | 0.0519 | 0.0707 | 0.0685 | 0.1080 | 0.1164 | 0.0373 | 0.0851 | 0.0291 | 0.0655 | 0.1816 | 0.0575 | 0.1790
Exactly | 0.0462 | 0.1232 | 0.1939 | 0.2524 | 0.1142 | 0.0975 | 0.0930 | 0.1894 | 0.2960 | 0.2942 | 0.2977 | 0.2754
Exactly2 | 0.2629 | 0.2974 | 0.3058 | 0.3000 | 0.3199 | 0.3109 | 0.2847 | 0.2961 | 0.2258 | 0.3853 | 0.2223 | 0.2454
HeartEW | 0.2128 | 0.2103 | 0.2385 | 0.1859 | 0.2103 | 0.2128 | 0.2103 | 0.1910 | 0.2019 | 0.3741 | 0.2759 | 0.2080
IonosphereEW | 0.0880 | 0.0774 | 0.1172 | 0.0860 | 0.1496 | 0.0801 | 0.0783 | 0.1093 | 0.1160 | 0.1826 | 0.1629 | 0.1740
KrvskpEW | 0.0781 | 0.1019 | 0.1214 | 0.1130 | 0.1129 | 0.0920 | 0.0976 | 0.1049 | 0.3942 | 0.4073 | 0.3915 | 0.3168
Lymphography | 0.1686 | 0.1267 | 0.1630 | 0.1322 | 0.2389 | 0.1622 | 0.2689 | 0.1489 | 0.2633 | 0.2833 | 0.2167 | 0.3167
M-of-n | 0.0462 | 0.0615 | 0.1804 | 0.1007 | 0.1097 | 0.1187 | 0.0615 | 0.1804 | 0.2118 | 0.3802 | 0.3258 | 0.2504
PenglungEW | 0.1018 | 0.1077 | 0.1492 | 0.2252 | 0.2015 | 0.0695 | 0.0268 | 0.1843 | 0.3200 | 0.3417 | 0.3077 | 0.2837
SonarEW | 0.1112 | 0.1574 | 0.1195 | 0.1157 | 0.1526 | 0.1062 | 0.0881 | 0.1295 | 0.2833 | 0.4167 | 0.4500 | 0.2381
SpectEW | 0.1879 | 0.2136 | 0.1682 | 0.2167 | 0.2561 | 0.1667 | 0.1712 | 0.2212 | 0.1630 | 0.2833 | 0.1370 | 0.2247
Tic-tac-toe3 | 0.1821 | 0.2465 | 0.3057 | 0.2847 | 0.2559 | 0.2325 | 0.2418 | 0.2418 | 0.2635 | 0.3380 | 0.3208 | 0.2934
Vote | 0.0525 | 0.0838 | 0.1363 | 0.1138 | 0.1225 | 0.0675 | 0.1000 | 0.0363 | 0.0567 | 0.1783 | 0.1367 | 0.1563
WaveformEW | 0.2792 | 0.2917 | 0.3093 | 0.3094 | 0.3210 | 0.3128 | 0.3021 | 0.3062 | 0.3574 | 0.4528 | 0.4472 | 0.3075
WineEW | 0.0462 | 0.0462 | 0.0885 | 0.0692 | 0.1019 | 0.0538 | 0.0788 | 0.0769 | 0.1833 | 0.1917 | 0.2000 | 0.1709
Zoo | 0.0125 | 0.0438 | 0.0625 | 0.0375 | 0.0500 | 0.0563 | 0.0250 | 0.0563 | 0.3333 | 0.2133 | 0.2333 | 0.2063
Table 7. Accuracy measure for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.9643 | 0.9332 | 0.9643 | 0.9689 | 0.9629 | 0.9557 | 0.9657 | 0.9529 | 0.9286 | 0.9536 | 0.9429 | 0.9571
BreastEW | 0.9772 | 0.9105 | 0.9491 | 0.9592 | 0.9684 | 0.9719 | 0.9649 | 0.9579 | 0.8684 | 0.8816 | 0.9035 | 0.9825
CongressEW | 0.9672 | 0.9149 | 0.9759 | 0.9552 | 0.9609 | 0.9747 | 0.9632 | 0.9885 | 0.9540 | 0.9368 | 0.9655 | 0.9545
Exactly | 1.0000 | 0.6565 | 0.9615 | 0.9723 | 0.9820 | 0.9810 | 0.9910 | 0.9510 | 0.7375 | 0.6750 | 0.6700 | 0.6800
Exactly2 | 0.7880 | 0.6288 | 0.7750 | 0.7358 | 0.7270 | 0.7390 | 0.7430 | 0.7600 | 0.7250 | 0.6550 | 0.7300 | 0.7400
HeartEW | 0.8611 | 0.6750 | 0.8269 | 0.8694 | 0.8889 | 0.8444 | 0.8704 | 0.8519 | 0.7593 | 0.7500 | 0.7593 | 0.8148
IonosphereEW | 0.9761 | 0.8930 | 0.9211 | 0.9634 | 0.9127 | 0.9549 | 0.9690 | 0.9296 | 0.9296 | 0.9296 | 0.9437 | 0.9167
KrvskpEW | 0.9739 | 0.7465 | 0.9513 | 0.9627 | 0.9713 | 0.9675 | 0.9663 | 0.9275 | 0.5852 | 0.6414 | 0.5594 | 0.6719
Lymphography | 0.8981 | 0.6967 | 0.9340 | 0.9666 | 0.8299 | 0.9400 | 0.8253 | 0.9133 | 0.8000 | 0.8333 | 0.8333 | 0.8667
M-of-n | 1.0000 | 0.7685 | 0.9845 | 0.9935 | 0.9820 | 0.9890 | 1.0000 | 0.9440 | 0.7450 | 0.6900 | 0.7100 | 0.7300
PenglungEW | 0.9333 | 0.9300 | 0.9431 | 0.8567 | 0.8667 | 0.9457 | 1.0000 | 0.8614 | 0.7333 | 0.7386 | 0.8846 | 0.8571
SonarEW | 0.9690 | 0.7810 | 0.9560 | 0.9571 | 0.9381 | 0.9667 | 0.9810 | 0.9000 | 0.6905 | 0.5476 | 0.5357 | 0.8571
SpectEW | 0.8685 | 0.7093 | 0.8954 | 0.8583 | 0.8111 | 0.9148 | 0.8741 | 0.8037 | 0.8148 | 0.8241 | 0.8519 | 0.7778
Tic-tac-toe3 | 0.8594 | 0.6859 | 0.7766 | 0.7938 | 0.8052 | 0.8271 | 0.8219 | 0.8188 | 0.7188 | 0.7760 | 0.8750 | 0.6354
Vote | 0.9883 | 0.8567 | 0.9217 | 0.9525 | 0.9467 | 0.9700 | 0.9300 | 0.9733 | 0.9667 | 0.9833 | 0.9083 | 1.0000
WaveformEW | 0.7648 | 0.6617 | 0.7368 | 0.7432 | 0.7394 | 0.7408 | 0.7520 | 0.7118 | 0.5370 | 0.5230 | 0.5170 | 0.7100
WineEW | 1.0000 | 0.9264 | 0.9847 | 1.0000 | 0.9833 | 1.0000 | 0.9889 | 0.9778 | 0.8333 | 0.9306 | 0.9861 | 0.8889
Zoo | 1.0000 | 0.9573 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6667 | 0.9333 | 0.8333 | 0.9000
Table 8. The number of features and samples in each dataset.

Dataset | Features | Instances | Classes
GPS trajectories | 6 | 163 | 2
UCI-HAR | 9 | 10,299 | 6
Hepatitis | 19 | 155 | 2
STS-Gold | 192 | 2034 | 2
SemEval2017 Task4 (Sem) | 192 | 61,854 | 3
GAS sensors | 119 | 19,438 | 3
MovementAAL (Movm) | 4 | 13,197 | 2
C6 | 192 | 32,462 | 6
Table 9. Average of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.37760 | 0.40823 | 0.39345 | 0.37973 | 0.45201 | 0.42381 | 0.37313 | 0.38422 | 0.42576 | 0.43123
STS-Gold | 0.07208 | 0.10554 | 0.10526 | 0.08274 | 0.14304 | 0.12375 | 0.09615 | 0.07818 | 0.13393 | 0.13223
GPS trajectories | 0.11680 | 0.22487 | 0.20273 | 0.19076 | 0.26536 | 0.22461 | 0.13398 | 0.11680 | 0.24349 | 0.24180
GAS sensors | 0.01448 | 0.04243 | 0.03411 | 0.01949 | 0.08852 | 0.05805 | 0.01687 | 0.01848 | 0.07289 | 0.07731
Hepatitis | 0.08211 | 0.13882 | 0.11423 | 0.10107 | 0.17731 | 0.14367 | 0.08424 | 0.07017 | 0.15407 | 0.15510
Movm | 0.16774 | 0.23036 | 0.23166 | 0.21617 | 0.28009 | 0.23999 | 0.19390 | 0.17386 | 0.26519 | 0.26310
UCI-HAR | 0.09026 | 0.13893 | 0.11633 | 0.09479 | 0.17783 | 0.13704 | 0.10488 | 0.09514 | 0.15574 | 0.15822
C6 | 0.02728 | 0.06670 | 0.04808 | 0.04136 | 0.10818 | 0.07553 | 0.03815 | 0.02985 | 0.08730 | 0.08773
Table 10. Number of selected features obtained from competitive algorithms.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem20131613923213311227172181
STS-Gold2888452316010353110151135
GPS trajectories403191495763045198211462449
GAS sensors15342237937495507380
Hepatitis11711025041842038121215924514731498
Movm97249167132415258118247383328
UCI-HAR58531253133958678100450693718
C6492221137647329861118351350
Table 11. Accuracy obtained from competitive algorithms for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.58816 | 0.59696 | 0.58637 | 0.59313 | 0.58727 | 0.56260 | 0.59004 | 0.57579 | 0.59329 | 0.59069
STS-Gold | 0.93612 | 0.93366 | 0.90909 | 0.92138 | 0.93366 | 0.91155 | 0.92383 | 0.91892 | 0.93857 | 0.93120
GPS trajectories | 0.87037 | 0.79630 | 0.79630 | 0.79630 | 0.79630 | 0.79630 | 0.85185 | 0.87037 | 0.79630 | 0.79630
GAS sensors | 0.98825 | 0.98237 | 0.98120 | 0.98355 | 0.98237 | 0.98237 | 0.98472 | 0.98120 | 0.98237 | 0.98355
Hepatitis | 0.90909 | 0.89610 | 0.89610 | 0.89610 | 0.89610 | 0.89610 | 0.90909 | 0.92208 | 0.89610 | 0.89610
Movm | 0.81731 | 0.79808 | 0.77885 | 0.78846 | 0.77885 | 0.75000 | 0.78846 | 0.80769 | 0.78846 | 0.77885
UCI-HAR | 0.90193 | 0.89684 | 0.89515 | 0.89786 | 0.89481 | 0.89141 | 0.89311 | 0.89718 | 0.89379 | 0.89345
C6 | 0.97104 | 0.96871 | 0.96838 | 0.96871 | 0.97104 | 0.97071 | 0.96937 | 0.96838 | 0.97071 | 0.97004
Table 12. Sensitivity obtained from competitive algorithms for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.54683 | 0.55992 | 0.53248 | 0.55161 | 0.53172 | 0.50453 | 0.55438 | 0.55337 | 0.54859 | 0.53902
STS-Gold | 0.96085 | 0.96085 | 0.95018 | 0.94662 | 0.97153 | 0.95018 | 0.95374 | 0.93950 | 0.96797 | 0.95730
GPS trajectories | 0.82759 | 0.75862 | 0.75862 | 0.75862 | 0.75862 | 0.75862 | 0.86207 | 0.93103 | 0.75862 | 0.75862
GAS sensors | 0.99194 | 0.98387 | 0.98387 | 0.98387 | 0.98790 | 0.98790 | 0.99194 | 0.99194 | 0.98387 | 0.98790
Hepatitis | 0.89474 | 0.86842 | 0.86842 | 0.86842 | 0.86842 | 0.86842 | 0.89474 | 0.89474 | 0.86842 | 0.86842
Movm | 0.70000 | 0.68000 | 0.66000 | 0.66000 | 0.66000 | 0.62000 | 0.68000 | 0.72000 | 0.66000 | 0.66000
UCI-HAR | 0.89113 | 0.89516 | 0.89919 | 0.88508 | 0.89718 | 0.89315 | 0.89516 | 0.88710 | 0.89516 | 0.89315
C6 | 0.97711 | 0.95951 | 0.96303 | 0.96127 | 0.96479 | 0.96479 | 0.96479 | 0.96479 | 0.96303 | 0.96479
Table 13. Specificity obtained from competitive algorithms for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.83638 | 0.84408 | 0.84805 | 0.84216 | 0.84564 | 0.84023 | 0.83482 | 0.84179 | 0.84179 | 0.84709
STS-Gold | 0.88095 | 0.87302 | 0.81746 | 0.86508 | 0.84921 | 0.82540 | 0.85714 | 0.87302 | 0.87302 | 0.87302
GPS trajectories | 0.92000 | 0.84000 | 0.84000 | 0.84000 | 0.84000 | 0.84000 | 0.84000 | 0.80000 | 0.84000 | 0.84000
GAS sensors | 0.99834 | 0.99668 | 0.99668 | 0.99668 | 0.99668 | 0.99668 | 0.99834 | 0.99834 | 0.99668 | 0.99668
Hepatitis | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.94872 | 0.92308 | 0.92308
Movm | 0.92593 | 0.90741 | 0.88889 | 0.90741 | 0.88889 | 0.87037 | 0.88889 | 0.88889 | 0.90741 | 0.88889
UCI-HAR | 0.97919 | 0.97838 | 0.97674 | 0.98164 | 0.97838 | 0.97797 | 0.97878 | 0.98205 | 0.97674 | 0.97797
C6 | 0.98645 | 0.98768 | 0.98810 | 0.98768 | 0.98810 | 0.98851 | 0.98892 | 0.98974 | 0.98851 | 0.98810

