Article

Feature Selection for High Dimensional Datasets Based on Quantum-Based Dwarf Mongoose Optimization

by Mohamed Abd Elaziz 1,2,3,4,*, Ahmed A. Ewees 5, Mohammed A. A. Al-qaness 6, Samah Alshathri 7,* and Rehab Ali Ibrahim 2
1 Faculty of Computer Science & Engineering, Galala University, Suez 435611, Egypt
2 Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
3 Artificial Intelligence Research Center (AIRC), Ajman University, Ajman 346, United Arab Emirates
4 Department of Electrical and Computer Engineering, Lebanese American University, Byblos 13518, Lebanon
5 Department of Computer, Damietta University, Damietta 34517, Egypt
6 College of Physics and Electronic Information Engineering, Zhejiang Normal University, Jinhua 321004, China
7 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4565; https://doi.org/10.3390/math10234565
Submission received: 20 October 2022 / Revised: 22 November 2022 / Accepted: 29 November 2022 / Published: 2 December 2022
(This article belongs to the Section Mathematics and Computer Science)

Abstract

Feature selection (FS) methods play essential roles in different machine learning applications. Several FS methods have been developed; however, those that depend on metaheuristic (MH) algorithms have shown impressive performance in various domains. Thus, in this paper, based on recent advances in MH algorithms, we introduce a new FS technique that improves the performance of the Dwarf Mongoose Optimization (DMO) Algorithm using quantum-based optimization (QBO). The main idea is to utilize QBO as a local search within the traditional DMO to avoid its search limitations. Thus, the developed method, named DMOAQ, benefits from the advantages of both the DMO and QBO. It is tested on well-known benchmark and high-dimensional datasets, with comprehensive comparisons to several optimization methods, including the original DMO. The evaluation outcomes verify that the DMOAQ significantly enhances the search capability of the traditional DMO and outperforms the compared methods in the evaluation experiments.

1. Introduction

Recent advances in meta-heuristic (MH) algorithms have been widely employed in different applications, including feature selection (FS) problems [1]. FS is essential for machine learning-based classification methods. In the domains of data mining and machine learning, the FS process is one of the most crucial preprocessing procedures for evaluating high-dimensional data, since classification accuracy relies heavily on the chosen characteristics of a dataset. The basic goal of feature selection is to improve the performance of algorithms by removing extraneous and irrelevant characteristics from the dataset [2]. It is clear that FS methods play significant roles in enhancing classification accuracy as well as reducing computational costs. In general, FS methods can be adopted in various applications, such as wireless sensing [3], human activity recognition [4], medical applications [5], text classification [6], image classification [7], remote sensing images [8], fault detection and diagnosis [9], intrusion detection systems [10], and other complex engineering problems [11,12,13].
In recent years, MH optimization algorithms have shown significant performance in FS applications. For example, Xu et al. [1] developed a modified version of the grasshopper optimization algorithm (GOA) for FS applications. They used a bare-bones Gaussian strategy and elite opposition-based learning to boost the local and global search mechanisms of the GOA and to balance the exploitation and exploration mechanisms. The improved method, called EGOA, was tested with different numerical functions, and it showed superior performance compared to the original GOA. In [14], three binary versions of the differential evolution (DE), Harris Hawks optimization (HHO), and grey wolf optimization (GWO) algorithms were developed to select features from electroencephalogram (EEG) data for cognitive load detection. They tested the developed methods with several classifiers, including the support vector machine (SVM) and the K-nearest neighbor (KNN). The binary HHO with the KNN classifier achieved the best results. Başaran [15] applied three MH algorithms, namely the genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony, to select features from magnetic resonance imaging (MRI) to classify brain tumors. The three MH methods improved the classification accuracy of the SVM classifier. Rashno et al. [16] proposed an efficient FS method based on a multi-objective PSO algorithm. They used the feature ranks and particle ranks to update the position of particles and their velocity during each iteration. Additionally, 16 datasets were utilized to assess the performance of the improved PSO method, which achieved significant performance compared to 11 existing methods. Nadimi-Shahraki et al. [17] developed a new FS approach using a modified whale optimization algorithm (WOA). They used a pooling technique and three search mechanisms to boost the search capability of the traditional WOA. This method was utilized to improve the classification of COVID-19 medical images. Varzaneh et al. [2] proposed an FS approach using a modified equilibrium optimization algorithm. They used an entropy-based operator to boost the search performance of the traditional equilibrium optimization method. Moreover, they used a binary version of the modified equilibrium optimization method, which was evaluated with eight datasets as an FS method and recorded better results compared to several optimization methods. Hassan et al. [18] suggested a binary version of the manta ray foraging (MRF) algorithm as an FS method, which was applied to enhance the classification of intrusion detection systems. The developed version of the MRF method was applied with the random forest classifier and the adaptive S-shape function, and it was tested with well-known IDS datasets, such as CIC-IDS2017 and NSL-KDD. Eluri and Devarakonda [19] used a binary version of the golden eagle optimizer (GEO) as an FS approach. They utilized a technique called Time Varying Flight Length along with the binary version of the GEO to balance the exploitation and exploration processes of the GEO. It was compared to a number of well-known MH optimization algorithms as well as feature selection methods, and the outcomes showed that the developed GEO achieved competitive performance. Balasubramanian and Ananthamoorthy [20] applied the salp swarm optimizer (SSA) as an FS method to enhance glaucoma diagnosis.
The SSA was utilized with the Kernel-Extreme Learning Machine (KELM) classifier, and it significantly improved the classification accuracy. Long et al. [21] proposed a new FS approach based on an enhanced version of the butterfly optimization algorithm. The modification was inspired by the PSO algorithm to update positions based on the velocity and memory items in the local search stage. It was tested with complex problems and FS benchmark datasets and was compared with different optimization methods.
Utilizing the high performance of MH algorithms in FS applications, in this paper we propose a new FS approach using a modified version of the Dwarf Mongoose Optimization (DMO) algorithm [22]. The DMO was recently developed based on the foraging behavior of dwarf mongooses in nature. Three social groups are used in the algorithm design, called the alpha, babysitter, and scout groups. The DMO has been assessed using different complex optimization and engineering problems, and the outcomes confirmed its competitive performance. Like other MH methods, an individual MH may face some limitations during the search process. The main limitations of the original DMO are its convergence speed and its tendency to become trapped at local optima, which occur when the balance between exploration and exploitation is weak. In general, the exploration phase aims to discover the feasible regions that contain the optimal solutions, whereas the exploitation phase aims to discover the optimal solution inside the feasible region. To solve this issue, in this paper we use a search technique called quantum-based optimization (QBO) [23]. The main goal of the QBO is to provide a better balance between the exploration and exploitation phases to produce efficient solutions. The QBO has already been applied to enhance the performance of several MH techniques, for example, the quantum Henry gas solubility optimization (QHGSO) algorithm [24], the quantum salp swarm algorithm (QSSA) [25], the quantum GA (QGA) [26], and the quantum marine predators algorithm (QMPA) [27].
The proposed method, called DMOAQ, starts by dividing the dataset into training and testing samples. This is followed by setting the initial values for a set of individuals that represent solutions to the tested problem, computing the fitness value of each of them using the training sample, and allocating the best of them. After that, it uses the operators of DMO to update the current solutions. This process of enhancing the solutions is conducted until the stop condition is reached. Then, the best solution is used to evaluate the learned model on the reduced testing set. The developed DMOAQ is tested on different datasets and compared to a number of MH optimization methods to verify its performance.
In short, the main objectives and contributions of this study are listed as follows:
  • To develop a new variant of the Dwarf Mongoose Optimization (DMO) algorithm as a feature selection method.
  • To achieve an optimal balance between exploitation and exploration for the DMO algorithm, we utilized a quantum-based optimization technique. Thus, a new version of the DMO, called DMOAQ, was developed and applied as an FS approach.
  • To evaluate the performance of DMOAQ using a set of different UCI datasets and high dimensional datasets. In addition, we compare it with other well-known FS methods.
The structure of the rest of this study is given as follows: Section 2 introduces the background of the Dwarf Mongoose Optimization Algorithm and quantum-based optimization. Section 3 shows the steps of the developed method. Section 4 presents the experimental results and their discussion. The conclusion and future work are presented in Section 5.

2. Background

2.1. Dwarf Mongoose Optimization Algorithm

The Dwarf Mongoose Optimization (DMO) Algorithm was introduced in [22] as an MH technique that simulates the behavior of the dwarf mongoose in nature while searching for food. In general, DMO contains five phases, beginning with the initialization phase, followed by the alpha group phase, in which the female alpha directs the exploration of new locations. In the scout group phase, new sleeping mounds (i.e., food sources) are explored based on old sleeping mounds; a greater variety of terrain is explored as the foraging intensity increases. The babysitting group phase is only carried out when the timer value exceeds the value of the babysitting exchange parameter. In the termination phase, the algorithm finally comes to a halt. A thorough explanation is given in the following phases.

2.1.1. Phase 1: Initialization

The Dwarf Mongoose Optimization Algorithm begins by creating a matrix of size (n × d) and initializing the population of mongooses (X) as in Equation (1), where the problem dimensions (d) are represented in columns and the population size (n) is displayed in rows.
$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & x_{i,j} & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$ (1)
An element ($x_{i,j}$) is the component of the problem's jth dimension found in the population's ith solution. Typically, its value is determined by using Equation (2) as a uniformly distributed random value that is constrained by the problem's upper ($UB$) and lower ($LB$) limits.
$x_i = \mathrm{unifrnd}(LB, UB, D)$ (2)
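For concreteness, the initialization of Equations (1) and (2) can be sketched in Python/NumPy as follows; the function name and signature are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def init_population(n, d, lb, ub, seed=None):
    """Equations (1)-(2): n uniformly random solutions inside [lb, ub]^d.

    Rows are candidate solutions; columns are problem dimensions."""
    rng = np.random.default_rng(seed)
    return lb + rng.random((n, d)) * (ub - lb)
```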

2.1.2. Phase 2: Alpha Group

The next step in DMO is to compute the fitness value ($fit$) of each solution and then compute its probability; this process is formulated as:
$\alpha = \frac{fit_i}{\sum_{i=1}^{n} fit_i}$ (3)
where the number of mongooses n is updated using the following formula.
$n = n - bs$ (4)
In Equation (4), $bs$ refers to the number of babysitters.
The female alpha of the group uses a distinctive vocalisation ($peep$) to communicate with the others. This is used to coordinate the movement of the group in the large foraging area. So, the DMO uses the following equation to update the value of solution $X_i$.
$X_{i+1} = X_i + phi \times peep$ (5)
where $phi$ is a value produced randomly from the range [−1, 1] at each iteration. In addition, the sleeping mound is updated using Equation (6).
$sm_i = \frac{fit_{i+1} - fit_i}{\max\{|fit_{i+1}|, |fit_i|\}}$ (6)
This is followed by calculating the average sleeping mound ($\varphi$), as in Equation (7):
$\varphi = \frac{\sum_{i=1}^{n} sm_i}{n}$ (7)
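As a rough illustration, the alpha-group computations of Equations (3)–(7) can be sketched as below; `fit_fn` (the problem's objective), the variable names, and the reading of the denominator in Equation (6) as the larger magnitude of the two fitness values are our assumptions.

```python
import numpy as np

def alpha_group_step(X, fit, fit_fn, peep, rng):
    """One alpha-group update following Equations (3)-(7)."""
    n = len(X)
    alpha = fit / fit.sum()              # selection probabilities, Equation (3)
    X_new = np.empty_like(X)
    sm = np.empty(n)
    for i in range(n):
        phi = rng.uniform(-1.0, 1.0)     # random factor in [-1, 1]
        X_new[i] = X[i] + phi * peep     # candidate food position, Equation (5)
        f_new = fit_fn(X_new[i])
        # Sleeping-mound value, Equation (6).
        sm[i] = (f_new - fit[i]) / max(abs(f_new), abs(fit[i]), 1e-12)
    varphi = sm.mean()                   # average sleeping mound, Equation (7)
    return X_new, sm, varphi, alpha
```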

2.1.3. Phase 3: Scouting Group

The fresh candidate places for food or a sleeping mound are scouted during the algorithm's scouting phase, while existing sleeping mounds are ignored in accordance with nomadic customs. Scouting and foraging take place simultaneously, with the scouted areas only being visited after the babysitters' exchange criterion is met. The current location of the sleeping mound, a movement vector ($M$), a stochastic multiplier ($rand$), and a movement regulating parameter ($CF$) are used to steer the scouted places for the food and sleeping mounds, as shown in Equation (8).
$X_{i+1} = \begin{cases} X_i - CF \times phi \times rand \times [X_i - M], & \text{if } \varphi_{i+1} > \varphi_i \\ X_i + CF \times phi \times rand \times [X_i - M], & \text{otherwise} \end{cases}$ (8)
The future movement depends on the performance of the mongoose group, and the new scouted position ($X_{i+1}$) is simulated with both success and failure of improvement of the overall performance in mind.
The movement of the mongoose group ($M$) described in Equation (9) is regulated via the collective-volitive parameter ($CF$), which is computed in Equation (10).
At the start of the search phase, the collective-volitive parameter allows for rapid exploration, but, with each iteration, the focus gradually shifts from discovering new regions to exploiting productive ones.
$M = \sum_{i=1}^{n} \frac{X_i \times sm_i}{X_i}$ (9)
$CF = \left(1 - \frac{iter}{Max_{iter}}\right)^{2 \times \frac{iter}{Max_{iter}}}$ (10)
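A sketch of the scout-group move of Equations (8)–(10), taking the printed form of Equation (9) literally; the small `eps` guard on the division and all names are our additions.

```python
import numpy as np

def scout_step(X, sm, phi_prev, phi_curr, it, max_it, rng):
    """Scout-group move following Equations (8)-(10)."""
    eps = 1e-12
    M = np.sum(X * sm[:, None] / (X + eps), axis=0)  # movement vector, Equation (9)
    CF = (1.0 - it / max_it) ** (2.0 * it / max_it)  # volitive parameter, Equation (10)
    phi = rng.uniform(-1.0, 1.0)
    step = CF * phi * rng.random(X.shape) * (X - M)
    # Equation (8): move away from M when the average mound improved,
    # and toward it otherwise.
    return X - step if phi_curr > phi_prev else X + step
```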

2.1.4. Phase 4: Babysitting Group

The alpha group switches positions with babysitters in the late afternoon and early evening, providing the alpha female an opportunity to look after the colony’s young. The population size affects the ratio of caregivers to foraging mongooses. The following Equation (11) is modeled to imitate the exchange process after midday or in the evening.
$phase = \begin{cases} \text{Scout}, & \text{if } C < L \\ \text{Babysitting}, & \text{if } C \geq L \end{cases}$ (11)
The alpha group remains in the scout phase until the counter ($C$) exceeds the exchange criterion parameter ($L$); at that point, the information collected by the preceding foraging group is re-initialized and the counter is reset to zero. The babysitters' initial weight is set to zero in order to ensure that the average weight of the alpha group is decreased in the following iteration, which promotes exploitation.
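The exchange rule of Equation (11) reduces to a simple comparison; a minimal sketch (the function name is ours):

```python
def next_phase(C, L):
    """Equation (11): keep scouting while the time counter C is below the
    babysitter-exchange criterion L; otherwise switch to babysitting.
    Re-initializing the foraging information and resetting C to zero
    is left to the caller, as described in the text."""
    return "Scout" if C < L else "Babysitting"
```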

2.1.5. Phase 5: Termination

The dwarf mongoose algorithm stops once the maximum number of defined iterations has been reached and returns the best result obtained during execution.

2.2. Quantum-Based Optimization

Within this section, the basic information for quantum-based optimization (QBO) is introduced. In QBO, a binary number represents whether a feature is selected (1) or eliminated (0). Each feature in QBO is represented by a quantum bit (Q-bit, $q$), where $q$ denotes a superposition of the binary values '1' and '0'. The following equation [23] can be used to establish the mathematical formulation of the Q-bit $q$.
$q = \alpha + i\beta = e^{i\theta}, \quad |\alpha|^2 + |\beta|^2 = 1$ (12)
where $\alpha$ and $\beta$ determine the probabilities of the Q-bit being '0' and '1', respectively. The parameter $\theta$ denotes the angle of $q$, and it is updated using $\tan^{-1}(\alpha/\beta)$.
The process of finding the change in the value of $q$ is the main objective of QBO, and it is determined by calculating $\Delta\theta$ as:
$q(t+1) = R(\Delta\theta) \times q(t) = R(\Delta\theta) \times \begin{bmatrix} \alpha(t) \\ \beta(t) \end{bmatrix}$ (13)
$R(\Delta\theta) = \begin{bmatrix} \cos(\Delta\theta) & -\sin(\Delta\theta) \\ \sin(\Delta\theta) & \cos(\Delta\theta) \end{bmatrix}$ (14)
In Equation (14), $\Delta\theta$ is the rotation angle of the ith Q-bit of the jth Q-solution. The value of $\Delta\theta$ is predefined based on $X_b$, as in Table 1, following the experimental tests conducted on the knapsack problems [28].
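Because the Q-bit stores $[\alpha, \beta] = [\cos\theta, \sin\theta]$, applying the rotation gate of Equations (13) and (14) amounts to adding $\Delta\theta$ to the stored angle. A minimal sketch, with the Table 1 lookup encoded as a dictionary (the boolean flag stands for the fitness comparison in Table 1; all names are ours):

```python
# Rotation angles of Table 1, keyed by (x_ij, x_b, f(X_i) >= f(X_b)).
DELTA_THETA = {
    (0, 0, False): 0.0, (0, 1, False): 0.01, (1, 0, False): -0.01, (1, 1, False): 0.0,
    (0, 0, True): 0.0,  (0, 1, True): 0.0,   (1, 0, True): 0.0,    (1, 1, True): 0.0,
}

def rotate_qbit(theta, x_ij, x_b, cond):
    """Apply the rotation gate of Equations (13)-(14) to one Q-bit.

    With [alpha, beta] = [cos(theta), sin(theta)], multiplying by the 2x2
    rotation matrix R(d_theta) is the same as adding d_theta to the angle,
    so only theta needs to be stored and updated."""
    return theta + DELTA_THETA[(x_ij, x_b, cond)]
```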

3. Proposed Method

In order to improve the ability to strike a better balance between the exploration and exploitation of DMO while looking for a workable solution, QBO is used. The training and testing sets of the newly created FS method, DMOAQ, are composed of 70% and 30% of the total data, respectively. Then, using the training samples, the fitness values for each population member are calculated. After that, the best agent, which has the lowest fitness value, is allocated. The solutions are modified using the operators of DMO during the exploitation phase. The updating of each individual continues until the stop criteria are met.
After that, the testing set’s dimension is reduced depending on the best solution, and the implemented DMOAQ as FS is evaluated using a variety of metrics. The DMOAQ (Figure 1) is thoroughly covered in the following sections.

3.1. First Stage

At this point, the N agents representing the population are created. In this study, each solution contains D Q-bits, one for each of the D features. As a result, the solution $X_i$ is formulated as in Equation (15).
$X_i = [q_{i1} \mid q_{i2} \mid \cdots \mid q_{iD}] = [\theta_{i1} \mid \theta_{i2} \mid \cdots \mid \theta_{iD}], \quad i = 1, 2, \ldots, N$ (15)
In this equation, $X_i$ refers to a collection of superpositions of the probabilities that each feature is either selected or not.

3.2. Second Stage

Updating the agents until they meet the stop criteria is the main goal of this stage of the DMOAQ. This is conducted through a number of steps, the first of which is to use Equation (16) to obtain the binary form of each individual $X_i$:
$BX_{i,j} = \begin{cases} 1, & \text{if } rand < \beta^2 \\ 0, & \text{otherwise} \end{cases}$ (16)
where $\beta$ is defined in Equation (12) and $rand \in [0, 1]$ is a random value. The next step is to train the classifier using the training features that correspond to the ones selected in $BX_{i,j}$ and to compute the fitness value, which is defined as:
$Fit_i = \rho \times \gamma + (1 - \rho) \times \frac{|BX_{i,j}|}{D}$ (17)
In Equation (17), $|BX_{i,j}|$ is the total number of selected features, $\gamma$ is the classification error of the classifier (i.e., on the relevant features), and $\rho \in [0, 1]$ is the factor that balances the two parts of the fitness value.
Finding the best agent $X_b$ with the smallest $Fit_b$ is the next process. After that, we apply the DMO operators as described in Equations (4)–(11).
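A sketch of the binarization and fitness evaluation of Equations (16) and (17); the KNN classifier and the value ρ = 0.99 are illustrative assumptions, as this section does not fix them.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def binarize(theta, rng):
    """Equation (16): feature j is selected when rand < beta_j^2, where
    beta_j = sin(theta_j) follows from the Q-bit representation."""
    return (rng.random(theta.shape) < np.sin(theta) ** 2).astype(int)

def subset_fitness(mask, X_train, y_train, X_val, y_val, rho=0.99):
    """Equation (17): rho * gamma + (1 - rho) * |BX| / D, where gamma is
    the classification error on the selected features."""
    if mask.sum() == 0:
        return 1.0                        # penalize an empty feature subset
    cols = mask.astype(bool)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, cols], y_train)
    gamma = 1.0 - clf.score(X_val[:, cols], y_val)
    return rho * gamma + (1.0 - rho) * mask.sum() / mask.size
```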

3.3. Third Stage

At this point, the testing set is reduced by only choosing features that match those in the binary version of X b . The output of the testing set is then predicted using the trained classifier on the testing set’s reduced dimension. The output’s quality is then evaluated using a variety of metrics. Algorithm 1 details the DMOAQ algorithm’s steps.
Algorithm 1 The DMOAQ method.
1: Input: a dataset with D features, as well as the number of solutions (N), iterations (t_max), and DMOAQ parameters.
2: First Stage
3: Divide the data into two sets (i.e., testing and training).
4: Using Equation (15), create the population X.
5: Second Stage
6: Set t = 1.
7: while (t < t_max) do
8:     Using Equation (16), obtain the Boolean form of X_i.
9:     Compute the fitness value of X_i using the training set as in Equation (17).
10:    Allocate the best agent X_b.
11:    Enhance X using Equations (4)–(11).
12:    t = t + 1.
13: Third Stage
14: Reduce the testing set according to the selected features using X_b.
15: Evaluate the quality using different metrics.
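Putting the stages together, a high-level sketch of Algorithm 1 could look as follows; it reuses the `binarize` and `subset_fitness` sketches above and abbreviates the DMO/QBO update to a comment, so it is an outline under our assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def dmoaq(X_data, y, n=20, t_max=50, seed=0):
    """High-level outline of Algorithm 1."""
    rng = np.random.default_rng(seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_data, y, test_size=0.3, random_state=seed)   # 70/30 split
    D = X_data.shape[1]
    theta = rng.uniform(0.0, np.pi / 2, size=(n, D))   # Q-angle population, Equation (15)
    best_mask, best_fit = np.ones(D, dtype=int), np.inf
    for _ in range(t_max):
        for i in range(n):
            mask = binarize(theta[i], rng)                      # Equation (16)
            fit = subset_fitness(mask, X_tr, y_tr, X_tr, y_tr)  # Equation (17)
            if fit < best_fit:
                best_fit, best_mask = fit, mask
        # Update theta here with the DMO operators (Equations (4)-(11)) and
        # the Q-bit rotations of Table 1; omitted in this outline.
    # Third stage: evaluate on the reduced testing set.
    return best_mask, subset_fitness(best_mask, X_tr, y_tr, X_te, y_te)
```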

4. Experimental Setup and Dataset

In this section, the quality of the proposed DMOAQ is computed through two experimental series. The first experiment uses a set of eighteen UCI datasets, whereas the second experiment assesses the performance of DMOAQ using eight high-dimensional datasets collected from different domains.

4.1. Performance Measures

This study uses six performance metrics to evaluate the effectiveness of the developed DMOAQ: the averages of the accuracy, the standard deviation, the number of selected attributes, and the fitness value, as well as the minimum and maximum fitness values. They are defined as follows.
$accuracy = \frac{TP + TN}{TP + FN + FP + TN}$ (18)
where FN, TN, FP, and TP denote false negative, true negative, false positive, and true positive, respectively.
The maximum ($Max$) of the fitness value ($Fit$, defined in Equation (17)) is:
$Max = \max_{1 \leq i \leq N_r} Fit_b^i$ (19)
The minimum ($Min$) of the fitness value is:
$Min = \min_{1 \leq i \leq N_r} Fit_b^i$ (20)
The standard deviation ($Std$) of the fitness value is:
$Std = \sqrt{\frac{1}{N_r} \sum_{i=1}^{N_r} \left(Fit_i - Fit_a\right)^2}$ (21)
where $N_r$ is the number of runs and the average of $Fit$ is given by $Fit_a$.
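Since each algorithm is run $N_r$ times, the reported statistics are simple aggregates of the per-run best fitness values; a small sketch (names ours):

```python
import numpy as np

def summarize_runs(best_fits):
    """Aggregate the best fitness of each of the N_r independent runs into
    the reported Avg, Max, Min, and Std measures (Equations (19)-(21))."""
    f = np.asarray(best_fits)
    return {"Avg": f.mean(), "Max": f.max(), "Min": f.min(), "Std": f.std()}
```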
To validate the performance of the developed DMOAQ, it is compared to other methods, including the traditional DMOA, the grey wolf optimization algorithm (GWO) [29], the Chameleon Swarm Algorithm [30], electric fish optimization (EFO) [31], atomic orbital search (AOS) [32], Arithmetic Optimization (AO) [33], the Reptile Search Algorithm (RSA) [34], LSHADE [35], sinusoidal parameter adaptation incorporated with L-SHADE (LSEpSin) [36], L-SHADE with Semi Parameter Adaptation (LSPA) [37], and the chaotic heterogeneous comprehensive learning particle swarm optimizer (CHCLPSO) [38]. The parameters of each technique are set according to its original implementation, whereas the common parameters t_max and N are set to 50 and 20, respectively. To obtain the average of the performance measures, each algorithm is run 25 times.

4.2. Experimental Series 1: FS Using UCI Datasets

In this section, the proposed DMOAQ is evaluated in selecting the relevant features using eighteen datasets [39]. These datasets were collected from different fields, and they consist of a varying number of instances, features, and classes. The descriptions of these datasets are given in Table 2.
The results of the DMOAQ are compared to eleven algorithms: DMOA, bGWO, Chameleon, EFO, AOS, AO, RSA, LSHADE, LSHADE-cnEpSin (LcnE), LSHADE-SPACMA (LSPA), and CHCLPSO. In this regard, six performance measures are used, namely the average (Avg), maximum (Max), minimum (Min), and standard deviation (Std) of the fitness function values, as well as the accuracy (Acc) and the number of selected features. All results are presented in Table 3, Table 4, Table 5, Table 6 and Table 7.
Table 3 shows the average fitness function values for the DMOAQ and the compared algorithms over all datasets. The DMOAQ achieved the best results in 7 out of 18 datasets (i.e., Exactly, KrvskpEW, M-of-n, Tic-tac-toe3, WaveformEW, WineEW, and Zoo). The AO obtained the second rank by achieving the best fitness function values in 4 datasets (i.e., Breastcancer4, IonosphereEW, PenglungEW, and SonarEW). The Chameleon and bGWO were ranked third and fourth, respectively. The AOS and RSA showed good results and came in the fifth and sixth ranks, whereas the rest of the algorithms were ranked as follows: DMOA, EFO, LSHADE, and LSHADE-SPACMA, respectively. The worst values were obtained by the LSHADE-cnEpSin. Figure 2 illustrates the average fitness function values for all datasets.
Table 4 presents the standard deviation values for the algorithms. The proposed DMOAQ showed acceptable standard deviation values, whereas the CHCLPSO showed the smallest Std values, followed by LSHADE and LSHADE-SPACMA. The worst results were shown by the bGWO, Chameleon, and LSHADE-SPACMA. The rest of the algorithms showed broadly similar results. Figure 3 illustrates the average standard deviation of the fitness functions for all datasets.
Moreover, the minimum fitness function values are recorded in Table 5. This table indicates that the proposed DMOAQ obtained the minimum fitness values in 8 out of 18 datasets; it was ranked first, obtaining the best Min results in Exactly, KrvskpEW, M-of-n, Tic-tac-toe3, Vote, WaveformEW, WineEW, and Zoo. The Chameleon showed the second-best results and obtained the best values in 6 out of 18 datasets. The bGWO, AOS, and AO were ranked third, fourth, and fifth, respectively. The worst performances were shown by LSHADE-cnEpSin, LSHADE, and LSHADE-SPACMA. Figure 4 illustrates the average minimum fitness function values for all datasets.
Table 6 records the worst (maximum) fitness function values for the compared methods. Based on these, the proposed DMOAQ showed good Max values compared to the other methods; it achieved the best values in 44% of all datasets, namely Breastcancer4, Exactly, KrvskpEW, M-of-n, Tic-tac-toe3, WaveformEW, WineEW, and Zoo, whereas it provided competitive results on the rest of the datasets. The DMOA obtained the best results in three datasets, namely IonosphereEW, Lymphography, and WineEW, and was ranked second, followed by AO, AOS, and Chameleon. Although LSHADE-SPACMA obtained the best results in two datasets (Exactly2 and SpectEW), its average value was worse than that of most algorithms. The worst performances were shown by the LSHADE, LSHADE-SPACMA, and CHCLPSO algorithms. Figure 5 illustrates the average maximum fitness function values for all datasets.
In terms of the accuracy measure, Table 7 shows the average classification accuracy for all methods. This table indicates the best performance of the DMOAQ in 44% of all datasets, namely Exactly, Exactly2, IonosphereEW, KrvskpEW, M-of-n, WaveformEW, WineEW, and Zoo, which indicates the good ability of the DMOAQ to correctly classify the datasets. The AOS and AO obtained the second and third ranks, followed by Chameleon and bGWO, respectively. The lowest accuracy results were obtained by the LSHADE and LSHADE-cnEpSin algorithms. Figure 6 illustrates the average classification accuracy for all datasets.

4.3. Experimental Series 2: FS for High Dimensional Datasets

This section evaluates the proposed DMOAQ on a set of high dimensional datasets (https://archive.ics.uci.edu/ml/datasets.php, accessed on 20 October 2022), as described in Table 8. These datasets were gathered from the UCI machine learning repository and additional sources, and they span a variety of uses, such as event detection, sentiment analysis, and sensor-based human activity recognition. Eight datasets in total were acquired, including five from the UCI repository. The GPS trajectories dataset records the GPS coordinates, trajectory identifier, and duration of moving cars and buses in a city. The GAS sensors dataset is a collection of 100 records containing temperature, humidity, and data from 8 MOX gas sensors in a home environment. The MovementAAL (Indoor User Mobility Prediction using RSS) dataset predicts an indoor user's movement using the radio signal strength (RSS) from multiple Wireless Sensor Network (WSN) nodes and user movement patterns. The Hepatitis dataset includes 155 records of hepatitis C patients and is used to determine whether a patient will live or die. The UCI-HAR dataset for human activity recognition includes a total of 30 people performing six different activities, such as walking, sitting, standing, and lying, while carrying a smartphone around the waist. Both the SemEval2017 Task4 [40] dataset and the STS-Gold [41] dataset are English textual datasets for sentiment analysis that were gathered from Twitter, where each tweet is categorized as positive, negative, or neutral. The C6 dataset is an English crisis event detection dataset used to forecast crises, including hurricanes, floods, earthquakes, tornadoes, and wildfires [42].
The experimental results of the DMOAQ method are compared to nine algorithms, namely: DMOA, bGWO, Chameleon, EFO, AOS, AO, RSA, LSHADE, and LSHADE-cnEpSin (LcnE). The performance of the proposed DMOAQ is evaluated using five metrics, namely the average values of the fitness function and the number of selected features, as well as the classification accuracy, sensitivity, and specificity. The results are recorded in Table 9, Table 10, Table 11, Table 12 and Table 13.
The average fitness function values for the high dimensional datasets are recorded in Table 9. From this table, the DMOAQ achieved the best average in 63% of the datasets (i.e., STS-Gold, sensors, Movm, UCI, and C6) and obtained the same result as the RSA on the Trajectory dataset. Based on these results, the DMOAQ method was ranked first. The RSA algorithm was ranked second, obtaining the best average in two datasets (i.e., Trajectory and hepatitis), followed by the AO, which obtained the best result on the Sem dataset. The rest of the methods were ordered as follows: Chameleon, bGWO, DMOA, and AOS, respectively. The worst result was shown by the EFO. Figure 7 illustrates the average fitness function values for all datasets.
Moreover, the numbers of features selected by all methods are recorded in Table 10. In this measure, the best method is the one that can determine the relevant features with the highest accuracy value. As shown in Table 10, the DMOAQ method selected the smallest number of features and is considered the best method; in detail, it obtained the lowest number of features in 7 out of 8 datasets. The Chameleon was the second-best algorithm, followed by AO, bGWO, RSA, and DMOA. Figure 8 shows the ratio of the selected features for all datasets.
Furthermore, the classification accuracy results are presented in Table 11. As shown in this table, the DMOAQ method obtained the highest accuracy in 3 out of 8 datasets, namely the sensors, Movm, and UCI datasets, whereas it obtained the same results as the RSA on Trajectory and the EFO on C6. The second-best algorithm was the RSA. The AO came in the third rank, followed by DMOA, LSHADE, and Chameleon, respectively. The worst accuracy results were recorded by the AOS. Figure 9 illustrates the average classification accuracy for all datasets.
In terms of the sensitivity measure, Table 12 reports the results for all datasets. From Table 12, the proposed DMOAQ showed good sensitivity results in all datasets: it obtained the best sensitivity value on the C6 dataset and the same values as AO and RSA on both the sensors and hepatitis datasets; however, it was ranked second after the RSA algorithm. The AO came in the third rank, followed by DMOA and LSHADE. The AOS was ranked last.
Moreover, Table 13 records the values of the specificity measure. From this table, the proposed DMOAQ obtained the best specificity results in 3 out of 8 datasets and showed the same results as AO and RSA on the sensors dataset. In addition, the DMOAQ obtained the best specificity average over all datasets, equal to 0.9313, whereas the second-best algorithm (i.e., DMOA) obtained a specificity average equal to 0.9188. The third and fourth algorithms were Chameleon and LSHADE. The remaining algorithms showed broadly similar results.
Although the proposed method showed good results in most cases, it failed to reach the optimal values in some datasets because it is sensitive to the initial population; this should be addressed in future studies with different methods, such as chaotic maps.

5. Conclusions and Future Work

In this paper, a new feature selection method, called DMOAQ, has been presented. It improves the performance of the dwarf mongoose optimization (DMO) algorithm using quantum-based optimization (QBO). The main idea of the proposed method is to improve the balance between the exploration and exploitation of the traditional DMO during the search process, using the QBO to avoid the search limitations of the DMO. The performance of the DMOAQ has been evaluated over 18 benchmark feature selection datasets and eight high dimensional datasets, and the results were compared to well-known metaheuristic algorithms. The evaluation outcomes demonstrated that the QBO significantly enhanced the search capability of the traditional DMO: the DMOAQ obtained the best accuracy in 44% of the benchmark datasets and in 62% of the high dimensional datasets. In the future, the proposed method will be evaluated in different applications, such as image segmentation, parameter estimation, and solving real engineering problems. In addition, it can be applied to solve multi-objective optimization problems.

Author Contributions

M.A.E. and R.A.I., conceptualization, supervision, methodology, formal analysis, resources, data curation, and writing—original draft preparation. M.A.A.A.-q., formal analysis, validation, and writing—review and editing. S.A., writing—review and editing, project administration, and funding acquisition. A.A.E., supervision, resources, formal analysis, methodology, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R197), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The data are available upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. Xu, Z.; Heidari, A.A.; Kuang, F.; Khalil, A.; Mafarja, M.; Zhang, S.; Chen, H.; Pan, Z. Enhanced Gaussian Bare-Bones Grasshopper Optimization: Mitigating the Performance Concerns for Feature Selection. Expert Syst. Appl. 2022, 212, 118642.
  2. Varzaneh, Z.A.; Hossein, S.; Mood, S.E.; Javidi, M.M. A new hybrid feature selection based on Improved Equilibrium Optimization. Chemom. Intell. Lab. Syst. 2022, 228, 104618.
  3. Al-qaness, M.A. Device-free human micro-activity recognition method using WiFi signals. Geo-Spat. Inf. Sci. 2019, 22, 128–137.
  4. Dahou, A.; Al-qaness, M.A.; Abd Elaziz, M.; Helmi, A. Human activity recognition in IoHT applications using Arithmetic Optimization Algorithm and deep learning. Measurement 2022, 199, 111445.
  5. Remeseiro, B.; Bolon-Canedo, V. A review of feature selection methods in medical applications. Comput. Biol. Med. 2019, 112, 103375.
  6. Pintas, J.T.; Fernandes, L.A.; Garcia, A.C.B. Feature selection methods for text classification: A systematic literature review. Artif. Intell. Rev. 2021, 54, 6149–6200.
  7. Raj, R.J.S.; Shobana, S.J.; Pustokhina, I.V.; Pustokhin, D.A.; Gupta, D.; Shankar, K. Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 2020, 8, 58006–58017.
  8. AL-Alimi, D.; Al-qaness, M.A.; Cai, Z.; Dahou, A.; Shao, Y.; Issaka, S. Meta-Learner Hybrid Models to Classify Hyperspectral Images. Remote Sens. 2022, 14, 1038.
  9. Onel, M.; Kieslich, C.A.; Guzman, Y.A.; Floudas, C.A.; Pistikopoulos, E.N. Big data approach to batch process monitoring: Simultaneous fault detection and diagnosis using nonlinear support vector machine-based feature selection. Comput. Chem. Eng. 2018, 115, 46–63.
  10. Dahou, A.; Abd Elaziz, M.; Chelloug, S.A.; Awadallah, M.A.; Al-Betar, M.A.; Al-qaness, M.A.; Forestiero, A. Intrusion Detection System for IoT Based on Deep Learning and Modified Reptile Search Algorithm. Comput. Intell. Neurosci. 2022, 2022, 6473507.
  11. Anter, A.M.; Ali, M. Feature selection strategy based on hybrid crow search optimization algorithm integrated with chaos theory and fuzzy c-means algorithm for medical diagnosis problems. Soft Comput. 2020, 24, 1565–1584.
  12. Al-qaness, M.A.; Ewees, A.A.; Fan, H.; AlRassas, A.M.; Abd Elaziz, M. Modified aquila optimizer for forecasting oil production. Geo-Spat. Inf. Sci. 2022, 1–17.
  13. Bashir, S.; Khan, Z.S.; Khan, F.H.; Anjum, A.; Bashir, K. Improving heart disease prediction using feature selection approaches. In Proceedings of the 16th IEEE International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 8–12 January 2019; pp. 619–623.
  14. Yedukondalu, J.; Sharma, L.D. Cognitive load detection using circulant singular spectrum analysis and Binary Harris Hawks Optimization based feature selection. Biomed. Signal Process. Control 2022, 79, 104006.
  15. Başaran, E. A new brain tumor diagnostic model: Selection of textural feature extraction algorithms and convolution neural network features with optimization algorithms. Comput. Biol. Med. 2022, 148, 105857.
  16. Rashno, A.; Shafipour, M.; Fadaei, S. Particle ranking: An Efficient Method for Multi-Objective Particle Swarm Optimization Feature Selection. Knowl.-Based Syst. 2022, 245, 108640.
  17. Nadimi-Shahraki, M.H.; Zamani, H.; Mirjalili, S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput. Biol. Med. 2022, 148, 105858.
  18. Hassan, I.H.; Mohammed, A.; Masama, M.A.; Ali, Y.S.; Abdulrahim, A. An Improved Binary Manta Ray Foraging Optimization Algorithm based feature selection and Random Forest Classifier for Network Intrusion Detection. Intell. Syst. Appl. 2022, 16, 200114.
  19. Eluri, R.K.; Devarakonda, N. Binary Golden Eagle Optimizer with Time-Varying Flight Length for feature selection. Knowl.-Based Syst. 2022, 247, 108771.
  20. Balasubramanian, K.; Ananthamoorthy, N. Correlation-based feature selection using bio-inspired algorithms and optimized KELM classifier for glaucoma diagnosis. Appl. Soft Comput. 2022, 128, 109432.
  21. Long, W.; Xu, M.; Jiao, J.; Wu, T.; Tang, M.; Cai, S. A velocity-based butterfly optimization algorithm for high-dimensional optimization and feature selection. Expert Syst. Appl. 2022, 201, 117217.
  22. Agushaka, J.O.; Ezugwu, A.E.; Abualigah, L. Dwarf mongoose optimization algorithm. Comput. Methods Appl. Mech. Eng. 2022, 391, 114570.
  23. Xing, H.; Ji, Y.; Bai, L.; Sun, Y. An improved quantum-inspired evolutionary algorithm for coding resource optimization based network coding multicast scheme. AEU Int. J. Electron. Commun. 2010, 64, 1105–1113.
  24. Mohammadi, D.; Abd Elaziz, M.; Moghdani, R.; Demir, E.; Mirjalili, S. Quantum Henry gas solubility optimization algorithm for global optimization. Eng. Comput. 2021, 38, 2329–2348.
  25. Chen, R.; Dong, C.; Ye, Y.; Chen, Z.; Liu, Y. QSSA: Quantum evolutionary salp swarm algorithm for mechanical design. IEEE Access 2019, 7, 145582–145595.
  26. SaiToh, A.; Rahimi, R.; Nakahara, M. A quantum genetic algorithm with quantum crossover and mutation operations. Quantum Inf. Process. 2014, 13, 737–755.
  27. Abd Elaziz, M.; Mohammadi, D.; Oliva, D.; Salimifard, K. Quantum marine predators algorithm for addressing multilevel image segmentation. Appl. Soft Comput. 2021, 110, 107598.
  28. Srikanth, K.; Panwar, L.K.; Panigrahi, B.K.; Herrera-Viedma, E.; Sangaiah, A.K.; Wang, G.G. Meta-heuristic framework: Quantum inspired binary grey wolf optimizer for unit commitment problem. Comput. Electr. Eng. 2018, 70, 243–260.
  29. Ibrahim, R.A.; Elaziz, M.A.; Lu, S. Chaotic opposition-based grey-wolf optimization algorithm based on differential evolution and disruption operator for global optimization. Expert Syst. Appl. 2018, 108, 1–27.
  30. Braik, M.S. Chameleon Swarm Algorithm: A bio-inspired optimizer for solving engineering design problems. Expert Syst. Appl. 2021, 174, 114685.
  31. Yilmaz, S.; Sen, S. Electric fish optimization: A new heuristic algorithm inspired by electrolocation. Neural Comput. Appl. 2020, 32, 11543–11578.
  32. Azizi, M. Atomic orbital search: A novel metaheuristic algorithm. Appl. Math. Model. 2021, 93, 657–683.
  33. Abualigah, L.; Diabat, A.; Mirjalili, S.; Abd Elaziz, M.; Gandomi, A.H. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609.
  34. Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158.
  35. Tanabe, R.; Fukunaga, A.S. Improving the search performance of SHADE using linear population size reduction. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 6–11 July 2014; pp. 1658–1665.
  36. Awad, N.H.; Ali, M.Z.; Suganthan, P.N.; Reynolds, R.G. An ensemble sinusoidal parameter adaptation incorporated with L-SHADE for solving CEC2014 benchmark problems. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; pp. 2958–2965.
  37. Mohamed, A.W.; Hadi, A.A.; Fattouh, A.M.; Jambi, K.M. LSHADE with semi-parameter adaptation hybrid with CMA-ES for solving CEC 2017 benchmark problems. In Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 5–8 June 2017; pp. 145–152.
  38. Yousri, D.; Allam, D.; Eteiba, M.; Suganthan, P.N. Chaotic heterogeneous comprehensive learning particle swarm optimizer variants for permanent magnet synchronous motor models parameters estimation. Iran. J. Sci. Technol. Trans. Electr. Eng. 2020, 44, 1299–1318.
  39. Dua, D.; Graff, C. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA, 2017.
  40. Rosenthal, S.; Farra, N.; Nakov, P. SemEval-2017 task 4: Sentiment analysis in Twitter. arXiv 2019, arXiv:1912.00741.
  41. Ahuja, R.; Sharma, S. Sentiment Analysis on Different Domains Using Machine Learning Algorithms. In Advances in Data and Information Sciences; Springer: Berlin/Heidelberg, Germany, 2022; pp. 143–153.
  42. Liu, J.; Singhal, T.; Blessing, L.T.; Wood, K.L.; Lim, K.H. Crisisbert: A robust transformer for crisis classification and contextual crisis embedding. In Proceedings of the 32nd ACM Conference on Hypertext and Social Media, Virtual, 30 August–2 September 2021; pp. 133–141.
Figure 1. The main work flow of the proposed DMOAQ feature selection method.
Figure 2. Average of values of the fitness function.
Figure 3. Average of the standard deviation of the fitness function.
Figure 4. Average of minimum values of the fitness function.
Figure 5. Average of maximum values of the fitness function.
Figure 6. Average of the classification accuracy for UCI datasets.
Figure 7. Average of the fitness function values.
Figure 8. Ratio of the selected features for all datasets.
Figure 9. Average of the classification accuracy for all datasets.
Table 1. Predefined value of Δθ.

x_ij | x_b | f(X_i) ≥ f(X_b) | Δθ
0 | 0 | False | 0
0 | 1 | False | 0.01
1 | 0 | False | −0.01
1 | 1 | False | 0
0 | 0 | True | 0
0 | 1 | True | 0
1 | 0 | True | 0
1 | 1 | True | 0
Table 2. Description of UCI datasets.

DS | Number of Features | Number of Instances | Number of Classes | Data Category
Breastcancer (S1) | 9 | 699 | 2 | Biology
BreastEW (S2) | 30 | 569 | 2 | Biology
CongressEW (S3) | 16 | 435 | 2 | Politics
Exactly (S4) | 13 | 1000 | 2 | Biology
Exactly2 (S5) | 13 | 1000 | 2 | Biology
HeartEW (S6) | 13 | 270 | 2 | Biology
IonosphereEW (S7) | 34 | 351 | 2 | Electromagnetic
KrvskpEW (S8) | 36 | 3196 | 2 | Game
Lymphography (S9) | 18 | 148 | 2 | Biology
M-of-n (S10) | 13 | 1000 | 2 | Biology
PenglungEW (S11) | 325 | 73 | 2 | Biology
SonarEW (S12) | 60 | 208 | 2 | Biology
SpectEW (S13) | 22 | 267 | 2 | Biology
Tic-tac-toe (S14) | 9 | 958 | 2 | Game
Vote (S15) | 16 | 300 | 2 | Politics
WaveformEW (S16) | 40 | 5000 | 3 | Physics
WineEW (S17) | 13 | 178 | 3 | Chemistry
Zoo (S18) | 16 | 101 | 6 | Artificial
Table 3. Average of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0655 | 0.0665 | 0.0627 | 0.0669 | 0.0823 | 0.0710 | 0.0620 | 0.0824 | 0.0833 | 0.1325 | 0.1067 | 0.1325
BreastEW | 0.0619 | 0.0912 | 0.0796 | 0.0680 | 0.0884 | 0.0486 | 0.0569 | 0.0752 | 0.1254 | 0.1996 | 0.1342 | 0.1921
CongressEW | 0.0445 | 0.0635 | 0.0473 | 0.0613 | 0.0989 | 0.0340 | 0.0619 | 0.0191 | 0.0655 | 0.1609 | 0.0575 | 0.1790
Exactly | 0.0462 | 0.0593 | 0.0989 | 0.0842 | 0.0793 | 0.0725 | 0.0650 | 0.1103 | 0.2504 | 0.2942 | 0.2977 | 0.2754
Exactly2 | 0.2543 | 0.2877 | 0.2152 | 0.2809 | 0.3134 | 0.2934 | 0.2467 | 0.2868 | 0.2258 | 0.3748 | 0.2223 | 0.2454
HeartEW | 0.1823 | 0.1992 | 0.1920 | 0.1617 | 0.1662 | 0.1923 | 0.1767 | 0.1733 | 0.2019 | 0.3583 | 0.2519 | 0.2080
IonosphereEW | 0.0663 | 0.0686 | 0.0973 | 0.0594 | 0.1457 | 0.0641 | 0.0591 | 0.0887 | 0.1160 | 0.1660 | 0.1561 | 0.1740
KrvskpEW | 0.0688 | 0.0920 | 0.1024 | 0.0908 | 0.0992 | 0.0793 | 0.0815 | 0.0953 | 0.3904 | 0.4010 | 0.3584 | 0.3168
Lymphography | 0.1514 | 0.1023 | 0.1050 | 0.0929 | 0.2242 | 0.1007 | 0.2072 | 0.1424 | 0.2567 | 0.2667 | 0.2167 | 0.3167
M-of-n | 0.0462 | 0.0535 | 0.0766 | 0.0666 | 0.0808 | 0.0684 | 0.0523 | 0.1181 | 0.2118 | 0.3503 | 0.3197 | 0.2504
PenglungEW | 0.0998 | 0.1021 | 0.0810 | 0.1471 | 0.2006 | 0.0547 | 0.0187 | 0.1308 | 0.3200 | 0.3330 | 0.2474 | 0.2837
SonarEW | 0.0841 | 0.1276 | 0.0840 | 0.0886 | 0.1374 | 0.0823 | 0.0741 | 0.1213 | 0.2833 | 0.4167 | 0.3917 | 0.2381
SpectEW | 0.1611 | 0.1845 | 0.1235 | 0.1661 | 0.2345 | 0.1248 | 0.1342 | 0.2058 | 0.1630 | 0.2731 | 0.1370 | 0.2247
Tic-tac-toe3 | 0.1821 | 0.2307 | 0.2678 | 0.2462 | 0.2420 | 0.2290 | 0.2292 | 0.2276 | 0.2635 | 0.3234 | 0.3208 | 0.2934
Vote | 0.0358 | 0.0752 | 0.1114 | 0.0909 | 0.1043 | 0.0495 | 0.0893 | 0.0353 | 0.0567 | 0.1450 | 0.1142 | 0.1563
WaveformEW | 0.2616 | 0.2788 | 0.2914 | 0.2843 | 0.3115 | 0.2878 | 0.2922 | 0.2969 | 0.3574 | 0.4506 | 0.4381 | 0.3075
WineEW | 0.0385 | 0.0415 | 0.0634 | 0.0477 | 0.0796 | 0.0446 | 0.0577 | 0.0692 | 0.1833 | 0.1819 | 0.1597 | 0.1709
Zoo | 0.0069 | 0.0388 | 0.0481 | 0.0234 | 0.0425 | 0.0413 | 0.0113 | 0.0338 | 0.3333 | 0.2133 | 0.2333 | 0.2063
Table 4. Standard deviation of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0000 | 0.0062 | 0.0088 | 0.0045 | 0.0037 | 0.0075 | 0.0063 | 0.0073 | 0.0000 | 0.0106 | 0.0141 | 0.0000
BreastEW | 0.0055 | 0.0046 | 0.0112 | 0.0117 | 0.0105 | 0.0077 | 0.0086 | 0.0122 | 0.0000 | 0.0192 | 0.0000 | 0.0000
CongressEW | 0.0021 | 0.0044 | 0.0102 | 0.0167 | 0.0138 | 0.0018 | 0.0223 | 0.0056 | 0.0000 | 0.0293 | 0.0000 | 0.0000
Exactly | 0.0000 | 0.0174 | 0.0476 | 0.0564 | 0.0282 | 0.0249 | 0.0172 | 0.0521 | 0.0645 | 0.0000 | 0.0000 | 0.0000
Exactly2 | 0.0021 | 0.0049 | 0.0242 | 0.0128 | 0.0095 | 0.0159 | 0.0212 | 0.0073 | 0.0000 | 0.0148 | 0.0000 | 0.0000
HeartEW | 0.0099 | 0.0122 | 0.0315 | 0.0114 | 0.0340 | 0.0250 | 0.0272 | 0.0143 | 0.0000 | 0.0223 | 0.0340 | 0.0000
IonosphereEW | 0.0108 | 0.0066 | 0.0107 | 0.0137 | 0.0033 | 0.0100 | 0.0150 | 0.0140 | 0.0000 | 0.0236 | 0.0096 | 0.0000
KrvskpEW | 0.0061 | 0.0060 | 0.0107 | 0.0098 | 0.0109 | 0.0107 | 0.0131 | 0.0119 | 0.0054 | 0.0089 | 0.0468 | 0.0000
Lymphography | 0.0092 | 0.0166 | 0.0224 | 0.0226 | 0.0094 | 0.0366 | 0.0603 | 0.0070 | 0.0094 | 0.0236 | 0.0000 | 0.0000
M-of-n | 0.0000 | 0.0039 | 0.0308 | 0.0160 | 0.0208 | 0.0289 | 0.0064 | 0.0565 | 0.0000 | 0.0422 | 0.0087 | 0.0000
PenglungEW | 0.0013 | 0.0023 | 0.0346 | 0.0454 | 0.0007 | 0.0291 | 0.0087 | 0.0399 | 0.0000 | 0.0123 | 0.0852 | 0.0000
SonarEW | 0.0141 | 0.0164 | 0.0190 | 0.0220 | 0.0113 | 0.0196 | 0.0095 | 0.0097 | 0.0000 | 0.0000 | 0.0825 | 0.0000
SpectEW | 0.0124 | 0.0115 | 0.0215 | 0.0290 | 0.0138 | 0.0243 | 0.0342 | 0.0155 | 0.0000 | 0.0144 | 0.0000 | 0.0000
Tic-tac-toe3 | 0.0000 | 0.0054 | 0.0121 | 0.0164 | 0.0119 | 0.0078 | 0.0071 | 0.0084 | 0.0000 | 0.0206 | 0.0000 | 0.0000
Vote | 0.0061 | 0.0052 | 0.0133 | 0.0115 | 0.0106 | 0.0109 | 0.0065 | 0.0014 | 0.0000 | 0.0471 | 0.0318 | 0.0000
WaveformEW | 0.0078 | 0.0075 | 0.0094 | 0.0166 | 0.0077 | 0.0206 | 0.0085 | 0.0080 | 0.0000 | 0.0032 | 0.0129 | 0.0100
WineEW | 0.0050 | 0.0046 | 0.0112 | 0.0116 | 0.0172 | 0.0064 | 0.0176 | 0.0061 | 0.0000 | 0.0137 | 0.0570 | 0.0000
Zoo | 0.0019 | 0.0043 | 0.0081 | 0.0060 | 0.0112 | 0.0095 | 0.0081 | 0.0163 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Table 5. Minimum of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0655 | 0.0608 | 0.0462 | 0.0590 | 0.0766 | 0.0655 | 0.0526 | 0.0766 | 0.0833 | 0.1250 | 0.0967 | 0.1325
BreastEW | 0.0537 | 0.0825 | 0.0616 | 0.0470 | 0.0782 | 0.0391 | 0.0470 | 0.0595 | 0.1254 | 0.1860 | 0.1342 | 0.1921
CongressEW | 0.0435 | 0.0560 | 0.0291 | 0.0476 | 0.0791 | 0.0332 | 0.0394 | 0.0166 | 0.0655 | 0.1402 | 0.0575 | 0.1790
Exactly | 0.0462 | 0.0462 | 0.0538 | 0.0462 | 0.0538 | 0.0462 | 0.0538 | 0.0538 | 0.2048 | 0.2942 | 0.2977 | 0.2754
Exactly2 | 0.2537 | 0.2852 | 0.2057 | 0.2558 | 0.2968 | 0.2693 | 0.2372 | 0.2794 | 0.2258 | 0.3643 | 0.2223 | 0.2454
HeartEW | 0.1782 | 0.1641 | 0.1321 | 0.1462 | 0.1282 | 0.1551 | 0.1385 | 0.1551 | 0.2019 | 0.3426 | 0.2278 | 0.2080
IonosphereEW | 0.0480 | 0.0509 | 0.0781 | 0.0362 | 0.1408 | 0.0527 | 0.0401 | 0.0751 | 0.1160 | 0.1493 | 0.1493 | 0.1740
KrvskpEW | 0.0558 | 0.0795 | 0.0853 | 0.0781 | 0.0864 | 0.0685 | 0.0683 | 0.0758 | 0.3866 | 0.3947 | 0.3254 | 0.3168
Lymphography | 0.1400 | 0.0689 | 0.0800 | 0.0556 | 0.2163 | 0.0744 | 0.1233 | 0.1322 | 0.2500 | 0.2500 | 0.2167 | 0.3167
M-of-n | 0.0462 | 0.0462 | 0.0462 | 0.0462 | 0.0660 | 0.0462 | 0.0462 | 0.0538 | 0.2118 | 0.3205 | 0.3135 | 0.2504
PenglungEW | 0.0975 | 0.0997 | 0.0246 | 0.0714 | 0.1997 | 0.0028 | 0.0080 | 0.0760 | 0.3200 | 0.3242 | 0.1872 | 0.2837
SonarEW | 0.0533 | 0.1062 | 0.0548 | 0.0233 | 0.1262 | 0.0664 | 0.0631 | 0.1057 | 0.2833 | 0.4167 | 0.3333 | 0.2381
SpectEW | 0.1364 | 0.1712 | 0.0848 | 0.1121 | 0.2212 | 0.1045 | 0.1030 | 0.1818 | 0.1630 | 0.2630 | 0.1370 | 0.2247
Tic-tac-toe3 | 0.1821 | 0.2290 | 0.2524 | 0.2243 | 0.2243 | 0.2149 | 0.2260 | 0.2196 | 0.2635 | 0.3089 | 0.3208 | 0.2934
Vote | 0.0275 | 0.0675 | 0.0888 | 0.0700 | 0.0950 | 0.0400 | 0.0825 | 0.0338 | 0.0567 | 0.1117 | 0.0917 | 0.1563
WaveformEW | 0.2484 | 0.2616 | 0.2711 | 0.2562 | 0.3031 | 0.2569 | 0.2835 | 0.2877 | 0.3574 | 0.4483 | 0.4290 | 0.3075
WineEW | 0.0308 | 0.0308 | 0.0462 | 0.0308 | 0.0635 | 0.0385 | 0.0308 | 0.0615 | 0.1833 | 0.1722 | 0.1194 | 0.1709
Zoo | 0.0063 | 0.0313 | 0.0250 | 0.0125 | 0.0250 | 0.0313 | 0.0063 | 0.0188 | 0.3333 | 0.2133 | 0.2333 | 0.2063
Table 6. Maximum of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.0655 | 0.0830 | 0.0766 | 0.0719 | 0.0860 | 0.0801 | 0.0684 | 0.0941 | 0.0833 | 0.1400 | 0.1167 | 0.1325
BreastEW | 0.0704 | 0.0982 | 0.1032 | 0.0895 | 0.1049 | 0.0570 | 0.0695 | 0.0907 | 0.1254 | 0.2132 | 0.1342 | 0.1921
CongressEW | 0.0519 | 0.0707 | 0.0685 | 0.1080 | 0.1164 | 0.0373 | 0.0851 | 0.0291 | 0.0655 | 0.1816 | 0.0575 | 0.1790
Exactly | 0.0462 | 0.1232 | 0.1939 | 0.2524 | 0.1142 | 0.0975 | 0.0930 | 0.1894 | 0.2960 | 0.2942 | 0.2977 | 0.2754
Exactly2 | 0.2629 | 0.2974 | 0.3058 | 0.3000 | 0.3199 | 0.3109 | 0.2847 | 0.2961 | 0.2258 | 0.3853 | 0.2223 | 0.2454
HeartEW | 0.2128 | 0.2103 | 0.2385 | 0.1859 | 0.2103 | 0.2128 | 0.2103 | 0.1910 | 0.2019 | 0.3741 | 0.2759 | 0.2080
IonosphereEW | 0.0880 | 0.0774 | 0.1172 | 0.0860 | 0.1496 | 0.0801 | 0.0783 | 0.1093 | 0.1160 | 0.1826 | 0.1629 | 0.1740
KrvskpEW | 0.0781 | 0.1019 | 0.1214 | 0.1130 | 0.1129 | 0.0920 | 0.0976 | 0.1049 | 0.3942 | 0.4073 | 0.3915 | 0.3168
Lymphography | 0.1686 | 0.1267 | 0.1630 | 0.1322 | 0.2389 | 0.1622 | 0.2689 | 0.1489 | 0.2633 | 0.2833 | 0.2167 | 0.3167
M-of-n | 0.0462 | 0.0615 | 0.1804 | 0.1007 | 0.1097 | 0.1187 | 0.0615 | 0.1804 | 0.2118 | 0.3802 | 0.3258 | 0.2504
PenglungEW | 0.1018 | 0.1077 | 0.1492 | 0.2252 | 0.2015 | 0.0695 | 0.0268 | 0.1843 | 0.3200 | 0.3417 | 0.3077 | 0.2837
SonarEW | 0.1112 | 0.1574 | 0.1195 | 0.1157 | 0.1526 | 0.1062 | 0.0881 | 0.1295 | 0.2833 | 0.4167 | 0.4500 | 0.2381
SpectEW | 0.1879 | 0.2136 | 0.1682 | 0.2167 | 0.2561 | 0.1667 | 0.1712 | 0.2212 | 0.1630 | 0.2833 | 0.1370 | 0.2247
Tic-tac-toe3 | 0.1821 | 0.2465 | 0.3057 | 0.2847 | 0.2559 | 0.2325 | 0.2418 | 0.2418 | 0.2635 | 0.3380 | 0.3208 | 0.2934
Vote | 0.0525 | 0.0838 | 0.1363 | 0.1138 | 0.1225 | 0.0675 | 0.1000 | 0.0363 | 0.0567 | 0.1783 | 0.1367 | 0.1563
WaveformEW | 0.2792 | 0.2917 | 0.3093 | 0.3094 | 0.3210 | 0.3128 | 0.3021 | 0.3062 | 0.3574 | 0.4528 | 0.4472 | 0.3075
WineEW | 0.0462 | 0.0462 | 0.0885 | 0.0692 | 0.1019 | 0.0538 | 0.0788 | 0.0769 | 0.1833 | 0.1917 | 0.2000 | 0.1709
Zoo | 0.0125 | 0.0438 | 0.0625 | 0.0375 | 0.0500 | 0.0563 | 0.0250 | 0.0563 | 0.3333 | 0.2133 | 0.2333 | 0.2063
Table 7. Accuracy measure for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE | LSPA | CHCLPSO
Breastcancer4 | 0.9643 | 0.9332 | 0.9643 | 0.9689 | 0.9629 | 0.9557 | 0.9657 | 0.9529 | 0.9286 | 0.9536 | 0.9429 | 0.9571
BreastEW | 0.9772 | 0.9105 | 0.9491 | 0.9592 | 0.9684 | 0.9719 | 0.9649 | 0.9579 | 0.8684 | 0.8816 | 0.9035 | 0.9825
CongressEW | 0.9672 | 0.9149 | 0.9759 | 0.9552 | 0.9609 | 0.9747 | 0.9632 | 0.9885 | 0.9540 | 0.9368 | 0.9655 | 0.9545
Exactly | 1.0000 | 0.6565 | 0.9615 | 0.9723 | 0.9820 | 0.9810 | 0.9910 | 0.9510 | 0.7375 | 0.6750 | 0.6700 | 0.6800
Exactly2 | 0.7880 | 0.6288 | 0.7750 | 0.7358 | 0.7270 | 0.7390 | 0.7430 | 0.7600 | 0.7250 | 0.6550 | 0.7300 | 0.7400
HeartEW | 0.8611 | 0.6750 | 0.8269 | 0.8694 | 0.8889 | 0.8444 | 0.8704 | 0.8519 | 0.7593 | 0.7500 | 0.7593 | 0.8148
IonosphereEW | 0.9761 | 0.8930 | 0.9211 | 0.9634 | 0.9127 | 0.9549 | 0.9690 | 0.9296 | 0.9296 | 0.9296 | 0.9437 | 0.9167
KrvskpEW | 0.9739 | 0.7465 | 0.9513 | 0.9627 | 0.9713 | 0.9675 | 0.9663 | 0.9275 | 0.5852 | 0.6414 | 0.5594 | 0.6719
Lymphography | 0.8981 | 0.6967 | 0.9340 | 0.9666 | 0.8299 | 0.9400 | 0.8253 | 0.9133 | 0.8000 | 0.8333 | 0.8333 | 0.8667
M-of-n | 1.0000 | 0.7685 | 0.9845 | 0.9935 | 0.9820 | 0.9890 | 1.0000 | 0.9440 | 0.7450 | 0.6900 | 0.7100 | 0.7300
PenglungEW | 0.9333 | 0.9300 | 0.9431 | 0.8567 | 0.8667 | 0.9457 | 1.0000 | 0.8614 | 0.7333 | 0.7386 | 0.8846 | 0.8571
SonarEW | 0.9690 | 0.7810 | 0.9560 | 0.9571 | 0.9381 | 0.9667 | 0.9810 | 0.9000 | 0.6905 | 0.5476 | 0.5357 | 0.8571
SpectEW | 0.8685 | 0.7093 | 0.8954 | 0.8583 | 0.8111 | 0.9148 | 0.8741 | 0.8037 | 0.8148 | 0.8241 | 0.8519 | 0.7778
Tic-tac-toe3 | 0.8594 | 0.6859 | 0.7766 | 0.7938 | 0.8052 | 0.8271 | 0.8219 | 0.8188 | 0.7188 | 0.7760 | 0.8750 | 0.6354
Vote | 0.9883 | 0.8567 | 0.9217 | 0.9525 | 0.9467 | 0.9700 | 0.9300 | 0.9733 | 0.9667 | 0.9833 | 0.9083 | 1.0000
WaveformEW | 0.7648 | 0.6617 | 0.7368 | 0.7432 | 0.7394 | 0.7408 | 0.7520 | 0.7118 | 0.5370 | 0.5230 | 0.5170 | 0.7100
WineEW | 1.0000 | 0.9264 | 0.9847 | 1.0000 | 0.9833 | 1.0000 | 0.9889 | 0.9778 | 0.8333 | 0.9306 | 0.9861 | 0.8889
Zoo | 1.0000 | 0.9573 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6667 | 0.9333 | 0.8333 | 0.9000
Table 8. The number of features and samples in each dataset.

Dataset | Features | Instances | Classes
GPS trajectories | 6 | 163 | 2
UCI-HAR | 9 | 10,299 | 6
Hepatitis | 19 | 155 | 2
STS-Gold | 192 | 2034 | 2
SemEval2017 Task4 (Sem) | 192 | 61,854 | 3
GAS sensors | 119 | 19,438 | 3
MovementAAL (Movm) | 4 | 13,197 | 2
C6 | 192 | 32,462 | 6
Table 9. Average of the fitness function values.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.37760 | 0.40823 | 0.39345 | 0.37973 | 0.45201 | 0.42381 | 0.37313 | 0.38422 | 0.42576 | 0.43123
STS-Gold | 0.07208 | 0.10554 | 0.10526 | 0.08274 | 0.14304 | 0.12375 | 0.09615 | 0.07818 | 0.13393 | 0.13223
GPS trajectories | 0.11680 | 0.22487 | 0.20273 | 0.19076 | 0.26536 | 0.22461 | 0.13398 | 0.11680 | 0.24349 | 0.24180
GAS sensors | 0.01448 | 0.04243 | 0.03411 | 0.01949 | 0.08852 | 0.05805 | 0.01687 | 0.01848 | 0.07289 | 0.07731
Hepatitis | 0.08211 | 0.13882 | 0.11423 | 0.10107 | 0.17731 | 0.14367 | 0.08424 | 0.07017 | 0.15407 | 0.15510
Movm | 0.16774 | 0.23036 | 0.23166 | 0.21617 | 0.28009 | 0.23999 | 0.19390 | 0.17386 | 0.26519 | 0.26310
UCI-HAR | 0.09026 | 0.13893 | 0.11633 | 0.09479 | 0.17783 | 0.13704 | 0.10488 | 0.09514 | 0.15574 | 0.15822
C6 | 0.02728 | 0.06670 | 0.04808 | 0.04136 | 0.10818 | 0.07553 | 0.03815 | 0.02985 | 0.08730 | 0.08773
Table 10. Number of selected features obtained from competitive algorithms.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem20131613923213311227172181
STS-Gold2888452316010353110151135
GPS trajectories403191495763045198211462449
GAS sensors15342237937495507380
Hepatitis11711025041842038121215924514731498
Movm97249167132415258118247383328
UCI-HAR58531253133958678100450693718
C6492221137647329861118351350
Table 11. Accuracy obtained from competitive algorithms for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.58816 | 0.59696 | 0.58637 | 0.59313 | 0.58727 | 0.56260 | 0.59004 | 0.57579 | 0.59329 | 0.59069
STS-Gold | 0.93612 | 0.93366 | 0.90909 | 0.92138 | 0.93366 | 0.91155 | 0.92383 | 0.91892 | 0.93857 | 0.93120
GPS trajectories | 0.87037 | 0.79630 | 0.79630 | 0.79630 | 0.79630 | 0.79630 | 0.85185 | 0.87037 | 0.79630 | 0.79630
GAS sensors | 0.98825 | 0.98237 | 0.98120 | 0.98355 | 0.98237 | 0.98237 | 0.98472 | 0.98120 | 0.98237 | 0.98355
Hepatitis | 0.90909 | 0.89610 | 0.89610 | 0.89610 | 0.89610 | 0.89610 | 0.90909 | 0.92208 | 0.89610 | 0.89610
Movm | 0.81731 | 0.79808 | 0.77885 | 0.78846 | 0.77885 | 0.75000 | 0.78846 | 0.80769 | 0.78846 | 0.77885
UCI-HAR | 0.90193 | 0.89684 | 0.89515 | 0.89786 | 0.89481 | 0.89141 | 0.89311 | 0.89718 | 0.89379 | 0.89345
C6 | 0.97104 | 0.96871 | 0.96838 | 0.96871 | 0.97104 | 0.97071 | 0.96937 | 0.96838 | 0.97071 | 0.97004
Table 12. Sensitivity obtained from competitive algorithms for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.54683 | 0.55992 | 0.53248 | 0.55161 | 0.53172 | 0.50453 | 0.55438 | 0.55337 | 0.54859 | 0.53902
STS-Gold | 0.96085 | 0.96085 | 0.95018 | 0.94662 | 0.97153 | 0.95018 | 0.95374 | 0.93950 | 0.96797 | 0.95730
GPS trajectories | 0.82759 | 0.75862 | 0.75862 | 0.75862 | 0.75862 | 0.75862 | 0.86207 | 0.93103 | 0.75862 | 0.75862
GAS sensors | 0.99194 | 0.98387 | 0.98387 | 0.98387 | 0.98790 | 0.98790 | 0.99194 | 0.99194 | 0.98387 | 0.98790
Hepatitis | 0.89474 | 0.86842 | 0.86842 | 0.86842 | 0.86842 | 0.86842 | 0.89474 | 0.89474 | 0.86842 | 0.86842
Movm | 0.70000 | 0.68000 | 0.66000 | 0.66000 | 0.66000 | 0.62000 | 0.68000 | 0.72000 | 0.66000 | 0.66000
UCI-HAR | 0.89113 | 0.89516 | 0.89919 | 0.88508 | 0.89718 | 0.89315 | 0.89516 | 0.88710 | 0.89516 | 0.89315
C6 | 0.97711 | 0.95951 | 0.96303 | 0.96127 | 0.96479 | 0.96479 | 0.96479 | 0.96479 | 0.96303 | 0.96479
Table 13. Specificity obtained from competitive algorithms for all datasets.

Dataset | DMOAQ | DMOA | bGWO | Chameleon | EFO | AOS | AO | RSA | LSHADE | LcnE
Sem | 0.83638 | 0.84408 | 0.84805 | 0.84216 | 0.84564 | 0.84023 | 0.83482 | 0.84179 | 0.84179 | 0.84709
STS-Gold | 0.88095 | 0.87302 | 0.81746 | 0.86508 | 0.84921 | 0.82540 | 0.85714 | 0.87302 | 0.87302 | 0.87302
GPS trajectories | 0.92000 | 0.84000 | 0.84000 | 0.84000 | 0.84000 | 0.84000 | 0.84000 | 0.80000 | 0.84000 | 0.84000
GAS sensors | 0.99834 | 0.99668 | 0.99668 | 0.99668 | 0.99668 | 0.99668 | 0.99834 | 0.99834 | 0.99668 | 0.99668
Hepatitis | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.92308 | 0.94872 | 0.92308 | 0.92308
Movm | 0.92593 | 0.90741 | 0.88889 | 0.90741 | 0.88889 | 0.87037 | 0.88889 | 0.88889 | 0.90741 | 0.88889
UCI-HAR | 0.97919 | 0.97838 | 0.97674 | 0.98164 | 0.97838 | 0.97797 | 0.97878 | 0.98205 | 0.97674 | 0.97797
C6 | 0.98645 | 0.98768 | 0.98810 | 0.98768 | 0.98810 | 0.98851 | 0.98892 | 0.98974 | 0.98851 | 0.98810

