1. Introduction
Feature selection (attribute reduction) is an effective data preprocessing step in pattern recognition and data mining. Davies and Russell proved that feature selection is an NP-complete problem [1]. It aims to obtain the optimal feature subset from a problem domain. Feature subset selection maintains classification accuracy while deleting irrelevant and redundant features [2,3]. Designing and optimizing feature selection algorithms for different feature sets is the key to solving the feature selection problem [4].
Many scholars have researched feature selection methods. Two main approaches can be distinguished: the filter method and the wrapper method [5]. The filter method is an early feature selection method [6] that selects feature subsets using the training set, independently of the classifier. The wrapper method, by contrast, optimizes directly for a given learner, so its final learning performance is generally better than that of the filter method. To improve the training accuracy of deep learning tasks, these methods need to solve two problems: (i) how to evaluate the relevance of a feature subset, and (ii) how to explore different feature subsets.
In feature selection, a mathematical tool is needed that can deal with uncertain problems and remove noisy features while preserving the meaning of the attributes. Rough set theory (RS), proposed by Pawlak, is an effective tool for determining data dependencies and reducing the number of attributes in a dataset [7,8]. For feature subset evaluation, rough set theory offers an objective way to describe and handle uncertain problems [9]. To apply rough set theory to attribute reduction over heterogeneous data, some researchers have proposed positive-region-based rough set methods. Mi et al. put forward a variable precision rough set reduction model based on upper and lower approximation reduction matrices [10]. Qiu et al. studied the application of the f-rough principle as an evaluation rule [11]. Hu et al. established the k-nearest rough set model to evaluate feature subsets [12]. Furthermore, some researchers have optimized rough set methods by modifying the discernibility relation to handle feature selection on datasets with missing values. Jerzy et al. obtained more approximate informational decision rules by using valued tolerance relations [13]. Qian et al. studied the discernibility matrix of upper and lower approximation reduction and derived a neighborhood rough set method [14]. Yang et al. constructed a multi-granularity rough set selection model for incomplete information systems using tolerance relations [15]. Degang et al. presented a rough set attribute reduction method for covering decision systems under both consistency and inconsistency [16]. Teng et al. used a conditional entropy method to construct an attribute reduction model for incomplete information systems [17]. In this paper, a grey wolf optimizer based on quantum computing and uncertain-symmetry rough sets (QCGWORS) is proposed, which improves positive-region-based uncertain-symmetry rough set theory to evaluate feature subsets in incomplete information systems.
The swarm intelligence optimization algorithm is a parallel, efficient, global search method that simulates the collective behavior of animal groups in nature. These groups of individuals search for prey cooperatively, and each member learns from the experience of the other members. For the search strategy over feature subsets, combinations of rough set theory and the collaborative search of swarm intelligence have been proposed. For instance, a feature selection method combining rough sets with a genetic algorithm improved classification accuracy [18]. Using a firefly algorithm based on rough set theory, Long et al. worked out a spatial cooperative search strategy for feature subsets [19]. Wang, Inbarani, and Bae et al. studied particle swarm optimization feature selection methods for rough sets, effectively solving the preprocessing work in classification tasks [20,21,22]. Chen, Jensen, and Ke et al. used rough set theory to optimize the attribute reduction of ant colony optimization algorithms [23,24,25]. Chen et al. put forward a new fish swarm algorithm that uses rough set theory for effective feature selection [26]. Based on a rough set attribute reduction algorithm, Luan et al. improved the strategy of the fish swarm algorithm [27]. Yamany et al. proposed a new search strategy that effectively integrates rough set theory with flower pollination optimization [28]. Yu et al. integrated accelerators into heuristic algorithms to optimize search strategies and improve the efficiency of feature extraction [29]. The grey wolf optimizer (GWO) is a popular swarm intelligence technique that has received widespread attention; the algorithm was inspired by the predation behavior of grey wolves in nature [30]. This paper searched for feature subsets by exploiting the strong search performance of the GWO algorithm.
To further improve the grey wolf optimization algorithm for feature selection, we drew on quantum computing. Quantum computing refers to the manipulation of quantum systems to process information. The superposition principle of quantum mechanics allows the state of a quantum information unit to be a superposition of multiple possibilities, which gives quantum information processing greater potential than classical information processing [31].
QCGWORS took advantage of the global and local search abilities of the grey wolf optimizer together with uncertain-symmetry rough set theory to evaluate feature subsets. In addition, QCGWORS used quantum computing to enhance the grey wolf optimizer algorithm for feature selection.
In short, researchers have improved both the evaluation schemes and the search strategies for feature subsets. This paper designed a new cooperative swarm intelligence algorithm, QCGWORS, based on quantum computing and rough set theory, which performs attribute reduction in data classification tasks. Each feature subset represented the position of a search individual in the space. The algorithm explored a search space composed of the possible feature subsets of the m input conditional features. Its main goal was to apply parallel quantum computing to feature subset optimization and to evaluate the feature subsets found by the grey wolf optimizer using rough set theory, thereby testing the feasibility of the algorithm.
3. Grey Wolf Optimizer Based on Quantum Computing and Rough Set
For the QCGWORS algorithm to solve attribute reduction, this paper presents it in three main modules. The first module involves quantum computing in feature selection and the initialization of the grey wolf individuals (Section 3.3). The second module (Section 3.4) constructs the binary grey wolf solutions, which are then used to update the search agents of the grey wolf optimizer in the third module (Section 3.5).
The purpose of QCGWORS was to search for the potentially optimal feature subset. First, it initialized a number of quantum grey wolf individuals in the feature subset search space. Then, a group of n binary grey wolves was obtained from the quantum individuals, and these wolves explored the search space by searching for and hunting prey according to Algorithm 3. The grey wolf individuals searched this space to find the three best omega wolf solutions, which updated the positions of the alpha, beta, and delta wolves. Until the maximum number of iterations was reached, the fitness of each individual grey wolf was calculated to determine whether to update the search wolves' positions. Quantum superposition was used to represent the selection probabilities, and a quantum gate was used to extract concrete features by quantum measurement (value 1 means selected, value 0 means rejected). Finally, the dependency concept from rough set theory was used to evaluate the selected features and to determine whether a subset of conditional attributes was an optimal solution. Algorithm 1 describes the QCGWORS process.
Algorithm 1. QCGWORS process
Input: an extended information system F
Output: the optimal feature subset
1. Initialize n quantum grey wolf individuals using (34);
2. Get the group of n binary grey wolves from the quantum individuals;
3. Search the minimal feature subset of each binary wolf by Algorithm 3;
4. Calculate the fitness of the corresponding binary wolf using (26);
5. while (t < maximum iterations)
6.   for i = 1 : q (all q binary grey wolf individuals) do
7.     Evaluate the feature subset;
8.     Update the best feature subset;
9.   end for
10. end while
11. return the optimal feature subset
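To make the control flow concrete, the following is a minimal Python sketch of the outer loop of Algorithm 1. It is an illustrative skeleton under stated assumptions, not the authors' implementation; the names (qcgwors, fitness, n_wolves, max_iters) are invented, and the position update of Algorithm 3 is deliberately omitted.

import numpy as np

def qcgwors(fitness, m, n_wolves=20, max_iters=100, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Initialize n quantum wolves as rotation angles (balanced superposition).
    theta = np.full((n_wolves, m), np.pi / 4)
    best_subset, best_fit = None, -np.inf
    for _ in range(max_iters):
        # Quantum measurement collapses each wolf to a binary feature subset
        # (bit = 1 when r > sin^2(theta), matching Section 3.4).
        r = rng.random((n_wolves, m))
        binary = (r > np.sin(theta) ** 2).astype(int)
        for wolf in binary:
            f = fitness(wolf)          # rough-set fitness, Eq. (26)
            if f > best_fit:
                best_fit, best_subset = f, wolf.copy()
        # The angle update driven by Algorithm 3 would adjust theta here.
    return best_subset, best_fit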
3.1. Rough Set Evaluation Function
The uncertain symmetry is used to remove irrelevant features in advance, reduce the workload of feature selection, and strengthen the credibility of the feature subset evaluation. It can be seen from Equation (3) that if the variables L and N are unrelated, the information gain IG(L|N) = 0; otherwise, IG(L|N) > 0, and the larger IG(L|N) is, the stronger the correlation between L and N. Therefore, IG(L|N) can be used to quantitatively assess the dependency between two variables. However, IG(L|N) is affected by the units and value ranges of the variables, so further normalization is required.
We can see from Equation (4) that the uncertainty symmetry US(L, N) satisfies 0 ≤ US(L, N) ≤ 1; US(L, N) = 0 means that the two random variables L and N are independent of each other, while US(L, N) = 1 means that they are completely correlated. Based on this theory, the relevant features in the dataset are selected and the redundant features are removed, reducing the workload of feature selection and thereby improving classification accuracy.
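As a small sketch of Equations (3) and (4), assuming the common definitions of symmetrical uncertainty (the discrete entropy and joint-entropy helpers below are our own illustrative code, not the authors'):

from collections import Counter
import math

def entropy(xs):
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def info_gain(l, n):
    # IG(L|N) = H(L) + H(N) - H(L, N), i.e., the mutual information
    return entropy(l) + entropy(n) - entropy(list(zip(l, n)))

def uncertainty_symmetry(l, n):
    # US(L, N) = 2 * IG(L|N) / (H(L) + H(N)), normalized into [0, 1]
    denom = entropy(l) + entropy(n)
    return 2 * info_gain(l, n) / denom if denom else 0.0

L = [0, 0, 1, 1, 0, 1]
print(uncertainty_symmetry(L, L[:]))                # 1.0: completely related
print(uncertainty_symmetry(L, [0, 1, 0, 1, 0, 1]))  # near 0: weakly related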
To evaluate the correlation between a conditional feature subset W and the decision feature D in the information system [20], the evaluation function is as follows:

Fitness(O) = α · γ_O(D) + β · (|W| − |O|) / |W|,  α + β = 1

where γ_O(D) is the dependence of the conditional feature subset O with respect to the decision feature D; O is the subset of selected conditional features and satisfies O ⊆ W; |·| denotes the cardinality of a set of conditional features; and α and β weight the influence of the dependency between O and D against the reduction rate of O. This equation shows that classification quality and attribute subset length carry different weights in the attribute reduction task.
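A minimal Python sketch of this evaluation function, assuming the weighted-sum form reconstructed above; the weight value alpha = 0.9 is an illustrative assumption, not the authors' setting:

def fitness(gamma_O, n_selected, n_total, alpha=0.9):
    # fitness = alpha * gamma_O(D) + beta * (|W| - |O|) / |W|, alpha + beta = 1
    beta = 1.0 - alpha
    return alpha * gamma_O + beta * (n_total - n_selected) / n_total

# A subset with dependency 0.95 that keeps 4 of 10 features:
print(fitness(0.95, 4, 10))  # 0.915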
In order to calculate the degree of dependence between each conditional feature subset and the decision feature, QCGWORS follows the equations below. With respect to S1, a subset of the conditional features, the three partitions of indiscernible objects are:
With respect to S2, a subset of the conditional features, the three partitions of indiscernible objects are:
In summary, with respect to the decision feature "class", the two partitions of indiscernible objects are:
The algorithm finds the positive region determined by the indiscernibility relation IND between the features S1, S2 and the decision feature D and, finally, calculates the dependency between each conditional feature subset and the decision feature. The dependencies of the decision feature on S1 and S2 are as follows:
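A toy Python sketch of this dependency computation: partition the objects by indiscernibility on a feature subset, take the positive region with respect to the decision, and compute γ = |POS| / |U|. The tiny dataset below is invented for illustration only.

def partitions(rows, cols):
    # Group object indices into indiscernibility classes on the given columns.
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[c] for c in cols), []).append(i)
    return list(blocks.values())

def dependency(rows, feature_cols, decision_col):
    pos = 0
    for block in partitions(rows, feature_cols):
        labels = {rows[i][decision_col] for i in block}
        if len(labels) == 1:   # consistent block -> inside the positive region
            pos += len(block)
    return pos / len(rows)

# columns: S1, S2, class
U = [(0, 0, 'n'), (0, 1, 'y'), (1, 1, 'y'), (1, 0, 'n'), (0, 0, 'y')]
print(dependency(U, [0], 2))     # gamma for {S1}: 0.0
print(dependency(U, [0, 1], 2))  # gamma for {S1, S2}: 0.6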
3.2. Quantum Representation of Grey Wolf Individual
This paper used the rotation angle [35] to represent a qubit; the rotation angle is shown in Figure 1. A quantum grey wolf individual for feature selection, QGWx (the x-th quantum grey wolf in the quantum group), corresponds to a vector of rotation-angle variables θx1, θx2, ..., θxm, with θxy ∈ [0, π/2] for 1 ≤ y ≤ m. Each quantum grey wolf solution QGWx is a string of qubits, calculated as follows:
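As a compact sketch, assuming the standard rotation-angle encoding in which each qubit stores the amplitude pair (cos θ, sin θ), the qubit string can be written as:

\[
QGW_x = \begin{bmatrix} \cos\theta_{x1} & \cos\theta_{x2} & \cdots & \cos\theta_{xm} \\ \sin\theta_{x1} & \sin\theta_{x2} & \cdots & \sin\theta_{xm} \end{bmatrix}, \qquad \theta_{xy} \in [0, \pi/2].
\]

Under this encoding, cos²θxy + sin²θxy = 1 holds automatically, which is why a single angle suffices to describe each qubit.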
3.3. Quantum Computing in Feature Selection
In feature selection, each feature has two states: selected or not selected (1 or 0). This can be naturally described by a two-dimensional quantum superposition of the states 1 and 0 [31]. What needs to be done is to control the probability of the superposition collapsing to 1 or 0 when measured, so that the feature selection problem can be handled within the framework of quantum computing.
The quantum grey wolf population is represented by qubit sequences of length m, where m is the number of conditional attributes, and each qubit determines the selection probability of the corresponding feature. The individual quantum grey wolf is represented by QGWx, and its rotation angles are θxy. The relevant formula is as follows:
Each quantum grey wolf QGWx is initialized by:
where cos²(θxy) is the probability that conditional feature cy will be selected, and sin²(θxy) is the probability that cy will be rejected.
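A minimal Python sketch of this initialization step, assuming Equation (34) starts every qubit in the balanced superposition θ = π/4 (a common quantum-inspired choice; the authors' exact initialization is not reproduced here):

import numpy as np

def init_quantum_wolves(n_wolves, m_features):
    # theta = pi/4 gives sin^2 = cos^2 = 0.5: each feature is equally
    # likely to be selected or rejected before any search pressure.
    return np.full((n_wolves, m_features), np.pi / 4)

theta = init_quantum_wolves(5, 8)
print(np.cos(theta[0]) ** 2)  # per-feature selection probabilities: all 0.5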
3.4. Quantum Measurement in the Proposed Algorithm
In the quantum measurement operation, the quantum grey wolf solution (QGWx) is used to generate a binary grey wolf solution (BGWx) by projecting its qubits. For each qubit, a random number r is generated from the interval [0, 1]. When r > sin²(θxy), the bit is set to 1 and the corresponding conditional feature is selected; otherwise, the bit is 0 and the corresponding conditional feature is rejected. Because of the superposition state of the qubits, one quantum solution contains many binary solutions [37]. Each qubit determines the probability of selecting or rejecting the corresponding feature, and in the quantum measurement step only specific binary solutions are extracted from the quantum solutions, with the extraction guided by the probabilities of the quantum encoding.
By using quantum measurement, a feasible feature selection solution for the binary grey wolf algorithm is constructed. At first, the algorithm selects no features. For each individual, a conditional feature is selected whenever r > sin²(θxy), and the quantum measurement operation is repeated until all features have been examined. Algorithm 2 below constructs the feasible feature selection solution by observing the qubits of the quantum grey wolf; its process is as follows.
Algorithm 2. Quantum measurement in the proposed algorithm
Input: QGWx: quantum grey wolf individual; C = {c1, ..., cm}: conditional feature set
Output: BGWx: binary grey wolf individual; R: feature subset
1. R ← ∅
2. for each qubit y of QGWx do
3.   generate a real value r in [0, 1];
4.   if r > sin²(θxy) then
5.     BGWxy ← 1;
6.     R ← R ∪ {cy};
7.   else
8.     BGWxy ← 0;
9.   end if
10. end for
11. return BGWx, R
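A direct Python sketch of Algorithm 2, assuming theta is the wolf's vector of rotation angles and features is the conditional feature set C (both names are illustrative):

import numpy as np

def quantum_measurement(theta, features, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    binary, subset = [], []
    for angle, feat in zip(theta, features):
        r = rng.random()                 # r drawn uniformly from [0, 1]
        if r > np.sin(angle) ** 2:       # collapse qubit to 1: select feature
            binary.append(1)
            subset.append(feat)
        else:                            # collapse qubit to 0: reject feature
            binary.append(0)
    return binary, subset

bits, R = quantum_measurement([np.pi / 4] * 4, ['c1', 'c2', 'c3', 'c4'])
print(bits, R)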
3.5. Update Position of Binary Grey Wolves
The binary grey wolf optimization process begins with the initialization of the binary grey wolf individuals in Algorithm 2. During the iterations, the alpha, beta, and delta wolves estimate the possible position of the feature subset (the prey), and each potential solution updates its distance from the prey. The coefficient |A| in [0, 2] makes a wolf diverge from or converge toward the prey: when |A| > 1, the candidate solution tends to move away from the prey; when |A| < 1, the candidate solution moves closer to it. The vector C in [0, 2] plays an important role in avoiding stagnation in local optima: C > 1 emphasizes the prey's influence and helps avoid local optima, while C < 1 randomly weakens that influence. Finally, the grey wolf optimization algorithm terminates when the end criterion is satisfied, yielding the alpha wolf. Algorithm 3 describes this process.
Algorithm 3. Process of updating the binary wolves' position
Input: I: information system
Output: the minimum conditional feature subset
1. Calculate the fitness of each binary grey wolf;
2. Initialize the Alpha, Beta, and Delta search wolves;
3. Initialize parameters a, A, and C;
4. while (t < max iterations)
5.   for each Omega wolf
6.     calculate the fitness function (26) value of the wolf;
7.     for each search wolf
8.       if a search wolf (Alpha, Beta, or Delta) position needs to be replaced
9.         update parameters a, A, and C;
10.        update the current search wolf position;
11.      end if
12.    end for
13.  end for
14. end while
15. return the Alpha wolf, the minimum conditional feature subset
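A compact Python sketch of the canonical grey wolf position update used inside Algorithm 3 (following Mirjalili's GWO equations); the sigmoid binarization at the end is one common choice for feature selection, not necessarily the authors' exact scheme:

import numpy as np

def gwo_update(X, alpha, beta, delta, a, rng):
    # A = 2*a*r - a lies in [-a, a]: |A| > 1 drives divergence from the prey,
    # |A| < 1 drives convergence. C = 2*r lies in [0, 2] and randomly
    # weights each leader's influence.
    def move(leader):
        A = 2 * a * rng.random(X.shape) - a
        C = 2 * rng.random(X.shape)
        D = np.abs(C * leader - X)       # distance to this leader
        return leader - A * D
    X_new = (move(alpha) + move(beta) + move(delta)) / 3.0
    # Binarize via a sigmoid transfer function to keep positions in {0, 1}.
    prob = 1.0 / (1.0 + np.exp(-X_new))
    return (rng.random(X.shape) < prob).astype(int)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1, 6)).astype(float)
leaders = [rng.integers(0, 2, size=6).astype(float) for _ in range(3)]
print(gwo_update(X, *leaders, a=2.0, rng=rng))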