Ensemble and Quick Strategy for Searching Reduct: A Hybrid Mechanism

Abstract: Attribute reduction is commonly regarded as a key topic in rough set research. Concerning the strategies for searching a reduct, although various heuristic-based forward greedy searchings have been developed, most of them were designed to pursue one and only one characteristic related to the performance of the reduct. Nevertheless, it is frequently expected that a justifiable searching should explicitly involve three main characteristics: (1) obtaining the reduct with low time consumption; (2) generating a reduct with high stability; (3) acquiring a reduct with competent classification ability. To fill this gap, a hybrid searching mechanism is designed which takes all of the above characteristics into account. Such a mechanism not only adopts multiple fitness functions to evaluate the candidate attributes, but also queries the distance between attributes to determine whether two or more attributes can be added into the reduct simultaneously. The former may be useful in deriving a reduct with higher stability and competent classification ability, and the latter may contribute to a lower time consumption of deriving the reduct. By comparing with 5 state-of-the-art algorithms for searching reducts, the experimental results over 20 UCI data sets demonstrate the effectiveness of our new mechanism. This study suggests a new trend of attribute reduction for achieving a balance among various characteristics.


Introduction
Attribute reduction [1,2], as a filter-based feature selection technique emerging from rough set theory [3][4][5], plays a crucial role in the field of data dimension reduction. Generally speaking, given a constraint, the purpose of attribute reduction is to obtain an appropriate attribute subset through some specific searching.
In general, if the form of the attribute reduction is fully defined, then how to derive such a qualified reduct is the key issue. Up to now, exhaustion and heuristic-based searchings have been the two frequently used strategies. Though the optimal reduct can be obtained by exhaustion, the time consumption is frequently too high to be accepted, because exhaustion is designed to find all reducts. For this reason, heuristic-based searching [6,7] has received much attention for its low complexity.
As a poster child of heuristic searching, forward greedy searching [8] is effective. However, some limitations can also be observed in it. On the one hand, the elapsed time of obtaining the reduct may grow with the dramatically increasing volume of data [9]. For instance, when facing high-dimensional data [10,11], one and only one attribute is selected in each iteration of forward greedy searching, and so the number of iterations may be large. On the other hand, in most forward greedy searchings, each candidate attribute is evaluated by one and only one fitness function, i.e., a single measure with respect to the form of attribute reduction is calculated for each candidate attribute. Obviously, such a device is only a single-view [12,13] evaluation, and it may then fail to meet the stability requirement of attribute selection.
By considering what has been discussed above together with the learning task, it is not difficult to see that a reasonable algorithm for deriving a reduct should be equipped with the following important characteristics.
(1) Low time consumption of deriving the reduct. This is the first perspective which should be considered in designing an algorithm, especially when large-scale and high-dimensional data appear. (2) High stability of the derived reduct. A reduct with low stability is susceptible to data perturbation, and may then be unsuitable for further data processing. (3) Competent classification of the derived reduct. Attribute reduction can be regarded as an important step of data pre-processing, and it is thus expected that the obtained reduct will offer competent performance when a classification task is explored.
Presently, to the best of our knowledge, most of the previous approaches for searching reducts mainly focus on one and only one of the above characteristics. For example, Chen et al. [14] have proposed an attribute group approach for calculating the reduct based on the consideration of the relationships among attributes. Such an approach consists of two main phases: (1) the raw attributes are divided into different groups; (2) in the process of searching the reduct, only the attributes in the groups containing at least one attribute of the potential reduct should be evaluated. From this point of view, such a process can reduce the number of evaluations of candidate attributes, and it follows that the elapsed time of deriving the reduct may be decreased. Though Chen et al.'s attribute group has achieved success in lowering the time consumption of deriving the reduct, it may not be suitable for generating a reduct with high stability. This is mainly because: (1) in such an approach, each candidate attribute is still evaluated by one and only one fitness function [13,15], which ignores the distribution of the samples; (2) the groups of attributes strongly depend on the process of K-means, which will result in some degree of randomness when adding appropriate attributes into the potential reduct.
To overcome the above limitations, a new hybrid mechanism will be developed in this paper, where multiple characteristics are considered simultaneously. Firstly, to obtain a reduct with high stability, the ensemble selector [13,16] will be introduced into our approach, in which each attribute can be fully evaluated with respect to multiple fitness functions. Secondly, it is worth noting that the usage of the ensemble selector implies higher time consumption. Therefore, the dissimilarity relationship among attributes, obtained by using the distance between attributes, will be further employed, by which multiple different attributes can be selected and added into the potential reduct in each iteration of deriving the reduct. This is the core for effectively reducing the time consumption. In addition, following the researches shown in Refs. [13,17], it can be observed that the reducts obtained by both the ensemble selector and the dissimilarity strategy are frequently equipped with competent generalization performance. For this reason, our hybrid mechanism is also expected to have justifiable classification ability. The specific details of our mechanism are shown in the following Figure 1.
In Figure 1, (1) each candidate attribute will be evaluated from different perspectives by using multiple fitness functions; (2) an appropriate attribute can be obtained by adopting the mechanism of the ensemble selector based on the results of the attribute evaluations; (3) one or more attributes which bear a striking dissimilarity to the attribute obtained in (2) will also be selected; (4) more than one attribute can be added into the potential reduct simultaneously. The main contributions of this research can be summarized in the following aspects: (1) observing that most of the state-of-the-art approaches are designed to pursue one and only one characteristic which is closely related to the performance of the reduct, a hybrid searching mechanism is proposed to make a trade-off between the stability of the derived reduct and the elapsed time of searching the reduct; (2) though the neighborhood rough set based reduct is computed by using the hybrid searching mechanism in the context of this paper, it is worth pointing out that our hybrid searching mechanism is independent of the rough set model and can therefore be applied to any other form of attribute reduction.
The remainder of this paper is organized as follows. In Section 2, we will review the basic notions related to attribute reduction and some used measures. A new hybrid mechanism for searching reduct will be presented in Section 3. Comparative experimental results and the corresponding analyses will be shown in Section 4. This paper will come to an end with conclusions and future perspectives in Section 5.

Attribute Reduction
Presently, a variety of definitions of attribute reduction [18,19] have been proposed with respect to different requirements [20,21]. By extracting the commonness of those definitions, Yao et al. [22] have proposed a general form, shown in the following Definition 1.

Definition 1. Given a decision system DS = <U, AT, D>, U is a nonempty finite set of samples, AT is a nonempty finite set of raw attributes and D is a decision attribute. Supposing that the ρ-constraint is a constraint based on a considered measure ρ such that ρ : P(AT) → R (P(AT) is the power set of AT, R is the set of all real numbers), then ∀A ⊆ AT, A is referred to as a reduct if and only if: (1) A satisfies the ρ-constraint; (2) ∀B ⊂ A, B does not satisfy the ρ-constraint.
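The two conditions of Definition 1 (the constraint holds, and no proper subset satisfies it) can be checked mechanically. The sketch below is a brute-force illustration with assumed names; `rho_holds` stands in for any concrete ρ-constraint test.

```python
from itertools import combinations

def is_reduct(A, rho_holds):
    """Check Definition 1: A satisfies the rho-constraint and no
    proper subset of A satisfies it."""
    if not rho_holds(A):
        return False
    # Minimality: every proper subset (all sizes below |A|) must fail.
    return all(not rho_holds(set(B))
               for r in range(len(A))
               for B in combinations(A, r))
```

For instance, if the constraint only requires the subset to cover a "core" pair of attributes {1, 3}, then {1, 3} is a reduct while {1, 2, 3} is not, since its proper subset {1, 3} already satisfies the constraint.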
Following Definition 1, the open problem is how to obtain a qualified reduct. As one of the widely used heuristic algorithms, forward greedy searching [8,23] has been favored by many researchers. The details of such a strategy are shown in the following Algorithm 1.
In Algorithm 1, the fitness value can be obtained by a fitness function φ : AT → R; it follows that the importance of each attribute can be quantitatively characterized. It must be noticed that the form of the fitness function is closely related to the measure ρ used in the given constraint. For example, if the constraint is required to preserve the measure of approximation quality, then φ(a) can be regarded as the variation of the approximation quality [24] when a is added into the pool set.
It is not difficult to reveal that the process of Algorithm 1 contains two main phases: the first phase adds qualified attributes into the potential reduct; the second phase removes redundant attributes from the potential reduct. Obviously, this process fits the two requirements shown in Definition 1. The time complexity of Algorithm 1 is O(|U|^2 × |AT|^2), where |U| and |AT| denote the numbers of samples and raw attributes, respectively.

Algorithm 1. Forward Greedy Searching (FGS)
Input: Decision system DS, ρ-constraint and fitness function φ.
Output: One reduct A.
Step 1. Calculate the measure-value ρ(AT) over the raw attribute set AT;
Step 2. A = ∅;
Step 3. Do
(1) Evaluate each candidate attribute a ∈ AT − A by calculating φ(a);
(2) Select a qualified attribute b ∈ AT − A with the justifiable evaluation;
(3) A = A ∪ {b};
(4) Calculate ρ(A);
Until the ρ-constraint is satisfied;
Step 4. Remove redundant attributes from A such that the ρ-constraint is still satisfied;
Step 5. Return A.
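The two phases of FGS (greedily add the best-scoring attribute until the ρ-constraint holds, then strip redundancy) can be sketched as follows. `fitness` and `rho` are placeholders for a concrete fitness function and constraint measure, so this is an illustrative sketch rather than the paper's exact implementation.

```python
def forward_greedy_search(attrs, fitness, rho, rho_target):
    """Sketch of Algorithm 1 (FGS)."""
    reduct = set()
    candidates = set(attrs)
    # Phase 1: add the highest-scoring candidate one at a time.
    while candidates and rho(reduct) < rho_target:
        best = max(candidates, key=lambda a: fitness(reduct, a))
        reduct.add(best)
        candidates.remove(best)
    # Phase 2: remove attributes that are redundant for the constraint.
    for a in sorted(reduct):
        if rho(reduct - {a}) >= rho_target:
            reduct.remove(a)
    return reduct
```

With a toy constraint whose positive region is fully determined by attributes {1, 3}, the search adds exactly those two attributes and the redundancy pass removes nothing.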

Stability Measure
Generally speaking, the stability of a reduct can be regarded as the sensitivity of the attribute preferences when an algorithm is run on different training sets drawn from the same generating distribution. Therefore, the stability of a reduct can be quantified as the degree to which the reducts change if sample disturbance happens.
To quantitatively characterize the concept of stability, a series of measures [25][26][27][28] have been proposed. Furthermore, to make the comparisons among different measures more reasonable, Nogueira et al. [28] suggested five desirable properties that a stability measure should possess: (1) fully defined; (2) strict monotonicity; (3) bounds; (4) maximum stability; (5) correction for chance. With a critical review of the previous stability measures, it is not difficult to observe that only the measures designed by Akashata et al. [27] and Nogueira et al. [28] fully possess the above five properties. These two measures are shown in the following Definitions 2 and 3.

Definition 2.
Given a set of reducts Z = {A_1, A_2, ..., A_M} and supposing that AT is a raw attribute set, the stability measure proposed by Akashata with respect to Z is defined as follows, where I_ij, E_ij, µ_ij and M_ij represent the intersection, expected intersection, minimum intersection and maximum intersection of attributes with respect to A_i and A_j, respectively. If |A_i| = |A_j|, then α = 1 and β = 0; otherwise, α = 0 and β = 1.

Definition 3.
Given a set of reducts Z = {A_1, A_2, ..., A_M}, supposing that AT is a raw attribute set and k is the mean value of the numbers of attributes in the reducts, the stability measure proposed by Nogueira with respect to Z is defined as:

Φ(Z) = 1 − ((1/|AT|) ∑_{a ∈ AT} s_a^2) / ((k/|AT|) · (1 − k/|AT|)),

where s_a^2 is the unbiased sample variance of the selection of attribute a over the M reducts.
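Nogueira's measure can be computed directly from a binary selection matrix (one row per reduct, one column per raw attribute). The sketch below follows the published formula; the variable names are ours, not the paper's.

```python
def nogueira_stability(Z):
    """Nogueira et al.'s stability: 1 minus the averaged unbiased
    per-attribute selection variance, normalised by the variance of a
    random selector picking k_bar attributes out of |AT|."""
    M = len(Z)     # number of reducts
    d = len(Z[0])  # number of raw attributes |AT|
    p = [sum(row[f] for row in Z) / M for f in range(d)]  # selection frequencies
    s2 = [M / (M - 1) * pf * (1 - pf) for pf in p]        # unbiased variances
    k_bar = sum(sum(row) for row in Z) / M                # mean reduct size
    return 1 - (sum(s2) / d) / ((k_bar / d) * (1 - k_bar / d))
```

Identical reducts have zero per-attribute variance, so the measure attains its maximum of 1; completely disjoint reducts drive it toward (or below) 0.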
Following the above discussions, Akashata's measure shown in Definition 2 is based on the similarity over reducts, while Nogueira's measure shown in Definition 3 is based on the frequency over attributes. It is not difficult to reveal that both of them take advantage of the differences among reducts for obtaining the quantified value. The former pays much attention to the overall differences between two different reducts, while the latter focuses on the difference of each attribute among multiple reducts.

Dissimilarity for Attribute Reduction
Through FGS, we can observe that one and only one appropriate attribute will be selected and added into the potential reduct in each iteration of evaluating candidate attributes. Therefore, if the number of attributes is large, then the elapsed time of deriving the reduct may still be unacceptable. For this reason, the strategy of searching the reduct by considering the dissimilarity between attributes has been proposed by Rao et al. [17], which can simultaneously add more than one attribute into the potential reduct in each iteration of evaluating candidate attributes. The details are shown in the following Algorithm 2.

Algorithm 2. Dissimilarity for Attribute Reduction (DAR)
Input: Decision system DS, ρ-constraint, fitness function φ and number of attributes in one combination t. Output: One reduct A.
Step 1. Calculate the measure-value ρ(AT) over the raw attribute set AT;
Step 2. Calculate the dissimilarities between attributes such that Ψ = {∆(a, b) : ∀a, b ∈ AT}; // ∆(a, b) denotes the distance between attributes a and b
Step 3. A = ∅;
Step 4. Do
(1) Evaluate each candidate attribute a ∈ AT − A by calculating φ(a);
(2) Select a qualified attribute b ∈ AT − A with the justifiable evaluation;
(3) By Ψ, derive Ψ_b = {∆(b, a) : ∀a ∈ AT − A − {b}};
(4) By Ψ_b, derive an attribute subset B with t − 1 attributes which bear the striking dissimilarity to b; // Selection of a combination of t attributes
(5) A = A ∪ {b} ∪ B;
(6) Calculate ρ(A);
Until the ρ-constraint is satisfied;
Step 5. Remove redundant attributes from A such that the ρ-constraint is still satisfied;
Step 6. Return A.
Because multiple attributes can be added into the potential reduct in each iteration, the complexity of Algorithm 2 is lower than that of Algorithm 1. However, for the reason that more than one attribute is selected during each iteration, Algorithm 2 may derive a reduct with lower stability.
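The batch-selection step of DAR, picking one qualified attribute b and then the t − 1 candidates most dissimilar to it, can be sketched as follows. `fitness` and `distance` are placeholders for whatever fitness function and attribute-wise metric Ψ was built from.

```python
def select_batch(candidates, fitness, distance, t):
    """One DAR iteration: the best-scoring attribute b plus the
    t - 1 candidates bearing the greatest dissimilarity to b."""
    best = max(candidates, key=fitness)
    rest = sorted((a for a in candidates if a != best),
                  key=lambda a: distance(best, a), reverse=True)
    return {best, *rest[:t - 1]}
```

Selecting t attributes per iteration divides the number of iterations by roughly t, which is where the lower time complexity of DAR comes from.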

Ensemble Selector for Attribute Reduction
As has been shown in Section 2.1, the fitness function is actually used to evaluate the importance of the candidate attributes. However, it must be pointed out that a single fitness function cannot characterize the importance of the candidate attributes from multiple views. Furthermore, the use of only one fitness function does not take the distribution of the samples into account, which will lead to instability of the derived reduct. To fill such a gap, Yang and Yao [13] have proposed the ensemble selector for attribute reduction, which employs multiple fitness functions for evaluating the candidate attributes. Immediately, a voting mechanism can be used for selecting the appropriate attribute. The detailed process is shown in the following Algorithm 3.
Algorithm 3. Ensemble Selector for Attribute Reduction (ESAR)
Input: Decision system DS, ρ-constraint and fitness functions φ_1, φ_2, ..., φ_s.
Output: One reduct A.
Step 1. Calculate the measure-value ρ(AT) over the raw attribute set AT;
Step 2. A = ∅;
Step 3. Do
(1) For each fitness function φ_i, evaluate each candidate attribute a ∈ AT − A by calculating φ_i(a), and select a qualified attribute b_i ∈ AT − A with the justifiable evaluation, yielding T = {b_1, b_2, ..., b_s};
(2) Select an attribute b ∈ T with the maximal frequency of occurrences; // Ensemble selector mechanism
(3) A = A ∪ {b};
(4) Calculate ρ(A);
Until the ρ-constraint is satisfied;
Step 4. Remove redundant attributes from A such that the ρ-constraint is still satisfied;
Step 5. Return A.
Compared with both Algorithms 1 and 2, the time complexity of Algorithm 3 is significantly increased, mainly because multiple fitness functions are used. Without loss of generality, the time complexity of Algorithm 3 is O(|U|^2 × |AT|^2 × s), in which s is the number of used fitness functions.
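The voting step of the ensemble selector can be sketched as: each fitness function nominates its own top-ranked candidate, and the attribute nominated most often wins. The fitness functions below are illustrative placeholders.

```python
from collections import Counter

def ensemble_select(candidates, fitness_functions):
    """Each fitness function votes for its top-ranked candidate; the
    attribute with the maximal frequency of occurrences is chosen."""
    votes = Counter(max(candidates, key=phi) for phi in fitness_functions)
    return votes.most_common(1)[0][0]
```

Because the winner must be preferred by several views at once, a single fitness function that is sensitive to the sample distribution cannot dominate the selection, which is the intuition behind the higher stability of ESAR.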

A New Hybrid Mechanism for Attribute Reduction
Reviewing the researches of attribute reduction, most of the previous approaches pay much attention to improving the performance of one aspect only. For example, compared with Algorithm 1, Algorithm 2 can significantly reduce the elapsed time of calculating the reduct, while Algorithm 3 can generate a reduct with higher stability. However, as has been pointed out in Sections 3.1 and 3.2, the above two algorithms are highly likely to lead to performance degradation in some other aspects, e.g., Algorithm 2 may derive a reduct with lower stability because more than one attribute is selected in each iteration, and Algorithm 3 may incur higher time consumption because multiple fitness functions must be used.
Without loss of generality, it is expected to design an algorithm for deriving reduct with the following three characteristics.
(1) Low time consumption of deriving the reduct. Though many accelerators have been proposed for quickly deriving reducts, the dissimilarity approach presented in Algorithm 2 will be used in our research; this is mainly because, although such an algorithm provides a reduct with low stability, it is possible for us to optimize it for quickly obtaining a reduct with high stability. (2) High stability of the derived reduct. To search for a reduct with high stability, the ensemble selector presented in Algorithm 3 will be introduced into our research. However, though such an algorithm may contribute to a reduct with high stability, it frequently results in a high time consumption of obtaining the reduct, and so it is possible for us to optimize such an algorithm for quickly obtaining a reduct with high stability. (3) Competent classification of the derived reduct. In the studies of Yang et al. [13] and Rao et al. [17], it has been pointed out that the reducts obtained by using Algorithms 2 and 3 possess justifiable classification ability. For this reason, it is possible that the combination of those two algorithms can also preserve competent classification ability.
Therefore, a new hybrid mechanism for attribute reduction will be proposed. The specific process is shown in the following Algorithm 4. In Algorithm 4, on the one hand, the ensemble selector is employed, which can provide higher stability. On the other hand, the dissimilarity between candidate attributes is also taken into account, so that multiple attributes can be added into the potential reduct simultaneously, which contributes to the lower time complexity. Therefore, the time complexity of Algorithm 4 is O(∑_{i=1}^{s} (|U|^2 × |AT| × m)) = O(|U|^2 × |AT| × m × s), in which m = |AT|/t and s is the number of used fitness functions. Obviously, O(|U|^2 × |AT| × m) < O(|U|^2 × |AT| × m × s) < O(|U|^2 × |AT|^2 × s), i.e., though the time complexity of Algorithm 4 is higher than that of Algorithm 2, it is lower than that of Algorithm 3.
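One iteration of the hybrid mechanism can be sketched as: multiple fitness functions vote to pick a seed attribute b (ensemble selector), and the t − 1 candidates most dissimilar to b are appended in the same batch (dissimilarity acceleration). The function names, the stopping test and the omission of the final redundancy-removal phase are illustrative assumptions, not the paper's exact Algorithm 4.

```python
from collections import Counter

def hybrid_reduct(attrs, fitness_functions, distance, rho, rho_target, t):
    """Sketch of the hybrid mechanism (HMAR): ensemble voting chooses the
    seed attribute of each batch; dissimilarity fills the rest of it."""
    reduct, candidates = set(), set(attrs)
    while candidates and rho(reduct) < rho_target:
        # Ensemble selector: every fitness function nominates a candidate.
        votes = Counter(max(candidates, key=lambda a: phi(reduct, a))
                        for phi in fitness_functions)
        b = votes.most_common(1)[0][0]
        # Dissimilarity: also add the t - 1 candidates farthest from b.
        rest = sorted((a for a in candidates if a != b),
                      key=lambda a: distance(b, a), reverse=True)
        batch = {b, *rest[:t - 1]}
        reduct |= batch
        candidates -= batch
    return reduct
```

Each iteration consumes t candidates, so the number of iterations drops by roughly a factor of t while every seed attribute is still chosen by the multi-view vote.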

Data Sets and Configuration
To demonstrate the effectiveness of the algorithm proposed in this paper, 20 UCI data sets have been selected to conduct the experiments. The detailed description of those data sets will be shown in the following Table 1. All the experiments have been carried out on a personal computer with Windows 10, AMD R7 3750H CPU (2.30 GHz) and 8.00 GB memory. The programming language is Matlab R2017a.

Experimental Setup
In the following experiments, the neighborhood rough set [8,10,29] will be employed to define the forms of attribute reduction. Note that 5-fold cross-validation is also used in our experiments to test the performances of reducts. In other words, for each data set, the set of raw samples is randomly partitioned into 5 groups of the same size; 4 groups compose the training samples for computing reducts and the remaining group is regarded as the testing samples. The threshold of approximation quality is set to 0.95 (95%). Such a value is beneficial for avoiding a series of problems caused by too strict constraints, and for reducing the time consumption of the experiments.
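For the neighborhood rough set, the approximation quality used in the constraint can be sketched as: a sample lies in the positive region if every sample in its δ-neighborhood shares its label, and the quality is the fraction of such samples. This is a minimal illustration with assumed variable names (Euclidean distance, list-of-lists data), not the exact experimental code.

```python
def approximation_quality(X, y, delta):
    """Neighborhood rough set approximation quality: the proportion of
    samples whose delta-neighborhood (Euclidean) is label-consistent."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    consistent = 0
    for i, xi in enumerate(X):
        neighborhood = [j for j, xj in enumerate(X) if dist(xi, xj) <= delta]
        if all(y[j] == y[i] for j in neighborhood):
            consistent += 1
    return consistent / len(X)
```

A tight radius over a well-separated toy data set yields quality 1.0; enlarging the radius until neighborhoods cross the class boundary drives the quality down, which is why the constraint threshold (0.95 here) interacts with δ.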
Furthermore, five state-of-the-art algorithms are selected for comparison with our proposed algorithm. These five algorithms are as follows.

Comparisons of Stability
In this section, the stability of reducts obtained by using different algorithms will be compared. The detailed results are shown in the following Table 2. To further reveal the differences between the stability of reducts obtained by using HMAR and the other five algorithms from the statistical perspective, the changing ratios related to the stability of reducts under different measures are shown in the following Tables 3 and 4.
Following Tables 2-4, it is not difficult to observe that the stability of the reduct obtained by using HMAR is relatively high in terms of both Akashata's and Nogueira's measures in most cases. Take the "Ionosphere" data set and Akashata's measure as an example: the stability of reducts obtained by using FGS, AGAR, ESAR, DAR, DGAR and HMAR is 0.2562, 0.1213, 0.6545, 0.2986, 0.2818 and 0.4416, respectively. Obviously, though the stability of the reduct obtained by using HMAR is lower than that by using ESAR, which can generate reducts with high stability, compared with FGS, AGAR, DAR and DGAR, the reduct obtained by using HMAR is equipped with higher stability.
Furthermore, from the perspective of the changing ratio related to the stability, the above conclusion can be further verified. For example, by using Akashata's measure and Nogueira's measure, the changing ratios of stability are 0.7240, 2.6415, −0.3252, 0.4790 and 0.5669, and 0.6772, 2.7850, −0.4651, 0.3397 and 0.3121, respectively. Through observing these values, it can be seen that, compared with FGS, AGAR, DAR and DGAR, the changing ratios related to the stability of the reduct obtained by using HMAR are greater than 0. It means that the stability of the reduct obtained by using HMAR is higher than that by using those four approaches.
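The quoted changing ratios are consistent with the relative change of HMAR's value against each competitor's, i.e. (HMAR − other)/other; assuming that definition, the Ionosphere stability figures can be reproduced as follows.

```python
def changing_ratio(hmar_value, other_value):
    """Relative change of HMAR's result against a competitor's
    (positive means HMAR improves on the competitor)."""
    return (hmar_value - other_value) / other_value

# Akashata's measure on "Ionosphere": HMAR = 0.4416 vs FGS = 0.2562.
ratio = changing_ratio(0.4416, 0.2562)  # ≈ 0.724, matching the reported 0.7240
```

The same formula reproduces the AGAR and ESAR entries (2.6415 and −0.3252) from the stability values in the text, up to rounding.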

Comparisons of Elapsed Time
In this section, the elapsed time of obtaining reducts and the changing ratio related to the elapsed time of deriving reducts are shown in the following Tables 5 and 6. Through observing Tables 5 and 6, it is not difficult to reveal that the time consumption of obtaining the reduct by using HMAR is significantly lower than that by using ESAR. Take the "Forest Type Mapping" data set as an example: the elapsed times of obtaining reducts by using FGS, AGAR, ESAR, DAR, DGAR and HMAR are 0.0161, 0.0137, 0.0217, 0.0050, 0.0724 and 0.0175 s, respectively. Obviously, though the elapsed time of obtaining the reduct by using HMAR is higher than that by using FGS, AGAR and DAR, compared with that by using ESAR it is lower. Furthermore, from the perspective of the changing ratio related to the time consumption, the above conclusion can be further verified. For example, the changing ratio of the elapsed time of obtaining the reduct by using HMAR relative to that by using ESAR is −0.4879. It means that the elapsed time of obtaining the reduct by using HMAR is significantly lower than that by using ESAR.

Comparisons of Classification Performances
In this section, the classification accuracies of reducts obtained by the six different algorithms are compared. The KNN classifier is employed to test the classification performance; it is worth noting that the parameter k used in the KNN classifier is 5. The corresponding results and the values of the changing ratio related to the classification accuracy of the derived reducts are presented in the following Tables 7 and 8, respectively. Through observing Tables 7 and 8, it is not difficult to see that our proposed approach does not lead to poorer classification accuracy compared with the other approaches. Take the "Dermatology" data set as an example: if k = 5, then the classification accuracies of reducts obtained by using FGS, AGAR, ESAR, DAR, DGAR and HMAR are 0.9273, 0.8990, 0.9344, 0.9262, 0.9488 and 0.8744, respectively.
Furthermore, from the perspective of the changing ratio related to the classification accuracies, the above conclusion can be further verified. For example, the changing ratios of classification accuracies are −0.0571, −0.0273, −0.0642, −0.0560 and −0.0758, respectively. Through observing these values, it can be seen that the changing ratios of the classification accuracy of the reduct obtained by using HMAR relative to those by using the other five approaches are all between −0.1 and 0.1. It means that our proposed approach performs similarly to the other compared algorithms in classification ability.

Discussion of Experimental Results
In Section 4.3, the stability of the reduct is discussed. In most cases, the reducts obtained by using ESAR and HMAR have relatively high stability. Take the "Ionosphere" data set and Akashata's measure as an example: the stability of reducts obtained by using FGS, AGAR, ESAR, DAR, DGAR and HMAR is 0.2562, 0.1213, 0.6545, 0.2986, 0.2818 and 0.4416, respectively.
In Section 4.4, the time consumption of obtaining the reduct is discussed. On most data sets, the elapsed time of obtaining the reduct by using HMAR is less than that by using ESAR. Take the "Forest Type Mapping" data set as an example: the elapsed times of obtaining reducts by using FGS, AGAR, ESAR, DAR, DGAR and HMAR are 0.0161, 0.0137, 0.0217, 0.0050, 0.0724 and 0.0175 s, respectively.
In Section 4.5, the classification ability of the reduct is discussed. The reducts obtained by using FGS, AGAR, ESAR, DAR and HMAR have similar classification ability. Take the "Dermatology" data set as an example: if the KNN classifier with k = 5 is used, then the classification accuracies of reducts obtained by using FGS, AGAR, ESAR, DAR, DGAR and HMAR are 0.9273, 0.8990, 0.9344, 0.9262, 0.9488 and 0.8744, respectively.
Obviously, the HMAR approach proposed in this paper can be used to generate reducts with high stability. Furthermore, compared with the previous approaches which can generate reducts with high stability, our approach can obtain the reduct in less time. Concurrently, it must be pointed out that the reduct obtained by using HMAR is equipped with justifiable classification ability. The above results mainly stem from the fact that both the ensemble selector and the acceleration strategy have been used in our hybrid searching.

Conclusions and Future Perspectives
In this paper, by considering multiple characteristics related to the searching of a reduct, a hybrid searching mechanism has been developed. Different from the previous approaches in which only a single characteristic is fully considered, our approach simultaneously takes into account the time consumption of deriving the reduct, the stability of the derived reduct and the classification ability offered by the reduct. The experimental results demonstrate that our proposed approach can make a trade-off between the stability of the derived reduct and the elapsed time of searching the reduct. This is mainly because both the ensemble selector and the acceleration strategy have been used in our hybrid searching. Moreover, it must be pointed out that the reduct derived by using our approach can also provide competent classification performance compared with several state-of-the-art approaches. In general, a hybrid searching mechanism for attribute reduction is proposed which considers multiple characteristics simultaneously. However, in terms of the time consumption of obtaining the reduct and the classification ability of the obtained reduct, our approach still has some limitations. Therefore, we will confront the following challenges in further research.
(1) The elapsed time of obtaining reducts can be further reduced through combining some other acceleration strategies [31,32]. (2) Supervised information granulation [33,34] strategy can be further introduced into our approach for improving the generalization performance offered by the reduct.