Efficient Breadth-First Reduct Search

Abstract: This paper formulates the problem of determining all reducts of an information system as a graph search problem. The search space is represented in the form of a rooted graph. The proposed algorithm uses a breadth-first search strategy to search for all reducts starting from the graph root. It expands nodes in breadth-first order and uses a pruning rule to decrease the search space. It is mathematically shown that the proposed algorithm is both time and space efficient.


Introduction
In machine learning, feature selection is the process of selecting the relevant features used as input to learning models. Its aim is to obtain optimal features so that the model can accurately predict the output. Such optimal features are known as reducts in rough set theory. A reduct of an information system with conditional attributes and a decision attribute is defined as a minimal subset of the set of conditional attributes on which the decision attribute depends to the same degree as on the full set of conditional attributes (see [1] for a formal definition). A reduct could be any nonempty subset of the conditional attributes that preserves this degree of dependency. Hence, the number of possible reducts is exponential with respect to the number of conditional attributes. For computational efficiency, a number of algorithms have been proposed to find a single reduct or multiple reducts without exhaustively investigating all of these possibilities. These include many heuristic algorithms [2][3][4][5][6] and metaheuristic algorithms [7][8][9][10][11][12][13][14], as discussed in [15]. This class of algorithms is known as approximate algorithms. They give a single reduct or multiple reducts but not an exhaustive list of reducts as the exact algorithms do. Because approximate algorithms require parameter settings, they may produce different reducts in different runs. Moreover, they could produce reducts that are not optimal. Exact algorithms are necessary if we are interested in computing the best reduct, under a given criterion, out of all reducts. However, finding all reducts involves generating and examining all possible reducts. In the literature, this can be done based on a discernibility matrix [16] and a power set tree [17,18]. Consequently, these exact algorithms have a time complexity that is exponential with respect to the number of conditional attributes.
In this work, we propose an exact algorithm to find all reducts without generating and examining all possible reducts. A simple representation of the possible reducts, called a solution rooted graph, is proposed. The rooted graph is formed by the possible subsets of conditional attributes and their connections. Its root node is the n-subset, i.e., the full set of n conditional attributes. The root is connected to its (n − 1)-subset nodes. Each (n − 1)-subset node is connected to its (n − 2)-subset nodes. This continues until reaching the 0-subset node. Hence, each node of the rooted graph except for the 0-subset node is a possible reduct. Furthermore, there are n node types based on their cardinalities in the search space: n-subset, (n − 1)-subset, (n − 2)-subset, . . . , and 1-subset. The proposed algorithm searches the solution rooted graph with the breadth-first search. This work adopts the breadth-first search because it is complete, i.e., it assures finding all reducts. Without pruning, this search still involves generating and examining all possible reducts from the n-subset type to the 1-subset type, type by type. However, we know that any node with a degree of dependency less than that of the graph root is not a reduct, and neither are its subsets, according to the monotonic property of dependency. All of these subsets can therefore be eliminated from consideration without losing any reducts. The proposed algorithm is equipped with this rule of elimination as the pruning rule, which is the basis of its efficiency.
This paper is organized as follows. Section 2 briefly gives a sufficient background on reducts in terms of the degree of dependency. Section 3 describes the new efficient breadth-first reduct search algorithm. Analysis of the algorithm is given in Section 4. We conclude the paper in Section 5. An illustrative example is shown in Appendix A.
Mathematics 2020, 8, 833; doi:10.3390/math8050833; www.mdpi.com/journal/mathematics

Basic Concepts
This section gives the background on reducts in terms of the degree of dependency. More details on reducts and rough sets can be found in [1].

Definition 1.
A 4-tuple $IS = \langle U, A = C \cup D, V, f \rangle$ with $C \cap D = \emptyset$ is called an information system, where $U$ is a finite set of objects; $A$ is a finite set of attributes; $C$ is a finite set of conditional attributes; $D$ is a finite set of decision attributes; $V = \bigcup_{p \in A} V_p$, where $V_p$ is the domain of attribute $p$; and $f: U \times A \to V$ is a function, called the information function, such that $f(x_i, q) \in V_q$ for every $q \in A$ and $x_i \in U$. An information system is denoted by $IS = (U, A)$.

Example 1.
Let us consider the simple information system shown in Table 1. We adopt this table to illustrate the basic concepts in the following examples.

Definition 6.
Let $C'$ be any nonempty subset of $C$. $C'$ is a D-reduct (reduct with respect to $D$) of $C$ if $C'$ is a minimal subset of $C$ such that $\gamma_C(D) = \gamma_{C'}(D)$.
Therefore, we find a reduct by obtaining a minimal subset of the conditional attributes $C$ on which the decision attributes $D$ depend to the same degree as they depend on $C$. A brute-force approach to the problem is to check all subsets of $C$ that preserve the dependency of $D$ on $C$, i.e., $\gamma_C(D)$, and to keep those that are minimal. Confirming that a subset obtained in this way is minimal (a reduct) requires checking all of its $2^n - 2$ subsets, excluding itself and the empty set, where $n$ is its cardinality.
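As a minimal sketch of the brute-force approach just described, the following Python code assumes the standard rough-set definition of the degree of dependency, $\gamma_{C'}(D) = |POS_{C'}(D)| / |U|$, and a small made-up decision table (it is not Table 1). The names `ROWS`, `dependency`, and `brute_force_reducts` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical toy table (not Table 1): columns 0-2 are conditional
# attributes, the last column is the decision attribute.
ROWS = [
    (0, 0, 0, 0),
    (0, 1, 0, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 1),
]

def dependency(rows, attrs):
    """Degree of dependency of the decision on attrs, as an exact fraction."""
    blocks = defaultdict(list)
    for row in rows:
        # Objects with equal values on attrs fall into the same block.
        blocks[tuple(row[i] for i in sorted(attrs))].append(row[-1])
    # A block belongs to the positive region when its decisions all agree.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return Fraction(positive, len(rows))

def brute_force_reducts(rows, n):
    """Test every nonempty subset of n attributes, then keep the minimal ones."""
    target = dependency(rows, range(n))
    preserving = [
        frozenset(s)
        for size in range(1, n + 1)
        for s in combinations(range(n), size)
        if dependency(rows, s) == target
    ]
    return {s for s in preserving if not any(t < s for t in preserving)}

reducts = brute_force_reducts(ROWS, 3)
```

On this toy table, only the full attribute set and the subsets {0, 1} and {1, 2} preserve the dependency, so the two 2-subsets are the reducts. The sketch evaluates the dependency of all $2^3 - 1 = 7$ nonempty subsets to find them.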

Efficient Breadth-First Reduct Search Algorithm
Let $C = \{C_1, C_2, C_3, \ldots, C_n\}$ be the set of conditional attributes of an information system IS(C, D). There are n + 1 types of subsets of $C$: n-subset, (n − 1)-subset, (n − 2)-subset, . . . , 1-subset, and 0-subset. Reducts could be any of these types of subsets except for the 0-subset. The breadth-first reduct search algorithm investigates the following $2^n - 1$ subsets in order, type by type: n-subset, (n − 1)-subset, (n − 2)-subset, . . . , and 1-subset. Hereafter, these subsets are referred to as reduct candidates. For each subset $C'$ in the reduct candidates, if $\gamma_{C'}(D) = \gamma_C(D)$, then $C'$ is a new element of the reduct set, and all of its supersets in the reduct set, if they exist, are eliminated. If $\gamma_{C'}(D) < \gamma_C(D)$, none of the subsets of $C'$ is a reduct candidate. Each nonreduct $C'$ removes $2^{|C'|} - 2$ elements from the reduct candidates, according to Theorem 1. The reduct set is obtained once all reduct candidates are investigated.
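The counts in the paragraph above can be checked with a short Python sketch. It only enumerates subsets; the helper names `candidate_levels` and `pruned_by` and the four-attribute example are assumptions made for illustration.

```python
from itertools import combinations

def candidate_levels(attrs):
    """Map k to the list of k-subsets of attrs, for k = n, n-1, ..., 1."""
    items = sorted(attrs)
    return {
        k: [frozenset(s) for s in combinations(items, k)]
        for k in range(len(items), 0, -1)
    }

def pruned_by(nonreduct):
    """Nonempty proper subsets of a nonreduct; none of them needs testing."""
    items = sorted(nonreduct)
    return [
        frozenset(s)
        for k in range(1, len(items))
        for s in combinations(items, k)
    ]

levels = candidate_levels({"C1", "C2", "C3", "C4"})
total = sum(len(subsets) for subsets in levels.values())  # 2**4 - 1 candidates
pruned = pruned_by({"C1", "C2", "C4"})                    # 2**3 - 2 subsets
```

With four attributes the levels hold 1, 4, 6, and 4 candidates, and a single nonreduct 3-subset rules out six smaller candidates without any dependency computation.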
The algorithm implements the idea above by using three data structures: Candidate_Queue (a first-come, first-served structure), Reduct_Set, and NonReduct_Set, which maintain the reduct candidates, reducts, and nonreducts, respectively. Initially, it calculates $k = \gamma_C(D)$, adds $C$ to Candidate_Queue, and sets both Reduct_Set and NonReduct_Set to the empty set. It then loops, updating Candidate_Queue, Reduct_Set, and NonReduct_Set, while Candidate_Queue is not empty. In each iteration, it gets an element $C'$ from Candidate_Queue and calculates $\gamma_{C'}(D)$. If $\gamma_{C'}(D) = k$, then it adds $C'$ to Reduct_Set using the updateReduct_Set(C') procedure. It also generates all (|C'| − 1)-subsets and inserts them into Candidate_Queue using the updateCandidate_Queue(C') procedure. If $\gamma_{C'}(D) < k$, it inserts $C'$ into NonReduct_Set and discards all of its subsets using the updateNonReduct_Set(C') procedure. The algorithm (in detail) is as shown in Figure 1.
There are three major procedures in our algorithm: updateReduct_Set(C'), updateCandidate_Queue(C'), and updateNonReduct_Set(C').
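The loop described above can be sketched in Python as follows. This is an illustrative reading of the prose, not the authors' Figure 1: the function name `reduct_search`, the `gamma` callback, the `seen` set (added here so that a subset reachable from several parents is enqueued only once), and the table of dependency degrees are all assumptions.

```python
from collections import deque
from itertools import combinations

def reduct_search(attrs, gamma):
    """Breadth-first reduct search; gamma(subset) returns the dependency degree."""
    full = frozenset(attrs)
    k = gamma(full)
    queue = deque([full])            # Candidate_Queue
    seen = {full}                    # assumption: skip subsets already enqueued
    reducts, nonreducts = set(), set()
    while queue:
        cand = queue.popleft()
        if gamma(cand) == k:
            # updateReduct_Set: drop stored supersets, then store cand.
            reducts = {r for r in reducts if not cand < r}
            reducts.add(cand)
            # updateCandidate_Queue: enqueue unpruned (|cand| - 1)-subsets.
            for combo in combinations(sorted(cand), len(cand) - 1):
                sub = frozenset(combo)
                if not sub or sub in seen:
                    continue         # the 0-subset is never a candidate
                if not any(sub <= bad for bad in nonreducts):
                    seen.add(sub)
                    queue.append(sub)
        else:
            # updateNonReduct_Set: record cand, discard its queued subsets.
            nonreducts.add(cand)
            queue = deque(c for c in queue if not c <= cand)
    return reducts

# Hypothetical dependency degrees for attributes a, b, c (monotone in the subset).
DEGREES = {
    frozenset("abc"): 1.0, frozenset("ab"): 1.0, frozenset("bc"): 1.0,
    frozenset("ac"): 0.5, frozenset("a"): 0.5, frozenset("b"): 0.5,
    frozenset("c"): 0.5,
}
tested = set()

def gamma(subset):
    tested.add(subset)               # remember which subsets were evaluated
    return DEGREES[subset]

reducts = reduct_search("abc", gamma)
```

In this run the reducts are {a, b} and {b, c}. Only five of the seven nonempty subsets are evaluated, because {a} and {c} are pruned as subsets of the nonreduct {a, c}.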

updateReduct_Set(C')
An element of Reduct_Set stops being a reduct as soon as one of its subsets is also found to be a reduct. Therefore, we have to test whether each reduct in Reduct_Set is a superset of the new reduct. If it is, we eliminate it from Reduct_Set before inserting the new reduct, in order to maintain the minimality condition of reducts. For example, let Reduct_Set be $\{\{C_1, C_2, C_3, C_4, C_5\}\}$, let $C'$ be $\{C_1, C_2, C_3\}$, and let $\gamma_{C'}(D) = \gamma_C(D)$; $C'$ is therefore a new reduct. However, the reduct $\{C_1, C_2, C_3, C_4, C_5\}$ in Reduct_Set is a superset of $C'$. We therefore remove $\{C_1, C_2, C_3, C_4, C_5\}$ from Reduct_Set and insert $C'$ into it. This gives the new Reduct_Set = $\{\{C_1, C_2, C_3\}\}$. The procedure (in detail) is as shown in Figure 2.

Procedure updateReduct_Set(C')
Begin
    Remove all supersets of C' from Reduct_Set
    Insert C' into Reduct_Set
End
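The superset-removal step can be sketched in Python with frozensets. The function name `update_reduct_set` and the choice to return a new set rather than mutate the old one are assumptions for illustration; the example values follow the five-attribute example in the text.

```python
def update_reduct_set(reduct_set, new_reduct):
    """Drop every stored superset of new_reduct, then store new_reduct."""
    kept = {r for r in reduct_set if not r >= new_reduct}
    kept.add(new_reduct)
    return kept

full = frozenset({"C1", "C2", "C3", "C4", "C5"})
smaller = frozenset({"C1", "C2", "C3"})
reduct_set = update_reduct_set({full}, smaller)
```

After the call, `reduct_set` holds only the smaller set, matching the example. A stored set that is not a superset of the new reduct would be left in place.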

updateCandidate_Queue(C')
The procedure generates all (|C'| − 1)-subsets of C' and tests whether each is a reduct candidate. A reduct candidate is a subset that is not contained in any element of NonReduct_Set. Each such candidate is appended to Candidate_Queue.

Procedure updateCandidate_Queue(C')
Begin
    Generate all candidate (|C'| − 1)-subsets from C'
    For each candidate, if it is not a subset of any set in NonReduct_Set, then add it to Candidate_Queue
End
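A possible Python rendering of this procedure is given below. The function name `update_candidate_queue`, the use of `collections.deque` for the queue, and the example sets are assumptions for illustration. Unlike a full implementation, this fragment does not guard against enqueuing the same subset twice from different parents.

```python
from collections import deque
from itertools import combinations

def update_candidate_queue(queue, nonreduct_set, reduct):
    """Append each (|reduct| - 1)-subset that no known nonreduct contains."""
    for combo in combinations(sorted(reduct), len(reduct) - 1):
        candidate = frozenset(combo)
        if not candidate:
            continue  # assumption: the 0-subset is never a candidate
        if not any(candidate <= bad for bad in nonreduct_set):
            queue.append(candidate)

queue = deque()
nonreduct_set = {frozenset({"C1", "C2", "C4"})}
update_candidate_queue(queue, nonreduct_set, frozenset({"C1", "C2", "C3"}))
```

Here the 2-subset {C1, C2} is skipped because it lies inside the recorded nonreduct {C1, C2, C4}, while {C1, C3} and {C2, C3} are appended in that order.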

updateNonReduct_Set(C')
The property of the positive region shown in Theorem 1 allows us to reduce the search space, i.e., the number of reduct candidates. We know that if $C' \subseteq C$, then $POS_{C'}(D) \subseteq POS_C(D)$. This implies that $\gamma_{C'}(D) \le \gamma_C(D)$. In addition, if $\gamma_{C'}(D) < k$, where $k = \gamma_C(D)$ is the degree of dependency of $D$ on $C$ in the original data, then $C'$ and its subsets are not reducts. All of these subsets can be eliminated from the candidates. For example, let $C$ be the set of conditional attributes $\{C_1, C_2, C_3, C_4, C_5\}$ with $\gamma_C(D) = 0.8$ and let $C'$ be $\{C_1, C_2, C_4\}$ with $\gamma_{C'}(D) = 0.6$. Then, $C'$ and its subsets cannot be reducts according to Theorem 1, so we do not need to explore these subsets. The procedure updateNonReduct_Set(C') therefore stores $C'$ in NonReduct_Set and removes all subsets of $C'$ from Candidate_Queue. The procedure (in detail) is as shown in Figure 4.

Procedure updateNonReduct_Set(C')
Begin
    Insert C' into NonReduct_Set
    Remove all subsets of C' from Candidate_Queue
End
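The pruning step might look as follows in Python. The function name `update_nonreduct_set`, the in-place update of a `deque`, and the queue contents are assumptions chosen to mirror the example with C' = {C1, C2, C4}.

```python
from collections import deque

def update_nonreduct_set(queue, nonreduct_set, nonreduct):
    """Record a nonreduct and drop its subsets from the candidate queue."""
    nonreduct_set.add(nonreduct)
    # Keep the original order of the remaining candidates.
    kept = [c for c in queue if not c <= nonreduct]
    queue.clear()
    queue.extend(kept)

queue = deque([
    frozenset({"C1", "C2"}),
    frozenset({"C1", "C3"}),
    frozenset({"C2", "C4"}),
])
nonreduct_set = set()
update_nonreduct_set(queue, nonreduct_set, frozenset({"C1", "C2", "C4"}))
```

Only {C1, C3} survives in the queue, since the other two candidates are subsets of the nonreduct and cannot preserve the dependency.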

Analysis of Algorithm
Let $C = \{C_1, C_2, C_3, \ldots, C_n\}$ be the set of conditional attributes of an IS(C, D). Additionally, let $L_k$ be the set of k-subsets of $C$. We know that $|L_k| = \binom{n}{k}$ and $\sum_{k=0}^{n} \binom{n}{k} = 2^n$. The algorithm searches for reducts in each $L_k$, level by level, starting with k = n, k = n − 1, and so on, until k = 1. Therefore, the size of its search space is $\sum_{k=1}^{n} \binom{n}{k} = 2^n - 1$. For the best case, $C$ is the only element of Reduct_Set and no element of $L_{n-1}$ satisfies the reduct property. The algorithm tests $C$ and all elements of $L_{n-1}$. The number of tests is $1 + \binom{n}{n-1} = 1 + n$. For the best-case scenario, the time complexity is $O(n)$.
For the worst-case scenario, the algorithm gives a 1-subset reduct as an element of Reduct_Set. If all generated subsets satisfy the reduct property, the number of tests is $\sum_{k=1}^{n} \binom{n}{k} = \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n}$. Therefore, the worst-case time complexity is $O\left(\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n}\right) = O\left(\binom{n}{m}\right)$, where $m = \lfloor med \rfloor$ and $med$ is the median of n, n − 1, n − 2, . . . , 1, 0. Since $\binom{n}{m} = n(n-1)(n-2)\cdots(n-m+1)/m!$, the worst-case time complexity is $O(n^m)$. However, the algorithm applies a property of the positive region to reduce the search space once it finds a nonreduct subset. Each such l-subset could eliminate $2^l - 2$ candidate elements in $L_k$, where k = l − 1, l − 2, . . . , 1. These nonreduct subsets are stored in NonReduct_Set. Any subset of an element of NonReduct_Set is not included as an element of Candidate_Queue.
In general, the space complexity of a breadth-first search is determined by the candidates that remain in Candidate_Queue. Therefore, it depends on the largest size that Candidate_Queue reaches. For the best case, the space complexity is $O(n)$. We observe that $|L_m|$ is the largest among all subset levels, where $m = \lfloor med \rfloor$ and $med$ is the median of n, n − 1, n − 2, . . . , 1, 0. Since the algorithm generates and tests level by level, its worst-case space complexity is determined by $O\left(\binom{n}{m}\right) = O(n^m)$.
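The quantities used in this analysis can be evaluated numerically for small n. The sketch below relies only on Python's standard `math.comb` and `statistics.median`; the function name `analysis` and the dictionary keys are illustrative assumptions.

```python
from math import comb, floor
from statistics import median

def analysis(n):
    """Evaluate the counts used in the analysis for n conditional attributes."""
    sizes = {k: comb(n, k) for k in range(n, 0, -1)}  # |L_k| for k = n, ..., 1
    m = floor(median(range(n + 1)))                   # floor of median of 0..n
    return {
        "search_space": sum(sizes.values()),          # equals 2**n - 1
        "best_case_tests": 1 + comb(n, n - 1),        # C plus all of L_{n-1}
        "m": m,
        "largest_level": max(sizes.values()),
        "size_of_level_m": comb(n, m),
    }

result = analysis(5)
```

For n = 5 this gives a search space of 31 subsets, 6 tests in the best case, m = 2, and a largest level of 10 subsets, which is the size of $L_2$.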

Conclusions
This paper presents a simple and efficient solution to the problem of finding all reducts in an information system. The problem is formulated as a search problem where the search space is a rooted graph. The rooted graph is a connected graph of possible reducts and their connections. Its root is the set of all conditional attributes. Each of its k-subset nodes is connected to its (k − 1)-subset nodes, where k is a positive integer not larger than the cardinality of the graph root. The proposed algorithm searches this graph using a breadth-first search strategy, starting from the graph root. It expands nodes in breadth-first order. With the monotonic property of the positive region (Theorem 1) as the pruning rule, it can prune all nonreduct nodes in the search space early. An illustrative example is given to demonstrate the algorithm. The algorithm's efficiency is supported by the results of the algorithm analysis. Let n be the number of conditional attributes and m be the floor of the median of n, n − 1, n − 2, . . . , 1, 0; it is shown that both the time and space complexity of the algorithm are $O(n)$ for the best case and $O(n^m)$ for the worst case.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A Illustrative Example
Let us consider, as input to the algorithm, the information system IS(C, D) from Table 1, where $C = \{C_1, C_2, C_3, C_4, C_5\}$.