# A Fast Attribute Reduction Algorithm Based on a Positive Region Sort Ascending Decision Table


## Abstract


## 1. Introduction

The classical algorithms for calculating the core set have a time complexity of O(|U|^{2}|C|). By using the notion of information granules, several algorithms successfully reduce the object set from |U| to |U/C| and bring the time complexity of the core set calculation down to O(|C||U/C|^{2}) [25,26]. Xu et al. proposed a fast core set algorithm with a complexity of O(|C||U| + |C|^{2}|U/C|) [27]. At the same time, many formulas or methods have been proposed to calculate different types of attribute significance. Some classical formulas are designed based on the positive region [28,29,30], entropy [3,16,17,18], the discernibility ability of attributes [13,14,24,31,32], the relationships between attributes [33], etc. In addition, many researchers have proposed mixed formulas by combining rough set theory with other theories, such as fuzzy sets [12], ant colony optimization [23], and granular computing [2,6,16,34].

## 2. Preliminaries

A decision table is denoted as S = (U, C∪D), where U is the set of objects, C = {a_{1}, a_{2}, …, a_{n}} is the condition attribute set, and D = {d} is the decision attribute set. The decision table is considered to be inconsistent if two objects with the same condition values have different decision values. For example, Table 1 is a classical inconsistent decision table.

The granule of an object x with respect to C is [x]_{C} = {y∈U | (x,y)∈IND(C)}. The union of all of the granules with respect to C is referred to as a partition of the universe, which is described as U/C = {[x]_{C} | x∈U}. A granule [x]_{C} is exact if it has one decision value; otherwise, it is rough. The union of all of the exact granules with respect to C is referred to as the positive region.
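To make these notions concrete, here is a minimal Python sketch (the row-list representation and the helper names `partition` and `positive_region` are ours, not from the paper), checked against Table 1:

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object indices into granules [x]_B by their values on attrs."""
    granules = defaultdict(list)
    for idx, (cond, dec) in enumerate(table):
        granules[tuple(cond[a] for a in attrs)].append(idx)
    return list(granules.values())

def positive_region(table, attrs):
    """Union of the exact granules (granules with a single decision value)."""
    pos = []
    for g in partition(table, attrs):
        if len({table[i][1] for i in g}) == 1:
            pos.extend(g)
    return sorted(pos)

# Table 1 of the paper: objects x_1..x_11 as (condition tuple a_1..a_5, d).
table1 = [
    ((0,0,0,1,1), 0), ((0,0,0,1,1), 1), ((0,0,0,1,1), 1),
    ((0,0,1,0,1), 0), ((0,0,1,0,1), 1),
    ((0,0,1,1,1), 0), ((0,0,1,1,1), 1),
    ((1,0,1,1,1), 0), ((1,0,1,1,1), 1), ((1,0,1,1,1), 2),
    ((1,1,1,1,1), 1),
]
print(len(partition(table1, range(5))))   # 5 granules in U/C
print(positive_region(table1, range(5)))  # [10], i.e. only x_11 is exact
```

This reproduces Example 1 below: U/C has five granules, and only {x_{11}} contributes to the positive region.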

## 3. Positive Region Sort Ascending Decision Table and Its Properties

#### 3.1. SADT

**Definition 1.** Let B_{m} = {a_{1}, a_{2}, …, a_{m}} denote the first m condition attributes. A sort ascending decision table (SADT) is a decision table whose objects are sorted in ascending order under the attribute priority a_{1} > a_{2} > … > a_{|C|}. In real applications, the order of the condition attributes can be adjusted based on prior knowledge. For example, if the test costs of the condition attributes are taken into account, placing the cheap attributes first yields a reduct with a lower cost.

- `[m,n] = size(S);`
- `SADT = sortrows(S, 1:n);`
- Based on SADT, one easily obtains the following properties.
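An equivalent sketch in Python (the sample rows are illustrative, not from the paper): `sortrows(S, 1:n)` is a lexicographic sort over all columns, which `sorted` performs directly on row lists.

```python
# Each row holds the condition values a_1..a_5 followed by the decision d.
S = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
]
# Lexicographic sort over all columns, mirroring sortrows(S, 1:n); objects of
# the same granule become physically adjacent (Properties 1 and 2 below).
SADT = sorted(S)
print(SADT[0])   # [0, 0, 0, 1, 1, 0]
print(SADT[-1])  # [1, 1, 1, 1, 1, 1]
```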

**Property 1.** Let S be a SADT and B_{m} = {a_{1}, a_{2}, …, a_{m}}. If $({x}_{i},{x}_{j})\in IND\left({B}_{m}\right)$ and i<k<j, then $({x}_{i},{x}_{k})\in IND\left({B}_{m}\right)$.

**Property 2.** Let U/C = {X_{1}, X_{2}, …, X_{K}} be a partition of a SADT. For any X_{i} ∈ U/C, it has X_{i} = {x_{p+1}, x_{p+2}, …, x_{p+q}}, where $p={{\displaystyle \sum}}_{j=1}^{i-1}\left|{X}_{j}\right|$ and q = |X_{i}|.

Properties 1 and 2 show that the objects of the same granule with respect to B_{m} are adjacent physically. It is thus easy to discern the repeated objects and U/C.

#### 3.2. PR-SADT

**Definition 2.** Let S be a SADT. The positive region SADT (PR-SADT) of S is obtained by assigning every object of a rough granule a new decision value d_{new}, where d_{new} is different from all of the original decision values. In the related experiments in Section 5, we set d_{new} = max(I_{d}(x)) + 1.

**Property 3.**

**Property 4.**

Algorithm 1. Construct a PR-SADT without repeated objects.
Input: a SADT.
Output: a PR-SADT without repeated objects.
1: Begin
2: For k = |U| : −1 : 2
3: If $\forall a\in C$, I_{a}(x_{k−1}) = I_{a}(x_{k})
4: If I_{d}(x_{k−1}) ≠ I_{d}(x_{k}), then I_{d}(x_{k−1}) = d_{new};
5: Delete object x_{k};
6: end
7: end
8: end
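Algorithm 1 can be sketched in Python as follows (our reading of the steps: every repeated object is deleted, and the representative of an inconsistent granule is relabelled with d_{new}, which is what reproduces Table 2):

```python
def pr_sadt(sadt, d_new):
    """Algorithm 1 (sketch): scan a SADT backwards, merge repeated condition
    rows, and relabel inconsistent representatives with d_new."""
    rows = [list(r) for r in sadt]
    for k in range(len(rows) - 1, 0, -1):
        if rows[k - 1][:-1] == rows[k][:-1]:      # same condition values
            if rows[k - 1][-1] != rows[k][-1]:    # different decisions
                rows[k - 1][-1] = d_new           # the granule is rough
            del rows[k]                           # drop the repeated object
    return rows

# Table 1, already sorted ascending (columns a_1..a_5, then d).
table1 = [
    [0,0,0,1,1,0],[0,0,0,1,1,1],[0,0,0,1,1,1],
    [0,0,1,0,1,0],[0,0,1,0,1,1],
    [0,0,1,1,1,0],[0,0,1,1,1,1],
    [1,0,1,1,1,0],[1,0,1,1,1,1],[1,0,1,1,1,2],
    [1,1,1,1,1,1],
]
d_new = max(r[-1] for r in table1) + 1  # = 3, as in Section 5
print(pr_sadt(table1, d_new))           # the five rows of Table 2
```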

**Example 1.** For Table 1, U/C = {{x_{1},x_{2},x_{3}}, {x_{4},x_{5}}, {x_{6},x_{7}}, {x_{8},x_{9},x_{10}}, {x_{11}}}, and only the last granule {x_{11}} is exact. The corresponding PR-SADT calculated by Algorithm 1 is presented in Table 2.

## 4. The Reduction Algorithm Based on PR-SADT

#### 4.1. Positive Region Reduction Method Based on PR-SADT

In a PR-SADT, all of the objects of a rough granule share the same new decision value d_{new}. This means that the third type is changed to the first type. In a similar way, the fourth type of granule pair is changed to the second type.

**Definition 3.** Let S_{p} be a PR-SADT without repeated objects. An attribute set R⊆C is called a positive region reduct if and only if R satisfies the following two conditions:

- $\forall x,y\in U\left[{I}_{d}(x)\ne {I}_{d}(y)\Rightarrow \exists a\in R,{I}_{a}(x)\ne {I}_{a}(y)\right],$
- $\forall a\in R,\exists x,y\in U\left[{I}_{d}(x)\ne {I}_{d}(y)\wedge \left(x,y\right)\in IND\left(R-\left\{a\right\}\right)\right].$

#### 4.2. Fast Core Attribute Calculation Based on PR-SADT

**Theorem 1.** Given a PR-SADT S_{p} and the last condition attribute ${a}_{n}\in C$, if ${a}_{n}$ is a core attribute, then $\exists {x}_{k}\in U$ satisfying the conditions $({x}_{k},{x}_{k+1})\in IND\left(B\right)$, ${I}_{{a}_{n}}\left({x}_{k}\right)<{I}_{{a}_{n}}\left({x}_{k+1}\right)$, and ${I}_{d}\left({x}_{k}\right)\ne {I}_{d}\left({x}_{k+1}\right)$, where B = {a_{1}, a_{2}, …, a_{n−1}}.

**Proof.** Since a_{n} is a core attribute, there exists a granule ${\left[{x}_{i}\right]}_{B}$ with |I_{d}([x_{i}]_{B})| > 1. This means that $\exists {x}_{k},{x}_{k+1}\in {\left[{x}_{i}\right]}_{B}$ with ${I}_{d}\left({x}_{k}\right)\ne {I}_{d}\left({x}_{k+1}\right)$. Considering that S_{p} is a PR-SADT, it also has ${I}_{{a}_{n}}\left({x}_{k}\right)<{I}_{{a}_{n}}\left({x}_{k+1}\right)$. □

Theorem 1 gives a necessary condition for the core attribute a_{n}. At the same time, the conditions $({x}_{k},{x}_{k+1})\in IND\left(B\right)$ and ${I}_{{a}_{n}}\left({x}_{k}\right)<{I}_{{a}_{n}}\left({x}_{k+1}\right)$ mean that the attribute a_{n} is the unique attribute that discerns the object pair (x_{k}, x_{k+1}), and ${I}_{d}\left({x}_{k}\right)\ne {I}_{d}\left({x}_{k+1}\right)$ means that the object pair should be discerned according to Definition 3. Hence, the three conditions in Theorem 1 are also sufficient to check whether a_{n} is a core attribute or not. Based on the above conclusion, an algorithm is given as follows.

Algorithm 2. Check the last condition attribute a_{n}.
Input: a PR-SADT.
Output: flag.
1: Begin
2: flag = 0;
3: for k = 1 : |U|−1
4: if $({x}_{k},{x}_{k+1})\in IND\left(B\right)$, ${I}_{{a}_{n}}\left({x}_{k}\right)<{I}_{{a}_{n}}\left({x}_{k+1}\right)$, and ${I}_{d}\left({x}_{k}\right)\ne {I}_{d}\left({x}_{k+1}\right)$
5: flag = 1 and return;
6: end
7: end
8: end
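A Python sketch of Algorithm 2 (the row-list representation is ours: each row holds the condition values followed by d, so the second-to-last cell is the last condition attribute a_{n}):

```python
def check_last_attribute(prs):
    """Algorithm 2 (sketch): flag = 1 iff some adjacent pair is discerned only
    by the last condition attribute a_n yet has different decisions."""
    for k in range(len(prs) - 1):
        x, y = prs[k], prs[k + 1]
        if (x[:-2] == y[:-2]          # (x_k, x_{k+1}) in IND(B), B = {a_1..a_{n-1}}
                and x[-2] < y[-2]     # I_{a_n}(x_k) < I_{a_n}(x_{k+1})
                and x[-1] != y[-1]):  # I_d(x_k) != I_d(x_{k+1})
            return 1
    return 0

# PR-SADT of Table 2: a_5 is constant, so it discerns nothing and flag is 0.
table2 = [[0,0,0,1,1,3],[0,0,1,0,1,3],[0,0,1,1,1,3],[1,0,1,1,1,3],[1,1,1,1,1,1]]
print(check_last_attribute(table2))             # 0
print(check_last_attribute([[0,0,0],[0,1,1]]))  # 1: the pair differs only in the last attribute
```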

**Theorem 2.** Suppose S_{1} is the new decision table obtained when the last column of a PR-SADT S_{p} is deleted. If ${a}_{n}\notin Core\left({S}_{p}\right)$, then RED(S_{1}) $\subseteq$ RED(S_{p}) and RED(S_{1}) ≠ $\varnothing$.

**Proof.** Let RED(S_{p}) = R_{1} ∪ R_{2}, where R_{2} is the set of reducts that include the last condition attribute a_{n}. Owing to ${a}_{n}\notin Core\left({S}_{p}\right)$, R_{1} ≠ $\varnothing$. According to the relationship between S_{p} and S_{1}, it has R_{1} = RED(S_{1}). Thus, RED(S_{1}) $\subseteq$ RED(S_{p}) and RED(S_{1}) ≠ $\varnothing$. □

Theorem 2 shows that the last column can be deleted safely when a_{n} is not a core attribute. Namely, it is effective to obtain a reduct of the original decision table based on S_{1} because RED(S_{1}) $\subseteq$ RED(S_{p}) and RED(S_{1}) ≠ $\varnothing$. To reduce the running time of all of the remaining heuristic steps, it is necessary to delete the data of column a_{n}.

Of course, one can no longer obtain the reducts that include a_{n} once the last column is deleted. This shortcoming is acceptable because only one reduct is required in a heuristic reduction algorithm.

Based on Theorem 2, Algorithm 3 outputs a core attribute of S_{1}. Considering RED(S_{1}) $\subseteq$ RED(S_{p}), it has $Core\left({S}_{p}\right)\subseteq Core\left({S}_{1}\right)$. Thus, the output may not be a core attribute of the original decision table S_{p}.

In the worst case, Algorithm 3 checks the condition attributes one by one from a_{n} to a_{1}, and the time complexity is O(|U||C|^{2}/2). A more exact analysis of the time complexity is given in Section 4.4.

Algorithm 3. The special core attribute calculation algorithm.
Input: a PR-SADT.
Output: a core attribute.
1: Step 1: check the last condition attribute by Algorithm 2.
2: Step 2: if flag = 0, then delete the data corresponding to the last condition attribute and jump to Step 1; else, go to Step 3.
3: Step 3: output the last condition attribute.
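A Python sketch of Algorithm 3 (the index bookkeeping is ours; the Algorithm 2 check is inlined so the block is self-contained):

```python
def special_core_attribute(prs, n_cond):
    """Algorithm 3 (sketch): while the last condition attribute is not core
    (Algorithm 2), delete its column; return the 0-based index of the found
    attribute together with the reduced table."""
    def last_is_core(rows):
        return any(rows[k][:-2] == rows[k + 1][:-2]
                   and rows[k][-2] < rows[k + 1][-2]
                   and rows[k][-1] != rows[k + 1][-1]
                   for k in range(len(rows) - 1))
    rows = [list(r) for r in prs]
    while n_cond > 1 and not last_is_core(rows):
        for r in rows:
            del r[-2]            # delete the last condition column
        n_cond -= 1
    return n_cond - 1, rows

# For the PR-SADT of Table 2, a_5, a_4, and a_3 are deleted in turn, and the
# check first succeeds on a_2 (0-based index 1), a core attribute of the
# reduced table (which, per the text, need not be core in the original S_p).
table2 = [[0,0,0,1,1,3],[0,0,1,0,1,3],[0,0,1,1,1,3],[1,0,1,1,1,3],[1,1,1,1,1,1]]
print(special_core_attribute(table2, 5)[0])  # 1
```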

#### 4.3. Fast Positive Region Calculation Based on PR-SADT

**Theorem 3.** Let S_{p} be a PR-SADT, R = {a_{1}, a_{2}, …, a_{m}}, and U/R = {X_{1}, X_{2}, …, X_{K}}. For $\forall {X}_{i}\in U/R$, if ${X}_{i}{\displaystyle \cap}PO{S}_{R}\left(D\right)=\varnothing$, then $\exists {x}_{k},{x}_{k+1}\in {X}_{i}$ satisfying I_{d}(x_{k}) ≠ I_{d}(x_{k+1}).

**Proof.** Owing to ${X}_{i}{\displaystyle \cap}PO{S}_{R}\left(D\right)=\varnothing$, it has |I_{d}(X_{i})| > 1. Hence, $\exists {x}_{k},{x}_{k+1}\in {X}_{i}$ satisfying I_{d}(x_{k}) ≠ I_{d}(x_{k+1}). □

Based on Theorem 3, the positive region can be obtained by only comparing adjacent object pairs, whereas the classical positive region calculation has a complexity of O(|U|^{2}|C|). The positive region calculation algorithm in [29] has a complexity of O(|U||C|^{2}). In [32], the complexity of calculating the positive region is O(|U||C|log|U|).

Algorithm 4. Calculate the positive region with respect to R in a PR-SADT.
Input: a PR-SADT, attribute set R = {c_{1}, c_{2}, …, c_{m}}.
Output: the positive region with respect to R.
1: Step 1: set the default values: PR = $\varnothing$, gra = {x_{1}}, flag = 0.
2: Step 2: compare the adjacent object pairs:
For i = 1 : |U|−1
gra = gra ∪ {x_{i+1}} if $\left({x}_{i},{x}_{i+1}\right)\in IND\left(R\right)$; //discern the objects in a granule
flag = 1 if $\left({x}_{i},{x}_{i+1}\right)\in IND\left(R\right)$ and I_{d}(x_{i}) ≠ I_{d}(x_{i+1}); //the granule is rough if flag is 1
PR = PR ∪ gra if $\exists a\in R,{I}_{a}\left({x}_{i}\right)\ne {I}_{a}\left({x}_{i+1}\right)$ and flag == 0; //record the exact granule
gra = {x_{i+1}}, flag = 0 if $\exists a\in R,{I}_{a}\left({x}_{i}\right)\ne {I}_{a}\left({x}_{i+1}\right)$; //prepare for the next granule
end
3: Step 3: record the last exact granule:
If flag == 0
PR = PR ∪ gra;
end
4: Step 4: output PR.
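A Python sketch of Algorithm 4, assuming the rows are sorted so that granules with respect to `attrs` are adjacent (the representation and helper name are ours):

```python
def positive_region_sadt(prs, attrs):
    """Algorithm 4 (sketch): single pass over adjacent object pairs collecting
    the exact granules with respect to the attribute indices in attrs."""
    pr, gra, flag = [], [0], 0
    for i in range(len(prs) - 1):
        if all(prs[i][a] == prs[i + 1][a] for a in attrs):
            gra.append(i + 1)                  # (x_i, x_{i+1}) in IND(R)
            if prs[i][-1] != prs[i + 1][-1]:
                flag = 1                       # the granule is rough
        else:
            if flag == 0:
                pr.extend(gra)                 # record the exact granule
            gra, flag = [i + 1], 0             # prepare for the next granule
    if flag == 0:
        pr.extend(gra)                         # record the last exact granule
    return pr

# Example 2: PR-SADT of Table 2 with R = {a_1, a_2} -> all five objects.
table2 = [[0,0,0,1,1,3],[0,0,1,0,1,3],[0,0,1,1,1,3],[1,0,1,1,1,3],[1,1,1,1,1,1]]
print(positive_region_sadt(table2, [0, 1]))  # [0, 1, 2, 3, 4]
```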

**Example 2.** Consider the PR-SADT in Table 2 and let R = {a_{1}, a_{2}}. In Step 1, PR = $\varnothing$, gra = {x_{1}}, flag = 0. In Step 2, these parameters are calculated as shown in Figure 1. In Step 3, the last exact granule {x_{5}} is added into PR. Finally, the positive region $PR=\{{x}_{1},{x}_{2},{x}_{3},{x}_{4},{x}_{5}\}$ is output.

#### 4.4. The Attribute Reduction Algorithm Based on PR-SADT

Algorithm 5. The fast positive region reduction algorithm based on PR-SADT (FPRA).
Input: a decision table S.
Output: a complete reduct.
1: Step 1. R = $\varnothing$. Sort the original decision table.
2: Step 2. Delete the repeated objects and calculate a PR-SADT by Algorithm 1.
3: Step 3. Check the last condition attribute a_{n} by Algorithm 2. If it is a core attribute, then jump to Step 5; else, go to Step 4.
4: Step 4. Delete the last column and jump to Step 3.
5: Step 5. R = R ∪ {a_{k}}. Move the last column to the first column and sort the decision table.
6: Step 6. Calculate the positive region with respect to R by Algorithm 4. Delete the positive region.
7: Step 7. If S_{p} is null or I_{d}(S_{p}) is d_{new}, then output the reduct R; else, jump to Step 3.
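The whole of Algorithm 5 can be sketched end to end in Python (the column bookkeeping and the stopping test are our reading of the steps above; `fpra` returns 0-based indices of the original attributes):

```python
def fpra(table):
    """FPRA (sketch of Algorithm 5). Rows: condition values followed by d."""
    d_new = max(r[-1] for r in table) + 1
    rows = sorted([list(r) for r in table])
    # Step 2: build a PR-SADT without repeated objects (Algorithm 1).
    for k in range(len(rows) - 1, 0, -1):
        if rows[k - 1][:-1] == rows[k][:-1]:
            if rows[k - 1][-1] != rows[k][-1]:
                rows[k - 1][-1] = d_new
            del rows[k]
    cols = list(range(len(table[0]) - 1))  # current order of condition columns
    n_sel, reduct = 0, []                  # selected columns sit at the front
    while rows and any(r[-1] != d_new for r in rows):
        # Steps 3-4: delete the last column while it is not core (Algorithm 2).
        while len(cols) > n_sel + 1 and not any(
                rows[k][:-2] == rows[k + 1][:-2]
                and rows[k][-2] < rows[k + 1][-2]
                and rows[k][-1] != rows[k + 1][-1]
                for k in range(len(rows) - 1)):
            for r in rows:
                del r[-2]
            cols.pop()
        # Step 5: R = R + {a_k}; move its column to the front and re-sort.
        reduct.append(cols.pop())
        cols.insert(0, reduct[-1])
        for r in rows:
            r.insert(0, r.pop(-2))
        n_sel += 1
        rows.sort()
        # Step 6: positive region w.r.t. R (Algorithm 4); keep rough granules.
        keep, gra, flag = [], [0], 0
        for i in range(len(rows) - 1):
            if rows[i][:n_sel] == rows[i + 1][:n_sel]:
                gra.append(i + 1)
                if rows[i][-1] != rows[i + 1][-1]:
                    flag = 1
            else:
                if flag:
                    keep.extend(gra)
                gra, flag = [i + 1], 0
        if flag:
            keep.extend(gra)
        rows = [rows[i] for i in keep]     # delete the positive region
    return sorted(reduct)

# Table 1 of the paper: FPRA selects a_2 alone (index 1), which preserves the
# positive region {x_11}.
table1 = [
    [0,0,0,1,1,0],[0,0,0,1,1,1],[0,0,0,1,1,1],[0,0,1,0,1,0],[0,0,1,0,1,1],
    [0,0,1,1,1,0],[0,0,1,1,1,1],[1,0,1,1,1,0],[1,0,1,1,1,1],[1,0,1,1,1,2],
    [1,1,1,1,1,1],
]
print(fpra(table1))  # [1]
```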

**Completeness analysis:** For each attribute a_{i} ∈ R, there exists an object pair (x_{k}, x_{k+1}) that satisfies the conditions of Step 3 in FPRA: $({x}_{k},{x}_{k+1})\in IND\left(B\right)$, ${I}_{{a}_{n}}\left({x}_{k}\right)<{I}_{{a}_{n}}\left({x}_{k+1}\right)$, and ${I}_{d}\left({x}_{k}\right)\ne {I}_{d}\left({x}_{k+1}\right)$, where B = R_{i} ∪ {a_{1}, a_{2}, …, a_{i−1}} and ${R}_{i}=\left\{{a}_{j}\in R|j>i\right\}$. This means that the object pair (x_{k}, x_{k+1}) cannot be discerned by B. At the same time, owing to $R-\left\{{a}_{i}\right\}\subseteq B$, it is concluded that the object pair (x_{k}, x_{k+1}) cannot be discerned by R − {a_{i}}. However, the object pair can be discerned by R according to Algorithm 5. Thus, attribute a_{i} is essential for the attribute set R.

**Analysis on time complexity:** Let U_{i} and C_{i} represent the object set and the condition attribute set of the ith heuristic process, respectively. It has ${U}_{1}\supset {U}_{2}\supset \dots \supset {U}_{k-1}$ and ${C}_{1}\supseteq {C}_{2}\supseteq \dots \supseteq {C}_{k-1}$, where k = |R| is the number of attributes in the reduct R, C_{1} = C, and |U_{1}| = |U/C|.

In the ith heuristic process, Steps 3 and 4 check the condition attributes by Algorithm 2 with a complexity of O(|U_{i}||C_{i}|). Step 5 sorts the decision table, and its complexity in the ith heuristic process is also O(|U_{i}||C_{i}|). The time complexity of Step 6 includes two parts. One comes from Algorithm 4 and is represented as O(i|U_{i}|), where i is the number of attributes of R in the ith heuristic process. The other part originates from deleting the positive region, and it also has a time complexity of O(i|U_{i}|).

The number of columns deleted in the ith heuristic process is |C_{i}| − |C_{i+1}|. The heuristic subprocess S3 is performed |R| times with a time complexity of $O\left({{\displaystyle \sum}}_{i=1}^{\left|R\right|}\left|{U}_{i}\right|\left(i+\left|{C}_{i}\right|\right)\right)$.

In the best case where R = {a_{|C|}}, even a speed of O(|C||U|) is possible. In the worst case where R = C, the time complexity is $O\left(\left|U\right|\left|C\right|+{{\displaystyle \sum}}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|\left(i+\left|{C}_{i}\right|\right)+{{\displaystyle \sum}}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|\left|{C}_{i}\right|\right)$. Considering that R is the output of FPRA, the time complexity is treated as $O\left(\left|U\right|\left|C\right|+{{\displaystyle \sum}}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|\left(i+2\left|{C}_{i}\right|\right)\right)$. The time complexity of FPRA is considerably less than those of traditional algorithms, which have a time complexity of O(|U|^{2}|C|^{2}) [2,27]. To stress the advantage of FPRA, some excellent reduction algorithms are compared and listed in Table 5.

Note that the U_{i} of the algorithm in [1] is different from the U_{i} of FPRA. This means that it is hard to compare the efficiencies of the two algorithms (the algorithm in [1] and FPRA) by the time complexities in Table 5. The related experiments in Section 5 provide more direct evidence of the advantage of FPRA.

- 1. FPRA is dependent on an efficient sort function.

Comparison-based sort algorithms, such as insertion sort (O(n^{2})), Shell sort, merge sort (O(nlogn)), and quick sort (O(nlogn)), are not suitable for FPRA because of the Ω(nlogn) lower bound of comparison sorting. Instead, bucket sort algorithms can be considered because their time complexities are below O(nlogn). In fact, we did not pay special attention to how to design a sort function because many tools and software packages provide efficient sort functions. Additionally, the sortrows function or the Shuffle in MapReduce is highly recommended.

- 2. FPRA does not calculate any attribute significances.

- 3. The heuristic method of FPRA is more efficient and concise.

## 5. Experimental Results

#### 5.1. Performance Analysis

In recent years, the time complexities of reduction algorithms have been reduced from O(|U|^{2}|C|^{2}) and have entered the interval of (O(|U||C|), O(|U||C|^{2})). In order to illustrate the advantages in computational efficiency, many researchers have to apply some inexact and sealed parameters, such as U_{i}, C_{i}, etc., to describe the time complexities of the proposed algorithms. These time complexities suffer from two disadvantages.

- It is difficult to estimate the real running time from a time complexity using sealed parameters. For example, the time complexity of the fast reduction algorithm in [1] is $O(|U\left|\right|C|+{\displaystyle {\sum}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|}(\left|C\right|-i+1))$, which is less than O(|U||C|^{2}). However, there are |C| sealed parameters |U_{1}|, |U_{2}|, …, |U_{|C|}|, so it is hard to estimate the exact running time.
- It is difficult to compare the computational efficiencies of different reduction algorithms. First, the sealed parameters are influenced by the heuristic constructions and the real data sets; they have different values for different algorithms. Second, the time complexities with these sealed parameters are always very complex, such as that of FSPA.

Similarly, the time complexity of FPRA contains the sealed parameters U_{1}, U_{2}, …, U_{|R|} and C_{1}, C_{2}, …, C_{|R|}. It is very difficult to estimate the real efficiency based on the theoretical time complexity of $O\left(\left|U\right|\left|C\right|+{{\displaystyle \sum}}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|\left(i+2\left|{C}_{i}\right|\right)\right)$. In order to resolve these hard problems, we suggest an approximate time complexity of FPRA, which is simple and makes it easy to estimate the real running time. The detailed method is described as follows.

The total running time is decomposed as T = T_{1} + T_{2} + T_{3}, where T_{1}, T_{2}, and T_{3} are the running times of the three subprocesses S1, S2, and S3, respectively.

The ratios T_{1}/T, T_{2}/T, and T_{3}/T were measured, where the X-coordinate represents the ten data sets in Table 7. Some important conclusions are presented as follows.

- The S3 subprocess consumed the most running time when |R|/|C| was large. For data sets (3, 4, 6, 7), the ratios of |R|/|C| were 1, 0.75, 0.8125, and 0.8095, respectively. The related ratios of T_{3}/T were 0.811, 0.6868, 0.8295, and 0.8894.
- The S2 subprocess consumed the most running time when |R|/|C| was small. For data sets (8, 10), the ratios of |R|/|C| were 13.3% and 11.1%, respectively. The related ratios of T_{2}/T were 44.36% and 79.34%.
- The trend of T_{3}/T was similar to that of |R|/|C|; the trend of T_{2}/T was opposite to that of |R|/|C|.

The real running time was further compared with O(|U||C|^{2}) and O(|U||C||R|). The related results are presented in Figure 4.

In Figure 4, the ratios of T/T_{1} varied from 3.34 to 23.4, and the average value was 8.6. As a comparison, the average value of |C| was 40.4. Obviously, the real running time of FPRA was considerably less than O(|U||C|^{2}). The average value of |R| was 15.5, which was slightly more than the ratios of T/T_{1}.

S3 consumed the most time (T_{3}/T > 50%) when |R|/|C| > 40%, and S2 consumed the most time (T_{2}/T > 50%) when |R|/|C| < 20%. In all of the subfigures, it is easy to see that the trend of T_{2}/T was opposite to that of T_{3}/T. These features show that the real running time had a tight relationship with |R|.

There were 46 data sets with |R| > T/T_{1}; the other 14 data sets satisfied |R| < T/T_{1}. The average value of |C| for the 60 data sets was 45.5. As a comparison, the average values of T/T_{1} and |R| for the 60 data sets were 9.2252 and 15.3, respectively. In particular, for the shuttle_all, ipums97, and ipums99 data sets, the curves of |R| and T/T_{1} were very similar.

These results suggest that the real running time of FPRA approximates O(|U||C||R|), which is considerably less than O(|U||C|^{2}). It is noted that O(|U||C||R|) is an experimental result, not a theoretical one.

#### 5.2. Comparison Experiments

**Experiment 1.**

**Experiment 2.**

The compared algorithms ADM and OADM come from [38]. ADM has a time complexity of O(|U|^{2}|C|^{2}). Algorithm OADM (optimized ADM) is an optimized fast reduction algorithm with a complexity of O(|C|^{2}|U|log|U|).

**Experiment 3.**

## 6. Conclusions

Column T_{1} of Table 7 lists the exact times of sorting the original data and constructing a PR-SADT. The experimental results show that the real running time of FPRA approximates O(|U||C||R|), which is considerably less than O(|U||C|^{2}).

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Qian, Y.H.; Liang, J.Y.; Pedrycz, W.; Dang, C.Y. Positive approximation: An accelerator for attribute reduction in rough set theory. Artif. Intell.
**2010**, 174, 597–618. [Google Scholar] [CrossRef] [Green Version] - Hu, Q.H.; Liu, J.F.; Yu, D.R. Mixed feature selection based on granulation and approximation. Knowl. Based Syst.
**2008**, 21, 294–304. [Google Scholar] [CrossRef] - Wang, G.Y.; Ma, X.A.; Yu, H. Monotonic uncertainty measures for attribute reduction in probabilistic rough set model. Int. J. Approx. Reason.
**2015**, 59, 41–67. [Google Scholar] [CrossRef] - Chang, S. A novel attribute reduction method based on rough sets and its application. Int. J. Adv. Comput. Technol.
**2012**, 4, 99–104. [Google Scholar] - Hu, Q.H.; Yu, D.R.; Xie, Z.X. Neighborhood classifiers. Expert Syst. Appl.
**2008**, 34, 866–876. [Google Scholar] [CrossRef] - Liang, J.; Wang, F.; Dang, C.; Qian, Y. An efficient rough feature selection algorithm with a multi-granulation view. Int. J. Approx. Reason.
**2012**, 53, 912–926. [Google Scholar] [CrossRef] [Green Version] - Nie, S.Z.; Wang, Z.; Pujia, W.; Nie, Y.; Lu, P. Big data prediction of durations for online collective actions based on peak’s timing. Phys. A-Stat. Mech. Appl.
**2018**, 492, 138–154. [Google Scholar] [CrossRef] - Skowron, A.; Jankowski, A.; Swiniarski, R. 30 Years of Rough Sets and Future Perspectives, Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–10. [Google Scholar]
- Zong, F.; Tian, Y.D.; He, Y.N.; Tang, J.J.; Lv, Y.Y. Trip destination prediction based on multi-day GPS data. Phys. A-Stat. Mech. Appl.
**2019**, 515, 258–269. [Google Scholar] [CrossRef] - Yin, L.; Gui, W.; Yang, C.; Wang, X.; Ling, C.X. Core set analysis in inconsistent decision tables. Inf. Sci.
**2013**, 241, 138–147. [Google Scholar] [CrossRef] - Deng, S.; Yue, D.; Fu, X.; Zhou, A.H. Security risk assessment of cyber physical power system based on rough set and gene expression programming. IEEE/CAA J. Autom. Sin.
**2015**, 2, 431–439. [Google Scholar] - Hu, Q.; Xie, Z.; Yu, D. Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit.
**2007**, 40, 3509–3521. [Google Scholar] [CrossRef] - Yang, M.; Yang, P. Algorithms based on general discernibility matrix for computation of a core and attribute reduction. Control Decis.
**2008**, 23, 1049–1054. [Google Scholar] - Yao, Y.Y.; Zhao, Y. Discernibility matrix simplification for constructing attribute reducts. Inf. Sci.
**2009**, 179, 867–882. [Google Scholar] [CrossRef] [Green Version] - Lu, Z.; Qin, Z.; Zhang, Y.; Fang, J. A fast selection approach based on rough set boundary regions. Pattern Recognit. Lett.
**2014**, 36, 81–88. [Google Scholar] [CrossRef] - Qian, Y.; Liang, J. Combination entropy and combination granulation in rough set theory. Int. J. Uncertain. Fuzziness Knowl. Based Syst.
**2008**, 16, 179–193. [Google Scholar] [CrossRef] - Wang, C.; Ou, F.F. An attribute reduction algorithm based on conditional entropy and frequency of attributes. In Proceedings of the International Conference on Intelligent Computation Technology and Automation, Changsha, China, 20–22 October 2008; pp. 752–756. [Google Scholar]
- Wang, G.Y.; Zhao, J.; An, J.J.; Wu, Y. A comparative study of algebra viewpoint and information viewpoint in attribute reduction. Fundam. Inform.
**2005**, 68, 289–301. [Google Scholar] - Qian, J.; Miao, D.; Zhang, Z.; Yue, X. Parallel attribute reduction algorithms using MapReduce. Inf. Sci.
**2014**, 279, 671–690. [Google Scholar] [CrossRef] - Shu, W.H.; Qian, W.B. An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory. Data Knowl. Eng.
**2015**, 100, 116–132. [Google Scholar] [CrossRef] - Yin, L.Z.; Yang, C.H.; Wang, X.L.; Gui, W.H. An incremental algorithm for attribute reduction based on labeled discernibility matrix. Acta Autom. Sin.
**2014**, 40, 397–403. [Google Scholar] - Yao, Y.Y. Duality in rough set theory based on the square of opposition. Fundam. Inform.
**2013**, 127, 49–64. [Google Scholar] [CrossRef] - Chen, Y.M.; Miao, D.Q.; Wang, R.Z. A rough set approach to feature selection based on ant colony optimization. Pattern Recognit. Lett.
**2010**, 31, 226–233. [Google Scholar] [CrossRef] - Yang, P.; Li, J.; Huang, Y. An attribute reduction algorithm by rough set based on binary discernibility matrix. In Proceedings of the Fuzzy Systems and Knowledge Discovery, Jinan, China, 18–20 October 2008; pp. 276–280. [Google Scholar]
- Xu, Z.Y.; Yang, B.R.; Song, W. Quick computing core algorithm based on discernibility matrix. Comput. Eng. Appl.
**2006**, 42, 4–6. [Google Scholar] - Yang, M.; Sun, Z.H. Improvement of discernibility matrix and the computation of a core. J. Fudan. Univ.
**2004**, 43, 865–868. [Google Scholar] - Xu, Z.Y.; Shu, W.H.; Qian, W.B.; Yang, B.Y. Quick algorithm for computing core of the positive region based on order relation. Comput. Sci.
**2010**, 37, 208–211. [Google Scholar] - Liu, S.H.; Sheng, Q.J.; Wu, B.; Shi, Z.; Hu, F. Research on efficient algorithms for rough set methods. Chin. J. Comput.
**2003**, 26, 524–529. [Google Scholar] - Shen, J.; Lv, Y. A rapid algorithm for reduction based on positive region attribute significance. In Proceedings of the Electrical and Control Engineering (ICECE), 2010 International Conference on, Wuhan, China, 25–27 June 2010; pp. 4940–4943. [Google Scholar]
- Xu, Z.Y.; Liu, Z.P.; Yang, B.R.; Song, W. A Quick Attribute reduction algorithm with complexity of max (O(|C||U|,O(|C|
^{2}|U/C|))). Chin. J. Comput.**2006**, 29, 391–399. [Google Scholar] - Zhang, J.; Zhang, X.Y.; Xu, W.H. Lower approximation reduction based on discernibility information tree in inconsistent ordered decision information systems. Symmetry
**2018**, 10, 696. [Google Scholar] [CrossRef] [Green Version] - Zhao, Y.; Yao, Y.Y.; Luo, F. Data analysis based on discernibility and indiscernibility. Inf. Sci.
**2007**, 177, 4959–4976. [Google Scholar] [CrossRef] - Yin, L.Z.; Yang, C.H.; Wang, X.L.; Gui, W.-H. Reduction method based on attribute repulsion matrix. Control Decis.
**2013**, 28, 434–438. [Google Scholar] - Witold, P. Granular computing for data analytics: a manifesto of human-centric computing. IEEE/CAA J. Autom. Sin.
**2018**, 5, 1025–1034. [Google Scholar] - Qian, J.; Miao, D.; Zhang, Z.; Li, W. Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation. Int. J. Approx. Reason.
**2011**, 52, 212–230. [Google Scholar] [CrossRef] [Green Version] - Yao, Y.Y.; Zhao, Y.; Wang, J. On reduct construction algorithms. In Rough Sets and Knowledge Technology; Springer: Berlin/Heidelberg, Germany, 2006; pp. 297–304. [Google Scholar]
- Jehad, A.; Rami, M. An enhancement of major sorting algorithms. Int. Arab J. Inf. Technol.
**2010**, 7, 55–62. [Google Scholar] - Meng, Z.; Shi, Z. A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets. Inf. Sci.
**2009**, 179, 2774–2793. [Google Scholar] [CrossRef] - Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C. An efficient accelerator for attribute reduction from incomplete data in rough set framework. Pattern Recognit.
**2011**, 44, 1658–1670. [Google Scholar] [CrossRef] - Li, M.; Shang, C.; Feng, S.; Fan, J. Quick attribute reduction in inconsistent decision tables. Inf. Sci.
**2014**, 254, 155–180. [Google Scholar] [CrossRef] - Song, M.; Wu, Y.F. Handbook of Research on Text and Web Mining Technologies; IGI Global: Hershey, PA, USA, 2009; Chapter XLIV; pp. 766–784. [Google Scholar]

**Table 1.** A classical inconsistent decision table.

| | a_{1} | a_{2} | a_{3} | a_{4} | a_{5} | d |
|---|---|---|---|---|---|---|
| x_{1} | 0 | 0 | 0 | 1 | 1 | 0 |
| x_{2} | 0 | 0 | 0 | 1 | 1 | 1 |
| x_{3} | 0 | 0 | 0 | 1 | 1 | 1 |
| x_{4} | 0 | 0 | 1 | 0 | 1 | 0 |
| x_{5} | 0 | 0 | 1 | 0 | 1 | 1 |
| x_{6} | 0 | 0 | 1 | 1 | 1 | 0 |
| x_{7} | 0 | 0 | 1 | 1 | 1 | 1 |
| x_{8} | 1 | 0 | 1 | 1 | 1 | 0 |
| x_{9} | 1 | 0 | 1 | 1 | 1 | 1 |
| x_{10} | 1 | 0 | 1 | 1 | 1 | 2 |
| x_{11} | 1 | 1 | 1 | 1 | 1 | 1 |

**Table 2.** Positive region (PR)-SADT corresponding to Table 1.

| | a_{1} | a_{2} | a_{3} | a_{4} | a_{5} | d |
|---|---|---|---|---|---|---|
| x_{1} | 0 | 0 | 0 | 1 | 1 | 3 |
| x_{2} | 0 | 0 | 1 | 0 | 1 | 3 |
| x_{3} | 0 | 0 | 1 | 1 | 1 | 3 |
| x_{4} | 1 | 0 | 1 | 1 | 1 | 3 |
| x_{5} | 1 | 1 | 1 | 1 | 1 | 1 |

**Table 3.** The four types of granule pairs.

| | Type of Granule Pair | Decision Value Set | Discern |
|---|---|---|---|
| 1 | Two exact granules | I_{d}([x]_{C}) = I_{d}([y]_{C}) | No |
| 2 | Two exact granules | I_{d}([x]_{C}) ≠ I_{d}([y]_{C}) | Yes |
| 3 | Two rough granules | any | No |
| 4 | Exact granule and rough granule | any | Yes |

**Table 4.** The two types of object pairs in a PR-SADT.

| | Type of Object Pair | Decision Value Set | Discern |
|---|---|---|---|
| 1 | Two exact objects | I_{d}(x) = I_{d}(y) | No |
| 2 | Two exact objects | I_{d}(x) ≠ I_{d}(y) | Yes |

**Table 5.** Time complexities of different reduction algorithms.

| Algorithm | Time Complexity |
|---|---|
| FPRA in this paper | $O\left(\left|U\right|\left|C\right|+{{\displaystyle \sum}}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|\left(i+2\left|{C}_{i}\right|\right)\right)$ |
| FSPA in [1] | $O(|U\left|\right|C|+{\displaystyle {\sum}_{i=1}^{\left|C\right|}\left|{U}_{i}\right|}(\left|C\right|-i+1))$ |
| Algorithm in [38] | O(\|C\|^{2}\|U\|log\|U\|) |
| IFSPA in [39] | $O(|C{|}^{3}\left|U\right|+{\displaystyle {\sum}_{i=1}^{\left|C\right|}({(|C|-i+1)}^{2}|{U}_{i}|+{(|C|-i+1)}^{3}|{U}_{i}|)})$ |
| Algorithm in [2] | O(\|U\|^{2}\|C\|^{2}) |

**Table 6.** The data sets used in the experiments.

| | Data Set | Size \|U\| | Attributes \|C\| | Classes \|V_{d}\| |
|---|---|---|---|---|
| 1 | Dermatology | 358 | 34 | 6 |
| 2 | Backup_large.test | 376 | 35 | 19 |
| 3 | Breast-cancer-wisconsin | 683 | 9 | 2 |
| 4 | Tic-tac-toe | 958 | 9 | 2 |
| 5 | Kr_vs_kp | 3196 | 36 | 2 |
| 6 | mushroom | 5644 | 22 | 2 |
| 7 | Ticdata2000 | 5822 | 85 | 2 |
| 8 | nursery | 12960 | 8 | 5 |
| 9 | Letter-recognition | 20000 | 16 | 26 |
| 10 | Shuttle_all | 58000 | 9 | 7 |
| 11 | sensorless | 58509 | 48 | 11 |
| 12 | Connect-4 | 67557 | 42 | 3 |
| 13 | Ipums.la.97 | 70187 | 60 | 10 |
| 14 | Ipums.la.99 | 88443 | 60 | 10 |
| 15 | covertype | 581012 | 54 | 7 |

**Table 7.** Running times of the three subprocesses S1, S2, and S3.

| | Data Set | \|U\| | \|R\|/\|C\| | Time of S1 T_{1} (s) | Time of S2 T_{2} (s) | Time of S3 T_{3} (s) | Total Time T (s) |
|---|---|---|---|---|---|---|---|
| 1 | mushroom | 5644 | 7/22 | 0.047 | 0.031 | 0.079 | 0.157 |
| 2 | Ticdata2000 | 5822 | 24/85 | 0.078 | 0.251 | 0.624 | 0.953 |
| 3 | nursery | 12960 | 8/8 | 0.062 | 0 | 0.266 | 0.328 |
| 4 | Letter-recognition | 20000 | 12/16 | 0.109 | 0.062 | 0.375 | 0.546 |
| 5 | Shuttle_all | 58000 | 4/9 | 0.235 | 0.078 | 0.516 | 0.829 |
| 6 | sensorless | 58509 | 39/48 | 0.828 | 0.592 | 6.907 | 8.327 |
| 7 | Connect_4 | 67557 | 34/42 | 0.719 | 1.143 | 14.967 | 16.829 |
| 8 | Ipums.la.97 | 70187 | 8/60 | 0.750 | 1.657 | 1.328 | 3.735 |
| 9 | Ipums.la.99 | 88443 | 13/60 | 0.906 | 2.311 | 2.424 | 5.641 |
| 10 | covertype | 581012 | 6/54 | 4.140 | 39.25 | 6.078 | 49.468 |

**Table 8.** Comparison results with the PR and FSPA-PR algorithms.

| Data Sets | \|U\| | \|C\| | PR Time (s) | PR \|R\| | FSPA-PR Time (s) | FSPA-PR \|R\| | FPRA Time (s) | FPRA \|R\| |
|---|---|---|---|---|---|---|---|---|
| Dermatology | 358 | 34 | 0.8438 | 10 | 0.4375 | 10 | 0.016 | 11 |
| Backup_large.test | 376 | 35 | 0.6563 | 10 | 0.4219 | 10 | 0.016 | 9 |
| Breast-cancer-wisconsin | 683 | 9 | 0.1250 | 4 | 0.0938 | 4 | 0.016 | 5 |
| Tic-tac-toe | 958 | 9 | 0.3594 | 8 | 0.3125 | 8 | 0.031 | 8 |
| Kr_vs_kp | 3196 | 36 | 28.0313 | 29 | 21.5781 | 29 | 0.407 | 29 |
| Mushroom | 5644 | 22 | 24.875 | 3 | 20.4531 | 3 | 0.157 | 7 |
| Ticdata2000 | 5822 | 85 | 886.4531 | 24 | 296.375 | 24 | 0.953 | 24 |
| Letter-recognition | 20000 | 16 | 282.6406 | 11 | 112.625 | 11 | 0.546 | 8 |
| Shuttle_all | 58000 | 9 | 906.0625 | 4 | 712.25 | 4 | 0.829 | 4 |

**Table 9.**Comparison results with the fast algorithms in [38].

| Data Sets | \|U\| | \|C\| | ADM Time (s) | OADM Time (s) | FPRA Time (s) |
|---|---|---|---|---|---|
| Voting records | 435 | 16 | 1.375 | 0.171 | 0.015 |
| Breast Cancer Wisconsin | 683 | 9 | 2.437 | 0.093 | 0.016 |
| Tic-tac-toe | 958 | 9 | 4 | 0.136 | 0.031 |
| Kr-vs-kp | 3196 | 36 | 79.719 | 6.169 | 0.407 |
| nursery | 12960 | 8 | 1032.25 | 10.312 | 0.328 |

**Table 10.** Comparison results with Q-ARA in [40].

| Data Sets | Objects \|U\| | Attributes \|C\| | Classes \|V_{d}\| | Q-ARA Time (s) | FPRA Time (s) |
|---|---|---|---|---|---|
| waveform | 5000 | 21 | 3 | 16.466 | 0.156 |
| Wine recognition | 178 | 13 | 3 | 0.182 | 0.016 |
| Statlog heart | 270 | 13 | 2 | 0.275 | 0.015 |
| Statlog project satellite image | 6435 | 36 | 6 | 82.812 | 0.281 |
| Image segmentation | 2310 | 19 | 7 | 2.180 | 0.047 |
| Pima Indians diabetes | 768 | 8 | 2 | 0.103 | 0.015 |
| wdbc | 569 | 30 | 2 | 2.226 | 0.032 |
| wpbc | 198 | 34 | 2 | 1.328 | 0 |
| Sonar, mines vs. rocks | 208 | 60 | 2 | 0.312 | 0.031 |
| Glass identification | 214 | 9 | 7 | 0.118 | 0 |
| ionosphere | 351 | 34 | 2 | 2.211 | 0.015 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yin, L.; Jiang, Z.
A Fast Attribute Reduction Algorithm Based on a Positive Region Sort Ascending Decision Table. *Symmetry* **2020**, *12*, 1189.
https://doi.org/10.3390/sym12071189
