A Fast Attribute Reduction Algorithm Based on a Positive Region Sort Ascending Decision Table

Attribute reduction is one of the challenging problems in rough set theory. To accomplish an efficient reduction algorithm, this paper analyzes the shortcomings of the traditional methods based on attribute significance and suggests a novel reduction approach in which the traditional attribute significance calculation is replaced by a special core attribute calculation. A decision table called the positive region sort ascending decision table (PR-SADT) is defined to optimize some key steps of the novel reduction method, including the special core attribute calculation, the positive region calculation, etc. On this basis, a fast reduction algorithm is presented to obtain a complete positive region reduct. Experimental tests demonstrate that the novel reduction algorithm achieves significantly higher computational efficiency.


Introduction
Due to the development of data collection technology, ever more objects and attributes are stored. However, storing and processing all attributes can be computationally expensive and impractical [1]. To address this issue, it is necessary to omit attributes that will not seriously impact the resulting classification (recognition) error, cf. [2]. In rough set theory, an important method for solving this problem is referred to as attribute reduction [3].
Attribute reduction is one of the most important contributions and challenges in rough set theory. It deletes redundant attributes to enhance the efficiency and accuracy of knowledge abstraction technologies, such as pattern recognition, data mining, knowledge discovery, and decision analysis [4][5][6][7][8][9]. In general, classical reduction methods are divided into three types, which are referred to as positive region reduction, boundary region reduction, and entropy based reduction, respectively [10]. The positive region reduction method ignores the discernibility relationship between rough granules [11][12][13][14]. The second type ignores the discernibility relationship between rough granules with the same decision value sets [15]. The third type ignores the discernibility relationship of rough granules with the same information entropy [16][17][18]. As a comparison, positive region reduction is the most popular and widely used, especially for dynamic data sets and big data [19][20][21].
At present, a very important challenge in attribute reduction is to design an efficient and complete algorithm. It should be noted that calculating all of the reducts is an NP (non-deterministic polynomial) hard problem [22]. Therefore, most fast reduction algorithms apply a heuristic construction to calculate a single reduct. A classical heuristic algorithm first calculates the entire core set and then iterates the following heuristic process until the algorithm finishes: calculate the attribute significances of all the attributes, select the attribute with the greatest significance, alter the object set or discernibility matrix, and proceed to the next heuristic process.
To accomplish an efficient heuristic reduction algorithm, many techniques have been developed in the last twenty years. In [23,24], the researchers calculate the entire core set by analyzing all of the object pairs, and the time complexity of the core set calculation is O(|U|²|C|). By using the notion of information granules, several algorithms successfully reduce the object set from |U| to |U/C| and bring the time complexity of the core set calculation down to O(|C||U/C|²) [25,26]. Xu et al. proposed a fast core set algorithm with a complexity of O(|C||U| + |C|²|U/C|) [27]. At the same time, many formulas or methods were proposed to calculate different types of attribute significance. Some classical formulas are designed based on the positive region [28][29][30], entropy [3,16,17,18], the discernibility ability of attributes [13,14,24,31,32], the relationship between attributes [33], etc. In addition, many researchers proposed mixed formulas by combining rough set theory with other theories, such as fuzzy sets [12], ant colony optimization [23], granular computing [2,6,16,34], etc.
Although the efficiency of traditional heuristic methods has been improved by the existing techniques, some problems remain unresolved. First, the computation of attribute significance is inefficient. As a common feature, the attribute significance formula runs (2|C|−|R|+1)×|R|/2 times if the addition construction is adopted, or (|C|+|R|+1)×(|C|−|R|)/2 times if the deletion construction is adopted. These repeated calculations of attribute significance consume considerable running time. Second, when many attributes have the same significance, one of them is generally selected at random. However, different selections may make a great difference in classification accuracy [35].
To address these problems, this paper proposes a novel heuristic method. It applies a special core attribute calculation to replace the traditional attribute significance calculation. In detail, the new method only iterates the following heuristic process: calculate a single relative core attribute, alter the decision table, and proceed to the next heuristic process. The new method has a simple structure and includes three important features. First, it abandons the notion of attribute significance. Second, it only calculates a single core attribute in each heuristic process. Third, each condition attribute is checked at most once.
In order to realize the new method efficiently, some definitions and technologies are suggested. First, we define a positive region sort ascending decision table (PR-SADT), shown as Definition 1 and Algorithm 1. Next, a special core calculation algorithm is proposed (shown as Algorithms 2 and 3), which not only calculates a core attribute quickly, but also deletes some redundant column data. Besides, the traditional positive region calculation algorithm is also optimized based on PR-SADT (shown as Algorithm 4). These technologies are essential to achieve a fast reduction algorithm as in Algorithm 5.
The remainder of this paper is structured as follows. Some basic concepts are briefly reviewed in Section 2, which include attribute reduction and the positive region. In Section 3, the positive region sort ascending decision table is defined, and some related properties are discussed. In Section 4, we propose the reduction algorithm based on PR-SADT and analyze the advantages of the novel algorithm. Section 5 presents some numerical experiments to validate the efficiency of the proposed algorithm. Finally, we conclude this paper and discuss the outlook for further work in Section 6.

Preliminaries
In rough set theory, data are represented in an information table, where a set of objects is described by a finite set of attributes. An information table S is represented as the following tuple: S = (U, At, {V_a | a ∈ At}, {I_a | a ∈ At}), where U is the universe of objects, At is a finite non-empty set of attributes, V_a is the set of values of attribute a, and I_a: U → V_a is an information function that maps an object of U to exactly one value in V_a. As a special type, the information table S is also referred to as a decision table if At = C ∪ D, where C = {a_1, a_2, ..., a_n} is the condition attribute set and D = {d} is the decision attribute set. The decision table is considered to be inconsistent if two objects with the same condition values have different decision values. For example, Table 1 is a classical inconsistent decision table.
Table 1. A classical inconsistent sort ascending decision table (SADT).
Given a subset of attributes B ⊆ C, a symmetric indiscernibility relationship IND(B) is defined as follows: IND(B) = {(x, y) ∈ U × U | ∀a ∈ B, I_a(x) = I_a(y)}. The equivalence class (or granule) of an object x with respect to C is [x]_C = {y ∈ U | (x, y) ∈ IND(C)}. The set of all of the granules with respect to C forms a partition of the universe, which is described as U/C = {[x]_C | x ∈ U}. A granule is exact if it has one decision value; otherwise, it is rough. The union of all of the exact granules with respect to C is referred to as the positive region.
Given an information table S, an attribute set R ⊆ C is called a reduct if and only if it satisfies the following two conditions: (1) IND(R) = IND(C); and (2) for any a ∈ R, IND(R − {a}) ≠ IND(C). A reduct is a subset of attributes that is jointly sufficient and individually necessary to represent the same knowledge as the attribute set C [14]. In general, there are several reducts for an information table. The set of reducts is referred to as RED(S), and the intersection of all reducts is the core set, which is described as Core(S) = ∩RED(S). The core attributes are so important that they should be added to the results in the addition and addition-deletion construction methods and should not be deleted in the heuristic steps of the deletion construction method [36].
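The two reduct conditions can be checked directly by comparing the partitions induced by IND. The following Python sketch illustrates the definition on a small information table; the function names and the toy data are our assumptions, not from the paper.

```python
import numpy as np

def partition(table, attrs):
    """U/IND(B): group object indices by their values on the given attributes."""
    blocks = {}
    for i, row in enumerate(table):
        blocks.setdefault(tuple(row[list(attrs)]), []).append(i)
    return sorted(blocks.values())

def is_reduct(table, R, C):
    """R is a reduct iff IND(R) = IND(C) (jointly sufficient) and no
    attribute can be dropped from R without changing IND (individually
    necessary)."""
    if partition(table, R) != partition(table, C):
        return False
    return all(
        partition(table, [a for a in R if a != b]) != partition(table, C)
        for b in R)

# Toy information table: three condition attribute columns.
T = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]])
print(is_reduct(T, [0, 1], [0, 1, 2]))     # True: column 2 is redundant
print(is_reduct(T, [0, 1, 2], [0, 1, 2]))  # False: not individually necessary
```

This brute-force check is only meant to make the definition concrete; it enumerates partitions and is far too slow for the data sets discussed later.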

Positive Region Sort Ascending Decision Table and Its Properties
In this section, we define a sort ascending decision table (SADT) and a positive region SADT (PR-SADT) and investigate some of their important properties. These definitions and properties are important for optimizing the novel attribute reduction algorithm.

SADT
In general, a data set can be arranged in two ways: sort ascending or sort descending. Both are effective for the proposed algorithm in this paper; for convenience, only sort ascending is discussed here.
Definition 1. A decision table S is referred to as a sort ascending decision table (SADT) if and only if each pair of adjacent objects x_k and x_{k+1} satisfies one of the following conditions: (1) (x_k, x_{k+1}) ∈ IND(C); or (2) there exists an m (1 ≤ m ≤ |C|) such that (x_k, x_{k+1}) ∈ IND(B_{m−1}) and I_{a_m}(x_k) < I_{a_m}(x_{k+1}), where B_m = {a_1, a_2, ..., a_m} and B_0 = ∅.
All of the objects in a SADT are sorted based on the ordered condition attribute set C. The default order of priority is a_1 > a_2 > ... > a_{|C|}. In real applications, the order of the condition attributes can be adjusted based on prior knowledge. For example, if the test costs of the condition attributes are considered, one can place the cheap attributes earlier to calculate a reduct with a lower cost.
The SADT is easily realized by sort functions or algorithms [37], such as Bubble Sort, Selection Sort, Insertion Sort, Shell Sort, Merge Sort, Quick Sort, Heap Sort, Counting Sort, Bucket Sort, Radix Sort, etc. However, in order to obtain a fast reduction algorithm, only sort algorithms with linear time complexity O(|U||C|), such as Counting Sort, Bucket Sort, and the algorithm in [30], are suggested. Note that designing a fast sort algorithm is not discussed in this paper. In our experiments, the sortrows function is applied to construct a SADT.
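For illustration, a SADT can be constructed with any stable lexicographic sort; the following Python sketch uses numpy's lexsort in place of sortrows. The function name and the toy table are our assumptions.

```python
import numpy as np

def build_sadt(table):
    """Sort a decision table lexicographically on its condition columns.

    `table` is a 2-D array whose first columns are the condition
    attributes a1..an and whose last column is the decision attribute d.
    np.lexsort takes keys in reverse priority order, so reversing the
    condition columns makes a1 the primary key (a1 > a2 > ... > a|C|).
    """
    cond = table[:, :-1]
    order = np.lexsort(cond.T[::-1])  # primary key = first condition column
    return table[order]

# Toy inconsistent decision table: columns a1, a2, d.
S = np.array([[2, 1, 0],
              [1, 2, 1],
              [1, 2, 0],
              [1, 1, 1]])
sadt = build_sadt(S)  # rows ordered by (a1, a2); lexsort is stable
```

Because lexsort is stable, ties on the condition part keep their original order, which is all the adjacency properties below require.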

Property 1. Given an attribute set B_m ⊆ C and a SADT, the objects in any granule with respect to C or B_m are physically adjacent.
This property shows that the objects in a granule with respect to C or B_m are adjacent physically. It is thus easy to identify the repeated objects and to compute U/C.

PR-SADT
Since only positive region reducts are discussed in this paper, a positive region SADT (PR-SADT) is defined to replace the SADT when the original decision table is inconsistent. In a PR-SADT, every rough granule of the original table is relabelled with a new decision value d_new, so that the resulting table is consistent.
The repeated objects are not valuable for reduction algorithms based on the positive region. Instead, they increase the running time and the space requirement. Thus, it is necessary to delete the repeated objects. A fast algorithm for constructing a PR-SADT without repeated objects is described as Algorithm 1. Algorithm 1 only compares adjacent objects: it relabels the rough granules with d_new and deletes each repeated object x_k. Its time complexity is O(|U||C|). In the following sections, we only discuss the PR-SADT calculated by Algorithm 1; that is, a PR-SADT is assumed to contain no repeated objects for convenience.
Example 1. A data set from [10] is used to show the difference between a SADT and a PR-SADT calculated by Algorithm 1. The original data set is sorted in ascending order and is presented in Table 1; among its granules, only the last granule {x_11} is exact. The corresponding PR-SADT calculated by Algorithm 1 is presented in Table 2.
Table 2. Positive region (PR)-SADT corresponding to Table 1.
Table 2 has five objects, and there is a new decision value "3". The PR-SADT also has five granules but does not contain any rough granules or repeated objects.
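As an illustration of Algorithm 1, the following Python sketch relabels rough granules with a new decision value and deletes adjacent repeated objects in one pass over a sorted table. The function name, the encoding of d_new as max(d)+1, and the toy data are our assumptions.

```python
import numpy as np

def build_pr_sadt(sadt):
    """Turn a sorted decision table into a PR-SADT (a sketch of Algorithm 1).

    Adjacent objects with equal condition values form a granule.  Every
    rough granule (more than one decision value) is relabelled with a
    fresh decision value d_new; duplicate rows are then dropped, so each
    remaining granule holds exactly one object.
    """
    table = sadt.copy()
    cond, dec = table[:, :-1], table[:, -1]
    d_new = dec.max() + 1                       # assumed encoding of d_new
    # Granule boundaries: positions where the condition part changes.
    change = np.any(cond[1:] != cond[:-1], axis=1)
    starts = np.r_[0, np.flatnonzero(change) + 1, len(table)]
    for s, e in zip(starts[:-1], starts[1:]):
        if len(set(dec[s:e])) > 1:              # rough granule
            table[s:e, -1] = d_new
    # Delete repeated objects: adjacent identical rows after relabelling.
    keep = np.r_[True, np.any(table[1:] != table[:-1], axis=1)]
    return table[keep]

sadt = np.array([[1, 1, 1],
                 [1, 2, 0],
                 [1, 2, 1],   # rows 2-3 form a rough granule
                 [2, 1, 0]])
pr = build_pr_sadt(sadt)      # rough granule relabelled to d_new = 2, deduplicated
```

Since the input is sorted, both the granule boundaries and the repeated objects can be found by comparing adjacent rows only, matching the O(|U||C|) bound stated above.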

The Reduction Algorithm Based on PR-SADT
In this section, we will discuss how to obtain a positive region reduct by using PR-SADT in theory. Next, two efficient subalgorithms are proposed. Finally, the complete reduction algorithm based on PR-SADT is presented.

Positive Region Reduction Method Based on PR-SADT
PR-SADT is different from the original decision table because it changes and deletes some objects. To obtain a positive region reduct of the original decision table, it is necessary to provide the related description.
In general, a positive region reduction keeps the positive region of the target decision table unchanged. Although all of the granules or objects in the positive region are exact, the rough granules or objects cannot be ignored. In [10], we noted that a positive region reduction method should satisfy a discernibility matrix M = (m(i,j)), which illustrates the discernibility relationships corresponding to positive region reduction. To explain these relationships, we classify and list them in Table 3.
Table 3. The discernibility relationships corresponding to a positive region reduct.

No. | Type of Granule Pair | Decision Value Set | Discern
1 | Two exact granules | Same | No
2 | Two exact granules | Different | Yes
3 | Two rough granules | Any | No
4 | Exact granule and rough granule | Any | Yes

For the original inconsistent decision table, it is necessary to analyze both the "type of granule pair" and the "decision value set" to judge whether a granule pair should be discerned.
If the original decision table is reformed to a PR-SADT by using Algorithm 1, all of the rough granules in the original decision table are changed to exact granules with the new decision value d new . This means that the third type is changed to the first type. In a similar way, the fourth type of the granule pair is changed to the second type.
In conclusion, the discernibility relationships corresponding to positive region reduction in PR-SADT are described in Table 4. Table 4. The discernibility relationships corresponding to positive region reduction in PR-SADT.

No. | Type of Object Pair | Decision Value Set | Discern
1 | Two exact objects | Same | No
2 | Two exact objects | Different | Yes

It is worth noting that there are no rough granules or repeated objects in a PR-SADT calculated by Algorithm 1, and each granule in a PR-SADT has only one object. Therefore, the object pair is used to judge the discernibility relationship for convenience. Table 4 contains only two items, fewer than Table 3, and only the "decision value set" is necessary.
Based on Table 4, we gave a new definition on the positive region reduct, which is described as follows.
Definition 3. Let S_p be a PR-SADT without repeated objects. An attribute set R ⊆ C is called a positive region reduct if and only if R satisfies the following two conditions: (1) for any object pair (x, y) with I_d(x) ≠ I_d(y), there exists an attribute a ∈ R such that I_a(x) ≠ I_a(y); (2) for any a ∈ R, R − {a} does not satisfy condition (1).
The first condition ensures the discernibility relationship corresponding to an unchanged positive region. This means that each object pair with different decision values in the PR-SADT should be discerned. The second condition means that each attribute in a reduct is necessary. The attributes of R are jointly sufficient and individually necessary to represent a positive region reduct once a PR-SADT is constructed.

Fast Core Attribute Calculation Based on PR-SADT
In this section, a special core attribute calculation algorithm is presented for the novel heuristic reduction method.
Theorem 1. Let S_p be a PR-SADT without repeated objects and let a_n ∈ C be the last condition attribute. If a_n is a core attribute, then there exists x_k ∈ U that satisfies the following conditions: (x_k, x_{k+1}) ∈ IND(B), I_{a_n}(x_k) ≠ I_{a_n}(x_{k+1}), and I_d(x_k) ≠ I_d(x_{k+1}), where B = C − {a_n}. Considering that S_p is a PR-SADT, the second condition can also be written as I_{a_n}(x_k) < I_{a_n}(x_{k+1}).
Theorem 1 gives three necessary conditions on the last condition attribute a_n. At the same time, the conditions (x_k, x_{k+1}) ∈ IND(B) and I_{a_n}(x_k) < I_{a_n}(x_{k+1}) mean that a_n is the unique attribute that discerns the object pair (x_k, x_{k+1}), and I_d(x_k) ≠ I_d(x_{k+1}) means that the object pair should be discerned according to Definition 3. Hence, the three conditions in Theorem 1 are also sufficient to check whether a_n is a core attribute. Based on the above conclusion, an algorithm is given as follows.
Algorithm 2. Check the last condition attribute a_n.
Input: a PR-SADT S_p without repeated objects.
Output: flag.
1: flag = 0;
2: for k = 1 to |U| − 1 do
3:   if (x_k, x_{k+1}) ∈ IND(C − {a_n}) and I_d(x_k) ≠ I_d(x_{k+1}) then
4:     flag = 1 and return;
5:   end
6: end

If flag = 1, then the last condition attribute is a core attribute. In the worst case, Algorithm 2 iterates through the whole data set and has a time complexity of O(|U||C|).
The output of Algorithm 2 has two possibilities. If the last condition attribute is not a core attribute (flag = 0), one can efficiently continue the search for a core attribute by applying Theorem 2, which is described as follows.

Theorem 2. Suppose S_1 is the new decision table obtained when the last column of data of a PR-SADT S_p is deleted. If a_n ∉ Core(S_p), then RED(S_1) ⊆ RED(S_p) and RED(S_1) ≠ ∅.
Proof. Let RED(S_p) = R_1 ∪ R_2, where R_2 is the set of reducts that include the last condition attribute a_n. Owing to a_n ∉ Core(S_p), R_1 ≠ ∅. According to the relationship between S_p and S_1, R_1 = RED(S_1). Thus, RED(S_1) ⊆ RED(S_p) and RED(S_1) ≠ ∅.
Theorem 2 shows that the column of data corresponding to the last condition attribute is redundant for a heuristic reduction algorithm if a_n is not a core attribute. Namely, it is effective to obtain a reduct of the original decision table based on S_1, because RED(S_1) ⊆ RED(S_p) and RED(S_1) ≠ ∅. To reduce the running time of all of the remaining heuristic steps, it is necessary to delete the data of column a_n.
It is worth noting that it is impossible to obtain a reduct including a_n once the last column of data is deleted. This shortcoming is acceptable because only one reduct is required in a heuristic reduction algorithm. Algorithm 3 has several special features. First, it only calculates a single core attribute. Second, it deletes some redundant column data. Third, the output of Algorithm 3 is a relative core attribute. In other words, because some redundant column data has been deleted in Algorithm 3, the output is just a core attribute of S_1. Considering RED(S_1) ⊆ RED(S_p), it holds that Core(S_p) ⊆ Core(S_1). Thus, the output may not be a core attribute of the original decision table S_p.
The time complexity depends on the number of redundant condition attributes. In the worst case, where the output is a_1, the time complexity is O(|U||C|²/2). A more exact analysis of the time complexity is given in Section 4.4.

Algorithm 3. The special core attribute calculation algorithm.
Input: a PR-SADT.
Output: a core attribute.
1: Step 1: check the last condition attribute by Algorithm 2.
2: Step 2: if flag = 0, then delete the data corresponding to the last condition attribute and jump to Step 1; else go to Step 3.
3: Step 3: output the last condition attribute.
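A minimal Python sketch of Algorithms 2 and 3 is given below. It assumes a consistent PR-SADT whose last column is the decision attribute; the function names and the toy data are ours.

```python
import numpy as np

def last_attr_is_core(pr_sadt):
    """Algorithm 2 (sketch): check whether the last condition attribute
    a_n is a core attribute of a PR-SADT.

    a_n is core iff some adjacent pair agrees on all other condition
    attributes, differs on a_n, and differs on the decision value."""
    for x, y in zip(pr_sadt[:-1], pr_sadt[1:]):
        if (np.array_equal(x[:-2], y[:-2])   # equal on C - {a_n}
                and x[-2] != y[-2]           # discerned only by a_n
                and x[-1] != y[-1]):         # decisions differ
            return True
    return False

def special_core_attribute(pr_sadt, attrs):
    """Algorithm 3 (sketch): while the last condition column is not core,
    drop it (Theorem 2 says it is redundant); return the core attribute
    found and the reduced table."""
    table = pr_sadt
    while not last_attr_is_core(table):
        table = np.delete(table, -2, axis=1)   # delete last condition column
        attrs = attrs[:-1]
    return attrs[-1], table

# Toy PR-SADT: columns a1, a2, d.  Here {a1} alone is a reduct,
# so a2 is deleted and a1 is reported as the (relative) core attribute.
pr = np.array([[1, 1, 0],
               [1, 2, 0],
               [2, 1, 1]])
core, reduced = special_core_attribute(pr, ['a1', 'a2'])  # core == 'a1'
```

Deleting the trailing column keeps the table sorted with respect to the remaining attributes, so the adjacency argument behind Theorem 1 still applies after each deletion.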

Fast Positive Region Calculation Based on PR-SADT
In this section, a fast method based on PR-SADT is presented to calculate the positive region with respect to attribute set R.

Theorem 3. Let S_p be a PR-SADT and R ⊆ C an attribute set. A granule with respect to R belongs to the positive region if and only if all of its objects have the same decision value, and the objects of each granule with respect to R are physically adjacent in S_p.

Proof. In a PR-SADT, the objects in a granule with respect to attribute set R are adjacent.

Theorem 3 illustrates a simple way to identify the positive region with respect to R in a single scan. The related algorithm is described as follows.
Algorithm 4 calculates the positive region with respect to R by scanning a PR-SADT once. The time complexity is O(|U||R|), where |R| ≤ |C|. As a contrast, the time complexity of a classical positive region calculation is O(|U|²|C|). The positive region calculation algorithm in [29] has a complexity of O(|U||C|²), and in [32], the complexity of calculating the positive region is O(|U||C|log|U|).
Algorithm 4. The fast positive region calculation algorithm.
Input: a PR-SADT S_p and an attribute set R.
Output: the positive region PR.
1: Step 1: initialize PR = ∅, gra = {x_1}, flag = 0.
2: Step 2: for k = 2 to |U| do
     if (x_k, x_{k−1}) ∈ IND(R) then
       gra = gra ∪ {x_k};
       if I_d(x_k) ≠ I_d(x_{k−1}) then flag = 1; // the granule is rough
     else
       if flag == 0 then PR = PR ∪ gra; // record the exact granule
       gra = {x_k}; flag = 0; // prepare for the next granule
     end
   end
3: Step 3: if flag == 0 then PR = PR ∪ gra; // record the last exact granule
4: Step 4: output PR.

Example 2. According to Algorithm 4, the positive region of the PR-SADT in Table 2 is calculated by the following process. Suppose R = {a_1, a_2}. In Step 1, PR = ∅, gra = {x_1}, and flag = 0. In Step 2, these parameters are updated as shown in Figure 1. In Step 3, the last object x_5 is added into PR. Finally, the positive region PR is output.
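The scan of Algorithm 4 can be sketched in Python as follows; the function name and the toy data are our assumptions, and the bookkeeping mirrors the PR, gra, and flag parameters of the algorithm.

```python
import numpy as np

def positive_region(pr_sadt, attr_idx):
    """Algorithm 4 (sketch): one-pass positive region computation.

    `attr_idx` selects the columns of the attribute subset R.  Objects of
    one granule w.r.t. R are adjacent in a PR-SADT, so a single scan
    suffices: a granule joins the positive region only if all of its
    objects share one decision value.
    """
    cond = pr_sadt[:, attr_idx]
    dec = pr_sadt[:, -1]
    pos, granule, rough = [], [0], False
    for k in range(1, len(pr_sadt)):
        if np.array_equal(cond[k], cond[k - 1]):   # same granule w.r.t. R
            rough = rough or dec[k] != dec[k - 1]  # flag = 1 in the paper
            granule.append(k)
        else:                                      # granule boundary
            if not rough:
                pos.extend(granule)                # record exact granule
            granule, rough = [k], False            # prepare next granule
    if not rough:                                  # record the last granule
        pos.extend(granule)
    return pos

# Toy PR-SADT: columns a1, a2, d.
pr = np.array([[1, 1, 1],
               [1, 2, 2],
               [2, 1, 0]])
exact = positive_region(pr, [0])   # w.r.t. {a1}: granule {x1, x2} is rough
```

With R = {a1}, only the last object survives; with R = {a1, a2}, every granule is exact, matching the fact that a PR-SADT built by Algorithm 1 has no rough granules with respect to C.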


The Attribute Reduction Algorithm Based on PR-SADT
The fast positive region reduction algorithm based on PR-SADT (FPRA) is proposed as Algorithm 5, and the related flow chart is described in Figure 2.
Analysis of the completeness of FPRA: FPRA satisfies two key features. First, it adopts reduct construction by deletion. Second, each attribute in R is a core attribute with respect to the related heuristic steps. Thus, R is a complete reduct. The detailed proof is as follows. Consider any attribute a_i ∈ R. According to Step 3 in FPRA, there is an object pair (x_k, x_{k+1}) that satisfies the conditions (x_k, x_{k+1}) ∈ IND(B), I_{a_i}(x_k) < I_{a_i}(x_{k+1}), and I_d(x_k) ≠ I_d(x_{k+1}), where B = R_i ∪ {a_1, a_2, ..., a_{i−1}} and R_i = {a_j ∈ R | j > i}. This means that the object pair (x_k, x_{k+1}) cannot be discerned by B. At the same time, owing to R − {a_i} ⊆ B, it is concluded that the object pair (x_k, x_{k+1}) cannot be discerned by R − {a_i}. However, the object pair can be discerned by R according to Algorithm 5. Thus, attribute a_i is essential for attribute set R.
In conclusion, the attributes of R are jointly sufficient and individually necessary for the original data set. Thus, R is a complete reduct.
Considering an original decision table, one can adopt the algorithms in [27,30] to construct a PR-SADT with a time complexity of O(|U||C|). However, the real running times of the algorithms in [27,30] depend on programming style. In the related experimental section, we apply the sortrows function to sort a decision table.
Step 2 is accomplished by Algorithm 1, and the time complexity is O(|U||C|). Thus, the time complexity of the S1 subprocess is O(|U||C|).
In the next steps, the numbers of objects and condition attributes are different in each heuristic process. The S2 subprocess (Step 3 -> Step 4) deletes some columns of the data set, and the S3 subprocess (Step 5 -> Step 6) rearranges the PR-SADT and deletes the related positive regions (some rows of the data set). These two subprocesses reduce |U| and |C| and are highly effective in optimizing the time complexity of FPRA.
Let U_i and C_i represent the object set and condition attribute set of the i-th heuristic process, respectively. It holds that U_1 ⊃ U_2 ⊃ ... ⊃ U_{k−1} and C_1 ⊇ C_2 ⊇ ... ⊇ C_{k−1}, where k = |R| is the number of attributes in reduct R, C_1 = C, and |U_1| = |U/C|.
Step 3 is calculated with Algorithm 2, and the time complexity is O(|U_i||C_i|).
Step 5 sorts the decision table, and its complexity in the i-th heuristic process is O(|U_i||C_i|). The time complexity of Step 6 includes two parts. One comes from Algorithm 4 and is represented as O(i|U_i|), where i is the number of attributes of R used in the i-th heuristic process. The other part originates from deleting the positive region, and it also has a time complexity of O(i|U_i|).
In Algorithm 5, the S2 subprocess is performed at most |R| times, and the total time complexity of FPRA is O(|U||C| + Σ_{i=1}^{|R|} |U_i|(i + 2|C_i|)). In the best case, where R = {a_{|C|}}, even a speed of O(|C||U|) is possible. In the worst case, where R = C, the time complexity is O(|U||C|²). The time complexity of FPRA is considerably less than those of traditional algorithms, which have a time complexity of O(|U|²|C|²) [2,27]. To stress the advantage of the proposed algorithm, some excellent reduction algorithms are compared and listed in Table 5.
Table 5. Time complexity description.

Obviously, the time complexity of FPRA is less than those of the algorithms in [2,38,39]. It is worth noting that the U_i of the algorithm in [1] is different from the U_i of FPRA. This means that it is hard to compare the efficiencies of the two algorithms (the algorithm in [1] and FPRA) through the time complexities in Table 5. The related experiments in Section 5 provide more effective evidence of the advantage of FPRA.
Analysis of the characteristics of FPRA: To summarize, FPRA is complete and efficient. It has the following important features and advantages.

1. FPRA is dependent on an efficient sort function.
FPRA just repeats a simple procedure: sort -> compare -> delete, so only the most efficient sort functions are considered. All comparison-based sort algorithms, such as Bubble Sort (O(n²)), Shell Sort, Merge Sort (O(n log n)), Quick Sort (O(n log n)), etc., are not suitable for FPRA because of the O(n log n) lower bound on comparison sorting. Instead, bucket-style sort algorithms are considered because their time complexities are below O(n log n). In fact, we do not discuss how to design a sort function because many tools and software packages provide efficient sort functions; the sortrows function or the Shuffle stage in MapReduce is highly recommended.
2. The attribute significance calculation is replaced by a special core attribute calculation.
Most traditional heuristic attribute reduction algorithms provide a simple or complex definition to calculate the attribute significances of all the condition attributes. No matter how simple the definition is, it is necessary to calculate and compare the significances of all the attributes and select the most significant one. This significance calculation runs (2|C|−|R|+1)×|R|/2 times if the addition construction is adopted, or (|C|+|R|+1)×(|C|−|R|)/2 times if the deletion construction is adopted. As a comparison, the special core attribute calculation in FPRA runs at most |C| times.
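As a rough numeric illustration of this difference (the sizes |C| = 40 and |R| = 10 below are hypothetical, not taken from the paper's experiments):

```python
# Hypothetical sizes; not taken from the paper's experiments.
C, R = 40, 10  # |C| condition attributes, |R| attributes in the reduct

addition = (2 * C - R + 1) * R // 2     # significance runs, addition construction
deletion = (C + R + 1) * (C - R) // 2   # significance runs, deletion construction
special = C                             # FPRA: at most |C| core-attribute checks

print(addition, deletion, special)      # -> 355 765 40
```

Even for these modest sizes, the significance-based constructions evaluate their formula roughly an order of magnitude more often than FPRA checks core attributes.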

3. The heuristic method of FPRA is more efficient and concise.
The traditional heuristic algorithms include two kinds of calculation: the entire core set calculation before the heuristic processes and the attribute significance calculation within the heuristic processes. As a comparison, FPRA has only one kind of calculation: the core attribute calculation within the heuristic processes. In detail, FPRA calculates a single core attribute in each heuristic process, while the traditional algorithms have to calculate the attribute significances of all the remaining condition attributes.

Experimental Results
In this section, we will evaluate the proposed approach (FPRA) on several data sets from the UCI (University of California, Irvine) Repository [1,38,40,41]. The related work includes a performance analysis and comparison tests. All of the experiments on FPRA were conducted on a PC with an Intel(R) CPU G645, 2.9 GHz, and 1.81 GB of memory.
Some data sets from the UCI Repository were used in the experiments as outlined in Table 6. There are some data sets with missing values, such as Mushroom and Breast-cancer-wisconsin. For uniform treatment of all data sets, we replaced the missing values with a new value that did not appear in the original data set. Some data sets, such as sensorless, were transformed into discrete data sets by a simple uniform discretization algorithm. Specifically, each of the related continuous columns was divided into 10 equal intervals.
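A minimal sketch of such a uniform discretization for one continuous column is given below; the function name and the numpy-based implementation are our assumptions, not the paper's.

```python
import numpy as np

def uniform_discretize(col, bins=10):
    """Equal-width binning of one continuous column into `bins` intervals.

    The column's range [min, max] is split into `bins` equal intervals;
    np.digitize against the interior edges yields labels 0..bins-1,
    shifted here to 1..bins.
    """
    edges = np.linspace(col.min(), col.max(), bins + 1)
    return np.digitize(col, edges[1:-1]) + 1

col = np.array([0.0, 0.05, 0.55, 0.99])
labels = uniform_discretize(col)   # -> [1, 1, 6, 10]
```

Applying this per continuous column reproduces the "10 equal intervals" preprocessing described above for data sets such as sensorless.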

Performance Analysis
At present, the time complexities of fast reduction algorithms have moved beyond O(|U||C|²) and entered the interval (O(|U||C|), O(|U||C|²)). In order to illustrate the advantages in computational efficiency, many researchers have to apply some inexact, sealed parameters, such as U_i, C_i, etc., to describe the time complexities of the proposed algorithms. These time complexities suffer from two disadvantages.

1.
It is difficult to estimate the real running times from time complexities that use sealed parameters. For example, the time complexity of the fast reduction algorithm in [1] is expressed with sealed parameters and is less than O(|U||C|²). However, there are |C| sealed parameters |U_1|, |U_2|, ..., |U_{|C|}|, so it is hard to estimate the exact running time.

2.
It is difficult to compare the computational efficiencies of different reduction algorithms. First, these sealed parameters are influenced by the heuristic constructions and the real data sets; they have different values for different algorithms. Second, the time complexities with these sealed parameters are always very complex, such as that of FSPA.
In this paper, the proposed algorithm FPRA is also affected by the above problems. It has the sealed parameters U_1, U_2, ..., U_{|R|} and C_1, C_2, ..., C_{|R|}, and it is very difficult to estimate the real efficiency based on the theoretical time complexity of O(|U||C| + Σ_{i=1}^{|R|} |U_i|(i + 2|C_i|)). In order to resolve these problems, we suggest an approximate time complexity for FPRA that is simple and makes it easy to estimate the real running time. The detailed approach is described as follows.
We record the real running times of the three subprocesses of FPRA and analyze their features. On this basis, an experimental model of the time complexity is suggested.
Some classical data sets from the UCI Repository were applied to test the related running times, and the experimental results are listed in Table 7, where T_1, T_2, and T_3 are the running times of the three subprocesses S1, S2, and S3, respectively.
Notably, the covertype data set, with 581,012 objects and 54 attributes, was seldom reported on by the existing reduction algorithms. However, FPRA could process this data set within only 49.468 s, which indicates that FPRA is more efficient than the existing reduction algorithms. Some ratios on the time consumption of the subprocesses are presented in Figure 3. Figure 3 describes the ratios |R|/|C|, T_2/T, and T_3/T, where the X-coordinate represents the ten data sets in Table 7. Some important conclusions are presented as follows.
3. The trend of T_3/T was similar to that of |R|/|C|, while the trend of T_2/T was opposite to that of |R|/|C|.
The above features show that the real running time was influenced by |R| as well as by |U| and |C|. Next, we compared the time complexity of FPRA with O(|U||C|²) and O(|U||C||R|). The related results are presented in Figure 4.
The time complexity of S1 was O(|U||C|). Suppose the real time complexity of FPRA is similar to O(k|U||C|); then k is described by the ratio T/T_1. In Figure 4, the ratios T/T_1 varied from 3.34 to 23.4, and the average value was 8.6. As a comparison, the average value of |C| was 40.4. Obviously, the time complexity of FPRA was considerably less than O(|U||C|²). The average value of |R| was 15.5, which was slightly more than the average ratio T/T_1.
As a result, the real time complexity of FPRA is similar to O(|U||C||R|).
To obtain more accurate experimental results, we constructed 60 data sets based on six original data sets, which were shuttle_all, sensorless, connect_4, ipums.la.97, ipums.la.99, and covertype, respectively. For each original data set, we divided it into 10 parts of equal size. The first part was regarded as the first data set, the combination of the 1st part and the 2nd part was viewed as the second data set, and the combination of all ten parts was viewed as the tenth data set.
The related ratios for the real running times of the 60 data sets are shown in Figure 5. In these 60 data sets, S3 consumed the most running time (T3/T > 50%) when |R|/|C| > 40%, whereas S2 consumed the most (T2/T > 50%) when |R|/|C| < 20%. In all of the subfigures, the trend of T2/T was opposite to that of T3/T. These features show that the real running time had a tight relationship with |R|.
Next, we evaluated the real time complexity with the 60 data sets.
In the 60 data sets in Figure 6, the curves for |C| were higher than the other curves, which shows that the real time complexity of FPRA was considerably less than O(|U||C|²). Forty-six data sets had |R| > T/T1, and the other fourteen satisfied |R| < T/T1. The average value of |C| over the 60 data sets was 45.5; as a comparison, the average values of T/T1 and |R| were 9.2252 and 15.3, respectively. In particular, for the shuttle_all, ipums97, and ipums99 data sets, the curves of |R| and T/T1 were very similar.
As a result, the real time complexity of FPRA could be evaluated as O(|U||C||R|), which is less than O(|U||C|²). Note that O(|U||C||R|) is an experimental result, not a theoretical one.

Comparison Experiments
To illustrate the advantage of FPRA, it was compared with several existing fast reduction algorithms that also calculate a positive region-based reduct.
In order to obtain fair and objective conclusions, all the running times of the compared algorithms were taken from the related literature; that is, they were reported by the original researchers. At the same time, we used a similar PC and the same data sets to obtain the real running times of FPRA. This approach avoids the influence of the researchers' programming habits and makes the conclusions objective.

Experiment 1.
FPRA was compared with the classical reduction algorithm and the optimized algorithm in [1]. The experimental results are listed in Table 8.
PR is a classical reduction algorithm based on the positive region, and FSPA-PR is an optimized reduction algorithm proposed in [29]. The running times of PR and FSPA-PR are taken from [1].
In Table 8, three reducts of FPRA were larger than those of PR, and two reducts (Backup_large.test and Letter-recognition) were smaller. This is due to the different heuristic constructions: FPRA constructs its reduct by deletion, while PR and FSPA-PR construct reducts by addition. In [33], we noted that reduct construction by deletion has a strong conservative property; as the price for obtaining a complete reduct, construction by deletion is less effective at obtaining a minimal reduct.
In Table 8, FPRA clearly exhibited the best time efficiency on the nine data sets, and PR performed the worst. The ratios of running time FPRA/PR varied from 0.09% to 12.8%, and the ratios FPRA/FSPA-PR from 0.12% to 17.6%. On average, over the nine data sets, the time consumption of FPRA was 0.14% of that of PR and 0.26% of that of FSPA-PR. These results show that the proposed algorithm FPRA was surprisingly efficient.

Experiment 2.
The proposed algorithm was also compared with the algorithms in [38]; the results are shown in Table 9. Algorithm ADM (Algorithm based on the Discernibility Matrix) is a classical reduction algorithm based on the discernibility matrix and discernibility function; its complexity is O(|U|²|C|²). Algorithm OADM (Optimized ADM) is a fast reduction algorithm proposed in [38] with a complexity of O(|C|²|U|log|U|). Table 9 shows that the running time of FPRA was considerably less than those of the compared algorithms: the ratios FPRA/ADM were only from 0.03% to 1.09%, and the ratios FPRA/OADM from 3.18% to 27.79%. On average, over the five data sets, the time consumption of FPRA was 0.071% of that of ADM and 4.72% of that of OADM.

Experiment 3.
We compared FPRA with the reduction algorithm in [40]; the results are shown in Table 10.
Note that the running times of Q-ARA (Quick Assignment Reduction Algorithm) were reported in [40] and tested on a similar PC. Table 10 shows that the running time of FPRA was considerably less than that of Q-ARA: the ratios of running time FPRA/Q-ARA were only from 0 to 14.56%. On average, over the 11 data sets, the time consumption of FPRA was 0.56% of that of Q-ARA.

Conclusions
In this paper, we proposed a unique and innovative heuristic method, which applies a special core attribute calculation in place of the traditional attribute significance calculation. The method is concise, and each conditional attribute is checked at most once.
The key to the proposed method is a sort function: the surprising running efficiency of FPRA depends on the sortrows function. The T1 column of Table 7 lists the exact times for sorting the original data and constructing the PR-SADT.
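To make the role of sorting concrete, the following simplified sketch (an illustration of the idea, not the paper's actual implementation) shows how a single lexicographic sort groups objects into equivalence classes, so that the consistent classes — the positive region — can be collected in one linear scan:

```python
# Sketch: compute the positive region of a decision table via sort + scan.
# Each row is (condition attribute values..., decision value); all discrete.

def positive_region(table, n_cond):
    """Return indices of objects in the positive region w.r.t. all
    n_cond condition attributes."""
    # Lexicographic sort puts each equivalence class into a contiguous run.
    order = sorted(range(len(table)), key=lambda i: table[i][:n_cond])
    pos, block, decisions = [], [], set()
    prev = None
    for i in order:
        key = table[i][:n_cond]
        if key != prev:                 # a new equivalence class starts
            if len(decisions) == 1:     # previous class had one decision value
                pos.extend(block)       # -> it belongs to the positive region
            block, decisions, prev = [], set(), key
        block.append(i)
        decisions.add(table[i][n_cond])
    if len(decisions) == 1:             # flush the last class
        pos.extend(block)
    return sorted(pos)

# Objects 0 and 1 share condition values but disagree on the decision,
# so only objects 2 and 3 are in the positive region.
table = [(1, 0, 'yes'), (1, 0, 'no'), (2, 1, 'yes'), (0, 1, 'no')]
print(positive_region(table, 2))  # [2, 3]
```

The only operations are sorting and comparison, which is why an efficient row-sorting routine dominates the running time.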
The experimental analysis shows that the real time complexity of FPRA was less than O(|U||C|²). The proposed algorithm FPRA is also appropriate for big data reduction because it only uses two basic operations (sorting and comparison), and MapReduce (a programming model for big data) provides efficient sort technology. This issue will be addressed in future work.