An Accelerating Reduction Approach for Incomplete Decision Table Using Positive Approximation Set

Due to the explosive growth of data collected by various sensors, it has become a difficult problem determining how to conduct feature selection more efficiently. To address this problem, we offer a fresh insight into rough set theory from the perspective of a positive approximation set. It is found that a granularity domain can be used to characterize the target knowledge, because of its form of a covering with respect to a tolerance relation. On the basis of this fact, a novel heuristic approach ARIPA is proposed to accelerate representative reduction algorithms for incomplete decision table. As a result, ARIPA in classical rough set model and ARIPA-IVPR in variable precision rough set model are realized respectively. Moreover, ARIPA is adopted to improve the computational efficiency of two existing state-of-the-art reduction algorithms. To demonstrate the effectiveness of the improved algorithms, a variety of experiments utilizing four UCI incomplete data sets are conducted. The performances of improved algorithms are compared with those of original ones as well. Numerical experiments justify that our accelerating approach enhances the existing algorithms to accomplish the reduction task more quickly. In some cases, they fulfill attribute reduction even more stably than the original algorithms do.


Introduction
With the wide usage of a diversity of advanced sensors, heterogeneous information acquisition in real-world applications has become much more simple. It also brings the challenge of dealing with a huge amount of data collected by these sensors and generating useful information. To address this challenge, multiple intelligent computing approaches were proposed, e.g., fuzzy set theory, Dempster-Shafer evidence theory, and rough set theory. Rough set theory (RST) is considered as a generalization of set theory for analyzing and processing a variety of data sets consisting of incomplete, imprecise, inconsistent, or uncertain data. It originated from Zdzislaw I. Pawlak [1] and has been identified as a creative and innovative mathematical tool in the last two decades. The rough-set-based data mining approaches have superiority in that they need no prior information, in contrast with other widely utilized strategies, such as SVM, PCA, and DNN [2][3][4][5][6]. Attribute reduction, or feature selection, has become one of the hot spots in the research area of big data. In recent years, the number of objects and dimensions of data sets has been increasing exponentially, as well as the quantity of large-scale data sets. For example, hundreds of thousands of attributes, which reflect various characteristics of the corresponding objects in practice, are stored in various data-sets [7]. However, a large portion of them give no benefit to the subsequent pattern recognition at all, but only take up precious storage space and consume computing time in vain. Hence, it has become a research focus to overcome this obstacle.
All conventional attribute reduction approaches can be classified into three main strategies-filtering, packing, and embedding [8]. The first strategy picks up attribute subsets on the basis of a specific type of measure, e.g., distance [9], information gain [10], dependence [11], and consistency [12]. There exist two types among these measures, one is based on distance and the other is based on consistency [13]. The second strategy adopts a particular learning algorithm to evaluate and choose attribute subsets. The third strategy is a combined strategy of the above two. Generally, the ultimate goal of rough-set-based attribute reduction is to make sure that the chosen attribute subset with lower dimension owns exactly the same discriminability as the universal set of original attributes, but does not maximize the discriminability of classes blindly [14].
The problem of attribute reduction has received increasing attention in recent decades and efforts have been made by different researchers to address various drawbacks. One of the representative methods is proposed by Skowron, who employed the discriminability matrix to retrieve all potential reducts from a given data set [15]. To fulfill the reduction task for an incomplete decision table (IDT), Skowron's method was developed by Kryszkiewicz into a generalized approach utilizing discriminability matrix [16]. Shu et al. researched an incremental attribute selection approach for the data sets with dynamic incomplete data to improve the performances of other algorithms [17][18][19]. To evaluate candidate features in incomplete data, Qian and Shu studied a feature selection approach on the basis of mutual information criterion [20]. Jin and Li investigated in a reduction algorithm based on positive region, i.e., FPR algorithm, to reduce the computation load of attribute reduction [21]. Yan and Han presented an conditional entropy-based reduction algorithm for IDT to evaluate the uncertainty of condition attributes and eliminate redundant ones [22,23]. Xie and Qin investigated the inconsistency degree and demonstrated an incremental attribute reduction algorithm in dynamic data environments [24]. Ma et al. researched a general steg analysis attribute selection approach on the basis of α-positive region reduction [25]. Jing et al. introduced the incremental mechanisms of computing a reduct with a multigranulation view and gave a method of updating reducts as the objects and attributes of DT change dynamically, or increasing simultaneously [26,27]. Sun et al. proposed an fuzzy neighborhood multi-granulation rough-set-based feature selection approach in neighborhood decision systems [28]. 
Unfortunately, some of the aforementioned methods and other reduction approaches can only deal with the issue of reduction for decision table, but not for IDT, because of the high complexity of the latter. Additionally, almost all of conventional reduction approaches for IDT would suffer from different degrees of long processing time due to large-scale computation when they process incomplete decision tables. To overcome this shortcoming, a variety of heuristic algorithms have been investigated, which can shorten computing time and reserve certain properties of corresponding IDT [29][30][31][32][33][34]. Nevertheless, their efficiencies for practical applications are still not satisfying. That is why we made our efforts to realize attribute reduction for IDT in a more intelligent and more efficient manner.
The aim of this paper is not to find a way of generating superior reducts, in contrast with most of existing attribute reduction approaches, but to study how to search for the same reduct in a more efficient way. Furthermore, the accelerating approaches for existing reduction algorithms in different rough set models, as well as their properties, are investigated in this paper. The major contributions of this research work are concluded as follows: (1) The concept of positive approximation set is constructed and one of its properties is investigated; (2) A novel heuristic accelerating approach of attribute reduction using positive approximation set for IDT is proposed; (3) The implementations of our accelerating approach are realized in different rough set models and tested by utilizing incomplete UCI data sets in the real world; (4) The performances of both computing time and stability of the proposed approach are exhibited and compared with some most recent reduction methods to verify its superiority. The simulations justify that our approach outputs precisely the same reducts as other reduction methods, while it consumes evidently less time and operates more stably in some cases. This paper is organized as follows. Some relevant preliminaries and background concepts are briefly reviewed in Section 2. The details of positive-approximation-set-based reduction approach are provided in Section 3. Section 4 conducts a series of simulations utilizing UCI data sets and gives some analysis. Section 5 draws some conclusions.

Preliminaries
For the purpose of presenting our accelerating approach, it is of significance to review some concepts of rough set concerning our main subject at the very beginning. The rough set theory was firstly proposed by Z. Pawlak to describe and tackle imprecise, uncertain, and vague concepts [1]. Both classical and generalized rough set model contain a variety of mathematical concepts and definitions. To keep our research understandable, some preliminaries are presented in this section at first. Additional mathematical foundations of this paper, described in more detail, with some examples, can be found in [22].

Classical Rough Set Model
RST-based attribute reduction begins with a given data table, i.e., an information system (IS). It consists of all objects we are interested in, as well as their features which can be described by a finite attribute set. An IS containing non-empty attribute values is considered as a complete IS, otherwise it implies as an incomplete information system (IIS). Generally, meeting empty attribute value in data mining and other data processing is almost inevitable. These empty values commonly stand for unavailable feature or inaccessible data, which may be caused by the error in measurement, the impreciseness in data acquisition, the low level of belief in the obtained data, and other potential factors. Therefore, an IIS means the existence of unavailable data or missing value in the system [35]. If an IIS contains a decision attribute which is different from other condition attributes and can indicate the category of the corresponding object, then it stands for an incomplete decision table (IDT).
An IS can be described by a pair (U, A), where U = {x 1 , . . . , x n } indicates the universe of discourse which is actually a non-empty, finite set of objects, and A = {a 1 , . . . , a m } indicates a finite attribute set. There also exists a mapping a : U → V a for any a ∈ A, where V a denotes the domain of the attribute a.
A decision table (DT) with the form of (U, C ∪ {d}) is actually a special information system, where C indicates the whole condition attribute set in DT which can reflect specific features of the target object, and d / ∈ C indicates decision attribute which implies the object's category. Let V d indicate the domain of decision attribute mapping d(x). An attribute set is actually a feature set for pattern classification, and a training pattern set or its sign set can be represented by the universe of discourse.
Let [x] R denote an equivalence relation on U, and ∅ denote an empty set. It implies that relation R is reflexive, symmetric, and transitive. Hence it can generate a partition where IND(R) indicates a equivalence class (i.e., an indiscernible class) which is generated by the relation R. In RST, it can also be considered as an elementary set of R. As for any target set X ⊆ U, the following two elementary sets of R can be used to approximate X.
They are defined as the lower and upper approximation sets of X, respectively. Furthermore, the equations of positive region, negative region, boundary region, and approximation measure are, respectively, presented as where X = ∅. The lower approximation is equivalent with the positive region of X, which denotes a subset consisting of the objects that can be undoubtedly classified as members of X. In contrast, the upper approximation consists of the objects that are possibly members of X. Moreover, the negative region consists of the objects that can be definitely ruled out as the members of X. Finally, the approximation measure α R (X) is utilized to evaluate the completeness degree of our knowledge on X.
We use * to denote empty attribute value, which means that the value of the corresponding condition attribute of the object is missing or unavailable. The IS and DT containing * attribute value are, respectively, defined as incomplete information system (IIS) and incomplete decision table (IDT). Commonly, the process of attribute reduction for incomplete data set is starts with an IDT.

Incomplete Variable Precision Rough Set Model
In the latest decade, a variety of generalized rough-set-model-based reduction approaches have been proposed and developed. This subsection is dedicated to introducing some notations concerning incomplete variable precision model for use.
Let (U, A) be an IS which owns attribute subset P ⊆ A. The definition of a binary similarity relation on U can be expressed as As a matter of fact, SIM(P) is essentially a tolerance relation on P. It can be simply obtained that SIM(P) = ∩ a∈P SIM({a}).
Let SIM(P) = ∩ a∈P SIM({a}) be an IIS, P ⊆ A be a subset of condition attributes A, and X be a subset of the universe of discourse U. The target set X can be approximated by SIM(P)X and SIM(P)X, i.e., where U SIM(P) denotes a partition of the universe of discourse U with respect to SIM(P). A classification task for DT can be characterized by DT = (U, C ∪ D), where C indicates the universe of condition attributes, D indicates the decision attribute set, and there exists C ∩ D = ∅. All objects are assumed to be partitioned by D into r disjoint sets, i.e., {X 1 , X 2 , . . . , X r }. Given a tolerance relation, SIM(P), generated from P, where P indicates a condition attribute subset P ⊆ C, then the lower and upper approximation set with respect to D can be defined, respectively, as    SIM(P)D = SIM(P)X 1 , SIM(P)X 2 , . . . , SIM(P)X r SIM(P)D = SIM(P)X 1 , SIM(P)X 2 , . . . , SIM(P)X r (9) Given POS P (D) = r i=1 SIM(P)X i , i.e., the positive region of D with respect to P. The misclassification function c and the granularity-based approximation set have been proposed to construct variable precision rough set models [36]. This model can be further generalized for acquiring a more flexible algorithm for IDT attribute reduction.
Let the pair (U, A) be an IIS, P ⊆ A be a subset of condition attributes, and X be a target subset of the universe of discourse U. The threshold β is given as β ∈ [0, 0.5], then X can be approximated by SIM(P) β X and SIM(P) β X, i.e., where they satisfy SIM(P) β X ⊆ X ⊆ SIM(P) β X.
Let the pair (U, C ∪ D) be a DT. All objects are assumed to be partitioned by D into r disjoint sets, i.e., {X 1 , X 2 , . . . , X r }. Given a tolerance relation SIM(P) generated from P, where P indicates a condition attribute subset P ⊆ C, then the lower and upper approximation set with respect to D in variable precision model can be defined, respectively, as The positive region of rough set in variable precision model can be obtained as POS , β-positive region of D with respect to P. According to the above framework, a novel algorithm can be demonstrated for attribute reduction in an incomplete variable precision model.
Let the pair (U, A) be an IIS. Given a partial order relation ≺ on 2 A (power set of A) [36], if set P is crisper than set Q, in other words Q is rougher than P, then it is definite that P≺Q satisfies (if S P (x i ) ⊆ S Q (x i ) holds for any i ∈ {1, 2, . . . , |U|}). If P = Q and P≺Q satisfy simultaneously, then we use the notation P ≺ Q .

The Positive Approximation Set of IIS and IDT
An introduction of positive approximation set is demonstrated in this subsection as a preparation for proposing our algorithm. With regard to an incomplete data set, a granularity domain, which can be employed to describe target knowledge, is provided by a covering generated from a tolerance relation. Furthermore, a sequence of granularity domains ranging from rough to crisp is determined by a corresponding sequence of condition attribute subsets with granularity (same ranging as the domains) in the power set of condition attributes.
Let the pair (U, A) be an IIS, X ⊆ U be a target subset, and P = {P 1 , P 2 , . . . , P n } be a subset family satisfying P 1 P 2 . . . P n , where P i ∈ 2 A , i = 1, 2, . . . , n. Given P i = {P 1 , P 2 , . . . , P i }, P i -lower and P i -upper approximation sets of X for IIS can be defined as where X 1 = X. It can be obtained that X k = X − k−1 j=1 P j X j for k = 2, 3, . . . , i, where i = 1, 2, . . . , n. This definition demonstrates the fact that X can be approximated by the corresponding approximation sets, i.e., P i (X) and P i (X). The P i -lower and P i -upper approximation sets of X for IIS in variable precision model can be defined, respectively, as Let the pair (U, A) be an IIS, X ⊆ U be a target subset, and P = {P 1 , P 2 , . . . , P n } be a subset family satisfying P 1 P 2 . . . P n , where P i ∈ 2 A , i = 1, 2, . . . , n. Given P i = {P 1 , P 2 , . . . , P i }, where i = 1, 2, . . . , n, it can be obtained that where . Since the positive approximation set of IIS is related to the structure of target concept X (i.e. it is related to the tolerance class in the lower approximation set of X with respect to P), the tolerance class on U can be employed to redefine the P-positive approximation set of X.
Let the pair (U, C ∪ D) be an IDT, X ⊆ U be a target subset, P = {P 1 , P 2 , . . . , P n } be a subset family satisfying P 1 P 2 . . . P n , and U/D = {X 1 , X 2 , . . . , X r } be a partition of the universe, U, with respect to D. The P-lower and P-upper approximation sets of D for IDT can be defined, respectively, as The P-lower and P-upper approximation sets of D for IDT in a variable precision model can be defined, respectively, as There exists a similar conclusion for IDT, which is POS . This implies that the granularity sequence can be used to approximate the target knowledge D from positive direction. Our accelerating reduction algorithm for IDT was mainly inspired by this conclusion.

Accelerating Reduction Approach for IDT Using Positive Approximation Set
To achieve the ultimate goal of attribute reduction, it is necessary to obtain the specific attribute subset that contains least condition attributes and reserve the same discriminability as C. Three procedures should be taken into consideration for realizing a heuristic reduction algorithm-searching strategy, significance evaluation, and termination condition.
Most of conventional heuristic algorithms for attribute reduction have been suffering from huge amounts of computation in different degree. For addressing this disadvantage, our research does not intend to design a brand new reduction algorithm directly, but to utilize the aforementioned positive approximation set to optimize the existing heuristic strategies for reduction and improve their performances.

Definitions of Condition Attribute Significance
One of modern reduction approaches proposed by Xie et al. (abbreviation as IPR) [24] is adopted in the following section. It is essentially developed from Shu's algorithm [18,19]. To realize our accelerating reduction algorithm, several definitions of condition attribute significance should be presented at first. Each of the these definitions can be utilized for the subsequent reduction process. Definition 1. Let the pair (U, C ∪ D) be an IDT, and B ⊆ C be a subset of condition attributes. As for ∀a ∈ B , the definition of the condition attribute significance of a inside B can be expressed as where γ B (D) = |POS B (D)| |U|.

Definition 2.
Let the pair (U, C ∪ D) be an IDT, and B ⊆ C be a subset of condition attributes. As for ∀a ∈ C − B, the definition of the condition attribute significance of a outside B can be expressed as SIG outer The above two definitions are provided by Qian and Liang et al. [37], and the following two come from Liang and Shi et al. [35]. Definition 3. Let the pair (U, C ∪ D) be an IDT, and B ⊆ C be a subset of condition attributes. As for ∀a ∈ B, the definition of the condition attribute significance of a inside B can be expressed as Definition 4. Let the pair (U, C ∪ D) be an IDT, and B ⊆ C be a subset of condition attributes. As for ∀a ∈ C − B, the definition of the condition attribute significance of a outside B can be expressed as On the basis of Definitions 1 and 2, the corresponding measures of significance can be utilized to construct a new algorithm in incomplete variable precision model, which is capable of reserving β-positive region with respect to the target knowledge D.

Definition 5.
Let the pair (U, C ∪ D) be an IDT, and B ⊆ C be a subset of condition attributes. As for ∀a ∈ B, the definition of the condition attribute significance of a inside B can be expressed as Definition 6. Let the pair (U, C ∪ D) be an IDT, and B ⊆ C be a subset of condition attributes. As for ∀a ∈ C − B, the definition of the condition attribute significance of a outside B can be expressed as SIG outer

Rank Reservation Property of Attribute Significance
This subsection plans to give a discussion on rank reservation property of the condition attribute significance to provide a theory fundamental for proposing our accelerating reduction algorithm. For simplicity and clarity of the content, the notation SIG outer λ (a, U, B, D) is adopted to indicate the condition attribute significance in previous subsection, where λ ∈ (1, 2, 3). Additionally, S U B (x) denotes a tolerance class generated from the object x, with respect to the attribute subset B, on the universe of discourse U. The detailed proofs of all lemmas and theorems appearing in this subsection are demonstrated in Appendixes A and B , respectively.
Firstly, two Lemmas are presented and proved aiming at investigating in the rank reservation property of the dependence based condition attribute significance for IDT.
Secondly, the theorem of rank reservation property can be proved as follows according to Lemmas 1 and 2.
Finally, to investigate in the rank reservation property of condition attribute significance in Yan's conditional entropy reduction approach for IDT [23], the following Lemma 3 is indispensable. Additionally, this property can be described by Theorems 2 and 3 in incomplete rough set model and incomplete variable precision model, respectively.   It can be concluded from the above theorems that the result of reduction would be unchanged as the object number of lower approximation set of positive approximation set for IDT is reduced. In other words, the significance rank of the selected reducts can be reserved when the positive region of positive approximation set for IDT narrows.

Accelerating Attribute Reduction Algorithms
Generally, all reduction approaches based on RST are designed to find a minimal subset consisting of no redundant attribute and reserving specific property, like the whole universe of condition attributes C. It is essentially NP-hard to seek out all potential reducts of an IDT, hence it is only necessary to search for any of them.
It is indispensable to achieve the tolerance class generated from the concerning attributes. Therefore, an accelerating algorithm of tolerance class acquisition for IDT reduction is proposed. The inspiration of this implementation partially comes from the method of radix sorting, and the computation complexity of the algorithm equates as follows: where * a k indicates the number of objects that own empty value in condition attribute a k , and V a k indicates the number of objects that own no empty value in a k . A derived result of reduced computation complexity equates O |A| 2 |U| . The analysis of computation complexity reveals that the dimension of condition attributes has greater influence in the length of computing time, compared with the amount of target objects. Based on the above discussion, an accelerating reduction approach for IDT using positive approximation set (ARIPA) is proposed. In the framework of ARIPA, the evaluation function (or termination condition) can be expressed as EF U (B, D) = EF U (C, D), which implies that the discernibility of condition attribute subset B is exactly the same as that of the universe of condition attributes C. The evaluation function can be chosen according to the original reduction algorithm we plan to accelerate. For an instance, if the original algorithm adopted is Yan's rough conditional entropy-based reduction algorithm in [23], IDT's kernel partly consists of condition attributes in red at this step.

Experiments
To investigate the efficiency and effectiveness of the proposed ARIPA and ARIPA-IVPR, four incomplete data sets are picked up from the UCI Machine Learning Database at University of California for experimental purposes. The performances of the proposed algorithms were analyzed and compared with those of other state-of-the-art algorithms to prove their superiority.

Experiments on ARIPA and ARIPA-IVPR
Due to the existence of continuous attribute values contained in the chosen incomplete data sets, Tsai's CACC discretization algorithm [38] is adopted as a preprocess before reduction to discretize continuous values into discrete ones. Another aim of this step is to reduce the computation load of subsequent steps and compress the data scale. The average CPU time of ARIPA, ARIPA-IVPR, and their competitors is counted in seconds as their running time. All simulation work is conducted on the PC with the configurations of 8GB RAM, Intel i5-8400 2.8GHz CPU, Matlab R2019a, Win10 (64 bit). The statistical results of the four incomplete data sets for simulations are summarized and analyzed, respectively, in Table 1. To compare our improved reduction algorithms with other competitors (Xie's IPR [24] and Yan's ILCE [23]), a modern approach is carried out for evaluating their computation complexities [39]. The same reduct would be obtained by each pair of the improved and original algorithm, thus we just have to make an comparison between their running times. The graphical illustrations of their performances are shown in Figures 1 and 2. In these figures, the x-axis indicates the number of data segments which increases from 1 to 20 (all objects of each incomplete data set are equally divided into 20 segments), and the y-axis indicates the corresponding running time. The experiments using incomplete data segments in different scales would make us aware of the trend of the computing time as the scale grows. Furthermore, the simulations indirectly prove that our accelerating algorithm would exhibit more outstanding performance when the incomplete data set contains tens of thousands of objects.
With regard to the framework of incomplete variable precision model, Kang's IVPR algorithm [36] is conducted as a competitor for our improved ARIPA-IVPR. The experiment results are illustrated in Figures 3-6.

Results and Discussions
It can be noticed from Figures 1-6 that the computing time of the improved algorithm increases more smoothly than that of the original algorithm as the number of data segments grow. Essentially, this consequence can be the result for the following three reasons. (1) The accelerated algorithm consumes much less computing time when the universe of discourse shrinks dramatically. (2) As for the same incomplete data segments, the original algorithms have to consume more time to evaluate the condition attribute significance of the potential reducts. (3) Our accelerating algorithm would encapsulate all concerning objects into the lower approximation set with respect to the decision attribute set during the reduction, hence it ensures that the improved reduction algorithm would consume less time to finish the reduction. These results are caused by the rank reservation property of the condition attribute significance, as discussed in Section 3.2. It provides a solution to the inefficiency of the existing heuristic algorithms for IDT reduction. Since the reducts from different algorithms are identical, the same classification accuracy can be ensured in subsequent process, no matter what type of classifier is chosen, e.g., SVM, decision tree, etc. It is possible that the accelerating reduction algorithm we propose leads in the problem of overfitting, in the perspective of classifier. However, discussion on this issue is not included in this paper.     It also can be observed that the computing time rises up for most of time when the number of data segments increases in each experiment, no matter which incomplete data set, competitor algorithm, style of rough set model, or value of β we choose. However, not all the curves show a strictly monotone increasing function, and the opposite may take place in a few cases (e.g., in Figure 4). 
This phenomenon a result of the possibility that the new added data segment, in contrast to the existing ones, may contain specific knowledge that is more useful for attribute reduction as well as compressing the computation load.  The computation complexities of state-of-the-art [23] and improved algorithms are analyzed step by step in Table 2. It can be observed that the major difference in computation aspect is brought by step 2 and steps 5-9 of the algorithms. Among these steps, step 2 corresponds to the evaluation of the attribute significance of potential reducts, and steps 5-9 correspond to the loop which includes the evaluation of the positive region of positive approximation set and the heuristic search for real reducts. Moreover, Figures 7 and 8 indicate that our improved algorithms run more efficiently than the original algorithms, both in rough set model and variable precision model (β = 0.0, 0.1, 0.2). Hence, the experiment results justify the conclusion that the accelerated algorithms are more efficient for reduction in practical applications. Table 2. Analysis on the computation complexity of existing and accelerated attribute reduction algorithm.

Steps 5-9 Other Steps
Existing algorithm

Algorithm Stability Analysis
To evaluate the stability of both original and improved algorithms, ten-fold crossvalidation was applied. In this validation, a given data set is randomly parted into ten nearly equally sized subsets. Nine of them are treated as training sets, and one last subset is reserved as a testing set to evaluate the classification accuracy. The distance between two different reducts C i and C j is evaluated in Equation (25), where C 0 and C i indicate the reducts generated from U and the ith segment of U, respectively.
Furthermore, by using the statistical method, mean (i.e., µ in Equation (26)) and standard deviation (i.e., σ in Equation (27)) of the above ten distances of the segments can be determined as well.
The stability of the reduct outputted from the heuristic reduction algorithm is characterized by standard deviation of those distances. More specifically, lower the standard deviation gets, more stably the corresponding reduction algorithm would run. The stability analysis of each pair of algorithms is carried out in Tables 3-5.   In Table 3, it can be found that ARIPA-IPR consumes less computing time, and its lower standard deviation of computing time (in ten-fold cross-validation) implies better robustness than that of the original IPR algorithm. On the other hand, they both own exactly the same stability, as well as the same standard deviation of stability. By borrowing the positive approximation set approach, ARIPA-IPR not only reduces the computation of IPR evidently and enhances its robustness simultaneously, but also holds the same stability as IPR by generating the identical reduct. Similarly, same conclusions can be drawn from Table 4 for the pair of ARIPA-ILCE and ILCE. With regard to the pair of ARIPA-IVPR and IVPR in Table 5, the former half of the above conclusion still holds, and the stability of them are identical if β = 0.0. This result can be explained reasonably by Theorem 3. While in case of β = 0.1 or 0.2, ARIPA-IVPR runs more stably than IVPR does. This is because in incomplete variable precision rough set model, the selected reduct (which is with respect to a nonzero β), would become closer to the reduct generated from the universe of condition attributes, when the norm of the lower approximation set of the positive approximation set decreases.
When β varies between 0.0 and 0.5, it can be noticed that the reducts output from our reduction algorithm may be diverse in different cases. This result can be explained through the definition of incomplete variable precision model, i.e., the concerning inclusion degree function is non-monotonic. Although this does not meet our expectation, it is still meaningful because of the following reasons. (1) When the improved reduction algorithm meets its termination condition, the output reduct would definitely contain all the condition attributes that are included in the reduct output from the original reduction algorithm, on the condition that the compressed subset of universe U i is nonempty. Since the termination condition demands that γ Since all of the objects in the universe of discourse U are encapsulated into the lower approximation set with respect to the decision attribute in this case, the improved reduction algorithm, which provides us with a more satisfying option, would have a better approximation capability than the original one.

Conclusions
To address the disadvantage of conventional methods of attribute reduction for incomplete decision table in the aspect of computational efficiency, the concept of a positive approximation set based on a tolerance relation is introduced. Additionally, the rank reservation property of the condition attribute significance is discussed, and it is employed to accelerate other existing reduction algorithms under various heuristic strategies. As a result, a novel accelerating reduction approach for IDT using positive approximation set (ARIPA) is proposed. Several state-of-the-art reduction algorithms in different rough set models are accelerated by ARIPA. To assess the performances of both improved and original reduction algorithms, a series of experiments utilizing four real-world incomplete data sets are conducted. The results show that each ARIPA-improved algorithm is guaranteed to output the same reduct as the original reduction algorithm, while finishing attribute reduction more efficiently and, in some cases, more stably. The average computing time of ARIPA-IPR, ARIPA-ILCE, and ARIPA-IVPR is cut to 33.32%, 55.21%, and 43.62% of the original, respectively. The proposed approach has been verified to be distinctly effective for incomplete data sets with large numbers of objects. However, how to ensure its high efficiency for incomplete data sets with hundreds of thousands of dimensions (condition attributes) remains an open issue for future work.

Acknowledgments:
We would like to thank our students Meng Tian, Jiyuan Yang, Tongyuehao Zhou, Hui Chen, Yunhao Wang, and Yuxin Zhao for their dedicated implementation work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

RST  rough set theory
IS   information system
IIS  incomplete information system
DT   decision table
IDT  incomplete decision table
POS  positive region

Appendix A

Lemma A1. Let A, B, and C be sets such that A′ = A ∪ C, B′ = B ∪ C, A′ ⊆ B′, and (A ∪ B) ∩ C = ∅. Then A ⊆ B.

Proof. Let a ∈ A. Since A′ = A ∪ C and A ⊆ A′, we have a ∈ A′. Since A′ ⊆ B′, we can derive a ∈ B′. Because (A ∪ B) ∩ C = ∅, we have A ∩ C = ∅; furthermore, a ∉ C. Since B′ = B ∪ C, a ∈ B′, and a ∉ C, we obtain a ∈ B. Finally, A ⊆ B. QED.
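Lemma A1 is a purely set-theoretic statement, so it can be spot-checked mechanically. The sketch below (a randomized sanity check, not part of the paper's formal apparatus) verifies it over many randomly generated triples of sets:

```python
import random

def lemma_a1_holds(A, B, C):
    """If A' = A∪C, B' = B∪C, A' ⊆ B', and (A∪B)∩C = ∅, then A ⊆ B."""
    A_p, B_p = A | C, B | C
    if A_p <= B_p and not ((A | B) & C):
        return A <= B
    return True  # premises not satisfied; nothing to check

random.seed(0)
for _ in range(1000):
    # Random subsets of a small universe.
    A = {x for x in range(8) if random.random() < 0.4}
    B = {x for x in range(8) if random.random() < 0.4}
    C = {x for x in range(8) if random.random() < 0.4}
    assert lemma_a1_holds(A, B, C)
```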
Lemma A2. Let the pair (U, C ∪ D) be an IDT, such that B ⊆ C, a ∈ C − B, and $U' = U - \mathrm{POS}_B^U(D)$ are satisfied. As for ∀x′ ∈ U′, if $S_{B\cup\{a\}}^{U'}(x') \subseteq S_D^{U'}(x')$, then $S_{B\cup\{a\}}^{U}(x') \subseteq S_D^{U}(x')$.

Proof. Since $U' = U - \mathrm{POS}_B^U(D)$ and x′ ∈ U′ are satisfied, two notations X and Y can be defined as follows: $X = S_{B\cup\{a\}}^{U}(x') \cap \mathrm{POS}_B^U(D)$ and $Y = S_{D}^{U}(x') \cap \mathrm{POS}_B^U(D)$.
Therefore, we can obtain $S_{B\cup\{a\}}^{U}(x') = S_{B\cup\{a\}}^{U'}(x') \cup X$ and $S_{D}^{U}(x') = S_{D}^{U'}(x') \cup Y$. According to Y's formula, it can be derived that $Y \subseteq \mathrm{POS}_B^U(D)$. Thus, there exist $Y \cap S_{B\cup\{a\}}^{U'}(x') = \emptyset$ and $Y \cap S_{D}^{U'}(x') = \emptyset$, i.e., $Y \cap \bigl(S_{B\cup\{a\}}^{U'}(x') \cup S_{D}^{U'}(x')\bigr) = \emptyset$. Then, according to $S_{B\cup\{a\}}^{U'}(x') \subseteq S_{D}^{U'}(x')$ and Lemma 1, it can be derived that $S_{B\cup\{a\}}^{U}(x') \subseteq S_{D}^{U}(x')$. QED.
Lemma A3. Let the pair (U, C ∪ D) be an IDT, such that B ⊆ C and $U' = U - \mathrm{POS}_B^U(D)$ are satisfied. As for ∀x′ ∈ U′, we have $|S_{B}^{U}(x') \cap S_{D}^{U}(x')| = |S_{B}^{U'}(x') \cap S_{D}^{U'}(x')| + |S_{B}^{U}(x') \cap \mathrm{POS}_B^U(D)|$.

Proof. Since $U' = U - \mathrm{POS}_B^U(D)$ and x′ ∈ U′, two notations X and Y can be defined as follows: $X = S_{B}^{U}(x') \cap \mathrm{POS}_B^U(D)$ and $Y = S_{D}^{U}(x') \cap \mathrm{POS}_B^U(D)$.
Then it can be obtained that $S_{B}^{U}(x') = S_{B}^{U'}(x') \cup X$ and $S_{D}^{U}(x') = S_{D}^{U'}(x') \cup Y$. According to the definitions of X and Y, we can derive that $X \subseteq \mathrm{POS}_B^U(D)$ and $Y \subseteq \mathrm{POS}_B^U(D)$. Thus, there exist $Y \cap S_{B}^{U'}(x') = \emptyset$ and $X \cap S_{D}^{U'}(x') = \emptyset$. As for ∀x″ ∈ X, it can be obtained that $x'' \in S_{B}^{U}(x')$. It can be derived that $x' \in S_{B}^{U}(x'')$ on the basis of the symmetry of the tolerance relation. Furthermore, it can be derived that $S_{B}^{U}(x'') \subseteq S_{D}^{U}(x'')$ on the basis of the definition of the positive region, hence $x' \in S_{D}^{U}(x'')$. Similarly, it can be obtained that $x'' \in S_{D}^{U}(x')$. Since for ∀x″ ∈ X there exist $x'' \in \mathrm{POS}_B^U(D)$ and $x'' \in S_{D}^{U}(x')$, we have $x'' \in Y$, i.e., $X \subseteq Y$. Therefore, the following formula can be derived: $S_{B}^{U}(x') \cap S_{D}^{U}(x') = \bigl(S_{B}^{U'}(x') \cap S_{D}^{U'}(x')\bigr) \cup X$.
And since $X \subseteq \mathrm{POS}_B^U(D)$, we can obtain $S_{B}^{U'}(x') \cap S_{D}^{U'}(x') \cap X = \emptyset$ and $|S_{B}^{U}(x') \cap S_{D}^{U}(x')| = |S_{B}^{U'}(x') \cap S_{D}^{U'}(x')| + |X|$. Suppose that $\mathrm{POS}_B^U(D) = \{x_1, x_2, \ldots, x_q\}$, with $x_i \in \mathrm{POS}_B^U(D)$, i = 1, 2, . . . , q. The notation $EN^{U'}(D|B)$ is used to indicate the rough conditional entropy on the universe of discourse U′ in Yan's approach [23].
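The counting identity of Lemma A3 can likewise be verified numerically. The sketch below builds a small illustrative incomplete decision table (the values are assumptions, not the paper's data), computes tolerance classes under the usual '*'-compatibility convention, and checks the identity for every object of the compressed universe U′:

```python
def tol(U, table, attrs, x):
    """Tolerance class of x on universe U: '*' matches any value."""
    return {y for y in U
            if all(table[x][a] == table[y][a]
                   or '*' in (table[x][a], table[y][a]) for a in attrs)}

# Toy incomplete decision table (illustrative values, not from the paper).
table = {
    'x1': {'a': 1,   'b': 0,   'd': 0},
    'x2': {'a': 1,   'b': '*', 'd': 0},
    'x3': {'a': '*', 'b': 1,   'd': 1},
    'x4': {'a': 0,   'b': 1,   'd': 1},
    'x5': {'a': 0,   'b': 0,   'd': 1},
}
U = set(table)
B, D = ['a', 'b'], ['d']

# Positive region: objects whose B-tolerance class fits inside a D-class.
pos = {x for x in U if tol(U, table, B, x) <= tol(U, table, D, x)}
U_p = U - pos  # the compressed universe U' = U - POS_B^U(D)

for xp in U_p:
    SB_U,  SD_U  = tol(U, table, B, xp),   tol(U, table, D, xp)
    SB_Up, SD_Up = tol(U_p, table, B, xp), tol(U_p, table, D, xp)
    X = SB_U & pos  # the part of S_B^U(x') removed with the positive region
    # The counting identity of Lemma A3:
    assert len(SB_U & SD_U) == len(SB_Up & SD_Up) + len(X)
```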
In addition, the following equation can be derived according to Lemma A3. Therefore, there exists

Theorem A3. Let the pair (U, C ∪ D) be an IDT, such that B ⊆ C, $U' = U - \mathrm{POS}_B^U(D)$, and β = 0 are satisfied. As for ∀a, b ∈ C − B, if $\mathrm{SIG}^{outer}$