Article

A Heuristic Attribute-Reduction Algorithm Based on Conditional Entropy for Incomplete Information Systems

by
Yanling Bao
* and
Shumin Cheng
College of Mathematics and System Science, Xinjiang University, Urumqi 830047, China
*
Author to whom correspondence should be addressed.
Axioms 2024, 13(11), 736; https://doi.org/10.3390/axioms13110736
Submission received: 7 September 2024 / Revised: 16 October 2024 / Accepted: 21 October 2024 / Published: 25 October 2024

Abstract

With the continuous expansion of databases, the extraction of useful information has become an urgent research topic in many fields. As an effective method to remove redundant attributes, attribute reduction demonstrates an extraordinary ability to simplify information systems. This paper applies a novel form of conditional entropy to investigate attribute reduction in incomplete information systems. Firstly, a novel definition of conditional entropy is introduced based on the tolerance relation. Additionally, in order to reduce time complexity, we propose a binsearch heuristic attribute-reduction algorithm with conditional entropy as heuristic knowledge. Furthermore, two examples are used to illustrate the feasibility and validity of the reduction algorithm.

1. Introduction

Human society has been in the era of network information for many years. How to efficiently extract useful information from information systems with massive data has become a hot and difficult topic in big data analysis and machine learning. Sometimes the data are incomplete, and many redundant attributes exist in an information system. How to reduce the number of redundant attributes without changing classification ability is a tricky question in many fields. Fortunately, attribute reduction, as one of the core topics of rough set theory, can reduce redundant attributes effectively and has been studied by many researchers [1,2,3,4,5,6,7,8]. Attribute-reduction methods for complete information systems have been well established; however, they have limitations in dealing with attribute-reduction problems in incomplete information systems, since objects with missing data cannot be classified effectively by an equivalence relation. Therefore, developing reliable attribute-reduction methods for incomplete information systems is an important direction for current research [9,10,11].
So far, there are two main kinds of algorithms for attribute reduction. One is the discernibility matrix-based method proposed by Skowron [12]. In recent years, Liu et al. [13,14,15] published several works on attribute-reduction algorithms based on the discernibility matrix, but the high time complexity of transforming a conjunctive normal form into a disjunctive normal form has always been a problem. Although the time complexity of this kind of algorithm is high, it can obtain all reducts. The other is the heuristic algorithm, which has low time complexity and is effective for large data sets, but it can only obtain one reduct rather than all reducts. For information systems with a large number of attributes, a heuristic algorithm is the better choice. Heuristic knowledge is the core element of a heuristic algorithm, so it is crucial to use appropriate heuristic knowledge to obtain an accurate reduct. In many heuristic algorithms, attribute significance is adopted as heuristic knowledge, and information entropy is among the most-favored measures of attribute significance, such as Shannon entropy [16], joint maximum entropy [17], and relative decision entropy [18]. Specifically, Dai et al. [19] introduced a novel form of conditional entropy to calculate attribute significance and further constructed three attribute-selection algorithms for incomplete decision systems from the viewpoints of exhaustive search, greedy (heuristic) search, and probabilistic search. To reduce the complexity of attribute-reduction algorithms, Thuy and Wongthanavasu [20] adopted stripped-quotient sets to identify several reducts based on information entropies; however, only one candidate attribute can be tested at a time in their algorithms. Zou et al. [21] employed a supervised strategy to update the conditional neighborhood entropy of three-layer granulation, considered the mutual influence between conditional attributes, and then proposed a neighborhood rough set attribute-reduction algorithm. Chen and Zhu [22] proposed variable-precision multigranulation rough sets (VPMGRSs) and designed a heuristic algorithm to incorporate the α lower distribution reduct into the multigranulation environment. On the other hand, to reduce time complexity, Zhou et al. [23] designed a binsearch heuristic algorithm based on attribute significance, in which attribute significance is measured with the help of a similarity matrix. It turns out that the idea of binsearch can effectively reduce the time complexity of attribute reduction. To date, however, research on binsearch heuristic algorithms has been scarce.
In summary, algorithms based on the discernibility matrix can identify all reducts, but they have high time complexity when transforming data representations. Heuristic algorithms, on the other hand, generally have higher computational efficiency and are suitable for large datasets, but they are limited to identifying a single reduct rather than all reducts. Therefore, this paper proposes a novel form of conditional entropy as heuristic knowledge and combines it with the idea of binsearch to investigate an attribute-reduction algorithm for incomplete information systems.
The remainder of this paper is organized as follows. In Section 2, we review some basic definitions and give the calculation method of conditional entropy based on the tolerance relation. In Section 3, the method for acquiring core attributes and the binsearch heuristic attribute-reduction algorithm are proposed. In Section 4, two examples are given to illustrate the feasibility and availability of our algorithm. Section 5 concludes the paper.

2. Preliminaries

In this section, we recall some basic definitions about incomplete information systems, tolerance relations, and conditional entropy.
Definition 1
([23]). Let $U = \{x_1, x_2, \ldots, x_n\}$ be an object set, $A = \{a_1, a_2, \ldots, a_m\}$ be an attribute set, and $f: U \times A \rightarrow V$ be an information function. The value set is defined as $V = \bigcup_{a \in A} V_a$, where $V_a$ represents the set of evaluation values of all objects associated with attribute $a \in A$. Then, $S = (U, A, f, V)$ is called an information system. If there exist $a \in A$ and $x \in U$ such that $f(x, a) = \ast$, where "$\ast$" represents a missing value, then $S = (U, A, f, V)$ is an incomplete information system.
Sometimes, $S = (U, A, f, V)$ is abbreviated as $S = (U, A)$ as long as there is no confusion. If $A = C \cup D$ and $C \cap D = \emptyset$, then $(U, C \cup D)$ is called a decision information system, where $C$ and $D$ are termed the condition attribute set and the decision attribute set, respectively.
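To make the definitions that follow easier to trace, here is a minimal sketch of one possible in-memory representation of an incomplete information system; the dictionary layout, the toy attribute values, and the use of None for the missing value "$\ast$" are illustrative assumptions of ours, not notation from the paper.

```python
# One possible layout for an incomplete information system S = (U, A, f, V):
# each object maps to its attribute values, and None stands for the missing value "*".
table = {
    "x1": {"a1": 2, "a2": 3, "d": 1},
    "x2": {"a1": 2, "a2": None, "d": 2},   # a2 is missing for x2
    "x3": {"a1": None, "a2": 3, "d": 1},   # a1 is missing for x3
    "x4": {"a1": 1, "a2": 0, "d": 2},
}
C, D = ["a1", "a2"], ["d"]                 # condition and decision attribute sets
```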
In incomplete information systems, an equivalence relation is often not effective in categorizing objects since missing data make it difficult to recognize the similarity between objects. To classify objects effectively in incomplete information systems, a tolerance relation is introduced. It regards objects as similar if they have the same values under all attributes with complete data. The specific definition is as follows.
Definition 2
([12]). Suppose that $S = (U, A)$ is an incomplete information system and $C \subseteq A$; for any $x, y \in U$, the tolerance relation $R_C$ is defined as
$$R_C = \{(x, y) \mid \forall a \in C,\ f(x, a) = f(y, a) \ \text{or} \ f(x, a) = \ast \ \text{or} \ f(y, a) = \ast\}.$$
In many situations, $f(x, a)$ is also denoted as $a(x)$ for simplicity. Obviously, $R_C$ is reflexive and symmetric, but not necessarily transitive, and $U/R_C = \{R_C(x_1), R_C(x_2), \ldots, R_C(x_{|U|})\}$ $(x_i \in U)$ is a covering of $U$, where $R_C(x_i) = \{y \in U \mid (x_i, y) \in R_C\}$ represents the tolerance class of $x_i$ with respect to $C$. Let $U_{CD} = \{x \mid R_C(x) \subseteq R_D(x)\}$; then, $U_{CD}$ is called the consistent part of $(U, C \cup D)$ [24].
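The tolerance relation of Definition 2 and the consistent part $U_{CD}$ can be sketched in a few lines of Python; the code below assumes the dictionary representation introduced above, and the helper names are our own.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[object]]   # attribute -> value, with None for a missing value

def tolerant(x: Row, y: Row, attrs: List[str]) -> bool:
    """(x, y) is in R_attrs: on every attribute the values agree or at least one is missing."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in attrs)

def tolerance_class(table: Dict[str, Row], obj: str, attrs: List[str]) -> Set[str]:
    """R_attrs(obj): all objects tolerant with obj under the given attributes."""
    return {o for o, row in table.items() if tolerant(table[obj], row, attrs)}

def consistent_part(table: Dict[str, Row], cond: List[str], dec: List[str]) -> Set[str]:
    """U_{CD} = {x | R_C(x) is a subset of R_D(x)}."""
    return {o for o in table
            if tolerance_class(table, o, cond) <= tolerance_class(table, o, dec)}
```

For example, with the toy table above, consistent_part(table, C, D) returns exactly the objects whose condition-tolerance class is contained in their decision-tolerance class.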
Theorem 1.
Let $S = (U, C \cup D)$ be an incomplete information system. If $T \subseteq C$, then $U_{TD} \subseteq U_{CD}$.
Proof. 
Since $T \subseteq C$, we have $R_C(x) \subseteq R_T(x)$ for any $x \in U$. If $x \in U_{TD}$, then $R_T(x) \subseteq R_D(x)$; furthermore, $R_C(x) \subseteq R_D(x)$. Thus, $x \in U_{CD}$. Therefore, we obtain $U_{TD} \subseteq U_{CD}$.    □
Definition 3
([24]). Suppose that $S = (U, C \cup D)$ is an incomplete decision information system and $T \subseteq C$. If T satisfies the following conditions:
(1) $U_{TD} = U_{CD}$;
(2) for any $T' \subsetneq T$, $U_{T'D} \neq U_{CD}$;
then T is called a reduct of C.
It should be noticed that there may be many different attribute reducts for an information system, and the attributes that appear in all reducts are called core attributes.
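Definition 3 can be checked directly with the helpers sketched above. By the monotonicity behind Theorem 1, it suffices to verify that no single attribute can be removed from a candidate T: if every set $T \setminus \{b\}$ fails to preserve $U_{CD}$, then so does every smaller subset. The sketch below reuses consistent_part and is only an illustration of the definition, not the reduction algorithm proposed later.

```python
def is_reduct(table, cond, dec, T):
    """Definition 3: U_{TD} = U_{CD}, and no proper subset of T preserves the consistent
    part (by Theorem 1 it is enough to try removing one attribute at a time)."""
    target = consistent_part(table, cond, dec)
    if consistent_part(table, list(T), dec) != target:
        return False
    return all(consistent_part(table, [a for a in T if a != b], dec) != target for b in T)
```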
Information entropy, proposed by Shannon [25], measures the average uncertainty of an information source and is also known as unconditional entropy. When certain conditions affect the probability distribution of an information source, they may change its uncertainty, which gives rise to conditional entropy. Conditional entropy represents the uncertainty of one event given another event [26]. The smaller the conditional entropy, the smaller the uncertainty of one event with respect to the other, and vice versa. Thuy and Wongthanavasu [20] gave a calculation method of conditional entropy for complete information systems.
Definition 4
([20]). In a complete information system $S = (U, C \cup D)$, the conditional entropy of D with respect to C is defined as
$$H(D \mid C) = -\sum_{X \in U/C} \frac{|X|}{|U|} \sum_{Y \in U/D} \frac{|X \cap Y|}{|X|} \log \frac{|X \cap Y|}{|X|},$$
where $U/C$ denotes the partition of U with respect to C and $U/D$ denotes the partition of U with respect to D.
The above definition suggests that conditional entropy measures the extent to which the uncertainty about one variable (Y) is reduced after some information (X) is known. $H(D \mid C)$ denotes the uncertainty of D under condition C: the smaller the value of $H(D \mid C)$, the smaller the uncertainty of D under condition C, and vice versa. Then, $H(D \mid \{a_i\})$ ($a_i \in C$, $i = 1, 2, \ldots, |C|$) denotes the conditional entropy of D with respect to the condition attribute $a_i$. The smaller the value of $H(D \mid \{a_i\})$, the smaller the uncertainty of D under condition attribute $a_i$, indicating that attribute $a_i$ is more significant in the system. Therefore, we can determine the significance of an attribute in an information system according to its conditional entropy, i.e., conditional entropy can be used as the heuristic knowledge in the process of attribute reduction.
When dealing with the attribute-reduction problem of a complete information system, the conditional entropy given in Definition 4 is often adopted, but it is not suitable for incomplete information systems. To overcome this difficulty, we define a novel form of conditional entropy based on the tolerance relation as follows.
Definition 5.
Given an incomplete decision information system $S = (U, C \cup D)$, the conditional entropy of D with respect to C is defined as
$$H(D \mid C) = -\sum_{i=1}^{|U|} \frac{|R_C(x_i)|}{|U|} \sum_{j=1}^{|U/D|} \frac{|R_C(x_i) \cap D_j|}{|R_C(x_i)|} \log \frac{|R_C(x_i) \cap D_j|}{|R_C(x_i)|},$$
where $D_j$ denotes the $j$-th class of the partition $U/D$.
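A direct transcription of Definition 5 into Python is sketched below. It reuses the tolerance_class helper from the sketch above and assumes that the decision attributes carry no missing values, so that $U/D$ is a genuine partition; the paper does not state the base of the logarithm explicitly, so the natural logarithm is used here purely as an assumption.

```python
import math
from collections import defaultdict

def conditional_entropy(table, cond, dec):
    """H(D|C) of Definition 5, computed from tolerance classes over the condition attributes."""
    n = len(table)
    # decision classes D_j: group the objects by their (assumed complete) decision values
    classes = defaultdict(set)
    for o, row in table.items():
        classes[tuple(row[a] for a in dec)].add(o)
    h = 0.0
    for o in table:
        rc = tolerance_class(table, o, cond)      # R_C(x_i)
        for dj in classes.values():
            p = len(rc & dj) / len(rc)
            if p > 0:                             # the term 0*log 0 is treated as 0
                h -= len(rc) / n * p * math.log(p)
    return h
```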

3. Binsearch Heuristic Reduction Algorithm Based on Conditional Entropy

In order to reduce time complexity and space complexity, many heuristic reduction algorithms start from core attributes. Thus, determining the core attributes of an information system is of great significance. Next, we give a sufficient condition to determine the core attributes of an incomplete decision system.
Theorem 2.
Given an incomplete decision system $S = (U, C \cup \{d\})$ and objects $x_i, x_j \in U$ ($x_i \neq x_j$), if $d(x_i) \neq d(x_j)$, $\min\{|\sigma_C(x_i)|, |\sigma_C(x_j)|\} = 1$ (where $\sigma_C(x_i) = \{d(y) \mid y \in R_C(x_i)\}$), and only one attribute $a_k \in C$ satisfies $a_k(x_i) \neq a_k(x_j)$, then $a_k$ is a core attribute, i.e., $a_k \in core(C)$.
Proof. 
Let $D = \{d\}$ and consider $\min\{|\sigma_C(x_i)|, |\sigma_C(x_j)|\} = 1$. If $|\sigma_C(x_j)| = 1$, then $R_C(x_j) \subseteq R_{\{d\}}(x_j)$, so $x_j \in U_{CD}$; if $|\sigma_C(x_i)| = 1$, then $R_C(x_i) \subseteq R_{\{d\}}(x_i)$, so $x_i \in U_{CD}$. In addition, since there is only one attribute $a_k \in C$ satisfying $a_k(x_i) \neq a_k(x_j)$, we have $a_l(x_i) = a_l(x_j)$ (or one of the two values is missing) for every $a_l \in C \setminus \{a_k\}$, while $x_j \notin R_C(x_i)$ and $x_i \notin R_C(x_j)$. Let $E = C \setminus \{a_k\}$; then $x_j \in R_E(x_i)$ and $x_i \in R_E(x_j)$. Since $d(x_i) \neq d(x_j)$, it can be observed that $x_j \notin R_{\{d\}}(x_i)$ and $x_i \notin R_{\{d\}}(x_j)$. As $U_{ED} = \{x \mid R_E(x) \subseteq R_{\{d\}}(x)\}$, this means that $x_i \notin U_{ED}$ and $x_j \notin U_{ED}$. Since at least one of $x_i$ and $x_j$ belongs to $U_{CD}$, it follows that $U_{ED} \neq U_{CD}$. Therefore, $a_k \in core(C)$.    □
Example 1.
Suppose that $S = (U, C \cup \{d\})$ is an incomplete decision table (as shown in Table 1), where $U = \{x_1, x_2, \ldots, x_8\}$, $C = \{a_1, a_2, a_3, a_4\}$, and $D = \{d\}$. The numbers in Table 1 indicate the scores for each decision alternative, while ★ signifies a missing value. The core attributes can be obtained as follows.
Firstly, it can be seen that the object pairs satisfying the condition $d(x_i) \neq d(x_j)$ are $(x_1, x_2)$, $(x_1, x_5)$, $(x_1, x_8)$, $(x_2, x_3)$, $(x_2, x_4)$, $(x_2, x_6)$, $(x_2, x_7)$, $(x_3, x_5)$, $(x_3, x_8)$, $(x_4, x_5)$, $(x_4, x_8)$, $(x_5, x_6)$, $(x_5, x_7)$, $(x_6, x_8)$, and $(x_7, x_8)$.
Secondly, we can observe that $R_C(x_1) = \{x_1\}$, $R_C(x_2) = \{x_2, x_3\}$, $R_C(x_3) = \{x_2, x_3\}$, $R_C(x_4) = \{x_4\}$, $R_C(x_5) = \{x_5, x_6, x_7\}$, $R_C(x_6) = \{x_5, x_6\}$, $R_C(x_7) = \{x_5, x_7\}$, and $R_C(x_8) = \{x_2, x_3, x_6, x_8\}$. Accordingly, $\sigma(x_1) = \{1\}$, $\sigma(x_2) = \{1, 2\}$, $\sigma(x_3) = \{1, 2\}$, $\sigma(x_4) = \{1\}$, $\sigma(x_5) = \{1, 2\}$, $\sigma(x_6) = \{1, 2\}$, $\sigma(x_7) = \{1, 2\}$, and $\sigma(x_8) = \{1, 2\}$. Thus, all the object pairs satisfying both $d(x_i) \neq d(x_j)$ and $\min\{|\sigma(x_i)|, |\sigma(x_j)|\} = 1$ are $(x_1, x_2)$, $(x_1, x_5)$, $(x_1, x_8)$, $(x_2, x_4)$, $(x_4, x_5)$, and $(x_4, x_8)$.
Thirdly, it is obvious that objects $x_1$ and $x_2$ can be distinguished by $a_2$ and $a_4$, objects $x_1$ and $x_5$ can be distinguished by $a_1$ and $a_4$, objects $x_2$ and $x_4$ can be distinguished by $a_2$, objects $x_4$ and $x_5$ can be distinguished by $a_1$ and $a_4$, objects $x_1$ and $x_8$ can be distinguished by $a_1$, and objects $x_4$ and $x_8$ can be distinguished by $a_1$.
Consequently, $a_1$ and $a_2$ are core attributes, i.e., $core(C) = \{a_1, a_2\}$.
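Theorem 2 gives only a sufficient condition, so the following sketch returns the core attributes it can certify in this way; it reuses the tolerance_class helper from earlier, and the function name is ours. On Table 1 it would carry out exactly the pairwise analysis of Example 1.

```python
from itertools import combinations

def core_attributes(table, cond, d):
    """Collect core attributes certified by the sufficient condition of Theorem 2."""
    core = set()
    for xi, xj in combinations(table, 2):
        if table[xi][d] == table[xj][d]:
            continue                                           # need d(x_i) != d(x_j)
        # sigma_C(x): decision values seen inside the tolerance class of x
        sig_i = {table[y][d] for y in tolerance_class(table, xi, cond)}
        sig_j = {table[y][d] for y in tolerance_class(table, xj, cond)}
        if min(len(sig_i), len(sig_j)) != 1:
            continue
        # attributes on which x_i and x_j genuinely differ (both values present and unequal)
        diff = [a for a in cond
                if table[xi][a] is not None and table[xj][a] is not None
                and table[xi][a] != table[xj][a]]
        if len(diff) == 1:                                     # exactly one distinguishing attribute
            core.add(diff[0])
    return core
```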
In fact, the essence of attribute reduction is to obtain the smallest $T$ ($T \subseteq C$) such that $U_{TD} = U_{CD}$. Next, we apply the idea of binsearch to propose a heuristic attribute-reduction algorithm based on conditional entropy to obtain T. The process of Algorithm 1 and the corresponding flowchart (Figure 1) are shown below.
Algorithm 1 Process of the binsearch heuristic reduction algorithm based on conditional entropy
Input: An incomplete decision information system $S = (U, C \cup D)$.
Output: A reduct T of C.
Step 1: Calculate $U_{CD}$. Obtain $core(C)$ by Theorem 2; place the remaining attributes in a candidate attribute set Z in ascending order of their conditional entropies.
Step 2: Initialization: $T = core(C)$; if $U_{TD} = U_{CD}$, then go to Step 4, else go to Step 3.
Step 3: Initialization: $min = 1$, $max = |C| - |core(C)|$.
while (true) do
    $tempt = T$; // save T before making the change
    $mid = \lfloor (min + max)/2 \rfloor$;
    the attributes numbered from min to mid in Z are added to T; then, calculate $U_{TD}$ and $U_{CD}$;
    if $U_{TD} \neq U_{CD}$ then
        if $max - mid \le 1$ then
            (1) the attribute numbered max in Z is added to T;
            (2) exit the cycle; // end of the algorithm
        else
            $min = mid + 1$; // enter the next cycle
        end if
    else ($U_{TD} = U_{CD}$)
        if $min = mid$ then
            (3) exit the cycle; // end of the algorithm
        else
            (4) $max = mid$;
            $T = tempt$; // restore T to its previous state and enter the next cycle
        end if
    end if
end while
Step 4: End of the algorithm.
This algorithm is designed to reduce the time complexity by testing multiple candidate attributes at a time. When one half of the attributes in Z are added to T, if $U_{TD} \neq U_{CD}$, this indicates that further attributes need to be added to T; if, moreover, $max - mid \le 1$, only the attribute numbered max in Z can still be added to T, so it is added directly and the algorithm stops. If $max - mid > 1$, attributes numbered larger than mid should be added to T, so $mid + 1$ is assigned to min. When $U_{TD} = U_{CD}$, if $min = mid$, this indicates that T is the smallest reduct found by the search, and the algorithm ends. If $min \neq mid$, T may not be the smallest, so, in order to obtain a smaller reduct, we let $max = mid$. The attribute reduction of the information system is obtained by repeating this computation. Obviously, when one half of the attributes in Z are added to T, $U_{TD} = U_{CD}$ implies that the algorithm terminates at or before the attribute numbered mid in Z, whereas $U_{TD} \neq U_{CD}$ means that it terminates after the attribute numbered mid. It can be seen that, during the process of attribute reduction, our heuristic algorithm can add multiple candidate attributes at each step, thereby reducing time complexity.
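The listing can be assembled into a runnable sketch from the earlier helpers (consistent_part, core_attributes, and conditional_entropy). The variable names mirror Algorithm 1, but the code is our illustrative reconstruction rather than the authors' released implementation; traced on the core(C) and Z computed in Example 2 below, the loop goes through the same three cycles described there.

```python
def binsearch_reduct(table, cond, dec):
    """Algorithm 1: start from the core, rank the remaining attributes by conditional entropy,
    then add them in blocks guided by a binary search over their positions in Z."""
    target = consistent_part(table, cond, dec)                  # U_{CD}
    core = [a for a in cond if a in core_attributes(table, cond, dec[0])]
    # Step 1: candidate attribute set Z in ascending order of H(D | {a})
    Z = sorted((a for a in cond if a not in core),
               key=lambda a: conditional_entropy(table, [a], dec))
    T = list(core)                                              # Step 2
    if consistent_part(table, T, dec) == target:
        return T
    lo, hi = 1, len(Z)                                          # Step 3, 1-based positions in Z
    while True:
        mid = (lo + hi) // 2
        trial = T + Z[lo - 1:mid]                               # add the attributes numbered lo..mid
        if consistent_part(table, trial, dec) != target:
            if hi - mid <= 1:
                return trial + [Z[hi - 1]]                      # only the attribute numbered hi is left
            T, lo = trial, mid + 1                              # keep the block, move to the upper half
        else:
            if lo == mid:
                return trial                                    # the block cannot be shrunk any further
            hi = mid                                            # discard trial and retry a smaller block
```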
The time complexity of the conditional entropy-based binsearch heuristic reduction algorithm can be summarized as follows. The input phase is completed in $O(1)$. In Step 1, calculating $U_{CD}$ takes $O(mn)$ time, where n is the number of attributes and m is the number of objects; sorting the attributes takes $O(n \log n)$, resulting in a total of $O(mn + n \log n)$. In Step 2, initializing T takes $O(1)$, and checking the condition $U_{TD} = U_{CD}$ requires $O(mn)$ time, so the total time complexity of Step 2 is $O(mn)$. In Step 3, the binary search runs in $O(\log n)$ iterations, with each iteration requiring $O(mn)$, leading to a complexity of $O(mn \log n)$. The termination step is $O(1)$. In conclusion, the overall time complexity of the algorithm is $O(mn \log n)$. In addition, the time complexity of the algorithm in [23] is $O(n^2)$, which is significantly higher than that of our algorithm when dealing with large sample data. This shows that the algorithm in this paper has superior efficiency.

4. Examples

To verify the effectiveness of the proposed algorithm, an incomplete decision table $S = (U, C \cup D)$ (as shown in Table 2) from reference [23] is adopted. Next, we apply the binsearch heuristic algorithm to obtain a reduct of C.
Example 2.
An incomplete decision table $S = (U, C \cup \{d\})$ is shown in Table 2, where $U = \{x_1, x_2, \ldots, x_{12}\}$ and $C = \{a_1, a_2, \ldots, a_8\}$. We can obtain a reduct of C as follows:
Step 1. We first use Boolean matrices to calculate $U_{CD}$. According to Table 2, the Boolean relation matrices of $R_{\{a_1\}}, R_{\{a_2\}}, \ldots, R_{\{a_8\}}$ and $R_{\{d\}}$ can be acquired as follows:
$$M_{R_{\{a_1\}}} = \begin{bmatrix}
1&0&0&1&1&0&1&1&1&0&1&1\\
0&1&1&1&1&1&0&1&0&0&1&0\\
0&1&1&1&1&1&0&1&0&0&1&0\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
0&1&1&1&1&1&0&1&0&0&1&0\\
1&0&0&1&1&0&1&1&1&0&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&0&0&1&1&0&1&1&1&0&1&1\\
0&0&0&1&1&0&0&1&0&1&1&0\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&0&0&1&1&0&1&1&1&0&1&1
\end{bmatrix}_{12 \times 12}, \quad
M_{R_{\{a_2\}}} = \begin{bmatrix}
1&0&0&1&1&0&1&0&1&1&1&1\\
0&1&1&0&0&1&1&0&0&1&0&0\\
0&1&1&0&0&1&1&0&0&1&0&0\\
1&0&0&1&1&0&1&0&1&1&1&1\\
1&0&0&1&1&0&1&0&1&1&1&1\\
0&1&1&0&0&1&1&0&0&1&0&0\\
1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&1&1&0&1&0&0\\
1&0&0&1&1&0&1&0&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&0&0&1&1&0&1&0&1&1&1&1\\
1&0&0&1&1&0&1&0&1&1&1&1
\end{bmatrix}_{12 \times 12},$$
$$M_{R_{\{a_3\}}} = \begin{bmatrix}
1&0&0&1&1&0&1&0&1&1&1&1\\
0&1&1&1&1&1&1&0&0&1&1&0\\
0&1&1&1&1&1&1&0&0&1&1&0\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
0&1&1&1&1&1&1&0&0&1&1&0\\
1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&1&1&0&1&1&0&1&1&0\\
1&0&0&1&1&0&1&0&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&0&0&1&1&0&1&0&1&0&1&1
\end{bmatrix}_{12 \times 12}, \quad
M_{R_{\{a_4\}}} = \begin{bmatrix}
1&0&0&1&1&1&0&1&0&1&1&1\\
0&1&1&0&0&0&0&1&0&1&1&1\\
0&1&1&0&0&0&0&1&0&1&1&1\\
1&0&0&1&1&1&0&1&0&1&1&1\\
1&0&0&1&1&1&0&1&0&1&1&1\\
1&0&0&1&1&1&0&1&0&1&1&1\\
0&0&0&0&0&0&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1&1&1
\end{bmatrix}_{12 \times 12},$$
$$M_{R_{\{d\}}} = \begin{bmatrix}
1&1&0&0&0&0&1&0&0&1&1&1\\
1&1&0&0&0&0&1&0&0&1&1&1\\
0&0&1&1&1&1&0&1&1&0&0&0\\
0&0&1&1&1&1&0&1&1&0&0&0\\
0&0&1&1&1&1&0&1&1&0&0&0\\
0&0&1&1&1&1&0&1&1&0&0&0\\
1&1&0&0&0&0&1&0&0&1&1&1\\
0&0&1&1&1&1&0&1&1&0&0&0\\
0&0&1&1&1&1&0&1&1&0&0&0\\
1&1&0&0&0&0&1&0&0&1&1&1\\
1&1&0&0&0&0&1&0&0&1&1&1\\
1&1&0&0&0&0&1&0&0&1&1&1
\end{bmatrix}_{12 \times 12}, \quad
\bigwedge_{i=1}^{8} M_{R_{\{a_i\}}} = \begin{bmatrix}
1&0&0&0&0&0&0&0&0&0&1&1\\
0&1&1&0&0&0&0&0&0&0&0&0\\
0&1&1&0&0&0&0&0&0&0&0&0\\
0&0&0&1&1&0&0&0&0&0&1&0\\
0&0&0&1&1&0&0&0&0&0&1&0\\
0&0&0&0&0&1&0&0&0&0&0&0\\
0&0&0&0&0&0&1&1&0&0&0&1\\
0&0&0&0&0&0&1&1&0&1&0&0\\
0&0&0&0&0&0&0&0&1&0&0&0\\
0&0&0&0&0&0&0&1&0&1&0&0\\
1&0&0&1&1&0&0&0&0&0&1&0\\
1&0&0&0&0&0&1&0&0&0&0&1
\end{bmatrix}_{12 \times 12}.$$
It is easy to obtain $R_C(x_1) = \{x_1, x_{11}, x_{12}\}$, $R_C(x_2) = R_C(x_3) = \{x_2, x_3\}$, $R_C(x_4) = R_C(x_5) = \{x_4, x_5, x_{11}\}$, $R_C(x_6) = \{x_6\}$, $R_C(x_7) = \{x_7, x_8, x_{12}\}$, $R_C(x_8) = \{x_7, x_8, x_{10}\}$, $R_C(x_9) = \{x_9\}$, $R_C(x_{10}) = \{x_8, x_{10}\}$, $R_C(x_{11}) = \{x_1, x_4, x_5, x_{11}\}$, and $R_C(x_{12}) = \{x_1, x_7, x_{12}\}$ from the matrix $\bigwedge_{i=1}^{8} M_{R_{\{a_i\}}}$. Similarly, we obtain $R_{\{d\}}(x_1) = R_{\{d\}}(x_2) = R_{\{d\}}(x_7) = R_{\{d\}}(x_{10}) = R_{\{d\}}(x_{11}) = R_{\{d\}}(x_{12}) = \{x_1, x_2, x_7, x_{10}, x_{11}, x_{12}\}$ and $R_{\{d\}}(x_3) = R_{\{d\}}(x_4) = R_{\{d\}}(x_5) = R_{\{d\}}(x_6) = R_{\{d\}}(x_8) = R_{\{d\}}(x_9) = \{x_3, x_4, x_5, x_6, x_8, x_9\}$ from the matrix $M_{R_{\{d\}}}$; then, $U_{CD} = \{x \mid R_C(x) \subseteq R_{\{d\}}(x)\} = \{x_1, x_6, x_9, x_{12}\}$.
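The Boolean-matrix computation used in this step can be sketched as follows; numpy and the function names are our own choices, and the single-attribute matrices are combined by elementwise conjunction, exactly as in $\bigwedge_{i=1}^{8} M_{R_{\{a_i\}}}$ above.

```python
import numpy as np

def relation_matrix(table, objects, attrs):
    """Boolean tolerance matrix: entry (i, j) is 1 iff (x_i, x_j) belongs to R_attrs."""
    n = len(objects)
    M = np.ones((n, n), dtype=int)
    for a in attrs:
        col = [table[o][a] for o in objects]
        Ma = np.array([[int(col[i] is None or col[j] is None or col[i] == col[j])
                        for j in range(n)] for i in range(n)])
        M &= Ma                      # conjunction with the single-attribute matrix M_{R_{a}}
    return M

def consistent_part_from_matrices(MC, MD, objects):
    """U_{CD}: x_i is consistent when row i of M_C is entrywise dominated by row i of M_D."""
    return {objects[i] for i in range(len(objects)) if np.all(MC[i] <= MD[i])}
```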
According to Theorem 2, we have $core(C) = \{a_4, a_6, a_7\}$. The remaining attributes are $a_1$, $a_2$, $a_3$, $a_5$, and $a_8$. Next, we calculate their conditional entropies, taking $a_1$ as an example.
$U/R_{\{a_1\}} = \{R_{\{a_1\}}(x_1), R_{\{a_1\}}(x_2), \ldots, R_{\{a_1\}}(x_{12})\} = \{\{x_1, x_4, x_5, x_7, x_8, x_9, x_{11}, x_{12}\}$, $\{x_2, x_3, x_4, x_5, x_6, x_8, x_{11}\}$, $\{x_2, x_3, x_4, x_5, x_6, x_8, x_{11}\}$, $\{x_1, x_2, \ldots, x_{12}\}$, $\{x_1, x_2, \ldots, x_{12}\}$, $\{x_2, x_3, x_4, x_5, x_6, x_8, x_{11}\}$, $\{x_1, x_4, x_5, x_7, x_8, x_9, x_{11}, x_{12}\}$, $\{x_1, x_2, \ldots, x_{12}\}$, $\{x_1, x_4, x_5, x_7, x_8, x_9, x_{11}, x_{12}\}$, $\{x_4, x_5, x_8, x_{10}, x_{11}\}$, $\{x_1, x_2, \ldots, x_{12}\}$, $\{x_1, x_4, x_5, x_7, x_8, x_9, x_{11}, x_{12}\}\}$,
$U/D = \{\{x_1, x_2, x_7, x_{10}, x_{11}, x_{12}\}, \{x_3, x_4, x_5, x_6, x_8, x_9\}\}$,
$$H(D \mid \{a_1\}) = -\sum_{i=1}^{12} \frac{|R_{\{a_1\}}(x_i)|}{|U|} \sum_{j=1}^{2} \frac{|R_{\{a_1\}}(x_i) \cap D_j|}{|R_{\{a_1\}}(x_i)|} \log \frac{|R_{\{a_1\}}(x_i) \cap D_j|}{|R_{\{a_1\}}(x_i)|} = -\left[ 4 \times \frac{8}{12}\left(\frac{4}{8}\log\frac{4}{8} + \frac{4}{8}\log\frac{4}{8}\right) + 3 \times \frac{7}{12}\left(\frac{2}{7}\log\frac{2}{7} + \frac{5}{7}\log\frac{5}{7}\right) + 4 \times \frac{12}{12}\left(\frac{6}{12}\log\frac{6}{12} + \frac{6}{12}\log\frac{6}{12}\right) + \frac{5}{12}\left(\frac{2}{5}\log\frac{2}{5} + \frac{3}{5}\log\frac{3}{5}\right) \right] = 2.58335.$$
Similarly, $H(D \mid \{a_2\}) = 2.24587$, $H(D \mid \{a_3\}) = 2.83235$, $H(D \mid \{a_5\}) = 3.24598$, and $H(D \mid \{a_8\}) = 2.33118$. Obviously, $H(D \mid \{a_2\}) < H(D \mid \{a_8\}) < H(D \mid \{a_1\}) < H(D \mid \{a_3\}) < H(D \mid \{a_5\})$. Therefore, $Z = \{a_2, a_8, a_1, a_3, a_5\}$;
Step 2. Let $T = core(C) = \{a_4, a_6, a_7\}$; then $U_{TD} = \{x_1, x_9, x_{12}\}$ and $U_{CD} = \{x_1, x_6, x_9, x_{12}\}$, so $U_{TD} \neq U_{CD}$;
Step 3. Enter the first cycle. Initialization: $min = 1$, $max = |C| - |core(C)| = 8 - 3 = 5$, $mid = \lfloor (min + max)/2 \rfloor = \lfloor (1 + 5)/2 \rfloor = 3$, and $tempt = T = \{a_4, a_6, a_7\}$. The attributes numbered from 1 to 3 in Z are added to T, giving $T = \{a_4, a_6, a_7, a_2, a_8, a_1\}$. It can be obtained that $U_{TD} = \{x_1, x_6, x_9, x_{12}\} = U_{CD}$. Since $min = 1$, $mid = 3$, and $min \neq mid$, we set $max = mid = 3$ and $T = tempt = \{a_4, a_6, a_7\}$. Then we enter the second cycle: $min = 1$, $mid = \lfloor (min + max)/2 \rfloor = \lfloor (1 + 3)/2 \rfloor = 2$, and the attributes numbered from 1 to 2 in Z are added to T, giving $T = \{a_4, a_6, a_7, a_2, a_8\}$. It is easy to calculate that $U_{TD} = \{x_1, x_6, x_9, x_{12}\} = U_{CD}$. As $min = 1$, $mid = 2$, and $min \neq mid$, we set $max = mid = 2$ and $T = tempt = \{a_4, a_6, a_7\}$. Then we enter the third cycle: $mid = \lfloor (min + max)/2 \rfloor = \lfloor (1 + 2)/2 \rfloor = 1$ and $tempt = T = \{a_4, a_6, a_7\}$. The attribute numbered 1 in Z is added to T; thus, $T = \{a_4, a_6, a_7, a_2\}$. It can be obtained that $U_{TD} = \{x_1, x_9, x_{12}\} \neq U_{CD} = \{x_1, x_6, x_9, x_{12}\}$. As $max - mid = 2 - 1 = 1$, the attribute numbered max in Z, namely $a_8$, is added, giving $T = \{a_4, a_6, a_7, a_2, a_8\}$. Consequently, the reduct is $T = \{a_2, a_4, a_6, a_7, a_8\}$.
From the above calculation, we acquire the reduct $T = \{a_2, a_4, a_6, a_7, a_8\}$, which shows that our algorithm is effective. In addition, the reduction result in reference [23] is $T = \{a_1, a_3, a_4, a_5, a_6, a_8\}$. Our reduct contains one attribute fewer, so our algorithm is more efficient.
The proposed attribute-reduction algorithm adopts a binsearch strategy, which effectively narrows the space of candidate attributes in each iteration and thus significantly reduces the number of iterations required for attribute reduction. In contrast, the attribute-reduction method in [23] is based on a tolerance-relation similarity matrix and adds attributes one by one in order of their importance. This approach may lead to more iterations on large-scale datasets, which significantly increases the time complexity. On the other hand, our approach quantifies the uncertainty of incomplete data by conditional entropy, which in turn improves the accuracy of attribute-importance assessment. Comparatively, although the method in [23] is able to deal with incomplete data through extended rough set theory, its adaptability to missing values may be insufficient, which in turn weakens the accuracy of attribute reduction.
Example 3.
In order to verify the effectiveness of our algorithm for decision systems with many attributes, we run the algorithm on the hepatitis data set from the UCI machine learning repository (http://archive.ics.uci.edu/datasets, accessed on 21 June 2023). The hepatitis data set is treated as an incomplete decision table $(U, C \cup D)$, where $U = \{x_1, x_2, \ldots, x_{155}\}$, $C = \{a, b, \ldots, s\}$, and $D = \{w\}$; the meanings of the attribute symbols are given in Table 3.
Using Python 3.11.0, we implement our binsearch heuristic reduction algorithm and successfully obtain the attribute-reduction result $T = \{q, p, b, s, o, r, g, i, c\}$. It can be concluded that our algorithm can be successfully applied to many practical problems, especially to large incomplete information systems.

5. Conclusions

In this paper, we introduce a novel conditional entropy for incomplete decision information systems and use it as heuristic knowledge to design a binsearch heuristic attribute-reduction algorithm. Because the algorithm can test multiple attributes at a time, its time complexity is significantly reduced. Therefore, when dealing with incomplete information systems with a large number of attributes, the advantage of this algorithm is obvious. In addition, two examples show that our algorithm can obtain attribute-reduction results quickly and accurately. It is worth mentioning that the algorithm is effective not only for consistent decision information systems but also for inconsistent decision information systems. In the future, we will focus on extending the proposed heuristic reduction algorithm to a variety of information systems, so as to obtain more accurate decision rules.

Author Contributions

Methodology, Y.B.; writing—original draft preparation, Y.B. and S.C.; writing—review and editing, Y.B.; supervision, Y.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023D01C03).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Akram, M.; Ali, G.; Alcantud, J.C.R. Attributes reduction algorithms for m-polar fuzzy relation decision systems. Int. J. Approx. Reason. 2022, 140, 232–254. [Google Scholar] [CrossRef]
  2. Chu, X.L.; Sun, B.Z.; Chu, X.D.; Wu, J.Q.; Han, K.Y.; Zhang, Y.; Huang, Q.C. Multi-granularity dominance rough concept attribute reduction over hybrid information systems and its application in clinical decision-making. Inf. Sci. 2022, 597, 274–299. [Google Scholar] [CrossRef]
  3. He, J.L.; Qu, L.D.; Wang, Z.H.; Chen, Y.Y.; Luo, D.M. Attribute reduction in an incomplete categorical decision information system based on fuzzy rough sets. Artif. Intell. Rev. 2022, 55, 5313–5348. [Google Scholar] [CrossRef]
  4. Hu, M.; Tsang, E.C.C.; Guo, Y.T.; Chen, D.G.; Xu, W.H. Attribute reduction based on overlap degree and k-nearest-neighbor rough sets in decision information systems. Inf. Sci. 2022, 584, 301–324. [Google Scholar] [CrossRef]
  5. Hu, M.; Tsang, E.C.C.; Guo, Y.T.; Xu, W.H. Fast and robust attribute reduction based on the separability in fuzzy decision systems. IEEE Trans. Cybern. 2021, 52, 5559–5572. [Google Scholar] [CrossRef]
  6. Huang, C.; Huang, C.-C.; Chen, D.-N.; Wang, Y. Decision rules for renewable energy utilization using rough set theory. Axioms 2023, 12, 811. [Google Scholar] [CrossRef]
  7. Lang, G.M.; Cai, M.J.; Fujita, H.M.H.; Xiao, Q.M. Related families-based attribute reduction of dynamic covering decision information systems. Knowl. Based Syst. 2018, 162, 161–173. [Google Scholar] [CrossRef]
  8. Zhou, Y.; Bao, Y.L. A novel attribute reduction algorithm for incomplete information systems based on a binary similarity matrix. Symmetry 2023, 15, 674. [Google Scholar] [CrossRef]
  9. Liang, B.H.; Jin, E.L.; Wei, L.F.; Hu, R.Y. Knowledge granularity attribute reduction algorithm for incomplete systems in a clustering context. Mathematics 2024, 12, 333. [Google Scholar] [CrossRef]
  10. Liu, X.F.; Dai, J.H.; Chen, J.L.; Zhang, C.C. A fuzzy [formula omitted]-similarity relation-based attribute reduction approach in incomplete interval-valued information systems. Appl. Soft Comput. J. 2021, 109, 107593. [Google Scholar] [CrossRef]
  11. Zhang, C.L.; Li, J.J.; Lin, Y.D. Knowledge reduction of pessimistic multigranulation rough sets in incomplete information systems. Soft Comput. 2021, 25, 12825–12838. [Google Scholar] [CrossRef]
  12. Skowron, A.; Rauszer, C. The discernibility matrices and functions in information systems. Intell. Decis. Support 1992, 21, 331–362. [Google Scholar]
  13. Liu, G.L. Attribute reduction algorithms determined by invariants for decision tables. Cogn. Comput. 2022, 14, 1818–1825. [Google Scholar] [CrossRef]
  14. Liu, G.L. Using covering reduction to identify reducts for object-oriented concept lattices. Axioms 2022, 11, 381. [Google Scholar] [CrossRef]
  15. Liu, G.L.; Hua, Z.; Chen, Z.H. A general reduction algorithm for relation decision systems and its application. Knowl. Based Syst. 2017, 119, 87–93. [Google Scholar] [CrossRef]
  16. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 623–656. [Google Scholar] [CrossRef]
  17. Zheng, K.F.; Wang, X.J. Feature selection method with joint maximal information entropy between features and class. Pattern Recognit. 2018, 77, 30–44. [Google Scholar] [CrossRef]
  18. Jiang, F.; Sui, Y.F.; Zhou, L. A relative decision entropy-based feature selection approach. Pattern Recognit. 2015, 48, 2151–2163. [Google Scholar] [CrossRef]
  19. Dai, J.H.; Wang, W.T.; Tian, H.W.; Liu, L. Attribute selection based on a new conditional entropy for incomplete decision systems. Knowl. Based Syst. 2013, 39, 207–213. [Google Scholar] [CrossRef]
  20. Thuy, N.; Wongthanavasu, S. On reduction of attributes in inconsistent decision tables based on information entropies and stripped quotient sets. Expert Syst. Appl. 2019, 137, 308–323. [Google Scholar] [CrossRef]
  21. Zou, L.; Ren, S.Y.; Sun, Y.B.; Yang, X.H. Attribute reduction algorithm of neighborhood rough set based on supervised granulation and its application. Soft Comput. 2023, 27, 1565–1582. [Google Scholar] [CrossRef]
  22. Chen, J.Y.; Zhu, P. A variable precision multigranulation rough set model and attribute reduction. Soft Comput. 2023, 27, 85–106. [Google Scholar] [CrossRef]
  23. Zhou, J.; Xu, E.; Li, Y.H.; Wang, Z.; Liu, Z.X.; Bai, X.Y.; Huang, X.Y. A new attribute reduction algorithm dealing with the incomplete information system. In Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Zhangjiajie, China, 10–11 October 2009; pp. 12–19. [Google Scholar]
  24. Liu, G.L.; Li, L.; Yang, J.T.; Feng, Y.B.; Zhu, K. Attribute reduction approaches for general relation decision systems. Pattern Recognit. Lett. 2015, 65, 81–87. [Google Scholar] [CrossRef]
  25. Shannon, C.; Weaver, W. The Mathematical Theory of Communication; The University of Illinois Press: Urbana, IL, USA, 1949; p. 60. [Google Scholar]
  26. Zhao, X.Q. Basis and Application of Information Theory; Mechanical Industry Press: Beijing, China, 2015. (In Chinese) [Google Scholar]
Figure 1. The flowchart corresponding to the proposed algorithm.
Table 1. An incomplete decision table.
a 1 a 2 a 3 a 4 d
x 1 23201
x 2 212
x 3 211
x 4 23211
x 5 332
x 6 001
x 7 32131
x 8 12
Table 2. An incomplete decision table.
a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 d
x 1 3211100
x 2 23201310
x 3 23201311
x 4 212011
x 5 2112011
x 6 23213111
x 7 331020
x 8 000201
x 9 321311211
x 10 11000
x 11 21010
x 12 3210230
Table 3. The representation symbols of the attributes of the hepatitis data set.
a: age; b: sex; c: steroid; d: antivirals; e: fatigue;
f: malaise; g: anorexia; h: liver big; i: liver firm; j: spleen palpable;
k: spiders; l: ascites; m: varices; n: bilirubin; o: alk phosphate;
p: sgot; q: albumin; r: protime; s: histology; w: class.