Article

Unsupervised Attribute Reduction Algorithms for Multiset-Valued Data Based on Uncertainty Measurement

1 School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519000, China
2 School of Computer Science and Engineering, Yulin Normal University, Yulin 537000, China
3 Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China
4 School of Alibaba Cloud Big Data Application, Zhuhai College of Science and Technology, Zhuhai 519041, China
5 College of Mathematics and Information Science, Guangxi University, Nanning 530004, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(11), 1718; https://doi.org/10.3390/math13111718
Submission received: 19 April 2025 / Revised: 13 May 2025 / Accepted: 18 May 2025 / Published: 23 May 2025

Abstract

Missing data introduce uncertainty into data mining, but existing set-valued approaches ignore frequency information. We propose unsupervised attribute reduction algorithms for multiset-valued data to address this gap. First, we define a multiset-valued information system (MSVIS) and establish a θ-tolerance relation to form the information granules. Then, θ-information entropy and θ-information amount are introduced as uncertainty measures (UMs). Finally, these two UMs are used to design two unsupervised attribute reduction algorithms in an MSVIS. The experimental results demonstrate the superiority of the proposed algorithms, which reduce the attribute subsets by about 50% on average while improving clustering accuracy and outlier detection performance. Parameter analysis further validates the robustness of the framework under varying missing rates.

1. Introduction

1.1. Research Background

Data with missing information values, also called missing data, occur frequently in datasets for various reasons. Simple approaches such as listwise deletion handle missing data by removing incomplete records; despite their simplicity, such methods risk losing valuable information and can introduce bias [1]. Data imputation is a statistical approach that fills in missing values according to reasonable rules; common imputation techniques include mean substitution and regression imputation [1]. Despite their utility, imputation methods also carry the risk of inducing bias [2].
On the other hand, mathematical models such as granular computing (GrC) [3] and rough set theory (RST) [4] are effective tools for dealing with uncertainty. GrC is a significant methodology in artificial intelligence: it seeks a granular-structure representation of a problem and provides a basic conceptual framework that has been broadly studied in pattern recognition [5] and data mining [6]. As a concrete realization of GrC, RST provides a mathematical tool for handling uncertainty such as imprecision, inconsistency, and incompleteness [7,8]. Pawlak [4] also introduced the information system (IS), and the vast majority of RST applications are formulated within an IS.
In RST, a dataset with missing information values is usually represented by an incomplete information system (IIS). A set-valued information system (SVIS) represents each missing information value under an attribute by the set of all possible values of that attribute, and each non-missing information value by the singleton set containing its original value; thus, an SVIS can be obtained from an IIS. As an effective way of handling missing information values, the SVIS has drawn great attention from researchers. For instance, Chen et al. [9] investigated attribute reduction of an SVIS based on a tolerance relation, and Dai et al. [8] investigated uncertainty measures (UMs) in an SVIS. SVISs that themselves contain missing information values have also been studied: Xie et al. [10] investigated UMs for an incomplete probabilistic SVIS, and Chen et al. [11] presented UMs for an incomplete SVIS using a Gaussian kernel.
Feature selection, also known as attribute reduction in RST, is used to remove redundant attributes and reduce the computational cost of handling high-dimensional data while improving or maintaining the performance of a particular task. UMs are particularly relevant to attribute reduction within an information system. Zhang et al. [12] investigated a UM for categorical data using fuzzy information structures and applied it to attribute reduction. Gao et al. [13] studied a monotonic UM for attribute reduction based on granular maximum decision entropy.
An SVIS is a valuable tool for handling datasets with missing information values. Specifically, an SVIS approach involves replacing the missing information values of an attribute with a set comprising all the possible values that could exist under the same attribute. Meanwhile, the existing information values are substituted with a set containing the original values. By employing this technique, a dataset containing missing information values can be converted into an SVIS, enabling the application of further processes such as calculating the Jaccard distance within the framework of an SVIS.
Studies on attribute reduction in an SVIS are abundant. To name a few, Peng et al. [14] delved into uncertainty measurement-based feature selection for set-valued data. Singh et al. [15] explored attribute selection in an SVIS based on fuzzy similarity. Zhang et al. [16] studied attribute reduction for set-valued data based on D-S evidence theory. Liu et al. [17] proposed an incremental attribute reduction method for set-valued decision information systems with variable attribute sets. Lang et al. [18] studied an incremental approach to attribute reduction of a dynamic SVIS.
However, the straightforward replacement of missing information with all possible values in an SVIS can be considered overly simplistic and may result in some loss of information. This approach fails to consider the potential variations in the occurrence frequency of different attribute values, leading to a lack of differentiation between values that may occur more frequently than others and should thus be treated distinctively.
A multiset-valued information system (MSVIS) has been developed to enhance the functionality of an SVIS [19,20,21]. In the MSVIS framework, the information values associated with an attribute in a dataset are organized as multisets, which allow elements to be repeated. Within an MSVIS, each missing information value is represented by a multiset, so that the frequency of each value is maintained. Information values that are not missing are depicted by multisets that are equivalent to traditional sets containing only the original value.
This approach enables an MSVIS to accurately capture the frequency distribution of information values within the dataset, addressing one of the limitations of an SVIS related to potential information loss arising from oversimplified imputation strategies. By preserving the frequency information associated with each value, an MSVIS provides a more nuanced representation of the dataset, enhancing the robustness and accuracy of data analysis processes.
Despite the benefits of an MSVIS, research on attribute reduction in an MSVIS is relatively scarce compared to the extensive body of research focused on an SVIS. Huang et al. [22] developed a supervised feature-selection method for multiset-valued data using fuzzy conditional information entropy, while Li et al. [23] proposed a semi-supervised approach to attribute reduction for partially labelled multiset-valued data.
Recent advancements in unsupervised attribute reduction have primarily focused on improving computational efficiency and scalability. For instance, Feng et al. [24] proposed a dynamic attribute reduction algorithm using relative neighborhood discernibility, achieving incremental updates for evolving datasets. While their method demonstrates efficiency in handling object additions, it neglects the critical role of frequency distributions in multiset-valued data: a gap that leads to information loss in scenarios like dynamic feature evolution or missing value imputation. Similarly, He et al. [25] introduced uncertainty measures for partially labeled categorical data, yet their semi-supervised framework still relies on partial labels and predefined thresholds, limiting applicability in fully unsupervised environments. Zonoozi et al. [26] proposed an unsupervised adversarial domain adaptation framework using variational auto-encoders (VAEs) to align feature distributions across domains. While it validates the robustness of domain-invariant feature learning, it neglects frequency semantics in missing data. Chen et al. [27] introduced an ensemble regression method that assigns weights to base models based on relative error rates. Although effective for continuous targets, their approach relies on predefined error metrics and ignores the intrinsic frequency distributions in multiset-valued data.
To bridge these gaps, we propose an MSVIS framework for unsupervised attribute reduction, which uniquely integrates frequency-sensitive uncertainty measurement for missing data with granular computing principles. Unlike conventional SVIS-based methods (e.g., uniform set imputation), an MSVIS explicitly preserves frequency distributions (e.g., {2/S, 3/M}) through multisets, enabling dynamic adjustments via θ-tolerance relations. Building on this, this paper uses an MSVIS to represent a dataset with missing information values and builds a UM-based attribute reduction method within it. The main contributions are summarized as follows.
(1) Attribute reduction and missing data processing have mainly been researched as two separate data preprocessing topics, and related work has focused on attribute reduction in an SVIS. This paper combines them, addressing the task of attribute reduction for missing data directly. The proposed method offers several advantages. First, unlike some data imputation methods, it does not presume any data distribution and is based entirely on the data itself. Second, unlike the approaches proposed by Huang et al. [22] and Li et al. [23], which require the presence of decision attributes in a dataset, the method introduced in this paper does not, thereby expanding its applicability across a broader range of datasets. Lastly, the proposed method is user-friendly and easy to implement, ensuring accessibility and facilitating its practical application in diverse research and application scenarios.
(2) Different attribute reduction algorithms are compared. In particular, the proposed attribute reduction algorithms are run in both an MSVIS and an SVIS and then compared, and a parameter analysis is conducted to examine the influence of the parameters. The experimental results show the effectiveness and superiority of the proposed algorithms.

1.2.  Organization

The structure of this paper is outlined as follows. Section 2 recalls multisets and rational probability distribution sets, and the one-to-one correspondence between them, as shown in Theorem 1. Section 3 shows that an IIS induces an MSVIS, and defines a θ -tolerance relation with the help of Theorem 1. Section 4 presents two UMs based on the θ -tolerance relation for an MSVIS. Section 5 proposes unsupervised attribute reduction algorithms based on the UMs. Section 6 carries out clustering analysis, outlier detection, and parameter analysis to show the effectiveness of the proposed algorithms. Section 7 concludes this paper.
Figure 1 depicts the structure of this paper.

2. Preliminaries

Let $U = \{u_1, u_2, \ldots, u_n\}$ be a finite object set, let $2^U$ denote the collection of all subsets of $U$, and let $|X|$ denote the cardinality of $X \in 2^U$. Put
$$\delta = U \times U, \qquad \Delta = \{(u, u) : u \in U\}.$$
Definition 1
([28]). Given a non-empty finite set $V$, a multiset (or bag) $M$ drawn from $V$ is defined by a count function $C_M : V \to \mathbb{N} \cup \{0\}$.
For convenience, $M(v)$ is used to denote $C_M(v)$ $(v \in V)$.
$M(v) = m$ indicates that $v$ occurs $m$ times in the multiset $M$, which is denoted by $m/v \in M$ or $v^m \in M$.
Suppose $V = \{v_1, v_2, \ldots, v_s\}$; if $M(v_i) = m_i$ $(i = 1, 2, \ldots, s)$, then $M$ is written as
$$M = \{m_1/v_1, m_2/v_2, \ldots, m_s/v_s\}.$$
Definition 2
([28]). Consider a non-empty finite set $V$, and let $M$ and $N$ be two multisets drawn from $V$. The following definitions apply for all $v \in V$:
(1) $M = N \Leftrightarrow M(v) = N(v)$;
(2) $M \subseteq N \Leftrightarrow M(v) \le N(v)$;
(3) $P = M \cup N \Leftrightarrow P(v) = M(v) \vee N(v)$;
(4) $P = M \cap N \Leftrightarrow P(v) = M(v) \wedge N(v)$;
(5) $P = M \uplus N \Leftrightarrow P(v) = M(v) + N(v)$;
(6) $P = M \ominus N \Leftrightarrow P(v) = (M(v) - N(v)) \vee 0$.
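As a quick illustration (ours, not part of the original paper), Python's collections.Counter behaves like the count function $C_M$, and its built-in operators happen to match the multiset operations above (union taken as the value-wise maximum and intersection as the minimum):

```python
from collections import Counter

# A multiset drawn from V is fully described by its count function;
# Counter plays that role (missing keys count as 0, matching C_M(v) = 0).
M = Counter({"S": 2, "M": 2, "N": 3})   # the multiset {2/S, 2/M, 3/N}
N = Counter({"S": 1, "M": 4})           # the multiset {1/S, 4/M}

print(M["S"], M["X"])   # M(S) = 2; a value outside the support counts as 0
print(M | N)            # union: value-wise maximum      -> {2/S, 4/M, 3/N}
print(M & N)            # intersection: value-wise minimum -> {1/S, 2/M}
print(M + N)            # sum: counts added                -> {3/S, 6/M, 3/N}
print(M - N)            # monus: negative counts clipped to 0 -> {1/S, 3/N}
```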
Definition 3
([29]). Let $V = \{v_1, v_2, \ldots, v_s\}$ and $P = \binom{v_1, v_2, \ldots, v_s}{p_1, p_2, \ldots, p_s}$. $P$ is called a probability distribution set over $V$ if $0 \le p_i \le 1$ for every $i$ and $\sum_{i=1}^{s} p_i = 1$.
$P$ can also be represented as a mapping $P : V \to [0, 1]$ with $P(v_i) = p_i$ for every $i$.
Definition 4
([19]). Let $V = \{v_1, v_2, \ldots, v_s\}$, and let $P = \binom{v_1, v_2, \ldots, v_s}{p_1, p_2, \ldots, p_s}$ be a probability distribution set over $V$. If every $p_i$ is a rational number, $P$ is referred to as a rational probability distribution set over $V$; otherwise, $P$ is referred to as an irrational probability distribution set over $V$.
Definition 5
([29]). Let $V = \{v_1, v_2, \ldots, v_s\}$, and let
$$P = \binom{v_1, v_2, \ldots, v_s}{p_1, p_2, \ldots, p_s}, \qquad Q = \binom{v_1, v_2, \ldots, v_s}{q_1, q_2, \ldots, q_s}$$
be two probability distribution sets over $V$. The well-known Hellinger distance between $P$ and $Q$ is
$$HD(P, Q) = \sqrt{\frac{1}{2} \sum_{i=1}^{s} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}.$$
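A minimal Python sketch of this distance (our illustration, assuming distributions are stored as dicts from values to probabilities):

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two probability distributions represented
    as dicts mapping each value in V to its probability."""
    values = set(p) | set(q)
    s = sum((math.sqrt(p.get(v, 0.0)) - math.sqrt(q.get(v, 0.0))) ** 2 for v in values)
    return math.sqrt(0.5 * s)

# HD lies in [0, 1]: identical distributions give 0,
# distributions with disjoint supports give 1.
P = {"S": 2/7, "M": 2/7, "N": 3/7}
Q = {"S": 1.0}
print(hellinger_distance(P, P), hellinger_distance(P, Q))
```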
Definition 6
([19]). Given $V = \{v_1, v_2, \ldots, v_s\}$, let $M = \{m_1/v_1, m_2/v_2, \ldots, m_s/v_s\}$ be a multiset drawn from $V$. Put
$$P_M = \binom{v_1, v_2, \ldots, v_s}{p_1, p_2, \ldots, p_s},$$
where $p_i = \frac{m_i}{m_1 + m_2 + \cdots + m_s}$ $(i = 1, 2, \ldots, s)$. Apparently, every $p_i$ is rational, so $P_M$ is a rational probability distribution set over $V$; it is referred to as the probability distribution set induced by $M$.
It is obvious that, for every $i$,
$$P_M(v_i) = p_i = \frac{m_i}{m_1 + m_2 + \cdots + m_s} = \frac{M(v_i)}{M(v_1) + M(v_2) + \cdots + M(v_s)}.$$
Definition 7
([19]). Let $V = \{v_1, v_2, \ldots, v_s\}$, and let $P = \binom{v_1, v_2, \ldots, v_s}{p_1, p_2, \ldots, p_s}$ be a rational probability distribution set over $V$. For every $i$, write $p_i = \frac{m_i}{n_i}$, where $m_i$ is a non-negative integer and $n_i$ is a positive integer. Let $n$ be the least common multiple of $n_1, n_2, \ldots, n_s$, denoted $n = [n_1, n_2, \ldots, n_s]$. Obviously, for every $i$, $n$ has the factorization $n = k_i n_i$ $(k_i \in \mathbb{N})$. By $p_1 + p_2 + \cdots + p_s = 1$, we have $n = k_1 m_1 + k_2 m_2 + \cdots + k_s m_s$. Define
$$M_P = \{k_1 m_1/v_1, k_2 m_2/v_2, \ldots, k_s m_s/v_s\}.$$
Apparently, $M_P$ is a multiset drawn from $V$; it is referred to as the multiset induced by $P$.
It can be observed through a simple calculation that, for every $i$,
$$M_P(v_i) = k_i m_i = k_i p_i n_i = P(v_i)\, n = P(v_i)\,[n_1, n_2, \ldots, n_s].$$
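The two inductions above (Definitions 6 and 7) can be sketched in a few lines of Python; this is our illustration rather than the authors' code, using fractions.Fraction for exact rational arithmetic:

```python
from collections import Counter
from fractions import Fraction
from math import lcm  # math.lcm requires Python 3.9+

def multiset_to_distribution(M):
    """Definition 6: normalise the counts of multiset M to the induced
    rational probability distribution P_M."""
    total = sum(M.values())
    return {v: Fraction(c, total) for v, c in M.items()}

def distribution_to_multiset(P):
    """Definition 7: scale each rational p_i by the lcm of the denominators
    to obtain the induced multiset M_P."""
    n = lcm(*(p.denominator for p in P.values()))
    return Counter({v: int(p * n) for v, p in P.items() if p > 0})

M = Counter({"S": 2, "M": 2, "N": 3})
P_M = multiset_to_distribution(M)      # {S: 2/7, M: 2/7, N: 3/7}
print(P_M)
print(distribution_to_multiset(P_M))   # recovers {2/S, 2/M, 3/N}, cf. Theorem 1
```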
Theorem 1
([19]). Given $V = \{v_1, v_2, \ldots, v_s\}$, denote
$$\Omega = \{M : M \text{ is a multiset drawn from } V\}$$
and
$$\Psi = \{P : P \text{ is a rational probability distribution set over } V\}.$$
Then, there exists a one-to-one correspondence between Ω and Ψ.
In the light of Theorem 1, we can treat multisets in an MSVIS as rational probability distribution sets. In the next section, a tolerance relation is defined based on this fact.

3. Multiset-Valued Information Systems and a Tolerance Relation in an MSVIS

In this section, we show how an IIS induces an MSVIS. Then, with the help of Theorem 1, a tolerance relation is defined.
Definition 8
([4]). Suppose that the non-empty sets $U$ and $A$ are the object set and the feature set, respectively. Then, $(U, A)$ is called an information system (IS) if every $a \in A$ determines an information function $a : U \to V_a$, where $V_a = \{a(u) : u \in U\}$.
An IS $(U, A)$ is called an incomplete information system (IIS) if there is some $a \in A$ such that $V_a$ contains missing information values (denoted by $*$).
Given an IIS $(U, A)$, for each $a \in A$ denote by $V_a'$ the set of non-missing values of $a$, i.e.,
$$V_a' = V_a \setminus \{a(u) : a(u) = *\}, \qquad a \in A.$$
Example 1.
An IIS $(U, A)$ is shown in Table 1, where
$V_{a_1}' = \{\mathrm{Sick}(S), \mathrm{Middle}(M), \mathrm{No}(N)\}$, $V_{a_2}' = \{\mathrm{Yes}(Y), \mathrm{No}(N)\}$,
$V_{a_3}' = \{\mathrm{High}(H), \mathrm{Low}(L), \mathrm{Normal}(N)\}$, $V_{a_4} = V_{a_4}' = \{\mathrm{Flu}(F), \mathrm{Rhinitis}(R), \mathrm{Health}(H)\}$.
Table 1. An IIS.
U | Headache (a1) | Muscle Pain (a2) | Temperature (a3) | Symptom (a4)
u1 | Sick | Yes | High | Flu
u2 | Sick | Yes | Low | Flu
u3 | Middle | * | Normal | Flu
u4 | No | Yes | Normal | Flu
u5 | * | Yes | Normal | Rhinitis
u6 | Middle | No | * | Rhinitis
u7 | No | No | Low | Health
u8 | No | * | * | Health
u9 | * | Yes | Low | Health
Definition 9
([19]). Given an IS $(U, A)$ where $U = \{u_1, u_2, \ldots, u_n\}$, $(U, A)$ is called a multiset-valued information system (MSVIS) if, for every $a \in A$, $a(u_1), a(u_2), \ldots, a(u_n)$ are all multisets drawn from one set.
We say $(U, P)$ is a subsystem of $(U, A)$ if $P \subseteq A$.
In an MSVIS $(U, A)$, where $U = \{u_1, u_2, \ldots, u_n\}$ and $A = \{a_1, a_2, \ldots, a_m\}$, for every $i$, $a_i(u_1), a_i(u_2), \ldots, a_i(u_n)$ are multisets drawn from $V_i = \{v_{i1}, v_{i2}, \ldots, v_{is_i}\}$:
$$a_i(u_1) = \{k_1^{(1)}/v_{i1}, k_2^{(1)}/v_{i2}, \ldots, k_{s_i}^{(1)}/v_{is_i}\},$$
$$a_i(u_2) = \{k_1^{(2)}/v_{i1}, k_2^{(2)}/v_{i2}, \ldots, k_{s_i}^{(2)}/v_{is_i}\},$$
$$\vdots$$
$$a_i(u_n) = \{k_1^{(n)}/v_{i1}, k_2^{(n)}/v_{i2}, \ldots, k_{s_i}^{(n)}/v_{is_i}\}.$$
Here, the $v$'s stand for information values and the $k$'s for the numbers of times they occur. Note that a general MSVIS does not necessarily arise from missing data; it can also result from data fusion [19]. For the situation of missing data, we have Definition 10 and Example 2 below.
Definition 10
([19]). Given an IIS $(U, A)$ where $U = \{u_1, u_2, \ldots, u_n\}$, for each $a \in A$ denote $V_a' = \{v_1, v_2, \ldots, v_s\}$. For each $i$, let $m_i$ be the number of occurrences of $v_i$ in $\{a(u_1), a(u_2), \ldots, a(u_n)\} \setminus \{*\}$, where $V_a'$ is an ordinary set and $\{a(u_1), a(u_2), \ldots, a(u_n)\} \setminus \{*\}$ is a multiset. If $a(u) = *$, then $a(u)$ is substituted with $\{m_1/v_1, m_2/v_2, \ldots, m_s/v_s\}$; if $a(u)$ is not a missing value, say $a(u) = v_j$, then $a(u)$ is substituted with $\{0/v_1, \ldots, 0/v_{j-1}, 1/v_j, 0/v_{j+1}, \ldots, 0/v_s\}$. This process gives an MSVIS, which we call the MSVIS induced by the IIS $(U, A)$.
Example 2 below shows the process of an IIS inducing an MSVIS in more detail.
Example 2
(Continued from Example 1). Table 2 is an MSVIS induced by the IIS in Table 1.
Let us look at Table 1 and take attribute $a_1$ as an example. There are three different information values, Sick(S), Middle(M), and No(N), and one missing information value, $*$. So, according to Definition 10, attribute $a_1$ gives the ordinary set $V_{a_1}' = \{S, M, N\}$ and the multiset $\{a_1(u_1), a_1(u_2), \ldots, a_1(u_9)\} \setminus \{*\} = \{S, S, M, N, M, N, N\} = \{2/S, 2/M, 3/N\}$. Since $a_1(u_5) = a_1(u_9) = *$ in Table 1, they are replaced by $\{2/S, 2/M, 3/N\}$ in Table 2. On the other hand, $a_1(u_1) = \mathrm{Sick}(S)$ in Table 1, so $a_1(u_1)$ is replaced by $\{1/S, 0/M, 0/N\}$ in Table 2. Following this process, we derive the MSVIS in Table 2 from the IIS in Table 1.
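A small Python sketch of this induction for a single attribute column (our illustration, not the authors' implementation; the name MISSING marks the symbol used for missing values):

```python
from collections import Counter

MISSING = "*"

def induce_msvis_column(column):
    """Definition 10 for one attribute: a missing entry becomes the frequency
    multiset of the observed values; an observed entry becomes a singleton
    multiset (zero counts are left implicit)."""
    observed = Counter(v for v in column if v != MISSING)
    return [Counter(observed) if v == MISSING else Counter({v: 1}) for v in column]

# Attribute a1 (Headache) from Table 1:
a1 = ["S", "S", "M", "N", "*", "M", "N", "N", "*"]
for i, m in enumerate(induce_msvis_column(a1), start=1):
    print(f"u{i}: {dict(m)}")
# u5 and u9 become {S: 2, M: 2, N: 3}, matching Table 2.
```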
Now that the issue of missing data is addressed by an MSVIS induced from an IIS, information granules in the MSVIS can be designed for further modeling. Note that, since multisets are not convenient for calculations, their corresponding probability distribution sets are utilized in Definition 11.
Definition 11
([19]). Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, a tolerance relation can be defined as follows:
$$R_P^{\theta} = \{(u, u') \in U \times U : \forall a \in P,\ HD(P_{a(u)}, P_{a(u')}) \le \theta\},$$
where $P_{a(u)}$ and $P_{a(u')}$ denote the probability distribution sets induced by $a(u)$ and $a(u')$, respectively.
Clearly, $R_P^{\theta} = \bigcap_{a \in P} R_a^{\theta}$, where $R_{\{a\}}^{\theta}$ is written as $R_a^{\theta}$.
Definition 12
([19]). Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, the following θ-tolerance class of $u \in U$ serves as the information granule in an MSVIS:
$$R_P^{\theta}(u) = \{u' \in U : (u, u') \in R_P^{\theta}\}.$$
Apparently, $R_P^{\theta}(u) = \bigcap_{a \in P} R_a^{\theta}(u)$.
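As a hedged sketch (ours, assuming each attribute column has already been converted to its induced probability distributions, e.g. with the multiset_to_distribution helper above), the θ-tolerance class of an object can be computed as follows:

```python
import math

def tolerance_class(u_idx, data, attrs, theta):
    """θ-tolerance class of object u_idx (Definition 12). `data[a]` is assumed
    to be a list of dicts, data[a][i] being the probability distribution
    induced by a(u_i)."""
    def hd(p, q):  # Hellinger distance, as in Definition 5
        values = set(p) | set(q)
        return math.sqrt(0.5 * sum((math.sqrt(p.get(v, 0)) - math.sqrt(q.get(v, 0))) ** 2
                                   for v in values))
    n = len(next(iter(data.values())))
    return [j for j in range(n)
            if all(hd(data[a][u_idx], data[a][j]) <= theta for a in attrs)]
```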

4. Uncertainty Measurement for an MSVIS

In this section, two UMs for an MSVIS are reviewed, which will be utilized for the attribute reduction in the next section.
Definition 13
([19]). Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, the θ-information entropy of the subsystem $(U, P)$ is defined as
$$H^{\theta}(P) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{|R_P^{\theta}(u_i)|}{n}.$$
Proposition 1.
Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, we have
$$0 \le H^{\theta}(P) \le \log_2 n.$$
Moreover, $H^{\theta}(P) = 0$ if $R_P^{\theta} = \delta$, and $H^{\theta}(P) = \log_2 n$ if $R_P^{\theta} = \Delta$.
Proof. 
Please see “Appendix A”.   □
Proposition 2
([19]). Let ( U , A ) be an MSVIS.
(1) If $P \subseteq Q \subseteq A$, then $\forall \theta \in [0, 1]$, $H^{\theta}(P) \le H^{\theta}(Q)$;
(2) If $0 \le \theta_1 \le \theta_2 \le 1$, then $\forall P \subseteq A$, $H^{\theta_2}(P) \le H^{\theta_1}(P)$.
Proof. 
Please see “Appendix A”.    □
Definition 14
([19]). Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, the θ-information amount of the subsystem $(U, P)$ is defined as
$$E^{\theta}(P) = \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_P^{\theta}(u_i)|}{n}\right).$$
Proposition 3.
Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, we have
$$0 \le E^{\theta}(P) \le 1 - \frac{1}{n}.$$
Moreover, $E^{\theta}(P) = 0$ if $R_P^{\theta} = \delta$, and $E^{\theta}(P) = 1 - \frac{1}{n}$ if $R_P^{\theta} = \Delta$.
Proof. 
Please see “Appendix A”.    □
Proposition 4
([19]). Let ( U , A ) be an MSVIS.
(1) If $P \subseteq Q \subseteq A$, then $\forall \theta \in [0, 1]$, $E^{\theta}(P) \le E^{\theta}(Q)$;
(2) If $0 \le \theta_1 \le \theta_2 \le 1$, then $\forall P \subseteq A$, $E^{\theta_2}(P) \le E^{\theta_1}(P)$.
Proof. 
Please see “Appendix A”.    □
The monotonicity shown in Propositions 2 and 4 demonstrates the validity of the proposed UMs in an MSVIS.
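Continuing the sketch from Section 3 (and reusing the hypothetical tolerance_class helper defined there), the two UMs can be computed directly from the tolerance classes:

```python
import math

def theta_entropy(data, attrs, theta):
    """θ-information entropy H^θ(P) of the subsystem (U, attrs), Definition 13."""
    n = len(next(iter(data.values())))
    classes = [tolerance_class(i, data, attrs, theta) for i in range(n)]
    return -sum(math.log2(len(c) / n) for c in classes) / n

def theta_information_amount(data, attrs, theta):
    """θ-information amount E^θ(P) of the subsystem (U, attrs), Definition 14."""
    n = len(next(iter(data.values())))
    classes = [tolerance_class(i, data, attrs, theta) for i in range(n)]
    return sum(1 - len(c) / n for c in classes) / n
```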

5. Unsupervised Attribute Reduction Algorithms in an MSVIS

In this section, we demonstrate that the UMs H θ and E θ can be used for attribute reduction in an MSVIS. Then, we propose two specific unsupervised attribute reduction algorithms using these UMs.
Definition 15.
Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, $P$ is called a θ-coordination subset of $A$ if $R_P^{\theta} = R_A^{\theta}$.
For convenience, we denote the family of all θ -coordination subsets of A by c o θ ( A ) .
Definition 16.
Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, $P$ is called a θ-reduct of $A$ if $P \in co_{\theta}(A)$ and $\forall a \in P$, $P \setminus \{a\} \notin co_{\theta}(A)$.
For convenience, the family of all θ -reducts of A is denoted by r e d θ ( A ) .
Theorem 2.
Let $(U, A)$ be an MSVIS, $P \subseteq A$, and $\theta \in [0, 1]$. The following three conditions are equivalent:
(1) $P \in co_{\theta}(A)$;
(2) $H^{\theta}(P) = H^{\theta}(A)$;
(3) $E^{\theta}(P) = E^{\theta}(A)$.
Proof. 
Please see “Appendix A”.    □
Corollary 1.
Let $(U, A)$ be an MSVIS, $P \subseteq A$, and $\theta \in [0, 1]$. The following three conditions are equivalent:
(1) $P \in red_{\theta}(A)$;
(2) $H^{\theta}(P) = H^{\theta}(A)$ and $\forall a \in P$, $H^{\theta}(P \setminus \{a\}) \neq H^{\theta}(A)$;
(3) $E^{\theta}(P) = E^{\theta}(A)$ and $\forall a \in P$, $E^{\theta}(P \setminus \{a\}) \neq E^{\theta}(A)$.
Proof. 
According to Theorem 2, this proof is straightforward.    □
According to this corollary, we give two attribute reduction algorithms for an MSVIS below, based on $E^{\theta}$ and $H^{\theta}$, respectively. In Algorithms 1 and 2, the attribute selection process is heuristic: it starts from an empty set and uses the UM to select attributes to add to the candidate attribute subset. An algorithm terminates when the UM of the candidate attribute subset reaches the UM of the whole attribute set, at which point a reduct is found. For a dataset with n objects and m attributes, the time complexity of searching for a reduct is $O(m^2)$, while the time complexity of calculating $E^{\theta}(A)$ or $H^{\theta}(A)$ is $O(n^2 m)$. Consequently, the time complexity of Algorithms 1 and 2 is $O(m^2 + n^2 m)$.
Algorithm 1:Unsupervised attribute reduction algorithm based on E θ in an MSVIS ( E θ -MSVIS).
Mathematics 13 01718 i001
Algorithm 2:Unsupervised attribute reduction algorithm based on H θ in an MSVIS ( H θ -MSVIS).
Mathematics 13 01718 i002
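The pseudo-code of Algorithms 1 and 2 is given as images above; the following is a hedged Python sketch of the same greedy search (ours, reusing the hypothetical theta_information_amount and theta_entropy helpers from Section 4; float comparisons are handled with math.isclose as a simplification):

```python
import math

def theta_reduct(data, all_attrs, theta, measure):
    """Greedy heuristic of Algorithms 1 and 2: grow a candidate subset until its
    UM reaches that of the full attribute set (Theorem 2), then drop attributes
    whose removal keeps the UM unchanged (Definition 16)."""
    target = measure(data, all_attrs, theta)
    reduct, remaining = [], list(all_attrs)
    while remaining and not math.isclose(measure(data, reduct, theta), target):
        best = max(remaining, key=lambda a: measure(data, reduct + [a], theta))
        reduct.append(best)
        remaining.remove(best)
    for a in list(reduct):                      # redundancy check
        trial = [b for b in reduct if b != a]
        if trial and math.isclose(measure(data, trial, theta), target):
            reduct.remove(a)
    return reduct

# e.g. red = theta_reduct(data, attrs, theta=0.3, measure=theta_information_amount)
```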

6. Experimental Analysis

In this section, cluster analysis and outlier detection are used to verify the effectiveness of the proposed unsupervised attribute reduction algorithms, and the influence of the parameter θ and the missing rate λ is studied. For an MSVIS $(U, A)$, let n and m denote the number of objects and the number of attributes, respectively. The missing rate λ of $(U, A)$ is calculated as
$$\lambda = \frac{\text{number of missing values}}{m \times n}.$$

6.1. Cluster Analysis

In this subsection, clustering on reduced data is conducted to verify the reduction effect of the proposed algorithms.
For this part, nine datasets from UCI [30] are used, as shown in Table 3. Every dataset in Table 3 is transformed into an MSVIS with λ = 0.1, except for dataset An, whose original missing rate is already larger than 0.1. Numerical attributes in the datasets An, Sp, and Wa are discretized by the method proposed in [31].
For clustering, the k-modes algorithm is adopted. k-modes is designed for clustering categorical variables: unlike k-means, which clusters numerical data based on Euclidean distance, k-modes defines clusters based on the number of matching categorical attribute values between data points.
The clustering effects are evaluated by three criteria: the Davies–Bouldin index (DB index), the silhouette coefficient (SC), and the Calinski–Harabasz index (CH index). A smaller Davies–Bouldin index indicates a better clustering effect [32], whereas a larger silhouette coefficient or a larger Calinski–Harabasz index indicates a better clustering effect [33,34].
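For reference, all three criteria are available in scikit-learn; a minimal sketch (ours), assuming the reduced categorical data have been numerically encoded (e.g., one-hot) into a matrix X and labels are the k-modes cluster assignments:

```python
from sklearn.metrics import (davies_bouldin_score, silhouette_score,
                             calinski_harabasz_score)

def evaluate_clustering(X, labels):
    """Score one clustering with the three criteria used in this section."""
    return {
        "DB": davies_bouldin_score(X, labels),     # smaller is better
        "SC": silhouette_score(X, labels),         # larger is better
        "CH": calinski_harabasz_score(X, labels),  # larger is better
    }
```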
Using the SC criterion, the optimal reduction results obtained by $E^{\theta}$-MSVIS and $H^{\theta}$-MSVIS are shown in Table 4. It is evident from Table 4 that $E^{\theta}$-MSVIS and $H^{\theta}$-MSVIS effectively reduce the number of attributes, and that different datasets may require different parameters to achieve optimal clustering results.
The clustering results are depicted through PCA visualization (Figure 2, Figure 3 and Figure 4). Take the subgraph for Sp (second row, third column) as an example: in Figure 2, almost all of the data are assigned to one class (purple), and the few cyan dots lie close to the purple dots. In Figure 3 and Figure 4, by contrast, the numbers of cyan and purple dots are closer to their real numbers in the original dataset, and there is a clear boundary separating the two kinds of dots.
To show the effectiveness and improvement of the algorithms $E^{\theta}$-MSVIS and $H^{\theta}$-MSVIS, four other representative algorithms are compared: Unsupervised Quick Reduct (UQR) [35], Unsupervised Entropy-Based Reduct (UEBR) [36], $E^{\theta}$-SVIS, and $H^{\theta}$-SVIS. UEBR and UQR are conducted on an MSVIS. $E^{\theta}$-SVIS and $H^{\theta}$-SVIS are the SVIS versions of $E^{\theta}$-MSVIS and $H^{\theta}$-MSVIS, respectively, i.e., $E^{\theta}$-MSVIS and $H^{\theta}$-MSVIS realised in an SVIS.
The reduction results evaluated by DB, SC, and CH are shown in Table 5, Table 6 and Table 7, respectively. By all three criteria, $E^{\theta}$-MSVIS and $H^{\theta}$-MSVIS show clear improvements over $E^{\theta}$-SVIS and $H^{\theta}$-SVIS, respectively, and both outperform the raw data, UQR, and UEBR.

6.2. Outlier Detection

In this subsection, outlier detection experiments are used to test the performance of the proposed attribute reduction algorithms. Three outlier detection algorithms, Dis [37], kNN [38], and Seq [39], are applied to the datasets in Table 8. For the sake of simplicity, this subsection only considers $E^{\theta}$-MSVIS.
Downsampling is a common approach to forming suitable datasets for the evaluation of outlier detection [40,41]. Here, we follow the experimental technique of [41] to form an imbalanced distribution for the datasets Mo, Vr, Wa, Sp, and Io. As in clustering experiments, ref. [31] is used to discretize Sp, Wa, and Io.
The performance of each algorithm before and after reduction is evaluated by AUC (area under curve), as shown in Table 9.
AUC is a common indicator for evaluating and comparing the performance of binary classification models; the larger the AUC, the better the performance. As in the clustering experiments, the optimal reduction results are listed in Table 10.
On the datasets Bw and Ly, the performance of all three algorithms remains high after reduction. On the datasets Vr and Wa, all three algorithms improve significantly after reduction. In the other cases, except for algorithm Seq on dataset Sp, the reduced data always achieve better performance. In general, the reduction algorithm helps promote or maintain the performance of all three outlier detection algorithms.

6.3. Parameter Analysis

As shown in the clustering and outlier detection experiments, the optimal parameter θ differs across datasets and tasks. In this subsection, the influence of the parameter θ and the missing rate λ is studied. For the sake of simplicity, only $E^{\theta}$-MSVIS and the task of outlier detection are considered.
In Definition 11, θ has a significant influence on the tolerance classes. The tolerance classes are then used to define the UMs $E^{\theta}$ and $H^{\theta}$. Consequently, the proposed attribute reduction algorithms based on these UMs are influenced by θ, and the outlier detection experiments are carried out on the reduced attributes. To quantify the influence of λ and θ, two indicators are used: AUC and the reduction rate. The reduction rate is defined as the ratio of the number of reduced attributes to the total number of attributes.
The results are shown in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18, Figure 19 and Figure 20. The optimal parameter generally depends on the specific dataset. On some datasets, such as Bw, Ly, Sf, and Io, all three outlier detection algorithms sustain a high performance for different θ and λ, while on others the performance of each outlier detection algorithm changes with θ. On some datasets, such as Mo, Vr, and Io, the reduction rate generally becomes smaller as θ becomes larger. On others, such as Bw, Sp, and Wa, the reduction rate does not change much for different θ. No clear correlation is observed between λ and the performance trends.

6.4.  Discussion

In this study, we primarily focused on establishing the theoretical and empirical validity of our unsupervised attribute reduction framework for multiset-valued data in static, small-to-medium-scale scenarios.
The computational complexity of our framework, O ( m 2 + n 2 m ) , as noted in Section 5, may pose challenges for ultra-high-dimensional data (e.g., thousands of attributes). However, our parameter analysis (Section 6.3) demonstrates that the proposed algorithms effectively reduce dimensionality even in moderately sized datasets (e.g., the Sports Article dataset with 59 attributes, Table 4), suggesting potential scalability with optimizations. For instance, the iterative attribute selection process (Algorithms 1 and 2) inherently prioritizes critical features, which could mitigate redundancy in high-dimensional spaces.
We acknowledge that further adaptations such as integrating sparse representation techniques or parallel computing are necessary for large-scale applications. These extensions, while beyond the scope of this foundational work, are highlighted as future directions in the conclusion (Section 7). We believe our framework’s emphasis on frequency preservation and granularity control (via θ ) provides a robust theoretical basis for addressing high-dimensional challenges, particularly in domains like bioinformatics or text mining where multiset-valued representations naturally arise.
While automatic θ optimization (e.g., via metaheuristic algorithms or adaptive thresholds) would enhance usability, we intentionally preserved θ as a user-defined parameter in this foundational work. This design choice allows domain experts to align granularity with their specific objectives; for instance, selecting smaller θ for fine-grained clustering or larger θ for efficient outlier detection. Our current framework provides a theoretical and empirical basis for such extensions, prioritizing methodological transparency and reproducibility over prescriptive parameterization. We fully agree that automated parameter adaptation is a vital direction for future research, particularly in high-dimensional or streaming scenarios.
The selection of the parameter θ plays a pivotal role in balancing granularity and computational efficiency in our framework. As shown in Section 6.3, θ acts as a threshold to control the similarity between multisets via the Hellinger distance, directly influencing the size of θ -tolerance classes and subsequently the uncertainty measures. While our experiments demonstrate θ ’s task-dependent nature (e.g., smaller θ for fine-grained clustering, larger θ for efficient outlier detection), we emphasize that its empirical tuning aligns with the common practice in granular computing and rough set-based methods, where domain knowledge often guides parameter selection. Future work could explore automated θ optimization via metaheuristic algorithms (e.g., genetic algorithms) or adaptive thresholding based on data characteristics (e.g., missing rate λ ), particularly in scenarios requiring minimal human intervention.
Although our current experiments focus on small-to-medium-scale datasets, the proposed framework’s principles are extensible to high-dimensional multiset-valued data. The iterative attribute selection process (Algorithms 1 and 2) inherently prioritizes features with discriminative frequency distributions, which could mitigate the curse of dimensionality by eliminating redundant attributes. However, the computational complexity of O ( m 2 + n 2 m ) may limit scalability for ultra-high-dimensional scenarios (e.g., gene expression data with thousands of features). To address this, future extensions could integrate sparse representation techniques (e.g., L1-norm regularization) to enhance efficiency or adopt parallel computing architectures for distributed attribute reduction. These adaptations would align our method with emerging needs in bioinformatics and text mining, where multiset-valued representations naturally encode frequency-rich semantics.
To align with real-world scenarios, where datasets with excessive missing rates λ are less practical, we focus on moderate missing rates (λ = 0.1 and 0.2) in our experiments; these values reflect common missing-data levels in real applications. The results in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18, Figure 19 and Figure 20 demonstrate that the proposed method remains robust under these settings.

7. Conclusions

In this paper, datasets containing missing information values are transformed into an MSVIS model, ensuring that the frequencies of different attribute values are taken into full consideration. This data-driven approach towards handling missing data is user-friendly and entirely based on the dataset itself. Furthermore, novel unsupervised attribute reduction algorithms for an MSVIS are presented in this paper, leveraging the concepts of θ -information amount and θ -information entropy, which serve as measures of uncertainty. The proposed method outperforms existing SVIS-based approaches (e.g., E θ -SVIS and H θ -SVIS) in clustering accuracy and in outlier detection AUC, aligning with Huang et al. [22] on the importance of frequency preservation. However, our unsupervised framework contrasts with supervised methods by eliminating dependency on decision attributes, as emphasized in Peng et al. [14]. Additionally, an analysis of the parameters θ and λ is conducted to provide further insights into their impact and importance within the proposed methodologies.
Theoretically, this work bridges granular computing and rough set theory by formalizing multisets as rational probability distributions, advancing uncertainty measurement frameworks. Practically, the method enables efficient preprocessing of medical or IoT datasets with missing values (e.g., Table 1’s symptom data) without imputation biases. Future work will optimize time complexity via hash-based granulation, enhancing scalability for real-world applications. While the proposed framework demonstrates robust performance and practical effectiveness, some limitations warrant attention. One notable drawback is its relatively high time complexity when utilizing a tolerance relation for information granules. Introducing hash technology could potentially enhance the time efficiency of the method. The MSVIS in this paper focuses primarily on discrete data. Expanding this methodology to accommodate different data types presents an intriguing area for future research. Our forthcoming work will focus on developing time-efficient unsupervised attribute reduction techniques for gene expression data that may contain missing information values.

Author Contributions

Methodology, X.G.; Software, Y.L. and H.L.; Formal analysis, Y.P. and H.L.; Investigation, X.G., Y.P. and Y.L.; Data curation, Y.L.; Writing—original draft, X.G.; Writing—review & editing, Y.P. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Guangdong Key Disciplines Project (2024ZDJS137).

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proposition A1.
Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, we have
$$0 \le H^{\theta}(P) \le \log_2 n.$$
Moreover, $H^{\theta}(P) = 0$ if $R_P^{\theta} = \delta$, and $H^{\theta}(P) = \log_2 n$ if $R_P^{\theta} = \Delta$.
Proof. 
Since $R_P^{\theta}$ is a tolerance relation on $U$, we have $u_i \in R_P^{\theta}(u_i)$ for every $i$.
So $1 \le |R_P^{\theta}(u_i)| \le n$ for every $i$. This implies that
$$0 \le -\log_2 \frac{|R_P^{\theta}(u_i)|}{n} \le \log_2 n \quad (\forall i).$$
By Definition 13,
$$0 \le H^{\theta}(P) \le \log_2 n.$$
If $R_P^{\theta} = \Delta$, then $|R_P^{\theta}(u_i)| = 1$ for every $i$; thus, $H^{\theta}(P) = \log_2 n$.
If $R_P^{\theta} = \delta$, then $|R_P^{\theta}(u_i)| = n$ for every $i$; thus, $H^{\theta}(P) = 0$. □
Proposition A2.
Let $(U, A)$ be an MSVIS.
(1) If $P \subseteq Q \subseteq A$, then $\forall \theta \in [0, 1]$, $H^{\theta}(P) \le H^{\theta}(Q)$;
(2) If $0 \le \theta_1 \le \theta_2 \le 1$, then $\forall P \subseteq A$, $H^{\theta_2}(P) \le H^{\theta_1}(P)$.
Proof. 
(1) Since $P \subseteq Q \subseteq A$, we have $R_Q^{\theta}(u_i) \subseteq R_P^{\theta}(u_i)$ for every $i$. So
$$|R_Q^{\theta}(u_i)| \le |R_P^{\theta}(u_i)| \quad (\forall i).$$
By Definition 13,
$$H^{\theta}(P) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{|R_P^{\theta}(u_i)|}{n}, \qquad H^{\theta}(Q) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{|R_Q^{\theta}(u_i)|}{n}.$$
Consequently, $H^{\theta}(P) \le H^{\theta}(Q)$.
(2) Since $0 \le \theta_1 \le \theta_2 \le 1$, we have $R_P^{\theta_1}(u_i) \subseteq R_P^{\theta_2}(u_i)$ for every $i$, hence
$$|R_P^{\theta_1}(u_i)| \le |R_P^{\theta_2}(u_i)| \quad (\forall i).$$
By Definition 13, $H^{\theta_2}(P) \le H^{\theta_1}(P)$. □
Proposition A3.
Given an MSVIS $(U, A)$, $P \subseteq A$, and $\theta \in [0, 1]$, we have
$$0 \le E^{\theta}(P) \le 1 - \frac{1}{n}.$$
Moreover, $E^{\theta}(P) = 0$ if $R_P^{\theta} = \delta$, and $E^{\theta}(P) = 1 - \frac{1}{n}$ if $R_P^{\theta} = \Delta$.
Proof. 
Since $R_P^{\theta}$ is a tolerance relation on $U$, we have $u_i \in R_P^{\theta}(u_i)$ for every $i$.
Thus $1 \le |R_P^{\theta}(u_i)| \le n$ for every $i$. This implies that
$$0 \le 1 - \frac{|R_P^{\theta}(u_i)|}{n} \le 1 - \frac{1}{n} \quad (\forall i).$$
By Definition 14,
$$0 \le E^{\theta}(P) \le 1 - \frac{1}{n}.$$
If $R_P^{\theta} = \Delta$, then $|R_P^{\theta}(u_i)| = 1$ for every $i$; so $E^{\theta}(P) = 1 - \frac{1}{n}$.
If $R_P^{\theta} = \delta$, then $|R_P^{\theta}(u_i)| = n$ for every $i$; so $E^{\theta}(P) = 0$. □
Proposition A4.
Let $(U, A)$ be an MSVIS.
(1) If $P \subseteq Q \subseteq A$, then $\forall \theta \in [0, 1]$, $E^{\theta}(P) \le E^{\theta}(Q)$;
(2) If $0 \le \theta_1 \le \theta_2 \le 1$, then $\forall P \subseteq A$, $E^{\theta_2}(P) \le E^{\theta_1}(P)$.
Proof. 
(1) Since $P \subseteq Q \subseteq A$, we have $R_Q^{\theta}(u_i) \subseteq R_P^{\theta}(u_i)$ for every $i$. So
$$|R_Q^{\theta}(u_i)| \le |R_P^{\theta}(u_i)| \quad (\forall i).$$
By Definition 14,
$$E^{\theta}(P) = \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_P^{\theta}(u_i)|}{n}\right), \qquad E^{\theta}(Q) = \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_Q^{\theta}(u_i)|}{n}\right).$$
Consequently, $E^{\theta}(P) \le E^{\theta}(Q)$.
(2) Since $0 \le \theta_1 \le \theta_2 \le 1$, we have $R_P^{\theta_1}(u_i) \subseteq R_P^{\theta_2}(u_i)$ for every $i$. Thus
$$|R_P^{\theta_1}(u_i)| \le |R_P^{\theta_2}(u_i)| \quad (\forall i).$$
By Definition 14,
$$E^{\theta_1}(P) = \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_P^{\theta_1}(u_i)|}{n}\right), \qquad E^{\theta_2}(P) = \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_P^{\theta_2}(u_i)|}{n}\right).$$
Thus, $E^{\theta_2}(P) \le E^{\theta_1}(P)$. □
Theorem A1.
Let $(U, A)$ be an MSVIS, $P \subseteq A$, and $\theta \in [0, 1]$. The following three conditions are equivalent:
(1) $P \in co_{\theta}(A)$;
(2) $H^{\theta}(P) = H^{\theta}(A)$;
(3) $E^{\theta}(P) = E^{\theta}(A)$.
Proof. 
$(1) \Rightarrow (2)$ and $(1) \Rightarrow (3)$ are obvious.
$(2) \Rightarrow (1)$. Suppose that $H^{\theta}(P) = H^{\theta}(A)$. Then,
$$-\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{|R_P^{\theta}(u_i)|}{n} = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{|R_A^{\theta}(u_i)|}{n}.$$
Therefore,
$$\sum_{i=1}^{n} \log_2 \frac{|R_P^{\theta}(u_i)|}{|R_A^{\theta}(u_i)|} = 0.$$
Note that $R_A^{\theta} \subseteq R_P^{\theta}$; then, $R_A^{\theta}(u_i) \subseteq R_P^{\theta}(u_i)$ for every $i$. This implies that
$$\log_2 \frac{|R_P^{\theta}(u_i)|}{|R_A^{\theta}(u_i)|} \ge 0 \quad (\forall i).$$
Therefore, $\log_2 \frac{|R_P^{\theta}(u_i)|}{|R_A^{\theta}(u_i)|} = 0$ for every $i$. It follows that $R_P^{\theta}(u_i) = R_A^{\theta}(u_i)$ for every $i$.
Thus, $R_P^{\theta} = R_A^{\theta}$. Hence,
$$P \in co_{\theta}(A).$$
$(3) \Rightarrow (1)$. Suppose that $E^{\theta}(P) = E^{\theta}(A)$. Then,
$$\sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_P^{\theta}(u_i)|}{n}\right) = \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|R_A^{\theta}(u_i)|}{n}\right).$$
So
$$\sum_{i=1}^{n} \left(|R_P^{\theta}(u_i)| - |R_A^{\theta}(u_i)|\right) = 0.$$
Note that $R_A^{\theta} \subseteq R_P^{\theta}$; then, $R_A^{\theta}(u_i) \subseteq R_P^{\theta}(u_i)$ for every $i$. This implies that
$$|R_P^{\theta}(u_i)| - |R_A^{\theta}(u_i)| \ge 0 \quad (\forall i).$$
So, $|R_P^{\theta}(u_i)| - |R_A^{\theta}(u_i)| = 0$ for every $i$.
Consequently, $R_P^{\theta}(u_i) = R_A^{\theta}(u_i)$ for every $i$, and hence $R_P^{\theta} = R_A^{\theta}$.
Therefore,
$$P \in co_{\theta}(A). \qquad \square$$
Figure A1. The core codes of the proposed algorithms.

References

  1. Kang, H. The prevention and handling of the missing data. Korean J. Anesthesiol. 2013, 64, 402–406. [Google Scholar] [CrossRef] [PubMed]
  2. Asendorpf, J.B.; Schoot, R.V.D.; Denissen, J.J.; Hutteman, R. Reducing bias due to systematic attrition in longitudinal studies: The benefits of multiple imputation. Int. J. Behav. Dev. 2014, 38, 453–460. [Google Scholar] [CrossRef]
  3. Zadeh, L.A. Fuzzy logic equals computing with words. IEEE Trans. Fuzzy Syst. 1996, 4, 103–111. [Google Scholar] [CrossRef]
  4. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  5. Pal, S.K.; Meher, S.K.; Dutta, S. Class-dependent rough-fuzzy granular space, dispersion index and classification. Pattern Recognit. 2012, 45, 2690–2707. [Google Scholar] [CrossRef]
  6. Yao, Y.Y. Granular computing for data mining. In Proceedings of the SPIE Conference on Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, Kissimmee, FL, USA, 7–18 April 2006; pp. 1–12. [Google Scholar]
  7. Dong, L.J.; Chen, D.G.; Wang, N.; Lu, Z.H. Key energy-consumption feature selection of thermal power systems based on robust attribute reduction with rough sets. Inf. Sci. 2020, 532, 61–71. [Google Scholar]
  8. Dai, J.H.; Tian, H.W. Entropy measures and granularity measures for set-valued information systems. Inf. Sci. 2013, 240, 72–82. [Google Scholar] [CrossRef]
  9. Chen, Z.C.; Qin, K.Y. Attribute reduction of set-valued information systems based on a tolerance relation. Comput. Sci. 2010, 23, 18–22. [Google Scholar]
  10. Xie, X.L.; Li, Z.W.; Zhang, P.F.; Zhang, G.Q. Information structures and uncertainty measures in an incomplete probabilistic set-valued information system. IEEE Access 2019, 7, 27501–27514. [Google Scholar] [CrossRef]
  11. Chen, L.J.; Liao, S.M.; Xie, N.X.; Li, Z.W.; Zhang, G.Q.; Wen, C.F. Measures of uncertainty for an incomplete set-valued information system with the optimal selection of subsystems: Gaussian kernel method. IEEE Access 2020, 8, 212022–212035. [Google Scholar] [CrossRef]
  12. Zhang, Q.L.; Chen, Y.Y.; Zhang, G.Q.; Li, Z.W.; Chen, L.J.; Wen, C.F. New uncertainty measurement for categorical data based on fuzzy information structures: An application in attribute reduction. Inf. Sci. 2021, 580, 541–577. [Google Scholar] [CrossRef]
  13. Gao, C.; Lai, Z.H.; Zhou, J.; Wen, J.J.; Wong, W.K. Granular maximum decision entropy-based monotonic uncertainty measure for attribute reduction. Int. J. Approx. Reason. 2019, 104, 9–24. [Google Scholar] [CrossRef]
  14. Peng, Y.C.; Zhang, Q.L. Uncertainty measurement for set-valued data and its application in feature selection. Int. J. Fuzzy Syst. 2022, 24, 1735–1756. [Google Scholar] [CrossRef]
  15. Singh, S.; Shreevastava, S.; Som, T.; Somani, G. A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems. Soft Comput. 2020, 24, 4675–4691. [Google Scholar] [CrossRef]
  16. Zhang, Q.L.; Li, L.L. Attribute reduction for set-valued data based on D-S evidence theory. Int. J. Gen. Syst. 2022, 51, 822–861. [Google Scholar] [CrossRef]
  17. Liu, C.; Wang, L.; Yang, W.; Zhong, Q.Q.; Li, M. Incremental attribute reduction method for set-valued decision information system with variable attribute sets. J. Comput. Appl. 2022, 42, 463–468. [Google Scholar]
  18. Lang, G.M.; Li, Q.G.; Yang, T. An incremental approach to attribute reduction of dynamic set-valued information systems. Int. J. Mach. Learn. Cybern. 2014, 5, 775–788. [Google Scholar] [CrossRef]
  19. Huang, D.; Lin, H.; Li, Z.W. Information structures in a multiset-valued information system with application to uncertainty measurement. J. Intell. Fuzzy Syst. 2022, 43, 7447–7469. [Google Scholar] [CrossRef]
  20. Song, Y.; Lin, H.; Li, Z.W. Outlier detection in a multiset-valued information system based on rough set theory and granular computing. Inf. Sci. 2024, 657, 119950. [Google Scholar] [CrossRef]
  21. Zhao, X.R.; Hu, B.Q. Three-way decisions with decision-theoretic rough sets in multiset-valued information tables. Inf. Sci. 2020, 507, 684–699. [Google Scholar] [CrossRef]
  22. Huang, D.; Chen, Y.Y.; Liu, F.; Li, Z.W. Feature selection for multiset-valued data based on fuzzy conditional information entropy using iterative model and matrix operation. Appl. Soft Comput. 2023, 142, 110345. [Google Scholar] [CrossRef]
  23. Li, Z.W.; Yang, T.L.; Li, J.J. Semi-supervised attribute reduction for partially labelled multiset-valued data via a prediction label strategy. Inf. Sci. 2023, 634, 477–504. [Google Scholar] [CrossRef]
  24. Feng, W.B.; Sun, T.T. A dynamic attribute reduction algorithm based on relative neighborhood discernibility degree. Sci. Rep. 2024, 14, 15637. [Google Scholar] [CrossRef]
  25. He, J.L.; Zhang, G.Q.; Huang, D.; Wang, P.; Yu, G.J. Measures of uncertainty for partially labeled categorical data based on an indiscernibility relation: An application in semi-supervised attribute reduction. Appl. Intell. 2023, 53, 29486–29513. [Google Scholar] [CrossRef]
  26. Zonoozi, M.H.P.; Seydi, V.; Deypir, M. An unsupervised adversarial domain adaptation based on variational auto-encoder. Mach. Learn. 2025, 114, 128. [Google Scholar] [CrossRef]
  27. Chen, S.K.; Zheng, W.L. RRMSE-enhanced weighted voting regressor for improved ensemble regression. PLoS ONE 2025, 20, e0319515. [Google Scholar] [CrossRef]
  28. Jena, S.P.; Ghosh, S.K.; Tripathy, B.K. On the theory of bags and lists. Inf. Sci. 2001, 132, 241–254. [Google Scholar] [CrossRef]
  29. Nikulin, M.S. Hellinger distance. In Hazewinkel, Michiel, Encyclopedia of Mathematics; Springer Science: Berlin/Heidelberg, Germany, 2001; ISBN 978-1-55608-010-4. [Google Scholar]
  30. Dua, D.; Graff, C. UCI Machine Learning Repository; University of California: Irvine, CA, USA, 2017. [Google Scholar]
  31. Fayyad, U.M.; Irani, K.B. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, 28 August 1993–3 September 1993; pp. 1022–1027. [Google Scholar]
  32. Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 2, 224–227. [Google Scholar] [CrossRef]
  33. Calinski, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. 1974, 3, 1–27. [Google Scholar]
  34. Rouseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
  35. Velayutham, C.; Thangavel, K. Unsupervised quick reduct algorithm using rough set theory. J. Eng. Sci. Technol. 2011, 9, 193–201. [Google Scholar]
  36. Velayutham, C.; Thangavel, K. A novel entropy based unsupervised feature selection algorithm using rough set theory. In Proceedings of the IEEE-International Conference on Advances in Engineering, Science and Management (ICAESM-2012), Nagapattinam, India, 30–31 March 2012; pp. 156–161. [Google Scholar]
  37. Knorr, E.M.; Ng, R.T.; Tucakov, V. Distance-based outliers: Algorithms and applications. VLDB J. 2000, 8, 237–253. [Google Scholar] [CrossRef]
  38. Ramaswamy, S.; Rastogi, R.; Shim, K. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 427–438. [Google Scholar]
  39. Jiang, F.; Sui, Y.F.; Cao, C.G. Some issues about outlier detection in rough set theory. Expert Syst. Appl. 2009, 36, 4680–4687. [Google Scholar] [CrossRef]
  40. Campos, G.O.; Zimek, A.; Sander, J.; Campello, R.; Micenkova, B.; Schubert, E.; Assent, I.; Houle, M.E. On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study. Data Min. Knowl. Discov. 2016, 30, 891–927. [Google Scholar] [CrossRef]
  41. Hawkins, S.; He, H.X.; Williams, G.J.; Baxter, R.A. Outlier detection using replicator neural networks. In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, Aix-en-Provence, France, 4–6 September 2002; pp. 170–180. [Google Scholar]
Figure 1. The workflow of this paper.
Figure 2. Clustering image of original datasets with PCA. Colors represent distinct classes (e.g., purple: Class 1, cyan: Class 2). Overlapping regions indicate poor separability in the unreduced feature space.
Figure 3. Clustering image of reduction by $E^{\theta}$-MSVIS with PCA. Colors represent distinct classes (e.g., purple: Class 1, cyan: Class 2). Overlapping regions indicate poor separability in the unreduced feature space.
Figure 4. Clustering image of reduction by $H^{\theta}$-MSVIS with PCA. Colors represent distinct classes (e.g., purple: Class 1, cyan: Class 2). Overlapping regions indicate poor separability in the unreduced feature space.
Figure 5. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Mo (λ = 0.1).
Figure 6. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Mo (λ = 0.2).
Figure 7. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Bw (λ = 0.1).
Figure 8. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Bw (λ = 0.2).
Figure 9. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Ly (λ = 0.1).
Figure 10. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Ly (λ = 0.2).
Figure 11. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Sf (λ = 0.1).
Figure 12. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Sf (λ = 0.2).
Figure 13. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Vr (λ = 0.1).
Figure 14. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Vr (λ = 0.2).
Figure 15. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Sp (λ = 0.1).
Figure 16. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Sp (λ = 0.2).
Figure 17. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Wa (λ = 0.1).
Figure 18. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Wa (λ = 0.2).
Figure 19. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Io (λ = 0.1).
Figure 20. Average reduction rates and AUC values for different θ by $E^{\theta}$-MSVIS on Io (λ = 0.2).
Table 2. An MSVIS $(U, A)$.
U | Headache (a1) | Muscle Pain (a2) | Temperature (a3) | Symptom (a4)
u1 | {1/S, 0/M, 0/N} | {1/Y, 0/N} | {1/H, 0/N, 0/L} | {1/F, 0/R, 0/H}
u2 | {1/S, 0/M, 0/N} | {1/Y, 0/N} | {0/H, 0/N, 1/L} | {1/F, 0/R, 0/H}
u3 | {0/S, 1/M, 0/N} | {5/Y, 2/N} | {0/H, 1/N, 0/L} | {1/F, 0/R, 0/H}
u4 | {0/S, 0/M, 1/N} | {1/Y, 0/N} | {0/H, 1/N, 0/L} | {1/F, 0/R, 0/H}
u5 | {2/S, 2/M, 3/N} | {1/Y, 0/N} | {0/H, 1/N, 0/L} | {0/F, 1/R, 0/H}
u6 | {0/S, 1/M, 0/N} | {0/Y, 1/N} | {1/H, 3/N, 3/L} | {0/F, 1/R, 0/H}
u7 | {0/S, 0/M, 1/N} | {0/Y, 1/N} | {0/H, 0/N, 1/L} | {0/F, 0/R, 1/H}
u8 | {0/S, 0/M, 1/N} | {5/Y, 2/N} | {1/H, 3/N, 3/L} | {0/F, 0/R, 1/H}
u9 | {2/S, 2/M, 3/N} | {1/Y, 0/N} | {0/H, 0/N, 1/L} | {0/F, 0/R, 1/H}
Table 3. The details of the datasets for clustering.
Dataset | Abbreviation | Objects | Attributes | Classes
Annealing | An | 798 | 38 | 6
Breast Cancer Wisconsin | Bw | 699 | 9 | 2
Lymphography | Ly | 148 | 18 | 4
Solar Flare | Sf | 1066 | 10 | 3
Soybean | So | 307 | 35 | 19
Spect Heart | Sh | 267 | 22 | 2
Voting Records | Vr | 435 | 16 | 2
Sports Article | Sp | 1000 | 59 | 2
Waveform | Wa | 5000 | 40 | 3
All datasets were sourced from the UCI Machine Learning Repository. They were selected for their diversity in size, attribute types, and missing value rates, ensuring the broad applicability of the proposed method.
Table 4. Optimal reduction results for k-modes clustering by the proposed algorithms.
Dataset | Attributes | $E^{\theta}$-MSVIS (Number) | θ | $H^{\theta}$-MSVIS (Number) | θ
An | 38 | {1,2,4,5,7,8,10,11,12,13,14,16,17,18,20,24,26,27,28,29,30,31,32,36} (24) | 0.6 | {1,2,3,4,5,6,8,9,10,11,12,14,15,16,17,19,20,22,23,24,25,26,27,28,31,33,34,37} (28) | 0.6
Bw | 9 | {2,3,5,7,8} (5) | 0.7 | {2,3,5,7,8,9} (6) | 0.4
Ly | 18 | {2,3,4,5,6,8,11,15,17} (9) | 0.5 | {4,8,9,10,13,16,17} (7) | 0.2
Sf | 10 | {2,3,5,7,8,9,10} (7) | 0.8 | {1,2,3,4,5,6,10} (7) | 0.9
So | 35 | {6,14,19,24,26,30,31} (7) | 0.3 | {3,8,9,14,15,16,17,20,22,25,26,29,35} (13) | 0.1
Sh | 22 | {3,4,7,8,11,12,13,14,15,18} (10) | 0.1 | {5,9,10,11,12,13,15,17,19,20,22} (11) | 0.3
Vr | 16 | {3,4,7,8,9,10,12,13,16} (9) | 0.3 | {1,2,4,5,6,7,8,9,13,14,15} (11) | 0.1
Sp | 59 | {7,15,17,19,20,21,23,28,30,33,37,38,39,40,41,44,46,49,51,57,59} (21) | 0.1 | {3,4,6,8,16,17,18,22,24,33,36,37,38,40,48,49,50,52,54,56,59} (21) | 0.9
Wa | 40 | {9,10,12,13,14,15,16,19,20,25,26,29,35,37,39} (15) | 0.2 | {5,6,12,14,15,17,18,21,23,29,32,34,36,37,40} (15) | 0.4
Table 5. The comparison of clustering by DB (the smaller the better).
Data Sets | Raw Data | UEBR + MSVIS | UQR + MSVIS | $E^{\theta}$-SVIS | $H^{\theta}$-SVIS | $E^{\theta}$-MSVIS | $H^{\theta}$-MSVIS
An | 18.4363 | 2.7113 | 2.388 | 3.9702 | 11.3042 | 2.6135 | 2.3975
Bw | 5.0736 | 8.1341 | 7.8327 | 2.5668 | 5.6008 | 1.4666 | 1.5080
Vr | 60.7792 | 8.862 | 8.7725 | 4.2913 | 21.7374 | 1.1519 | 1.1786
Ly | 19.5374 | 5.9234 | 6.931 | 2.7869 | 3.181 | 1.9270 | 1.6204
Sf | 10.8391 | 7.2031 | 7.1212 | 2.7494 | 4.3031 | 2.006 | 3.1803
Sh | 5.2119 | 8.7999 | 8.0611 | 1.926 | 4.0749 | 1.6648 | 1.8139
So | 45.7372 | 2.9049 | 2.6286 | 2.0206 | 2.8083 | 0.6742 | 2.1919
Sp | 1.7727 | 8.2425 | 6.4559 | 1.1636 | 1.9733 | 2.5148 | 2.4528
Wa | 10.3596 | 7.0159 | 6.5918 | 3.7381 | 6.6305 | 3.3820 | 2.9394
Average | 19.7496 | 6.6441 | 6.3092 | 2.8014 | 6.84594 | 1.9334 | 2.1425
Table 6. The comparison of clustering by SC (the larger the better).
Data Sets | Raw Data | UEBR + MSVIS | UQR + MSVIS | $E^{\theta}$-SVIS | $H^{\theta}$-SVIS | $E^{\theta}$-MSVIS | $H^{\theta}$-MSVIS
An | −0.1705 | 0.1229 | 0.0313 | 0.2042 | −0.1174 | 0.08634 | 0.0839
Bw | 0.0293 | 0.0118 | 0.0841 | 0.224 | 0.1707 | 0.3818 | 0.3818
Vr | 0.0024 | 0.0861 | 0.0842 | 0.3009 | 0.1371 | 0.3744 | 0.3601
Ly | −0.1001 | −0.0026 | −0.0773 | 0.2024 | 0.101 | 0.1462 | 0.2316
Sf | −0.0442 | 0.0182 | 0.0119 | 0.1712 | 0.1025 | 0.1973 | 0.0986
Sh | −0.0121 | 0.0925 | 0.1122 | 0.1824 | 0.0876 | 0.2629 | 0.2400
So | −0.172 | 0.1224 | 0.1127 | 0.3629 | 0.1315 | 0.7877 | 0.1411
Sp | 0.1586 | 0.0736 | 0.1055 | 0.1352 | 0.2151 | 0.1254 | 0.1337
Wa | −0.0404 | 0.042 | −0.0175 | 0.2105 | 0.0194 | 0.0741 | 0.08249
Average | −0.0387 | 0.0629 | 0.0496 | 0.2215 | 0.0941 | 0.2706 | 0.19481
Table 7. The comparison of clustering by CH (the larger the better).
Data Sets | Raw Data | UEBR + MSVIS | UQR + MSVIS | $E^{\theta}$-SVIS | $H^{\theta}$-SVIS | $E^{\theta}$-MSVIS | $H^{\theta}$-MSVIS
An | 26.4297 | 36.5639 | 128.8762 | 140.2639 | 58.5015 | 151.9336 | 130.9597
Bw | 16.0014 | 38.1342 | 20.3436 | 241.0643 | 101.6981 | 274.7756 | 263.0480
Vr | 0.3505 | 35.247 | 23.3679 | 139.6869 | 95.3597 | 287.5936 | 287.6115
Ly | 1.1229 | 23.9525 | 13.8246 | 122.6061 | 15.9976 | 22.5264 | 37.6951
Sf | 15.2511 | 43.087 | 13.4234 | 82.3059 | 136.7673 | 170.3929 | 73.6132
Sh | 5.8056 | 22.6689 | 17.0808 | 79.7976 | 27.1544 | 89.2419 | 74.8191
So | 15.8368 | 15.1834 | 26.7413 | 148.1467 | 49.3074 | 335.4334 | 64.1818
Sp | 1.6588 | 4.9378 | 12.5509 | 7.1556 | 10.8489 | 14.2586 | 15.4533
Wa | 0.9967 | 5.5645 | 5.0428 | 53.9468 | 8.4503 | 17.2187 | 12.8518
Average | 9.2726 | 25.0376 | 29.0279 | 112.7748 | 56.0094 | 151.4860 | 106.6926
Table 8. Details of UCI datasets for outlier detection.
Datasets | Abbreviation | Objects | Attributes | Classes | Outlier Ratio
Monk | Mo | 237 | 6 | 2 | 8.86%
Breast Cancer Wisconsin | Bw | 699 | 9 | 2 | 34.5%
Lymphography | Ly | 148 | 18 | 4 | 4.05%
Solar Flare | Sf | 1066 | 10 | 3 | 0.47%
Voting Record | Vr | 335 | 16 | 2 | 20.3%
Sports Article | Sp | 757 | 59 | 2 | 15.98%
Waveform | Wa | 3750 | 40 | 3 | 11.09%
Ionosphere | Io | 238 | 33 | 2 | 5.46%
Table 9. The results of AUC before and after reduction (the larger the better).
Data Sets | Dis (Raw) | Dis (Reduced) | kNN (Raw) | kNN (Reduced) | Seq (Raw) | Seq (Reduced)
Mo | 0.8981 | 0.9313 | 0.8608 | 0.9484 | 0.6329 | 0.6329
Bw | 0.9778 | 0.9756 | 0.9838 | 0.9823 | 0.9798 | 0.9735
Ly | 1 | 1 | 1 | 1 | 0.9921 | 0.9930
Sf | 0.9831 | 0.9847 | 0.9760 | 0.9835 | 0.9409 | 0.9840
Vr | 0.8155 | 0.9051 | 0.3917 | 0.7210 | 0.6900 | 0.7890
Sp | 0.7792 | 0.8305 | 0.8202 | 0.8395 | 0.8197 | 0.7122
Wa | 0.6225 | 0.8036 | 0.6230 | 0.8021 | 0.5977 | 0.7579
Io | 0.9774 | 0.9964 | 0.9904 | 0.9969 | 0.9729 | 0.9904
Average | 0.8817 | 0.9284 | 0.8307 | 0.9092 | 0.8282 | 0.8541
Table 10. Optimal reduction results for outlier detection by $E^{\theta}$-MSVIS.
Datasets | Attributes | Dis (Number) | θ | kNN (Number) | θ | Seq (Number) | θ
Mo | 6 | {1,2,5,6} (4) | 0.5 | {1,2,4,5,6} (5) | 0.8 | {1,2,3,5,6} (5) | 0.7
Bw | 9 | {1,2,3,5,9} (5) | 0.9 | {3,5,6,7,9} (5) | 0.6 | {1,4,6,8,9} (5) | 0.1
Ly | 18 | {4,5,6,8,9,11,17} (7) | 0.1 | {4,5,6,8,9,11,17} (7) | 0.1 | {1,4,5,6,7,8,11,17,18} (9) | 0.9
Sf | 10 | {1,2,3,4,5,6,8,9,10} (9) | 0.9 | {1,3,4,5,6,8,9,10} (8) | 0.2 | {1,2,3,4,5,6,7,9,10} (9) | 0.7
Vr | 16 | {2,4,5,7,10,12,13,16} (8) | 0.2 | {2,4,5,7,10,12,13,16} (8) | 0.2 | {2,3,4,5,7,8,9,10,12,13,14,15} (12) | 0.5
Sp | 59 | {1,3,5,6,7,11,12,14,19,20,24,25,31,33,36,39,40,41,49,57,58} (21) | 0.4 | {1,3,5,6,7,11,12,14,19,20,24,25,31,33,36,39,40,41,49,57,58} (21) | 0.4 | {1,3,5,6,7,11,12,14,19,20,24,25,31,33,36,39,40,41,49,57,58} (21) | 0.4
Wa | 40 | {3,4,5,9,10,11,12,13,27,29,32,36,39} (13) | 0.9 | {3,4,5,9,10,11,12,13,27,29,32,36,39} (13) | 0.9 | {4,6,7,9,10,11,20,21,22,23,25,30,32,38} (14) | 0.8
Io | 33 | {4,6,10,11,12,13,15,19,24,26,30,31,33} (13) | 0.6 | {4,6,10,11,12,13,15,19,24,26,30,31,33} (13) | 0.6 | {1,2,3,7,9,11,12,14,15,16,17,21,23,31,33} (15) | 0.4