Next Article in Journal
Analysis of Mild Extremal Solutions in Nonlinear Caputo-Type Fractional Delay Difference Equations
Previous Article in Journal
A Chebyshev–Halley Method with Gradient Regularization and an Improved Convergence Rate
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data

1
College of Computer Science, Guangdong University of Science and Technology, Dongguan 523083, China
2
Key Laboratory of Complex System Optimization and Big Data Processing, Department of Guangxi Education, Yulin Normal University, Yulin 537000, China
3
Center for Applied Mathematics of Guangxi, Yulin Normal University, Yulin 537000, China
*
Authors to whom correspondence should be addressed.
Mathematics 2025, 13(8), 1318; https://doi.org/10.3390/math13081318
Submission received: 25 February 2025 / Revised: 7 April 2025 / Accepted: 12 April 2025 / Published: 17 April 2025

Abstract

:
In machine learning, when the labeled portion of data needs to be processed, a semi-supervised learning algorithm is used. A dataset with missing attribute values or labels is referred to as an incomplete information system. Addressing incomplete information within a system poses a significant challenge, which can be effectively tackled through the application of rough set theory (R-theory). However, R-theory has its limits: It fails to consider the frequency of an attribute value and then cannot the distribution of attribute values appropriately. If we consider partially labeled data and replace a missing attribute value with the multiset of all possible attribute values under the same attribute, this results in the emergence of partially labeled multiset-valued data. In a semi-supervised learning algorithm, in order to save time and costs, a large number of redundant features need to be deleted. This study proposes semi-supervised attribute selection algorithms for partially labeled multiset-valued data. Initially, a partially labeled multiset-valued decision information system (p-MSVDIS) is partitioned into two distinct systems: a labeled multiset-valued decision information system (l-MSVDIS) and an unlabeled multiset-valued decision information system (u-MSVDIS). Subsequently, using the indistinguishable relation, distinguishable relation, and dependence function, two types of attribute subset importance in a p-MSVDIS are defined: the weighted sum of l-MSVDIS and u-MSVDIS determined by the missing rate of labels, which can be considered an uncertainty measurement (UM) of a p-MSVDIS. Next, two adaptive semi-supervised attribute selection algorithms for a p-MSVDIS are introduced, which leverage the degrees of importance, allowing for automatic adaptation to diverse missing rates. Finally, experiments and statistical analyses are conducted on 11 datasets. The outcome indicates that the proposed algorithms demonstrate advantages over certain algorithms.

1. Introduction

1.1. Research Background

Uncertainty measurement (UM) is applied extensively in various domains, such as data mining [1], medical diagnosis [2,3], image processing [4], and pattern recognition [5,6].
An information system (IS) was proposed by Pawlak [7] based on R-theory. A set-valued IS [8] refers to a system in which each sample or data point in a dataset is associated with one or more sets. These sets can be discrete or continuous, and they describe the categories or attributes to which each sample belongs. These systems often lose information due to human or mechanical reasons, sometimes involving feature values and sometimes involving sample labels. We refer to an IS in which some samples have labels, while those that do not are referred to as partially labeled data. Incomplete ISs are a common problem.
R-theory is widely used to measure information granularity in an IS. Many scholars have published research on this aspect. For instance, Liang et al. [9] studied information granulation in an IS. Qian et al. [10] investigated the measurement of fuzzy information granulation from five perspectives. Dai et al. [11] studied UM based on an incomplete decision IS. Zhang et al. [12] discussed applications and developments in the field of information fusion, focusing on the introduction and evaluation of multi-source information fusion methods based on R-theory. Yang et al. [13] explored how to effectively address uncertainty issues in multi-source fuzzy information systems through R-theory and uncertainty measurement methods.
In the current context, information plays a vital role in social development, but a significant amount of information is marked by useless attributes. In machine learning, in order to save time and costs, a large number of redundant features need to be deleted. Therefore, attribute selection is becoming increasingly important in data processing. The dimension of data can be reduced, and the original efficiency of the data can be maintained via attribute selection. Hu et al. [14] established a rough set model using a neighborhood and used it for attribute selection. Singh et al. [15] developed a method using a set-valued IS for attribute selection. Wang et al. [16] designed a novel attribute selection algorithm in view of local conditional entropy. Dai et al. [17] used information entropy to study the attribute selection of an interval-valued IS. Wang et al. [18] researched a greedy algorithm base for designing UMs for reduction.
Semi-supervised attribute selection fully improves data utilization by mining unlabeled data information so as to achieve the goal of improving classification accuracy. Many scholars are very interested in this field. Dai et al. [19] studied semi-supervised attribute selection for interval data and proposed a novel entropy structure in view of the misclassification cost for attribute selection. Liu et al. [20] proposed utilizing common methods to predict unknown labels and then constructed multiple fitness functions to assess the importance of attributes for reduction. Kim [21] obtained a semi-supervised dimensionality reduction framework that takes into account both the label and structural information. Li et al. [22] aimed to improve the effectiveness of feature selection by considering a subset of features of a specific category. Zhang et al. [23] integrated R-theory into an ensemble learning framework, leveraging labeled data to create an ensemble base classifier capable of both labeling unlabeled data and augmenting the existing labeled dataset. Ma et al. [24] presented a semi-supervised rough fuzzy Laplacian Eigenmaps method for the attribute selection of high-dimensional mixed data. The importance of each attribute was evaluated using defined information entropy measures. Han et al. [25] developed a semi-supervised attribute selection algorithm in view of spline regression, which can effectively handle video semantic recognition and other related problems. Compared to singular-value decomposition and non-negative matrix factorization, rank-revealing QR factorization is more computationally efficient. Thus, Moslemi et al. [26] presented a novel unsupervised feature selection technique that leverages rank-revealing QR factorization. Bohrer et al. [27] proposed a hybrid feature selection approach using a multi-objective genetic algorithm to enhance classification performance and reduce dimensionality across diverse classification tasks. Sheikhpour et al. [28] put forward a robust semi-supervised multi-label feature selection method that integrates shared subspace learning, graph Laplacian-based manifold learning, and norm minimization in both the loss function and regularization, and Sheikhpour et al. [29] also studied sparse feature selection using hypergraph Laplacian-based semi-supervised discriminant analysis.
However, attribute selection poses numerous challenges [30,31]: discretizing continuous attributes may compromise the structural integrity of the data, while the operational overhead of attribute selection increases with larger datasets, often resulting in diminished reduction quality.

1.2. Motivation and Contributions

Missing data are common in data mining and machine learning. Missing data raise uncertainty and can reduce the capacity of machine learning models. Missing data can be categorized into three classes: (1) missing completely at random (MCAR); (2) missing at random (MAR); and (3) missing not at random (MNAR). In this paper, we study MCAR. A common approach for handling missing data is to discard records with missing data (listwise deletion and pairwise deletion). This approach is convenient. However, some information can be lost, and this approach may lead to some bias in some circumstances. For the models that cannot deal with data with missing information values by themselves, missing data imputation is needed. In statistics, imputation is the process of filling missing information values by some reasonable rules. Imputation is a complicated task because it can create bias and can lead to inaccurate results, especially for MAR and MNAR.
Set-valued data are important for processing datasets with missing attributes [32]. Specifically, the use of set-valued data replaces missing attribute values with a set comprising all possible attribute values under the same attribute, while existing attribute values are represented by a single point set containing the actual attribute value. This process transforms a dataset with missing attribute values into a set-valued information system (SVIS). However, this approach has limitations: It do not consider the frequency of the attribute values. Here, the frequency of the attribute values can be interpreted as the importance or the attribute values. It can be calculated by the number of occurrences, which leads to the occurrence of multisets. To address this issue, a multiset-valued decision information system (MSVDIS) is proposed, in which missing attribute values are replaced with multisets. “Replacing missing attribute values with multisets” is more appropriate than “replacing missing attribute values with with a set comprising all possible attribute values under the same attribute”. By considering the frequency of attribute values, this method maximizes the extraction of useful information from datasets with missing attributes, offering a novel approach to handling incomplete data. Miyamoto [33] presented a model for information clustering based on fuzzy multisets, utilizing it to execute the clustering process. Zhao et al. [34] studied two rough set models and introduced loss functions for computing the expected costs of data in an MSVIS.
Multiset is an important concept of mathematics and computer science that extends the definition of traditional set. It allows elements to be repeated and record multiplicities, and it offers more powerful expression capabilities than traditional sets in data processing, algorithm design, mathematical modeling, and other fields.
This study investigates UMs in a partially labeled multiset-valued decision information system (p-MSVDIS) and considers semi-supervised attribute selection in a p-MSVDIS. Based on the above research motivation, the novel contributions of this article are as follows:
(1)
Merely substituting a missing attribute value with the set of all potential values is overly simplistic and risks losing valuable information. This study advocates for the utilization of multisets to address missing attribute values. Furthermore, it demonstrates the conversion of multisets into probability distribution sets, enabling the calculation of the Hellinger distance based on these distributions to measure the dissimilarity between attribute values in an MSVDIS.
(2)
This explains that p-MSVDIS can induce two MSVDISs: one of which is a l-MSVDIS, and the other is a u-MSVDIS.
(3)
Considering indistinguishable relations, distinguishable relations, and dependence functions, this study introduces two types of importance measures for each attribute subset within a p-MSVDIS. These measures are derived from the weighted sum of importance assigned to the induced l-MSVDIS and u-MSVDIS. This combined measure, termed UM, provides a comprehensive reflection of the importance or classification capability of the attribute subset within the given p-MSVDIS.
(4)
The performance of the defined importance on real datasets is examined. This study uses the datasets to construct two heuristic algorithms for semi-supervised attribute selection in a p-MSVDIS.

1.3. Organization

In Section 2, an MSDIS is recalled. In Section 3, a p-MSVDIS is introduced. In Section 4, two types of importance in a p-MSVDIS are defined. In Section 5, semi-supervised attribute selection in a p-MSVDIS is defined, and two corresponding algorithms are advanced. In Section 6, experiments on the performance of the proposed algorithms and effectiveness analysis are conducted. In Section 7, this study is summarized.
Figure 1 depicts a flow chart of this research.

2. Preliminaries

In this section, an MSDIS is reviewed.
In this study, let T = { t 1 , t 2 , , t n } , and | X | is the cardinality of X.

2.1. Multisets and Probability Distribution Sets

Definition 1
([35]). A multiset M extract from X is proposed by a function M : X N { 0 } .
If M ( x ) = m , this means that x appears m times in M, which is recorded as x m M .
Given X = { x 1 , x 2 , , x l } , if M ( x i ) = m i ( i = 1 , 2 , , s ) , then M is recorded as { m 1 / x 1 , m 2 / x 2 , , m l / x l } , i.e.,
M = { m 1 / x 1 , m 2 / x 2 , , m l / x l } .
Definition 2.
X = { x 1 , x 2 , , x l } . Suppose S = x 1 , x 2 , , x l s 1 , s 2 , , s l . If i , 0 s i 1 , and i = 1 l s i = 1 , then S is referred to as a probability distribution set (PDS) on X. Moreover, if i , s i is a rational number, S is referred to as a probability distribution set (RPDS) on X.
Definition 3
([36]). S and T are two PDSs on X. Denote
S = x 1 , x 2 , , x l s 1 , s 2 , , s l , T = x 1 , x 2 , , x l t 1 , t 2 , , t l .
Then, the Hellinger distance between S and T is defined as
H D ( S , T ) = 1 2 i = 1 l ( s i t i ) 2 .
Definition 4.
X = { x 1 , x 2 , , x l } and
M = { m 1 / x 1 , m 2 / x 2 , , m l / x l }
is a multiset drawn from X. We inesrt
S M = x 1 , x 2 , , x l s 1 , s 2 , , s l ,
where s i = m i m 1 + m 2 + + m l ( i = 1 , 2 , , s ) . Then, S M is an RPDS on X.

2.2. Multiset-Valued Decision Information Systems

Let T be a finite sample set and A T a finite attribute set. ( T , A T , V , f ) is then referred to as an information system (IS) if f : T × A T V is an information function, where V = a A T V a , V a = { f ( t , a ) : t T } .
If ( T , A { d } , V , f ) is an IS and d is a decision attribute, then ( T , A , V , f , d ) is referred to as a decision information system (DIS).
∗ shows the unknown information value. If a A , V a , but V d , then a DIS ( T , A , V , f , d ) is an IDIS. a A { d } , and we denote
V a = V a { a ( t ) : a ( t ) = } .
Example 1.
Table 1 is an IDIS ( T , A , V , f , d ) , where T = { t 1 , t 2 , , t 9 } , A = { a 1 , a 2 , a 3 } , V d = V d = { F l u , R h i n i t i s , H e a l t h } .
Definition 5
([37]). An IDIS ( T , A , V , f , d ) is referred to as an MSVDIS if a A , t T ; a ( t ) is a multiset drawn from the same set.
Definition 6
([37]). ( T , A , V , f , d ) is an IDIS with T = { t 1 , t 2 , , t n } . Let a A . Denote V a = { x 1 , x 2 , , x l } . i , m i expresses the number of occurrences of x i in { a ( t 1 ) , a ( t 2 ) , , a ( t n ) } { } , where V a is an ordinary set and { a ( t 1 ) , a ( t 2 ) , , a ( t n ) } { } is a multiset. If a ( t ) = , then a ( t ) is taken the place of { m 1 / x 1 , m 2 / x 2 , , m l / x l } ; if a ( t ) = x j , then a ( t ) is taken the place of { 0 / x 1 , , 0 / x j 1 , 1 / x j , 0 / x j + 1 , , 0 / x l } . After this treatment, ( T , A , V , f , d ) is an MSVDIS. It is the MSVDIS induced by the IDIS ( T , A , V , f , d ) .
Example 2.   (Continued from Example 1) According to Definition 6, an IDIS in Table 1 can be expressed as an MSVDIS in Table 2.

3. A Partially Labeled Multiset-Valued Information System

3.1. The Definition of a p-MSVDIS

Below, ∗ means the unknown information value, and ⋄ shows the unknown label.
( T , A , V , f , d ) is an IDIS. a A , and we denote
V a = V a { a ( t ) : a ( t ) = } .
Insert
V d = V d { d ( t ) : d ( t ) = } .
From the above, we know that V a denotes all known information values of a.
Definition 7.
( T , A , V , f , d ) is a DIS. Insert
T l = { t T : d ( t ) } , T u = { t T : d ( t ) = } .
Then, T l T u = T , T l T u = .
(1) ( T , A , V , f , d ) is referred to as an l-MSVDIS if a A and t T , a ( t ) = and T l = T exist.
(2) ( T , A , V , f , d ) is referred to as a p-MSVDIS if a A and t T , a ( t ) = , T l and T u exist.
(3) ( T , A , V , f , d ) is referred to as a u-MSVDIS if a A and t T , a ( t ) = and T u = T exist.
Obviously,
V d = { d ( t ) : t T l } .
Denote
| T | = n , | T u | = n u , | T l | = n l .
Since each sample has no label in a u-MSVDIS ( T , A , V , f , d ) , we think that ( T , A , V , f , d ) can be seen as ( T , A , V , f ) .
Definition 8.
( T l , A , V , f , d ) and ( T u , A , V , f , d ) are called the l-MSVDIS and u-MSVDIS induced by ( T , A , V , f , d ) , respectively.
( T , A , V , f , d ) can be viewed as the result of the information fusion of ( T l , A , V , f , d ) and ( T u , A , V , f , d ) .
λ = | T u | | T | = n u n
is referred to as the incomplete rate of labels.
Example 3.
Table 3 is a p-MSVDIS ( T , A , V , f , d ) :
V a 1 = { S i c k , M i d d l e , N o } , V a 2 = { T r u e , F a l s e } ,
V a 3 = { H i g h , N o r m a l , L o w } ;
V d = { F l u , R h i n i t i s , H e a l t h } V d ;
T l = { t 1 , t 2 , t 4 , t 5 , t 6 , t 7 , t 8 } , T u = { t 3 , t 9 } .

3.2. A Novel Distance Function in a p-MSVDIS

For effective discrimination between samples within a p-MSVDIS, a novel distance function is provided.
Definition 9.
For a p-MSVDIS ( T , A , V , f , d ) , let a A and t , t T l . Then, the distance between a ( t ) and a ( t ) is defined as
ρ ( a ( t ) , a ( t ) ) = 0 , t = t ; 0 , t t , a A , a ( t ) = o r a ( t ) = , d ( t ) = d ( t ) ; 1 1 | V a | 2 , t t , a A , a ( t ) = , a ( t ) = , d ( t ) d ( t ) ; 1 1 | V a | , t t , a A , a ( t ) , a ( t ) = , d ( t ) d ( t ) ; 1 1 | V a | , t t , a A , a ( t ) = , a ( t ) , d ( t ) d ( t ) ; 0 , t t , a A , a ( t ) , a ( t ) , a ( t ) = a ( t ) , d ( t ) = d ( t ) ; 0 , t t , a A , a ( t ) , a ( t ) , a ( t ) = a ( t ) , d ( t ) d ( t ) ; H D ( P a ( t ) , P a ( t ) ) , t t , a A , a ( t ) , a ( t ) , a ( t ) a ( t ) , d ( t ) = d ( t ) ; H D ( P a ( t ) , P a ( t ) ) , t t , a A , a ( t ) , a ( t ) , a ( t ) a ( t ) , d ( t ) d ( t ) .
Definition 10.
For a p-MSVDIS ( T , A , V , f , d ) , let P A , and we denote
R P l , δ = { ( t , t ) T l × T l : a P , ρ ( a ( t ) , a ( t ) ) δ } ,
[ t ] P l , δ = { t T l : ( t , t ) R P l , δ } ;
R d l = { ( t , t ) T l × T l : d ( t ) = d ( t ) } ,
[ t ] d l = { t T l : ( t , t ) R d l } ;
R P l , δ ̲ ( X ) = { t T l : [ t ] P l , δ X } , X T l ;
T l / d = { [ t ] d l : t T l } = { D 1 , , D r } ;
P O S P l , δ ( d ) = i = 1 r R P l , δ ̲ ( D i ) .
Definition 11.
For a p-MSVDIS ( T , A , V , f , d ) , let P A , and we denote
d i s d l , δ ( P ) = { ( t , t ) T l × T l : a P , ρ ( a ( t ) , a ( t ) ) > δ d ( t ) d ( t ) } .
Then, d i s d l , δ ( P ) is referred to as the relative discernibility relation of P relative to d on T l .
Definition 12.
For a p-MSVDIS ( T , A , V , f , d ) , let P A . Then,
i n d δ u ( P ) = { ( t , t ) T u × T u : a P , ρ ( a ( t ) , a ( t ) ) δ }
is referred to as the discernibility relation of P on T u .
For a p-MSVDIS ( T , A , V , f , d ) , let P A . According to Kryszkiewicz’s ideal [38], P l , δ : T l 2 V d is defined as follows:
P l , δ ( t ) = d ( [ t ] P l , δ ) ,
Then, P l , δ is referred to as generalized decision in ( T l , P , d ) .
Definition 13.
For a p-MSVDIS ( T , A , V , f , d ) , if ∀ t T l , | A l , δ ( t ) | = 1 , then ( T , A , V , f , d ) is referred to as δ-consistent; otherwise, ( T , A , V , f , d ) is referred to as δ-inconsistent.
Proposition 1.
( T , A , V , f , d ) is a p-MSVDIS. Given P A and t T l , then
R P l , δ R d l t T l , | P l , δ ( t ) | = 1 .
Proof. 
“⇒”: Let R P l , δ R d l . Then, t T l , [ t ] P l , δ [ t ] d l . Suppose t P l , δ ( t ) . Then, t [ t ] P l , δ , t = d ( t ) . t [ t ] P l , δ implies that t [ t ] d l . So, t = d ( t ) = d ( t ) . Thus, | P l , δ ( t ) | = 1 .
“⇐”: Let t T l , | P l , δ ( t ) | = 1 . Suppose t [ t ] P l , δ . Then, d ( t ) P l , δ ( t ) . Since d ( t ) P l , δ ( t ) and | P l , δ ( t ) | = 1 , d ( t ) = d ( t ) . Then, t [ t ] d l . Thus, [ t ] P l , δ [ t ] d l . This shows that R P l , δ R d l .    □
Proposition 2.
A p-MSVDIS ( T , A , V , f , d ) is δ-consistent  ⇔   R A l , δ R d l .
Proof. 
This is easily proven via Proposition 1.    □

4. Importance in a p-MSVDIS

In this section, two types of importances in a p-MSVDIS are defined.

4.1. Type 1 Importance in a p-MSVDIS

Definition 14.
For a p-MSVDIS ( T , A , V , f , d ) , let P A , and insert
Γ P l , δ ( d ) = | P O S P l , δ ( d ) | n l ;
Then, Γ P l , δ ( d ) is referred to as the dependence of P on d in T l .
Proposition 3.
For a p-MSVDIS ( T , A , V , f , d ) , denote
T l / R d l = { D 1 , , D r } .
(1) Γ P l , δ ( d ) = i = 1 r | R P l , δ ̲ ( D i ) | n l .
(2) 0 Γ P l , δ ( d ) 1 .
(3) If P Q A , then
Γ P l , δ ( d ) Γ Q l , δ ( d ) .
Proof. 
(1) Obviously, i , R P l , δ ̲ ( D i ) D i .
Since { D 1 , , D r } is a partition of T l , we have
| P O S P l , δ ( d ) | = | i = 1 r R P l , δ ̲ ( D i ) | = i = 1 r | R P l , δ ̲ ( D i ) | .
Thus,
Γ P l , δ ( d ) = i = 1 r | R P l , δ ̲ ( D i ) | n l .
(2) This holds by (1).
(3) Suppose P Q A ; then, t T l , [ t ] Q l , δ [ t ] P l , δ . So,
i , R P l , δ ̲ ( D i ) R Q l , δ ̲ ( D i ) .
This implies that
i , | R P ̲ ( D i ) | R Q ̲ ( D i ) | .
By (1),
Γ P l , δ ( d ) Γ Q l , δ ( d ) .
   □
Definition 15.
For a p-MSVDIS ( T , A , V , f , d ) , let P A . The type 1 importance of P is then defined as
I M λ , δ ( 1 ) ( P ) = ( 1 λ ) Γ P l , δ ( d ) Γ A l , δ ( d ) + λ | i n d δ u ( A ) | | i n d δ u ( P ) | .
Example 4.
On the basis of the p-MSVDIS ( T , A , V , f , d ) in Table 3, the incomplete rate is λ = 2 9 and n l = 7 . Let δ = 0.5 ; then,
R { a 1 } l , δ = { ( t 1 , t 1 ) , ( t 1 , t 2 ) , ( t 2 , t 1 ) , ( t 2 , t 2 ) , ( t 4 , t 4 ) , ( t 4 , t 7 ) , ( t 4 , t 8 ) , ( t 5 , t 5 ) , ( t 6 , t 6 ) , ( t 7 , t 4 ) , ( t 7 , t 7 ) , ( t 7 , t 8 ) , ( t 8 , t 4 ) , ( t 8 , t 7 ) , ( t 8 , t 8 ) } ;
R { a 2 } l , δ = { ( t 1 , t 1 ) , ( t 1 , t 2 ) , ( t 1 , t 4 ) , ( t 1 , t 5 ) , ( t 2 , t 1 ) , ( t 2 , t 2 ) , ( t 2 , t 4 ) , ( t 2 , t 5 ) , ( t 4 , t 1 ) , ( t 4 , t 2 ) , ( t 4 , t 4 ) , ( t 4 , t 5 ) , ( t 5 , t 1 ) , ( t 5 , t 2 ) , ( t 5 , t 4 ) , ( t 5 , t 5 ) , ( t 6 , t 6 ) , ( t 6 , t 7 ) , ( t 6 , t 8 ) , ( t 7 , t 6 ) , ( t 7 , t 7 ) , ( t 7 , t 8 ) , ( t 8 , t 6 ) , ( t 8 , t 7 ) , ( t 8 , t 8 ) } ;
R { a 3 } l , δ = { ( t 1 , t 1 ) , ( t 2 , t 2 ) , ( t 2 , t 7 ) , ( t 4 , t 4 ) , ( t 4 , t 5 ) , ( t 5 , t 4 ) , ( t 5 , t 5 ) , ( t 6 , t 6 ) , ( t 7 , t 2 ) , ( t 7 , t 7 ) , ( t 8 , t 8 ) } ;
R d l , δ = { ( t 1 , t 1 ) , ( t 1 , t 2 ) , ( t 1 , t 4 ) , ( t 1 , t 7 ) , ( t 2 , t 1 ) , ( t 2 , t 2 ) , ( t 2 , t 4 ) , ( t 2 , t 7 ) , ( t 4 , t 1 ) , ( t 4 , t 2 ) , ( t 4 , t 4 ) , ( t 4 , t 7 ) , ( t 5 , t 5 ) , ( t 5 , t 6 ) , ( t 6 , t 5 ) , ( t 6 , t 6 ) , ( t 7 , t 1 ) , ( t 7 , t 2 ) , ( t 7 , t 4 ) , ( t 7 , t 7 ) , ( t 8 , t 8 ) } ;
i n d δ u ( { a 1 } ) = { ( t 3 , t 3 ) , ( t 9 , t 9 ) } ; i n d δ u ( { a 2 } ) = { ( u 3 , u 3 ) , ( u 9 , u 9 ) } ; i n d δ u ( { a 3 } ) = { ( t 3 , t 3 ) , ( t 9 , t 9 ) } .
Thus, Γ { a 1 } l , δ ( d ) = ( 2 + 2 + 0 ) 7 0.5714 . Similarly, Γ { a 2 } l , δ ( d ) 0.1429 , Γ { a 3 } l , δ ( d ) 0.4286 and Γ A l , δ ( d ) = 1 .
Since | i n d δ u ( { a 1 } ) | = | i n d δ u ( { a 2 } ) | = | i n d δ u ( { a 3 } ) | = 2 and | i n d δ u ( A ) | = 2 , hence I M λ , δ ( 1 ) ( { a 1 } ) = ( 1 2 9 ) ( 0.5714 1 ) + 2 9 2 2 0.6667 .
Similarly, I M λ , δ ( 1 ) ( { a 2 } ) 0.3333 , I M λ , δ ( 1 ) ( { a 3 } ) 0.5556 .
Proposition 4.
For a p-MSVDIS ( T , A , V , f , d ) , we have the following:
(1) 0 I M λ , δ ( 1 ) ( P ) 1 ;
(2) I M λ , δ ( 1 ) ( A ) = 1 ;
(3) P Q A implies I M λ , δ ( 1 ) ( P ) I M λ , δ ( 1 ) ( Q ) ;
(4) I M λ , δ ( 1 ) ( P ) = 1   ⇔   Γ P l , δ ( d ) = Γ A l , δ ( d ) , | i n d δ u ( P ) | = | i n d δ u ( A ) | .
Proof. 
“(1) and (2)” are obvious.
(3) Since P Q A ,
Γ P l , δ ( d ) Γ Q l , δ ( d ) , | i n d δ u ( Q ) | | i n d δ u ( P ) | .
Then,
Γ P l , δ ( d ) Γ A l , δ ( d ) Γ Q l , δ ( d ) Γ A l , δ ( d ) , | i n d δ u ( A ) | | i n d δ u ( P ) | | i n d δ u ( A ) | | i n d δ u ( Q ) | .
Thus,
( 1 λ ) Γ P l , δ ( d ) Γ A l , δ ( d ) ( 1 λ ) Γ Q l , δ ( d ) Γ A l , δ ( d ) , λ | i n d δ u ( A ) | | i n d δ u ( P ) | λ | i n d δ u ( A ) | | i n d δ u ( Q ) | .
Hence, I M λ , δ ( 1 ) ( P ) I M λ , δ ( 1 ) ( Q ) .
(4) “⇐” is clear. Below, we prove “⇒”.
Suppose I M λ , δ ( 1 ) ( P ) = 1 ; then,
( 1 λ ) Γ P l , δ ( d ) Γ A l , δ ( d ) + λ | i n d δ u ( A ) | | i n d δ u ( P ) | = 1 = ( 1 λ ) + λ .
This implies that
( 1 λ ) ( 1 Γ P l , δ ( d ) Γ A l , δ ( d ) ) + λ ( 1 | i n d δ u ( A ) | | i n d δ u ( P ) | ) = 0 .
Note that 1 Γ P l , δ ( d ) Γ A l , δ ( d ) = Γ A l , δ ( d ) Γ P l , δ ( d ) Γ A l , δ ( d ) 0 , 1 | i n d δ u ( A ) | | i n d δ u ( P ) | = | i n d δ u ( P ) | | i n d δ u ( A ) | | i n d δ u ( P ) | 0 . Then, we have 1 Γ P l , δ ( d ) Γ A l , δ ( d ) = 0 , 1 | i n d δ u ( A ) | | i n d δ u ( P ) | = 0 . Then,
Γ P l , δ ( d ) = Γ A l , δ ( d ) a n d | i n d δ u ( P ) | = | i n d δ u ( A ) | .
   □

4.2. Type 2 Importance in a p-MSVDIS

Definition 16.
For a p-MSVDIS ( T , A , V , f , d ) , let P A ; insert
H δ l ( d | P ) = i = 1 n l j = 1 r | [ t i ] P l , δ D j | n l log 2 | [ t i ] P l , δ D j | | [ t i ] P l , δ | .
Then, H δ l ( d | P ) is referred to as the conditional information entropy of P to d in T l .
Proposition 5.
For a p-MSVDIS ( T , A , V , f , d ) , let | T l | = n l . If P Q A , then
H δ l ( Q | d ) H δ l ( d | P ) .
Proof. 
Denote
T l / R d l = { D 1 , , D r } ;
p i j ( 1 ) = | [ t i ] P l , δ D j | , p i j ( 2 ) = | [ t i ] P l , δ ( T l D j ) | ;
q i j ( 1 ) = | [ t i ] Q l , δ ( t i ) D j | , q i j ( 2 ) = | [ t i ] Q l , δ ( t i ) ( T l D j ) | .
Then,
i , j , | [ t i ] P l , δ | = p i j ( 1 ) + p i j ( 2 ) , | R Q l , δ ( t i ) | = q i j ( 1 ) + q i j ( 2 ) .
Obviously, i , [ t i ] Q l , δ ( t i ) [ t i ] P l , δ .
Then,
i , j , 0 q i j ( 1 ) p i j ( 1 ) , 0 q i j ( 2 ) p i j ( 2 ) .
Let f ( x , y ) = x log 2 x x + y ( x > 0 , y 0 ) . f ( x , y ) then increases with respect to x and y, respectively.
H δ l ( d | P ) = i = 1 n l j = 1 r | [ t i ] P l , δ D j | n l log 2 | [ t i ] P l , δ D j | | [ t i ] P l , δ | = i = 1 n l j = 1 r p i j ( 1 ) n l log 2 p i j ( 1 ) p i j ( 1 ) + p i j ( 2 ) 1 n l i = 1 n l j = 1 r f ( p i j ( 1 ) , p i j ( 2 ) ) .
H δ l ( Q | d ) = i = 1 n l j = 1 r | [ t i ] Q l , δ ( t i ) D j | n l log 2 | [ t i ] Q l , δ ( t i ) D j | | [ t i ] Q l , δ ( t i ) | = i = 1 n l j = 1 r q i j ( 1 ) n l log 2 q i j ( 1 ) q i j ( 1 ) + q i j ( 2 ) 1 n l i = 1 n l j = 1 r f ( q i j ( 1 ) , q i j ( 2 ) ) .
Since q i j ( 1 ) p i j ( 1 ) , q i j ( 2 ) p i j ( 2 ) , we have
f ( q i j ( 1 ) , q i j ( 2 ) ) f ( p i j ( 1 ) , q i j ( 2 ) ) f ( p i j ( 1 ) , p i j ( 2 ) ) .
Thus,
H δ l ( Q | d ) H δ l ( d | P ) .
   □
Proposition 6.
For a p-MSVDIS ( T , A , V , f , d ) , given P A , then H δ l ( d | P ) 0 .
Proof. 
Obviously, i , j , | [ t i ] P l , δ D j | n l 0 . Since
i , j , | [ t i ] P l , δ D j | | [ t i ] P l , δ | 1 ,
we have
i , j , log 2 | [ t i ] P l , δ D j | | [ t i ] P l , δ | 0 .
Thus, H δ l ( d | P ) 0 .    □
Definition 17.
For a p-MSVDIS ( T , A , V , f , d ) , let P A . The type 2 importance of P is then defined as
I M λ , δ ( 2 ) ( P ) = ( 1 λ ) H δ l ( d | A ) H δ l ( d | P ) + λ | i n d δ u ( A ) | | i n d δ u ( P ) | .
Example 5.
On the basis of the p-MSVDIS ( T , A , V , f , d ) in Table 3 and Example 4, I M λ , δ ( 2 ) ( P ) is calculated.
H δ l ( { a 1 } | d ) = 2 7 log 2 ( 2 2 ) + 2 7 log 2 ( 2 2 ) + 2 7 log 2 ( 2 3 ) + 0 7 log 2 ( 0 1 ) + 0 7 log 2 ( 0 1 ) + 2 7 log 2 ( 2 3 ) + 2 7 log 2 ( 2 3 ) + 0 7 log 2 ( 0 2 ) + 0 7 log 2 ( 0 2 ) + 0 7 log 2 ( 0 3 ) + 1 7 log 2 ( 1 1 ) + 1 7 log 2 ( 1 1 ) + 0 7 log 2 ( 0 3 ) + 0 7 log 2 ( 0 3 ) + 0 7 log 2 ( 0 2 ) + 0 7 log 2 ( 0 2 ) + 1 7 log 2 ( 1 3 ) + 0 7 log 2 ( 0 1 ) + 0 7 log 2 ( 0 1 ) + 1 7 log 2 ( 1 3 ) + 1 7 log 2 ( 1 3 ) 1.1807 .
Similarly, H δ l ( { a 2 } | d ) 3.8922 , H δ l ( { a 3 } | d ) 0.5714 , and H δ l ( d | A ) = 0 .
Since | i n d δ u ( { a 1 } ) | = | i n d δ u ( { a 2 } ) | = | i n d δ u ( { a 3 } ) | = 2 and | i n d δ u ( A ) | = 2 , hence I M λ , δ ( 2 ) ( { a 1 } ) = ( 1 2 9 ) 0 1.1807 ) + 2 9 2 2 0.2222 .
Similarly, I M λ , δ ( 2 ) ( { a 2 } ) 0.2222 , I M λ , δ ( 2 ) ( { a 3 } ) 0.2222 .
Proposition 7.
For a p-MSVDIS ( T , A , V , f , d ) , we have the following:
(1) 0 I M λ , δ ( 2 ) ( P ) 1 ;
(2) I M λ , δ ( 2 ) ( A ) = 1 ;
(3) P Q A implies to I M λ , δ ( 2 ) ( Q ) I M λ , δ ( 2 ) ( P ) ;
(4) I M λ , δ ( 2 ) ( P ) = 1   ⇔   H δ l ( d | P ) = H δ l ( d | A ) , | i n d δ u ( P ) | = | i n d δ u ( A ) | .
Proof. 
“(1) and (2)” are obvious.
(3) Since P Q A , we have
H δ l ( Q | d ) H δ l ( d | P ) , | i n d δ u ( Q ) | | i n d δ u ( P ) | .
Then,
H δ l ( Q | d ) H δ l ( d | A ) H δ l ( d | P ) H δ l ( d | A ) , | i n d δ u ( Q ) | | i n d δ u ( A ) | | i n d δ u ( P ) | | i n d δ u ( A ) | .
Thus,
( 1 λ ) H δ l ( Q | d ) H δ l ( d | A ) ( 1 λ ) H δ l ( d | P ) H δ l ( d | A ) , λ | i n d δ u ( Q ) | | i n d δ u ( A ) | λ | i n d δ u ( P ) | | i n d δ u ( A ) | .
Hence, I M λ , δ ( 1 ) ( Q ) I M λ , δ ( 1 ) ( P ) .
(4) “⇐” is clear. Below, we prove “⇒”.
Suppose I M λ , δ ( 2 ) ( P ) = 1 . Then,
( 1 λ ) H δ l ( d | A ) H δ l ( d | P ) + λ | i n d δ u ( A ) | | i n d δ u ( P ) | = 1 = ( 1 λ ) + λ .
This implies that
( 1 λ ) ( 1 H δ l ( d | A ) H δ l ( d | P ) ) + λ ( 1 | i n d δ u ( A ) | | i n d δ u ( P ) | ) = 0 .
Note that 1 H δ l ( d | A ) H δ l ( d | P ) = H δ l ( d | P ) H δ l ( d | A ) H δ l ( d | P ) 0 , 1 | i n d δ u ( A ) | | i n d δ u ( P ) | = | i n d δ u ( P ) | | i n d δ u ( A ) | | i n d δ u ( P ) | 0 . Then, we have 1 H δ l ( d | A ) H δ l ( d | P ) = 0 , 1 | i n d δ u ( A ) | | i n d δ u ( P ) | = 0 . Then, the following is the case:
H δ l ( d | A ) = H δ l ( d | P ) , | i n d δ u ( P ) | = | i n d δ u ( A ) | .
   □

5. Semi-Supervised Attribute Selection in a p-MSVDIS

In this section, semi-supervised attribute selection in a p-MSVDIS is explored.

5.1. The Definition of Semi-Supervised Attribute Selection in a p-MSVDIS

Definition 18.
For a p-MSVDIS ( T , A , V , f , d ) , let λ = | T u | | T | and P A . Then, P is referred to as a coordinate subset of A with respect to d in a p-MSVDIS ( T , A , V , f , d ) ; if P O S P l , δ ( d ) = P O S A l , δ ( d ) , i n d δ u ( P ) = i n d δ u ( A ) .
The family of all coordinate subsets of A with respect to d in a p-MSVDIS ( T , A , V , f , d ) is recorded as c o λ , δ p ( A ) .
Definition 19.
For a p-MSVDIS ( T , A , V , f , d ) , let λ = | T u | | T | and P A . P is then referred to as a reduct of A with respect to d in a p-MSVDIS ( T , A , V , f , d ) ; if P c o λ , δ p ( A ) and a P , P { a } c o λ , δ p ( A ) .
The family of all reducts of A with respect to d in a p-MSVDIS ( T , A , V , f , d ) is recorded as r e d λ , δ p ( A ) .
Theorem 1.
For a p-MSVDIS ( T , A , V , f , d ) , let λ = | T u | | T | . The following results can then be deduced from each other:
( 1 )   P c o λ , δ p ( A ) ;
( 2 )   I M λ , δ ( 1 ) ( P ) = 1 .
Proof. 
The proof is obvious.    □
Corollary 1.
For a p-MSVDIS ( T , A , V , f , d ) , let λ = | T u | | T | and P A . The following results can then be deduced from each other:
(1) P r e d λ , δ p ( A ) ;
(2) I M λ , δ ( 1 ) ( P ) = 1 and a P , I M λ , δ ( 1 ) ( P { a } ) < 1 .
Proof. 
This follows from Theorem 1.    □
Lemma 1.
For a p-MSVDIS ( T , A , V , f , d ) , let P A . If R P l , δ R d l , then t T l and j:
[ t ] P l , δ D j = [ t ] P l , δ t D j u D j .
Proof. 
If t D j , then D j = [ t ] d l . Since R P l , δ R d l , [ t ] P l , δ [ t ] d l . Thus, [ t ] P l , δ D j = [ t ] P l , δ .
If u D j , then [ t ] d l D j = . Since R P l , δ R d l , [ t ] P l , δ [ t ] d l . Thus, [ t ] P l , δ D j = .    □
Lemma 2.
For a p-MSVDIS ( T , A , V , f , d ) , let P A . If R P l , δ R d l , then t T l
j = 1 r | [ t ] P l , δ D j | n log 2 | [ t ] P l , δ D j | n = | [ t ] P l , δ | n log 2 | [ t ] P l , δ | n .
Proof. 
Since { D 1 , , D r } is a partition of T l , we have j 0 , t D j 0 .
Since R P l , δ R d l , by Lemma 1, we have
[ t ] P l , δ D j = [ t ] P l , δ j = j 0 j j 0 .
Thus,
j = 1 r | [ t ] P l , δ D j | n log 2 | [ t ] P l , δ D j | n = | [ t ] P l , δ | n log 2 | [ t ] P l , δ | n .
   □
Proposition 8.
For a p-MSVDIS ( T , A , V , f , d ) , let P A . The following results can be deduced from each other:
(1) R P l , δ R d l ;
(2) H δ l ( d | P ) = 0 .
Proof. 
“(1) ⇒ (2)” is proved by Lemma 2.
(2) ⇒ (1). Suppose H δ l ( d | P ) = 0 . Then,
i = 1 n j = 1 r | [ t i ] P l , δ D j | n log 2 | [ t i ] P l , δ | | [ t i ] P l , δ D j | = 0 .
Suppose R P l , δ R d l . Then, i 0 { 1 , , n } , [ t i 0 ] P l , δ [ t i 0 ] d l . Denote
[ t i 0 ] d l = D j 0 ( j 0 { 1 , , r } ) .
We have
| [ t i 0 ] P l , δ | > | [ t i ] P l , δ D j 0 | .
It follows that
| R P l , δ ( t i 0 ) D j 0 | n log 2 | [ t i 0 ] P l , δ | | R P l , δ ( t i 0 ) D j 0 | > 0 .
Note that
i , j , | [ t i ] P l , δ | | [ t i ] P l , δ D j | .
Then,
i , j , | [ t i ] P l , δ D j | n log 2 | [ t i ] P l , δ | | [ t i ] P l , δ D j | 0 .
So,
i = 1 n j = 1 r | [ t i ] P l , δ D j | n log 2 | [ t i ] P l , δ | | [ t i ] P l , δ D j | > 0 .
This is a contradiction.
Thus, R P l , δ R d l .
   □
Corollary 2.
For a p-MSVDIS ( T , A , V , f , d ) , if ( T , A , V , f , d ) is δ-consistent, then H δ l ( d | A ) = 0 .
Proof. 
This is proven by Propositions 2 and 8.    □
Theorem 2.
For a δ-consistent p-MSVDIS ( T , A , V , f , d ) , let λ = | T u | | T | . The following results can be deduced from each other:
(1) P c o λ , δ p ( A ) ;
(2) I M λ , δ ( 2 ) ( P ) = 1 .
Proof. 
(1) ⇒ (2). Suppose P c o λ , δ p ( A ) . Then, P O S P l , δ ( d ) = P O S A l , δ ( d ) and i n d δ u ( P ) = i n d δ u ( A ) . Thus, Γ P l , δ ( d ) = Γ A l , δ ( d ) , | i n d δ u ( P ) | = | i n d δ u ( A ) | .
By Proposition 3,
j = 1 r ( | R A l , δ ̲ ( D j ) | | R P l , δ ̲ ( D j ) | ) = 0 .
Obviously, j , R A l , δ ̲ ( D j ) R P l , δ ̲ ( D j ) . This implies that j ,
| R A l , δ ̲ ( D j ) | | R P l , δ ̲ ( D j ) | 0 .
Then, j ,
| R A l , δ ̲ ( D j ) | | R P l , δ ̲ ( D j ) | = 0 .
It follows that j ,
R A l , δ ̲ ( D j ) = R P l , δ ̲ ( D j ) .
Thus, j ,
[ t ] A l , δ D j [ t ] P l , δ D j .
( T , A , V , f , d ) is δ -consistent. From Proposition 2, we have R A l , δ R d l .
Therefore, t T l ,
[ t ] A l , δ [ t ] d l .
Let [ t ] d l = D t ; here, D t { D 1 , , D r } . Then, t T l , [ t ] P l , δ D t = [ t ] d l . This implies R P l , δ R d l .
By Proposition 8, H δ l ( d | P ) = 0 .
( T , A , V , f , d ) is δ -consistent, from Corollary 2, H δ l ( d | A ) = 0 . Then, H δ l ( d | P ) = H δ l ( d | A ) .
By Proposition 7, I M λ , δ ( 2 ) ( P ) = 1 .
(2) ⇒ (1). Suppose I M λ , δ ( 2 ) ( P ) = 1 . Then, by Proposition 7, H δ l ( d | P ) = H δ l ( d | A ) , | i n d δ u ( P ) | = | i n d δ u ( A ) | .
Since i n d δ u ( P ) i n d δ u ( A ) , we have i n d δ u ( P ) = i n d δ u ( A ) .
Since ( T , A , V , f , d ) is δ -consistent, from Corollary 2, H δ l ( d | A ) = 0 . Thus, H δ l ( d | P ) = 0 . By Proposition 8, R P l , δ R d l .
Suppose that j 0 ,
R A l , δ ̲ ( D j 0 ) R P l , δ ̲ ( D j 0 ) .
Then, R A l , δ ̲ ( D j 0 ) R P l , δ ̲ ( D j 0 ) . Pick
t 0 R A l , δ ̲ ( D j 0 ) R P l , δ ̲ ( D j 0 ) .
It follows that
t 0 R A l , δ ̲ ( D j 0 ) , t 0 R P l , δ ̲ ( D j 0 ) .
t 0 R A l , δ ̲ ( D j 0 ) implies that t 0 R A l , δ ( t 0 ) D j 0 . Then, D j 0 = [ t 0 ] d l . t 0 R P l , δ ̲ ( D j 0 ) implies that [ t 0 ] P l , δ D j 0 . Thus, [ t 0 ] P l , δ [ t 0 ] d l . So, R P l , δ R d l . This is a contradiction.
Hence, j ,
R A l , δ ̲ ( D j ) R P l , δ ̲ ( D j ) .
Obviously, j , R A l , δ ̲ ( D j ) R P l , δ ̲ ( D j ) .
Then, j ,
R A l , δ ̲ ( D j ) = R P l , δ ̲ ( D j ) .
Thus,
P O S A l , δ ( d ) = j = 1 r R A l , δ ̲ ( D j ) = j = 1 r R P l , δ ̲ ( D j ) = P O S P l , δ ( d ) .
Hence, P c o λ , δ p ( A ) .
   □
Corollary 3.
For a δ-consistent p-MSVDIS ( T , A , V , f , d ) , let λ = | T u | | T | . The following results can be deduced from each other:
(1) P r e d λ , δ p ( A ) ;
(2) I M λ , δ ( 2 ) ( P ) = 1 and a P , I M λ , δ ( 2 ) ( P { a } ) < 1 .

5.2. Semi-Supervised Attribute Selection Algorithms in a p-MSVDIS

Next, the proposed two UMs are used to devise algorithms for semi-supervised attribute selection in a p-MSVDIS (Algorithms 1 and 2). Mathematics 13 01318 i001Mathematics 13 01318 i002
Since the time and space complexity of these two algorithms are the same, only one algorithm is discussed. These two algorithms are encoded in matrix format to obtain faster results in practical applications. Starting from the second step, the binary relations calculated on T are expressed in the form of matrices. Therefore, the time complexity of step 2 is O ( | A | | T | 2 ) . Steps 3–8 are O ( | A | | T | 2 + ( | A | 1 ) | T | 2 + , + | P | | T | 2 ) . All the complexity of SARM1 is O ( ( | A | 2 2 + | A | 2 | P | 2 2 + | P | 2 ) | T | 2 ) . After removing the low-order and constant terms, the complexity is O ( ( | A | 2 | P | 2 ) | T | 2 ) . The spatial complexity of SARM1 is O ( | A | | T | 2 ) .
In the algorithms’ flow, the third step establishes the loop execution condition: The iteration operation is continuously performed while I M λ , δ ( 1 , 2 ) ( P ) 1 5 % . During each cycle, an effective attribute is selected. The 5% here serves as a key parameter, which is used to control the final scale of selected attributes. If the number of filtered attributes is found to be excessive, this parameter value can be appropriately increased (e.g., from 5% to 10% or higher) to reduce the number of cycles, thereby effectively limiting the attribute count. Conversely, when the selected attributes are insufficient or the model’s classification accuracy is unsatisfactory, the parameter can be adjusted downward to 0%, allowing more feature attributes to be obtained through additional iterations.

6. Experimental Analysis

Many experiments conducted to illustrate the effectiveness of SARM1 and SARM2 are described in this section.

6.1. Numerical Experiment

To test the algorithm’s effectiveness on real datasets, we utilized SARM1 and SARM2, comparing them against existing algorithms. The experiments were conducted on a Lenovo computer equipped with an Intel(R) Core(TM) i7-9700 CPU running at 3.00 GHz, with 16 GB of memory and a Windows 10 operating system. MATLAB 2019 and SPSS 2018 were the software platforms used for computation. Table 4 presents the 11 datasets selected from the UCI Machine Learning Repository [39] for experimental analysis. Since it is difficult to find set-valued datasets in this database, we made some adjustments to the data. Real value datasets were rounded, and all datasets were randomly incomplete, missing 10% of the information values. This was set so that when incomplete information was encountered, it could be considered an MSVDIS (see Example 2). Due to the combined limitations of excessive sample sizes in the 11th dataset Con and insufficient computer memory, the algorithm’s execution is rendered unfeasible. To address this, repeated random sampling is adopted, whereby 50% of the samples are systematically selected for each experimental analysis.
SARM1 and SARM2 both have two variables that need to be given in advance: One is δ , and the other is λ . λ represents the rate of incomplete labels. For convenient calculation, 20% of labels were randomly incomplete ( λ = 20 % ). We need to observe how a change in δ affected the results. In order to study whether the attribute subsets can still maintain the classification accuracy of the original datasets, three classifiers, Bagged Trees (BT), Support Vector Machine (SVM), and K-nearest neighbors (KNN, with K = 5), are used for the accuracy analysis of the subset. The five-fold cross-validation method is applied for classification, and the average value is taken after the program ran 10 times. The relationship curves between the change in δ and the classification accuracy with three classifiers are shown in Figure 2. It can be seen that the curves of each subgraph are very oscillatory, which indicates that δ has a considerable impact on the subset and classification accuracy. Only the curves of PS and Aud are relatively smooth. According to Figure 2, we know how to choose δ to achieve the maximum accuracy for each dataset.
Next, we need to verify whether SARM1 and SARM2 are more effective than other similar algorithms. Five algorithms are selected from some references for comparison with SARM1 and SARM2. Given the absence of directly comparable studies on p-MSVDIS, semi-supervised attribute selection algorithms and their related counterparts are selected for comparative analysis. The compared algorithms fall into three categories: (1) FSRS [15] and FSDIS [40] algorithms designed for set-valued information systems; (2) SADA [41] and FSFS [20] attribute selection algorithms specifically developed for p-MSVDIS; (3) the Semi2MNR [42] algorithm, a semi-supervised feature selection method based on the minimum neighborhood redundancy and maximum neighborhood relevance criteria.
All comparative algorithms are reimplemented and experimentally validated. The number of selected attributes by each algorithm is summarized in Table 5. The numbers in bold are the optimal values. The average number of attributes calculated by FSRS is the least, while that of SARM2 is the most. However, the PW dataset in FSRS displays 30, which is the same as the number of attributes in the original dataset, indicating that the algorithm is invalid for the dataset. Similarly, the Spa and Con datasets of FSRS are “—”, which means that there are no results. Because the computer ran out of memory when calculating the datasets, it could not continue. The average number of attributes of SARM1 and SARM2 is 7.64 and 8.27, respectively. Although the values are not excellent, they are close to those of other algorithms, and the average value of SARM1 is close to the best seven in FSRS. Consequently, we conclude that the attribute subset effect of the two proposed algorithms is acceptable.
Then, three classifiers are used for accuracy analysis based on the reduced subsets. Table 6, Table 7 and Table 8 present the comparative results of each algorithm under three different classifiers. Bold font indicates the optimal value. It can be seen from Table 6, Table 7 and Table 8 that SARM1 performed the best in a total of 18 out of these 33 experiments, while SARM2 performed the best 5 times. Notably, SARM1 outperformed all competitors, with sustained accuracy rates of 0.8518, 0.8015, and 0.8131 in the three classification frameworks.
The experiments indicate that SARM1 in a p-MSVDIS outperforms the other algorithms in terms of accuracy. Of course, it is not enough to analyze the accuracy only; the Receiver Operating Characteristic (ROC) curve and Area Under Curve (AUC) must also be analyzed [43]. The ROC curve unites specificity and sensitivity via a graphic method, accurately reflecting their relationship. It is a comprehensive representative of classification accuracy. The ROC curve is a graph with the false-positive rate ( F P R ) on the horizontal axis and the true-positive rate ( T P R ) on the vertical axis. As the ROC curve approaches the upper left corner, the model’s performance improves because it indicates that a higher true-positive rate can be achieved while maintaining a low false-positive rate. The AUC represents the area under the ROC curve, serving as a quantitative measure of classifier performance. A higher AUC indicates superior classification performance. The ROC curves are depicted, accompanied by the calculated AUC values, in Figure 3 and Figure 4 and Table 9 and Table 10.
In Figure 3 and Figure 4, the red and blue lines represent SARM1 and SARM2, respectively. It can be seen that the red and blue lines in most of the subgraphs are closer to the upper horizontal border, which indicates that the algorithm in this study is significantly better. In subplots ACA and MB of Figure 3, the red and blue lines are not more convex relative to the frame than other lines, which is consistent with the calculated classification accuracy. The four curves in subplot PS of Figure 3 are overlapped, which shows that the classification effect of the four algorithms is very good. In the subplot OLD of Figure 4, the black, green, and magenta curves are concave, which indicates that the attribute subsets obtained by these three algorithms are poor and generate bad classification accuracy. Table 9 and Table 10 record the areas enclosed by the curves and x-axis in Figure 3 and Figure 4. As evident from the results, SARM1 consistently achieves optimal average AUC scores of 0.8855 and 0.8213 across different experimental conditions. By comparing Table 6 and Table 7 and Table 9 and Table 10, we find that the classification accuracy ranking of each algorithm aligns closely with the corresponding AUC value. This confirms that SARM1 and SARM2 do not suffer from significantly reduced accuracy due to the uneven distribution of samples.
Subsequently, computational time is adopted as an additional metric to evaluate algorithmic efficiency. Since FSRS failed to complete validation on datasets PW, Spa, and Con, the remaining available datasets are ultimately utilized for comparative efficiency analysis. As presented in Table 11 (unit: seconds), SARM1 demonstrated optimal time efficiency with an average execution time of merely 2.7914 s. SARM2 ranked second at 3.8629 s, while FSFS required substantially longer computation times (79.1526 s). These results confirm that SARM1 and SARM2 realize the dual optimization of classification accuracy and computational efficiency.

6.2. Statistical Analysis

Statistical analyses of the aforementioned results are described in this section. Further studies are required to determine whether SARM1 and SARM2 are more effective than other algorithms of this type. The Friedman test is applied to test the differences among the seven algorithms. The results of ranking the data in Table 6, Table 7 and Table 8 are shown in Table 12, Table 13 and Table 14. The data in Table 12, Table 13 and Table 14 were inputted into SPSS software for the Friedman test. Table 15 presents the calculation results. According to the data obtained by the three classifiers, the calculated p values are 0.0001, 0.00002, and 0.00006. When significance level is α = 0.05 , then α = 0.05 > 0.0001 , α = 0.05 > 0.00002 , and α = 0.05 > 0.00006 . This indicates that significant differences exist among the seven algorithms.
Given the significant differences among these seven algorithms, it is necessary to conduct further tests to determine which performed best. The Nemenyi test is used for the post hoc test, which is required to calculate the critical range C D of the difference between the average order values. The domain value C D is calculated as follows: C D = q α k ( k + 1 ) 6 N . Let α = 0.05 ; then, from Tukey’s q table, we have q α = 2.949 . If seven algorithms and eleven datasets are known, then k = 7 , N = 11 , and C D = 2.716 . The posterior results of these algorithms are plotted (see Figure 5). Subfigures (a), (b), and (c) in Figure 5 present the statistical analysis results based on the BT, SVM, and KNN classifiers, respectively. Figure 5 shows that the red and blue lines exhibit closer proximity to the y-axis compared to the other lines, indicating that the experimental results of SARM1 and SARM2 are better.
From Figure 5a, the following conclusions are obtained:
(a)
The classification accuracy of SARM1 is significantly superior to SADA, Semi2MNR, FSRS, and FSDIS;
(b)
In terms of statistics, there is no significant difference among SARM1, SARM2, and FSFS;
(c)
There is no obvious difference among SARM2, FSFS, SADA, Semi2MNR, FSRS, and FSDIS.
From Figure 5b, we also obtain the following conclusions:
(a)
The classification accuracy of SARM1 is better than that of FSDIS, SADA, and FSRS;
(b)
SARM2 is significantly better than SADA and FSRS;
(c)
In terms of statistics, there is no significant difference among SARM1, SARM2, Semi2MNR, FSFS, and FSDIS.
From Figure 5c, we also obtain the following conclusions:
(a)
The classification accuracy of SARM1 is better than that of SADA, FSRS, Semi2MNR, and FSDIS;
(b)
In terms of statistics, there is no significant difference among SARM1, SARM2, and FSFS;
(c)
There is no significant difference among SARM2, FSFS, SADA, FSRS, Semi2MNR, and FSDIS.
Obviously, SARM1 realizes the best performance in the experiments under the three classifiers. Although the differences between SARM1, SARM2, Semi2MNR, and FSFS are statistically nonsignificant, SARM1 and SARM2 always rank first and second on average. In summary, we conclude that SARM1 and SARM2 are superior to other comparison algorithms.

6.3. Parameter Analysis

This section describes another parameter, λ , in SARM1 and SARM2. λ represents the rate of incomplete labels, and λ = 20 % in the above experiments. In many real datasets, λ cannot be fixed to a value, and sometimes, only a small portion of labels is lost. We need to continue conducting experiments to explore the impact of λ on the attribute subsets. Given the initial condition, let δ = 0.6 in the algorithms; the reduced subsets are tested using the BT classifier. Let the incomplete rate λ = 0.1 , 0.2 , , 0.9 , and the step size is 0.1. After executing the algorithms repeatedly, the reduced subsets are found on each λ value. The BT classifier is used to evaluate the subsets. Figure 6 displays the results. The red line indicates SARM1, and the blue line indicates SARM2. It can be seen from Figure 6 that the curves in most subgraphs are relatively stable and remain within a certain range except where they fluctuate slightly in the subgraph (Spa) and (MB). This indicates that no matter what value λ takes, the attribute subsets generated by SARM1 and SARM2 have little influence on the classification accuracy. Consequently, we can conclude that SARM1 and SARM2 exhibit relatively stable characteristics and effectiveness in a p-MSVDIS.

7. Conclusions

In this study, we presented semi-supervised attribute selection algorithms in a p-MSVDIS. First, a p-MSVDIS was divided into two multiset-valued decision information systems: l-MSVDIS and u-MSVDIS. Two uncertainty measurements on an attribute subset were then established using the concepts of indistinguishable relations, distinguishable relations, and dependence functions. According to the two proposed measurements, some theories and properties were proven, and two algorithms, namely SARM1 and SARM2, were provided for attribute selection. The incomplete label rate λ = 20 % so as to evenly divide the system into two parts: l-MSVDIS and u-MSVDIS. In the experiment, five other algorithms of the same type were selected for comparison with SARM1 and SARM2, and 11 datasets, which were selected from UCI, were applied to examine the algorithms’ validity.
This study systematically evaluated SARM1 and SARM2 across 11 diverse datasets. The experimental results demonstrated that both algorithms achieved statistically significant improvements in computational efficiency (exhibiting the shortest average runtime) while simultaneously maintaining superior feature selection quality, outperforming baseline methods in classification accuracy across BT, SVM, and KNN classifiers (p < 0.05). Notably, SARM1 and SARM2 exhibited robust performance stability, as evidenced by consistently high AUC values during cross-dataset validation. These findings collectively indicate that the proposed algorithms possess inherent advantages in feature discriminability and model generalizability. Crucially, their enhanced computational efficiency was attained without compromising selection quality, demonstrating remarkable robustness against both data distribution variations and classifier selection.
While SARM1 and SARM2 demonstrate notable advantages in computational efficiency and accuracy, three key limitations warrant discussion: (1) Memory consumption scales linearly with sample size, potentially causing overflow when processing ultra-large datasets. (2) The parameter δ lacks theoretical selection criteria, requiring exhaustive grid search that increases time complexity. (3) Performance degradation occurs with low-quality input data, necessitating additional preprocessing modules. These identified constraints establish clear directions for future optimization, particularly in developing memory-efficient data streaming and automated hyperparameter tuning frameworks.
The algorithms discussed herein could benefit substantially from parallelization, potentially improving computational efficiency and reducing memory usage. However, their implementation would require careful management of output handling in parallel environments. We identify this as an important research direction worthy of methodical exploration in future studies. In future work, we plan to integrate approximation algorithms with parallel computing techniques to develop feature selection algorithms that achieve higher computational speed and lower memory consumption.

Author Contributions

Methodology and writing—original draft, Y.H.; methodology, editing, and investigation, J.H.; experiment and programming, H.L.; editing and investigation, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Innovative Research on VR+Online Open Course–Research on the Model of Guangke Ideological and Political Course (2022ZXKC519), Doctoral Research of Guangdong University of Science and Technology (GKY-2024BSQDK-11), and Science Foundation in Guangdong University of Science and Technology (GKY-023KYZDK-1).

Informed Consent Statement

The data used or analyzed during the current study are available from the corresponding author after the paper is accepted for publication.

Data Availability Statement

The data presented in this study are openly available in Machine Learning Repository http://archive.ics.uci.edu/datasets, accessed on 20 January 2024.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Dai, J.H.; Xu, Q.; Wang, W.T.; Tian, H.W. Conditional entropy for incomplete decision systems and its application in data mining. Int. J. Gen. Syst. 2012, 41, 713–728. [Google Scholar] [CrossRef]
  2. Hempelmann, C.F.; Sakoglu, U.; Gurupur, V.P.; Jampana, S. An entropy-based evaluation method for knowledge bases of medical information systems. Expert Syst. Appl. 2016, 46, 262–273. [Google Scholar] [CrossRef]
  3. Wang, P.; Zhang, P.F.; Li, Z.W. A three-way decision method based on Gaussian kernel in a hybrid information system with images: An application in medical diagnosis. Appl. Soft Comput. 2019, 77, 734–749. [Google Scholar] [CrossRef]
  4. Navarrete, J.; Viejo, D.; Cazorla, M. Color smoothing for RGB-D data using entropy information. Appl. Soft Comput. 2016, 46, 361–380. [Google Scholar] [CrossRef]
  5. Cament, L.A.; Castillo, L.E.; Perez, J.P.; Galdames, F.J.; Perez, C.A. Fusion of local normalization and Gabor entropy weighted features for face identification. Pattern Recognit. 2014, 47, 568–577. [Google Scholar] [CrossRef]
  6. Swiniarski, R.W.; Skowron, A. Rough set methods in feature selection and recognition. Pattern Recognit. Lett. 2003, 24, 833–849. [Google Scholar] [CrossRef]
  7. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  8. Guan, Y.Y.; Wang, H.K. Set-valued information systems. Inf. Sci. 2006, 176, 2507–2525. [Google Scholar] [CrossRef]
  9. Liang, J.Y.; Qian, Y.H. Information granules and entropy theory in information systems. Sci. China (Ser. F) 2008, 51, 1427–1444. [Google Scholar] [CrossRef]
  10. Qian, Y.H.; Liang, J.Y.; Wu, W.Z.; Dang, C.Y. Information granularity in fuzzy binary GrC model. IEEE Trans. Fuzzy Syst. 2011, 19, 253–264. [Google Scholar] [CrossRef]
  11. Dai, J.H.; Wang, W.T.; Xu, Q. An uncertainty measure for incomplete decision tables and its applications. IEEE Trans. Cybern. 2013, 43, 1277–1289. [Google Scholar] [CrossRef]
  12. Zhang, P.; Li, T.; Wang, G.; Luo, C.; Chen, H. Multi-source information fusion based on rough set theory: A review. Inf. Fusion 2021, 68, 85–117. [Google Scholar] [CrossRef]
  13. Yang, L.; Zhang, X.; Xu, W.; Sang, B. Multi-granulation rough sets and uncertainty measurement for multi-source fuzzy information system. Int. J. Fuzzy Syst. 2019, 21, 1919–1937. [Google Scholar] [CrossRef]
  14. Hu, Q.H.; Yu, D.R.; Liu, J.; Wu, C. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594. [Google Scholar] [CrossRef]
  15. Singh, S.; Shreevastava, S.; Som, T.; Somani, G. A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems. Soft Comput. 2020, 24, 4675–4691. [Google Scholar] [CrossRef]
  16. Wang, Y.B.; Chen, X.J.; Dong, K. Attribute reduction via local conditional entropy. Int. J. Mach. Learn. Cybern. 2019, 10, 3619–3634. [Google Scholar] [CrossRef]
  17. Dai, J.H.; Hu, H.; Zheng, G.J.; Hu, Q.H.; Han, H.F.; Shi, H. Attribute reduction in interval-valued information systems based on information entropies. Front. Inf. Technol. Electron. Eng. 2016, 17, 919–928. [Google Scholar] [CrossRef]
  18. Wang, C.Z.; Huang, Y.; Shao, M.W.; Hu, Q.H.; Chen, D.G. Feature selection based on neighborhood self-information. IEEE Trans. Cybern. 2020, 50, 4031–4042. [Google Scholar] [CrossRef]
  19. Dai, J.H.; Liu, Q. Semi-supervised attribute reduction for interval data based on misclassification cost. Int. J. Mach. Learn. Cybern. 2022, 13, 1739–1750. [Google Scholar] [CrossRef]
  20. Liu, K.Y.; Yang, X.B.; Yu, H.L.; Mi, J.S.; Wang, P.X. Rough set based semi-supervised feature selection via ensemble selector. Knowl.-Based Syst. 2019, 165, 282–296. [Google Scholar] [CrossRef]
  21. Kim, K. An improved semi-supervised dimensionality reduction using feature weighting: Application to sentiment analysis. Expert Syst. Appl. 2018, 109, 49–65. [Google Scholar] [CrossRef]
  22. Li, Z.; Tang, J. Semi-supervised local feature selection for data classification. Sci. China (Ser. F) 2021, 64, 192108. [Google Scholar] [CrossRef]
  23. Zhang, W.; Miao, D.Q.; Gao, C.; Li, F. Semi-supervised attribute reduction based on rough-subspace ensemble learning. J. Chin. Comput. Syst. 2016, 37, 2727–2732. [Google Scholar]
  24. Ma, M.H.; Deng, T.Q.; Wang, N.; Chen, Y.M. Semi-supervised rough fuzzy Laplacian Eigenmaps for dimensionality reduction. Int. J. Mach. Learn. Cybern. 2019, 10, 397–411. [Google Scholar] [CrossRef]
  25. Han, Y.H.; Yang, Y.; Yan, Y.; Ma, Z.G.; Zhou, X.F. Semisupervised feature selection via spline regression for video semantic recognition. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 252–264. [Google Scholar]
  26. Moslemi, A.; Ahmadian, A. Subspace learning for feature selection via rank revealing QR factorization: Fast feature selection. Expert Syst. Appl. 2024, 256, 124919. [Google Scholar] [CrossRef]
  27. Bohrer, J.D.S.; Dorn, M. Enhancing classification with hybrid feature selection: A multi-objective genetic algorithm for high-dimensional data. Expert Syst. Appl. 2024, 255, 124518. [Google Scholar] [CrossRef]
  28. Sheikhpour, R.; Mohammadi, M.; Berahmand, K.; Movahed, F.S.; Khosravi, H. Robust semi-supervised multi-label feature selection based on shared subspace and manifold learning. Inf. Sci. 2025, 699, 121800. [Google Scholar] [CrossRef]
  29. Sheikhpour, R.; Berahmand, K.; Mohammadi, M.; Khosravi, H. Sparse feature selection using hypergraph Laplacian-based semi-supervised discriminant analysis. Pattern Recognit. 2025, 157, 110882. [Google Scholar] [CrossRef]
  30. Liu, K.Y.; Yang, X.B.; Ding, W.P.; Ju, H.R.; Li, T.R.; Wang, J.; Yin, T.Y. A survey on rough feature selection: Recent advances and challenges. IEEE/CAA J. Autom. Sin. 2025, 12, 125231. [Google Scholar]
  31. Saberi-Movahed, F.; Berahman, K.; Sheikhpour, R.; Li, Y.F.; Pan, S. Nonnegative matrix factorization in dimensionality reduction: A survey. arXiv 2024, arXiv:2405.03615. [Google Scholar]
  32. Yao, Y.Y. Information granulation and rough set approximation. Int. J. Intell. Syst. 2001, 16, 87–104. [Google Scholar] [CrossRef]
  33. Miyamoto, S. Information clustering based on fuzzy multisets. Inf. Process. Manag. 2003, 39, 195–213. [Google Scholar] [CrossRef]
  34. Zhao, X.R.; Hu, B.Q. Three-way decisions with decision-theoretic rough sets in multiset-valued information tables. Inf. Sci. 2020, 507, 684–699. [Google Scholar] [CrossRef]
  35. Jena, S.P.; Ghosh, S.K.; Tripathy, B.K. On the theory of bags and lists. Inf. Sci. 2001, 132, 241–254. [Google Scholar] [CrossRef]
  36. Nikulin, M.S. Hellinger distance, in Hazewinkel, Michiel, Encyclopedia of Mathematics; Springer Science: Berlin/Heidelberg, Germany, 2001; ISBN 978–1-55608-010-4. [Google Scholar]
  37. Huang, D.; Lin, H.; Li, Z.W. Information structures in a multiset-valued information system with application to uncertainty measurement. J. Intell. Fuzzy Syst. 2022, 43, 7447–7469. [Google Scholar] [CrossRef]
  38. Kryszkiewicz, M. Rules in incomplete information systems. Inf. Sci. 1999, 113, 271–292. [Google Scholar] [CrossRef]
  39. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/datasets (accessed on 20 January 2024).
  40. Ahmed, W.; Sufyan, M.M.B.; Ahmad, T. Entropy based feature selection for fuzzy set-valued information systems. 3D Res. 2018, 9, 1–17. [Google Scholar] [CrossRef]
  41. Zhong, W.C.; Chen, X.J.; Nie, F.P.; Huang, J.Z. Adaptive discriminant analysis for semi-supervised feature selection. Inf. Sci. 2021, 566, 178–194. [Google Scholar] [CrossRef]
  42. Qian, D.; Liu, K.; Zhang, S.; Yang, X. Semi-supervised feature selection by minimum neighborhood redundancy and maximum neighborhood relevancy. Appl. Intell. 2024, 54, 7750–7764. [Google Scholar] [CrossRef]
  43. Narkhede, S. Understanding auc-roc curve. Towards Data Sci. 2018, 26, 220–227. [Google Scholar]
Figure 1. Flow chart of research.
Figure 1. Flow chart of research.
Mathematics 13 01318 g001
Figure 2. The correlation between δ and accuracy.
Figure 2. The correlation between δ and accuracy.
Mathematics 13 01318 g002aMathematics 13 01318 g002b
Figure 3. ROC curve under classifier BT.
Figure 3. ROC curve under classifier BT.
Mathematics 13 01318 g003aMathematics 13 01318 g003b
Figure 4. ROC curve under classifier SVM.
Figure 4. ROC curve under classifier SVM.
Mathematics 13 01318 g004aMathematics 13 01318 g004b
Figure 5. Nemenyi test of experimental results under three classifiers.
Figure 5. Nemenyi test of experimental results under three classifiers.
Mathematics 13 01318 g005
Figure 6. The correlation between λ and accuracy with BT.
Figure 6. The correlation between λ and accuracy with BT.
Mathematics 13 01318 g006aMathematics 13 01318 g006b
Table 1. An IDIS ( T , A , V , f , d ) .
Table 1. An IDIS ( T , A , V , f , d ) .
T a 1 a 2 a 3 d
t 1 SickTrueHighFlu
t 2 SickTrueLowFlu
t 3 MiddleNormalFlu
t 4 NoTrueNormalFlu
t 5 TrueNormalRhinitis
t 6 MiddleFalseRhinitis
t 7 NoFalseLowHealth
t 8 NoHealth
t 9 TrueLowHealth
Table 2. An MSVDIS ( T , A , V , f , d ) .
Table 2. An MSVDIS ( T , A , V , f , d ) .
T a 1 a 2 a 3 d
t 1 { 1 / S i c k , 0 / M i d d l e , 0 / N o } { 1 / T r u e , 0 / F a l s e } { 1 / H i g h , 0 / N o r m a l , 0 / L o w } Flu
t 2 { 1 / S i c k , 0 / M i d d l e , 0 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 0 / N o r m a l , 1 / L o w } Flu
t 3 { 0 / S i c k , 1 / M i d d l e , 0 / N o } { 5 / T r u e , 2 / F a l s e } { 0 / H i g h , 1 / N o r m a l , 0 / L o w } Flu
t 4 { 0 / S i c k , 0 / M i d d l e , 1 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 1 / N o r m a l , 0 / L o w } Flu
t 5 { 2 / S i c k , 2 / M i d d l e , 3 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 1 / N o r m a l , 0 / L o w } Rhinitis
t 6 { 0 / S i c k , 1 / M i d d l e , 0 / N o } { 0 / T r u e , 1 / F a l s e } { 1 / H i g h , 3 / N o r m a l , 3 / L o w } Rhinitis
t 7 { 0 / S i c k , 0 / M i d d l e , 1 / N o } { 0 / T r u e , 1 / F a l s e } { 0 / H i g h , 0 / N o r m a l , 1 / L o w } Flu
t 8 { 0 / S i c k , 0 / M i d d l e , 1 / N o } { 5 / T r u e , 2 / F a l s e } { 1 / H i g h , 3 / N o r m a l , 3 / L o w } Health
t 9 { 2 / S i c k , 2 / M i d d l e , 3 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 0 / N o r m a l , 1 / L o w } Health
Table 3. A p-MSVDIS ( T , A , V , f , d ) .
Table 3. A p-MSVDIS ( T , A , V , f , d ) .
T a 1 a 2 a 3 d
t 1 { 1 / S i c k , 0 / M i d d l e , 0 / N } { 1 / T r u e , 0 / F a l s e } { 1 / H i g h , 0 / N o r m a l , 0 / L o w } Flu
t 2 { 1 / S i c k , 0 / M i d d l e , 0 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 0 / N o r m a l , 1 / L o w } Flu
t 3 { 0 / S i c k , 1 / M i d d l e , 0 / N o } { 5 / T r u e , 2 / F a l s e } { 0 / H i g h , 1 / N o r m a l , 0 / L o w }
t 4 { 0 / S i c k , 0 / M i d d l e , 1 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 1 / N o r m a l , 0 / L o w } Flu
t 5 { 2 / S i c k , 2 / M i d d l e , 3 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 1 / N o r m a l , 0 / L o w } Rhinitis
t 6 { 0 / S i c k , 1 / M i d d l e , 0 / N o } { 0 / T r u e , 1 / F a l s e } { 1 / H i g h , 3 / N o r m a l , 3 / L o w } Rhinitis
t 7 { 0 / S i c k , 0 / M i d d l e , 1 / N o } { 0 / T r u e , 1 / F a l s e } { 0 / H i g h , 0 / N o r m a l , 1 / L o w } Flu
t 8 { 0 / S i c k , 0 / M i d d l e , 1 / N o } { 5 / T r u e , 2 / F a l s e } { 1 / H i g h , 3 / N o r m a l , 3 / L o w } Health
t 9 { 2 / S i c k , 2 / M i d d l e , 3 / N o } { 1 / T r u e , 0 / F a l s e } { 0 / H i g h , 0 / N o r m a l , 1 / L o w }
Table 4. Basic information about the datasets.
Table 4. Basic information about the datasets.
IDDatasetLogogramSampleAttributeClass
1ArrhythmiaArr45227916
2Audit riskAud776262
3Australian Credit ApprovalACA690142
4Diabetic Retinopathy DebrecenDRD1151192
5Ozone-Level DetectionOLD2534732
6Parkinson SpeechPS1040262
7Phishing WebsitesPW2456302
8Image SegmentationIS2310197
9SpambaseSpa4601572
10Molecular BiologyMB3910613
11Connect-4Con67,557423
Table 5. Number of selected attributes.
Table 5. Number of selected attributes.
DatasetRawFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr2793355542
Aud261565644
ACA143665435
DRD193664843
OLD734566955
PS261656556
PW3030201817101717
IS194787867
Spa5779891516
MB61141099111016
Con42159761110
Average58.7377.737.917.187.367.648.27
Table 6. Assessing the accuracy of classification with BT.
Table 6. Assessing the accuracy of classification with BT.
DatasetRaw DataFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr0.69250.41150.42700.58190.55090.54200.58190.5941
Aud0.99360.99480.75130.96010.98030.95100.99870.9472
ACA0.80140.62320.70140.84140.84060.63620.71880.7667
DRD0.49610.51690.39700.52220.52910.56130.66550.6299
OLD0.94280.91400.83270.92380.92190.91830.93880.9357
PS10.98250.98880.65870.99080.972110.9998
PW0.95070.95520.95720.85830.90020.95070.96170.9581
IS0.96230.93160.54420.93550.92720.93680.96060.9195
Spa0.921100.84390.82130.82050.91440.93440.8496
MB0.83980.82100.77210.67930.91940.77050.89030.9075
Con0.710500.67570.70330.67500.69170.71900.7163
Average0.84640.65010.71740.77140.82330.80410.85180.8386
Table 7. Assessing the accuracy of classification with SVM.
Table 7. Assessing the accuracy of classification with SVM.
DatasetRaw DataFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr0.54200.48230.53980.53980.52250.53980.57300.5420
Aud0.97040.69970.59790.77960.84530.88270.90880.8557
ACA0.85650.60430.67250.85650.85360.64930.74780.7507
DRD0.52560.52480.43870.42310.52650.56730.57950.5308
OLD0.93650.93300.84650.84650.92480.93450.93690.9360
PS0.99710.92500.97880.58270.97880.98650.98560.9962
PW0.94910.94280.94830.90840.92920.91730.95400.9495
IS0.93900.80430.45320.83940.89610.88870.91950.8491
Spa0.930200.74400.77830.85720.85790.89700.8848
MB0.51540.68900.71290.51880.75050.57300.64330.706
Con0.662000.66330.65830.65970.66100.67100.7406
Average0.80220.60050.69050.70290.79490.76890.80150.7947
Table 8. Assessing the accuracy of classification with KNN.
Table 8. Assessing the accuracy of classification with KNN.
DatasetRaw DataFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr0.56860.50660.48450.41150.57300.54650.55300.5486
Aud0.97040.91750.88400.96390.98970.90460.94590.9240
ACA0.84060.60720.66230.83770.83330.61010.71370.7246
DRD0.59340.52220.55430.62990.66900.61420.66810.6299
OLD0.93250.93330.93210.93250.92940.93090.93530.9369
PS0.96350.99040.98940.57020.99420.98850.99230.9913
PW0.94260.94260.93400.89050.93120.92830.94500.9393
IS0.92990.88440.50390.86710.90560.90000.94290.9325
Spa0.907800.77720.68460.87650.84500.89110.9031
MB0.65240.68240.51940.51660.80250.59090.69090.6542
Con0.637700.65630.62130.55430.56430.66570.6583
Average0.81270.63510.71790.72050.82350.76580.81310.8039
Table 9. AUC under classifier BT.
Table 9. AUC under classifier BT.
DatasetFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr0.63140.64130.54780.56280.66240.82560.7145
Aud0.95430.91980.99580.99970.98090.99990.9652
ACA0.62820.77690.92370.90890.72660.69230.7074
DRD0.55880.61720.54350.58830.53500.71190.6802
OLD0.59090.70370.67430.72570.70160.74140.7811
PS110.590010.947911
PW00.99130.93180.98750.96540.99210.9938
IS0.99480.80440.99750.99910.98000.99990.9880
Spa00.86830.84150.80560.81330.92470.8689
MB0.90620.88880.69250.98210.76070.90580.8703
Con00.78430.75890.64110.82660.94700.9485
Average0.56950.81780.77250.83640.80910.88550.8653
Table 10. AUC under classifier SVM.
Table 10. AUC under classifier SVM.
DatasetFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr0.61920.59250.53700.56130.61600.73290.6666
Aud0.99700.92310.97010.96840.98090.95960.9433
ACA0.62820.77690.92370.90890.71410.69230.7074
DRD0.55880.61720.54350.58830.65310.71190.6802
OLD0.38710.40450.54380.38770.58600.61710.5355
PS0.93650.97450.61950.96770.74260.99941
PW00.98540.95780.97470.94000.98510.9852
IS0.93190.74140.99870.99570.96510.99600.9284
Spa00.86990.85050.93020.91760.94430.9340
MB0.64040.83320.58540.80920.76070.64090.6947
Con00.65560.53370.55730.57520.75510.6933
Average0.51810.76130.73310.78630.76830.82130.7971
Table 11. Comparison of computational speed (unit: seconds).
Table 11. Comparison of computational speed (unit: seconds).
DatasetFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr44.264679.03899.5137173.45232.05700.67151.2215
Aud1.54122.25473.297439.01630.667250.27920.5720
ACA0.96660.61142.381837.81340.322450.14880.1587
DRD2.43094.14977.433826.04880.798140.31840.7071
OLD132.5813313.427450.0988165.32088.34923.30653.5238
PS3.26696.26065.754240.47056.30040.30361.0441
IS42.602968.544927.251129.07933.94281.29974.0554
MB301.141088.942573.0712122.019611.956116.003219.6206
Average66.099470.403822.350379.15264.29922.79143.8629
Table 12. The ranking of classification accuracies with BT.
Table 12. The ranking of classification accuracies with BT.
DatasetFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr762.5452.51
Aud2743516
ACA7512643
DRD6754312
OLD6734512
PS5473612
PW4376512
IS4735216
Spa7456213
MB4571632
Con7536412
Table 13. The ranking of classification accuracies with SVM.
Table 13. The ranking of classification accuracies with SVM.
DatasetFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr7446412
Aud6754213
ACA7512643
DRD5674213
OLD46.56.55312
PS64.574.5231
PW4375612
IS6752314
Spa7654312
MB4271653
Con7365421
Table 14. The ranking of classification accuracies with KNN.
Table 14. The ranking of classification accuracies with KNN.
DatasetFSRSFSDISSADAFSFSSemi2MNRSARM1SARM2
Arr5671423
Aud5721634
ACA7512643
DRD763.51523.5
OLD3547621
PS4571623
PW2475613
IS5763412
Spa7563421
MB3671524
Con7346512
Table 15. Friedman test of experimental results under three classifiers.
Table 15. Friedman test of experimental results under three classifiers.
ClassifiersSourceSSdfMS χ 2 p-Value
Groups126.32621.0527.110.0001
BTError181.18603.02
Total17559
Groups145.23624.231.430.00002
SVMError159.77602.66
Total30576
Groups135.32622.5529.040.00006
KNNError172.18602.87
Total307.576
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

He, Y.; He, J.; Liu, H.; Li, Z. Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data. Mathematics 2025, 13, 1318. https://doi.org/10.3390/math13081318

AMA Style

He Y, He J, Liu H, Li Z. Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data. Mathematics. 2025; 13(8):1318. https://doi.org/10.3390/math13081318

Chicago/Turabian Style

He, Yuanzi, Jiali He, Haotian Liu, and Zhaowen Li. 2025. "Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data" Mathematics 13, no. 8: 1318. https://doi.org/10.3390/math13081318

APA Style

He, Y., He, J., Liu, H., & Li, Z. (2025). Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data. Mathematics, 13(8), 1318. https://doi.org/10.3390/math13081318

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop