Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data

Yuanzi He; Jiali He; Haotian Liu; Zhaowen Li

doi:10.3390/math13081318

¹ College of Computer Science, Guangdong University of Science and Technology, Dongguan 523083, China

² Key Laboratory of Complex System Optimization and Big Data Processing, Department of Guangxi Education, Yulin Normal University, Yulin 537000, China

³ Center for Applied Mathematics of Guangxi, Yulin Normal University, Yulin 537000, China

* Authors to whom correspondence should be addressed.

Mathematics 2025, 13(8), 1318; https://doi.org/10.3390/math13081318


Abstract

In machine learning, semi-supervised learning algorithms are used when only part of the data is labeled. A dataset with missing attribute values or labels is referred to as an incomplete information system. Handling incomplete information poses a significant challenge, which can be effectively addressed using rough set theory (R-theory). However, R-theory has its limits: it does not consider the frequency of an attribute value and therefore cannot represent the distribution of attribute values appropriately. If we consider partially labeled data and replace a missing attribute value with the multiset of all possible attribute values under the same attribute, we obtain partially labeled multiset-valued data. In semi-supervised learning, a large number of redundant features need to be removed to save time and costs. This study proposes semi-supervised attribute selection algorithms for partially labeled multiset-valued data. First, a partially labeled multiset-valued decision information system (p-MSVDIS) is partitioned into two distinct systems: a labeled multiset-valued decision information system (l-MSVDIS) and an unlabeled multiset-valued decision information system (u-MSVDIS). Subsequently, using the indiscernibility relation, the discernibility relation, and the dependence function, two types of attribute subset importance in a p-MSVDIS are defined as the weighted sum of the importance in the l-MSVDIS and in the u-MSVDIS, with weights determined by the missing rate of labels; this importance can be regarded as an uncertainty measurement (UM) of a p-MSVDIS. Next, two adaptive semi-supervised attribute selection algorithms for a p-MSVDIS are introduced; they leverage the degrees of importance and adapt automatically to diverse missing rates. Finally, experiments and statistical analyses are conducted on 11 datasets. The results indicate that the proposed algorithms have advantages over some existing algorithms.

Keywords:

partially labeled multiset-valued data; p-MSVDIS; uncertainty measurement; semi-supervised attribute selection; dependence function; information entropy

MSC:

68T09; 68T99; 68W40

1. Introduction

1.1. Research Background

Uncertainty measurement (UM) is applied extensively in various domains, such as data mining [1], medical diagnosis [2,3], image processing [4], and pattern recognition [5,6].

An information system (IS) was proposed by Pawlak [7] based on R-theory. A set-valued IS [8] refers to a system in which each sample in a dataset is associated with one or more sets. These sets can be discrete or continuous, and they describe the categories or attributes to which each sample belongs. Such systems often lose information for human or mechanical reasons, sometimes affecting feature values and sometimes affecting sample labels. Data in which some samples have labels while others do not are referred to as partially labeled data. Incomplete ISs are therefore common.

R-theory is widely used to measure information granularity in an IS. Many scholars have published research on this aspect. For instance, Liang et al. [9] studied information granulation in an IS. Qian et al. [10] investigated the measurement of fuzzy information granulation from five perspectives. Dai et al. [11] studied UM based on an incomplete decision IS. Zhang et al. [12] discussed applications and developments in the field of information fusion, focusing on the introduction and evaluation of multi-source information fusion methods based on R-theory. Yang et al. [13] explored how to effectively address uncertainty issues in multi-source fuzzy information systems through R-theory and uncertainty measurement methods.

Currently, information plays a vital role in social development, but a significant amount of information contains useless attributes. In machine learning, a large number of redundant features need to be removed to save time and costs. Therefore, attribute selection is becoming increasingly important in data processing: it reduces the dimension of the data while maintaining the usefulness of the original data. Hu et al. [14] established a neighborhood rough set model and used it for attribute selection. Singh et al. [15] developed an attribute selection method for set-valued ISs. Wang et al. [16] designed a novel attribute selection algorithm based on local conditional entropy. Dai et al. [17] used information entropy to study attribute selection in interval-valued ISs. Wang et al. [18] studied UMs and designed greedy algorithms based on them for attribute reduction.

Semi-supervised attribute selection improves data utilization by mining the information contained in unlabeled data, with the goal of improving classification accuracy. This field has attracted considerable attention. Dai et al. [19] studied semi-supervised attribute selection for interval-valued data and proposed a novel entropy structure based on the misclassification cost. Liu et al. [20] used common methods to predict unknown labels and then constructed multiple fitness functions to assess the importance of attributes for reduction. Kim [21] developed a semi-supervised dimensionality reduction framework that takes into account both label and structural information. Li et al. [22] aimed to improve the effectiveness of feature selection by considering category-specific feature subsets. Zhang et al. [23] integrated R-theory into an ensemble learning framework, using labeled data to build an ensemble of base classifiers capable of both labeling unlabeled data and augmenting the existing labeled dataset. Ma et al. [24] presented a semi-supervised rough fuzzy Laplacian Eigenmaps method for attribute selection in high-dimensional mixed data, in which the importance of each attribute was evaluated using defined information entropy measures. Han et al. [25] developed a semi-supervised attribute selection algorithm based on spline regression, which can effectively handle video semantic recognition and related problems. Compared with singular value decomposition and non-negative matrix factorization, rank-revealing QR factorization is more computationally efficient; building on this, Moslemi et al. [26] presented a novel unsupervised feature selection technique based on rank-revealing QR factorization. Bohrer et al. [27] proposed a hybrid feature selection approach using a multi-objective genetic algorithm to enhance classification performance and reduce dimensionality across diverse classification tasks. Sheikhpour et al. [28] proposed a robust semi-supervised multi-label feature selection method that integrates shared subspace learning, graph Laplacian-based manifold learning, and norm minimization in both the loss function and the regularization term, and Sheikhpour et al. [29] also studied sparse feature selection using hypergraph Laplacian-based semi-supervised discriminant analysis.

However, attribute selection poses numerous challenges [30,31]: discretizing continuous attributes may compromise the structural integrity of the data, while the operational overhead of attribute selection increases with larger datasets, often resulting in diminished reduction quality.

1.2. Motivation and Contributions

Missing data are common in data mining and machine learning. They increase uncertainty and can degrade the performance of machine learning models. Missing data can be categorized into three classes: (1) missing completely at random (MCAR); (2) missing at random (MAR); and (3) missing not at random (MNAR). In this paper, we consider MCAR. A common approach to handling missing data is to discard records with missing values (listwise deletion and pairwise deletion). This approach is convenient; however, some information is lost, and it may introduce bias in some circumstances. For models that cannot handle missing values by themselves, missing data imputation is needed. In statistics, imputation is the process of filling in missing values according to reasonable rules. Imputation is a complicated task because it can introduce bias and lead to inaccurate results, especially under MAR and MNAR.

Set-valued data are important for processing datasets with missing attribute values [32]. Specifically, a missing attribute value is replaced with the set of all possible values of the same attribute, while an existing attribute value is represented by a singleton set containing the actual value. This process transforms a dataset with missing attribute values into a set-valued information system (SVIS). However, this approach has a limitation: it does not consider the frequency of the attribute values. Here, the frequency of an attribute value can be interpreted as its importance; it can be calculated from the number of occurrences, which naturally leads to multisets. To address this issue, a multiset-valued decision information system (MSVDIS) is proposed, in which missing attribute values are replaced with multisets. Replacing missing attribute values with multisets is more appropriate than replacing them with the set of all possible values of the same attribute. By considering the frequency of attribute values, this method maximizes the extraction of useful information from datasets with missing attribute values, offering a novel approach to handling incomplete data. Miyamoto [33] presented a model for information clustering based on fuzzy multisets and used it to carry out the clustering process. Zhao et al. [34] studied two rough set models and introduced loss functions for computing the expected costs of data in a multiset-valued information system (MSVIS).

The multiset is an important concept in mathematics and computer science that extends the notion of a traditional set. It allows elements to be repeated and records their multiplicities, and it offers greater expressive power than traditional sets in data processing, algorithm design, mathematical modeling, and other fields.

This study investigates UMs in a partially labeled multiset-valued decision information system (p-MSVDIS) and considers semi-supervised attribute selection in a p-MSVDIS. Based on the above research motivation, the novel contributions of this article are as follows:

(1): Merely substituting a missing attribute value with the set of all potential values is overly simplistic and risks losing valuable information. This study advocates for the utilization of multisets to address missing attribute values. Furthermore, it demonstrates the conversion of multisets into probability distribution sets, enabling the calculation of the Hellinger distance based on these distributions to measure the dissimilarity between attribute values in an MSVDIS.
(2): It is shown that a p-MSVDIS induces two MSVDISs: an l-MSVDIS and a u-MSVDIS.
(3): Based on indiscernibility relations, discernibility relations, and dependence functions, this study introduces two types of importance measures for each attribute subset within a p-MSVDIS. These measures are derived from the weighted sum of the importance assigned to the induced l-MSVDIS and u-MSVDIS. This combined measure, termed UM, reflects the importance, or classification capability, of the attribute subset within the given p-MSVDIS.
(4): Based on the two types of importance, two heuristic algorithms for semi-supervised attribute selection in a p-MSVDIS are constructed, and their performance is examined on real datasets.

1.3. Organization

In Section 2, an MSVDIS is recalled. In Section 3, a p-MSVDIS is introduced. In Section 4, two types of importance in a p-MSVDIS are defined. In Section 5, semi-supervised attribute selection in a p-MSVDIS is defined, and two corresponding algorithms are proposed. In Section 6, experiments on the performance and effectiveness of the proposed algorithms are conducted. In Section 7, this study is summarized.

Figure 1 depicts a flow chart of this research.

Figure 1. Flow chart of research.

2. Preliminaries

In this section, an MSVDIS is reviewed.

Throughout this study, let $T = \{t_{1}, t_{2}, \dots, t_{n}\}$, and let $|X|$ denote the cardinality of a set $X$.

2.1. Multisets and Probability Distribution Sets

Definition 1

([35]). A multiset $M$ drawn from a set $X$ is given by a function $M: X \to \mathbb{N} \cup \{0\}$. If $M(x) = m$, then $x$ appears $m$ times in $M$, which is written as $x \in^{m} M$. Given $X = \{x_{1}, x_{2}, \dots, x_{l}\}$, if $M(x_{i}) = m_{i}$ $(i = 1, 2, \dots, l)$, then $M$ is written as

M = \{m_{1}/x_{1}, m_{2}/x_{2}, \dots, m_{l}/x_{l}\}.
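A minimal Python sketch of Definition 1, assuming a multiset is stored as a `collections.Counter`, which returns 0 for absent elements and so matches $M(x) = 0$. The helper `to_fraction_notation` is a hypothetical name used only to print the $m/x$ notation over an ordered ground set:

```python
from collections import Counter

def to_fraction_notation(X, M):
    """Render multiset M over the ordered ground set X as {m1/x1, ..., ml/xl}."""
    # Counter returns 0 for keys it has never seen, i.e. M(x) = 0.
    return "{" + ", ".join(f"{M[x]}/{x}" for x in X) + "}"

X = ["High", "Normal", "Low"]
# Each occurrence of a value raises its multiplicity by one.
M = Counter(["High", "Low", "High"])
print(to_fraction_notation(X, M))  # {2/High, 0/Normal, 1/Low}
```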

Definition 2.

Let $X = \{x_{1}, x_{2}, \dots, x_{l}\}$, and suppose

S = \left\{\frac{x_{1}, x_{2}, \dots, x_{l}}{s_{1}, s_{2}, \dots, s_{l}}\right\}.

If $0 \leq s_{i} \leq 1$ for all $i$ and $\sum_{i=1}^{l} s_{i} = 1$, then $S$ is referred to as a probability distribution set (PDS) on $X$. Moreover, if every $s_{i}$ is a rational number, $S$ is referred to as a rational probability distribution set (RPDS) on $X$.

Definition 3

([36]). Let $S$ and $T$ be two PDSs on $X$, written as

S = \{\frac{x_{1}, x_{2}, \dots, x_{l}}{s_{1}, s_{2}, \dots, s_{l}}\}, T = \{\frac{x_{1}, x_{2}, \dots, x_{l}}{t_{1}, t_{2}, \dots, t_{l}}\} .

Then, the Hellinger distance between S and T is defined as

H D (S, T) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i = 1}^{l} {(\sqrt{s_{i}} - \sqrt{t_{i}})}^{2}} .
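The Hellinger distance of Definition 3 can be sketched as follows, assuming both PDSs are given as lists of probabilities aligned with the same ordering $x_1, \dots, x_l$; the function name `hellinger` is illustrative:

```python
import math

def hellinger(s, t):
    """Hellinger distance HD(S, T) between two PDSs on the same ordered set X."""
    if len(s) != len(t):
        raise ValueError("PDSs must be defined on the same set X")
    total = sum((math.sqrt(si) - math.sqrt(ti)) ** 2 for si, ti in zip(s, t))
    return math.sqrt(total) / math.sqrt(2)

print(hellinger([0.5, 0.5, 0.0], [0.5, 0.5, 0.0]))  # 0.0
print(round(hellinger([1.0, 0.0], [0.0, 1.0]), 10))  # 1.0
```

Identical distributions are at distance 0, and two distinct one-point distributions reach the maximum value 1.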

Definition 4.

Let $X = \{x_{1}, x_{2}, \dots, x_{l}\}$, and let $M = \{m_{1}/x_{1}, m_{2}/x_{2}, \dots, m_{l}/x_{l}\}$ be a multiset drawn from $X$. Define

S_{M} = \left\{\frac{x_{1}, x_{2}, \dots, x_{l}}{s_{1}, s_{2}, \dots, s_{l}}\right\}, \quad s_{i} = \frac{m_{i}}{m_{1} + m_{2} + \dots + m_{l}} \ (i = 1, 2, \dots, l).

Then $S_{M}$ is an RPDS on $X$.
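A sketch of Definition 4 using exact rational arithmetic (`fractions.Fraction`), so the result is visibly an RPDS. The function name and the error raised for an empty multiset are assumptions of this illustration, since the definition does not treat that case:

```python
from collections import Counter
from fractions import Fraction

def multiset_to_rpds(X, M):
    """Map multiset M over ordered X to S_M with s_i = m_i / (m_1 + ... + m_l)."""
    total = sum(M[x] for x in X)
    if total == 0:
        raise ValueError("an empty multiset has no induced distribution")
    return [Fraction(M[x], total) for x in X]

X = ["Sick", "Middle", "No"]
S = multiset_to_rpds(X, Counter({"Sick": 3, "Middle": 2, "No": 1}))
print(S)  # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
# The conditions of Definition 2 hold: each s_i lies in [0, 1] and they sum to 1.
assert all(0 <= s <= 1 for s in S) and sum(S) == 1
```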

2.2. Multiset-Valued Decision Information Systems

Let $T$ be a finite sample set and $AT$ a finite attribute set. Then $(T, AT, V, f)$ is referred to as an information system (IS) if $f: T \times AT \to V$ is an information function, where $V = \bigcup_{a \in AT} V_{a}$ and $V_{a} = \{f(t, a) : t \in T\}$.

If $(T, A \cup \{d\}, V, f)$ is an IS and $d$ is a decision attribute, then $(T, A, V, f, d)$ is referred to as a decision information system (DIS).

Let $*$ denote an unknown information value. If there exists $a \in A$ such that $* \in V_{a}$, but $* \notin V_{d}$, then the DIS $(T, A, V, f, d)$ is referred to as an incomplete DIS (IDIS). For all $a \in A \cup \{d\}$, denote

V_{a}^{*} = V_{a} - \{a(t) : a(t) = *\}.

Example 1.

Table 1 is an IDIS $(T, A, V, f, d)$, where $T = \{t_{1}, t_{2}, \dots, t_{9}\}$, $A = \{a_{1}, a_{2}, a_{3}\}$, and $V_{d}^{*} = V_{d} = \{\text{Flu}, \text{Rhinitis}, \text{Health}\}$.

Table 1. An IDIS $(T, A, V, f, d)$.

Definition 5

([37]). An IDIS $(T, A, V, f, d)$ is referred to as an MSVDIS if, for all $a \in A$ and $t \in T$, $a(t)$ is a multiset drawn from the same set.

Definition 6

([37]). Let $(T, A, V, f, d)$ be an IDIS with $T = \{t_{1}, t_{2}, \dots, t_{n}\}$, let $a \in A$, and denote $V_{a}^{*} = \{x_{1}, x_{2}, \dots, x_{l}\}$. For each $i$, let $m_{i}$ be the number of occurrences of $x_{i}$ in $\{a(t_{1}), a(t_{2}), \dots, a(t_{n})\} - \{*\}$; here $V_{a}^{*}$ is an ordinary set, whereas $\{a(t_{1}), a(t_{2}), \dots, a(t_{n})\} - \{*\}$ is a multiset. If $a(t) = *$, then $a(t)$ is replaced by $\{m_{1}/x_{1}, m_{2}/x_{2}, \dots, m_{l}/x_{l}\}$; if $a(t) = x_{j}$, then $a(t)$ is replaced by $\{0/x_{1}, \dots, 0/x_{j-1}, 1/x_{j}, 0/x_{j+1}, \dots, 0/x_{l}\}$. After this treatment, $(T, A, V, f, d)$ becomes an MSVDIS, called the MSVDIS induced by the IDIS $(T, A, V, f, d)$.
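The replacement rule of Definition 6 can be sketched for a single attribute column as follows. The column values, the `"*"` marker, and the function name are illustrative assumptions (this is not the data of Table 1), and each multiset is represented as a tuple of multiplicities aligned with $V_a^*$:

```python
from collections import Counter

MISSING = "*"

def induce_multisets(column):
    """Replace each entry of one attribute column as in Definition 6.

    Known value x_j -> (0, ..., 1, ..., 0), the one-point multiset {1/x_j};
    missing '*'     -> (m_1, ..., m_l), the counts of the known values.
    Returns (V_a^* as an ordered list, list of multiplicity tuples).
    """
    known = [v for v in column if v != MISSING]
    counts = Counter(known)
    # Order V_a^* by first occurrence so the output is deterministic.
    v_star = list(dict.fromkeys(known))
    result = []
    for v in column:
        if v == MISSING:
            result.append(tuple(counts[x] for x in v_star))
        else:
            result.append(tuple(1 if x == v else 0 for x in v_star))
    return v_star, result

values, multisets = induce_multisets(["High", "*", "Low", "High", "Normal"])
print(values)     # ['High', 'Low', 'Normal']
print(multisets)  # [(1, 0, 0), (2, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
```

Unlike the set-valued replacement, the missing entry keeps the information that `High` occurs twice as often as the other values.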

Example 2. (Continued from Example 1) According to Definition 6, the IDIS in Table 1 can be expressed as the MSVDIS in Table 2.

Table 2. An MSVDIS $(T, A, V, f, d)$.

3. A Partially Labeled Multiset-Valued Information System

3.1. The Definition of a p-MSVDIS

Below, $*$ denotes an unknown information value and $⋄$ denotes an unknown label. Let $(T, A, V, f, d)$ be an IDIS. For all $a \in A$, denote

V_{a}^{*} = V_{a} - \{a(t) : a(t) = *\},

and denote

V_{d}^{⋄} = V_{d} - \{d(t) : d(t) = ⋄\}.

Thus, $V_{a}^{*}$ consists of all known information values of $a$, and $V_{d}^{⋄}$ consists of all known labels.

Definition 7.

Let $(T, A, V, f, d)$ be a DIS, and denote

T^{l} = \{t \in T : d(t) \neq ⋄\}, \quad T^{u} = \{t \in T : d(t) = ⋄\}.

Then $T^{l} \cup T^{u} = T$ and $T^{l} \cap T^{u} = \emptyset$.

(1) $(T, A, V, f, d)$ is referred to as an l-MSVDIS if $a(t) = *$ for some $a \in A$ and $t \in T$, and $T^{l} = T$.

(2) $(T, A, V, f, d)$ is referred to as a p-MSVDIS if $a(t) = *$ for some $a \in A$ and $t \in T$, $T^{l} \neq \emptyset$, and $T^{u} \neq \emptyset$.

(3) $(T, A, V, f, d)$ is referred to as a u-MSVDIS if $a(t) = *$ for some $a \in A$ and $t \in T$, and $T^{u} = T$.

Obviously,

V_{d}^{⋄} = \{d(t) : t \in T^{l}\}.

Denote $|T| = n$, $|T^{u}| = n_{u}$, and $|T^{l}| = n_{l}$. Since no sample in a u-MSVDIS $(T, A, V, f, d)$ has a label, a u-MSVDIS can be regarded as $(T, A, V, f)$.

Definition 8.

$(T^{l}, A, V, f, d)$ and $(T^{u}, A, V, f, d)$ are called the l-MSVDIS and the u-MSVDIS induced by the p-MSVDIS $(T, A, V, f, d)$, respectively. The p-MSVDIS $(T, A, V, f, d)$ can be viewed as the result of fusing the information in $(T^{l}, A, V, f, d)$ and $(T^{u}, A, V, f, d)$. The ratio

λ = \frac{|T^{u}|}{|T|} = \frac{n_{u}}{n}

is referred to as the incomplete rate of labels.
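A minimal sketch of the split $T = T^l \cup T^u$ and of the incomplete rate $λ$. The labels `c1`, `c2`, `c3` are hypothetical placeholders, chosen only so that $t_3$ and $t_9$ are unlabeled as in Example 3:

```python
from fractions import Fraction

UNLABELED = "⋄"  # marker for an unknown label, as in Section 3.1

def split_by_label(labels):
    """Return (T^l, T^u, lambda) for a dict mapping each sample to its label."""
    t_l = [t for t, d in labels.items() if d != UNLABELED]
    t_u = [t for t, d in labels.items() if d == UNLABELED]
    lam = Fraction(len(t_u), len(labels))  # incomplete rate n_u / n
    return t_l, t_u, lam

labels = {"t1": "c1", "t2": "c1", "t3": UNLABELED, "t4": "c1", "t5": "c2",
          "t6": "c2", "t7": "c1", "t8": "c3", "t9": UNLABELED}
t_l, t_u, lam = split_by_label(labels)
print(t_u, lam)  # ['t3', 't9'] 2/9
```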

Example 3.

Table 3 is a p-MSVDIS $(T, A, V, f, d)$, where

V_{a_{1}}^{*} = \{\text{Sick}, \text{Middle}, \text{No}\}, \quad V_{a_{2}}^{*} = \{\text{True}, \text{False}\}, \quad V_{a_{3}}^{*} = \{\text{High}, \text{Normal}, \text{Low}\};

V_{d}^{⋄} = \{\text{Flu}, \text{Rhinitis}, \text{Health}\} \neq V_{d};

T^{l} = \{t_{1}, t_{2}, t_{4}, t_{5}, t_{6}, t_{7}, t_{8}\}, \quad T^{u} = \{t_{3}, t_{9}\}.

Table 3. A p-MSVDIS $(T, A, V, f, d)$.

3.2. A Novel Distance Function in a p-MSVDIS

To discriminate effectively between samples in a p-MSVDIS, a novel distance function is introduced.

Definition 9.

For a p-MSVDIS $(T, A, V, f, d)$, let $a \in A$ and $t, t' \in T^{l}$. Then the distance between $a(t)$ and $a(t')$ is defined as

ρ(a(t), a(t')) = \begin{cases} 0, & t = t'; \\ 0, & t \neq t', \ a(t) = * \text{ or } a(t') = *, \ d(t) = d(t'); \\ 1 - \frac{1}{|V_{a}^{*}|^{2}}, & t \neq t', \ a(t) = *, \ a(t') = *, \ d(t) \neq d(t'); \\ 1 - \frac{1}{|V_{a}^{*}|}, & t \neq t', \ a(t) \neq *, \ a(t') = *, \ d(t) \neq d(t'); \\ 1 - \frac{1}{|V_{a}^{*}|}, & t \neq t', \ a(t) = *, \ a(t') \neq *, \ d(t) \neq d(t'); \\ 0, & t \neq t', \ a(t) \neq *, \ a(t') \neq *, \ a(t) = a(t'), \ d(t) = d(t'); \\ 0, & t \neq t', \ a(t) \neq *, \ a(t') \neq *, \ a(t) = a(t'), \ d(t) \neq d(t'); \\ HD(P_{a(t)}, P_{a(t')}), & t \neq t', \ a(t) \neq *, \ a(t') \neq *, \ a(t) \neq a(t'), \ d(t) = d(t'); \\ HD(P_{a(t)}, P_{a(t')}), & t \neq t', \ a(t) \neq *, \ a(t') \neq *, \ a(t) \neq a(t'), \ d(t) \neq d(t'), \end{cases}

where $P_{a(t)}$ denotes the RPDS induced by the multiset $a(t)$ as in Definition 4.
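The nine cases of Definition 9 can be condensed into the following sketch for one attribute. It assumes (as conventions of this illustration only) that a missing value is encoded as `None` and that $V_a^*$ is passed as an ordered list `domain`. For two known values, $P_{a(t)}$ and $P_{a(t')}$ are one-point distributions, so the Hellinger distance is taken between one-hot vectors; equal values then give 0, and distinct values give 1:

```python
import math

def _hellinger(s, t):
    # HD(S, T) = (1 / sqrt(2)) * sqrt(sum_i (sqrt(s_i) - sqrt(t_i))^2)
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(s, t))
    return math.sqrt(total) / math.sqrt(2)

def rho(x, y, dx, dy, domain, same_sample=False):
    """Distance rho(a(t), a(t')) of Definition 9 for one attribute a.

    x, y: original values a(t), a(t'); None marks a missing value '*'.
    dx, dy: labels d(t), d(t') of the two labeled samples.
    domain: ordered list V_a^* of the known values of a.
    """
    if same_sample:                       # case t = t'
        return 0.0
    n = len(domain)
    if (x is None or y is None) and dx == dy:
        return 0.0                        # a missing value, same label
    if x is None and y is None:
        return 1 - 1 / n ** 2             # both missing, different labels
    if x is None or y is None:
        return 1 - 1 / n                  # exactly one missing, different labels

    def one_hot(v):
        return [1.0 if u == v else 0.0 for u in domain]

    # Both known: Hellinger distance between the one-point RPDSs.
    return _hellinger(one_hot(x), one_hot(y))

domain = ["High", "Normal", "Low"]
print(rho(None, None, "c1", "c2", domain))
print(rho("High", None, "c1", "c2", domain))
print(rho("High", "Low", "c1", "c1", domain))
```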

Definition 10.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$, and denote

R_{P}^{l, δ} = \{(t, t') \in T^{l} \times T^{l} : \forall a \in P, \ ρ(a(t), a(t')) \leq δ\},

[t]_{P}^{l, δ} = \{t' \in T^{l} : (t, t') \in R_{P}^{l, δ}\};

R_{d}^{l} = \{(t, t') \in T^{l} \times T^{l} : d(t) = d(t')\},

[t]_{d}^{l} = \{t' \in T^{l} : (t, t') \in R_{d}^{l}\};

\underline{R}_{P}^{l, δ}(X) = \{t \in T^{l} : [t]_{P}^{l, δ} \subseteq X\}, \quad X \subseteq T^{l};

T^{l}/d = \{[t]_{d}^{l} : t \in T^{l}\} = \{D_{1}, \dots, D_{r}\};

POS_{P}^{l, δ}(d) = \bigcup_{i=1}^{r} \underline{R}_{P}^{l, δ}(D_{i}).
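Given $R_P^{l,δ}$ as a set of pairs, the constructions of Definition 10 can be sketched as below; the function names are illustrative. The usage plugs in the relation $R_{\{a_1\}}^{l,δ}$ and the decision classes read off $R_d^l$ as listed in Example 4:

```python
def tolerance_classes(samples, relation):
    """[t]_P^{l,delta} = {t' : (t, t') in R_P^{l,delta}} for every labeled t."""
    return {t: {u for u in samples if (t, u) in relation} for t in samples}

def lower_approximation(classes, X):
    """Samples whose whole tolerance class is contained in X."""
    return {t for t, cls in classes.items() if cls <= X}

def positive_region(classes, partition):
    """POS_P^{l,delta}(d): union of the lower approximations of D_1, ..., D_r."""
    pos = set()
    for D in partition:
        pos |= lower_approximation(classes, D)
    return pos

# R_{{a1}}^{l,delta} and T^l / d as listed in Example 4.
T_l = ["t1", "t2", "t4", "t5", "t6", "t7", "t8"]
R_a1 = {("t1", "t1"), ("t1", "t2"), ("t2", "t1"), ("t2", "t2"), ("t4", "t4"),
        ("t4", "t7"), ("t4", "t8"), ("t5", "t5"), ("t6", "t6"), ("t7", "t4"),
        ("t7", "t7"), ("t7", "t8"), ("t8", "t4"), ("t8", "t7"), ("t8", "t8")}
partition = [{"t1", "t2", "t4", "t7"}, {"t5", "t6"}, {"t8"}]
classes = tolerance_classes(T_l, R_a1)
print(sorted(positive_region(classes, partition)))  # ['t1', 't2', 't5', 't6']
```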

Definition 11.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$, and denote

dis_{d}^{l, δ}(P) = \{(t, t') \in T^{l} \times T^{l} : \exists a \in P, \ ρ(a(t), a(t')) > δ \wedge d(t) \neq d(t')\}.

Then $dis_{d}^{l, δ}(P)$ is referred to as the relative discernibility relation of $P$ with respect to $d$ on $T^{l}$.

Definition 12.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$. Then

ind_{δ}^{u}(P) = \{(t, t') \in T^{u} \times T^{u} : \forall a \in P, \ ρ(a(t), a(t')) \leq δ\}

is referred to as the indiscernibility relation of $P$ on $T^{u}$.
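Definitions 11 and 12 can be sketched with per-attribute distance functions `dist[a](t, t2)` standing in for $ρ$. The toy distances below are hypothetical (not from the paper), and for brevity the same three samples are used for both relations, treating them as labeled for $dis$ and as unlabeled for $ind$:

```python
def relative_discernibility(samples, attrs, dist, label, delta):
    """dis_d^{l,delta}(P): some a in P separates t, t' by more than delta,
    and the two labels differ."""
    return {(t, u) for t in samples for u in samples
            if label[t] != label[u]
            and any(dist[a](t, u) > delta for a in attrs)}

def indiscernibility_u(samples, attrs, dist, delta):
    """ind_delta^u(P): every a in P keeps t, t' within distance delta."""
    return {(t, u) for t in samples for u in samples
            if all(dist[a](t, u) <= delta for a in attrs)}

# Hypothetical pairwise distances for a single attribute "a".
table = {frozenset(("s1", "s2")): 0.3, frozenset(("s1", "s3")): 0.9,
         frozenset(("s2", "s3")): 0.8}
dist = {"a": lambda t, u: 0.0 if t == u else table[frozenset((t, u))]}
samples = ["s1", "s2", "s3"]

ind = indiscernibility_u(samples, ["a"], dist, 0.5)
dis = relative_discernibility(samples, ["a"], dist,
                              {"s1": "c1", "s2": "c2", "s3": "c1"}, 0.5)
print(len(ind), sorted(dis))  # 5 [('s2', 's3'), ('s3', 's2')]
```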

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$. Following Kryszkiewicz's idea [38], the map $\partial_{P}^{l, δ}: T^{l} \to 2^{V_{d}^{⋄}}$ is defined by

\partial_{P}^{l, δ}(t) = d([t]_{P}^{l, δ}) = \{d(t') : t' \in [t]_{P}^{l, δ}\}.

Then $\partial_{P}^{l, δ}$ is referred to as the generalized decision in $(T^{l}, P, d)$.

Definition 13.

For a p-MSVDIS

(T, A, V, f, d)

, if ∀

t \in T^{l}

,

| \partial_{A}^{l, δ} (t) | = 1

, then

(T, A, V, f, d)

is referred to as δ-consistent; otherwise,

(T, A, V, f, d)

is referred to as δ-inconsistent.
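A sketch of the generalized decision and of the singleton test of Definition 13, starting from precomputed tolerance classes. Called with the classes of $A$, `all_singletons` checks δ-consistency; called with the classes of a subset $P$, it checks the condition of Proposition 1. The usage takes the classes of $\{a_1\}$ from Example 4, and the labels `c1`, `c2`, `c3` are hypothetical placeholders for the decision classes $\{t_1, t_2, t_4, t_7\}$, $\{t_5, t_6\}$, $\{t_8\}$:

```python
def generalized_decision(classes, label):
    """partial_P^{l,delta}(t) = {d(t') : t' in [t]_P^{l,delta}}."""
    return {t: {label[u] for u in cls} for t, cls in classes.items()}

def all_singletons(classes, label):
    """True iff every generalized decision has exactly one element."""
    return all(len(v) == 1 for v in generalized_decision(classes, label).values())

classes = {"t1": {"t1", "t2"}, "t2": {"t1", "t2"}, "t4": {"t4", "t7", "t8"},
           "t5": {"t5"}, "t6": {"t6"}, "t7": {"t4", "t7", "t8"},
           "t8": {"t4", "t7", "t8"}}
label = {"t1": "c1", "t2": "c1", "t4": "c1", "t7": "c1",
         "t5": "c2", "t6": "c2", "t8": "c3"}
print(sorted(generalized_decision(classes, label)["t4"]))  # ['c1', 'c3']
print(all_singletons(classes, label))  # False
```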

Proposition 1.

Let $(T, A, V, f, d)$ be a p-MSVDIS and $P \subseteq A$. Then

R_{P}^{l, δ} \subseteq R_{d}^{l} \Leftrightarrow \forall t \in T^{l}, \ |\partial_{P}^{l, δ}(t)| = 1.

Proof.

“⇒”: Let $R_{P}^{l, δ} \subseteq R_{d}^{l}$. Then $[t]_{P}^{l, δ} \subseteq [t]_{d}^{l}$ for all $t \in T^{l}$. Since $ρ(a(t), a(t)) = 0$ for every $a$ (Definition 9), we have $t \in [t]_{P}^{l, δ}$, so $d(t) \in \partial_{P}^{l, δ}(t)$. Now suppose $v \in \partial_{P}^{l, δ}(t)$. Then there exists $t' \in [t]_{P}^{l, δ}$ such that $v = d(t')$. Since $t' \in [t]_{P}^{l, δ}$ implies $t' \in [t]_{d}^{l}$, we have $v = d(t') = d(t)$. Thus, $\partial_{P}^{l, δ}(t) = \{d(t)\}$, i.e., $|\partial_{P}^{l, δ}(t)| = 1$.

“⇐”: Suppose $|\partial_{P}^{l, δ}(t)| = 1$ for all $t \in T^{l}$. Let $t \in T^{l}$ and $t' \in [t]_{P}^{l, δ}$. Then $d(t') \in \partial_{P}^{l, δ}(t)$. Since $d(t) \in \partial_{P}^{l, δ}(t)$ (as $t \in [t]_{P}^{l, δ}$) and $|\partial_{P}^{l, δ}(t)| = 1$, we obtain $d(t) = d(t')$, so $t' \in [t]_{d}^{l}$. Thus, $[t]_{P}^{l, δ} \subseteq [t]_{d}^{l}$ for every $t \in T^{l}$, which shows that $R_{P}^{l, δ} \subseteq R_{d}^{l}$. □

Proposition 2.

A p-MSVDIS $(T, A, V, f, d)$ is δ-consistent if and only if

R_{A}^{l, δ} \subseteq R_{d}^{l}.

Proof.

This follows directly from Definition 13 and Proposition 1 with $P = A$. □

4. Importance in a p-MSVDIS

In this section, two types of importance in a p-MSVDIS are defined.

4.1. Type 1 Importance in a p-MSVDIS

Definition 14.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$, and define

Γ_{P}^{l, δ}(d) = \frac{|POS_{P}^{l, δ}(d)|}{n_{l}}.

Then $Γ_{P}^{l, δ}(d)$ is referred to as the dependence degree of $d$ on $P$ in $T^{l}$.
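The dependence degree of Definition 14 can be sketched directly from tolerance classes: a labeled sample lies in $POS_P^{l,δ}(d)$ exactly when its class fits inside one decision class. The usage reproduces $Γ_{\{a_1\}}^{l,δ}(d) = 4/7$ from the classes of $\{a_1\}$ listed in Example 4; the function name is illustrative:

```python
from fractions import Fraction

def dependence(classes, partition):
    """Gamma_P^{l,delta}(d) = |POS_P^{l,delta}(d)| / n_l (exact fraction)."""
    pos = [t for t, cls in classes.items() if any(cls <= D for D in partition)]
    return Fraction(len(pos), len(classes))

classes = {"t1": {"t1", "t2"}, "t2": {"t1", "t2"}, "t4": {"t4", "t7", "t8"},
           "t5": {"t5"}, "t6": {"t6"}, "t7": {"t4", "t7", "t8"},
           "t8": {"t4", "t7", "t8"}}
partition = [{"t1", "t2", "t4", "t7"}, {"t5", "t6"}, {"t8"}]
print(dependence(classes, partition))  # 4/7
```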

Proposition 3.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$, and denote

T^{l}/R_{d}^{l} = \{D_{1}, \dots, D_{r}\}.

Then the following hold.

(1) $Γ_{P}^{l, δ}(d) = \frac{\sum_{i=1}^{r} |\underline{R}_{P}^{l, δ}(D_{i})|}{n_{l}}$.

(2) $0 \leq Γ_{P}^{l, δ}(d) \leq 1$.

(3) If $P \subseteq Q \subseteq A$, then $Γ_{P}^{l, δ}(d) \leq Γ_{Q}^{l, δ}(d)$.

Proof.

(1) Obviously, $\underline{R}_{P}^{l, δ}(D_{i}) \subseteq D_{i}$ for all $i$. Since $\{D_{1}, \dots, D_{r}\}$ is a partition of $T^{l}$, the sets $\underline{R}_{P}^{l, δ}(D_{i})$ are pairwise disjoint, and hence

|POS_{P}^{l, δ}(d)| = \left|\bigcup_{i=1}^{r} \underline{R}_{P}^{l, δ}(D_{i})\right| = \sum_{i=1}^{r} |\underline{R}_{P}^{l, δ}(D_{i})|.

Thus,

Γ_{P}^{l, δ}(d) = \frac{\sum_{i=1}^{r} |\underline{R}_{P}^{l, δ}(D_{i})|}{n_{l}}.

(2) By (1), $0 \leq \sum_{i=1}^{r} |\underline{R}_{P}^{l, δ}(D_{i})| \leq \sum_{i=1}^{r} |D_{i}| = n_{l}$, so $0 \leq Γ_{P}^{l, δ}(d) \leq 1$.

(3) Suppose $P \subseteq Q \subseteq A$. Since the condition "for all $a \in Q$" in Definition 10 is stronger than "for all $a \in P$", we have $[t]_{Q}^{l, δ} \subseteq [t]_{P}^{l, δ}$ for all $t \in T^{l}$. So,

\forall i, \ \underline{R}_{P}^{l, δ}(D_{i}) \subseteq \underline{R}_{Q}^{l, δ}(D_{i}).

This implies that

\forall i, \ |\underline{R}_{P}^{l, δ}(D_{i})| \leq |\underline{R}_{Q}^{l, δ}(D_{i})|.

By (1),

Γ_{P}^{l, δ}(d) \leq Γ_{Q}^{l, δ}(d).

□

Definition 15.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$. The type 1 importance of $P$ is then defined as

I M_{λ, δ}^{(1)} (P) = (1 - λ) \frac{Γ_{P}^{l, δ} (d)}{Γ_{A}^{l, δ} (d)} + λ \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} .

Example 4.

On the basis of the p-MSVDIS $(T, A, V, f, d)$ in Table 3, the incomplete rate of labels is $λ = \frac{2}{9}$ and $n_{l} = 7$. Let $δ = 0.5$; then

R_{\{a_{1}\}}^{l, δ} = \{(t_{1}, t_{1}), (t_{1}, t_{2}), (t_{2}, t_{1}), (t_{2}, t_{2}), (t_{4}, t_{4}), (t_{4}, t_{7}), (t_{4}, t_{8}), (t_{5}, t_{5}), (t_{6}, t_{6}), (t_{7}, t_{4}), (t_{7}, t_{7}), (t_{7}, t_{8}), (t_{8}, t_{4}), (t_{8}, t_{7}), (t_{8}, t_{8})\};

R_{\{a_{2}\}}^{l, δ} = \{(t_{1}, t_{1}), (t_{1}, t_{2}), (t_{1}, t_{4}), (t_{1}, t_{5}), (t_{2}, t_{1}), (t_{2}, t_{2}), (t_{2}, t_{4}), (t_{2}, t_{5}), (t_{4}, t_{1}), (t_{4}, t_{2}), (t_{4}, t_{4}), (t_{4}, t_{5}), (t_{5}, t_{1}), (t_{5}, t_{2}), (t_{5}, t_{4}), (t_{5}, t_{5}), (t_{6}, t_{6}), (t_{6}, t_{7}), (t_{6}, t_{8}), (t_{7}, t_{6}), (t_{7}, t_{7}), (t_{7}, t_{8}), (t_{8}, t_{6}), (t_{8}, t_{7}), (t_{8}, t_{8})\};

R_{\{a_{3}\}}^{l, δ} = \{(t_{1}, t_{1}), (t_{2}, t_{2}), (t_{2}, t_{7}), (t_{4}, t_{4}), (t_{4}, t_{5}), (t_{5}, t_{4}), (t_{5}, t_{5}), (t_{6}, t_{6}), (t_{7}, t_{2}), (t_{7}, t_{7}), (t_{8}, t_{8})\};

R_{d}^{l} = \{(t_{1}, t_{1}), (t_{1}, t_{2}), (t_{1}, t_{4}), (t_{1}, t_{7}), (t_{2}, t_{1}), (t_{2}, t_{2}), (t_{2}, t_{4}), (t_{2}, t_{7}), (t_{4}, t_{1}), (t_{4}, t_{2}), (t_{4}, t_{4}), (t_{4}, t_{7}), (t_{5}, t_{5}), (t_{5}, t_{6}), (t_{6}, t_{5}), (t_{6}, t_{6}), (t_{7}, t_{1}), (t_{7}, t_{2}), (t_{7}, t_{4}), (t_{7}, t_{7}), (t_{8}, t_{8})\};

ind_{δ}^{u}(\{a_{1}\}) = \{(t_{3}, t_{3}), (t_{9}, t_{9})\};

ind_{δ}^{u}(\{a_{2}\}) = \{(t_{3}, t_{3}), (t_{9}, t_{9})\};

ind_{δ}^{u}(\{a_{3}\}) = \{(t_{3}, t_{3}), (t_{9}, t_{9})\}.

Thus, by Proposition 3(1),

Γ_{\{a_{1}\}}^{l, δ}(d) = \frac{2 + 2 + 0}{7} \approx 0.5714.

Similarly, $Γ_{\{a_{2}\}}^{l, δ}(d) \approx 0.1429$, $Γ_{\{a_{3}\}}^{l, δ}(d) \approx 0.4286$, and $Γ_{A}^{l, δ}(d) = 1$. Since $|ind_{δ}^{u}(\{a_{1}\})| = |ind_{δ}^{u}(\{a_{2}\})| = |ind_{δ}^{u}(\{a_{3}\})| = 2$ and $|ind_{δ}^{u}(A)| = 2$, we have

IM_{λ, δ}^{(1)}(\{a_{1}\}) = \left(1 - \frac{2}{9}\right) \times \frac{0.5714}{1} + \frac{2}{9} \times \frac{2}{2} \approx 0.6667.

Similarly, $IM_{λ, δ}^{(1)}(\{a_{2}\}) \approx 0.3333$ and $IM_{λ, δ}^{(1)}(\{a_{3}\}) \approx 0.5556$.
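The weighted sum of Definition 15 can be reproduced with exact fractions. This sketch takes the dependence values as reported in Example 4, reading $0.5714$, $0.1429$, and $0.4286$ as $4/7$, $1/7$, and $3/7$, together with $Γ_A^{l,δ}(d) = 1$ and all $|ind_δ^u(\cdot)| = 2$; it checks the arithmetic of the weighted sum only, not the underlying relations. The function name is illustrative:

```python
from fractions import Fraction

def importance_type1(lam, gamma_p, gamma_a, ind_a, ind_p):
    """IM^(1) = (1 - lam) * Gamma_P / Gamma_A + lam * |ind(A)| / |ind(P)|."""
    return ((1 - lam) * Fraction(gamma_p) / Fraction(gamma_a)
            + lam * Fraction(ind_a, ind_p))

lam = Fraction(2, 9)
reported = {"a1": Fraction(4, 7), "a2": Fraction(1, 7), "a3": Fraction(3, 7)}
for name, gamma in reported.items():
    im = importance_type1(lam, gamma, 1, 2, 2)
    print(name, im, round(float(im), 4))
# a1 2/3 0.6667
# a2 1/3 0.3333
# a3 5/9 0.5556
```

Since the labeled term is scaled by $1 - λ$ and the unlabeled term by $λ$, the unlabeled part contributes the same $2/9$ to every singleton here.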

Proposition 4.

For a p-MSVDIS $(T, A, V, f, d)$ and $P \subseteq A$, we have the following:

(1) $0 \leq IM_{λ, δ}^{(1)}(P) \leq 1$;

(2) $IM_{λ, δ}^{(1)}(A) = 1$;

(3) $P \subseteq Q \subseteq A$ implies $IM_{λ, δ}^{(1)}(P) \leq IM_{λ, δ}^{(1)}(Q)$;

(4) $IM_{λ, δ}^{(1)}(P) = 1 \Leftrightarrow Γ_{P}^{l, δ}(d) = Γ_{A}^{l, δ}(d)$ and $|ind_{δ}^{u}(P)| = |ind_{δ}^{u}(A)|$.

Proof.

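(1) and (2) are obvious, since $Γ_{P}^{l, δ}(d) \leq Γ_{A}^{l, δ}(d)$ by Proposition 3(3) and $|ind_{δ}^{u}(A)| \leq |ind_{δ}^{u}(P)|$ for $P \subseteq A$.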

(3) Since $P \subseteq Q \subseteq A$, Proposition 3(3) and $ind_{δ}^{u}(Q) \subseteq ind_{δ}^{u}(P)$ give

Γ_{P}^{l, δ}(d) \leq Γ_{Q}^{l, δ}(d), \quad |ind_{δ}^{u}(Q)| \leq |ind_{δ}^{u}(P)|.

Then,

\frac{Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)} \leq \frac{Γ_{Q}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)}, \quad \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|} \leq \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(Q)|}.

Thus,

(1 - λ) \frac{Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)} \leq (1 - λ) \frac{Γ_{Q}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)}, \quad λ \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|} \leq λ \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(Q)|}.

Hence, $IM_{λ, δ}^{(1)}(P) \leq IM_{λ, δ}^{(1)}(Q)$.

(4) “⇐” is clear. Below, we prove “⇒”.

Suppose $IM_{λ, δ}^{(1)}(P) = 1$; then

(1 - λ) \frac{Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)} + λ \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|} = 1 = (1 - λ) + λ.

This implies that

(1 - λ) \left(1 - \frac{Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)}\right) + λ \left(1 - \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|}\right) = 0.

Note that

1 - \frac{Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)} = \frac{Γ_{A}^{l, δ}(d) - Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)} \geq 0, \quad 1 - \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|} = \frac{|ind_{δ}^{u}(P)| - |ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|} \geq 0.

Since $T^{l} \neq \emptyset$ and $T^{u} \neq \emptyset$ in a p-MSVDIS, $0 < λ < 1$, so both nonnegative terms must vanish:

1 - \frac{Γ_{P}^{l, δ}(d)}{Γ_{A}^{l, δ}(d)} = 0, \quad 1 - \frac{|ind_{δ}^{u}(A)|}{|ind_{δ}^{u}(P)|} = 0.

Then,

Γ_{P}^{l, δ}(d) = Γ_{A}^{l, δ}(d) \text{ and } |ind_{δ}^{u}(P)| = |ind_{δ}^{u}(A)|.

□

4.2. Type 2 Importance in a p-MSVDIS

Definition 16.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$, write $T^{l} = \{t_{1}, \dots, t_{n_{l}}\}$ and $T^{l}/R_{d}^{l} = \{D_{1}, \dots, D_{r}\}$, and define

H_{δ}^{l}(d | P) = - \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} \frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{n_{l}} \log_{2} \frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{|[t_{i}]_{P}^{l, δ}|},

with the convention $0 \log_{2} 0 = 0$. Then $H_{δ}^{l}(d | P)$ is referred to as the conditional information entropy of $d$ with respect to $P$ in $T^{l}$.
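A sketch of the conditional entropy of Definition 16, computed from tolerance classes and decision classes with zero terms skipped (the $0 \log_2 0 = 0$ convention). The usage reproduces $H_δ^l(d \mid \{a_1\}) \approx 1.1807$ from Example 5, using the classes of $\{a_1\}$ read off Example 4; the function name is illustrative:

```python
import math

def conditional_entropy(classes, partition):
    """H_delta^l(d | P) of Definition 16 (0 * log2 0 treated as 0)."""
    n_l = len(classes)
    h = 0.0
    for cls in classes.values():          # [t_i]_P^{l,delta}, i = 1..n_l
        for D in partition:               # D_j, j = 1..r
            k = len(cls & D)
            if k:                         # zero terms contribute nothing
                h -= (k / n_l) * math.log2(k / len(cls))
    return h

classes = {"t1": {"t1", "t2"}, "t2": {"t1", "t2"}, "t4": {"t4", "t7", "t8"},
           "t5": {"t5"}, "t6": {"t6"}, "t7": {"t4", "t7", "t8"},
           "t8": {"t4", "t7", "t8"}}
partition = [{"t1", "t2", "t4", "t7"}, {"t5", "t6"}, {"t8"}]
print(round(conditional_entropy(classes, partition), 4))  # 1.1807
```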

Proposition 5.

For a p-MSVDIS $(T, A, V, f, d)$, let $|T^{l}| = n_{l}$. If $P \subseteq Q \subseteq A$, then

H_{δ}^{l}(d | Q) \leq H_{δ}^{l}(d | P).

Proof.

Denote

T^{l}/R_{d}^{l} = \{D_{1}, \dots, D_{r}\};

p_{ij}^{(1)} = |[t_{i}]_{P}^{l, δ} \cap D_{j}|, \quad p_{ij}^{(2)} = |[t_{i}]_{P}^{l, δ} \cap (T^{l} - D_{j})|;

q_{ij}^{(1)} = |[t_{i}]_{Q}^{l, δ} \cap D_{j}|, \quad q_{ij}^{(2)} = |[t_{i}]_{Q}^{l, δ} \cap (T^{l} - D_{j})|.

Then,

\forall i, j, \ |[t_{i}]_{P}^{l, δ}| = p_{ij}^{(1)} + p_{ij}^{(2)}, \quad |[t_{i}]_{Q}^{l, δ}| = q_{ij}^{(1)} + q_{ij}^{(2)}.

Obviously, $[t_{i}]_{Q}^{l, δ} \subseteq [t_{i}]_{P}^{l, δ}$ for all $i$. Then,

\forall i, j, \ 0 \leq q_{ij}^{(1)} \leq p_{ij}^{(1)}, \quad 0 \leq q_{ij}^{(2)} \leq p_{ij}^{(2)}.

Let $f(x, y) = -x \log_{2} \frac{x}{x + y}$ for $x > 0$, $y \geq 0$, and $f(0, y) = 0$. Then $f(x, y)$ is increasing with respect to $x$ and to $y$, respectively. We have

\begin{aligned} H_{δ}^{l}(d | P) &= - \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} \frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{n_{l}} \log_{2} \frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{|[t_{i}]_{P}^{l, δ}|} \\ &= - \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} \frac{p_{ij}^{(1)}}{n_{l}} \log_{2} \frac{p_{ij}^{(1)}}{p_{ij}^{(1)} + p_{ij}^{(2)}} \\ &= \frac{1}{n_{l}} \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} f(p_{ij}^{(1)}, p_{ij}^{(2)}), \end{aligned}

\begin{aligned} H_{δ}^{l}(d | Q) &= - \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} \frac{|[t_{i}]_{Q}^{l, δ} \cap D_{j}|}{n_{l}} \log_{2} \frac{|[t_{i}]_{Q}^{l, δ} \cap D_{j}|}{|[t_{i}]_{Q}^{l, δ}|} \\ &= - \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} \frac{q_{ij}^{(1)}}{n_{l}} \log_{2} \frac{q_{ij}^{(1)}}{q_{ij}^{(1)} + q_{ij}^{(2)}} \\ &= \frac{1}{n_{l}} \sum_{i=1}^{n_{l}} \sum_{j=1}^{r} f(q_{ij}^{(1)}, q_{ij}^{(2)}). \end{aligned}

Since $q_{ij}^{(1)} \leq p_{ij}^{(1)}$ and $q_{ij}^{(2)} \leq p_{ij}^{(2)}$, we have

f(q_{ij}^{(1)}, q_{ij}^{(2)}) \leq f(p_{ij}^{(1)}, q_{ij}^{(2)}) \leq f(p_{ij}^{(1)}, p_{ij}^{(2)}).

Thus,

H_{δ}^{l}(d | Q) \leq H_{δ}^{l}(d | P).

□
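The proof above uses the monotonicity of $f$ without derivation. One way to verify it for $x > 0$, $y \geq 0$ is through the partial derivatives, writing $u = y/x$ and using the elementary inequality $\ln(1 + u) \geq \frac{u}{1 + u}$ for $u \geq 0$:

```latex
f(x, y) = x \log_{2}(x + y) - x \log_{2} x,

\frac{\partial f}{\partial y} = \frac{x}{(x + y)\ln 2} \geq 0,

\frac{\partial f}{\partial x}
  = \log_{2}\frac{x + y}{x} - \frac{y}{(x + y)\ln 2}
  = \frac{1}{\ln 2}\left(\ln(1 + u) - \frac{u}{1 + u}\right) \geq 0,
  \qquad u = \frac{y}{x} \geq 0.
```

Together with $f(0, y) = 0 \leq f(x, y)$, this gives the chain of inequalities used in the proof.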

Proposition 6.

For a p-MSVDIS $(T, A, V, f, d)$ and $P \subseteq A$, we have $H_{δ}^{l}(d | P) \geq 0$.

Proof.

Obviously, $\frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{n_{l}} \geq 0$ for all $i, j$. Since

\forall i, j, \ \frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{|[t_{i}]_{P}^{l, δ}|} \leq 1,

we have

\log_{2} \frac{|[t_{i}]_{P}^{l, δ} \cap D_{j}|}{|[t_{i}]_{P}^{l, δ}|} \leq 0

whenever $[t_{i}]_{P}^{l, δ} \cap D_{j} \neq \emptyset$, while the remaining terms vanish by the convention $0 \log_{2} 0 = 0$. Thus, $H_{δ}^{l}(d | P) \geq 0$. □

Definition 17.

For a p-MSVDIS $(T, A, V, f, d)$, let $P \subseteq A$. The type 2 importance of $P$ is then defined as

I M_{λ, δ}^{(2)} (P) = (1 - λ) \frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)} + λ \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} .
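A sketch of the type 2 importance of Definition 17. The definition divides by $H_δ^l(d \mid P)$, which can be 0; by Propositions 5 and 6 this forces $H_δ^l(d \mid A) = 0$ as well, and this sketch then *assumes* the labeled ratio equals 1 (a convention of the illustration, not stated in the definition). The usage plugs in the Example 5 values $λ = 2/9$, $H_δ^l(d \mid A) = 0$, $H_δ^l(d \mid \{a_1\}) \approx 1.1807$, and $|ind| = 2$:

```python
def importance_type2(lam, h_a, h_p, ind_a, ind_p):
    """IM^(2) = (1 - lam) * H(d|A) / H(d|P) + lam * |ind(A)| / |ind(P)|.

    Assumption of this sketch: if H(d|P) = 0, then H(d|A) = 0 too
    (Propositions 5 and 6), and the labeled ratio is taken to be 1.
    """
    ratio = 1.0 if h_p == 0 else h_a / h_p
    return (1 - lam) * ratio + lam * ind_a / ind_p

lam = 2 / 9
print(round(importance_type2(lam, 0.0, 1.1807, 2, 2), 4))  # 0.2222
print(importance_type2(lam, 0.0, 0.0, 2, 2))  # P = A under the convention
```

With $H_δ^l(d \mid A) = 0$, every proper subset with positive entropy receives only the unlabeled contribution $λ$, which is why all three singletons in Example 5 score $0.2222$.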

Example 5.

On the basis of the p-MSVDIS $(T, A, V, f, d)$ in Table 3 and Example 4, $IM_{λ, δ}^{(2)}(P)$ is calculated as follows (terms with a zero numerator vanish by the convention $0 \log_{2} 0 = 0$):

H_{δ}^{l}(d | \{a_{1}\}) = -\left[\frac{2}{7} \log_{2} \frac{2}{2} + \frac{2}{7} \log_{2} \frac{2}{2} + \frac{2}{7} \log_{2} \frac{2}{3} + \frac{0}{7} \log_{2} \frac{0}{1} + \frac{0}{7} \log_{2} \frac{0}{1} + \frac{2}{7} \log_{2} \frac{2}{3} + \frac{2}{7} \log_{2} \frac{2}{3} + \frac{0}{7} \log_{2} \frac{0}{2} + \frac{0}{7} \log_{2} \frac{0}{2} + \frac{0}{7} \log_{2} \frac{0}{3} + \frac{1}{7} \log_{2} \frac{1}{1} + \frac{1}{7} \log_{2} \frac{1}{1} + \frac{0}{7} \log_{2} \frac{0}{3} + \frac{0}{7} \log_{2} \frac{0}{3} + \frac{0}{7} \log_{2} \frac{0}{2} + \frac{0}{7} \log_{2} \frac{0}{2} + \frac{1}{7} \log_{2} \frac{1}{3} + \frac{0}{7} \log_{2} \frac{0}{1} + \frac{0}{7} \log_{2} \frac{0}{1} + \frac{1}{7} \log_{2} \frac{1}{3} + \frac{1}{7} \log_{2} \frac{1}{3}\right] \approx 1.1807.

Similarly, $H_{δ}^{l}(d | \{a_{2}\}) \approx 3.8922$, $H_{δ}^{l}(d | \{a_{3}\}) \approx 0.5714$, and $H_{δ}^{l}(d | A) = 0$.

Since

| i n d_{δ}^{u} ({a_{1}}) | = | i n d_{δ}^{u} ({a_{2}}) | = | i n d_{δ}^{u} ({a_{3}}) | = 2

and

| i n d_{δ}^{u} (A) | = 2

, hence

I M_{λ, δ}^{(2)} ({a_{1}}) = (1 - \frac{2}{9}) * \frac{0}{1.1807)} + \frac{2}{9} * \frac{2}{2} \approx 0.2222

.

Similarly,

I M_{λ, δ}^{(2)} ({a_{2}}) \approx 0.2222

,

I M_{λ, δ}^{(2)} ({a_{3}}) \approx 0.2222

.
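The arithmetic of Example 5 can be re-checked in a few lines. The (intersection size, class size) pairs below are transcribed from the 21-term expansion of H_{δ}^{l} (d | \{a_{1}\}), and λ = 2/9 is the value used in the example; nothing else from Table 3 is assumed.

```python
import math

# (|[t_i] ∩ D_j|, |[t_i]|) pairs from the expansion in Example 5, listed for
# decision class D_1, then D_2, then D_3, each over the n_l = 7 labeled objects.
PAIRS = [(2, 2), (2, 2), (2, 3), (0, 1), (0, 1), (2, 3), (2, 3),
         (0, 2), (0, 2), (0, 3), (1, 1), (1, 1), (0, 3), (0, 3),
         (0, 2), (0, 2), (1, 3), (0, 1), (0, 1), (1, 3), (1, 3)]
N_L = 7


def entropy_from_pairs(pairs, n_l):
    """-sum (inter / n_l) * log2(inter / size), skipping empty intersections (0 log 0 = 0)."""
    return -sum((k / n_l) * math.log2(k / m) for k, m in pairs if k > 0)


h_a1 = entropy_from_pairs(PAIRS, N_L)
lam = 2 / 9  # value of lambda used in Example 5
# H(d | A) = 0 and |ind^u(A)| = |ind^u({a1})| = 2, as stated in the example.
im_a1 = (1 - lam) * (0.0 / h_a1) + lam * (2 / 2)
print(round(h_a1, 4), round(im_a1, 4))  # 1.1807 0.2222
```

A useful side check is that, for every object, the three intersection sizes add up to the class size, which confirms the transcription.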

Proposition 7.

For a p-MSVDIS (T, A, V, f, d) and P \subseteq A, we have the following:

(1) 0 \leq I M_{λ, δ}^{(2)} (P) \leq 1;

(2) I M_{λ, δ}^{(2)} (A) = 1;

(3) P \subseteq Q \subseteq A implies I M_{λ, δ}^{(2)} (P) \leq I M_{λ, δ}^{(2)} (Q);

(4) I M_{λ, δ}^{(2)} (P) = 1 ⇔ H_{δ}^{l} (d | P) = H_{δ}^{l} (d | A) and | i n d_{δ}^{u} (P) | = | i n d_{δ}^{u} (A) |.

Proof.

(1) and (2) follow directly from Definition 17. By the monotonicity of H_{δ}^{l} proved above and Proposition 6, 0 \leq H_{δ}^{l} (d | A) \leq H_{δ}^{l} (d | P), and | i n d_{δ}^{u} (A) | \leq | i n d_{δ}^{u} (P) | because i n d_{δ}^{u} (A) \subseteq i n d_{δ}^{u} (P). Hence both ratios lie in [0, 1], and both equal 1 when P = A.

(3) Since P \subseteq Q \subseteq A, the monotonicity of H_{δ}^{l} and i n d_{δ}^{u} (Q) \subseteq i n d_{δ}^{u} (P) give

H_{δ}^{l} (d | Q) \leq H_{δ}^{l} (d | P), | i n d_{δ}^{u} (Q) | \leq | i n d_{δ}^{u} (P) | .

Since H_{δ}^{l} (d | A) \geq 0, it follows that

\frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)} \leq \frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | Q)}, \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} \leq \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (Q) |} .

Multiplying these inequalities by 1 - λ \geq 0 and λ \geq 0, respectively, and adding them, we obtain

I M_{λ, δ}^{(2)} (P) \leq I M_{λ, δ}^{(2)} (Q) .

(4) "⇐" is clear. Below, we prove "⇒". Suppose I M_{λ, δ}^{(2)} (P) = 1. Then,

(1 - λ) \frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)} + λ \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} = 1 = (1 - λ) + λ .

This implies that

(1 - λ) (1 - \frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)}) + λ (1 - \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |}) = 0 .

Note that

1 - \frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)} = \frac{H_{δ}^{l} (d | P) - H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)} \geq 0, 1 - \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} = \frac{| i n d_{δ}^{u} (P) | - | i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} \geq 0 .

Since 0 < λ < 1, both nonnegative terms must vanish:

1 - \frac{H_{δ}^{l} (d | A)}{H_{δ}^{l} (d | P)} = 0, 1 - \frac{| i n d_{δ}^{u} (A) |}{| i n d_{δ}^{u} (P) |} = 0 .

Therefore,

H_{δ}^{l} (d | A) = H_{δ}^{l} (d | P), | i n d_{δ}^{u} (P) | = | i n d_{δ}^{u} (A) | .

□

5. Semi-Supervised Attribute Selection in a p-MSVDIS

In this section, semi-supervised attribute selection in a p-MSVDIS is explored.

5.1. The Definition of Semi-Supervised Attribute Selection in a p-MSVDIS

Definition 18.

For a p-MSVDIS (T, A, V, f, d), let λ = \frac{| T^{u} |}{| T |} and P \subseteq A. Then, P is called a coordinate subset of A with respect to d in (T, A, V, f, d) if

P O S_{P}^{l, δ} (d) = P O S_{A}^{l, δ} (d) and i n d_{δ}^{u} (P) = i n d_{δ}^{u} (A) .

The family of all coordinate subsets of A with respect to d in (T, A, V, f, d) is denoted by c o_{λ, δ}^{p} (A).

Definition 19.

For a p-MSVDIS (T, A, V, f, d), let λ = \frac{| T^{u} |}{| T |} and P \subseteq A. P is called a reduct of A with respect to d in (T, A, V, f, d) if

P \in c o_{λ, δ}^{p} (A) and \forall a \in P, P - \{a\} \notin c o_{λ, δ}^{p} (A) .

The family of all reducts of A with respect to d in (T, A, V, f, d) is denoted by r e d_{λ, δ}^{p} (A).
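Definition 19 suggests a direct, exponential-time way to list reducts once a membership test for c o_{λ, δ}^{p} (A) is available. In the sketch below, `is_coordinate` is a hypothetical stand-in for that test (checking P O S_{P}^{l, δ} (d) = P O S_{A}^{l, δ} (d) and i n d_{δ}^{u} (P) = i n d_{δ}^{u} (A)); the toy predicate is invented for illustration and is not derived from any table in this article.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def reducts(attrs: Iterable[str],
            is_coordinate: Callable[[FrozenSet[str]], bool]) -> List[FrozenSet[str]]:
    """All P with is_coordinate(P) true and is_coordinate(P - {a}) false for every a in P."""
    attrs = sorted(attrs)
    found: List[FrozenSet[str]] = []
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            P = frozenset(combo)
            if is_coordinate(P) and all(not is_coordinate(P - {a}) for a in P):
                found.append(P)
    return found


def toy(P: FrozenSet[str]) -> bool:
    """Hypothetical test: P is 'coordinate' if it contains a1, or both a2 and a3."""
    return "a1" in P or {"a2", "a3"} <= P


print(sorted(sorted(p) for p in reducts(["a1", "a2", "a3"], toy)))  # [['a1'], ['a2', 'a3']]
```

Enumerating all 2^|A| subsets is only practical for small |A|, which is why the algorithms in Section 5.2 rely on the importance measures instead.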

Theorem 1.

For a p-MSVDIS (T, A, V, f, d), let λ = \frac{| T^{u} |}{| T |} and P \subseteq A. The following statements are equivalent:

(1) P \in c o_{λ, δ}^{p} (A);

(2) I M_{λ, δ}^{(1)} (P) = 1.

Proof.

The proof is obvious. □

Corollary 1.

For a p-MSVDIS (T, A, V, f, d), let λ = \frac{| T^{u} |}{| T |} and P \subseteq A. The following statements are equivalent:

(1) P \in r e d_{λ, δ}^{p} (A);

(2) I M_{λ, δ}^{(1)} (P) = 1 and \forall a \in P, I M_{λ, δ}^{(1)} (P - \{a\}) < 1.

Proof.

This follows from Theorem 1 and Definition 19. □

Lemma 1.

For a p-MSVDIS (T, A, V, f, d), let P \subseteq A. If R_{P}^{l, δ} \subseteq R_{d}^{l}, then for all t \in T^{l} and all j,

{[t]}_{P}^{l, δ} \cap D_{j} = \begin{cases} {[t]}_{P}^{l, δ}, & t \in D_{j} \\ \emptyset, & t \notin D_{j} \end{cases} .

Proof.

If t \in D_{j}, then D_{j} = {[t]}_{d}^{l}. Since R_{P}^{l, δ} \subseteq R_{d}^{l}, we have {[t]}_{P}^{l, δ} \subseteq {[t]}_{d}^{l}. Thus, {[t]}_{P}^{l, δ} \cap D_{j} = {[t]}_{P}^{l, δ}.

If t \notin D_{j}, then {[t]}_{d}^{l} \cap D_{j} = \emptyset, because {[t]}_{d}^{l} and D_{j} are distinct blocks of the partition \{D_{1}, \dots, D_{r}\} of T^{l}. Since R_{P}^{l, δ} \subseteq R_{d}^{l}, we have {[t]}_{P}^{l, δ} \subseteq {[t]}_{d}^{l}. Thus, {[t]}_{P}^{l, δ} \cap D_{j} = \emptyset. □

Lemma 2.

For a p-MSVDIS (T, A, V, f, d), let P \subseteq A. If R_{P}^{l, δ} \subseteq R_{d}^{l}, then for all t \in T^{l},

\sum_{j = 1}^{r} \frac{| {[t]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} {log}_{2} \frac{| {[t]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} = \frac{| {[t]}_{P}^{l, δ} |}{n_{l}} {log}_{2} \frac{| {[t]}_{P}^{l, δ} |}{n_{l}} .

Proof.

Since \{D_{1}, \dots, D_{r}\} is a partition of T^{l}, there is a unique j_{0} with t \in D_{j_{0}}. Since R_{P}^{l, δ} \subseteq R_{d}^{l}, Lemma 1 gives

{[t]}_{P}^{l, δ} \cap D_{j} = \begin{cases} {[t]}_{P}^{l, δ}, & j = j_{0} \\ \emptyset, & j \neq j_{0} \end{cases} .

Thus, with the convention 0 {log}_{2} 0 = 0, only the j_{0} term of the sum is nonzero, and

\sum_{j = 1}^{r} \frac{| {[t]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} {log}_{2} \frac{| {[t]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} = \frac{| {[t]}_{P}^{l, δ} |}{n_{l}} {log}_{2} \frac{| {[t]}_{P}^{l, δ} |}{n_{l}} .

□

Proposition 8.

For a p-MSVDIS (T, A, V, f, d), let P \subseteq A. The following statements are equivalent:

(1) R_{P}^{l, δ} \subseteq R_{d}^{l};

(2) H_{δ}^{l} (d | P) = 0.

Proof.

(1) ⇒ (2). By Lemma 1 (see also Lemma 2), for each t_{i} \in T^{l} the only nonempty intersection is {[t_{i}]}_{P}^{l, δ} \cap D_{j_{0}} = {[t_{i}]}_{P}^{l, δ}, where t_{i} \in D_{j_{0}}, and its logarithm is {log}_{2} \frac{| {[t_{i}]}_{P}^{l, δ} |}{| {[t_{i}]}_{P}^{l, δ} |} = 0. Hence every term of H_{δ}^{l} (d | P) vanishes, and H_{δ}^{l} (d | P) = 0.

(2) ⇒ (1). Suppose H_{δ}^{l} (d | P) = 0. Then,

\sum_{i = 1}^{n_{l}} \sum_{j = 1}^{r} \frac{| {[t_{i}]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} {log}_{2} \frac{| {[t_{i}]}_{P}^{l, δ} |}{| {[t_{i}]}_{P}^{l, δ} \cap D_{j} |} = 0 .

Suppose, to the contrary, that R_{P}^{l, δ} ⊈ R_{d}^{l}. Then there exists i_{0} \in \{1, \dots, n_{l}\} such that

{[t_{i_{0}}]}_{P}^{l, δ} ⊈ {[t_{i_{0}}]}_{d}^{l} .

Denote {[t_{i_{0}}]}_{d}^{l} = D_{j_{0}} (j_{0} \in \{1, \dots, r\}). We have

| {[t_{i_{0}}]}_{P}^{l, δ} | > | {[t_{i_{0}}]}_{P}^{l, δ} \cap D_{j_{0}} | \geq 1 ,

where the last inequality holds because t_{i_{0}} belongs to both {[t_{i_{0}}]}_{P}^{l, δ} and D_{j_{0}}. It follows that

\frac{| {[t_{i_{0}}]}_{P}^{l, δ} \cap D_{j_{0}} |}{n_{l}} {log}_{2} \frac{| {[t_{i_{0}}]}_{P}^{l, δ} |}{| {[t_{i_{0}}]}_{P}^{l, δ} \cap D_{j_{0}} |} > 0 .

Note that

\forall i, j, | {[t_{i}]}_{P}^{l, δ} | \geq | {[t_{i}]}_{P}^{l, δ} \cap D_{j} | .

Then,

\forall i, j, \frac{| {[t_{i}]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} {log}_{2} \frac{| {[t_{i}]}_{P}^{l, δ} |}{| {[t_{i}]}_{P}^{l, δ} \cap D_{j} |} \geq 0 .

So,

\sum_{i = 1}^{n_{l}} \sum_{j = 1}^{r} \frac{| {[t_{i}]}_{P}^{l, δ} \cap D_{j} |}{n_{l}} {log}_{2} \frac{| {[t_{i}]}_{P}^{l, δ} |}{| {[t_{i}]}_{P}^{l, δ} \cap D_{j} |} > 0 ,

which is a contradiction. Thus,

R_{P}^{l, δ} \subseteq R_{d}^{l} .

□
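Proposition 8 can be illustrated on a tiny example. The sketch below computes H_{δ}^{l} (d | P) from explicitly listed tolerance classes (so the δ-tolerance relation itself, defined earlier in the article, is assumed to have been computed already) and checks whether every class lies inside one decision class. The four-object data are made up for illustration.

```python
import math
from typing import List, Set


def cond_entropy(classes: List[Set[int]], labels: List[int]) -> float:
    """H^l_delta(d | P) = -sum_i sum_j |[t_i] ∩ D_j| / n_l * log2(|[t_i] ∩ D_j| / |[t_i]|).

    classes[i] -- tolerance class [t_i]^{l,delta}_P as a set of labeled-object indices
    labels[k]  -- decision value of labeled object k (defines D_1, ..., D_r)
    """
    n_l = len(labels)
    h = 0.0
    for cls in classes:
        for j in set(labels):
            inter = sum(1 for k in cls if labels[k] == j)
            if inter:  # empty intersections contribute 0 (0 * log2 0 = 0)
                h -= inter / n_l * math.log2(inter / len(cls))
    return h


def within_decision_classes(classes: List[Set[int]], labels: List[int]) -> bool:
    """R_P ⊆ R_d: every tolerance class [t_i] lies inside the decision class of t_i."""
    return all(labels[k] == labels[i] for i, cls in enumerate(classes) for k in cls)


labels = [0, 0, 1, 1]
pure = [{0, 1}, {0, 1}, {2}, {3}]               # every class inside one decision class
mixed = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]  # two classes straddle D_1 and D_2
print(cond_entropy(pure, labels), within_decision_classes(pure, labels))  # 0.0 True
print(round(cond_entropy(mixed, labels), 4), within_decision_classes(mixed, labels))  # 1.3774 False
```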

Corollary 2.

For a δ-consistent p-MSVDIS (T, A, V, f, d),

H_{δ}^{l} (d | A) = 0 .

Proof.

This follows from Propositions 2 and 8. □

Theorem 2.

For a δ-consistent p-MSVDIS (T, A, V, f, d), let λ = \frac{| T^{u} |}{| T |} and P \subseteq A. The following statements are equivalent:

(1) P \in c o_{λ, δ}^{p} (A);

(2) I M_{λ, δ}^{(2)} (P) = 1.

Proof.

(1) ⇒ (2). Suppose P \in c o_{λ, δ}^{p} (A). Then P O S_{P}^{l, δ} (d) = P O S_{A}^{l, δ} (d) and i n d_{δ}^{u} (P) = i n d_{δ}^{u} (A). Thus,

Γ_{P}^{l, δ} (d) = Γ_{A}^{l, δ} (d), | i n d_{δ}^{u} (P) | = | i n d_{δ}^{u} (A) | .

By Proposition 3,

\sum_{j = 1}^{r} (| \underline{R_{A}^{l, δ}} (D_{j}) | - | \underline{R_{P}^{l, δ}} (D_{j}) |) = 0 .

Since P \subseteq A, obviously \underline{R_{A}^{l, δ}} (D_{j}) \supseteq \underline{R_{P}^{l, δ}} (D_{j}) for all j, so every term of this sum is nonnegative. Hence, for all j,

| \underline{R_{A}^{l, δ}} (D_{j}) | - | \underline{R_{P}^{l, δ}} (D_{j}) | = 0 ,

and therefore

\underline{R_{A}^{l, δ}} (D_{j}) = \underline{R_{P}^{l, δ}} (D_{j}) .

Thus, for all t \in T^{l} and all j,

{[t]}_{A}^{l, δ} \subseteq D_{j} \Leftrightarrow {[t]}_{P}^{l, δ} \subseteq D_{j} .

Since (T, A, V, f, d) is δ-consistent, Proposition 2 gives R_{A}^{l, δ} \subseteq R_{d}^{l}; that is, {[t]}_{A}^{l, δ} \subseteq {[t]}_{d}^{l} for all t \in T^{l}. Let {[t]}_{d}^{l} = D^{t}, where D^{t} \in \{D_{1}, \dots, D_{r}\}. Then, for all t \in T^{l},

{[t]}_{P}^{l, δ} \subseteq D^{t} = {[t]}_{d}^{l} .

This implies R_{P}^{l, δ} \subseteq R_{d}^{l}, so H_{δ}^{l} (d | P) = 0 by Proposition 8. Since (T, A, V, f, d) is δ-consistent, Corollary 2 gives H_{δ}^{l} (d | A) = 0. Then,

H_{δ}^{l} (d | P) = H_{δ}^{l} (d | A) ,

and, together with | i n d_{δ}^{u} (P) | = | i n d_{δ}^{u} (A) |, Proposition 7 (4) yields

I M_{λ, δ}^{(2)} (P) = 1 .

(2) ⇒ (1). Suppose I M_{λ, δ}^{(2)} (P) = 1. Then, by Proposition 7 (4),

H_{δ}^{l} (d | P) = H_{δ}^{l} (d | A), | i n d_{δ}^{u} (P) | = | i n d_{δ}^{u} (A) | .

Since i n d_{δ}^{u} (P) \supseteq i n d_{δ}^{u} (A) and both sets are finite, we have i n d_{δ}^{u} (P) = i n d_{δ}^{u} (A). Since (T, A, V, f, d) is δ-consistent, Corollary 2 gives H_{δ}^{l} (d | A) = 0. Thus,

H_{δ}^{l} (d | P) = 0 ,

and, by Proposition 8,

R_{P}^{l, δ} \subseteq R_{d}^{l} .

Suppose that there exists j_{0} such that

\underline{R_{A}^{l, δ}} (D_{j_{0}}) ⊈ \underline{R_{P}^{l, δ}} (D_{j_{0}}) .

Then, we can pick

t^{0} \in \underline{R_{A}^{l, δ}} (D_{j_{0}}) - \underline{R_{P}^{l, δ}} (D_{j_{0}}) .

From t^{0} \in \underline{R_{A}^{l, δ}} (D_{j_{0}}), we obtain t^{0} \in {[t^{0}]}_{A}^{l, δ} \subseteq D_{j_{0}}, so D_{j_{0}} = {[t^{0}]}_{d}^{l}. From t^{0} \notin \underline{R_{P}^{l, δ}} (D_{j_{0}}), we obtain {[t^{0}]}_{P}^{l, δ} ⊈ D_{j_{0}} = {[t^{0}]}_{d}^{l}. Hence, R_{P}^{l, δ} ⊈ R_{d}^{l}, which is a contradiction. Therefore, for all j,

\underline{R_{A}^{l, δ}} (D_{j}) \subseteq \underline{R_{P}^{l, δ}} (D_{j}) .

Since the reverse inclusion \underline{R_{A}^{l, δ}} (D_{j}) \supseteq \underline{R_{P}^{l, δ}} (D_{j}) always holds, we have, for all j,

\underline{R_{A}^{l, δ}} (D_{j}) = \underline{R_{P}^{l, δ}} (D_{j}) .

Thus,

P O S_{A}^{l, δ} (d) = ⋃_{j = 1}^{r} \underline{R_{A}^{l, δ}} (D_{j}) = ⋃_{j = 1}^{r} \underline{R_{P}^{l, δ}} (D_{j}) = P O S_{P}^{l, δ} (d) .

Hence, P \in c o_{λ, δ}^{p} (A).

□

Corollary 3.

For a δ-consistent p-MSVDIS (T, A, V, f, d), let λ = \frac{| T^{u} |}{| T |} and P \subseteq A. The following statements are equivalent:

(1) P \in r e d_{λ, δ}^{p} (A);

(2) I M_{λ, δ}^{(2)} (P) = 1 and \forall a \in P, I M_{λ, δ}^{(2)} (P - \{a\}) < 1.

5.2. Semi-Supervised Attribute Selection Algorithms in a p-MSVDIS

Next, the two proposed UMs are used to design two algorithms for semi-supervised attribute selection in a p-MSVDIS, SARM1 and SARM2 (Algorithms 1 and 2).

Since the two algorithms have the same time and space complexity, only SARM1 is analyzed. Both algorithms are implemented in matrix form to run faster in practice: from step 2 onward, every binary relation on T is stored as a matrix. Therefore, step 2 costs

O (| A | {| T |}^{2}) ,

and steps 3–8 cost

O (| A | {| T |}^{2} + (| A | - 1) {| T |}^{2} + \dots + | P | {| T |}^{2}) .

The overall time complexity of SARM1 is therefore

O ((\frac{{| A |}^{2}}{2} + \frac{| A |}{2} - \frac{{| P |}^{2}}{2} + \frac{| P |}{2}) {| T |}^{2}) .

Because {| A |}^{2} - {| P |}^{2} = (| A | + | P |) (| A | - | P |) \geq | A | + | P | whenever | P | < | A |, removing the lower-order and constant terms gives

O (({| A |}^{2} - {| P |}^{2}) {| T |}^{2}) .

The space complexity of SARM1 is O (| A | {| T |}^{2}).
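The closed form and the bound used to drop the linear terms can be checked by brute force. The sketch assumes, as the sum above indicates, that steps 3–8 perform |A| + (|A| − 1) + \dots + |P| evaluations, each of cost O(|T|^2); the function names are illustrative.

```python
def evaluations(n_attrs: int, n_selected: int) -> int:
    """|A| + (|A| - 1) + ... + |P|: number of O(|T|^2) evaluations in steps 3-8."""
    return sum(range(n_selected, n_attrs + 1))


def closed_form(n_attrs: int, n_selected: int) -> float:
    """|A|^2 / 2 + |A| / 2 - |P|^2 / 2 + |P| / 2."""
    return (n_attrs ** 2 + n_attrs - n_selected ** 2 + n_selected) / 2


for a in range(1, 40):
    for p in range(a + 1):
        assert evaluations(a, p) == closed_form(a, p)
        if p < a:  # total is bounded by |A|^2 - |P|^2, so the linear terms can be dropped
            assert evaluations(a, p) <= a * a - p * p
print("closed form verified for |A| < 40")
```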

In the algorithms’ flow, step 3 sets the loop condition: iterations continue while

I M_{λ, δ}^{(1)} (P) \leq 1 - 5 \% or I M_{λ, δ}^{(2)} (P) \leq 1 - 5 \% (depending on the algorithm),

and one effective attribute is selected in each cycle. The 5% threshold is a key parameter that controls the size of the final attribute subset. If too many attributes are selected, the threshold can be raised (e.g., from 5% to 10% or higher) so that the loop stops earlier and fewer attributes are kept. Conversely, if too few attributes are selected or the classification accuracy of the model is unsatisfactory, the threshold can be lowered, down to 0%, so that additional iterations add more attributes.
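Because Algorithms 1 and 2 are given only as a figure, the following is a hedged sketch of the stopping rule described above, not a transcription of the authors' pseudocode. It assumes a forward greedy scheme in which each cycle adds the attribute that maximizes the importance of the enlarged subset; `importance` stands for I M_{λ, δ}^{(1)} or I M_{λ, δ}^{(2)} computed elsewhere, and the additive toy importance is hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_select(attrs: Iterable[str],
                  importance: Callable[[FrozenSet[str]], float],
                  eps: float = 0.05) -> List[str]:
    """Add one attribute per cycle while importance(P) <= 1 - eps."""
    remaining = set(attrs)
    selected: List[str] = []
    current: FrozenSet[str] = frozenset()
    while remaining and importance(current) <= 1 - eps:
        # Pick the attribute whose addition gives the largest importance;
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda a: importance(current | {a}))
        selected.append(best)
        current = current | {best}
        remaining.remove(best)
    return selected


# Hypothetical additive importance, used only to show how eps changes the subset size.
WEIGHTS = {"a1": 0.6, "a2": 0.3, "a3": 0.08, "a4": 0.02}


def toy_importance(P: FrozenSet[str]) -> float:
    return sum(WEIGHTS[a] for a in P)


print(greedy_select(WEIGHTS, toy_importance, eps=0.05))  # ['a1', 'a2', 'a3']
print(greedy_select(WEIGHTS, toy_importance, eps=0.15))  # ['a1', 'a2']
```

Raising `eps` from 0.05 to 0.15 stops the loop one cycle earlier, which mirrors the effect of raising the 5% threshold described in the text.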

6. Experimental Analysis

This section describes the experiments conducted to evaluate the effectiveness of SARM1 and SARM2.

6.1. Numerical Experiment

To test the effectiveness of the algorithms on real datasets, SARM1 and SARM2 were compared with existing algorithms. The experiments were run on a Lenovo computer with an Intel(R) Core(TM) i7-9700 CPU at 3.00 GHz, 16 GB of memory, and Windows 10. MATLAB 2019 and SPSS 2018 were used for computation. Table 4 lists the 11 datasets selected from the UCI Machine Learning Repository [39] for the experimental analysis. Because set-valued datasets are hard to find in this repository, the data were adjusted: real-valued attributes were rounded, and 10% of the attribute values in every dataset were randomly deleted, so that each incomplete dataset can be treated as an MSVDIS (see Example 2). The 11th dataset, Con, is too large for the available memory, so the algorithms cannot be run on it in full. Instead, repeated random sampling is used, with 50% of the samples drawn for each experimental run.
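The data preparation can be sketched as follows. This assumes, following the abstract, that each deleted value is replaced by the multiset of values observed for the same attribute (a `collections.Counter` holding their frequencies), while known values become singleton multisets; the exact procedure of Example 2 may differ, and the function name is hypothetical. Note that Python's `round` rounds halves to the nearest even integer.

```python
import random
from collections import Counter
from typing import List, Optional


def to_multiset_valued(table: List[List[float]], missing_rate: float = 0.1,
                       seed: int = 0) -> List[List[Counter]]:
    """Round real values, delete a fraction of cells at random, and fill each deleted
    cell with the multiset of observed values of the same attribute."""
    rng = random.Random(seed)
    rounded: List[List[Optional[int]]] = [[round(v) for v in row] for row in table]
    n_rows, n_cols = len(rounded), len(rounded[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for i, j in rng.sample(cells, int(missing_rate * len(cells))):
        rounded[i][j] = None
    # Multiset (value -> frequency) of the observed values of each attribute.
    column_ms = [Counter(row[j] for row in rounded if row[j] is not None)
                 for j in range(n_cols)]
    return [[Counter({v: 1}) if v is not None else Counter(column_ms[j])
             for j, v in enumerate(row)] for row in rounded]


data = [[i + 0.4, 2 * i + 0.6, (i % 3) + 0.1] for i in range(10)]
ms_data = to_multiset_valued(data, missing_rate=0.1, seed=1)
n_multi = sum(1 for row in ms_data for cell in row if sum(cell.values()) > 1)
print(n_multi)  # 3 cells (10% of 30) now hold a multiset
```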

Table 4. Basic information about the datasets.

SARM1 and SARM2 both require two parameters to be set in advance: δ and λ. Here, λ is the rate of missing labels; for convenience, 20% of the labels were randomly removed (λ = 20%). We first examine how changes in δ affect the results. To check whether the selected attribute subsets preserve the classification accuracy of the original datasets, three classifiers, Bagged Trees (BT), Support Vector Machine (SVM), and K-nearest neighbors (KNN, with K = 5), are applied to each subset. Five-fold cross-validation is used, and the reported accuracy is the average over 10 runs. Figure 2 shows how the accuracy of the three classifiers varies with δ. The curves in most subfigures oscillate strongly, which indicates that δ has a considerable impact on the selected subset and on the classification accuracy; only the curves for PS and Aud are relatively smooth. Figure 2 can therefore be used to choose, for each dataset, the δ that yields the highest accuracy.

Figure 2. The correlation between δ and accuracy.

Next, we verify whether SARM1 and SARM2 are more effective than similar algorithms. Because no directly comparable studies on p-MSVDIS are available, five semi-supervised attribute selection algorithms and related methods from the literature are selected for comparison. They fall into three categories: (1) FSRS [15] and FSDIS [40], designed for set-valued information systems; (2) SADA [41] and FSFS [20], attribute selection algorithms for partially labeled data; (3) Semi2MNR [42], a semi-supervised feature selection method based on the minimum neighborhood redundancy and maximum neighborhood relevance criteria.

All compared algorithms were reimplemented and tested. Table 5 summarizes the number of attributes selected by each algorithm, with optimal values in bold. FSRS selects the fewest attributes on average, and SARM2 the most. However, FSRS returns 30 attributes on PW, the same number as in the original dataset, so it performs no reduction on this dataset. For Spa and Con, FSRS has no results (marked by a dash in Table 5) because the computer ran out of memory during the computation. The average numbers of attributes selected by SARM1 and SARM2 are 7.64 and 8.27, respectively. Although these values are not the best, they are close to those of the other algorithms, and the average of SARM1 is close to the best value, 7, obtained by FSRS. We therefore consider the size of the attribute subsets produced by the two proposed algorithms acceptable.

Table 5. Number of selected attributes.

The three classifiers are then used to evaluate the accuracy of the reduced subsets. Table 6, Table 7 and Table 8 present the comparative results under the three classifiers, with optimal values in bold. SARM1 performs best in 18 of the 33 experiments, and SARM2 in 5. SARM1 also achieves the highest average accuracy under all three classifiers: 0.8518 (BT), 0.8015 (SVM), and 0.8131 (KNN).

Table 6. Assessing the accuracy of classification with BT.

Table 7. Assessing the accuracy of classification with SVM.

Table 8. Assessing the accuracy of classification with KNN.

The experiments indicate that SARM1 outperforms the other algorithms in terms of accuracy in a p-MSVDIS. Accuracy alone is not sufficient, however, so the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are also analyzed [43]. The ROC curve plots the true-positive rate (T P R, the sensitivity) on the vertical axis against the false-positive rate (F P R, equal to 1 minus the specificity) on the horizontal axis, so it shows the trade-off between sensitivity and specificity over all decision thresholds. The closer the ROC curve is to the upper left corner, the better the model, because a high true-positive rate is then achieved at a low false-positive rate. The AUC is the area under the ROC curve and serves as a single quantitative measure of classifier performance; a higher AUC indicates better classification performance. The ROC curves are shown in Figure 3 and Figure 4, and the corresponding AUC values are given in Table 9 and Table 10.
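For readers who want to reproduce the ROC and AUC computation, here is a minimal binary-class sketch: it sweeps the decision threshold over the classifier scores and integrates the ROC curve with the trapezoidal rule. How the article handles multi-class datasets (for example, one-vs-rest averaging) is not stated, so that part is left out; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points as the threshold moves from the highest score down; ties are grouped."""
    pos = sum(labels)  # labels are 1 (positive) or 0 (negative)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    idx = 0
    while idx < len(order):
        threshold = scores[order[idx]]
        while idx < len(order) and scores[order[idx]] == threshold:
            if labels[order[idx]] == 1:
                tp += 1
            else:
                fp += 1
            idx += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```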

Figure 3. ROC curve under classifier BT.

Figure 4. ROC curve under classifier SVM.

Table 9. AUC under classifier BT.

Table 10. AUC under classifier SVM.

In Figure 3 and Figure 4, the red and blue lines represent SARM1 and SARM2, respectively. In most subfigures, these two lines lie closer to the upper border than the others, which indicates that the proposed algorithms perform better. In subfigures ACA and MB of Figure 3, the red and blue lines do not bulge toward the upper left corner more than the other curves, which is consistent with the classification accuracies reported above. In subfigure PS of Figure 3, the four curves overlap, showing that all four algorithms classify this dataset very well. In subfigure OLD of Figure 4, the black, green, and magenta curves are concave, indicating that the attribute subsets obtained by these three algorithms are poor and lead to low classification accuracy. Table 9 and Table 10 list the areas under the curves in Figure 3 and Figure 4. SARM1 achieves the highest average AUC under both classifiers (0.8855 with BT and 0.8213 with SVM). Comparing Table 6 and Table 7 with Table 9 and Table 10 shows that the accuracy ranking of the algorithms closely matches their AUC ranking. This suggests that the performance of SARM1 and SARM2 is not substantially degraded by an uneven distribution of samples.

Computational time is then used as an additional measure of efficiency. Because FSRS did not produce valid results on PW, Spa, and Con, only the remaining datasets are used in this comparison. As shown in Table 11 (unit: seconds), SARM1 has the shortest average execution time (2.7914 s), SARM2 ranks second (3.8629 s), and FSFS requires substantially longer (79.1526 s). These results indicate that SARM1 and SARM2 combine good classification accuracy with high computational efficiency.

Table 11. Comparison of computational speed (unit: seconds).

6.2. Statistical Analysis

This section presents statistical analyses of the above results to determine whether SARM1 and SARM2 are significantly more effective than the other algorithms. The Friedman test is applied to test for differences among the seven algorithms. The rankings derived from Table 6, Table 7 and Table 8 are shown in Table 12, Table 13 and Table 14, and these rankings were entered into SPSS for the Friedman test; Table 15 presents the results. For the three classifiers, the p values are 0.0001, 0.00002, and 0.00006, all below the significance level α = 0.05. This indicates that significant differences exist among the seven algorithms.
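The Friedman statistic can be computed directly from a rank table such as Table 12. The sketch below uses the textbook form without tie correction (SPSS may apply one) and obtains the p value from the chi-square distribution with k − 1 degrees of freedom; the closed-form tail used here is valid only for an even number of degrees of freedom, which covers k = 7 algorithms. The rank tables in the usage lines are synthetic, not the article's data.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with an even number df of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def friedman(ranks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """ranks[d][j]: rank of algorithm j on dataset d. Returns (chi2_F, p value)."""
    n, k = len(ranks), len(ranks[0])
    avg = [sum(row[j] for row in ranks) / n for j in range(k)]
    stat = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return stat, chi2_sf_even_df(stat, k - 1)


# Synthetic example: 7 algorithms ranked identically on 11 datasets.
stat, p = friedman([list(range(1, 8)) for _ in range(11)])
print(round(stat, 6), p < 0.05)  # 66.0 True
```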

Table 12. The ranking of classification accuracies with BT.

Table 13. The ranking of classification accuracies with SVM.

Table 14. The ranking of classification accuracies with KNN.

Table 15. Friedman test of experimental results under three classifiers.

Given the significant differences among the seven algorithms, further tests are needed to determine which algorithms perform best. The Nemenyi test is used as the post hoc test: two algorithms differ significantly if their average ranks differ by more than the critical difference

C D = q_{α} \sqrt{\frac{k (k + 1)}{6 N}} ,

where k is the number of algorithms and N is the number of datasets. For α = 0.05 and k = 7, the table of critical values for the Nemenyi test (the studentized range statistic divided by \sqrt{2}) gives q_{α} = 2.949. With k = 7 and N = 11, this yields C D = 2.716. The post hoc results are plotted in Figure 5, whose subfigures (a), (b), and (c) correspond to the BT, SVM, and KNN classifiers, respectively. In Figure 5, the red and blue lines lie closer to the y-axis than the other lines, indicating that SARM1 and SARM2 achieve better experimental results.
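The critical difference can be reproduced with a few lines of Python. `significantly_different` is a hypothetical helper that encodes the usual Nemenyi decision rule, and the average ranks in the checks are made-up values.

```python
import math


def nemenyi_cd(q_alpha: float, k: int, n: int) -> float:
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def significantly_different(rank_a: float, rank_b: float, cd: float) -> bool:
    """Two algorithms differ significantly if their average ranks differ by more than CD."""
    return abs(rank_a - rank_b) > cd


cd = nemenyi_cd(2.949, k=7, n=11)
print(round(cd, 3))  # 2.716
```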

Figure 5. Nemenyi test of experimental results under three classifiers.

From Figure 5a, the following conclusions are obtained:

(a): The classification accuracy of SARM1 is significantly superior to SADA, Semi2MNR, FSRS, and FSDIS;
(b): In terms of statistics, there is no significant difference among SARM1, SARM2, and FSFS;
(c): There is no obvious difference among SARM2, FSFS, SADA, Semi2MNR, FSRS, and FSDIS.

From Figure 5b, we also obtain the following conclusions:

(a): The classification accuracy of SARM1 is better than that of FSDIS, SADA, and FSRS;
(b): SARM2 is significantly better than SADA and FSRS;
(c): In terms of statistics, there is no significant difference among SARM1, SARM2, Semi2MNR, FSFS, and FSDIS.

From Figure 5c, we also obtain the following conclusions:

(a): The classification accuracy of SARM1 is better than that of SADA, FSRS, Semi2MNR, and FSDIS;
(b): In terms of statistics, there is no significant difference among SARM1, SARM2, and FSFS;
(c): There is no significant difference among SARM2, FSFS, SADA, FSRS, Semi2MNR, and FSDIS.

Overall, SARM1 performs best under all three classifiers. Although the differences among SARM1, SARM2, Semi2MNR, and FSFS are not statistically significant, SARM1 and SARM2 always rank first and second in average rank. We therefore conclude that SARM1 and SARM2 perform better overall than the compared algorithms.

6.3. Parameter Analysis

This section examines the other parameter of SARM1 and SARM2, λ, which is the rate of missing labels and was set to λ = 20% in the above experiments. In real datasets, λ is not fixed, and sometimes only a small fraction of labels is missing, so further experiments are needed to explore the impact of λ on the selected attribute subsets. With δ = 0.6, the incomplete rate is varied over λ = 0.1, 0.2, \dots, 0.9 (step size 0.1); for each value, the algorithms are run repeatedly, and the resulting subsets are evaluated with the BT classifier. Figure 6 displays the results, with SARM1 in red and SARM2 in blue. The curves in most subfigures are relatively stable and stay within a narrow range, with only slight fluctuations in the Spa and MB subfigures. This indicates that the value of λ has little influence on the classification accuracy obtained with the subsets selected by SARM1 and SARM2. We therefore conclude that SARM1 and SARM2 are relatively stable and effective in a p-MSVDIS.
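The λ sweep can be mimicked by hiding a fraction λ of the labels at random and splitting the objects into labeled and unlabeled parts, so that λ = |T^u| / |T| as in Definition 18. This is an illustrative sketch; the random hiding scheme and the helper names are assumptions, not the authors' code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def hide_labels(labels: Sequence[int], lam: float, seed: int = 0) -> List[Optional[int]]:
    """Replace round(lam * |T|) randomly chosen labels by None (unlabeled objects)."""
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(labels)), round(lam * len(labels))))
    return [None if i in hidden else y for i, y in enumerate(labels)]


def split(partial: Sequence[Optional[int]]) -> Tuple[List[int], List[int]]:
    """Indices of T^l (labeled objects) and T^u (unlabeled objects)."""
    labeled = [i for i, y in enumerate(partial) if y is not None]
    unlabeled = [i for i, y in enumerate(partial) if y is None]
    return labeled, unlabeled


labels = [i % 3 for i in range(100)]
for lam in [round(0.1 * s, 1) for s in range(1, 10)]:  # 0.1, 0.2, ..., 0.9
    t_l, t_u = split(hide_labels(labels, lam, seed=42))
    print(lam, len(t_l), len(t_u), len(t_u) / len(labels))
```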

Figure 6. The correlation between λ and accuracy with BT.

7. Conclusions

In this study, we presented semi-supervised attribute selection algorithms for a p-MSVDIS. First, a p-MSVDIS was divided into two multiset-valued decision information systems: an l-MSVDIS and a u-MSVDIS. Two uncertainty measurements of an attribute subset were then established using indistinguishable relations, distinguishable relations, and dependence functions. On the basis of these two measurements, several theoretical properties were proved, and two attribute selection algorithms, SARM1 and SARM2, were proposed. In the experiments, the incomplete label rate was set to λ = 20% to divide each system into its l-MSVDIS and u-MSVDIS parts, and five related algorithms were compared with SARM1 and SARM2 on 11 datasets from the UCI repository to examine the validity of the algorithms.

This study systematically evaluated SARM1 and SARM2 on 11 diverse datasets. SARM1 had the shortest average runtime and SARM2 the second shortest, and the two algorithms achieved the best and second-best average ranks in classification accuracy under the BT, SVM, and KNN classifiers, with Friedman tests indicating significant differences among the compared algorithms (p < 0.05). SARM1 and SARM2 also showed stable performance, with consistently high AUC values across datasets. These findings indicate that the proposed algorithms select discriminative attributes that generalize across classifiers, and that their computational efficiency is not obtained at the cost of selection quality.

While SARM1 and SARM2 show clear advantages in computational efficiency and accuracy, three limitations remain: (1) memory consumption grows quadratically with the number of samples (the space complexity is O (| A | {| T |}^{2})), which can cause memory overflow on very large datasets; (2) there is no theoretical criterion for choosing the parameter δ, so an exhaustive grid search is required, which increases the overall running time; (3) performance degrades on low-quality input data, so additional preprocessing may be needed. These limitations point to directions for future optimization, in particular memory-efficient data streaming and automated hyperparameter tuning.

The algorithms discussed herein could benefit substantially from parallelization, potentially improving computational efficiency and reducing memory usage. However, their implementation would require careful management of output handling in parallel environments. We identify this as an important research direction worthy of methodical exploration in future studies. In future work, we plan to integrate approximation algorithms with parallel computing techniques to develop feature selection algorithms that achieve higher computational speed and lower memory consumption.

Author Contributions

Methodology and writing—original draft, Y.H.; methodology, editing, and investigation, J.H.; experiment and programming, H.L.; editing and investigation, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Innovative Research on VR+Online Open Course–Research on the Model of Guangke Ideological and Political Course (2022ZXKC519), Doctoral Research of Guangdong University of Science and Technology (GKY-2024BSQDK-11), and Science Foundation in Guangdong University of Science and Technology (GKY-023KYZDK-1).

Informed Consent Statement

The data used or analyzed during the current study are available from the corresponding author after the paper is accepted for publication.

Data Availability Statement

The data presented in this study are openly available in Machine Learning Repository http://archive.ics.uci.edu/datasets, accessed on 20 January 2024.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

Dai, J.H.; Xu, Q.; Wang, W.T.; Tian, H.W. Conditional entropy for incomplete decision systems and its application in data mining. Int. J. Gen. Syst. 2012, 41, 713–728.
Hempelmann, C.F.; Sakoglu, U.; Gurupur, V.P.; Jampana, S. An entropy-based evaluation method for knowledge bases of medical information systems. Expert Syst. Appl. 2016, 46, 262–273.
Wang, P.; Zhang, P.F.; Li, Z.W. A three-way decision method based on Gaussian kernel in a hybrid information system with images: An application in medical diagnosis. Appl. Soft Comput. 2019, 77, 734–749.
Navarrete, J.; Viejo, D.; Cazorla, M. Color smoothing for RGB-D data using entropy information. Appl. Soft Comput. 2016, 46, 361–380.
Cament, L.A.; Castillo, L.E.; Perez, J.P.; Galdames, F.J.; Perez, C.A. Fusion of local normalization and Gabor entropy weighted features for face identification. Pattern Recognit. 2014, 47, 568–577.
Swiniarski, R.W.; Skowron, A. Rough set methods in feature selection and recognition. Pattern Recognit. Lett. 2003, 24, 833–849.
Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356.
Guan, Y.Y.; Wang, H.K. Set-valued information systems. Inf. Sci. 2006, 176, 2507–2525.
Liang, J.Y.; Qian, Y.H. Information granules and entropy theory in information systems. Sci. China (Ser. F) 2008, 51, 1427–1444.
Qian, Y.H.; Liang, J.Y.; Wu, W.Z.; Dang, C.Y. Information granularity in fuzzy binary GrC model. IEEE Trans. Fuzzy Syst. 2011, 19, 253–264.
Dai, J.H.; Wang, W.T.; Xu, Q. An uncertainty measure for incomplete decision tables and its applications. IEEE Trans. Cybern. 2013, 43, 1277–1289.
Zhang, P.; Li, T.; Wang, G.; Luo, C.; Chen, H. Multi-source information fusion based on rough set theory: A review. Inf. Fusion 2021, 68, 85–117.
Yang, L.; Zhang, X.; Xu, W.; Sang, B. Multi-granulation rough sets and uncertainty measurement for multi-source fuzzy information system. Int. J. Fuzzy Syst. 2019, 21, 1919–1937.
Hu, Q.H.; Yu, D.R.; Liu, J.; Wu, C. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594.
Singh, S.; Shreevastava, S.; Som, T.; Somani, G. A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems. Soft Comput. 2020, 24, 4675–4691.
Wang, Y.B.; Chen, X.J.; Dong, K. Attribute reduction via local conditional entropy. Int. J. Mach. Learn. Cybern. 2019, 10, 3619–3634.
Dai, J.H.; Hu, H.; Zheng, G.J.; Hu, Q.H.; Han, H.F.; Shi, H. Attribute reduction in interval-valued information systems based on information entropies. Front. Inf. Technol. Electron. Eng. 2016, 17, 919–928.
Wang, C.Z.; Huang, Y.; Shao, M.W.; Hu, Q.H.; Chen, D.G. Feature selection based on neighborhood self-information. IEEE Trans. Cybern. 2020, 50, 4031–4042.
Dai, J.H.; Liu, Q. Semi-supervised attribute reduction for interval data based on misclassification cost. Int. J. Mach. Learn. Cybern. 2022, 13, 1739–1750.
Liu, K.Y.; Yang, X.B.; Yu, H.L.; Mi, J.S.; Wang, P.X. Rough set based semi-supervised feature selection via ensemble selector. Knowl.-Based Syst. 2019, 165, 282–296.
Kim, K. An improved semi-supervised dimensionality reduction using feature weighting: Application to sentiment analysis. Expert Syst. Appl. 2018, 109, 49–65.
Li, Z.; Tang, J. Semi-supervised local feature selection for data classification. Sci. China (Ser. F) 2021, 64, 192108.
Zhang, W.; Miao, D.Q.; Gao, C.; Li, F. Semi-supervised attribute reduction based on rough-subspace ensemble learning. J. Chin. Comput. Syst. 2016, 37, 2727–2732.
Ma, M.H.; Deng, T.Q.; Wang, N.; Chen, Y.M. Semi-supervised rough fuzzy Laplacian Eigenmaps for dimensionality reduction. Int. J. Mach. Learn. Cybern. 2019, 10, 397–411.
Han, Y.H.; Yang, Y.; Yan, Y.; Ma, Z.G.; Zhou, X.F. Semisupervised feature selection via spline regression for video semantic recognition. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 252–264.
Moslemi, A.; Ahmadian, A. Subspace learning for feature selection via rank revealing QR factorization: Fast feature selection. Expert Syst. Appl. 2024, 256, 124919.
Bohrer, J.D.S.; Dorn, M. Enhancing classification with hybrid feature selection: A multi-objective genetic algorithm for high-dimensional data. Expert Syst. Appl. 2024, 255, 124518.
Sheikhpour, R.; Mohammadi, M.; Berahmand, K.; Movahed, F.S.; Khosravi, H. Robust semi-supervised multi-label feature selection based on shared subspace and manifold learning. Inf. Sci. 2025, 699, 121800.
Sheikhpour, R.; Berahmand, K.; Mohammadi, M.; Khosravi, H. Sparse feature selection using hypergraph Laplacian-based semi-supervised discriminant analysis. Pattern Recognit. 2025, 157, 110882.
Liu, K.Y.; Yang, X.B.; Ding, W.P.; Ju, H.R.; Li, T.R.; Wang, J.; Yin, T.Y. A survey on rough feature selection: Recent advances and challenges. IEEE/CAA J. Autom. Sin. 2025, 12, 125231.
Saberi-Movahed, F.; Berahmand, K.; Sheikhpour, R.; Li, Y.F.; Pan, S. Nonnegative matrix factorization in dimensionality reduction: A survey. arXiv 2024, arXiv:2405.03615.
Yao, Y.Y. Information granulation and rough set approximation. Int. J. Intell. Syst. 2001, 16, 87–104.
Miyamoto, S. Information clustering based on fuzzy multisets. Inf. Process. Manag. 2003, 39, 195–213.
Zhao, X.R.; Hu, B.Q. Three-way decisions with decision-theoretic rough sets in multiset-valued information tables. Inf. Sci. 2020, 507, 684–699.
Jena, S.P.; Ghosh, S.K.; Tripathy, B.K. On the theory of bags and lists. Inf. Sci. 2001, 132, 241–254.
Nikulin, M.S. Hellinger distance. In Encyclopedia of Mathematics; Hazewinkel, M., Ed.; Springer Science: Berlin/Heidelberg, Germany, 2001; ISBN 978-1-55608-010-4.
Huang, D.; Lin, H.; Li, Z.W. Information structures in a multiset-valued information system with application to uncertainty measurement. J. Intell. Fuzzy Syst. 2022, 43, 7447–7469.
Kryszkiewicz, M. Rules in incomplete information systems. Inf. Sci. 1999, 113, 271–292.
UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/datasets (accessed on 20 January 2024).
Ahmed, W.; Sufyan, M.M.B.; Ahmad, T. Entropy based feature selection for fuzzy set-valued information systems. 3D Res. 2018, 9, 1–17.
Zhong, W.C.; Chen, X.J.; Nie, F.P.; Huang, J.Z. Adaptive discriminant analysis for semi-supervised feature selection. Inf. Sci. 2021, 566, 178–194.
Qian, D.; Liu, K.; Zhang, S.; Yang, X. Semi-supervised feature selection by minimum neighborhood redundancy and maximum neighborhood relevancy. Appl. Intell. 2024, 54, 7750–7764.
Narkhede, S. Understanding AUC-ROC curve. Towards Data Sci. 2018, 26, 220–227.

Figure 1. Flow chart of research.

Figure 2. The correlation between $\delta$ and accuracy.

Figure 3. ROC curve under classifier BT.

Figure 4. ROC curve under classifier SVM.

Figure 5. Nemenyi test of experimental results under three classifiers.
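Figure 5 reports a Nemenyi post-hoc test. A common way to build such a diagram is the critical difference CD = q_α √(k(k + 1)/(6N)) for k methods on N datasets: two methods are judged different when their average ranks differ by more than CD. The Python sketch below computes CD and the average BT ranks (column sums of Table 12 divided by 11). The significance level α = 0.05, and hence the tabulated value q = 2.949 for k = 7, is an assumption; the names `nemenyi_cd` and `significantly_worse` are illustrative.

```python
import math

def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference for k methods compared on n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Assumption: alpha = 0.05, whose tabulated q value for k = 7 methods is 2.949.
Q_005_K7 = 2.949
N_DATASETS = 11

# Column sums of Table 12 (ranks under BT), turned into average ranks.
rank_sums = {"FSRS": 59, "FSDIS": 60, "SADA": 47.5, "FSFS": 44,
             "Semi2MNR": 49, "SARM1": 17.5, "SARM2": 31}
avg_rank = {m: s / N_DATASETS for m, s in rank_sums.items()}

cd = nemenyi_cd(k=len(avg_rank), n=N_DATASETS, q_alpha=Q_005_K7)
best = min(avg_rank, key=avg_rank.get)  # lowest average rank is best

# Methods whose average rank exceeds the best method's by more than CD.
significantly_worse = sorted(
    m for m, r in avg_rank.items() if r - avg_rank[best] > cd
)
print(f"CD = {cd:.3f}; best = {best}; worse than best: {significantly_worse}")
```

A different α only changes `Q_005_K7`; the rank sums and the comparison rule stay the same.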

Figure 6. The correlation between $\lambda$ and accuracy with BT.

Table 1. An IDIS $(T, A, V, f, d)$.

T	$a_{1}$	$a_{2}$	$a_{3}$	d
$t_{1}$	Sick	True	High	Flu
$t_{2}$	Sick	True	Low	Flu
$t_{3}$	Middle	∗	Normal	Flu
$t_{4}$	No	True	Normal	Flu
$t_{5}$	∗	True	Normal	Rhinitis
$t_{6}$	Middle	False	∗	Rhinitis
$t_{7}$	No	False	Low	Health
$t_{8}$	No	∗	∗	Health
$t_{9}$	∗	True	Low	Health

Table 2. An MSVDIS $(T, A, V, f, d)$.

T	$a_{1}$	$a_{2}$	$a_{3}$	d
$t_{1}$	{1/Sick, 0/Middle, 0/No}	{1/True, 0/False}	{1/High, 0/Normal, 0/Low}	Flu
$t_{2}$	{1/Sick, 0/Middle, 0/No}	{1/True, 0/False}	{0/High, 0/Normal, 1/Low}	Flu
$t_{3}$	{0/Sick, 1/Middle, 0/No}	{5/True, 2/False}	{0/High, 1/Normal, 0/Low}	Flu
$t_{4}$	{0/Sick, 0/Middle, 1/No}	{1/True, 0/False}	{0/High, 1/Normal, 0/Low}	Flu
$t_{5}$	{2/Sick, 2/Middle, 3/No}	{1/True, 0/False}	{0/High, 1/Normal, 0/Low}	Rhinitis
$t_{6}$	{0/Sick, 1/Middle, 0/No}	{0/True, 1/False}	{1/High, 3/Normal, 3/Low}	Rhinitis
$t_{7}$	{0/Sick, 0/Middle, 1/No}	{0/True, 1/False}	{0/High, 0/Normal, 1/Low}	Flu
$t_{8}$	{0/Sick, 0/Middle, 1/No}	{5/True, 2/False}	{1/High, 3/Normal, 3/Low}	Health
$t_{9}$	{2/Sick, 2/Middle, 3/No}	{1/True, 0/False}	{0/High, 0/Normal, 1/Low}	Health
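The step from Table 1 to Table 2 can be reproduced mechanically. A known value v becomes the multiset with count 1 for v and 0 for every other value, and a missing value ∗ becomes the multiset of all values observed under the same attribute (for $a_{1}$: Sick twice, Middle twice, No three times). The Python sketch below encodes Table 1 as dictionaries; the encoding, the function names `to_multisets` and `to_distribution`, and the normalization of a multiset into a probability distribution are illustrative assumptions, not the authors' code. Only the attribute columns are handled; the decision column is left out.

```python
from collections import Counter

# Value domains of the three condition attributes of Table 1.
ATTRS = {
    "a1": ["Sick", "Middle", "No"],
    "a2": ["True", "False"],
    "a3": ["High", "Normal", "Low"],
}
# Table 1 attribute values; None marks a missing value (shown as ∗).
IDIS = {
    "t1": ("Sick", "True", "High"),   "t2": ("Sick", "True", "Low"),
    "t3": ("Middle", None, "Normal"), "t4": ("No", "True", "Normal"),
    "t5": (None, "True", "Normal"),   "t6": ("Middle", "False", None),
    "t7": ("No", "False", "Low"),     "t8": ("No", None, None),
    "t9": (None, "True", "Low"),
}

def to_multisets(idis, attrs):
    """Turn every attribute value into a multiset {value: count}."""
    names = list(attrs)
    fill = {}
    for j, a in enumerate(names):
        # Counts of the observed (non-missing) values of attribute a.
        counts = Counter(row[j] for row in idis.values() if row[j] is not None)
        fill[a] = {v: counts[v] for v in attrs[a]}
    return {
        t: tuple(
            dict(fill[a]) if row[j] is None  # missing: all observed values
            else {v: int(v == row[j]) for v in attrs[a]}  # known: one value
            for j, a in enumerate(names)
        )
        for t, row in idis.items()
    }

def to_distribution(multiset):
    """Assumed convention: divide each count by the multiset's total size."""
    total = sum(multiset.values())
    return {v: c / total for v, c in multiset.items()}

msv = to_multisets(IDIS, ATTRS)
print(msv["t5"][0])  # {'Sick': 2, 'Middle': 2, 'No': 3}
```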

Table 3. A p-MSVDIS $(T, A, V, f, d)$.

T	$a_{1}$	$a_{2}$	$a_{3}$	d
$t_{1}$	{1/Sick, 0/Middle, 0/No}	{1/True, 0/False}	{1/High, 0/Normal, 0/Low}	Flu
$t_{2}$	{1/Sick, 0/Middle, 0/No}	{1/True, 0/False}	{0/High, 0/Normal, 1/Low}	Flu
$t_{3}$	{0/Sick, 1/Middle, 0/No}	{5/True, 2/False}	{0/High, 1/Normal, 0/Low}	⋄
$t_{4}$	{0/Sick, 0/Middle, 1/No}	{1/True, 0/False}	{0/High, 1/Normal, 0/Low}	Flu
$t_{5}$	{2/Sick, 2/Middle, 3/No}	{1/True, 0/False}	{0/High, 1/Normal, 0/Low}	Rhinitis
$t_{6}$	{0/Sick, 1/Middle, 0/No}	{0/True, 1/False}	{1/High, 3/Normal, 3/Low}	Rhinitis
$t_{7}$	{0/Sick, 0/Middle, 1/No}	{0/True, 1/False}	{0/High, 0/Normal, 1/Low}	Flu
$t_{8}$	{0/Sick, 0/Middle, 1/No}	{5/True, 2/False}	{1/High, 3/Normal, 3/Low}	Health
$t_{9}$	{2/Sick, 2/Middle, 3/No}	{1/True, 0/False}	{0/High, 0/Normal, 1/Low}	⋄
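Table 3 splits naturally into a labeled part (l-MSVDIS, objects with a known decision) and an unlabeled part (u-MSVDIS, objects marked ⋄), and its missing rate of labels is 2/9. The Python sketch below shows that split on the decision column of Table 3. The function `combined_importance` is hypothetical: it assumes the weighted sum gives weight (1 - rate) to the labeled importance and weight rate to the unlabeled importance, which is one natural reading of the abstract rather than the paper's confirmed formula.

```python
# Decision column of Table 3; None stands for a missing label (shown as ⋄).
LABELS = {
    "t1": "Flu", "t2": "Flu", "t3": None, "t4": "Flu", "t5": "Rhinitis",
    "t6": "Rhinitis", "t7": "Flu", "t8": "Health", "t9": None,
}

def split_by_label(labels):
    """Partition object ids into the labeled (l-MSVDIS) and unlabeled (u-MSVDIS) parts."""
    labeled = [t for t, d in labels.items() if d is not None]
    unlabeled = [t for t, d in labels.items() if d is None]
    return labeled, unlabeled

def missing_rate(labels):
    """Fraction of objects whose decision value is missing."""
    return sum(d is None for d in labels.values()) / len(labels)

def combined_importance(imp_labeled, imp_unlabeled, rate):
    """Hypothetical weighting: labeled part by (1 - rate), unlabeled part by rate."""
    return (1 - rate) * imp_labeled + rate * imp_unlabeled

labeled, unlabeled = split_by_label(LABELS)
rate = missing_rate(LABELS)
print(unlabeled, round(rate, 4))  # ['t3', 't9'] 0.2222
```

Under this assumed weighting, the unlabeled part gains influence as more labels go missing; the paper's own definition of the two importance types should be consulted before relying on the exact coefficients.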

Table 4. Basic information about the datasets.

ID	Dataset	Abbreviation	Samples	Attributes	Classes
1	Arrhythmia	Arr	452	279	16
2	Audit risk	Aud	776	26	2
3	Australian Credit Approval	ACA	690	14	2
4	Diabetic Retinopathy Debrecen	DRD	1151	19	2
5	Ozone-Level Detection	OLD	2534	73	2
6	Parkinson Speech	PS	1040	26	2
7	Phishing Websites	PW	2456	30	2
8	Image Segmentation	IS	2310	19	7
9	Spambase	Spa	4601	57	2
10	Molecular Biology	MB	3910	61	3
11	Connect-4	Con	67,557	42	3

Table 5. Number of selected attributes.

Dataset	Raw	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	279	3	3	5	5	5	4	2
Aud	26	1	5	6	5	6	4	4
ACA	14	3	6	6	5	4	3	5
DRD	19	3	6	6	4	8	4	3
OLD	73	4	5	6	6	9	5	5
PS	26	1	6	5	6	5	5	6
PW	30	30	20	18	17	10	17	17
IS	19	4	7	8	7	8	6	7
Spa	57	–	7	9	8	9	15	16
MB	61	14	10	9	9	11	10	16
Con	42	–	15	9	7	6	11	10
Average	58.73	7	7.73	7.91	7.18	7.36	7.64	8.27

Table 6. Assessing the accuracy of classification with BT.

Dataset	Raw Data	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	0.6925	0.4115	0.4270	0.5819	0.5509	0.5420	0.5819	0.5941
Aud	0.9936	0.9948	0.7513	0.9601	0.9803	0.9510	0.9987	0.9472
ACA	0.8014	0.6232	0.7014	0.8414	0.8406	0.6362	0.7188	0.7667
DRD	0.4961	0.5169	0.3970	0.5222	0.5291	0.5613	0.6655	0.6299
OLD	0.9428	0.9140	0.8327	0.9238	0.9219	0.9183	0.9388	0.9357
PS	1	0.9825	0.9888	0.6587	0.9908	0.9721	1	0.9998
PW	0.9507	0.9552	0.9572	0.8583	0.9002	0.9507	0.9617	0.9581
IS	0.9623	0.9316	0.5442	0.9355	0.9272	0.9368	0.9606	0.9195
Spa	0.9211	0	0.8439	0.8213	0.8205	0.9144	0.9344	0.8496
MB	0.8398	0.8210	0.7721	0.6793	0.9194	0.7705	0.8903	0.9075
Con	0.7105	0	0.6757	0.7033	0.6750	0.6917	0.7190	0.7163
Average	0.8464	0.6501	0.7174	0.7714	0.8233	0.8041	0.8518	0.8386

Table 7. Assessing the accuracy of classification with SVM.

Dataset	Raw Data	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	0.5420	0.4823	0.5398	0.5398	0.5225	0.5398	0.5730	0.5420
Aud	0.9704	0.6997	0.5979	0.7796	0.8453	0.8827	0.9088	0.8557
ACA	0.8565	0.6043	0.6725	0.8565	0.8536	0.6493	0.7478	0.7507
DRD	0.5256	0.5248	0.4387	0.4231	0.5265	0.5673	0.5795	0.5308
OLD	0.9365	0.9330	0.8465	0.8465	0.9248	0.9345	0.9369	0.9360
PS	0.9971	0.9250	0.9788	0.5827	0.9788	0.9865	0.9856	0.9962
PW	0.9491	0.9428	0.9483	0.9084	0.9292	0.9173	0.9540	0.9495
IS	0.9390	0.8043	0.4532	0.8394	0.8961	0.8887	0.9195	0.8491
Spa	0.9302	0	0.7440	0.7783	0.8572	0.8579	0.8970	0.8848
MB	0.5154	0.6890	0.7129	0.5188	0.7505	0.5730	0.6433	0.706
Con	0.6620	0	0.6633	0.6583	0.6597	0.6610	0.6710	0.7406
Average	0.8022	0.6005	0.6905	0.7029	0.7949	0.7689	0.8015	0.7947

Table 8. Assessing the accuracy of classification with KNN.

Dataset	Raw Data	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	0.5686	0.5066	0.4845	0.4115	0.5730	0.5465	0.5530	0.5486
Aud	0.9704	0.9175	0.8840	0.9639	0.9897	0.9046	0.9459	0.9240
ACA	0.8406	0.6072	0.6623	0.8377	0.8333	0.6101	0.7137	0.7246
DRD	0.5934	0.5222	0.5543	0.6299	0.6690	0.6142	0.6681	0.6299
OLD	0.9325	0.9333	0.9321	0.9325	0.9294	0.9309	0.9353	0.9369
PS	0.9635	0.9904	0.9894	0.5702	0.9942	0.9885	0.9923	0.9913
PW	0.9426	0.9426	0.9340	0.8905	0.9312	0.9283	0.9450	0.9393
IS	0.9299	0.8844	0.5039	0.8671	0.9056	0.9000	0.9429	0.9325
Spa	0.9078	0	0.7772	0.6846	0.8765	0.8450	0.8911	0.9031
MB	0.6524	0.6824	0.5194	0.5166	0.8025	0.5909	0.6909	0.6542
Con	0.6377	0	0.6563	0.6213	0.5543	0.5643	0.6657	0.6583
Average	0.8127	0.6351	0.7179	0.7205	0.8235	0.7658	0.8131	0.8039

Table 9. AUC under classifier BT.

Dataset	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	0.6314	0.6413	0.5478	0.5628	0.6624	0.8256	0.7145
Aud	0.9543	0.9198	0.9958	0.9997	0.9809	0.9999	0.9652
ACA	0.6282	0.7769	0.9237	0.9089	0.7266	0.6923	0.7074
DRD	0.5588	0.6172	0.5435	0.5883	0.5350	0.7119	0.6802
OLD	0.5909	0.7037	0.6743	0.7257	0.7016	0.7414	0.7811
PS	1	1	0.5900	1	0.9479	1	1
PW	0	0.9913	0.9318	0.9875	0.9654	0.9921	0.9938
IS	0.9948	0.8044	0.9975	0.9991	0.9800	0.9999	0.9880
Spa	0	0.8683	0.8415	0.8056	0.8133	0.9247	0.8689
MB	0.9062	0.8888	0.6925	0.9821	0.7607	0.9058	0.8703
Con	0	0.7843	0.7589	0.6411	0.8266	0.9470	0.9485
Average	0.5695	0.8178	0.7725	0.8364	0.8091	0.8855	0.8653

Table 10. AUC under classifier SVM.

Dataset	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	0.6192	0.5925	0.5370	0.5613	0.6160	0.7329	0.6666
Aud	0.9970	0.9231	0.9701	0.9684	0.9809	0.9596	0.9433
ACA	0.6282	0.7769	0.9237	0.9089	0.7141	0.6923	0.7074
DRD	0.5588	0.6172	0.5435	0.5883	0.6531	0.7119	0.6802
OLD	0.3871	0.4045	0.5438	0.3877	0.5860	0.6171	0.5355
PS	0.9365	0.9745	0.6195	0.9677	0.7426	0.9994	1
PW	0	0.9854	0.9578	0.9747	0.9400	0.9851	0.9852
IS	0.9319	0.7414	0.9987	0.9957	0.9651	0.9960	0.9284
Spa	0	0.8699	0.8505	0.9302	0.9176	0.9443	0.9340
MB	0.6404	0.8332	0.5854	0.8092	0.7607	0.6409	0.6947
Con	0	0.6556	0.5337	0.5573	0.5752	0.7551	0.6933
Average	0.5181	0.7613	0.7331	0.7863	0.7683	0.8213	0.7971

Table 11. Comparison of running time (unit: seconds).

Dataset	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	44.2646	79.0389	9.5137	173.4523	2.0570	0.6715	1.2215
Aud	1.5412	2.2547	3.2974	39.0163	0.66725	0.2792	0.5720
ACA	0.9666	0.6114	2.3818	37.8134	0.32245	0.1488	0.1587
DRD	2.4309	4.1497	7.4338	26.0488	0.79814	0.3184	0.7071
OLD	132.5813	313.4274	50.0988	165.3208	8.3492	3.3065	3.5238
PS	3.2669	6.2606	5.7542	40.4705	6.3004	0.3036	1.0441
IS	42.6029	68.5449	27.2511	29.0793	3.9428	1.2997	4.0554
MB	301.1410	88.9425	73.0712	122.0196	11.9561	16.0032	19.6206
Average	66.0994	70.4038	22.3503	79.1526	4.2992	2.7914	3.8629

Table 12. The ranking of classification accuracies with BT.

Dataset	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	7	6	2.5	4	5	2.5	1
Aud	2	7	4	3	5	1	6
ACA	7	5	1	2	6	4	3
DRD	6	7	5	4	3	1	2
OLD	6	7	3	4	5	1	2
PS	5	4	7	3	6	1	2
PW	4	3	7	6	5	1	2
IS	4	7	3	5	2	1	6
Spa	7	4	5	6	2	1	3
MB	4	5	7	1	6	3	2
Con	7	5	3	6	4	1	2
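The rankings in Table 12 follow from Table 6: on each dataset, the seven selection methods are ranked by accuracy (highest accuracy gets rank 1), and tied methods share the mean of the ranks they span, as SADA and SARM1 do on Arr (2.5 each). A minimal pure-Python sketch of that rule, applied to two rows of Table 6; the helper name `rank_desc` is hypothetical, and the Raw Data column is left out because Table 12 ranks only the selection methods.

```python
def rank_desc(scores):
    """Rank scores so the largest gets rank 1; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Grow the tie block while the next score equals the current one
        # (values are reported to four decimals, so exact equality is used).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks

# BT accuracies from Table 6, columns in the order
# FSRS, FSDIS, SADA, FSFS, Semi2MNR, SARM1, SARM2.
BT = {
    "Arr": [0.4115, 0.4270, 0.5819, 0.5509, 0.5420, 0.5819, 0.5941],
    "Aud": [0.9948, 0.7513, 0.9601, 0.9803, 0.9510, 0.9987, 0.9472],
}
for name, row in BT.items():
    print(name, rank_desc(row))
```

Applied to every row of Tables 6, 7, and 8, the same rule should give the rank tables used for the Friedman test.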

Table 13. The ranking of classification accuracies with SVM.

Dataset	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	7	4	4	6	4	1	2
Aud	6	7	5	4	2	1	3
ACA	7	5	1	2	6	4	3
DRD	5	6	7	4	2	1	3
OLD	4	6.5	6.5	5	3	1	2
PS	6	4.5	7	4.5	2	3	1
PW	4	3	7	5	6	1	2
IS	6	7	5	2	3	1	4
Spa	7	6	5	4	3	1	2
MB	4	2	7	1	6	5	3
Con	7	3	6	5	4	2	1

Table 14. The ranking of classification accuracies with KNN.

Dataset	FSRS	FSDIS	SADA	FSFS	Semi2MNR	SARM1	SARM2
Arr	5	6	7	1	4	2	3
Aud	5	7	2	1	6	3	4
ACA	7	5	1	2	6	4	3
DRD	7	6	3.5	1	5	2	3.5
OLD	3	5	4	7	6	2	1
PS	4	5	7	1	6	2	3
PW	2	4	7	5	6	1	3
IS	5	7	6	3	4	1	2
Spa	7	5	6	3	4	2	1
MB	3	6	7	1	5	2	4
Con	7	3	4	6	5	1	2

Table 15. Friedman test of experimental results under three classifiers.

Classifier	Source	SS	df	MS	$χ^{2}$	p-Value
BT	Groups	126.32	6	21.05	27.11	0.0001
	Error	181.18	60	3.02
	Total	307.5	76
SVM	Groups	145.23	6	24.2	31.43	0.00002
	Error	159.77	60	2.66
	Total	305	76
KNN	Groups	135.32	6	22.55	29.04	0.00006
	Error	172.18	60	2.87
	Total	307.5	76
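Table 15 is a Friedman test on the rank tables. In the usual tie-corrected form, $χ^{2}$ = SS_Groups / (SS_Total / (n(k - 1))), where n = 11 datasets, k = 7 methods, SS_Groups = n Σ_j (R̄_j - (k + 1)/2)² over the average ranks R̄_j, and SS_Total sums (r_ij - (k + 1)/2)² over all individual ranks. The Python sketch below applies this to the BT ranks of Table 12 and gives SS_Groups ≈ 126.32 and $χ^{2}$ ≈ 27.11, in line with the BT row of Table 15. The function names are illustrative; the p-value uses the exact closed form of the chi-square upper tail, which holds for an even number of degrees of freedom (here k - 1 = 6).

```python
import math

# Table 12 (BT): one row per dataset, columns in the order
# FSRS, FSDIS, SADA, FSFS, Semi2MNR, SARM1, SARM2.
RANKS_BT = [
    [7, 6, 2.5, 4, 5, 2.5, 1], [2, 7, 4, 3, 5, 1, 6], [7, 5, 1, 2, 6, 4, 3],
    [6, 7, 5, 4, 3, 1, 2], [6, 7, 3, 4, 5, 1, 2], [5, 4, 7, 3, 6, 1, 2],
    [4, 3, 7, 6, 5, 1, 2], [4, 7, 3, 5, 2, 1, 6], [7, 4, 5, 6, 2, 1, 3],
    [4, 5, 7, 1, 6, 3, 2], [7, 5, 3, 6, 4, 1, 2],
]

def friedman(ranks):
    """Tie-corrected Friedman statistic from an n x k rank matrix."""
    n, k = len(ranks), len(ranks[0])
    centre = (k + 1) / 2  # expected rank under the null hypothesis
    col_means = [sum(row[j] for row in ranks) / n for j in range(k)]
    ss_groups = n * sum((m - centre) ** 2 for m in col_means)
    ss_total = sum((r - centre) ** 2 for row in ranks for r in row)
    chi2 = ss_groups / (ss_total / (n * (k - 1)))
    return ss_groups, ss_total, chi2

def chi2_sf_even_df(x, df):
    """Chi-square upper tail P(X > x); exact closed form valid for even df only."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

ss_g, ss_t, chi2 = friedman(RANKS_BT)
p = chi2_sf_even_df(chi2, df=len(RANKS_BT[0]) - 1)
print(round(ss_g, 2), ss_t, round(chi2, 2), f"{p:.1e}")
```

Feeding the rank tables of Tables 13 and 14 into the same function should give the SVM and KNN rows in the same way.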

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
