Article

Multi-Label Attribute Reduction Based on Neighborhood Multi-Target Rough Sets

1 School of Computer Science, Minnan Normal University, Zhangzhou 363000, China
2 Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, China
3 School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, China
* Authors to whom correspondence should be addressed.
Symmetry 2022, 14(8), 1652; https://doi.org/10.3390/sym14081652
Submission received: 8 July 2022 / Revised: 27 July 2022 / Accepted: 4 August 2022 / Published: 10 August 2022
(This article belongs to the Section Computer)

Abstract

The rough set model has two symmetric approximations, the upper approximation and the lower approximation, which correspond to a concept's intension and extension, respectively. Multi-label learning challenges the rough set model: the model should be applied in a way that considers the correlations among labels, so the target concept should not be limited to a single one. This paper proposes a multi-target rough set model that considers label correlation (Neighborhood Multi-Target Rough Sets, NMTRS) and an attribute reduction approach based on NMTRS. First, some definitions of NMTRS are introduced. Second, some properties of NMTRS are discussed. Third, an attribute significance measure based on NMTRS is discussed. Fourth, attribute reduction approaches based on NMTRS are proposed. Finally, the efficiency and validity of the designed algorithms are verified by experiments, which show that our algorithm achieves considerable performance compared to state-of-the-art approaches.

1. Introduction

Since rough set theory was proposed by Pawlak [1] in 1982, it quickly became a hot topic in knowledge discovery and has been widely used in many applications such as classification [2,3,4,5,6], clustering [7,8,9,10,11], and attribute reduction [12]. It has two approximations corresponding to a target concept’s intension and extension, which shows symmetry. Various rough set models are based on different types of binary relations, such as multiple equivalence relations [13], general binary relations [14], and so on. Within these models, the neighborhood relation is outstanding for its ability to deal with both nominal and numerical attributes at the same time.
There are lots of works applying the neighborhood rough set model in various fields. Inbarani et al. [15] proposed a classification algorithm by using the neighborhood rough set model. For dynamic data mining, Zhang et al. [16] proposed a neighborhood rough set approach. Most relevant works applied the neighborhood rough set model to attribute reduction tasks.
Attribute reduction, or feature selection, is a traditional but essential machine learning task. Attribute reduction approaches try to select some features from the raw attribute set without harming the data's information representation ability. These approaches have made remarkable achievements in eliminating noise and improving learning efficiency. For attribute reduction tasks, there are different types of work. The first type is single-label attribute reduction. Hu et al. [17] proposed an approach for attribute reduction based on neighborhood rough sets. A quick attribute reduction algorithm was proposed by Yong et al. [18] based on the neighborhood rough set model. Additionally, there are parallel attribute reduction approaches [19], online streaming attribute reduction [20], and so on. These attribute reduction approaches were proposed based on the classic or extended neighborhood rough set model.
The second type of attribute reduction task is multi-label attribute reduction. These methods use various strategies to handle the multiple labels in a multi-label learning paradigm. For example, Sun et al. [21] proposed a multi-label attribute reduction approach that transforms the multi-label learning problem into a single-label one, ignoring the correlations within labels. Fisher [22] regarded the feature space and the label space as two different viewpoints of the data to improve the original dimension reduction method. By using the kernel matrix method, Wold [23] proposed a method similar to [22]. By combining mapping dimensionality reduction and sub-control dimensionality reduction, Zhang et al. [24] proposed a dimensionality reduction approach with a linear or non-linear kernel matrix. Based on PCA and genetic algorithms, Zhang et al. [25] proposed MLNB, which uses the naive Bayes method to select features for multi-label learning. Lin et al. [26] proposed an attribute reduction method based on a neighborhood rough set model. Meanwhile, an online multi-label attribute reduction method was proposed by Liu et al. [27] using the neighborhood rough set model. The f-neighborhood rough set model was used to derive a feature selection method for multi-label learning [28].
The attribute reduction methods designed for single-label or multi-label learning based on neighborhood rough sets all use the classic neighborhood rough set model or its extensions. All of these neighborhood rough set models are designed for a classic information system [29,30,31,32,33,34,35], which has only one decision attribute. None of them are designed for a multi-decision information system that simultaneously considers the correlation among labels.
In this paper, we propose a neighborhood multi-target rough set model and then design an attribute reduction algorithm based on it. We build the model by defining a global correlated target set to be the target group of the rough set model; the coefficient of the global correlated target set controls the relevance of different target concepts. We then use a conservative strategy to combine the correlated targets and define the rough set model. Using the proposed rough set model, an attribute significance measure can then be given, from which we derive the corresponding attribute reduction algorithm.
The contributions of this paper are as follows:
  • A neighborhood rough set model considering the label correlation is proposed for multi-label learning.
  • The properties of the proposed models are investigated.
  • An algorithm for calculating the approximations in the proposed rough set model is designed.
  • Attribute significance measure is given based on the rough set model we proposed.
  • Experiments are conducted to validate the efficiency and effectiveness of the proposed algorithms.
The rest of this paper is organized as follows. Some basic concepts of NMTRS are introduced, and their properties are discussed in Section 2. In Section 3, the attribute significance measure is given, along with some discussions about it, and the corresponding attribute reduction algorithms are derived by the significance measure. All the algorithms are evaluated in Section 4. Finally, we conclude the whole paper in Section 5.

2. Neighborhood Multi-Target Rough Sets

In this section, some concepts associated with our proposed model are introduced and then the properties of the proposed model are discussed.

2.1. Definitions

In this subsection, the definitions of neighborhood multi-target rough sets are introduced.
Definition 1.
[1] Suppose $U$ is a finite universe and $A = \{ a_1, a_2, \ldots, a_m \}$ is an attribute set; then $(U, A)$ is an information system.
Definition 2.
[35] (Set Correlation, SC) Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. The set correlation is defined by
$$SR(X_i, X_j) = \frac{|X_i \cap X_j|}{|X_i|}, \quad i, j \le r \quad (\textit{Relative Correlation}), \quad \text{or}$$
$$SR(X_i, X_j) = \frac{|X_i \cap X_j|}{|X_i \cup X_j|}, \quad i, j \le r \quad (\textit{Absolute Correlation}).$$
Definition 3.
(Global Correlated Target Set, GCTS) Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. Then, $X$ is a global correlated target set if and only if, for all $X_i, X_j \in X$, $SR(X_i, X_j) > \alpha$ or $SR(X_j, X_i) > \alpha$, where $\alpha \in (0, 1]$ is the Correlation Control Parameter (CCP) among targets; it controls the degree of relevance among targets in the target group.
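To make Definitions 2 and 3 concrete, the following is a minimal Python sketch (ours, not part of the original paper; the function names are hypothetical, and the paper's own experiments used MATLAB) that computes the relative set correlation and checks the GCTS condition pairwise:

```python
from itertools import combinations

def relative_correlation(Xi, Xj):
    """SR(Xi, Xj) = |Xi intersect Xj| / |Xi| (Relative Correlation, Definition 2)."""
    return len(Xi & Xj) / len(Xi)

def is_gcts(targets, alpha):
    """Definition 3: X is a GCTS iff every pair (Xi, Xj) satisfies
    SR(Xi, Xj) > alpha or SR(Xj, Xi) > alpha."""
    return all(relative_correlation(Xi, Xj) > alpha or
               relative_correlation(Xj, Xi) > alpha
               for Xi, Xj in combinations(targets, 2))

# The two targets of Example 1 below (instances named by index):
X1, X2 = {2, 3, 5}, {1, 2, 5}
print(is_gcts([X1, X2], alpha=0.4))  # True, since SR(X1, X2) = 2/3 > 0.4
```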
Based on the definition of GCTS and neighborhood rough sets, we can define neighborhood multi-target rough sets accordingly.
Definition 4.
[34] (Neighborhood Class) Suppose $U$ is a finite universe and $(U, A)$ is an information system, where $A = A_C \cup A_N$, $A_C$ is the symbolic attribute set, and $A_N$ is the numerical attribute set. For all $x \in U$ and $\delta \ge 0$, the neighborhood class of $x$ can be defined as
$$(1) \; n_{A_C}(x) = \{ y \in U \mid \forall a \in A_C, y_a = x_a \};$$
$$(2) \; n_{A_N}(x) = \{ y \in U \mid d(x_{A_N}, y_{A_N}) \le \delta \};$$
$$(3) \; n_A(x) = n_{A_C \cup A_N}(x) = \{ y \in U \mid \forall a \in A_C, y_a = x_a \wedge d(x_{A_N}, y_{A_N}) \le \delta \},$$
where $y_a$ denotes the attribute value of instance $y$ on attribute $a$ and $y_{A_N}$ denotes the attribute values of instance $y$ on the attribute set $A_N$.
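As an illustration of Definition 4, here is a small sketch (ours, with hypothetical names): instances must agree on every symbolic attribute and lie within Euclidean distance $\delta$ on the numerical attributes. Applied to the data of Table 1 (with $a_1$, $a_3$ numerical and $a_2$ symbolic), it reproduces the neighborhoods used in Example 1 below.

```python
import numpy as np

def neighborhood(U_sym, U_num, i, delta):
    """n_A(x_i) per Definition 4: equal values on all symbolic attributes
    and Euclidean distance at most delta on the numerical attributes."""
    same_sym = np.all(U_sym == U_sym[i], axis=1)      # condition on A_C
    dist = np.linalg.norm(U_num - U_num[i], axis=1)   # condition on A_N
    return {int(j) for j in np.nonzero(same_sym & (dist <= delta))[0]}

# Table 1: a2 is symbolic; a1 and a3 are numerical; delta = 0.5
U_sym = np.array([["M"], ["F"], ["M"], ["M"], ["F"]])
U_num = np.array([[1.5, 1], [2, 2], [2, 2], [1.5, 2], [2, 2]])
print([sorted(neighborhood(U_sym, U_num, i, 0.5)) for i in range(5)])
# [[0], [1, 4], [2, 3], [2, 3], [1, 4]]  (0-based; matches Example 1)
```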
Definition 5.
(Neighborhood Multi-Target Rough Sets, NMTRS) Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. Let $X$ be a GCTS with correlation coefficient $\alpha$.
Then, the lower approximation of NMTRS is defined as:
$$\underline{R}_\delta^\alpha(X) = \left\{ x \in U \mid n_A(x) \subseteq X_1 \wedge n_A(x) \subseteq X_2 \wedge \cdots \wedge n_A(x) \subseteq X_r \right\}.$$
The upper approximation of NMTRS is defined as:
$$\overline{R}_\delta^\alpha(X) = \left\{ x \in U \mid n_A(x) \cap X_1 \neq \emptyset \vee n_A(x) \cap X_2 \neq \emptyset \vee \cdots \vee n_A(x) \cap X_r \neq \emptyset \right\}.$$
With the help of the CCP, we can organize different targets together. The lower approximation is a conservative approximation of the target group and requires all the targets to meet the same condition. In contrast, the upper approximation follows a liberal strategy and only requires one target of the target group to meet the condition.
Example 1.
A multi-label decision information system is given in Table 1 below; it has two labels, which are assumed to be two different target concepts. It is easy to verify that when $\alpha = 0.4$ the target group is a GCTS. To clarify the definition of NMTRS, we work through this example. We have
$$X_1 = \{x_2, x_3, x_5\}, \quad X_2 = \{x_1, x_2, x_5\}, \quad \alpha = 0.4, \quad \delta = 0.5, \quad SR(X_1, X_2) = 2/3 > \alpha = 0.4.$$
From Table 1 we obtain
$$n_A(x_1) = \{x_1\}, \quad n_A(x_2) = \{x_2, x_5\}, \quad n_A(x_3) = \{x_3, x_4\}, \quad n_A(x_4) = \{x_3, x_4\}, \quad n_A(x_5) = \{x_2, x_5\}.$$
From Definition 5,
$$n_A(x_1) \nsubseteq X_1, \; n_A(x_1) \subseteq X_2; \quad n_A(x_2) \subseteq X_1, \; n_A(x_2) \subseteq X_2; \quad n_A(x_3) \nsubseteq X_1, \; n_A(x_3) \nsubseteq X_2;$$
$$n_A(x_4) \nsubseteq X_1, \; n_A(x_4) \nsubseteq X_2; \quad n_A(x_5) \subseteq X_1, \; n_A(x_5) \subseteq X_2; \quad \text{so } \underline{R}_\delta^\alpha(X) = \{x_2, x_5\}.$$
Similarly,
$$n_A(x_1) \cap X_2 \neq \emptyset, \; n_A(x_2) \cap X_1 \neq \emptyset, \; n_A(x_3) \cap X_1 \neq \emptyset, \; n_A(x_4) \cap X_1 \neq \emptyset, \; n_A(x_5) \cap X_1 \neq \emptyset; \quad \text{so } \overline{R}_\delta^\alpha(X) = U.$$
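As a cross-check of Definition 5, a short sketch (ours) computes both approximations directly from the neighborhood classes of Example 1:

```python
def nmtrs_approximations(neigh, targets):
    """Definition 5. neigh: dict x -> n_A(x); targets: list of target sets.
    Lower: n_A(x) is a subset of every target; upper: n_A(x) meets some target."""
    lower = {x for x, n in neigh.items() if all(n <= Xi for Xi in targets)}
    upper = {x for x, n in neigh.items() if any(n & Xi for Xi in targets)}
    return lower, upper

neigh = {1: {1}, 2: {2, 5}, 3: {3, 4}, 4: {3, 4}, 5: {2, 5}}
X1, X2 = {2, 3, 5}, {1, 2, 5}
print(nmtrs_approximations(neigh, [X1, X2]))
# ({2, 5}, {1, 2, 3, 4, 5}): lower = {x2, x5}, upper = U, as in Example 1
```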

2.2. Properties

The properties of NMTRS are discussed in this subsection.
Proposition 1.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. Let $X$ be a GCTS with correlation coefficient $\alpha$. For the approximations of NMTRS:
(1) for all $X_i \in X$, $X_i = U$ if and only if $\underline{R}_\delta^\alpha(X) = U$;
(2) there exists $X_i \in X$ with $X_i = U$ if and only if $\overline{R}_\delta^\alpha(X) = U$;
(3) there exists $X_i \in X$ with $X_i = \emptyset$ if and only if $\underline{R}_\delta^\alpha(X) = \emptyset$;
(4) for all $X_i \in X$, $X_i = \emptyset$ if and only if $\overline{R}_\delta^\alpha(X) = \emptyset$;
(5) for all $X_i \in X$, $\underline{R}_\delta^\alpha(X) \subseteq X_i$;
(6) for all $X_i \in X$, $X_i \subseteq \overline{R}_\delta^\alpha(X)$.
Proof.
(1) For all $X_i \in X$, $X_i = U$ implies that for all $x \in U$ and all $i \in \{1, 2, \ldots, r\}$, $n_A(x) \subseteq X_i$, which implies $n_A(x) \subseteq X_1 \wedge n_A(x) \subseteq X_2 \wedge \cdots \wedge n_A(x) \subseteq X_r$; so $\underline{R}_\delta^\alpha(X) = U$.
(2) If there exists $X_i \in X$ with $X_i = U$, then for all $x \in U$ there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x) \cap X_i \neq \emptyset$, which implies $n_A(x) \cap X_1 \neq \emptyset \vee \cdots \vee n_A(x) \cap X_r \neq \emptyset$, i.e., $x \in \overline{R}_\delta^\alpha(X)$; so $\overline{R}_\delta^\alpha(X) = U$.
(3) If there exists $X_i \in X$ with $X_i = \emptyset$, then for all $x \in U$, $n_A(x) \nsubseteq X_i$ (since $x \in n_A(x)$), which implies $x \notin \underline{R}_\delta^\alpha(X)$; so $\underline{R}_\delta^\alpha(X) = \emptyset$.
(4) For all $X_i \in X$, $X_i = \emptyset$ implies that for all $x \in U$ and all $i \in \{1, 2, \ldots, r\}$, $n_A(x) \cap X_i = \emptyset$, which implies $x \notin \overline{R}_\delta^\alpha(X)$; so $\overline{R}_\delta^\alpha(X) = \emptyset$.
(5) For all $x \in \underline{R}_\delta^\alpha(X)$ and all $i \in \{1, 2, \ldots, r\}$, $n_A(x) \subseteq X_i$; since $x \in n_A(x)$, we have $x \in X_i$; so $\underline{R}_\delta^\alpha(X) \subseteq X_i$.
(6) For all $x \in X_i$, since $x \in n_A(x)$, we have $n_A(x) \cap X_i \neq \emptyset$, which implies $x \in \overline{R}_\delta^\alpha(X)$; so $X_i \subseteq \overline{R}_\delta^\alpha(X)$. □
Proposition 2.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $Z = \{ Z_1, Z_2, \ldots, Z_r \}$ is a finite target set such that $Z_i \subseteq U$ for all $Z_i \in Z$. Let $Z$ be a GCTS with correlation coefficient $\alpha$, and let $X, Y \subseteq Z$. For the approximations of NMTRS:
(1) $\underline{R}_\delta^\alpha(X \cup Y) \supseteq \underline{R}_\delta^\alpha(X) \cap \underline{R}_\delta^\alpha(Y)$;
(2) $\overline{R}_\delta^\alpha(X \cap Y) \subseteq \overline{R}_\delta^\alpha(X) \cap \overline{R}_\delta^\alpha(Y)$;
(3) $\underline{R}_\delta^\alpha(X \cap Y) \supseteq \underline{R}_\delta^\alpha(X) \cup \underline{R}_\delta^\alpha(Y)$;
(4) $\overline{R}_\delta^\alpha(X \cup Y) = \overline{R}_\delta^\alpha(X) \cup \overline{R}_\delta^\alpha(Y)$.
Proof. 
(1) For all $x \in \underline{R}_\delta^\alpha(X) \cap \underline{R}_\delta^\alpha(Y)$: for all $X_i \in X$, $n_A(x) \subseteq X_i$, and for all $Y_j \in Y$, $n_A(x) \subseteq Y_j$, which implies that for all $Z_k \in X \cup Y$, $n_A(x) \subseteq Z_k$, i.e., $x \in \underline{R}_\delta^\alpha(X \cup Y)$. Hence $\underline{R}_\delta^\alpha(X \cup Y) \supseteq \underline{R}_\delta^\alpha(X) \cap \underline{R}_\delta^\alpha(Y)$.
(2) For all $x \in \overline{R}_\delta^\alpha(X \cap Y)$: there exists $Z_k \in X \cap Y$ with $n_A(x) \cap Z_k \neq \emptyset$; since $Z_k \in X$ and $Z_k \in Y$, there exists $X_i \in X$ with $n_A(x) \cap X_i \neq \emptyset$ and there exists $Y_j \in Y$ with $n_A(x) \cap Y_j \neq \emptyset$, which implies $x \in \overline{R}_\delta^\alpha(X) \cap \overline{R}_\delta^\alpha(Y)$. Hence $\overline{R}_\delta^\alpha(X \cap Y) \subseteq \overline{R}_\delta^\alpha(X) \cap \overline{R}_\delta^\alpha(Y)$.
(3) For all $x \in \underline{R}_\delta^\alpha(X) \cup \underline{R}_\delta^\alpha(Y)$: either $n_A(x) \subseteq X_i$ for all $X_i \in X$, or $n_A(x) \subseteq Y_j$ for all $Y_j \in Y$; since $X \cap Y \subseteq X$ and $X \cap Y \subseteq Y$, in either case $n_A(x) \subseteq Z_k$ for all $Z_k \in X \cap Y$, which implies $x \in \underline{R}_\delta^\alpha(X \cap Y)$. Hence $\underline{R}_\delta^\alpha(X \cap Y) \supseteq \underline{R}_\delta^\alpha(X) \cup \underline{R}_\delta^\alpha(Y)$.
(4) $x \in \overline{R}_\delta^\alpha(X \cup Y)$ if and only if there exists $Z_k \in X \cup Y$ with $n_A(x) \cap Z_k \neq \emptyset$, if and only if there exists $X_i \in X$ with $n_A(x) \cap X_i \neq \emptyset$ or there exists $Y_j \in Y$ with $n_A(x) \cap Y_j \neq \emptyset$, if and only if $x \in \overline{R}_\delta^\alpha(X) \cup \overline{R}_\delta^\alpha(Y)$. Hence $\overline{R}_\delta^\alpha(X \cup Y) = \overline{R}_\delta^\alpha(X) \cup \overline{R}_\delta^\alpha(Y)$. □
Proposition 3.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $Z = \{ Z_1, Z_2, \ldots, Z_r \}$ is a finite target set such that $Z_i \subseteq U$ for all $Z_i \in Z$. Let $Z$ be a GCTS with correlation coefficient $\alpha$, and let $X \subseteq Y \subseteq Z$. For the approximations of NMTRS:
(1) $\underline{R}_\delta^\alpha(X) \supseteq \underline{R}_\delta^\alpha(Y)$;
(2) $\overline{R}_\delta^\alpha(X) \subseteq \overline{R}_\delta^\alpha(Y)$.
Proof. 
(1) $x \in \underline{R}_\delta^\alpha(Y)$ if and only if $n_A(x) \subseteq Z_k$ for all $Z_k \in Y$; since $X \subseteq Y$, this implies $n_A(x) \subseteq Z_k$ for all $Z_k \in X$, i.e., $x \in \underline{R}_\delta^\alpha(X)$. Hence $\underline{R}_\delta^\alpha(X) \supseteq \underline{R}_\delta^\alpha(Y)$.
(2) $x \in \overline{R}_\delta^\alpha(X)$ if and only if there exists $Z_k \in X$ with $n_A(x) \cap Z_k \neq \emptyset$; since $X \subseteq Y$, there exists $Z_k \in Y$ with $n_A(x) \cap Z_k \neq \emptyset$, which implies $x \in \overline{R}_\delta^\alpha(Y)$. Hence $\overline{R}_\delta^\alpha(X) \subseteq \overline{R}_\delta^\alpha(Y)$. □

2.3. Approximation Computation of NMTRS

In this section, we propose an approach for computing the approximations of NMTRS. We first derive some corresponding results for the approximation computation and then design an algorithm for calculating the approximations of NMTRS.
Definition 6.
[36] Suppose $U$ is a finite universe and $P \subseteq U$; the matrix representation of the set $P$ is defined as
$$P = (p_j)_{n \times 1}, \quad j \in \{1, 2, \ldots, n\}, \; n = |U|, \quad \text{where } p_j = \begin{cases} 0, & x_j \notin P \\ 1, & x_j \in P \end{cases}.$$
Example 2.
Continuing Example 1, by Definition 6 we have
$$n_A(x_1) = (1, 0, 0, 0, 0)^\top, \; n_A(x_2) = (0, 1, 0, 0, 1)^\top, \; n_A(x_3) = (0, 0, 1, 1, 0)^\top, \; n_A(x_4) = (0, 0, 1, 1, 0)^\top, \; n_A(x_5) = (0, 1, 0, 0, 1)^\top,$$
where $M^\top$ denotes the transpose of a matrix $M$.
Lemma 1.
Suppose $X, Y \subseteq U$. Then
(1) $X \cap Y \neq \emptyset$ if and only if $X^\top \cdot Y > 0$;
(2) $X \subseteq Y$ if and only if $X^\top \cdot (\sim Y) = 0$,
where $\sim Y$ is the complement of $Y$, $X^\top$ is the transpose of the matrix $X$, and $\cdot$ is the scalar (dot) product of vectors.
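In numpy, Lemma 1 becomes two one-liners (our sketch): with 0/1 vectors, a dot product tests intersection, and a dot product with the complement tests inclusion.

```python
import numpy as np

X = np.array([0, 1, 0, 0, 1])    # characteristic vector of {x2, x5}
Y = np.array([0, 1, 1, 0, 1])    # characteristic vector of {x2, x3, x5}
print(X @ Y > 0)         # True: X and Y intersect  (Lemma 1, part 1)
print(X @ (1 - Y) == 0)  # True: X is a subset of Y (Lemma 1, part 2)
```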
Lemma 1 yields a useful property for computing the approximations of NMTRS.
Theorem 1.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X$ is a GCTS. Then
(1) $x \in \underline{R}_\delta^\alpha(X)$ if and only if for all $i \in \{1, 2, \ldots, r\}$, $n_A(x)^\top \cdot (\sim X_i) = 0$;
(2) $x \in \overline{R}_\delta^\alpha(X)$ if and only if there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x)^\top \cdot X_i > 0$.
Proof. 
From Lemma 1,
(1) For all $i \in \{1, 2, \ldots, r\}$, $n_A(x)^\top \cdot (\sim X_i) = 0$ if and only if $n_A(x) \subseteq X_i$ for all $i \in \{1, 2, \ldots, r\}$, if and only if $x \in \underline{R}_\delta^\alpha(X)$.
(2) There exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x)^\top \cdot X_i > 0$ if and only if there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x) \cap X_i \neq \emptyset$, if and only if $x \in \overline{R}_\delta^\alpha(X)$. □
The matrix representation of Theorem 1 is given by the following definition.
Definition 7.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X$ is a GCTS. The lower approximation matrix of NMTRS can be defined as
$$\underline{H}(X) = (\underline{h}_{ij})_{r \times n}, \quad \text{where } \underline{h}_{ij} = \begin{cases} 1, & n_A(x_j)^\top \cdot (\sim X_i) = 0 \\ 0, & \text{otherwise} \end{cases}, \quad i \in \{1, \ldots, r\}, \; j \in \{1, \ldots, n\}.$$
The upper approximation matrix of NMTRS can be defined as
$$\overline{H}(X) = (\overline{h}_{ij})_{r \times n}, \quad \text{where } \overline{h}_{ij} = \begin{cases} 1, & n_A(x_j)^\top \cdot X_i > 0 \\ 0, & \text{otherwise} \end{cases}, \quad i \in \{1, \ldots, r\}, \; j \in \{1, \ldots, n\}.$$
Example 3.
Continuing Example 2, by Definition 6 we have
$$X_1 = (0, 1, 1, 0, 1)^\top, \; \sim X_1 = (1, 0, 0, 1, 0)^\top, \; X_2 = (1, 1, 0, 0, 1)^\top, \; \sim X_2 = (0, 0, 1, 1, 0)^\top.$$
By Definition 7 we have
$$n_A(x_1)^\top \cdot (\sim X_1) = (1, 0, 0, 0, 0) \cdot (1, 0, 0, 1, 0)^\top \neq 0, \quad n_A(x_1)^\top \cdot (\sim X_2) = (1, 0, 0, 0, 0) \cdot (0, 0, 1, 1, 0)^\top = 0;$$
$$n_A(x_2)^\top \cdot (\sim X_1) = 0, \; n_A(x_2)^\top \cdot (\sim X_2) = 0; \quad n_A(x_3)^\top \cdot (\sim X_1) \neq 0, \; n_A(x_3)^\top \cdot (\sim X_2) \neq 0;$$
$$n_A(x_4)^\top \cdot (\sim X_1) \neq 0, \; n_A(x_4)^\top \cdot (\sim X_2) \neq 0; \quad n_A(x_5)^\top \cdot (\sim X_1) = 0, \; n_A(x_5)^\top \cdot (\sim X_2) = 0;$$
then
$$\underline{H}(X) = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}.$$
Example 4.
Continuing Example 3, by Definition 7 we have
$$n_A(x_1)^\top \cdot X_1 = (1, 0, 0, 0, 0) \cdot (0, 1, 1, 0, 1)^\top = 0, \quad n_A(x_1)^\top \cdot X_2 = (1, 0, 0, 0, 0) \cdot (1, 1, 0, 0, 1)^\top > 0;$$
$$n_A(x_2)^\top \cdot X_1 > 0, \; n_A(x_2)^\top \cdot X_2 > 0; \quad n_A(x_3)^\top \cdot X_1 > 0, \; n_A(x_3)^\top \cdot X_2 = 0;$$
$$n_A(x_4)^\top \cdot X_1 = (0, 0, 1, 1, 0) \cdot (0, 1, 1, 0, 1)^\top > 0 \text{ (since } x_3 \in n_A(x_4) \cap X_1\text{)}, \; n_A(x_4)^\top \cdot X_2 = 0; \quad n_A(x_5)^\top \cdot X_1 > 0, \; n_A(x_5)^\top \cdot X_2 > 0;$$
then
$$\overline{H}(X) = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}.$$
Then, we can easily obtain a theorem for computing approximations in NMTRS.
Theorem 2.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X$ is a GCTS. For $i \in \{1, 2, \ldots, r\}$ and $j \in \{1, 2, \ldots, n\}$ we have
(1) $x_j \in \underline{R}_\delta^\alpha(X)$ if and only if $\bigwedge_{i=1}^{r} \underline{h}_{ij} = 1$;
(2) $x_j \in \overline{R}_\delta^\alpha(X)$ if and only if $\bigvee_{i=1}^{r} \overline{h}_{ij} = 1$.
Proof. 
From Lemma 1,
(1) $x_j \in \underline{R}_\delta^\alpha(X)$ if and only if for all $i \in \{1, 2, \ldots, r\}$, $n_A(x_j)^\top \cdot (\sim X_i) = 0$, if and only if $\underline{h}_{ij} = 1$ for all $i \in \{1, 2, \ldots, r\}$, i.e., $\bigwedge_{i=1}^{r} \underline{h}_{ij} = 1$.
(2) $x_j \in \overline{R}_\delta^\alpha(X)$ if and only if there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x_j)^\top \cdot X_i > 0$, if and only if $\overline{h}_{ij} = 1$ for some $i \in \{1, 2, \ldots, r\}$, i.e., $\bigvee_{i=1}^{r} \overline{h}_{ij} = 1$. □
Example 5.
Continuing Example 4, by Theorem 2 we have
$$\underline{R}_\delta^\alpha(X) = (0, 1, 0, 0, 1) \wedge (1, 1, 0, 0, 1) = (0, 1, 0, 0, 1), \qquad \overline{R}_\delta^\alpha(X) = (0, 1, 1, 1, 1) \vee (1, 1, 0, 0, 1) = (1, 1, 1, 1, 1);$$
hence $\underline{R}_\delta^\alpha(X) = \{x_2, x_5\}$ and $\overline{R}_\delta^\alpha(X) = U$, in agreement with Example 1.
Based on Theorem 2, we propose the matrix-based Algorithm 1 for computing the approximations of a particular target concept group. The total time complexity of Algorithm 1 is $\Theta(rn^2)$: Steps 2–15 calculate $\underline{H}$ and $\overline{H}$ with time complexity $\Theta(rn^2)$.
Algorithm 1: Computing approximations of NMTRS
Input: $U$, $A$, $X = \{X_1, X_2, \ldots, X_r\}$, and $n_A(x_j)$ for all $x_j \in U$
Output: $\overline{R}_\delta^\alpha(X)$, $\underline{R}_\delta^\alpha(X)$
1: $n \leftarrow |U|$, $r \leftarrow |X|$
2: for $i = 1$ to $r$
3:  for $j = 1$ to $n$
4:   if $n_A(x_j)^\top \cdot (\sim X_i) = 0$ then
5:    $\underline{h}_{ij} = 1$
6:   else
7:    $\underline{h}_{ij} = 0$
8:   end if
9:   if $n_A(x_j)^\top \cdot X_i > 0$ then
10:    $\overline{h}_{ij} = 1$
11:   else
12:    $\overline{h}_{ij} = 0$
13:   end if
14:  end for
15: end for
16: $\underline{R}_\delta^\alpha(X) = \bigwedge_{i=1}^{r} \underline{h}_i = \left( \bigwedge_{i=1}^{r} \underline{h}_{i1}, \bigwedge_{i=1}^{r} \underline{h}_{i2}, \ldots, \bigwedge_{i=1}^{r} \underline{h}_{in} \right)$
17: $\overline{R}_\delta^\alpha(X) = \bigvee_{i=1}^{r} \overline{h}_i = \left( \bigvee_{i=1}^{r} \overline{h}_{i1}, \bigvee_{i=1}^{r} \overline{h}_{i2}, \ldots, \bigvee_{i=1}^{r} \overline{h}_{in} \right)$
18: Return $\overline{R}_\delta^\alpha(X)$, $\underline{R}_\delta^\alpha(X)$
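A compact numpy sketch of Algorithm 1 (ours; the function name is hypothetical). Encoding all neighborhoods and targets as 0/1 matrices turns the tests of Definition 7 into a single matrix product each, and Theorem 2 reduces the approximations to a column-wise AND/OR:

```python
import numpy as np

def nmtrs_matrix_approximations(N, X):
    """Algorithm 1 via Definition 7 and Theorem 2.
    N: (n, n) 0/1 matrix; row j is the characteristic vector of n_A(x_j).
    X: (r, n) 0/1 matrix; row i is the characteristic vector of target X_i.
    Returns (lower, upper) as 0/1 vectors of length n."""
    H_low = (N @ (1 - X.T) == 0).astype(int).T  # h_ij = 1 iff n_A(x_j) . (~X_i) = 0
    H_up = (N @ X.T > 0).astype(int).T          # h_ij = 1 iff n_A(x_j) . X_i > 0
    return H_low.min(axis=0), H_up.max(axis=0)  # AND / OR over the r targets

# The data of Examples 2-5 (Table 1):
N = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0], [0, 1, 0, 0, 1]])
X = np.array([[0, 1, 1, 0, 1],   # X1 = {x2, x3, x5}
              [1, 1, 0, 0, 1]])  # X2 = {x1, x2, x5}
print(nmtrs_matrix_approximations(N, X))
# (array([0, 1, 0, 0, 1]), array([1, 1, 1, 1, 1])): lower {x2, x5}, upper U
```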

3. Attribute Reduction Based on NMTRS

In this section, we propose an attribute reduction method based on NMTRS. We derive the attribute significance measure, then design an algorithm for calculating the reduction.

3.1. Attribute Significance Measure Based on NMTRS

Suppose $U = \{x_1, x_2, \ldots, x_n\}$ is the instance space and $L = \{l_1, l_2, \ldots, l_r\}$ is the label space. $T = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$ means that the label vector $y_i$ is associated with the instance $x_i$. The tuple $(U, A, L)$ denotes the multi-label information system.
Definition 8.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Regarding each $l_i$ ($i = 1, 2, \ldots, r$) as the matrix representation of a certain decision set, let $\phi(L) = \{X_1, X_2, \ldots\}$ be the GCTS group on $L$. For any $B \subseteq A$, the positive region of NMTRS is defined as
$$\mathrm{Pos}_B(\phi(L)) = \bigcup \left\{ \underline{R_B}{}_\delta^\alpha(X) \mid X \in \phi(L) \right\};$$
the dependence of NMTRS on the attribute set $B$ can then be defined as
$$\gamma_B(\phi(L)) = \frac{|\mathrm{Pos}_B(\phi(L))|}{|U|};$$
and, for all $c \in A - B$, the conditional attribute significance of $c$ with respect to the attribute set $B$ can be calculated as
$$sig_\gamma(c, B, L) = \gamma_{B \cup \{c\}}(\phi(L)) - \gamma_B(\phi(L)).$$
It can easily be seen that $sig_\gamma(c, B, L) \ge 0$; if $sig_\gamma(c, B, L) = 0$, we say that $c$ is an unnecessary attribute.
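A sketch of Definition 8 in the same matrix style (ours; all names are hypothetical, and we assume, as is usual for neighborhood rough set models, that the positive region is the union of the lower approximations over the GCTS groups):

```python
import numpy as np

def dependence(neigh_fn, n, B, gcts_groups):
    """gamma_B(phi(L)) = |Pos_B(phi(L))| / |U| (Definition 8).
    neigh_fn(B, j): 0/1 vector of n_B(x_j); gcts_groups: list of (r, n)
    0/1 target matrices, one matrix per GCTS in phi(L)."""
    N = np.array([neigh_fn(B, j) for j in range(n)])  # neighborhoods under B
    pos = np.zeros(n, dtype=int)
    for X in gcts_groups:  # Pos = union of lower approximations
        pos |= (N @ (1 - X.T) == 0).all(axis=1).astype(int)
    return pos.sum() / n

def significance(neigh_fn, n, B, c, gcts_groups):
    """sig_gamma(c, B, L) = gamma_{B union {c}}(phi(L)) - gamma_B(phi(L))."""
    return (dependence(neigh_fn, n, B | {c}, gcts_groups)
            - dependence(neigh_fn, n, B, gcts_groups))
```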

3.2. Multi-Granulation Discrimination Analysis Based on NMTRS

Based on the NMTRS model, some discussions about multi-granulation discrimination are conducted in this section. Several propositions can be easily obtained.
Proposition 4.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Let $d$ be the Euclidean distance and $B_1 \subseteq B_2 \subseteq A$. The following properties hold:
(1) for all $x \in U$, $n_{B_2}(x) \subseteq n_{B_1}(x)$;
(2) for all $X \in \phi(L)$, $\underline{R_{B_1}}{}_\delta^\alpha(X) \subseteq \underline{R_{B_2}}{}_\delta^\alpha(X)$;
(3) $\mathrm{Pos}_{B_1}(\phi(L)) \subseteq \mathrm{Pos}_{B_2}(\phi(L))$;
(4) $\gamma_{B_1}(\phi(L)) \le \gamma_{B_2}(\phi(L))$.
Proposition 5.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Let $d$ be the Euclidean distance. For any $B_1 \subseteq B_2 \subseteq A$, if $x \in \mathrm{Pos}_{B_1}(\phi(L))$, then $x \in \mathrm{Pos}_{B_2}(\phi(L))$.
Proposition 6.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Let $d$ be the Euclidean distance. For any $0 \le \delta_2 \le \delta_1$ and $B \subseteq A$, the following properties hold:
(1) for all $x \in U$, $n_B^{\delta_2}(x) \subseteq n_B^{\delta_1}(x)$;
(2) for all $X \in \phi(L)$, $\underline{R_B}{}_{\delta_1}^\alpha(X) \subseteq \underline{R_B}{}_{\delta_2}^\alpha(X)$ and $\overline{R_B}{}_{\delta_2}^\alpha(X) \subseteq \overline{R_B}{}_{\delta_1}^\alpha(X)$;
(3) $\mathrm{Pos}_{\delta_1}(\phi(L)) \subseteq \mathrm{Pos}_{\delta_2}(\phi(L))$;
(4) $\gamma_{\delta_1}(\phi(L)) \le \gamma_{\delta_2}(\phi(L))$.
Proposition 7.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $L' \subseteq L$. Let $d$ be the Euclidean distance. For any $0 \le \delta_2 \le \delta_1$, the following properties hold:
(1) $\mathrm{Pos}_{\delta_1}(\phi(L)) \subseteq \mathrm{Pos}_{\delta_2}(\phi(L'))$;
(2) $\gamma_{\delta_1}(\phi(L)) \le \gamma_{\delta_2}(\phi(L'))$.

3.3. Attribute Reduction Algorithm Based on NMTRS

Based on the NMTRS model and its attribute significance measure, we derive the corresponding attribute reduction algorithm, which is named Algorithm 2.
Algorithm 2: Attribute reduction algorithm based on NMTRS
Input: $U$, $A$, $L$, $\alpha$, $\delta$, $\phi(L)$
Output: Reduction of the attribute set $A$
1: $Reduction \leftarrow \emptyset$
2: while $A \neq Reduction$
3:  for all $b \in A - Reduction$
4:   Calculate $sig_\gamma(b, Reduction, L)$ with Algorithm 1
5:  end for
6:  if $\max_{b \in A - Reduction} sig_\gamma(b, Reduction, L) \le 0$ then
7:   break
8:  end if
9:  $Reduction = Reduction \cup \{ \arg\max_{b \in A - Reduction} sig_\gamma(b, Reduction, L) \}$
10: end while
11: Return $Reduction$
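Algorithm 2 is a greedy forward selection driven by the significance measure; here is a minimal sketch (ours, reusing the hypothetical `significance` helper from Section 3.1):

```python
def attribute_reduction(neigh_fn, n, A, gcts_groups):
    """Algorithm 2: repeatedly add the attribute with the largest positive
    significance; stop when no remaining attribute improves the dependence."""
    reduction = set()
    while reduction != A:
        sigs = {b: significance(neigh_fn, n, reduction, b, gcts_groups)
                for b in A - reduction}
        best = max(sigs, key=sigs.get)
        if sigs[best] <= 0:  # no candidate is informative any more
            break
        reduction.add(best)
    return reduction
```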
The time complexity of Algorithm 2 is $\Theta(rn^2)$: the time complexity of calculating the neighborhood classes of all instances is $\Theta(n \log n)$, and the time complexity of calculating the attribute significance of an attribute is $\Theta(rn^2)$.

4. Experimental Evaluations

In this section, several experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms, namely Algorithm 2 (ours), Algorithm 3 (MLNB), Algorithm 4 (Laplacian Score) [37], Algorithm 5 (RelieF) [38], Algorithm 6 (MDDMproj), and Algorithm 7 (MDDMspc) [24]. Since MLNB and Laplacian Score only determine an attribute order, we make the number of chosen attributes the same as that of our algorithm. Six datasets were chosen from public repositories; their details are listed in Table 2. All experiments were carried out on a personal computer with 64-bit Windows 10, an Intel(R) Core(TM) i7-1065G7 CPU @ 1.30 GHz, and 16 GB memory. The code was developed by Wenbin Zheng in Zhangzhou, China; the programming language was MATLAB R2020a.

4.1. Comparison of Performance Measures Using ML-kNN

In this subsection, we use ML-kNN as the learning method, select features with each of the attribute reduction approaches, and conduct ten-fold cross-validation ten times on each dataset with the selected feature set.

4.1.1. Experimental Settings

To compare the performance of our approach with that of the comparison algorithms, we apply all of the attribute reduction algorithms to the six datasets using the recommended parameter configurations in [39]. For our method, we set α = 0.6 and δ = 0.5. The experimental results are shown in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8.

4.1.2. Discussions of the Experimental Results

From Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, we can observe that our algorithm achieves considerable performance on each dataset, while no algorithm is clearly superior on all datasets. Although our algorithm sometimes reaches the best performance, it generally ranks in the middle. The experiments show that our algorithm is valid and thus provides a new approach for multi-label feature selection.
For the dataset Bird Song, the MDDM algorithms perform better than the other algorithms; the performance of our algorithm is equal to that of MLNB, RelieF, and Laplacian Score.
For the dataset CAL-500, our algorithm performs equally well as MDDMproj and better than all the other algorithms with respect to Hamming loss.
For the dataset Emotions, our algorithm is the worst with respect to Hamming loss, ranking loss, and average precision, but performs better than RelieF, MDDMproj, and MDDMspc with respect to coverage.
For the dataset FGNET, the MDDM algorithms perform better, while our algorithm's performance is equal to that of the other algorithms.
For the dataset Water Quality Nom, our algorithm performs better than Laplacian Score and MDDMspc with respect to Hamming loss, better than Laplacian Score and MDDMspc with respect to one-error, and better than MDDMspc with respect to coverage.

4.2. Effect of Parameters α and δ

In this subsection, we conduct several experiments to analyze the effect of α and δ.

4.2.1. Experimental Settings

To analyze the effect of the parameters, we conduct several experiments: we gradually increase the value of the parameter α from 0 to 1 with a step of 0.05, and the value of δ from 0 to 2 with a step of 0.1. The experimental results are shown in Figure 1.

4.2.2. Discussions of the Experimental Results

From Figure 1 we can observe that the performance of the proposed algorithm decreases as α approaches 1, while the parameter δ does not have a significant effect on performance. This analysis provides the recommended parameter settings for our experiments.

4.3. Effect of Noise

In this subsection, we conduct several experiments to analyze the effect of noise.

4.3.1. Experimental Settings

To analyze the effect of noise, we conduct several experiments: we gradually increase the percentage of noise in the dataset Flags from 0 to 1 with a step of 0.05. The experimental results are shown in Figure 2.

4.3.2. Discussions of the Experimental Results

From Figure 2 we can observe that all the algorithms display similar performance as the noise percentage increases, and our proposed algorithm is not worse than the comparison algorithms. In other words, in terms of robustness, the performance of our algorithm is equivalent to that of the comparison algorithms.

5. Conclusions

Attribute reduction in multi-label learning has been a hot topic in recent years. We propose a novel rough set model for multi-label learning and investigate its properties. With the proposed model, a novel feature selection method for multi-label learning is obtained, a novel attribute reduction approach is provided, and its validity is verified by experiments.
In the big data era, calculating the reduction of an attribute set presents great challenges because of its high time complexity; dynamic attribute reduction algorithms can alleviate the problem by reusing previously calculated results. In future work, we will extend the proposed approach into a dynamic multi-label attribute reduction approach.

Author Contributions

Conceptualization, W.Z., S.L. and Y.L.; methodology, W.Z. and J.L.; software, W.Z.; validation, W.Z., S.L. and Y.L.; formal analysis, W.Z. and J.L.; investigation, W.Z. and S.L.; writing—review and editing, W.Z.; supervision, J.L. and S.L.; project administration, W.Z. and S.L.; funding acquisition, S.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported in part by the National Natural Science Foundation of China under Grant Nos. 11871259, 61379021 and 12101289, the Natural Science Foundation of Fujian Province under Nos. 2022J01912 and 2022J01306, the Institute of Meteorological Big Data-Digital Fujian, Fujian Key Laboratory of Data Science and Statistics, and Fujian Key Laboratory of Granular Computing and Applications.

Data Availability Statement

The data that support the findings of this study are openly available in the UC Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/index.php, http://mulan.sourceforge.net/datasets-mlc.html and https://www.uco.es/kdis/mllresources/, accessed on 16 April 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  2. Ziarko, W. Probabilistic approach to rough sets. Int. J. Approx. Reason. 2008, 49, 272–284. [Google Scholar] [CrossRef]
  3. Bazan, J.G.; Nguyen, H.S.; Nguyen, S.H.; Synak, P.; Wróblewski, J. Rough set algorithms in classification problem. In Rough Set Methods and Applications; Physica-Verlag: Heidelberg, Germany, 2000. [Google Scholar]
  4. Lingras, P. Unsupervised Rough Set Classification Using Gas. J. Intell. Inf. Syst. 2001, 16, 215–228. [Google Scholar] [CrossRef]
  5. Miao, D.; Duan, Q.; Zhang, H.; Jiao, N. Rough set based hybrid algorithm for text classification. Expert Syst. Appl. 2009, 36, 9168–9174. [Google Scholar] [CrossRef]
  6. Sharma, H.K.; Majumder, S.; Biswas, A.; Prentkovskis, O.; Kar, S.; Skačkauskas, P. A Study on Decision-Making of the Indian Railways Reservation System during COVID-19. J. Adv. Transp. 2022, 2022, 7685375. Available online: https://www.hindawi.com/journals/jat/2022/7685375/ (accessed on 7 July 2022). [CrossRef]
  7. Lingras, P. Rough set clustering for web mining. In Proceedings of the 2002 IEEE World Congress on Computational Intelligence, 2002 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE’02. Proceedings (Cat. No. 02CH37291), Honolulu, HI, USA, 12–17 May 2002. [Google Scholar]
  8. Lingras, P.; Peters, G. Applying rough set concepts to clustering. In Rough Sets: Selected Methods and Applications in Management and Engineering; Springer: London, UK, 2012; pp. 23–37. [Google Scholar]
  9. Parmar, D.; Wu, T.; Blackhurst, J. MMR: An algorithm for clustering categorical data using Rough Set Theory. Data Knowl. Eng. 2007, 63, 879–893. [Google Scholar] [CrossRef]
  10. Vidhya, K.A.; Geetha, T.V. Rough set theory for document clustering: A review. J. Intell. Fuzzy Syst. 2017, 32, 2165–2185. [Google Scholar] [CrossRef]
  11. Hedar, A.R.; Ibrahim, A.M.M.; Abdel-Hakim, A.E.; Sewisy, A.A. Modulated clustering using integrated rough sets and scatter search attribute reduction. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan, 15–19 July 2018; pp. 1394–1401. [Google Scholar]
  12. Xia, S.; Zhang, H.; Li, W.; Wang, G.; Giem, E.; Chen, Z. GBNRS: A Novel Rough Set Algorithm for Fast Adaptive Attribute Reduction in Classification. IEEE Trans. Knowl. Data Eng. 2022, 34, 1231–1242. [Google Scholar] [CrossRef]
  13. Qian, Y.; Liang, J.; Yao, Y.; Dang, C. MGRS: A multi-granulation rough set. Inf. Sci. 2010, 180, 949–970. [Google Scholar] [CrossRef]
  14. Yao, Y.; Yao, B. Covering based rough set approximations. Inf. Sci. 2012, 200, 91–107. [Google Scholar] [CrossRef]
  15. Kumar, S.U.; Inbarani, H.H. A Novel Neighborhood Rough Set Based Classification Approach for Medical Diagnosis. Procedia Comput. Sci. 2015, 47, 351–359. [Google Scholar] [CrossRef]
  16. Zhang, J.; Li, T.; Ruan, D.; Liu, D. Neighborhood rough sets for dynamic data mining. Int. J. Intell. Syst. 2012, 27, 317–342. [Google Scholar] [CrossRef]
  17. Hu, Q.; Yu, D.; Liu, J.; Wu, C. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594. [Google Scholar] [CrossRef]
  18. Yong, L.; Wenliang, H.; Yunliang, J.; Zeng, Z. Quick attribute reduce algorithm for neighborhood rough set model. Inf. Sci. 2014, 271, 65–81. [Google Scholar] [CrossRef]
  19. Chen, H.; Li, T.; Cai, Y.; Luo, C.; Fujita, H. Parallel attribute reduction in dominance-based neighborhood rough set. Inf. Sci. 2016, 373, 351–368. [Google Scholar] [CrossRef]
  20. Zhou, P.; Hu, X.; Li, P.; Wu, X. Online streaming feature selection using adapted Neighborhood Rough Set. Inf. Sci. 2018, 481, 258–279. [Google Scholar] [CrossRef]
  21. Sun, L.; Ji, S.; Ye, J. Multi-Label Dimensionality Reduction; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar] [CrossRef]
  22. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  23. Wold, H. Estimation of principal components and related models by iterative least squares. In Multivariate Analysis; Academic Press: Cambridge, MA, USA, 1966; pp. 391–420. [Google Scholar]
  24. Zhang, Y.; Zhou, Z.H. Multilabel dimensionality reduction via dependence maximization. ACM Trans. Knowl. Discov. Data 2010, 4, 1–21. [Google Scholar] [CrossRef]
  25. Zhang, M.-L.; Peña, J.M.; Robles, V. Feature selection for multi-label naïve Bayes classification. Inf. Sci. 2009, 179, 3218–3229. [Google Scholar] [CrossRef]
  26. Lin, Y.; Hu, Q.; Liu, J.; Chen, J.; Duan, J. Multi-label feature selection based on neighborhood mutual information. Appl. Soft Comput. 2016, 38, 244–256. [Google Scholar] [CrossRef]
  27. Liu, J.; Lin, Y.; Li, Y.; Weng, W.; Wu, S. Online multi-label streaming feature selection based on neighborhood rough set. Pattern Recognit. 2018, 84, 273–287. [Google Scholar] [CrossRef]
  28. Deng, Z.; Zheng, Z.; Deng, D.; Wang, T.; He, Y.; Zhang, D. Feature Selection for Multi-Label Learning Based on F-Neighborhood Rough Sets. IEEE Access 2020, 8, 39678–39688. [Google Scholar] [CrossRef]
  29. Al-Shami, T.M.; Ciucci, D. Subset neighborhood rough sets. Knowl. Based Syst. 2021, 237, 107868. [Google Scholar] [CrossRef]
  30. Chen, Y.; Xue, Y.; Ma, Y.; Xu, F. Measures of uncertainty for neighborhood rough sets. Knowl. Based Syst. 2017, 120, 226–235. [Google Scholar] [CrossRef]
  31. Wang, C.; Shi, Y.; Fan, X.; Shao, M. Attribute reduction based on k-nearest neighborhood rough sets. Int. J. Approx. Reason. 2018, 106, 18–31. [Google Scholar] [CrossRef]
  32. Li, Y.; Lin, Y.; Liu, J.; Weng, W.; Shi, Z.; Wu, S. Feature selection for multi-label learning based on kernelized fuzzy rough sets. Neurocomputing 2018, 318, 271–286. [Google Scholar] [CrossRef]
  33. Xu, J.; Shen, K.; Sun, L. Multi-label feature selection based on fuzzy neighborhood rough sets. Complex Intell. Syst. 2022, 8, 2105–2129. [Google Scholar] [CrossRef]
  34. Wang, Q.; Qian, Y.; Liang, X.; Guo, Q.; Liang, J. Local neighborhood rough set. Knowl. Based Syst. 2018, 153, 53–64. [Google Scholar] [CrossRef]
  35. Lin, G.; Qian, Y.; Li, J. NMGRS: Neighborhood-based multi-granulation rough sets. Int. J. Approx. Reason. 2012, 53, 1080–1093. [Google Scholar] [CrossRef]
  36. Ziarko, W. Variable precision rough set model. J. Comput. Syst. Sci. 1993, 46, 39–59. [Google Scholar] [CrossRef]
  37. He, X.; Cai, D.; Niyogi, P. Laplacian score for feature selection. Adv. Neural Inf. Process. Syst. 2005, 18, 1–8. Available online: https://proceedings.neurips.cc/paper/2005/file/b5b03f06271f8917685d14cea7c6c50a-Paper.pdf (accessed on 7 July 2022).
  38. Spolaôr, N.; Cherman, E.A.; Monard, M.C.; Lee, H.D. Relief for multi-label feature selection. In Proceedings of the 2013 Brazilian Conference on Intelligent Systems, Fortaleza, Brazil, 20–24 October 2013; IEEE: Manhattan, NY, USA, 2013; pp. 6–11. [Google Scholar]
  39. Liu, G.L. Axiomatic systems for rough sets and fuzzy rough sets. Int. J. Approx. Reason. 2008, 48, 857–867. [Google Scholar] [CrossRef]
Figure 1. Effect of parameters α and δ on the dataset Emotions.
Figure 2. Robustness while adding noise into the dataset Flags.
Table 1. An information system.

| $x_n$ | $a_1$ | $a_2$ | $a_3$ | $X_1$ | $X_2$ |
| $x_1$ | 1.5 | M | 1 | 0 | 1 |
| $x_2$ | 2 | F | 2 | 1 | 1 |
| $x_3$ | 2 | M | 2 | 1 | 0 |
| $x_4$ | 1.5 | M | 2 | 0 | 0 |
| $x_5$ | 2 | F | 2 | 1 | 1 |
Table 2. Details of datasets.

| No. | Dataset | Samples | Attributes | Labels |
| 1 | Bird Song | 4998 | 38 | 13 |
| 2 | CAL-500 | 502 | 68 | 174 |
| 3 | Emotions | 593 | 72 | 6 |
| 4 | Flags | 194 | 14 | 12 |
| 5 | FGNET | 1002 | 262 | 78 |
| 6 | Water Quality Nom | 1060 | 16 | 14 |
Table 3. Experimental results on Bird Song.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 1 | – | 0.60024 | 1.3229 | 0.61599 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 1 | – | 0.60024 | 1.3229 | 0.61599 |
| RelieF | 1 | – | 0.60024 | 1.3229 | 0.61599 |
| MDDMproj | 0.94641 | – | 0.22601 | 0.41394 | 0.86440 |
| MDDMspc | 0.94641 | – | 0.22601 | 0.41394 | 0.86440 |
Table 4. Experimental results on CAL-500.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.96755 | 0.18771 | 0.13001 | 132.03 | 0.48148 |
| MLNB | 0.96806 | 0.18763 | 0.12386 | 131.26 | 0.48137 |
| Laplacian Score | 0.96880 | 0.18701 | 0.11717 | 131.22 | 0.48484 |
| RelieF | 0.96870 | 0.18593 | 0.1208 | 131 | 0.48684 |
| MDDMproj | 0.96755 | 0.18771 | 0.13001 | 132.03 | 0.48148 |
| MDDMspc | 0.96806 | 0.18763 | 0.12386 | 131.26 | 0.48137 |
Table 5. Experimental results on Emotions.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.95468 | 0.51255 | 2.7671 | 0.62310 | 0.95468 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 0.95033 | 0.56044 | 2.9738 | 0.59137 | 0.95033 |
| RelieF | 0.89657 | 0.40958 | 2.4069 | 0.69350 | 0.89657 |
| MDDMproj | 0.89657 | 0.40958 | 2.4069 | 0.69350 | 0.89657 |
| MDDMspc | 0.88908 | 0.42924 | 2.3618 | 0.68983 | 0.88908 |
Table 6. Experimental results on Flags.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.81977 | 0.24048 | 6.1901 | 0.75653 | 0.81977 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 0.80916 | 0.23298 | 5.8892 | 0.76737 | 0.80916 |
| RelieF | 0.81433 | 0.23269 | 6.2364 | 0.75832 | 0.81433 |
| MDDMproj | 0.81416 | 0.21904 | 6.1963 | 0.76229 | 0.81416 |
| MDDMspc | 0.81416 | 0.21904 | 6.1963 | 0.76229 | 0.81416 |
Table 7. Experimental results on FGNET.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 1 | – | 0.93427 | 21.477 | 0.15097 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 1 | – | 0.93427 | 21.477 | 0.15097 |
| RelieF | 1 | – | 0.93427 | 21.477 | 0.15097 |
| MDDMproj | 1 | – | 0.95291 | 21.475 | 0.13413 |
| MDDMspc | 1 | – | 0.95291 | 21.475 | 0.13413 |
Table 8. Experimental results on Water Quality Nom.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.88209 | – | 0.33826 | 9.3556 | – |
| MLNB | – | – | – | – | – |
| Laplacian Score | 0.8825 | – | 0.35662 | 9.2995 | – |
| RelieF | 0.85437 | – | 0.32326 | 9.2675 | – |
| MDDMproj | 0.85437 | – | 0.32326 | 9.2675 | – |
| MDDMspc | 0.88296 | – | 0.36373 | 9.5964 | – |