Article

Multi-Label Attribute Reduction Based on Neighborhood Multi-Target Rough Sets

1 School of Computer Science, Minnan Normal University, Zhangzhou 363000, China
2 Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, China
3 School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, China
* Authors to whom correspondence should be addressed.
Symmetry 2022, 14(8), 1652; https://doi.org/10.3390/sym14081652
Submission received: 8 July 2022 / Revised: 27 July 2022 / Accepted: 4 August 2022 / Published: 10 August 2022
(This article belongs to the Section Computer)

Abstract

The rough set model has two symmetric approximations, the upper approximation and the lower approximation, which correspond to a concept's intension and extension, respectively. Multi-label learning challenges the rough set model: the model should be applied in a way that considers the correlations among labels, so the target concept should not be limited to a single one. This paper proposes a multi-target rough set model that considers label correlation (Neighborhood Multi-Target Rough Sets, NMTRS) and an attribute reduction approach based on NMTRS. First, some definitions of NMTRS are introduced. Second, some properties of NMTRS are discussed. Third, an attribute significance measure based on NMTRS is discussed. Fourth, attribute reduction approaches based on NMTRS are proposed. Finally, the efficiency and validity of the designed algorithms are verified by experiments, which show that our algorithm achieves considerable performance compared to state-of-the-art approaches.

1. Introduction

Since rough set theory was proposed by Pawlak [1] in 1982, it quickly became a hot topic in knowledge discovery and has been widely used in many applications such as classification [2,3,4,5,6], clustering [7,8,9,10,11], and attribute reduction [12]. It has two approximations corresponding to a target concept’s intension and extension, which shows symmetry. Various rough set models are based on different types of binary relations, such as multiple equivalence relations [13], general binary relations [14], and so on. Within these models, the neighborhood relation is outstanding for its ability to deal with both nominal and numerical attributes at the same time.
There are lots of works applying the neighborhood rough set model in various fields. Inbarani et al. [15] proposed a classification algorithm by using the neighborhood rough set model. For dynamic data mining, Zhang et al. [16] proposed a neighborhood rough set approach. Most relevant works applied the neighborhood rough set model to attribute reduction tasks.
Attribute reduction, or feature selection, is a traditional but essential machine learning task. Attribute reduction approaches try to select some features from the raw attribute set without harming the data's information representation ability. These approaches have made remarkable achievements in eliminating noise and improving learning efficiency. For attribute reduction tasks, there are different types of work. The first type is single-label attribute reduction. Hu et al. [17] proposed an approach for attribute reduction based on neighborhood rough sets. A quick attribute reduction algorithm was proposed by Yong et al. [18] based on the neighborhood rough set model. Additionally, there are parallel attribute reduction approaches [19], online streaming attribute reduction [20], and so on. These attribute reduction approaches were proposed based on the classic or extended neighborhood rough set model.
The second type of attribute reduction task is multi-label attribute reduction. These methods use various strategies to handle the multiple labels in a multi-label learning paradigm. For example, Sun et al. [21] proposed a multi-label attribute reduction approach that transforms the multi-label learning problem into a single-label one, ignoring the correlations within labels. Fisher [22] regarded the feature space and the label space as two different viewpoints of the data to improve the original dimension reduction method. By using the kernel matrix method, Wold [23] proposed a method similar to [22]. By combining mapping dimensionality reduction and sub-control dimensionality reduction, Zhang et al. [24] proposed a dimensionality reduction approach with a linear or non-linear kernel matrix. Based on PCA and genetic algorithms, Zhang et al. [25] proposed MLNB, which uses the naive Bayes method to select features for multi-label learning. Lin et al. [26] proposed an attribute reduction method based on a neighborhood rough set model. Meanwhile, an online multi-label attribute reduction method was proposed by Liu et al. [27] using the neighborhood rough set model. The f-neighborhood rough set model was used to derive a feature selection method for multi-label learning [28].
The attribute reduction methods designed for single-label or multi-label learning based on neighborhood rough sets all use the classic neighborhood rough set model or its extensions. All of these neighborhood rough set models are designed for a classic information system [29,30,31,32,33,34,35], which has only one decision attribute. None of them are designed for a multi-decision information system that simultaneously considers the correlation among labels.
In this paper, we propose a neighborhood multi-target rough set model and then design an attribute reduction algorithm based on it. We build the model by defining a global correlated target set to be the target group of the rough set model; the coefficient of the global correlated target set controls the relevance of different target concepts. We then use a conservative strategy to combine the correlated targets and define the rough set model. Using the proposed rough set model, an attribute significance measure can then be given, from which we derive the corresponding attribute reduction algorithm.
The contributions of this paper are as follows:
  • A neighborhood rough set model considering the label correlation is proposed for multi-label learning.
  • The properties of the proposed models are investigated.
  • An algorithm for calculating the approximations in the proposed rough set model is designed.
  • Attribute significance measure is given based on the rough set model we proposed.
  • Experiments are conducted to validate the efficiency and effectiveness of the proposed algorithms.
The rest of this paper is organized as follows. Some basic concepts of NMTRS are introduced, and their properties are discussed in Section 2. In Section 3, the attribute significance measure is given, along with some discussions about it, and the corresponding attribute reduction algorithms are derived by the significance measure. All the algorithms are evaluated in Section 4. Finally, we conclude the whole paper in Section 5.

2. Neighborhood Multi-Target Rough Sets

In this section, some concepts associated with our proposed model are introduced and then the properties of the proposed model are discussed.

2.1. Definitions

In this subsection, the definitions of neighborhood multi-target rough sets are introduced.
Definition 1.
[1] Suppose $U$ is a finite universe and $A = \{ a_1, a_2, \ldots, a_m \}$ is an attribute set; then $(U, A)$ is an information system.
Definition 2.
[35] (Set Correlation, SC) Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. The set correlation is defined by
$$SR(X_i, X_j) = \frac{|X_i \cap X_j|}{|X_i|}, \quad i, j \le r \quad (\textit{Relative Correlation}), \quad \text{or}$$
$$SR(X_i, X_j) = \frac{|X_i \cap X_j|}{|X_i \cup X_j|}, \quad i, j \le r \quad (\textit{Absolute Correlation}).$$
Definition 3.
(Global Correlated Target Set, GCTS) Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. Then, $X$ is a global correlated target set if and only if, for all $X_i, X_j \in X$, $SR(X_i, X_j) > \alpha$ or $SR(X_j, X_i) > \alpha$, where $\alpha \in (0, 1]$ is the Correlation Control Parameter (CCP) among targets; it controls the degree of relevance among targets in the target group.
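To make Definitions 2 and 3 concrete, the following is a minimal Python sketch (ours, not part of the original paper; the function names are hypothetical, and the paper's own experiments used MATLAB) that computes the relative set correlation and checks the GCTS condition pairwise:

```python
from itertools import combinations

def relative_correlation(Xi, Xj):
    """SR(Xi, Xj) = |Xi intersect Xj| / |Xi| (Relative Correlation, Definition 2)."""
    return len(Xi & Xj) / len(Xi)

def is_gcts(targets, alpha):
    """Definition 3: X is a GCTS iff every pair (Xi, Xj) satisfies
    SR(Xi, Xj) > alpha or SR(Xj, Xi) > alpha."""
    return all(relative_correlation(Xi, Xj) > alpha or
               relative_correlation(Xj, Xi) > alpha
               for Xi, Xj in combinations(targets, 2))

# The two targets of Example 1 below (instances named by index):
X1, X2 = {2, 3, 5}, {1, 2, 5}
print(is_gcts([X1, X2], alpha=0.4))  # True, since SR(X1, X2) = 2/3 > 0.4
```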
Based on the definition of GCTS and neighborhood rough sets, we can define neighborhood multi-target rough sets accordingly.
Definition 4.
[34] (Neighborhood Class) Suppose $U$ is a finite universe and $(U, A)$ is an information system, where $A = A_C \cup A_N$, $A_C$ is the symbolic attribute set, and $A_N$ is the numerical attribute set. For all $x \in U$ and $\delta \ge 0$, the neighborhood class of $x$ can be defined as
$$(1) \; n_{A_C}(x) = \{ y \in U \mid \forall a \in A_C, y_a = x_a \};$$
$$(2) \; n_{A_N}(x) = \{ y \in U \mid d(x_{A_N}, y_{A_N}) \le \delta \};$$
$$(3) \; n_A(x) = n_{A_C \cup A_N}(x) = \{ y \in U \mid \forall a \in A_C, y_a = x_a \wedge d(x_{A_N}, y_{A_N}) \le \delta \},$$
where $y_a$ denotes the attribute value of instance $y$ on attribute $a$ and $y_{A_N}$ denotes the attribute values of instance $y$ on the attribute set $A_N$.
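As an illustration of Definition 4, here is a small sketch (ours, with hypothetical names): instances must agree on every symbolic attribute and lie within Euclidean distance $\delta$ on the numerical attributes. Applied to the data of Table 1 (with $a_1$, $a_3$ numerical and $a_2$ symbolic), it reproduces the neighborhoods used in Example 1 below.

```python
import numpy as np

def neighborhood(U_sym, U_num, i, delta):
    """n_A(x_i) per Definition 4: equal values on all symbolic attributes
    and Euclidean distance at most delta on the numerical attributes."""
    same_sym = np.all(U_sym == U_sym[i], axis=1)      # condition on A_C
    dist = np.linalg.norm(U_num - U_num[i], axis=1)   # condition on A_N
    return {int(j) for j in np.nonzero(same_sym & (dist <= delta))[0]}

# Table 1: a2 is symbolic; a1 and a3 are numerical; delta = 0.5
U_sym = np.array([["M"], ["F"], ["M"], ["M"], ["F"]])
U_num = np.array([[1.5, 1], [2, 2], [2, 2], [1.5, 2], [2, 2]])
print([sorted(neighborhood(U_sym, U_num, i, 0.5)) for i in range(5)])
# [[0], [1, 4], [2, 3], [2, 3], [1, 4]]  (0-based; matches Example 1)
```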
Definition 5.
(Neighborhood Multi-Target Rough Sets, NMTRS) Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. Let $X$ be a GCTS with correlation coefficient $\alpha$.
Then, the lower approximation of NMTRS is defined as:
$$\underline{R}_\delta^\alpha(X) = \left\{ x \in U \mid n_A(x) \subseteq X_1 \wedge n_A(x) \subseteq X_2 \wedge \cdots \wedge n_A(x) \subseteq X_r \right\}.$$
The upper approximation of NMTRS is defined as:
$$\overline{R}_\delta^\alpha(X) = \left\{ x \in U \mid n_A(x) \cap X_1 \neq \emptyset \vee n_A(x) \cap X_2 \neq \emptyset \vee \cdots \vee n_A(x) \cap X_r \neq \emptyset \right\}.$$
With the help of the CCP, we can organize different targets together. The lower approximation is a conservative approximation of the target group and requires all the targets to meet the same condition. In contrast, the upper approximation follows a liberal strategy and only requires one target of the target group to meet the condition.
Example 1.
A multi-label decision information system is given in Table 1 below; it has two labels, which are assumed to be two different target concepts. It is easy to verify that when $\alpha = 0.4$ the target group is a GCTS. To clarify the definition of NMTRS, we work through this example. We have
$$X_1 = \{x_2, x_3, x_5\}, \quad X_2 = \{x_1, x_2, x_5\}, \quad \alpha = 0.4, \quad \delta = 0.5, \quad SR(X_1, X_2) = 2/3 > \alpha = 0.4.$$
From Table 1 we obtain
$$n_A(x_1) = \{x_1\}, \quad n_A(x_2) = \{x_2, x_5\}, \quad n_A(x_3) = \{x_3, x_4\}, \quad n_A(x_4) = \{x_3, x_4\}, \quad n_A(x_5) = \{x_2, x_5\}.$$
From Definition 5,
$$n_A(x_1) \nsubseteq X_1, \; n_A(x_1) \subseteq X_2; \quad n_A(x_2) \subseteq X_1, \; n_A(x_2) \subseteq X_2; \quad n_A(x_3) \nsubseteq X_1, \; n_A(x_3) \nsubseteq X_2;$$
$$n_A(x_4) \nsubseteq X_1, \; n_A(x_4) \nsubseteq X_2; \quad n_A(x_5) \subseteq X_1, \; n_A(x_5) \subseteq X_2; \quad \text{so } \underline{R}_\delta^\alpha(X) = \{x_2, x_5\}.$$
Similarly,
$$n_A(x_1) \cap X_2 \neq \emptyset, \; n_A(x_2) \cap X_1 \neq \emptyset, \; n_A(x_3) \cap X_1 \neq \emptyset, \; n_A(x_4) \cap X_1 \neq \emptyset, \; n_A(x_5) \cap X_1 \neq \emptyset; \quad \text{so } \overline{R}_\delta^\alpha(X) = U.$$
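As a cross-check of Definition 5, a short sketch (ours) computes both approximations directly from the neighborhood classes of Example 1:

```python
def nmtrs_approximations(neigh, targets):
    """Definition 5. neigh: dict x -> n_A(x); targets: list of target sets.
    Lower: n_A(x) is a subset of every target; upper: n_A(x) meets some target."""
    lower = {x for x, n in neigh.items() if all(n <= Xi for Xi in targets)}
    upper = {x for x, n in neigh.items() if any(n & Xi for Xi in targets)}
    return lower, upper

neigh = {1: {1}, 2: {2, 5}, 3: {3, 4}, 4: {3, 4}, 5: {2, 5}}
X1, X2 = {2, 3, 5}, {1, 2, 5}
print(nmtrs_approximations(neigh, [X1, X2]))
# ({2, 5}, {1, 2, 3, 4, 5}): lower = {x2, x5}, upper = U, as in Example 1
```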

2.2. Properties

The properties of NMTRS are discussed in this subsection.
Proposition 1.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X = \{ X_1, X_2, \ldots, X_r \}$ is a finite target set such that $X_i \subseteq U$ for all $X_i \in X$. Let $X$ be a GCTS with correlation coefficient $\alpha$. For the approximations of NMTRS:
(1) for all $X_i \in X$, $X_i = U$ if and only if $\underline{R}_\delta^\alpha(X) = U$;
(2) there exists $X_i \in X$ with $X_i = U$ if and only if $\overline{R}_\delta^\alpha(X) = U$;
(3) there exists $X_i \in X$ with $X_i = \emptyset$ if and only if $\underline{R}_\delta^\alpha(X) = \emptyset$;
(4) for all $X_i \in X$, $X_i = \emptyset$ if and only if $\overline{R}_\delta^\alpha(X) = \emptyset$;
(5) for all $X_i \in X$, $\underline{R}_\delta^\alpha(X) \subseteq X_i$;
(6) for all $X_i \in X$, $X_i \subseteq \overline{R}_\delta^\alpha(X)$.
Proof.
(1) For all $X_i \in X$, $X_i = U$ implies that for all $x \in U$ and all $i \in \{1, 2, \ldots, r\}$, $n_A(x) \subseteq X_i$, which implies $n_A(x) \subseteq X_1 \wedge n_A(x) \subseteq X_2 \wedge \cdots \wedge n_A(x) \subseteq X_r$; so $\underline{R}_\delta^\alpha(X) = U$.
(2) If there exists $X_i \in X$ with $X_i = U$, then for all $x \in U$ there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x) \cap X_i \neq \emptyset$, which implies $n_A(x) \cap X_1 \neq \emptyset \vee \cdots \vee n_A(x) \cap X_r \neq \emptyset$, i.e., $x \in \overline{R}_\delta^\alpha(X)$; so $\overline{R}_\delta^\alpha(X) = U$.
(3) If there exists $X_i \in X$ with $X_i = \emptyset$, then for all $x \in U$, $n_A(x) \nsubseteq X_i$ (since $x \in n_A(x)$), which implies $x \notin \underline{R}_\delta^\alpha(X)$; so $\underline{R}_\delta^\alpha(X) = \emptyset$.
(4) For all $X_i \in X$, $X_i = \emptyset$ implies that for all $x \in U$ and all $i \in \{1, 2, \ldots, r\}$, $n_A(x) \cap X_i = \emptyset$, which implies $x \notin \overline{R}_\delta^\alpha(X)$; so $\overline{R}_\delta^\alpha(X) = \emptyset$.
(5) For all $x \in \underline{R}_\delta^\alpha(X)$ and all $i \in \{1, 2, \ldots, r\}$, $n_A(x) \subseteq X_i$; since $x \in n_A(x)$, we have $x \in X_i$; so $\underline{R}_\delta^\alpha(X) \subseteq X_i$.
(6) For all $x \in X_i$, since $x \in n_A(x)$, we have $n_A(x) \cap X_i \neq \emptyset$, which implies $x \in \overline{R}_\delta^\alpha(X)$; so $X_i \subseteq \overline{R}_\delta^\alpha(X)$. □
Proposition 2.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $Z = \{ Z_1, Z_2, \ldots, Z_r \}$ is a finite target set such that $Z_i \subseteq U$ for all $Z_i \in Z$. Let $Z$ be a GCTS with correlation coefficient $\alpha$, and let $X, Y \subseteq Z$. For the approximations of NMTRS:
(1) $\underline{R}_\delta^\alpha(X \cup Y) \supseteq \underline{R}_\delta^\alpha(X) \cap \underline{R}_\delta^\alpha(Y)$;
(2) $\overline{R}_\delta^\alpha(X \cap Y) \subseteq \overline{R}_\delta^\alpha(X) \cap \overline{R}_\delta^\alpha(Y)$;
(3) $\underline{R}_\delta^\alpha(X \cap Y) \supseteq \underline{R}_\delta^\alpha(X) \cup \underline{R}_\delta^\alpha(Y)$;
(4) $\overline{R}_\delta^\alpha(X \cup Y) = \overline{R}_\delta^\alpha(X) \cup \overline{R}_\delta^\alpha(Y)$.
Proof. 
(1) For all $x \in \underline{R}_\delta^\alpha(X) \cap \underline{R}_\delta^\alpha(Y)$: for all $X_i \in X$, $n_A(x) \subseteq X_i$, and for all $Y_j \in Y$, $n_A(x) \subseteq Y_j$, which implies that for all $Z_k \in X \cup Y$, $n_A(x) \subseteq Z_k$, i.e., $x \in \underline{R}_\delta^\alpha(X \cup Y)$. Hence $\underline{R}_\delta^\alpha(X \cup Y) \supseteq \underline{R}_\delta^\alpha(X) \cap \underline{R}_\delta^\alpha(Y)$.
(2) For all $x \in \overline{R}_\delta^\alpha(X \cap Y)$: there exists $Z_k \in X \cap Y$ with $n_A(x) \cap Z_k \neq \emptyset$; since $Z_k \in X$ and $Z_k \in Y$, there exists $X_i \in X$ with $n_A(x) \cap X_i \neq \emptyset$ and there exists $Y_j \in Y$ with $n_A(x) \cap Y_j \neq \emptyset$, which implies $x \in \overline{R}_\delta^\alpha(X) \cap \overline{R}_\delta^\alpha(Y)$. Hence $\overline{R}_\delta^\alpha(X \cap Y) \subseteq \overline{R}_\delta^\alpha(X) \cap \overline{R}_\delta^\alpha(Y)$.
(3) For all $x \in \underline{R}_\delta^\alpha(X) \cup \underline{R}_\delta^\alpha(Y)$: either $n_A(x) \subseteq X_i$ for all $X_i \in X$, or $n_A(x) \subseteq Y_j$ for all $Y_j \in Y$; since $X \cap Y \subseteq X$ and $X \cap Y \subseteq Y$, in either case $n_A(x) \subseteq Z_k$ for all $Z_k \in X \cap Y$, which implies $x \in \underline{R}_\delta^\alpha(X \cap Y)$. Hence $\underline{R}_\delta^\alpha(X \cap Y) \supseteq \underline{R}_\delta^\alpha(X) \cup \underline{R}_\delta^\alpha(Y)$.
(4) $x \in \overline{R}_\delta^\alpha(X \cup Y)$ if and only if there exists $Z_k \in X \cup Y$ with $n_A(x) \cap Z_k \neq \emptyset$, if and only if there exists $X_i \in X$ with $n_A(x) \cap X_i \neq \emptyset$ or there exists $Y_j \in Y$ with $n_A(x) \cap Y_j \neq \emptyset$, if and only if $x \in \overline{R}_\delta^\alpha(X) \cup \overline{R}_\delta^\alpha(Y)$. Hence $\overline{R}_\delta^\alpha(X \cup Y) = \overline{R}_\delta^\alpha(X) \cup \overline{R}_\delta^\alpha(Y)$. □
Proposition 3.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $Z = \{ Z_1, Z_2, \ldots, Z_r \}$ is a finite target set such that $Z_i \subseteq U$ for all $Z_i \in Z$. Let $Z$ be a GCTS with correlation coefficient $\alpha$, and let $X \subseteq Y \subseteq Z$. For the approximations of NMTRS:
(1) $\underline{R}_\delta^\alpha(X) \supseteq \underline{R}_\delta^\alpha(Y)$;
(2) $\overline{R}_\delta^\alpha(X) \subseteq \overline{R}_\delta^\alpha(Y)$.
Proof. 
(1) $x \in \underline{R}_\delta^\alpha(Y)$ if and only if $n_A(x) \subseteq Z_k$ for all $Z_k \in Y$; since $X \subseteq Y$, this implies $n_A(x) \subseteq Z_k$ for all $Z_k \in X$, i.e., $x \in \underline{R}_\delta^\alpha(X)$. Hence $\underline{R}_\delta^\alpha(X) \supseteq \underline{R}_\delta^\alpha(Y)$.
(2) $x \in \overline{R}_\delta^\alpha(X)$ if and only if there exists $Z_k \in X$ with $n_A(x) \cap Z_k \neq \emptyset$; since $X \subseteq Y$, there exists $Z_k \in Y$ with $n_A(x) \cap Z_k \neq \emptyset$, which implies $x \in \overline{R}_\delta^\alpha(Y)$. Hence $\overline{R}_\delta^\alpha(X) \subseteq \overline{R}_\delta^\alpha(Y)$. □

2.3. Approximation Computation of NMTRS

In this section, we propose an approach for computing the approximations of NMTRS. We first derive some corresponding results for the approximation computation and then design an algorithm for calculating the approximations of NMTRS.
Definition 6.
[36] Suppose $U$ is a finite universe and $P \subseteq U$; the matrix representation of the set $P$ is defined as
$$P = (p_j)_{n \times 1}, \quad j \in \{1, 2, \ldots, n\}, \; n = |U|, \quad \text{where } p_j = \begin{cases} 0, & x_j \notin P \\ 1, & x_j \in P \end{cases}.$$
Example 2.
Continuing Example 1, by Definition 6 we have
$$n_A(x_1) = (1, 0, 0, 0, 0)^\top, \; n_A(x_2) = (0, 1, 0, 0, 1)^\top, \; n_A(x_3) = (0, 0, 1, 1, 0)^\top, \; n_A(x_4) = (0, 0, 1, 1, 0)^\top, \; n_A(x_5) = (0, 1, 0, 0, 1)^\top,$$
where $M^\top$ denotes the transpose of a matrix $M$.
Lemma 1.
Suppose $X, Y \subseteq U$. Then
(1) $X \cap Y \neq \emptyset$ if and only if $X^\top \cdot Y > 0$;
(2) $X \subseteq Y$ if and only if $X^\top \cdot (\sim Y) = 0$,
where $\sim Y$ is the complement of $Y$, $X^\top$ is the transpose of the matrix $X$, and $\cdot$ is the scalar (dot) product of vectors.
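In numpy, Lemma 1 becomes two one-liners (our sketch): with 0/1 vectors, a dot product tests intersection, and a dot product with the complement tests inclusion.

```python
import numpy as np

X = np.array([0, 1, 0, 0, 1])    # characteristic vector of {x2, x5}
Y = np.array([0, 1, 1, 0, 1])    # characteristic vector of {x2, x3, x5}
print(X @ Y > 0)         # True: X and Y intersect  (Lemma 1, part 1)
print(X @ (1 - Y) == 0)  # True: X is a subset of Y (Lemma 1, part 2)
```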
Lemma 1 yields a useful property for computing the approximations of NMTRS.
Theorem 1.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X$ is a GCTS. Then
(1) $x \in \underline{R}_\delta^\alpha(X)$ if and only if for all $i \in \{1, 2, \ldots, r\}$, $n_A(x)^\top \cdot (\sim X_i) = 0$;
(2) $x \in \overline{R}_\delta^\alpha(X)$ if and only if there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x)^\top \cdot X_i > 0$.
Proof. 
From Lemma 1,
(1) For all $i \in \{1, 2, \ldots, r\}$, $n_A(x)^\top \cdot (\sim X_i) = 0$ if and only if $n_A(x) \subseteq X_i$ for all $i \in \{1, 2, \ldots, r\}$, if and only if $x \in \underline{R}_\delta^\alpha(X)$.
(2) There exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x)^\top \cdot X_i > 0$ if and only if there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x) \cap X_i \neq \emptyset$, if and only if $x \in \overline{R}_\delta^\alpha(X)$. □
The matrix representation of Theorem 1 is given by the following definition.
Definition 7.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X$ is a GCTS. The lower approximation matrix of NMTRS can be defined as
$$\underline{H}(X) = (\underline{h}_{ij})_{r \times n}, \quad \text{where } \underline{h}_{ij} = \begin{cases} 1, & n_A(x_j)^\top \cdot (\sim X_i) = 0 \\ 0, & \text{otherwise} \end{cases}, \quad i \in \{1, \ldots, r\}, \; j \in \{1, \ldots, n\}.$$
The upper approximation matrix of NMTRS can be defined as
$$\overline{H}(X) = (\overline{h}_{ij})_{r \times n}, \quad \text{where } \overline{h}_{ij} = \begin{cases} 1, & n_A(x_j)^\top \cdot X_i > 0 \\ 0, & \text{otherwise} \end{cases}, \quad i \in \{1, \ldots, r\}, \; j \in \{1, \ldots, n\}.$$
Example 3.
Continuing Example 2, by Definition 6 we have
$$X_1 = (0, 1, 1, 0, 1)^\top, \; \sim X_1 = (1, 0, 0, 1, 0)^\top, \; X_2 = (1, 1, 0, 0, 1)^\top, \; \sim X_2 = (0, 0, 1, 1, 0)^\top.$$
By Definition 7 we have
$$n_A(x_1)^\top \cdot (\sim X_1) = (1, 0, 0, 0, 0) \cdot (1, 0, 0, 1, 0)^\top \neq 0, \quad n_A(x_1)^\top \cdot (\sim X_2) = (1, 0, 0, 0, 0) \cdot (0, 0, 1, 1, 0)^\top = 0;$$
$$n_A(x_2)^\top \cdot (\sim X_1) = 0, \; n_A(x_2)^\top \cdot (\sim X_2) = 0; \quad n_A(x_3)^\top \cdot (\sim X_1) \neq 0, \; n_A(x_3)^\top \cdot (\sim X_2) \neq 0;$$
$$n_A(x_4)^\top \cdot (\sim X_1) \neq 0, \; n_A(x_4)^\top \cdot (\sim X_2) \neq 0; \quad n_A(x_5)^\top \cdot (\sim X_1) = 0, \; n_A(x_5)^\top \cdot (\sim X_2) = 0;$$
then
$$\underline{H}(X) = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}.$$
Example 4.
Continuing Example 3, by Definition 7 we have
$$n_A(x_1)^\top \cdot X_1 = (1, 0, 0, 0, 0) \cdot (0, 1, 1, 0, 1)^\top = 0, \quad n_A(x_1)^\top \cdot X_2 = (1, 0, 0, 0, 0) \cdot (1, 1, 0, 0, 1)^\top > 0;$$
$$n_A(x_2)^\top \cdot X_1 > 0, \; n_A(x_2)^\top \cdot X_2 > 0; \quad n_A(x_3)^\top \cdot X_1 > 0, \; n_A(x_3)^\top \cdot X_2 = 0;$$
$$n_A(x_4)^\top \cdot X_1 = (0, 0, 1, 1, 0) \cdot (0, 1, 1, 0, 1)^\top > 0 \text{ (since } x_3 \in n_A(x_4) \cap X_1\text{)}, \; n_A(x_4)^\top \cdot X_2 = 0; \quad n_A(x_5)^\top \cdot X_1 > 0, \; n_A(x_5)^\top \cdot X_2 > 0;$$
then
$$\overline{H}(X) = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}.$$
Then, we can easily obtain a theorem for computing approximations in NMTRS.
Theorem 2.
Suppose $U$ is a finite universe, $(U, A)$ is an information system, and $X$ is a GCTS. For $i \in \{1, 2, \ldots, r\}$ and $j \in \{1, 2, \ldots, n\}$ we have
(1) $x_j \in \underline{R}_\delta^\alpha(X)$ if and only if $\bigwedge_{i=1}^{r} \underline{h}_{ij} = 1$;
(2) $x_j \in \overline{R}_\delta^\alpha(X)$ if and only if $\bigvee_{i=1}^{r} \overline{h}_{ij} = 1$.
Proof. 
From Lemma 1,
(1) $x_j \in \underline{R}_\delta^\alpha(X)$ if and only if for all $i \in \{1, 2, \ldots, r\}$, $n_A(x_j)^\top \cdot (\sim X_i) = 0$, if and only if $\underline{h}_{ij} = 1$ for all $i \in \{1, 2, \ldots, r\}$, i.e., $\bigwedge_{i=1}^{r} \underline{h}_{ij} = 1$.
(2) $x_j \in \overline{R}_\delta^\alpha(X)$ if and only if there exists $i \in \{1, 2, \ldots, r\}$ with $n_A(x_j)^\top \cdot X_i > 0$, if and only if $\overline{h}_{ij} = 1$ for some $i \in \{1, 2, \ldots, r\}$, i.e., $\bigvee_{i=1}^{r} \overline{h}_{ij} = 1$. □
Example 5.
Continuing Example 4, by Theorem 2 we have
$$\underline{R}_\delta^\alpha(X) = (0, 1, 0, 0, 1) \wedge (1, 1, 0, 0, 1) = (0, 1, 0, 0, 1), \qquad \overline{R}_\delta^\alpha(X) = (0, 1, 1, 1, 1) \vee (1, 1, 0, 0, 1) = (1, 1, 1, 1, 1);$$
hence $\underline{R}_\delta^\alpha(X) = \{x_2, x_5\}$ and $\overline{R}_\delta^\alpha(X) = U$, in agreement with Example 1.
Based on Theorem 2, we propose the matrix-based Algorithm 1 for computing the approximations of a particular target concept group. The total time complexity of Algorithm 1 is $\Theta(rn^2)$: Steps 2–15 calculate $\underline{H}$ and $\overline{H}$ with time complexity $\Theta(rn^2)$.
Algorithm 1: Computing approximations of NMTRS
Input: $U$, $A$, $X = \{X_1, X_2, \ldots, X_r\}$, and $n_A(x_j)$ for all $x_j \in U$
Output: $\overline{R}_\delta^\alpha(X)$, $\underline{R}_\delta^\alpha(X)$
1: $n \leftarrow |U|$, $r \leftarrow |X|$
2: for $i = 1$ to $r$
3:  for $j = 1$ to $n$
4:   if $n_A(x_j)^\top \cdot (\sim X_i) = 0$ then
5:    $\underline{h}_{ij} = 1$
6:   else
7:    $\underline{h}_{ij} = 0$
8:   end if
9:   if $n_A(x_j)^\top \cdot X_i > 0$ then
10:    $\overline{h}_{ij} = 1$
11:   else
12:    $\overline{h}_{ij} = 0$
13:   end if
14:  end for
15: end for
16: $\underline{R}_\delta^\alpha(X) = \bigwedge_{i=1}^{r} \underline{h}_i = \left( \bigwedge_{i=1}^{r} \underline{h}_{i1}, \bigwedge_{i=1}^{r} \underline{h}_{i2}, \ldots, \bigwedge_{i=1}^{r} \underline{h}_{in} \right)$
17: $\overline{R}_\delta^\alpha(X) = \bigvee_{i=1}^{r} \overline{h}_i = \left( \bigvee_{i=1}^{r} \overline{h}_{i1}, \bigvee_{i=1}^{r} \overline{h}_{i2}, \ldots, \bigvee_{i=1}^{r} \overline{h}_{in} \right)$
18: Return $\overline{R}_\delta^\alpha(X)$, $\underline{R}_\delta^\alpha(X)$
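A compact numpy sketch of Algorithm 1 (ours; the function name is hypothetical). Encoding all neighborhoods and targets as 0/1 matrices turns the tests of Definition 7 into a single matrix product each, and Theorem 2 reduces the approximations to a column-wise AND/OR:

```python
import numpy as np

def nmtrs_matrix_approximations(N, X):
    """Algorithm 1 via Definition 7 and Theorem 2.
    N: (n, n) 0/1 matrix; row j is the characteristic vector of n_A(x_j).
    X: (r, n) 0/1 matrix; row i is the characteristic vector of target X_i.
    Returns (lower, upper) as 0/1 vectors of length n."""
    H_low = (N @ (1 - X.T) == 0).astype(int).T  # h_ij = 1 iff n_A(x_j) . (~X_i) = 0
    H_up = (N @ X.T > 0).astype(int).T          # h_ij = 1 iff n_A(x_j) . X_i > 0
    return H_low.min(axis=0), H_up.max(axis=0)  # AND / OR over the r targets

# The data of Examples 2-5 (Table 1):
N = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0], [0, 1, 0, 0, 1]])
X = np.array([[0, 1, 1, 0, 1],   # X1 = {x2, x3, x5}
              [1, 1, 0, 0, 1]])  # X2 = {x1, x2, x5}
print(nmtrs_matrix_approximations(N, X))
# (array([0, 1, 0, 0, 1]), array([1, 1, 1, 1, 1])): lower {x2, x5}, upper U
```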

3. Attribute Reduction Based on NMTRS

In this section, we propose an attribute reduction method based on NMTRS. We derive the attribute significance measure, then design an algorithm for calculating the reduction.

3.1. Attribute Significance Measure Based on NMTRS

Suppose $U = \{x_1, x_2, \ldots, x_n\}$ is the instance space and $L = \{l_1, l_2, \ldots, l_r\}$ is the label space. $T = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$ means that the label vector $y_i$ is associated with the instance $x_i$. The tuple $(U, A, L)$ denotes the multi-label information system.
Definition 8.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Regarding each $l_i$ ($i = 1, 2, \ldots, r$) as the matrix representation of a certain decision set, let $\phi(L) = \{X_1, X_2, \ldots\}$ be the GCTS group on $L$. For any $B \subseteq A$, the positive region of NMTRS is defined as
$$\mathrm{Pos}_B(\phi(L)) = \bigcup \left\{ \underline{R_B}{}_\delta^\alpha(X) \mid X \in \phi(L) \right\};$$
the dependence of NMTRS on the attribute set $B$ can then be defined as
$$\gamma_B(\phi(L)) = \frac{|\mathrm{Pos}_B(\phi(L))|}{|U|};$$
and, for all $c \in A - B$, the conditional attribute significance of $c$ with respect to the attribute set $B$ can be calculated as
$$sig_\gamma(c, B, L) = \gamma_{B \cup \{c\}}(\phi(L)) - \gamma_B(\phi(L)).$$
It can easily be seen that $sig_\gamma(c, B, L) \ge 0$; if $sig_\gamma(c, B, L) = 0$, we say that $c$ is an unnecessary attribute.
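A sketch of Definition 8 in the same matrix style (ours; all names are hypothetical, and we assume, as is usual for neighborhood rough set models, that the positive region is the union of the lower approximations over the GCTS groups):

```python
import numpy as np

def dependence(neigh_fn, n, B, gcts_groups):
    """gamma_B(phi(L)) = |Pos_B(phi(L))| / |U| (Definition 8).
    neigh_fn(B, j): 0/1 vector of n_B(x_j); gcts_groups: list of (r, n)
    0/1 target matrices, one matrix per GCTS in phi(L)."""
    N = np.array([neigh_fn(B, j) for j in range(n)])  # neighborhoods under B
    pos = np.zeros(n, dtype=int)
    for X in gcts_groups:  # Pos = union of lower approximations
        pos |= (N @ (1 - X.T) == 0).all(axis=1).astype(int)
    return pos.sum() / n

def significance(neigh_fn, n, B, c, gcts_groups):
    """sig_gamma(c, B, L) = gamma_{B union {c}}(phi(L)) - gamma_B(phi(L))."""
    return (dependence(neigh_fn, n, B | {c}, gcts_groups)
            - dependence(neigh_fn, n, B, gcts_groups))
```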

3.2. Multi-Granulation Discrimination Analysis Based on NMTRS

Based on the NMTRS model, some discussions about multi-granulation discrimination are conducted in this section. Several propositions can be easily obtained.
Proposition 4.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Let $d$ be the Euclidean distance and $B_1 \subseteq B_2 \subseteq A$. The following properties hold:
(1) for all $x \in U$, $n_{B_2}(x) \subseteq n_{B_1}(x)$;
(2) for all $X \in \phi(L)$, $\underline{R_{B_1}}{}_\delta^\alpha(X) \subseteq \underline{R_{B_2}}{}_\delta^\alpha(X)$;
(3) $\mathrm{Pos}_{B_1}(\phi(L)) \subseteq \mathrm{Pos}_{B_2}(\phi(L))$;
(4) $\gamma_{B_1}(\phi(L)) \le \gamma_{B_2}(\phi(L))$.
Proposition 5.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Let $d$ be the Euclidean distance. For any $B_1 \subseteq B_2 \subseteq A$, if $x \in \mathrm{Pos}_{B_1}(\phi(L))$, then $x \in \mathrm{Pos}_{B_2}(\phi(L))$.
Proposition 6.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $X = \{X_1, X_2, \ldots, X_m\}$ is a GCTS with correlation coefficient $\alpha$. Let $d$ be the Euclidean distance. For any $0 \le \delta_2 \le \delta_1$ and $B \subseteq A$, the following properties hold:
(1) for all $x \in U$, $n_B^{\delta_2}(x) \subseteq n_B^{\delta_1}(x)$;
(2) for all $X \in \phi(L)$, $\underline{R_B}{}_{\delta_1}^\alpha(X) \subseteq \underline{R_B}{}_{\delta_2}^\alpha(X)$ and $\overline{R_B}{}_{\delta_2}^\alpha(X) \subseteq \overline{R_B}{}_{\delta_1}^\alpha(X)$;
(3) $\mathrm{Pos}_{\delta_1}(\phi(L)) \subseteq \mathrm{Pos}_{\delta_2}(\phi(L))$;
(4) $\gamma_{\delta_1}(\phi(L)) \le \gamma_{\delta_2}(\phi(L))$.
Proposition 7.
Suppose $U$ is a finite universe, $(U, A, L)$ is a multi-label information system, and $L' \subseteq L$. Let $d$ be the Euclidean distance. For any $0 \le \delta_2 \le \delta_1$, the following properties hold:
(1) $\mathrm{Pos}_{\delta_1}(\phi(L)) \subseteq \mathrm{Pos}_{\delta_2}(\phi(L'))$;
(2) $\gamma_{\delta_1}(\phi(L)) \le \gamma_{\delta_2}(\phi(L'))$.

3.3. Attribute Reduction Algorithm Based on NMTRS

Based on the NMTRS model and its attribute significance measure, we derive the corresponding attribute reduction algorithm, which is named Algorithm 2.
Algorithm 2: Attribute reduction algorithm based on NMTRS
Input: $U$, $A$, $L$, $\alpha$, $\delta$, $\phi(L)$
Output: Reduction of the attribute set $A$
1: $Reduction \leftarrow \emptyset$
2: while $A \neq Reduction$
3:  for all $b \in A - Reduction$
4:   Calculate $sig_\gamma(b, Reduction, L)$ with Algorithm 1
5:  end for
6:  if $\max_{b \in A - Reduction} sig_\gamma(b, Reduction, L) \le 0$ then
7:   break
8:  end if
9:  $Reduction = Reduction \cup \{ \arg\max_{b \in A - Reduction} sig_\gamma(b, Reduction, L) \}$
10: end while
11: Return $Reduction$
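Algorithm 2 is a greedy forward selection driven by the significance measure; here is a minimal sketch (ours, reusing the hypothetical `significance` helper from Section 3.1):

```python
def attribute_reduction(neigh_fn, n, A, gcts_groups):
    """Algorithm 2: repeatedly add the attribute with the largest positive
    significance; stop when no remaining attribute improves the dependence."""
    reduction = set()
    while reduction != A:
        sigs = {b: significance(neigh_fn, n, reduction, b, gcts_groups)
                for b in A - reduction}
        best = max(sigs, key=sigs.get)
        if sigs[best] <= 0:  # no candidate is informative any more
            break
        reduction.add(best)
    return reduction
```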
The time complexity of Algorithm 2 is $\Theta(rn^2)$: the time complexity of calculating the neighborhood classes of all instances is $\Theta(n \log n)$, and the time complexity of calculating the attribute significance of an attribute is $\Theta(rn^2)$.

4. Experimental Evaluations

In this section, several experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms, namely Algorithm 2 (ours), Algorithm 3 (MLNB), Algorithm 4 (Laplacian Score) [37], Algorithm 5 (RelieF) [38], Algorithm 6 (MDDMproj), and Algorithm 7 (MDDMspc) [24]. Since MLNB and Laplacian Score only determine an attribute order, we make the number of chosen attributes the same as that of our algorithm. Six datasets were chosen from public repositories; their details are listed in Table 2. All experiments were carried out on a personal computer with 64-bit Windows 10, an Intel(R) Core(TM) i7-1065G7 CPU @ 1.30 GHz, and 16 GB memory. The code was developed by Wenbin Zheng in Zhangzhou, China; the programming language was MATLAB R2020a.

4.1. Comparison of Performance Measures Using ML-kNN

In this subsection, we use ML-kNN as the learning method, select features with each of the attribute reduction approaches, and conduct ten-fold cross-validation ten times on each dataset with the selected feature set.

4.1.1. Experimental Settings

To compare the performance of our approach with that of the comparison algorithms, we apply all of the attribute reduction algorithms to the six datasets using the recommended parameter configurations in [39]. For our method, we set α = 0.6 and δ = 0.5. The experimental results are shown in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8.

4.1.2. Discussions of the Experimental Results

From Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, we can observe that our algorithm achieves considerable performance on each dataset, while no algorithm is clearly superior on all datasets. Although our algorithm sometimes reaches the best performance, it generally ranks in the middle. The experiments show that our algorithm is valid and thus provides a new approach for multi-label feature selection.
For the dataset Bird Song, the MDDM algorithms perform better than the other algorithms; the performance of our algorithm is equal to that of MLNB, RelieF, and Laplacian Score.
For the dataset CAL-500, our algorithm performs equally well as MDDMproj and better than all the other algorithms with respect to Hamming loss.
For the dataset Emotions, our algorithm is the worst with respect to Hamming loss, ranking loss, and average precision, but performs better than RelieF, MDDMproj, and MDDMspc with respect to coverage.
For the dataset FGNET, the MDDM algorithms perform better, while our algorithm's performance is equal to that of the other algorithms.
For the dataset Water Quality Nom, our algorithm performs better than Laplacian Score and MDDMspc with respect to Hamming loss, better than Laplacian Score and MDDMspc with respect to one-error, and better than MDDMspc with respect to coverage.

4.2. Effect of Parameters α and δ

In this subsection, we conduct several experiments to analyze the effect of α and δ.

4.2.1. Experimental Settings

To analyze the effect of the parameters, we conduct several experiments: we gradually increase the value of the parameter α from 0 to 1 with a step of 0.05, and the value of δ from 0 to 2 with a step of 0.1. The experimental results are shown in Figure 1.

4.2.2. Discussions of the Experimental Results

From Figure 1 we can observe that the performance of the proposed algorithm decreases as α approaches 1, while the parameter δ does not have a significant effect on performance. This analysis provides the recommended parameter settings for our experiments.

4.3. Effect of Noise

In this subsection, we conduct several experiments to analyze the effect of noise.

4.3.1. Experimental Settings

To analyze the effect of noise, we conduct several experiments: we gradually increase the percentage of noise in the dataset Flags from 0 to 1 with a step of 0.05. The experimental results are shown in Figure 2.

4.3.2. Discussions of the Experimental Results

From Figure 2 we can observe that all the algorithms display similar performance as the noise percentage increases, and our proposed algorithm is not worse than the comparison algorithms. In other words, in terms of robustness, the performance of our algorithm is equivalent to that of the comparison algorithms.

5. Conclusions

Attribute reduction in multi-label learning has been a hot topic in recent years. We propose a novel rough set model for multi-label learning and investigate its properties. With the proposed model, a novel feature selection method for multi-label learning is obtained, a novel attribute reduction approach is provided, and its validity is verified by experiments.
In the big data era, calculating the reduction of an attribute set presents great challenges because of its high time complexity; dynamic attribute reduction algorithms can alleviate the problem by reusing previously calculated results. In future work, we will extend the proposed approach into a dynamic multi-label attribute reduction approach.

Author Contributions

Conceptualization, W.Z., S.L. and Y.L.; methodology, W.Z. and J.L.; software, W.Z.; validation, W.Z., S.L. and Y.L.; formal analysis, W.Z. and J.L.; investigation, W.Z. and S.L.; writing—review and editing, W.Z.; supervision, J.L. and S.L.; project administration, W.Z. and S.L.; funding acquisition, S.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported in part by the National Natural Science Foundation of China under Grant Nos. 11871259, 61379021 and 12101289, the Natural Science Foundation of Fujian Province under Nos. 2022J01912 and 2022J01306, the Institute of Meteorological Big Data-Digital Fujian, Fujian Key Laboratory of Data Science and Statistics, and Fujian Key Laboratory of Granular Computing and Applications.

Data Availability Statement

The data that support the findings of this study are openly available in the UC Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/index.php, http://mulan.sourceforge.net/datasets-mlc.html and https://www.uco.es/kdis/mllresources/, accessed on 16 April 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  2. Ziarko, W. Probabilistic approach to rough sets. Int. J. Approx. Reason. 2008, 49, 272–284. [Google Scholar] [CrossRef]
  3. Bazan, J.G.; Nguyen, H.S.; Nguyen, S.H.; Synak, P.; Wróblewski, J. Rough set algorithms in classification problem. In Rough Set Methods and Applications; Physica-Verlag: Heidelberg, Germany, 2000. [Google Scholar]
  4. Lingras, P. Unsupervised Rough Set Classification Using Gas. J. Intell. Inf. Syst. 2001, 16, 215–228. [Google Scholar] [CrossRef]
  5. Miao, D.; Duan, Q.; Zhang, H.; Jiao, N. Rough set based hybrid algorithm for text classification. Expert Syst. Appl. 2009, 36, 9168–9174. [Google Scholar] [CrossRef]
  6. Sharma, H.K.; Majumder, S.; Biswas, A.; Prentkovskis, O.; Kar, S.; Skačkauskas, P. A Study on Decision-Making of the Indian Railways Reservation System during COVID-19. J. Adv. Transp. 2022, 2022, 7685375. Available online: https://www.hindawi.com/journals/jat/2022/7685375/ (accessed on 7 July 2022). [CrossRef]
  7. Lingras, P. Rough set clustering for web mining. In Proceedings of the 2002 IEEE World Congress on Computational Intelligence, 2002 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE’02. Proceedings (Cat. No. 02CH37291), Honolulu, HI, USA, 12–17 May 2002. [Google Scholar]
  8. Lingras, P.; Peters, G. Applying rough set concepts to clustering. In Rough Sets: Selected Methods and Applications in Management and Engineering; Springer: London, UK, 2012; pp. 23–37. [Google Scholar]
  9. Parmar, D.; Wu, T.; Blackhurst, J. MMR: An algorithm for clustering categorical data using Rough Set Theory. Data Knowl. Eng. 2007, 63, 879–893. [Google Scholar] [CrossRef]
  10. Vidhya, K.A.; Geetha, T.V. Rough set theory for document clustering: A review. J. Intell. Fuzzy Syst. 2017, 32, 2165–2185. [Google Scholar] [CrossRef]
  11. Hedar, A.R.; Ibrahim, A.M.M.; Abdel-Hakim, A.E.; Sewisy, A.A. Modulated clustering using integrated rough sets and scatter search attribute reduction. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan, 15–19 July 2018; pp. 1394–1401. [Google Scholar]
  12. Xia, S.; Zhang, H.; Li, W.; Wang, G.; Giem, E.; Chen, Z. GBNRS: A Novel Rough Set Algorithm for Fast Adaptive Attribute Reduction in Classification. IEEE Trans. Knowl. Data Eng. 2022, 34, 1231–1242. [Google Scholar] [CrossRef]
  13. Qian, Y.; Liang, J.; Yao, Y.; Dang, C. MGRS: A multi-granulation rough set. Inf. Sci. 2010, 180, 949–970. [Google Scholar] [CrossRef]
  14. Yao, Y.; Yao, B. Covering based rough set approximations. Inf. Sci. 2012, 200, 91–107. [Google Scholar] [CrossRef]
  15. Kumar, S.U.; Inbarani, H.H. A Novel Neighborhood Rough Set Based Classification Approach for Medical Diagnosis. Procedia Comput. Sci. 2015, 47, 351–359. [Google Scholar] [CrossRef]
  16. Zhang, J.; Li, T.; Ruan, D.; Liu, D. Neighborhood rough sets for dynamic data mining. Int. J. Intell. Syst. 2012, 27, 317–342. [Google Scholar] [CrossRef]
  17. Hu, Q.; Yu, D.; Liu, J.; Wu, C. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594. [Google Scholar] [CrossRef]
  18. Yong, L.; Wenliang, H.; Yunliang, J.; Zeng, Z. Quick attribute reduce algorithm for neighborhood rough set model. Inf. Sci. 2014, 271, 65–81. [Google Scholar] [CrossRef]
  19. Chen, H.; Li, T.; Cai, Y.; Luo, C.; Fujita, H. Parallel attribute reduction in dominance-based neighborhood rough set. Inf. Sci. 2016, 373, 351–368. [Google Scholar] [CrossRef]
  20. Zhou, P.; Hu, X.; Li, P.; Wu, X. Online streaming feature selection using adapted Neighborhood Rough Set. Inf. Sci. 2018, 481, 258–279. [Google Scholar] [CrossRef]
  21. Sun, L.; Ji, S.; Ye, J. Multi-Label Dimensionality Reduction; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar] [CrossRef]
  22. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  23. Wold, H. Estimation of principal components and related models by iterative least squares. In Multivariate Analysis; Academic Press: Cambridge, MA, USA, 1966; pp. 391–420. [Google Scholar]
  24. Zhang, Y.; Zhou, Z.H. Multilabel dimensionality reduction via dependence maximization. ACM Trans. Knowl. Discov. Data 2010, 4, 1–21. [Google Scholar] [CrossRef]
  25. Zhang, M.-L.; Peña, J.M.; Robles, V. Feature selection for multi-label naïve Bayes classification. Inf. Sci. 2009, 179, 3218–3229. [Google Scholar] [CrossRef]
  26. Lin, Y.; Hu, Q.; Liu, J.; Chen, J.; Duan, J. Multi-label feature selection based on neighborhood mutual information. Appl. Soft Comput. 2016, 38, 244–256. [Google Scholar] [CrossRef]
  27. Liu, J.; Lin, Y.; Li, Y.; Weng, W.; Wu, S. Online multi-label streaming feature selection based on neighborhood rough set. Pattern Recognit. 2018, 84, 273–287. [Google Scholar] [CrossRef]
  28. Deng, Z.; Zheng, Z.; Deng, D.; Wang, T.; He, Y.; Zhang, D. Feature Selection for Multi-Label Learning Based on F-Neighborhood Rough Sets. IEEE Access 2020, 8, 39678–39688. [Google Scholar] [CrossRef]
  29. Al-Shami, T.M.; Ciucci, D. Subset neighborhood rough sets. Knowl. Based Syst. 2021, 237, 107868. [Google Scholar] [CrossRef]
  30. Chen, Y.; Xue, Y.; Ma, Y.; Xu, F. Measures of uncertainty for neighborhood rough sets. Knowl. Based Syst. 2017, 120, 226–235. [Google Scholar] [CrossRef]
  31. Wang, C.; Shi, Y.; Fan, X.; Shao, M. Attribute reduction based on k-nearest neighborhood rough sets. Int. J. Approx. Reason. 2018, 106, 18–31. [Google Scholar] [CrossRef]
  32. Li, Y.; Lin, Y.; Liu, J.; Weng, W.; Shi, Z.; Wu, S. Feature selection for multi-label learning based on kernelized fuzzy rough sets. Neurocomputing 2018, 318, 271–286. [Google Scholar] [CrossRef]
  33. Xu, J.; Shen, K.; Sun, L. Multi-label feature selection based on fuzzy neighborhood rough sets. Complex Intell. Syst. 2022, 8, 2105–2129. [Google Scholar] [CrossRef]
  34. Wang, Q.; Qian, Y.; Liang, X.; Guo, Q.; Liang, J. Local neighborhood rough set. Knowl. Based Syst. 2018, 153, 53–64. [Google Scholar] [CrossRef]
  35. Lin, G.; Qian, Y.; Li, J. NMGRS: Neighborhood-based multi-granulation rough sets. Int. J. Approx. Reason. 2012, 53, 1080–1093. [Google Scholar] [CrossRef]
  36. Ziarko, W. Variable precision rough set model. J. Comput. Syst. Sci. 1993, 46, 39–59. [Google Scholar] [CrossRef]
  37. He, X.; Cai, D.; Niyogi, P. Laplacian score for feature selection. Adv. Neural Inf. Process. Syst. 2005, 18, 1–8. Available online: https://proceedings.neurips.cc/paper/2005/file/b5b03f06271f8917685d14cea7c6c50a-Paper.pdf (accessed on 7 July 2022).
  38. Spolaôr, N.; Cherman, E.A.; Monard, M.C.; Lee, H.D. Relief for multi-label feature selection. In Proceedings of the 2013 Brazilian Conference on Intelligent Systems, Fortaleza, Brazil, 20–24 October 2013; IEEE: Manhattan, NY, USA, 2013; pp. 6–11. [Google Scholar]
  39. Liu, G.L. Axiomatic systems for rough sets and fuzzy rough sets. Int. J. Approx. Reason. 2008, 48, 857–867. [Google Scholar] [CrossRef]
Figure 1. Effect of parameters α and δ on the dataset Emotions.
Figure 2. Robustness while adding noise into the dataset Flags.
Table 1. An information system.

| $x_n$ | $a_1$ | $a_2$ | $a_3$ | $X_1$ | $X_2$ |
| $x_1$ | 1.5 | M | 1 | 0 | 1 |
| $x_2$ | 2 | F | 2 | 1 | 1 |
| $x_3$ | 2 | M | 2 | 1 | 0 |
| $x_4$ | 1.5 | M | 2 | 0 | 0 |
| $x_5$ | 2 | F | 2 | 1 | 1 |
Table 2. Details of datasets.

| No. | Dataset | Samples | Attributes | Labels |
| 1 | Bird Song | 4998 | 38 | 13 |
| 2 | CAL-500 | 502 | 68 | 174 |
| 3 | Emotions | 593 | 72 | 6 |
| 4 | Flags | 194 | 14 | 12 |
| 5 | FGNET | 1002 | 262 | 78 |
| 6 | Water Quality Nom | 1060 | 16 | 14 |
Table 3. Experimental results on Bird Song.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 1 | – | 0.60024 | 1.3229 | 0.61599 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 1 | – | 0.60024 | 1.3229 | 0.61599 |
| RelieF | 1 | – | 0.60024 | 1.3229 | 0.61599 |
| MDDMproj | 0.94641 | – | 0.22601 | 0.41394 | 0.86440 |
| MDDMspc | 0.94641 | – | 0.22601 | 0.41394 | 0.86440 |
Table 4. Experimental results on CAL-500.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.96755 | 0.18771 | 0.13001 | 132.03 | 0.48148 |
| MLNB | 0.96806 | 0.18763 | 0.12386 | 131.26 | 0.48137 |
| Laplacian Score | 0.96880 | 0.18701 | 0.11717 | 131.22 | 0.48484 |
| RelieF | 0.96870 | 0.18593 | 0.1208 | 131 | 0.48684 |
| MDDMproj | 0.96755 | 0.18771 | 0.13001 | 132.03 | 0.48148 |
| MDDMspc | 0.96806 | 0.18763 | 0.12386 | 131.26 | 0.48137 |
Table 5. Experimental results on Emotions.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.95468 | 0.51255 | 2.7671 | 0.62310 | 0.95468 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 0.95033 | 0.56044 | 2.9738 | 0.59137 | 0.95033 |
| RelieF | 0.89657 | 0.40958 | 2.4069 | 0.69350 | 0.89657 |
| MDDMproj | 0.89657 | 0.40958 | 2.4069 | 0.69350 | 0.89657 |
| MDDMspc | 0.88908 | 0.42924 | 2.3618 | 0.68983 | 0.88908 |
Table 6. Experimental results on Flags.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.81977 | 0.24048 | 6.1901 | 0.75653 | 0.81977 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 0.80916 | 0.23298 | 5.8892 | 0.76737 | 0.80916 |
| RelieF | 0.81433 | 0.23269 | 6.2364 | 0.75832 | 0.81433 |
| MDDMproj | 0.81416 | 0.21904 | 6.1963 | 0.76229 | 0.81416 |
| MDDMspc | 0.81416 | 0.21904 | 6.1963 | 0.76229 | 0.81416 |
Table 7. Experimental results on FGNET.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 1 | – | 0.93427 | 21.477 | 0.15097 |
| MLNB | – | – | – | – | – |
| Laplacian Score | 1 | – | 0.93427 | 21.477 | 0.15097 |
| RelieF | 1 | – | 0.93427 | 21.477 | 0.15097 |
| MDDMproj | 1 | – | 0.95291 | 21.475 | 0.13413 |
| MDDMspc | 1 | – | 0.95291 | 21.475 | 0.13413 |
Table 8. Experimental results on Water Quality Nom.

| Algorithms | Hamming Loss (↓) | Ranking Loss (↓) | One-Error (↓) | Coverage (↓) | Average Precision (↑) |
| Ours | 0.88209 | – | 0.33826 | 9.3556 | – |
| MLNB | – | – | – | – | – |
| Laplacian Score | 0.8825 | – | 0.35662 | 9.2995 | – |
| RelieF | 0.85437 | – | 0.32326 | 9.2675 | – |
| MDDMproj | 0.85437 | – | 0.32326 | 9.2675 | – |
| MDDMspc | 0.88296 | – | 0.36373 | 9.5964 | – |