
Robust Multi-Label Classification with Enhanced Global and Local Label Correlation

1 Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
2 China UnionPay Co., Ltd., Shanghai 201201, China
3 Postdoctoral Research Station of Computer Science and Technology, Fudan University, Shanghai 200433, China
4 Department of Electrical & Computer Engineering, University of Alberta, Edmonton, AB T6R 2V4, Canada
5 Systems Research Institute, Polish Academy of Sciences, PL-01447 Warsaw, Poland
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(11), 1871; https://doi.org/10.3390/math10111871
Submission received: 8 May 2022 / Revised: 22 May 2022 / Accepted: 25 May 2022 / Published: 30 May 2022
(This article belongs to the Special Issue Data Mining: Analysis and Applications)

Abstract: Data representation is of significant importance in minimizing multi-label ambiguity. While most researchers intensively investigate label correlation, the research on enhancing model robustness is preliminary. Low-quality data is one of the main reasons that model robustness degrades. Aiming at the cases with noisy features and missing labels, we develop a novel method called robust global and local label correlation (RGLC). In this model, subspace learning reconstructs intrinsic latent features immune to feature noise. Manifold learning ensures that the low-rank latent label outputs obtained by matrix factorization are similar whenever the latent features are similar. We examine the co-occurrence of global and local label correlation with the constructed latent features and latent labels. Extensive experiments demonstrate that the classification performance with integrated information is statistically superior to a collection of state-of-the-art approaches across numerous domains. Additionally, the proposed model shows promising performance on multi-label data when noisy features and missing labels occur, demonstrating the robustness of multi-label classification.

1. Introduction

Multi-label learning [1,2,3] aims at learning a mapping from features to labels and determines a group of associated labels for unseen instances. The traditional “is-a” relation between instances and labels has thus been upgraded to a “has-a” relation. With the advancement of automation and networking, the expenditure of continuously collecting data has decreased significantly, resulting in a collection of advanced algorithms that cope with concept cognition [4,5,6,7,8,9,10]. However, it is a non-trivial task to collect massive amounts of high-quality data, as unexpected device failures or data falsification happen from time to time. Therefore, how to effectively cope with flawed multi-label data is frequently discussed [11,12,13,14].
Label correlations are frequently exploited as auxiliary information for classifier construction, as they approximately recover the joint probability density among labels. The probability that the label “bee” appears will be higher if the labels “flower” and “butterfly” are present in a picture. A label correlation is global if the relation holds universally and local otherwise. The label correlation of “flower, butterfly, and bee” is global if the picture is a natural scene; it degenerates into a local correlation if the picture describes flowers in a house without butterflies or bees. As a result, the premise that global and local label correlations concur is more reasonable. However, both kinds of label correlation are vulnerable to feature noise and missing labels, as they are inherent in the observed data itself. Recent papers focus on label correlation. Guo et al. [15] utilized Laplacian manifold regularization to recover missing labels. Sun et al. [16] imposed a sparsity constraint and linear self-recovery to deal with weakly supervised labels and noisy features. Zhang et al. [17] addressed the noise in latent features and latent labels via bi-sparsity regularization. Lou et al. [18] introduced a fuzzy weighting strategy on both observed features and instances to resist the impact of noisy features. Fan et al. [19] adopted manifold learning to exploit both global and local label correlations with selected features. Nevertheless, most of them examine model robustness with either the feature space or the label space alone. Classification in the presence of both noisy features and missing labels will be more reliable if we strengthen the robustness of both global and local label correlations in the latent data representation.
For multi-label classifiers, the evaluation metrics emphasize the accuracy and ranking of relevant labels [20]. From either an instance perspective or a label perspective, the minority distribution of relevant labels amplifies sensitivity to model parameter selection. In other words, inaccurate information, including feature noise [21,22] and missing labels [23,24,25], may skew the estimation of model parameters. Subspace learning is an effective method to uncover the underlying structural representations; label correlation estimation in the subspace gains credibility due to the removal of distracting information. Chen et al. [26] learned a shared latent projection via a linear low-dimensional transformation, which is solved as a generalized eigenvalue problem. Chen et al. [27] adopted a neural factorization machine on both the feature side and the label side to exploit feature and label correlations. Wei and Li [28] restrained the subspace dimension by relaxing the performance difference with respect to a pre-trained model and preserving the largest parameters in the corresponding round. Huang et al. [29] sought a latent label space by minimizing the distance difference of a pairwise label correlation constraint on explicit and implicit labels. However, these approaches neither examine potentially flawed data nor take full advantage of the concurrence of global and local label correlation.
Based on subspace learning and manifold learning, we present a novel method called Robust Global and Local label Correlation (RGLC) to leverage both global and local label correlations extracted from the latent representation, as illustrated in Figure 1. Inspired by [30,31], subspace learning is employed to reconstruct a clean feature representation. As the evaluation metrics operate on the observed label matrix, we generate a latent label representation via matrix factorization and train a model from the noise-free feature representation to the latent label representation. During this training, we couple the model with global and local label correlation learning. Compared with existing multi-label algorithms, the contributions of RGLC are enumerated as follows:
  • For the first time, we exploit both global label correlations and local label correlations to regularize the learning from latent features to latent labels. The integrated intrinsic label correlation in RGLC robustly handles multi-label classification problems in different fields.
  • The subspace learning and matrix decomposition are jointly incorporated to deduce latent features and latent labels from flawed multi-label data. The two modules strengthen the robustness of RGLC when completing multi-label classification tasks with noisy features and missing labels.
  • We intensively examine RGLC from different aspects of modularity, complexity, convergence, and sensitivity. The satisfactory performance demonstrates the superiority of enhanced global and local label correlations.
The remainder of this paper is structured as follows. We review related work on robust subspace learning, manifold regularization, and global and local label correlation in Section 2. Section 3 details the components of the presented multi-label model (RGLC), and Section 3.2 explains how RGLC is optimized. Extensive comparisons are completed and analyzed in Section 4. We discuss the characteristics of RGLC in Section 5. Finally, we summarize the findings and value of the proposed algorithm and suggest future research directions in Section 6.

2. Related Work

2.1. Robust Subspace Learning

Subspace learning is an effective strategy to strengthen model robustness. Regarding the goal of subspace learning, there are roughly three solutions in single-view multi-label learning [32]: (i) reduce only the feature matrix $X$ to $X_0$; (ii) reduce only the label matrix $Y$ to $Y_0$; and (iii) reduce the feature matrix $X$ and the label matrix $Y$ to $X_0$ and $Y_0$, respectively. What these methods have in common is that the generated subspace is compact with the low-rank property. Recently, many approaches have preferred the embedded strategy, in which subspace learning and classifier construction are optimized alternately. For the first category, the wrapped multi-label classification with label-specific feature generation model (WRAP) [33] learns the linear model from $X_0$ to $Y$ and the embedded label-specific features simultaneously. For the second category, the latent relative label importance for multi-label learning model (RLIM) [34] learns the relative label importance while training the classifier from $X$ to $Y_0$. For the third category, the independent feature and label components model (IFLC) [35] learns the classifier from the latent feature subspace (i.e., $X_0$) to the latent label subspace (i.e., $Y_0$) by maximizing their independence individually and maximizing their correlations jointly. In this paper, we employ different strategies for latent representation learning on the feature side and the label side. Concretely, the latent feature subspace learning takes an embedded strategy by constructing a linear regression from $X_0$ to $Y$ without considering label correlation, while the latent label subspace learning takes an embedded strategy by constructing a linear regression from $X_0$ to $Y_0$.

2.2. Manifold Regularization

Manifold regularization [36] constrains the distribution of output similarity based on instance similarity. Cai and Zhu proposed the feature manifold learning and sparsity regularization model for multi-label feature selection (MSSL) [37], which preserves the geometric structure of features through feature manifold learning. Other scholars proposed a manifold regularized discriminative feature selection (MDFS) model [38], which designs a feature embedding method that retains local label correlation via manifold regularization. There is also a Bayesian model with label manifold regularization and a label confidence constraint (BM-LRC) [39], which applies label manifold regularization at the topic and document levels to estimate label similarity for multi-label text classification. Feng et al. considered a regularized matrix factorization model for multi-label learning (RMFL) [40], which preserves topological information in local features through label manifold regularization. In this paper, we employ manifold regularization on the latent label subspace learning to optimize the coefficients of the linear regression model.

2.3. Global and Local Label Correlation

Effective utilization of global and local label correlations is beneficial for improving multi-label classification when label information is assumed to be plausible. We explore accurate label correlations in a label subspace constrained by manifold regularization. Recent works [41,42,43] focus on local label correlations. Multi-label learning with global and local label correlation (GLOCAL) [31] was the first approach to exploit both global and local label correlations in a latent label subspace while dealing with missing labels; its global and local manifold regularizers use pairwise label correlations. Global and local multi-view multi-label learning (GLMVML) [44] extended GLOCAL to the multi-view setting. Global-local label correlation (GLC) [45] solved the partial multi-label learning problem by simultaneously taking the label coefficient matrix and the consistency among coefficient matrices as global and local label correlation regularizers, respectively. In this paper, we learn both global and local label correlations in the latent label subspace, which constitutes a component of the objective function for the multi-label classifier.

3. Materials and Methods

We recall some essential notations here. We are given a multi-label dataset $(X, Y)$, where $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d\times n}$ is the original feature matrix, $d$ is the feature count, and $n$ is the instance count. $Y = [y_1, y_2, \ldots, y_n] \in \{-1, 1\}^{l\times n}$ constitutes the ground-truth label matrix, where $y_i = [y_{i1}, \ldots, y_{ij}, \ldots, y_{il}]$ denotes the label association information of $x_i$ across the label space, with $y_{ij} = 1$ if $x_i$ is associated with the $j$-th label and $y_{ij} = -1$ otherwise. $l$ is the label count.
Let $Q \in \mathbb{R}^{d\times m}$ and $U \in \mathbb{R}^{l\times k}$ denote two transformation matrices imposed on $X \in \mathbb{R}^{d\times n}$ and $Y \in \mathbb{R}^{l\times n}$, respectively. We explain the model components by first learning a latent feature matrix $C \in \mathbb{R}^{m\times n}$ ($m < d$) (Section 3.1.1) and then learning a latent label matrix $V \in \mathbb{R}^{k\times n}$ ($k < l$) (Section 3.1.2), which is regularized by global and local manifolds (Section 3.1.3) and by learning latent label correlations $Z$ (Section 3.1.4). The model output for unseen instances is presented last (Section 3.1.5). The pipeline also applies to the case with noisy features and missing labels. We explain the problem solving in Section 3.2.

3.1. Proposed Model

3.1.1. Learning Latent Features

Inspired by [30], we seek a latent feature matrix $C$, a low-rank representation of $X$ that carries both informative and discriminative information, by imposing a reconstruction constraint. Informativeness is obtained through the reconstruction constraint:
$$X = PQ^\top X + E$$
where $Q \in \mathbb{R}^{d\times m}$ transforms the original feature matrix $X$ into the latent feature matrix $C$, and $P \in \mathbb{R}^{d\times m}$ reconstructs the original feature matrix $X$ from the latent feature matrix $C$. To ensure a non-trivial solution, we add an orthogonality constraint on $P$, denoted as $P^\top P = I$. $E \in \mathbb{R}^{d\times n}$ separates the noise from the feature matrix $X$. The desired latent feature matrix should minimize the classification error, and we assume that the classifier from the latent feature matrix $C$ to the label matrix $Y$ is a linear regression model. The smaller the regression loss is, the better the reconstructed features become. We require $R$ and $E$ to be simple, as they do not participate in the procedure of latent label learning. We use indicator matrices $S$ and $J$ to simulate the noisy features (yielding $\hat{X}$) and the missing labels, respectively. Hence, we have the objective function:
$$\min_{R,Q,E,P}\ \frac{1}{2}\big\|J \circ (Y - RQ^\top\hat{X})\big\|_F^2 + \frac{1}{2}\lambda_1\|Q\|_F^2 + \lambda_2\|E\|_1 + \frac{1}{2}\|R\|_F^2 \quad \text{s.t.}\ \hat{X} = PQ^\top\hat{X} + E,\ P^\top P = I,\ \hat{X} = S \circ X$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\circ$ denotes the Hadamard (element-wise) product, $R$ is the weight matrix of the classifier, and $\lambda_1, \lambda_2 \ge 0$ are tradeoff parameters balancing the complexity of $Q$ and of $R$, $E$. The elements $S_{ij} = 1$ and $J_{ij} = 1$ hold if the corresponding entries are neither noisy features nor missing labels, and $S_{ij} = 0$ and $J_{ij} = 0$ otherwise. We do not consider label correlation here, as the original label matrix may contain missing labels. We learn the latent features ($Q^\top(S \circ X)$) in a label-by-label fashion. For simplicity, we denote:
$$C \triangleq Q^\top(S \circ X)$$
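To make the roles of the indicator matrix $S$ and the transformation $Q$ concrete, the following NumPy sketch builds the masked features $\hat{X} = S \circ X$ and the latent representation $C = Q^\top(S \circ X)$ with made-up dimensions and random placeholder matrices rather than the learned $P$ and $Q$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 50, 200, 10                        # feature dim, instances, latent dim (hypothetical)

X = rng.standard_normal((d, n))              # observed feature matrix
S = (rng.random((d, n)) > 0.2).astype(float) # indicator: 1 = clean entry, 0 = noisy/unobserved
X_hat = S * X                                # masked features, X_hat = S o X

Q = rng.standard_normal((d, m))              # placeholder transformation to the latent space
P = np.linalg.qr(rng.standard_normal((d, m)))[0]  # placeholder with orthonormal columns, P^T P = I
C = Q.T @ X_hat                              # latent features, C = Q^T (S o X), shape (m, n)
E = X_hat - P @ Q.T @ X_hat                  # residual that the sparse noise term E absorbs

print(C.shape, np.allclose(P.T @ P, np.eye(m)))
```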

3.1.2. Learning Latent Labels

The ground-truth label matrix $Y \in \mathbb{R}^{l\times n}$ is transformed into the latent label matrix $V \in \mathbb{R}^{k\times n}$ by the transformation matrix $U \in \mathbb{R}^{l\times k}$ satisfying $Y \approx UV$, where $k < l$. The classifier
$$g(x) = W^\top C$$
is learnt from the latent feature matrix $C$ to the latent label matrix $V$. We assume this mapping is also linear. Therefore, we obtain the latent label matrix $V$ by minimizing the reconstruction error of the latent labels together with the squared loss of the classifier prediction:
$$\min_{U,V,W}\ \big\|J \circ (Y - UV)\big\|_F^2 + \lambda_1\big\|V - W^\top C\big\|_F^2 + \lambda_3\big(\|U\|_F^2 + \|V\|_F^2 + \|W\|_F^2\big)$$
where $J_{ij} = 1$ if $y_{ij}$ is observable and $J_{ij} = 0$ otherwise. $W \in \mathbb{R}^{m\times k}$ is a linear transformation from $C$ to $V$, and $U \in \mathbb{R}^{l\times k}$ is a transformation matrix from $V$ to $Y$. $\|U\|_F^2 + \|V\|_F^2 + \|W\|_F^2$ is the regularization term.

3.1.3. Global and Local Manifold Regularizer

Applying global and local manifold regularizers to the label representation is conducive to distinguishing the similarity of different instances' latent features in both the global and the local sense. The two regularization terms guide the model to exploit global and local label correlations. We compute the global and local manifold regularizers in a pairwise way; this idea is inspired by [31]. $G = \{G_0, G_1, \ldots, G_i, \ldots, G_j, \ldots, G_h\}$ is the set of classifier outputs on different groups of instances, where the classifier is trained from the latent features $C$ to the latent labels. $C = \{C_0, C_1, \ldots, C_i, \ldots, C_j, \ldots, C_h\}$ denotes a group of instance sets with latent features, where $C_i \in \mathbb{R}^{m\times n_i}$ and $C_j \in \mathbb{R}^{m\times n_j}$ are the $i$th and $j$th latent feature blocks containing $n_i$ and $n_j$ instances, respectively. $Y_i$ and $Y_j$ are the corresponding label representations of $C_i$ and $C_j$. We define the global manifold regularizer as:
$$\operatorname{tr}\big(G_0^\top L_0 G_0\big)$$
which should be minimized. $G_0 = UW^\top C_0$ is the label output on all instances, where $C_0 = C$. $L_0 = \operatorname{diag}(S_0\mathbf{1}) - S_0$ is the Laplacian matrix of $S_0 \in \mathbb{R}^{l\times l}$, where $\operatorname{diag}(\cdot)$ forms a diagonal matrix and $\mathbf{1}$ denotes a vector of ones. $S_0$ is the global label correlation matrix measured by the cosine similarity among all labels (denoted as $Y_0$), where $Y_0 = Y$. $\operatorname{tr}(\cdot)$ is the matrix trace. Similarly, for the $i$th local manifold regularizer, we have:
$$\operatorname{tr}\big(G_i^\top L_i G_i\big)$$
which should also be minimized. $G_i = UW^\top C_i$ ($1 \le i \le h$) is the label output of the $i$th group of instances. For simplicity, we assume that each local similarity corresponds to only one set of instances, which means the intersection of instances between two arbitrary $C_i$ and $C_j$ is empty. $L_i = \operatorname{diag}(S_i\mathbf{1}) - S_i$ is the Laplacian matrix of $S_i \in \mathbb{R}^{l\times l}$. $S_i$ is a local label correlation matrix measured by the cosine similarity of the $i$th local label subset (denoted as $Y_i$), where $Y_i \subset Y$. Thus, (5) can be rewritten as:
$$\min_{U,V,W}\ \big\|J \circ (Y - UV)\big\|_F^2 + \lambda_1\big\|V - W^\top C\big\|_F^2 + \lambda_3\big(\|U\|_F^2 + \|V\|_F^2 + \|W\|_F^2\big) + \lambda_4\Big(\operatorname{tr}\big(G_0^\top L_0 G_0\big) + \sum_{i=1}^{h}\operatorname{tr}\big(G_i^\top L_i G_i\big)\Big)$$
where $J$ is an indicator matrix with $J_{ij} = 1$ if $y_{ij}$ is observable and $J_{ij} = 0$ otherwise.
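To make the construction of the manifold regularizers concrete, the following sketch computes a cosine-similarity matrix between the rows of a label block, its Laplacian $L = \operatorname{diag}(S\mathbf{1}) - S$, and the value $\operatorname{tr}(G^\top L G)$; the toy matrices and sizes are illustrative only.

```python
import numpy as np

def label_laplacian(Y_block):
    """Laplacian L = diag(S 1) - S of the cosine-similarity matrix S between
    the label rows of Y_block (shape: labels x instances)."""
    norms = np.linalg.norm(Y_block, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # guard against all-zero label rows
    Yn = Y_block / norms
    S = Yn @ Yn.T                            # pairwise cosine similarity, shape (l, l)
    return np.diag(S @ np.ones(S.shape[0])) - S

def manifold_regularizer(G, L):
    """tr(G^T L G) for label outputs G (shape: labels x instances)."""
    return np.trace(G.T @ L @ G)

# toy usage with hypothetical sizes
rng = np.random.default_rng(1)
l, n = 6, 40
Y0 = rng.choice([-1.0, 1.0], size=(l, n))    # a (global) label block
G0 = rng.standard_normal((l, n))             # stand-in for the outputs UW^T C0
print(manifold_regularizer(G0, label_laplacian(Y0)))
```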

3.1.4. Learning Label Correlations

The cosine similarity measured on the original label matrix $Y$ may be dubious when labels are missing. An alternative solution is to learn the Laplacian matrix on $Y$, which stores the fluctuations in pairwise label correlations when an arbitrary label $l_i$ changes. Considering the symmetric positive semi-definite property of the Laplacian matrix, we decompose $L_i$ as $Z_iZ_i^\top$, with $Z_i \in \mathbb{R}^{l\times k}$. To avoid a trivial solution, we add the diagonal constraint $\operatorname{diag}(Z_iZ_i^\top) = \mathbf{1}$ for all $Z_i$, $i = 1, \ldots, h$. To unify the optimization of label correlations, we assume that the global label correlation matrix is a linear combination of the local label correlation matrices; thus, the global label Laplacian matrix is the corresponding linear combination of the local label Laplacian matrices. Then, (8) is rewritten as:
$$\min_{U,V,W,Z}\ \big\|J \circ (Y - UV)\big\|_F^2 + \lambda_1\big\|V - W^\top C\big\|_F^2 + \lambda_3\big(\|U\|_F^2 + \|V\|_F^2 + \|W\|_F^2\big) + \sum_{i=1}^{h}\lambda_4\frac{n_i}{n}\operatorname{tr}\big(G_0^\top Z_iZ_i^\top G_0\big) + \sum_{i=1}^{h}\lambda_4\operatorname{tr}\big(G_i^\top Z_iZ_i^\top G_i\big) \quad \text{s.t.}\ \operatorname{diag}(Z_iZ_i^\top) = \mathbf{1},\ i = 1, 2, \ldots, h$$
where the symbol $\circ$ denotes the Hadamard (element-wise) product, and $\lambda_1, \lambda_3, \lambda_4 \ge 0$ are tradeoff parameters.

3.1.5. Predicting Labels for Unseen Instances

With the latent feature matrix $C$, the latent label matrix $V$, and the classifier $g(\cdot)$ obtained in Section 3.1.1, Section 3.1.2, Section 3.1.3 and Section 3.1.4, we present the model output here. For an unseen instance $x_i$, the label prediction in the original label space is denoted as:
$$f(x_i) = U\,g(x_i)$$
By substituting (3) and (4) into (10), the final prediction for an unseen instance is as follows:
$$f(x_i) = UW^\top Q^\top(S_i \circ x_i)$$
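A minimal sketch of this prediction rule, assuming the learned matrices $U$, $W$, and $Q$ are available; the shapes and names are hypothetical placeholders:

```python
import numpy as np

def predict(x, U, W, Q, s=None):
    """Label scores for one unseen instance: f(x) = U W^T Q^T (s o x);
    s is the feature-availability indicator (all ones = no masking)."""
    if s is None:
        s = np.ones_like(x)
    latent_feature = Q.T @ (s * x)           # m-dimensional latent feature
    latent_label = W.T @ latent_feature      # k-dimensional latent label, g(x)
    return U @ latent_label                  # scores in the original l-dimensional label space

# hypothetical shapes: d features, m latent features, k latent labels, l labels
d, m, k, l = 50, 10, 4, 6
rng = np.random.default_rng(2)
scores = predict(rng.standard_normal(d),
                 rng.standard_normal((l, k)),   # U
                 rng.standard_normal((m, k)),   # W
                 rng.standard_normal((d, m)))   # Q
print(scores.shape)                             # (l,)
```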

3.2. Optimization

3.2.1. Solving (2)

We use the Alternating Direction Method of Multipliers (ADMM) to solve the objective function in (2). By introducing the constraint $A = RQ^\top$, the objective function becomes:
$$\min_{Q,R,E,A,P}\ \frac{1}{2}\big\|J \circ (Y - A\hat{X})\big\|_F^2 + \frac{1}{2}\lambda_1\|Q\|_F^2 + \lambda_2\|E\|_1 + \frac{1}{2}\|R\|_F^2 \quad \text{s.t.}\ \hat{X} = PQ^\top\hat{X} + E,\ P^\top P = I,\ A = RQ^\top$$
The augmented Lagrangian function $\mathcal{J}$ is:
$$\mathcal{J} = \arg\min_{A,R,Q,E,P,Y_1,Y_2}\ \frac{1}{2}\big\|J \circ (Y - A\hat{X})\big\|_F^2 + \frac{1}{2}\lambda_1\|Q\|_F^2 + \lambda_2\|E\|_1 + \frac{1}{2}\lambda_2\|R\|_F^2 + \big\langle Y_1, \hat{X} - PQ^\top\hat{X} - E\big\rangle + \big\langle Y_2, A - RQ^\top\big\rangle + \frac{\mu}{2}\Big(\big\|\hat{X} - PQ^\top\hat{X} - E\big\|_F^2 + \big\|A - RQ^\top\big\|_F^2\Big) \quad \text{s.t.}\ P^\top P = I$$
where $Y_1$ and $Y_2$ are Lagrange multipliers, $\langle\cdot,\cdot\rangle$ denotes the Frobenius inner product, and $\mu > 0$ is a penalty parameter.
Update A (Line 3 in Algorithm 1):
$$\mathcal{J} = \arg\min_{A}\ \frac{1}{2}\big\|J \circ (Y - A\hat{X})\big\|_F^2 + \frac{\mu}{2}\Big\|A - RQ^\top + \frac{Y_2}{\mu}\Big\|_F^2$$
Setting $\frac{\partial \mathcal{J}}{\partial A} = 0$, we have:
$$A = \Big[(J \circ Y)\hat{X}^\top + \mu\Big(RQ^\top - \frac{Y_2}{\mu}\Big)\Big]\big(\hat{X}\hat{X}^\top + \mu I\big)^{-1}$$
Update P (Line 4 in Algorithm 1):
$$\mathcal{J} = \arg\min_{P}\ \Big\|\hat{X} - PQ^\top\hat{X} - E + \frac{Y_1}{\mu}\Big\|_F^2 \quad \text{s.t.}\ P^\top P = I$$
Problem (16) is a classic orthogonal Procrustes problem [46], given $M = \hat{X} - E + \frac{Y_1}{\mu}$.
Update Q (Line 5 in Algorithm 1):
$$\mathcal{J} = \arg\min_{Q}\ \frac{1}{2}\lambda_1\|Q\|_F^2 + \frac{\mu}{2}\Big\|\hat{X} - PQ^\top\hat{X} - E + \frac{Y_1}{\mu}\Big\|_F^2 + \frac{\mu}{2}\Big\|A - RQ^\top + \frac{Y_2}{\mu}\Big\|_F^2$$
Setting $\frac{\partial \mathcal{J}}{\partial Q} = 0$, we have:
$$Q\big(\lambda_1 I + \mu R^\top R\big) + \mu\hat{X}\hat{X}^\top Q = \mu\hat{X}M^\top P + \mu B^\top R$$
where $B = A + \frac{Y_2}{\mu}$. $Q$ is then obtained by solving a Sylvester equation [47].
Update R (Line 6 in Algorithm 1):
$$\mathcal{J} = \arg\min_{R}\ \frac{1}{2}\lambda_2\|R\|_F^2 + \frac{\mu}{2}\Big\|A - RQ^\top + \frac{Y_2}{\mu}\Big\|_F^2$$
Setting $\frac{\partial \mathcal{J}}{\partial R} = 0$, we have:
$$R = \mu HQ\big(\lambda_2 I + \mu Q^\top Q\big)^{-1}$$
where $H = A + \frac{Y_2}{\mu}$.
Update E (Line 7 in Algorithm 1):
$$\mathcal{J} = \arg\min_{E}\ \lambda_2\|E\|_1 + \frac{\mu}{2}\Big\|\hat{X} - PQ^\top\hat{X} - E + \frac{Y_1}{\mu}\Big\|_F^2$$
which has the closed-form solution [48] shown below:
$$E = \mathcal{S}_{\frac{\lambda_2}{\mu}}\Big(\hat{X} - PQ^\top\hat{X} + \frac{Y_1}{\mu}\Big)$$
where $\mathcal{S}_{\frac{\lambda_2}{\mu}}(t) = \operatorname{sgn}(t)\max\big(|t| - \frac{\lambda_2}{\mu}, 0\big)$, and $\operatorname{sgn}(\cdot)$ is the sign function, which equals 1 if the input is greater than 0, 0 if the input is 0, and −1 otherwise.
Update Y 1 , Y 2 , and μ (Line 8 in Algorithm 1):
$$Y_1 = Y_1 + \mu\big(\hat{X} - PQ^\top\hat{X} - E\big);\quad Y_2 = Y_2 + \mu\big(A - RQ^\top\big);\quad \mu = \min(\mu_{\max}, \rho\mu)$$
where $\rho > 1$.
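The following NumPy/SciPy sketch assembles one pass of the ADMM updates above under the sign conventions written here (the weight on $\|R\|_F^2$ is taken as $\lambda_2$ for simplicity); it is an illustrative re-derivation with placeholder names and default parameters, not the authors' released implementation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def soft_threshold(T, tau):
    """Element-wise soft-thresholding operator S_tau."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

def admm_pass(X_hat, Y, J, A, P, Q, R, E, Y1, Y2, mu,
              lam1=0.1, lam2=0.01, rho=1.01, mu_max=1e6):
    """One sweep of the updates for (2); shapes: X_hat (d,n), Y,J (l,n),
    A,Y2 (l,d), P,Q (d,m), R (l,m), E,Y1 (d,n)."""
    d, n = X_hat.shape
    m = Q.shape[1]
    # update A
    A = ((J * Y) @ X_hat.T + mu * (R @ Q.T - Y2 / mu)) @ \
        np.linalg.inv(X_hat @ X_hat.T + mu * np.eye(d))
    # update P: orthogonal Procrustes with M = X_hat - E + Y1/mu
    M = X_hat - E + Y1 / mu
    Us, _, Vt = np.linalg.svd(M @ X_hat.T @ Q, full_matrices=False)
    P = Us @ Vt
    # update Q: Sylvester equation (mu X X^T) Q + Q (lam1 I + mu R^T R) = rhs
    B = A + Y2 / mu
    rhs = mu * X_hat @ M.T @ P + mu * B.T @ R
    Q = solve_sylvester(mu * X_hat @ X_hat.T, lam1 * np.eye(m) + mu * R.T @ R, rhs)
    # update R, with H = A + Y2/mu
    H = A + Y2 / mu
    R = mu * H @ Q @ np.linalg.inv(lam2 * np.eye(m) + mu * Q.T @ Q)
    # update E by element-wise soft-thresholding
    E = soft_threshold(X_hat - P @ Q.T @ X_hat + Y1 / mu, lam2 / mu)
    # update multipliers and penalty
    Y1 = Y1 + mu * (X_hat - P @ Q.T @ X_hat - E)
    Y2 = Y2 + mu * (A - R @ Q.T)
    mu = min(mu_max, rho * mu)
    return A, P, Q, R, E, Y1, Y2, mu
```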

3.2.2. Solving (9)

We employ alternating minimization to solve the objective function in (9). We implement the solving of $U$, $V$, and $W$ via the Manopt toolbox [49].
Update Z i (Line 13 in Algorithm 1):
$$\min_{Z_i}\ \lambda_4\frac{n_i}{n}\operatorname{tr}\big(G_0^\top Z_iZ_i^\top G_0\big) + \lambda_4\operatorname{tr}\big(G_i^\top Z_iZ_i^\top G_i\big) \quad \text{s.t.}\ \operatorname{diag}(Z_iZ_i^\top) = \mathbf{1}$$
for each $i \in \{1, \ldots, h\}$. We solve it with projected gradient descent. $Z_i$ is updated using the gradient with respect to $Z_i$:
$$\nabla_{Z_i} = \lambda_4\frac{n_i}{n}UW^\top CC^\top WU^\top Z_i + \lambda_4 UW^\top C_iC_i^\top WU^\top Z_i$$
To satisfy the constraint $\operatorname{diag}(Z_iZ_i^\top) = \mathbf{1}$, we update the $j$th row of $Z_i$ (i.e., $z_{i,j,:}$) by $z_{i,j,:}/\|z_{i,j,:}\|$.
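A sketch of one projected-gradient step for a single $Z_i$, followed by the row normalization that enforces $\operatorname{diag}(Z_iZ_i^\top) = \mathbf{1}$; the step size and variable names are illustrative assumptions.

```python
import numpy as np

def update_Z(Z, U, W, C, C_i, n_i, n, lam4, step=1e-3):
    """One projected-gradient step for a local factor Z_i, followed by
    row normalization so that diag(Z Z^T) = 1."""
    G_core = U @ W.T                          # shared factor U W^T, shape (l, m)
    grad = lam4 * (n_i / n) * G_core @ (C @ C.T) @ G_core.T @ Z \
         + lam4 * G_core @ (C_i @ C_i.T) @ G_core.T @ Z
    Z = Z - step * grad
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return Z / norms                          # project each row onto the unit sphere
```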
Update V (Line 15 in Algorithm 1):
$$\min_{V}\ \big\|J \circ (Y - UV)\big\|_F^2 + \lambda_1\big\|V - W^\top C\big\|_F^2 + \lambda_3\|V\|_F^2$$
$V$ is updated using the gradient with respect to $V$:
$$\nabla_{V} = U^\top\big(J \circ (UV - Y)\big) + \lambda_1\big(V - W^\top C\big) + \lambda_3 V$$
Update U (Line 16 in Algorithm 1):
$$\min_{U}\ \big\|J \circ (Y - UV)\big\|_F^2 + \lambda_3\|U\|_F^2 + \sum_{i=1}^{h}\lambda_4\frac{n_i}{n}\operatorname{tr}\big(G_0^\top Z_iZ_i^\top G_0\big) + \sum_{i=1}^{h}\lambda_4\operatorname{tr}\big(G_i^\top Z_iZ_i^\top G_i\big)$$
$U$ is updated using the gradient with respect to $U$:
$$\nabla_{U} = \big(J \circ (UV - Y)\big)V^\top + \lambda_3 U + \sum_{i=1}^{h}\lambda_4\frac{n_i}{n}Z_iZ_i^\top UW^\top CC^\top W + \sum_{i=1}^{h}\lambda_4 Z_iZ_i^\top UW^\top C_iC_i^\top W$$
Update W (Line 17 in Algorithm 1):
$$\min_{W}\ \lambda_1\big\|V - W^\top C\big\|_F^2 + \lambda_3\|W\|_F^2 + \sum_{i=1}^{h}\lambda_4\frac{n_i}{n}\operatorname{tr}\big(G_0^\top Z_iZ_i^\top G_0\big) + \sum_{i=1}^{h}\lambda_4\operatorname{tr}\big(G_i^\top Z_iZ_i^\top G_i\big)$$
$W$ is updated using the gradient with respect to $W$:
$$\nabla_{W} = \lambda_1 C\big(C^\top W - V^\top\big) + \lambda_3 W + \sum_{i=1}^{h}\lambda_4\frac{n_i}{n}CC^\top WU^\top Z_iZ_i^\top U + \sum_{i=1}^{h}\lambda_4 C_iC_i^\top WU^\top Z_iZ_i^\top U$$
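The three gradients above translate directly into NumPy; the sketch below keeps the constant factors exactly as written in the text and passes the local blocks $C_i$, $Z_i$, $n_i$ as Python lists (the function names are illustrative).

```python
import numpy as np

def grad_V(U, V, W, C, Y, J, lam1, lam3):
    """Gradient of the objective with respect to V."""
    return U.T @ (J * (U @ V - Y)) + lam1 * (V - W.T @ C) + lam3 * V

def grad_U(U, V, W, C, Y, J, C_list, Z_list, n_list, n, lam3, lam4):
    """Gradient with respect to U; C_list/Z_list/n_list hold the local blocks."""
    g = (J * (U @ V - Y)) @ V.T + lam3 * U
    for C_i, Z_i, n_i in zip(C_list, Z_list, n_list):
        g += lam4 * (n_i / n) * Z_i @ Z_i.T @ U @ W.T @ C @ C.T @ W
        g += lam4 * Z_i @ Z_i.T @ U @ W.T @ C_i @ C_i.T @ W
    return g

def grad_W(U, V, W, C, C_list, Z_list, n_list, n, lam1, lam3, lam4):
    """Gradient with respect to W."""
    g = lam1 * C @ (C.T @ W - V.T) + lam3 * W
    for C_i, Z_i, n_i in zip(C_list, Z_list, n_list):
        g += lam4 * (n_i / n) * C @ C.T @ W @ U.T @ Z_i @ Z_i.T @ U
        g += lam4 * C_i @ C_i.T @ W @ U.T @ Z_i @ Z_i.T @ U
    return g
```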

3.3. Computational Complexity Analysis

This section analyzes the computational complexity of the proposed RGLC (see Algorithm 1).
Algorithm 1: RGLC
(Algorithm 1 appears as pseudocode in an image in the original; lines 3–8 correspond to the ADMM updates of Section 3.2.1, and lines 13–17 to the alternating updates of Section 3.2.2.)

3.3.1. Complexity Analysis of Solving (2)

The most time-consuming steps are the calculation of matrices $A$ and $Q$. Firstly, the complexity of calculating $A$ (step 3) is $O(lnd + ldm + d^2l + d^3)$. Secondly, the complexity of calculating $P$ (step 4) is $O(\max(d^2n, d^2m))$. Thirdly, the complexity of calculating $Q$ (step 5) is $O(d^3 + \max(d^2n, d^2m) + d^2n + m^2l)$. Finally, the complexity of computing $R$ (step 6) is $O(m^3 + m^2d + \max(dlm, m^2l))$. Since the feature dimensionality $d$ is usually larger than the instance count $n$, and the instance count $n$ is much larger than $l$, the complexity of solving (2) is $O\big(\tau_1(d^2l + 2d^3 + 3d^2n)\big)$, where $\tau_1$ is the iteration count.

3.3.2. Complexity Analysis of Solving (9)

The most time-consuming step is the calculation of the matrices $Z_i$. Firstly, the complexity of calculating $Z$ (step 13) is $O(hl^2k^2m^2n + \tau_2hmn)$, where $O(\tau_2hmn)$ denotes the complexity of generating the local instance groups with the K-means algorithm and $\tau_2$ is its iteration count. Secondly, the complexity of calculating $V$ (step 15) is $O(\max(k^2ln, kmn))$. Thirdly, the complexity of calculating $U$ (step 16) is $O(hl^2k^2m^2n^2)$. Finally, the complexity of computing $W$ (step 17) is $O(hm^2nk^3l^2)$. As we adopt gradient descent for $Z$, $V$, $U$, and $W$, the gradient decreases to zero at a rate of $\frac{1}{\tau_3}$. Therefore, the complexity of solving (9) is $O(\tau_3hl^2k^2m^2n)$, where $\tau_3$ is the iteration count.
In a nutshell, the overall complexity is $\max\big(O(\tau_1(d^2l + 2d^3 + 3d^2n)),\ O(\tau_3hl^2k^2m^2n)\big)$.

4. Results

4.1. Experimental Settings

We conduct extensive experiments to examine the robustness of RGLC on twelve benchmarks: bibtex, corel5k, enron, genbase, mediamill, medical, and scene are from Mulan (http://mulan.sourceforge.net/datasets.html, accessed on 21 January 2022) [50]; languagelog and slashdot are from Meka (http://waikato.github.io/meka/datasets/, accessed on 21 January 2022) [51]; and art, business, and health are from MDDM (www.lamda.nju.edu.cn/code_MDDM.ashx, accessed on 21 January 2022) [52]. Table 1 summarizes the data characteristics from domains including images, text, video, and biology. For each dataset, we report the example count (#n), the feature dimensionality (#f), the label count (#q), the average number of associated labels per instance (#c), and the domain.
We adopt six evaluation metrics (Hamming Loss, Ranking Loss, One Error, Coverage, Average Precision, and Micro F1) [53] to evaluate the classification performance. The last two indicate better performance when their values are large, and the remaining metrics indicate better performance when their values are small. Let $Y_i$ and $\bar{Y_i}$ denote the relevant and irrelevant label sets in the ground truth and $n$ be the number of unseen instances; the metrics are then formulated as follows:
(1) Hamming Loss (abbreviated as Hl) evaluates the average difference between predictions and ground truth (see Formula (32)). The smaller the value of Hamming Loss is, the better the performance of an algorithm becomes.
$$Hl = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{l}\big|f(x_i)\,\Delta\,Y_i\big|$$
where $\Delta$ is the set symmetric difference and $|\cdot|$ is the set cardinality.
(2) Ranking Loss (abbreviated as Rkl) evaluates the fraction of label pairs in which an irrelevant label is ranked before a relevant label in the label predictions (see Formula (33)). The smaller the value of Ranking Loss is, the better the performance of an algorithm becomes.
$$Rkl = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{|Y_i||\bar{Y_i}|}\Big|\big\{(l_a, l_b)\ \big|\ r_i(l_a) > r_i(l_b),\ (l_a, l_b) \in Y_i \times \bar{Y_i}\big\}\Big|$$
where $r_i(l_j)$ denotes the rank position (in ascending order) of the $j$-th label for the $i$-th instance, and $|\cdot|$ is the set cardinality.
(3) One Error (abbreviated as Oe) evaluates the average fraction of instances whose top-ranked predicted label is an irrelevant label (see Formula (34)). The smaller the value of One Error is, the better the performance of an algorithm becomes.
$$Oe = \frac{1}{n}\sum_{i=1}^{n}\Big[\!\!\Big[\arg\min_{l_j} r_i(l_j) \notin Y_i\Big]\!\!\Big]$$
where $[\![\cdot]\!]$ equals 1 if the condition holds and 0 otherwise, and $r_i(l_j)$ denotes the rank position (in ascending order) of the $j$-th label for the $i$-th instance.
(4) Coverage (abbreviated as Cvg) evaluates how far, on average, one needs to go down the ranked label list to include all ground-truth labels (see Formula (35)). The smaller the value of Coverage is, the better the performance of an algorithm becomes.
$$Cvg = \frac{1}{n}\sum_{i=1}^{n}\Big(\max_{l_j \in Y_i} r_i(l_j) - 1\Big)$$
where $r_i(l_j)$ denotes the rank position (in ascending order) of the $j$-th label for the $i$-th instance.
(5) Average Precision (abbreviated as Ap) evaluates the average fraction of relevant labels ranked at or above each relevant label in the label predictions (see Formula (36)). The larger the value of Average Precision is, the better the performance of an algorithm becomes.
$$Ap = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{|Y_i|}\sum_{l_j \in Y_i}\frac{\big|\{l_s \in Y_i \mid r_i(l_s) \le r_i(l_j)\}\big|}{r_i(l_j)}$$
where | · | is the set cardinality.
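For reference, the following self-contained NumPy sketch evaluates the five formulas above from real-valued label scores, taking rank 1 as the top-ranked label; Micro F1 is omitted, and a sign threshold at zero is assumed for Hamming Loss.

```python
import numpy as np

def ranks(scores):
    """1-based rank positions; rank 1 is the top-ranked (highest-scored) label."""
    order = np.argsort(-scores)
    r = np.empty_like(order)
    r[order] = np.arange(1, len(scores) + 1)
    return r

def evaluate(scores, Y_true):
    """scores, Y_true: arrays of shape (n, l); Y_true entries in {-1, +1}."""
    n, l = Y_true.shape
    hl = rkl = oe = cvg = ap = 0.0
    for f, y in zip(scores, Y_true):
        rel, irr = np.where(y == 1)[0], np.where(y == -1)[0]
        r = ranks(f)
        pred = np.where(f > 0, 1, -1)
        hl += np.mean(pred != y)                               # Hamming Loss
        if len(rel) and len(irr):
            rkl += np.mean(r[rel][:, None] > r[irr][None, :])  # Ranking Loss
        oe += float(int(np.argmin(r)) not in rel)              # One Error
        if len(rel):
            cvg += r[rel].max() - 1                            # Coverage
            ap += np.mean([(r[rel] <= r[j]).sum() / r[j] for j in rel])  # Average Precision
    totals = dict(HammingLoss=hl, RankingLoss=rkl, OneError=oe,
                  Coverage=cvg, AveragePrecision=ap)
    return {k: v / n for k, v in totals.items()}
```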
The configurations used for comparing algorithms are enumerated as follows:
  • Multi-label learning with label-specific features (LIFT) http://palm.seu.edu.cn/zhangml/files/LIFT.rar (accessed on 21 January 2022) [54]: Generating cluster-based label-specific features for multi-label.
  • Learning label-specific features (LLSF) https://jiunhwang.github.io/ (accessed on 21 January 2022) [55]: Learning label-specific features to promote multi-label classification performance.
  • Multi-label twin support vector machine (MLTSVM) http://www.optimal-group.org/Resource/MLTSVM.html (accessed on 21 January 2022) [56]: Providing multi-label learning algorithm with twin support vector machine.
  • Global and local label correlations (Glocal) http://www.lamda.nju.edu.cn/code_Glocal.ashx (accessed on 21 January 2022) [31]: Learning global and local correlation for multi-label.
  • Hybrid noise-oriented multilabel learning (HNOML) [17]: A feature and label noise-resistance multi-label model.
  • Manifold regularized discriminative feature selection for multi-label learning (MDFS) https://github.com/JiaZhang19/MDFS (accessed on 21 January 2022) [38]: Learning discriminative features via manifold regularization.
  • Multilabel classification with group-based mapping (MCGM) https://github.com/JianghongMA/MC-GM (accessed on 21 January 2022) [43]: Group-based local correlation with local feature selection.
  • Fast random k labelsets (fRAkEL) http://github.com/KKimura360/fast_RAkEL_matlab (accessed on 21 January 2022) [57]: A fast version of Random k-labelsets.
  • RGLC: the proposed model. $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are searched in $\{10^{-5}, 10^{-4}, \ldots, 1\}$. The cluster number $h = 5$ and $\rho = 1.01$, as they have limited impact on the considered metrics.
Firstly, we explore the robustness of the baseline performance compared with eight state-of-the-art multi-label classification algorithms across different domains. Secondly, we examine the contribution of the included components via an ablation study. For these first two experiments, both $S$ and $J$ are all-ones matrices. Thirdly, we examine the robustness of RGLC with noisy features and missing labels. Lastly, we analyze convergence and sensitivity with noisy features and missing labels. For fairness, we report results averaged over six runs of five-fold cross-validation. All experiments are conducted on a computer with an Intel(R) Core i7-10700 2.90 GHz CPU and 32 GB RAM running the Windows 10 operating system.
To simulate noisy features and missing labels, we randomly replace 1 with 0 in each column of the indicator matrices $S$ and $J$, using the strategy in [17]. The replacement ratio is increased from 0% to 80% with a step size of 20%. The experimental comparison is conducted on the art, business, and health datasets.
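A sketch of this corruption protocol (the exact sampling in [17] may differ in detail); the dataset sizes below follow Table 1 for the business dataset and are only an example.

```python
import numpy as np

def corrupt_indicator(shape, ratio, rng):
    """Column-wise corruption: in each column, flip a `ratio` fraction of the
    entries from 1 to 0 (0 marks a noisy feature or a missing label)."""
    rows, cols = shape
    M = np.ones(shape)
    k = int(round(ratio * rows))
    for c in range(cols):
        M[rng.choice(rows, size=k, replace=False), c] = 0.0
    return M

rng = np.random.default_rng(0)
for ratio in (0.0, 0.2, 0.4, 0.6, 0.8):
    S = corrupt_indicator((438, 5000), ratio, rng)   # feature indicator, business dataset sizes
    J = corrupt_indicator((30, 5000), ratio, rng)    # label indicator
    print(ratio, S.mean(), J.mean())
```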

4.2. Learning with Benchmarks

Table 2 elaborates the experimental results of the selected algorithms across all considered datasets. We first compute the rank of each of the nine algorithms on each dataset and then average each algorithm's rank over the twelve datasets. This yields the “Avg rank” value of an algorithm, which is then ranked again to obtain the value in parentheses.
We employ the Friedman test [58] to quantify the differences among all considered algorithms. We introduce four symbols to detail the computation of the Friedman statistic $F_F$ (see Formula (37)): the number of comparing algorithms ($k$), the number of datasets ($N$), the performance rank of the $j$-th algorithm on the $i$-th dataset ($r_i^j$), and the average rank of the $j$-th algorithm over all datasets ($R_j = \frac{1}{N}\sum_{i=1}^{N} r_i^j$). Under the null hypothesis ($H_0$) that all algorithms perform identically, $F_F$ follows an F-distribution with $k-1$ degrees of freedom in the numerator and $(k-1)(N-1)$ degrees of freedom in the denominator:
$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}$$
where
$$\chi_F^2 = \frac{12N}{k(k+1)}\Bigg[\sum_{j} R_j^2 - \frac{k(k+1)^2}{4}\Bigg]$$
Table 3 presents the values of the Friedman statistic $F_F$ for all considered metrics and the corresponding critical difference (CD). As shown in Table 3, at the significance level $\alpha = 0.05$, we reject the null hypothesis ($H_0$) of statistically indistinguishable performance among the considered algorithms for all evaluation metrics. This result implies that it is feasible to examine whether RGLC gains statistical superiority over the other algorithms by conducting a post hoc test such as the Holm test [58].
Table 4 presents the statistical results of RGLC compared with the remaining algorithms by the Holm test (see Formula (39)) at significance level $\alpha = 0.05$. We use RGLC as the control algorithm and denote it as $A_1$. We denote the remaining comparing algorithms as $A_j$, ordered by their average rank across all datasets for each evaluation metric, where $2 \le j \le k$.
$$z_j = \frac{R_1 - R_j}{\sqrt{\frac{k(k+1)}{6N}}}, \quad 2 \le j \le k$$
The $p$-value of $z_j$ (denoted as $p_j$) is calculated from the standard normal distribution. For significance level $\alpha = 0.05$, we examine whether $p_j$ is smaller than $\alpha/(k-j+1)$ for $j = 2, 3, \ldots, 9$. Specifically, the Holm test proceeds step by step until the first step $j^*$ at which $p_{j^*} \ge \alpha/(k-j^*+1)$; $j^*$ is set to $k+1$ if $p_j < \alpha/(k-j+1)$ holds for all $j$.
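A compact sketch of the two tests as they are used here, assuming lower average rank is better and a one-sided comparison in favor of the control algorithm; the rank matrix and defaults are placeholders.

```python
import numpy as np
from scipy import stats

def friedman_statistic(rank_matrix):
    """rank_matrix: N datasets x k algorithms; entries are per-dataset ranks."""
    N, k = rank_matrix.shape
    R = rank_matrix.mean(axis=0)                              # average rank of each algorithm
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)
    FF = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return FF, R

def holm_test(R, N, k, alpha=0.05, control=0):
    """Holm step-down test of the control algorithm against all others.
    Returns the indices the control is significantly better than."""
    se = np.sqrt(k * (k + 1) / (6 * N))
    others = [j for j in range(k) if j != control]
    p = {j: stats.norm.sf((R[j] - R[control]) / se) for j in others}  # one-sided p-value of z_j
    rejected = []
    for step, j in enumerate(sorted(others, key=p.get)):
        if p[j] < alpha / (k - step - 1):                     # threshold alpha/(k-j+1) with j = step+2
            rejected.append(j)
        else:
            break
    return rejected
```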
Accordingly, we have the following findings based on reported experimental comparisons:
  • From Table 2, we observe that across all six evaluation metrics (Hamming Loss, Ranking Loss, One Error, Coverage, Average Precision, and Micro F1), RGLC achieves the best or second-best performance according to the rank of “Avg rank” in 66.67% (4/6) and 33.33% (2/6) of the cases, respectively. It is inferior only to LIFT on Ranking Loss and to fRAkEL on One Error according to the rank of the “Avg rank” values in Table 2. Specifically, RGLC achieves 24 (33.33%) best results (values indicated in bold) and 18 (25%) second-best results (values underlined) over 72 observations (12 datasets × 6 metrics). In contrast, the second-best method, LIFT, achieves 12 (16.67%) best and 18 (25%) second-best results over the same 72 observations.
  • The Holm test in Table 4 shows that RGLC is statistically superior to the other algorithms to varying degrees on all metrics, with the best result on Average Precision (better than seven algorithms) and the worst on Ranking Loss (better than three algorithms). Concretely, RGLC significantly outperforms MCGM on all metrics except Micro F1; MLTSVM on Hamming Loss, Ranking Loss, Coverage, and Average Precision; Glocal on Hamming Loss, Coverage, Average Precision, and Micro F1; HNOML on Hamming Loss, One Error, Average Precision, and Micro F1; MDFS on One Error, Coverage, Average Precision, and Micro F1; fRAkEL on Hamming Loss, Ranking Loss, Average Precision, and Micro F1; LLSF on Hamming Loss, One Error, and Average Precision; and LIFT on Coverage.

4.3. Ablation Study

Table 5 shows the functionality of latent features, latent labels, global label correlation, and the local label correlation.
We devise four degenerate versions of RGLC: (i) RGLC-LF, which learns both global and local label correlations without latent features; (ii) RGLC-LL, which learns both global and local label correlations without latent labels; (iii) RGLC-GC, which learns only local label correlation on both latent features and latent labels; and (iv) RGLC-LC, which learns only global label correlation on both latent features and latent labels. The lower the rank is, the better the performance becomes. For the degenerate versions, the larger the rank difference from RGLC is, the more significant the removed component is in boosting classification performance, and vice versa.
As shown in Table 5, RGLC achieves the best performance across all metrics on average. Latent feature learning contributes most among all components, and latent label learning contributes almost the same as local label correlation. The global label correlation has the least contribution to boosting performance.

4.4. Learning with Noisy Features and Missing Labels

In this section, we evaluate the fluctuations of RGLC with the concurrence of noisy features and missing labels.
Figure 2 illustrates the classification performance on dataset art.
As can be observed, the performance of all metrics degenerates as the ratio of noisy features and missing labels increases. For most cases, RGLC fluctuates least as the noisy features and missing labels increase.
Figure 3 illustrates the classification performance of the dataset business.
As can be observed, the performance of all metrics degenerates as the ratio of noisy features and missing labels increases. For most cases, RGLC fluctuates the least as the noisy features and missing labels increase.
Figure 4 illustrates the classification performance of dataset health.
As can be observed, the performance of all metrics degenerates as the ratio of noisy features and missing labels increases. For most cases, RGLC fluctuates least as the noisy features and missing labels increase.

4.5. Convergence

Furthermore, we study the convergence of RGLC. Without losing generality, we report the variations of two objective functions on datasets bibtex and corel5k in Figure 5, given 60% noisy features and missing labels presence.
As can be observed, values of both objective functions converge quickly in a few iterations. Similar phenomena also apply to other datasets. The runtime comparisons of all algorithms in Table 6 show that the computational complexity of RGLC is acceptable on large-scale datasets.

4.6. Sensitivity

We also study the sensitivity of involved parameters ( λ 1 , λ 2 , λ 3 , and λ 4 ) and report the fluctuations in the health dataset in Figure 6.
A larger $\lambda_1$ gives higher significance to the latent representation, while smaller $\lambda_2$ and $\lambda_3$ give lower model complexity on the feature side and the label side. The larger $\lambda_1$ is and the smaller $\lambda_2$ and $\lambda_3$ are, the more satisfactory Average Precision, Hamming Loss, Micro F1, and Ranking Loss become. However, the results can deteriorate if $\lambda_3$ is too small, which means the latent labels would be less important than the latent features. A larger $\lambda_4$ means higher importance of the label manifold. The contribution of the label manifold is smaller than that of the latent features and latent labels, yet larger than that of the model parameters. Similar phenomena also apply to other datasets. Consequently, we recommend setting the parameters in the order $\lambda_1 \ge \lambda_4 \ge \lambda_3 \ge \lambda_2$.
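As a small illustration, the snippet below filters the search grid of Section 4.1 by this recommended ordering; the filter is only a heuristic derived from the sensitivity analysis, not a rule from the original setup.

```python
from itertools import product

grid = [10.0 ** e for e in range(-5, 1)]     # {1e-5, ..., 1}, as in Section 4.1

# keep only combinations respecting the recommended ordering lam1 >= lam4 >= lam3 >= lam2
candidates = [(l1, l2, l3, l4)
              for l1, l2, l3, l4 in product(grid, repeat=4)
              if l1 >= l4 >= l3 >= l2]

print(len(candidates), "of", len(grid) ** 4, "combinations survive the ordering filter")
```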

5. Discussion

This section discusses the robustness of RGLC. The experimental results on the benchmarks indicate that RGLC achieves the best performance on average across different domains, which means RGLC is competent for multi-label classification. The ablation study examines the functionality of the included modules (i.e., latent features, latent labels, global label correlation, and local label correlation). It reveals that all modules contribute to the performance improvement and that local label correlation is more conducive than global label correlation. Such a conclusion coincides with intuition, as global label correlation is more demanding than local label correlation when data are automatically collected. In real applications, data collection is inevitably affected by noise and missing entries. The degradations of RGLC on three large-scale datasets are negligible for the metrics Hamming Loss, Ranking Loss, Coverage, and Average Precision, which implies applicability to uncontrolled cases and demonstrates model robustness. The convergence analysis implies that the objective values of RGLC converge in a limited number of rounds. Nevertheless, the computational complexity is not entirely satisfying, which means the computation in each iteration is considerable. Part of the reason stems from the optimization of a large number of intermediate matrices ($A$, $P$, $Q$, $R$, $E$, $W$, $U$, $V$, $Z$). Future work should focus on the acceleration of RGLC.

6. Conclusions

This paper formulated a robust multi-label classification method RGLC by exploiting enhanced global and local label correlations. Unlike the existing multi-label approaches, which realize robustness by either exploiting label correlations or reconstructing data representation, we tackle it simultaneously by learning global label correlations and local label correlations from the low-rank latent space. We strengthen the reliability of global label correlations and local label correlations by combining subspace learning and manifold regularization. Extensive studies on multiple domains demonstrate the robust performance of RGLC over a collection of state-of-the-art multi-label algorithms. We further demonstrate the robustness of RGLC in the presence of noisy features and missing labels, which is more realistic in multi-label data collection. In the future, we intend to validate whether RGLC shows robustness for more categories of noisy features and missing labels.

Author Contributions

Conceptualization, formal analysis, writing—original draft preparation, T.Z.; methodology, software, validation, writing—review and editing, Y.Z.; writing—review and editing, W.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China under grant numbers 61976158, 61976160, 62076182, 62163016, 62006172, and 61906137, and it is also partially supported by the Jiangxi "Double Thousand Plan" under grant number 20212ACB202001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank D. Q. Miao for his constructive comments. Meanwhile, we would like to thank several people for their contributions to the datasets. Their names are in alphabetical order: M. R. Boutell, S. Diplaris, P. Duygulu, I. Katakis, J. P. Pestian, J. Read, C. G. M. Snoek and Y. Zhang.

Conflicts of Interest

Y.J. Zhang is a postdoctoral researcher at China UnionPay Co., Ltd. The data and computing resources of this paper have no commercial relationship with the company.

References

  1. Zhang, M.L.; Zhou, Z.H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837.
  2. Gibaja, E.; Ventura, S. A tutorial on multilabel learning. ACM Comput. Surv. 2015, 47, 1–38.
  3. Liu, W.W.; Shen, X.B.; Wang, H.B.; Tsang, I.W. The emerging trends of multi-label learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
  4. Xu, W.H.; Li, W.T. Granular computing approach to two-way learning based on formal concept analysis in fuzzy datasets. IEEE Trans. Cybern. 2016, 46, 366–379.
  5. Xu, W.H.; Yu, J.H. A novel approach to information fusion in multi-source datasets: A granular computing viewpoint. Inf. Sci. 2017, 378, 410–423.
  6. Zhang, Y.J.; Miao, D.Q.; Zhang, Z.F.; Xu, J.F.; Luo, S. A three-way selective ensemble model for multi-label classification. Int. J. Approx. Reason. 2018, 103, 394–413.
  7. Zhang, Y.J.; Miao, D.Q.; Pedrycz, W.; Zhao, T.N.; Xu, J.F.; Yu, Y. Granular structure-based incremental updating for multi-label classification. Knowl. Based Syst. 2020, 189, 105066:1–105066:15.
  8. Zhang, Y.J.; Zhao, T.N.; Miao, D.Q.; Pedrycz, W. Granular multilabel batch active learning with pairwise label correlation. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 3079–3091.
  9. Yuan, K.H.; Xu, W.H.; Li, W.T.; Ding, W.P. An incremental learning mechanism for object classification based on progressive fuzzy three-way concept. Inf. Sci. 2022, 584, 127–147.
  10. Xu, Y.H.; Yuan, K.H.; Li, W.T. Dynamic updating approximations of local generalized multigranulation neighborhood rough set. Appl. Intell. 2022, in press.
  11. Guo, Y.M.; Chung, F.L.; Li, G.Z.; Wang, J.C.; Gee, J.C. Leveraging label-specific discriminant mapping features for multi-label learning. ACM Trans. Knowl. Discov. Data 2019, 13, 24:1–24:23.
  12. Huang, D.; Cabral, R.; Torre, F.D.L. Robust regression. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 363–375.
  13. Ma, J.H.; Zhang, H.J.; Chow, T.W.S. Multilabel classification with label-specific features and classifiers: A coarse- and fine-tuned framework. IEEE Trans. Cybern. 2021, 51, 1028–1042.
  14. Zhang, J.; Lin, Y.D.; Jiang, M.; Li, S.Z.; Tang, Y.; Tan, K.C. Multi-label feature selection via global relevance and redundancy optimization. In Proceedings of the International Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2020; pp. 2512–2518.
  15. Guo, B.L.; Hou, C.P.; Shan, J.C.; Yi, D.Y. Low rank multi-label classification with missing labels. In Proceedings of the International Conference on Pattern Recognition, Beijing, China, 21–24 August 2018; pp. 417–422.
  16. Sun, L.J.; Ye, P.; Lyu, G.Y.; Feng, S.H.; Dai, G.J.; Zhang, H. Weakly-supervised multi-label learning with noisy features and incomplete labels. Neurocomputing 2020, 413, 61–71.
  17. Zhang, C.Q.; Yu, Z.W.; Fu, H.Z.; Zhu, P.F.; Chen, L.; Hu, Q.H. Hybrid noise-oriented multilabel learning. IEEE Trans. Cybern. 2020, 50, 2837–2850.
  18. Lou, Q.D.; Deng, Z.H.; Choi, K.S.; Shen, H.B.; Wang, J.; Wang, S.T. Robust multi-label relief feature selection based on fuzzy margin co-optimization. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 387–398.
  19. Fan, Y.L.; Liu, J.H.; Liu, P.Z.; Du, Y.Z.; Lan, W.Y.; Wu, S.X. Manifold learning with structured subspace for multi-label feature selection. Pattern Recogn. 2021, 120, 108169:1–108169:16.
  20. Xu, M.; Li, Y.F.; Zhou, Z.H. Robust multi-label learning with pro loss. IEEE Trans. Knowl. Data Eng. 2020, 32, 1610–1624.
  21. Braytee, A.; Liu, W.; Anaissi, A.; Kennedy, P.J. Correlated multi-label classification with incomplete label space and class imbalance. ACM Trans. Intell. Syst. Technol. 2019, 10, 56:1–56:26.
  22. Dong, H.B.; Sun, J.; Sun, X.H. A multi-objective multi-label feature selection algorithm based on shapley value. Entropy 2021, 23, 1094.
  23. Jain, H.; Prabhu, Y.; Varma, M. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 935–944.
  24. Qaraei, M.; Schultheis, E.; Gupta, P.; Babbar, R. Convex surrogates for unbiased loss functions in extreme classification with missing labels. In Proceedings of the World Wide Web Conference, Ljubljana, Slovenia, 19–23 April 2021; pp. 3711–3720.
  25. Wydmuch, M.; Jasinska-Kobus, K.; Babbar, R.; Dembczynski, K. Propensity-scored probabilistic label trees. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2021; pp. 2252–2256.
  26. Chen, Z.J.; Hao, Z.F. A unified multi-label classification framework with supervised low-dimensional embedding. Neurocomputing 2016, 171, 1563–1575.
  27. Chen, C.; Wang, H.B.; Liu, W.W.; Zhao, X.Y.; Hu, T.L.; Chen, G. Two-stage label embedding via neural factorization machine for multi-label classification. In Proceedings of the Association for the Advance in Artificial Intelligence, Hawaii, HI, USA, 27 January–1 February 2019; pp. 3304–3311.
  28. Wei, T.; Li, Y.F. Learning compact model for large-scale multi-label data. In Proceedings of the Association for the Advance in Artificial Intelligence, Hawaii, HI, USA, 27 January–1 February 2019; pp. 5385–5392.
  29. Huang, J.; Xu, L.C.; Wang, J.; Feng, L.; Yamanishi, K. Discovering latent class labels for multi-label learning. In Proceedings of the International Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2020; pp. 3058–3064.
  30. Fang, X.Z.; Teng, S.H.; Lai, Z.H.; He, Z.S.; Xie, S.L.; Wong, W.K. Robust latent subspace learning for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2502–2515.
  31. Zhu, Y.; Kwok, J.T.; Zhou, Z.H. Multi-label learning with global and local label correlation. IEEE Trans. Knowl. Data Eng. 2018, 30, 1081–1094.
  32. Siblini, W.; Kuntz, P.; Meyer, F. A review on dimensionality reduction for multi-label classification. IEEE Trans. Knowl. Data Eng. 2021, 33, 839–857.
  33. Yu, Z.B.; Zhang, M.L. Multi-label classification with label-specific feature generation: A wrapped approach. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
  34. He, S.; Feng, L.; Li, L. Estimating latent relative labeling importances for multi-label learning. In Proceedings of the International Conference on Data Mining, Singapore, 17–20 November 2018; pp. 1013–1018.
  35. Zhong, Y.J.; Xu, C.; Du, B.; Zhang, L.F. Independent feature and label components for multi-label classification. In Proceedings of the International Conference on Data Mining, Singapore, 17–20 November 2018; pp. 827–836.
  36. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006, 7, 2399–2434.
  37. Cai, Z.L.; Zhu, W. Multi-label feature selection via feature manifold learning and sparsity regularization. Int. J. Mach. Learn. Cybern. 2018, 9, 1321–1334.
  38. Zhang, J.; Luo, Z.M.; Li, C.D.; Zhou, C.E.; Li, S.Z. Manifold regularized discriminative feature selection for multi-label learning. Pattern Recogn. 2019, 95, 136–150.
  39. Guan, Y.Y.; Li, X.M. Multilabel text classification with incomplete labels: A safe generative model with label manifold regularization and confidence constraint. IEEE Multimedia 2020, 27, 38–47.
  40. Feng, L.; Huang, J.; Shu, S.L.; An, B. Regularized matrix factorization for multilabel learning with missing labels. IEEE Trans. Cybern. 2020, in press.
  41. Huang, S.J.; Zhou, Z.H. Multi-label learning by exploiting label correlation locally. In Proceedings of the Association for Advanced Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; pp. 945–955.
  42. Jia, X.Y.; Zhu, S.S.; Li, W.W. Joint label-specific features and correlation information for multi-label learning. J. Comput. Sci. Technol. 2020, 35, 247–258.
  43. Ma, J.H.; Chiu, B.C.Y.; Chow, T.W.S. Multilabel classification with group-based mapping: A framework with local feature selection and local label correlation. IEEE Trans. Cybern. 2020, in press.
  44. Zhu, C.M.; Miao, D.Q.; Wang, Z.; Zhou, R.G.; Wei, L.; Zhang, X.F. Global and local multi-view multi-label learning. Neurocomputing 2020, 371, 67–77.
  45. Sun, L.J.; Feng, S.H.; Liu, J.; Lyu, G.Y.; Lang, C.Y. Global-local label correlation for partial multi-label learning. IEEE Trans. Multimed. 2022, 24, 581–593.
  46. Cai, X.; Ding, C.; Nie, F.; Huang, H. On the equivalent of low-rank linear regressions and linear discriminant analysis based regressions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 1124–1132.
  47. Sylvester, J. Sur l'équation en matrices px = xq. C. R. Acad. Sci. Paris 1884, 99, 67–71.
  48. Liu, G.C.; Lin, Z.C.; Yan, S.C.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184.
  49. Boumal, N.; Mishra, B.; Absil, P.A.; Sepulchre, R. Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 2014, 15, 1455–1459.
  50. Tsoumakas, G.; Katakis, I.; Vlahavas, I. Mining Multi-label Data. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2009; pp. 667–685.
  51. Read, J.; Reutemann, P.; Pfahringer, B.; Holmes, G. Meka: A multi-label/multi-target extension to Weka. J. Mach. Learn. Res. 2016, 17, 21:1–21:5.
  52. Zhang, Y.; Zhou, Z.H. Multilabel dimensionality reduction via dependence maximization. ACM Trans. Knowl. Discov. Data 2010, 4, 14:1–14:21.
  53. Schapire, R.E.; Singer, Y. BoosTexter: A boosting-based system for text categorization. Mach. Learn. 2000, 39, 135–168.
  54. Zhang, M.L.; Wu, L. LIFT: Multi-label learning with label-specific features. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 107–120.
  55. Huang, J.; Li, G.R.; Huang, Q.M.; Wu, X.D. Learning label-specific features and class-dependent labels for multi-label classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 3309–3323.
  56. Chen, W.J.; Shao, Y.H.; Li, C.N.; Deng, N.Y. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recogn. 2016, 52, 61–74.
  57. Kimura, K.; Kudo, M.; Sun, L.; Koujaku, S. Fast random k-labelsets for large-scale multi-label classification. In Proceedings of the International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 438–443.
  58. Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Figure 1. Framework of RGLC. Instead of directly learning global and local label correlations $Z$ from the observed feature matrix $X$ to the observed label matrix $Y$, we train the RGLC model (i.e., $f(X)$) from the latent feature matrix (i.e., $C \triangleq Q^\top(S \circ X)$) to the latent label matrix (i.e., $V$). It exploits global and local label correlations $Z$ on latent features and labels with the linear mapping $W$.
Figure 2. Robustness experiments with noisy features and missing labels on dataset art on metric (a) Hamming Loss, (b) Ranking Loss, (c) Average Precision, and (d) Micro F1.
Figure 3. Robustness experiments with noisy features and missing labels on dataset business on metric (a) Hamming Loss, (b) Ranking Loss, (c) Average Precision, and (d) Micro F1.
Figure 4. Robustness experiments with noisy features and missing labels on dataset health on metric (a) Hamming Loss, (b) Ranking Loss, (c) Average Precision, and (d) Micro F1.
Figure 5. Convergence of RGLC on the bibtex and corel5k datasets with 60% noisy features and missing labels. (a,c) show the convergence of Equation (2), whereas (b,d) show the convergence of Equation (9).
Figure 6. Performance fluctuations of λ1, λ2, λ3, and λ4 on the health dataset with 20% noisy features and missing labels on metric (a) Hamming Loss, (b) Ranking Loss, (c) Average Precision, and (d) Micro F1.
Table 1. Characteristics of datasets (#n: number of instances; #f: number of features; #q: number of labels; #c: label cardinality).
Datasets | #n | #f | #q | #c | Domain
art | 5000 | 462 | 26 | 1.64 | text
bibtex | 7395 | 1836 | 159 | 2.402 | text
business | 5000 | 438 | 30 | 1.59 | text
corel5k | 5000 | 499 | 374 | 3.522 | image
enron | 1702 | 1001 | 53 | 3.378 | text
genbase | 662 | 1185 | 27 | 1.252 | biology
health | 5000 | 612 | 32 | 1.66 | text
languagelog | 1460 | 1004 | 75 | 1.18 | text
mediamill | 43907 | 120 | 101 | 4.376 | video
medical | 978 | 1449 | 45 | 1.245 | text
scene | 2407 | 294 | 6 | 1.074 | image
slashdot | 3782 | 1079 | 22 | 1.18 | text
Table 2. Comparison of each algorithm (mean ± std). Best results and second best are in bold and underlined, respectively. (↓ the smaller the better, ↑ the larger the better).
Hamming Loss (↓)
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 0.053 ± 0.001 | 0.057 ± 0.002 | 0.066 ± 0.001 | 0.062 ± 0.004 | 0.062 ± 0.001 | 0.061 ± 0.001 | 0.086 ± 0.006 | 0.136 ± 0.004 | 0.060 ± 0.002
bibtex | 0.013 ± 0.001 | 0.015 ± 0.001 | 0.017 ± 0.001 | 0.014 ± 0.001 | 0.015 ± 0.001 | 0.013 ± 0.001 | 0.047 ± 0.006 | 0.046 ± 0.002 | 0.012 ± 0.001
business | 0.028 ± 0.001 | 0.034 ± 0.002 | 0.025 ± 0.001 | 0.029 ± 0.001 | 0.030 ± 0.001 | 0.027 ± 0.002 | 0.033 ± 0.003 | 0.046 ± 0.002 | 0.025 ± 0.002
corel5k | 0.010 ± 0.001 | 0.024 ± 0.002 | 0.015 ± 0.001 | 0.010 ± 0.001 | 0.009 ± 0.001 | 0.010 ± 0.001 | 0.201 ± 0.017 | 0.026 ± 0.001 | 0.009 ± 0.001
enron | 0.109 ± 0.001 | 0.067 ± 0.001 | 0.062 ± 0.002 | 0.076 ± 0.006 | 0.056 ± 0.002 | 0.051 ± 0.002 | 0.062 ± 0.004 | 0.204 ± 0.008 | 0.057 ± 0.004
genbase | 0.003 ± 0.001 | 0.001 ± 0.001 | 0.002 ± 0.001 | 0.003 ± 0.003 | 0.003 ± 0.003 | 0.005 ± 0.001 | 0.003 ± 0.001 | 0.002 ± 0.001 | 0.002 ± 0.004
health | 0.035 ± 0.002 | 0.037 ± 0.001 | 0.037 ± 0.001 | 0.039 ± 0.002 | 0.047 ± 0.001 | 0.044 ± 0.001 | 0.048 ± 0.003 | 0.074 ± 0.004 | 0.033 ± 0.002
languagelog | 0.016 ± 0.001 | 0.025 ± 0.001 | 0.029 ± 0.001 | 0.018 ± 0.001 | 0.016 ± 0.001 | 0.016 ± 0.001 | 0.028 ± 0.001 | 0.184 ± 0.007 | 0.015 ± 0.001
mediamill | 0.030 ± 0.001 | 0.034 ± 0.001 | 0.033 ± 0.001 | 0.033 ± 0.001 | 0.033 ± 0.001 | 0.032 ± 0.001 | 0.035 ± 0.001 | 0.051 ± 0.005 | 0.032 ± 0.001
medical | 0.013 ± 0.001 | 0.014 ± 0.001 | 0.014 ± 0.002 | 0.015 ± 0.001 | 0.014 ± 0.042 | 0.016 ± 0.016 | 0.016 ± 0.001 | 0.051 ± 0.005 | 0.011 ± 0.001
scene | 0.094 ± 0.005 | 0.114 ± 0.002 | 0.143 ± 0.003 | 0.111 ± 0.006 | 0.147 ± 0.005 | 0.085 ± 0.008 | 0.119 ± 0.005 | 0.291 ± 0.020 | 0.089 ± 0.012
slashdot | 0.017 ± 0.001 | 0.026 ± 0.001 | 0.212 ± 0.008 | 0.020 ± 0.001 | 0.020 ± 0.001 | 0.019 ± 0.001 | 0.029 ± 0.004 | 0.044 ± 0.002 | 0.016 ± 0.002
Avg rank | 3.2500(2) | 5.0833(5) | 5.4583(7) | 5.1250(6) | 4.8750(4) | 3.8750(3) | 7.2917(8) | 8.2500(9) | 1.7917(1)
Ranking Loss (↓)
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 0.113 ± 0.005 | 0.171 ± 0.006 | 0.625 ± 0.006 | 0.154 ± 0.016 | 0.122 ± 0.003 | 0.146 ± 0.002 | 0.345 ± 0.017 | 0.454 ± 0.015 | 0.143 ± 0.005
bibtex | 0.078 ± 0.004 | 0.090 ± 0.002 | 0.659 ± 0.008 | 0.159 ± 0.005 | 0.136 ± 0.005 | 0.170 ± 0.006 | 0.252 ± 0.012 | 0.495 ± 0.005 | 0.074 ± 0.003
business | 0.031 ± 0.002 | 0.055 ± 0.001 | 0.250 ± 0.004 | 0.049 ± 0.007 | 0.046 ± 0.004 | 0.038 ± 0.004 | 0.353 ± 0.021 | 0.185 ± 0.008 | 0.045 ± 0.003
corel5k | 0.128 ± 0.003 | 0.190 ± 0.007 | 0.891 ± 0.002 | 0.148 ± 0.004 | 0.137 ± 0.002 | 0.132 ± 0.002 | 0.224 ± 0.012 | 0.812 ± 0.007 | 0.172 ± 0.003
enron | 0.329 ± 0.006 | 0.189 ± 0.007 | 0.499 ± 0.012 | 0.132 ± 0.008 | 0.143 ± 0.007 | 0.088 ± 0.003 | 0.404 ± 0.014 | 0.424 ± 0.019 | 0.089 ± 0.011
genbase | 0.003 ± 0.001 | 0.002 ± 0.002 | 0.006 ± 0.008 | 0.002 ± 0.001 | 0.002 ± 0.002 | 0.006 ± 0.002 | 0.031 ± 0.037 | 0.006 ± 0.006 | 0.003 ± 0.003
health | 0.040 ± 0.003 | 0.094 ± 0.006 | 0.383 ± 0.012 | 0.081 ± 0.006 | 0.055 ± 0.005 | 0.058 ± 0.001 | 0.285 ± 0.012 | 0.259 ± 0.009 | 0.061 ± 0.002
languagelog | 0.163 ± 0.007 | 0.232 ± 0.010 | 0.731 ± 0.014 | 0.193 ± 0.006 | 0.287 ± 0.016 | 0.159 ± 0.007 | 0.311 ± 0.021 | 0.546 ± 0.020 | 0.131 ± 0.010
mediamill | 0.050 ± 0.001 | 0.047 ± 0.001 | 0.602 ± 0.005 | 0.056 ± 0.003 | 0.047 ± 0.001 | 0.060 ± 0.001 | 0.451 ± 0.008 | 0.132 ± 0.023 | 0.054 ± 0.001
medical | 0.029 ± 0.006 | 0.035 ± 0.004 | 0.168 ± 0.018 | 0.201 ± 0.033 | 0.021 ± 0.059 | 0.043 ± 0.004 | 0.227 ± 0.023 | 0.131 ± 0.023 | 0.023 ± 0.005
scene | 0.080 ± 0.007 | 0.113 ± 0.008 | 0.278 ± 0.009 | 0.096 ± 0.005 | 0.110 ± 0.010 | 0.076 ± 0.005 | 0.095 ± 0.010 | 0.243 ± 0.022 | 0.093 ± 0.008
slashdot | 0.042 ± 0.003 | 0.035 ± 0.005 | 0.484 ± 0.022 | 0.069 ± 0.007 | 0.061 ± 0.008 | 0.050 ± 0.005 | 0.416 ± 0.035 | 0.144 ± 0.012 | 0.057 ± 0.004
Avg rank | 2.4583(1) | 4.3750(5) | 8.5833(9) | 4.7500(6) | 3.3750(3) | 3.5000(4) | 7.5000(8) | 7.4167(7) | 3.0417(2)
One Error (↓)
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 0.458 ± 0.026 | 0.471 ± 0.015 | 0.523 ± 0.007 | 0.468 ± 0.020 | 0.475 ± 0.011 | 0.610 ± 0.011 | 0.569 ± 0.069 | 0.303 ± 0.017 | 0.448 ± 0.006
bibtex | 0.386 ± 0.009 | 0.358 ± 0.022 | 0.422 ± 0.010 | 0.528 ± 0.137 | 0.617 ± 0.007 | 0.532 ± 0.011 | 0.752 ± 0.025 | 0.194 ± 0.007 | 0.374 ± 0.011
business | 0.106 ± 0.007 | 0.133 ± 0.008 | 0.092 ± 0.006 | 0.119 ± 0.009 | 0.126 ± 0.005 | 0.117 ± 0.010 | 0.660 ± 0.047 | 0.068 ± 0.007 | 0.116 ± 0.012
corel5k | 0.690 ± 0.012 | 0.658 ± 0.010 | 0.745 ± 0.007 | 0.675 ± 0.021 | 0.727 ± 0.014 | 0.727 ± 0.017 | 0.722 ± 0.029 | 0.393 ± 0.012 | 0.644 ± 0.021
enron | 0.637 ± 0.021 | 0.381 ± 0.027 | 0.139 ± 0.007 | 0.293 ± 0.037 | 0.300 ± 0.017 | 0.264 ± 0.010 | 0.760 ± 0.033 | 0.089 ± 0.018 | 0.230 ± 0.026
genbase | 0.003 ± 0.000 | 0.003 ± 0.004 | 0.001 ± 0.001 | 0.002 ± 0.003 | 0.006 ± 0.003 | 0.017 ± 0.006 | 0.144 ± 0.060 | 0.002 ± 0.003 | 0.001 ± 0.003
health | 0.238 ± 0.032 | 0.274 ± 0.010 | 0.236 ± 0.014 | 0.272 ± 0.015 | 0.293 ± 0.015 | 0.374 ± 0.017 | 0.476 ± 0.029 | 0.124 ± 0.006 | 0.254 ± 0.013
languagelog | 0.690 ± 0.015 | 0.803 ± 0.025 | 0.679 ± 0.013 | 0.707 ± 0.026 | 0.773 ± 0.011 | 0.783 ± 0.025 | 0.852 ± 0.036 | 0.594 ± 0.033 | 0.676 ± 0.018
mediamill | 0.137 ± 0.003 | 0.159 ± 0.004 | 0.134 ± 0.004 | 0.154 ± 0.003 | 0.191 ± 0.007 | 0.161 ± 0.003 | 0.960 ± 0.014 | 0.071 ± 0.012 | 0.151 ± 0.004
medical | 0.163 ± 0.014 | 0.189 ± 0.017 | 0.113 ± 0.020 | 0.143 ± 0.032 | 0.154 ± 0.177 | 0.237 ± 0.025 | 0.510 ± 0.059 | 0.071 ± 0.012 | 0.142 ± 0.024
scene | 0.240 ± 0.021 | 0.286 ± 0.015 | 0.177 ± 0.013 | 0.264 ± 0.011 | 0.292 ± 0.014 | 0.224 ± 0.019 | 0.208 ± 0.160 | 0.153 ± 0.024 | 0.258 ± 0.018
slashdot | 0.091 ± 0.021 | 0.337 ± 0.023 | 0.086 ± 0.008 | 0.118 ± 0.013 | 0.332 ± 0.021 | 0.093 ± 0.019 | 0.790 ± 0.049 | 0.295 ± 0.010 | 0.092 ± 0.014
Avg rank | 4.2083(4) | 5.9583(6) | 3.7917(3) | 4.9583(5) | 6.9583(8) | 6.4583(7) | 8.0000(9) | 1.6250(1) | 3.0417(2)
Coverage (↓)
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 4.478 ± 0.181 | 0.245 ± 0.009 | 11.68 ± 0.129 | 5.903 ± 0.404 | 0.180 ± 0.005 | 5.299 ± 0.090 | 1131 ± 45.46 | 0.329 ± 0.013 | 0.211 ± 0.009
bibtex | 22.10 ± 1.176 | 0.163 ± 0.005 | 0.501 ± 0.004 | 37.54 ± 1.114 | 0.214 ± 0.007 | 0.281 ± 0.012 | 1459 ± 32.58 | 0.416 ± 0.008 | 0.149 ± 0.004
business | 1.949 ± 0.122 | 0.098 ± 0.003 | 9.307 ± 0.296 | 2.808 ± 0.289 | 0.087 ± 0.009 | 2.201 ± 0.199 | 951.9 ± 32.76 | 0.112 ± 0.005 | 0.086 ± 0.004
corel5k | 113.6 ± 2.824 | 0.397 ± 0.014 | 340.2 ± 1.197 | 112.7 ± 1.269 | 0.320 ± 0.009 | 214.7 ± 8.136 | 73.21 ± 5.391 | 0.439 ± 0.007 | 0.384 ± 0.005
enron | 33.91 ± 0.721 | 0.433 ± 0.027 | 30.89 ± 0.774 | 18.04 ± 0.872 | 0.356 ± 0.019 | 12.86 ± 0.408 | 328.3 ± 9.465 | 0.423 ± 0.030 | 0.240 ± 0.026
genbase | 0.403 ± 0.054 | 0.010 ± 0.001 | 0.292 ± 0.167 | 0.352 ± 0.081 | 0.012 ± 0.003 | 0.565 ± 0.119 | 11.37 ± 5.464 | 0.016 ± 0.005 | 0.014 ± 0.004
health | 2.549 ± 0.099 | 0.160 ± 0.006 | 11.91 ± 0.346 | 4.595 ± 0.350 | 0.106 ± 0.008 | 3.186 ± 0.076 | 829.8 ± 28.33 | 0.162 ± 0.008 | 0.124 ± 0.003
languagelog | 13.14 ± 1.557 | 0.275 ± 0.001 | 30.79 ± 1.403 | 17.86 ± 0.308 | 0.293 ± 0.016 | 12.63 ± 0.770 | 163.4 ± 4.265 | 0.309 ± 0.014 | 0.143 ± 0.006
mediamill | 17.63 ± 0.121 | 0.165 ± 0.001 | 57.99 ± 0.506 | 19.36 ± 0.229 | 0.163 ± 0.003 | 19.69 ± 0.314 | 0.861 ± 0.009 | 0.089 ± 0.019 | 0.151 ± 0.004
medical | 1.811 ± 0.342 | 0.049 ± 0.004 | 4.648 ± 0.700 | 5.737 ± 0.828 | 0.033 ± 0.065 | 2.783 ± 0.347 | 78.17 ± 10.64 | 0.089 ± 0.019 | 0.037 ± 0.008
scene | 0.482 ± 0.041 | 0.105 ± 0.007 | 0.887 ± 0.052 | 0.568 ± 0.033 | 0.108 ± 0.009 | 0.465 ± 0.025 | 444.7 ± 34.90 | 0.149 ± 0.014 | 0.089 ± 0.041
slashdot | 0.857 ± 0.116 | 0.044 ± 0.005 | 9.644 ± 0.323 | 2.288 ± 0.208 | 0.056 ± 0.006 | 0.046 ± 0.004 | 742.9 ± 47.83 | 0.089 ± 0.009 | 0.055 ± 0.004
Avg rank | 6.0833(6) | 2.5833(3) | 7.5833(8) | 6.9167(7) | 2.1667(2) | 5.7500(5) | 8.3333(9) | 3.8333(4) | 1.7500(1)
Average Precision (↑)
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 0.622 ± 0.016 | 0.600 ± 0.009 | 0.461 ± 0.007 | 0.606 ± 0.021 | 0.619 ± 0.008 | 0.526 ± 0.006 | 0.220 ± 0.014 | 0.538 ± 0.015 | 0.626 ± 0.003
bibtex | 0.563 ± 0.009 | 0.587 ± 0.014 | 0.324 ± 0.009 | 0.357 ± 0.010 | 0.360 ± 0.006 | 0.409 ± 0.007 | 0.164 ± 0.009 | 0.418 ± 0.008 | 0.589 ± 0.006
business | 0.894 ± 0.004 | 0.859 ± 0.005 | 0.756 ± 0.003 | 0.875 ± 0.011 | 0.880 ± 0.005 | 0.882 ± 0.009 | 0.176 ± 0.022 | 0.708 ± 0.024 | 0.882 ± 0.007
corel5k | 0.286 ± 0.007 | 0.251 ± 0.007 | 0.089 ± 0.003 | 0.273 ± 0.007 | 0.254 ± 0.004 | 0.254 ± 0.004 | 0.302 ± 0.043 | 0.160 ± 0.005 | 0.298 ± 0.008
enron | 0.346 ± 0.014 | 0.556 ± 0.012 | 0.450 ± 0.011 | 0.642 ± 0.013 | 0.617 ± 0.008 | 0.654 ± 0.012 | 0.175 ± 0.004 | 0.427 ± 0.015 | 0.695 ± 0.018
genbase | 0.995 ± 0.002 | 0.994 ± 0.004 | 0.989 ± 0.006 | 0.995 ± 0.003 | 0.993 ± 0.003 | 0.985 ± 0.005 | 0.892 ± 0.045 | 0.992 ± 0.007 | 0.996 ± 0.005
health | 0.802 ± 0.019 | 0.755 ± 0.007 | 0.631 ± 0.008 | 0.762 ± 0.010 | 0.769 ± 0.011 | 0.703 ± 0.009 | 0.312 ± 0.019 | 0.651 ± 0.006 | 0.787 ± 0.005
languagelog | 0.397 ± 0.027 | 0.266 ± 0.019 | 0.267 ± 0.011 | 0.387 ± 0.019 | 0.296 ± 0.033 | 0.319 ± 0.020 | 0.104 ± 0.006 | 0.183 ± 0.017 | 0.408 ± 0.013
mediamill | 0.719 ± 0.002 | 0.695 ± 0.003 | 0.465 ± 0.004 | 0.692 ± 0.002 | NA | 0.686 ± 0.003 | 0.066 ± 0.001 | 0.764 ± 0.005 | 0.698 ± 0.001
medical | 0.875 ± 0.012 | 0.864 ± 0.016 | 0.799 ± 0.024 | 0.766 ± 0.028 | 0.887 ± 0.151 | 0.805 ± 0.016 | 0.411 ± 0.048 | 0.764 ± 0.005 | 0.893 ± 0.015
scene | 0.857 ± 0.013 | 0.821 ± 0.009 | 0.756 ± 0.005 | 0.840 ± 0.006 | 0.820 ± 0.010 | 0.867 ± 0.011 | 0.715 ± 0.027 | 0.797 ± 0.016 | 0.845 ± 0.011
slashdot | 0.893 ± 0.025 | 0.664 ± 0.006 | 0.658 ± 0.014 | 0.875 ± 0.010 | 0.887 ± 0.010 | 0.882 ± 0.013 | 0.100 ± 0.014 | 0.592 ± 0.005 | 0.885 ± 0.012
Avg rank | 2.5417(2) | 4.9583(6) | 7.2500(8) | 4.4583(3) | 4.5417(5) | 4.6667(4) | 8.2500(9) | 6.5000(7) | 1.8333(1)
Micro F1 (↑)
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 0.359 ± 0.010 | 0.416 ± 0.015 | 0.373 ± 0.004 | 0.346 ± 0.021 | 0.225 ± 0.007 | 0.152 ± 0.002 | 0.420 ± 0.014 | 0.246 ± 0.007 | 0.394 ± 0.015
bibtex | 0.370 ± 0.013 | 0.487 ± 0.006 | 0.356 ± 0.003 | 0.239 ± 0.010 | 0.328 ± 0.002 | 0.273 ± 0.007 | 0.224 ± 0.021 | 0.181 ± 0.007 | 0.293 ± 0.006
business | 0.731 ± 0.008 | 0.687 ± 0.013 | 0.733 ± 0.002 | 0.698 ± 0.005 | 0.675 ± 0.007 | 0.703 ± 0.014 | 0.702 ± 0.018 | 0.615 ± 0.014 | 0.710 ± 0.018
corel5k | 0.073 ± 0.002 | 0.205 ± 0.041 | 0.119 ± 0.003 | 0.037 ± 0.009 | 0.006 ± 0.002 | 0.024 ± 0.001 | 0.194 ± 0.017 | 0.061 ± 0.004 | 0.123 ± 0.008
enron | 0.241 ± 0.010 | 0.479 ± 0.016 | 0.501 ± 0.010 | 0.445 ± 0.021 | 0.483 ± 0.011 | 0.504 ± 0.026 | 0.552 ± 0.022 | 0.200 ± 0.012 | 0.521 ± 0.015
genbase | 0.970 ± 0.014 | 0.991 ± 0.005 | 0.987 ± 0.007 | 0.980 ± 0.018 | 0.956 ± 0.036 | 0.938 ± 0.012 | 0.972 ± 0.009 | 0.975 ± 0.013 | 0.992 ± 0.031
health | 0.643 ± 0.018 | 0.627 ± 0.004 | 0.609 ± 0.009 | 0.582 ± 0.015 | 0.201 ± 0.004 | 0.381 ± 0.023 | 0.509 ± 0.164 | 0.446 ± 0.018 | 0.594 ± 0.014
languagelog | 0.172 ± 0.039 | 0.205 ± 0.023 | 0.221 ± 0.009 | 0.058 ± 0.010 | 0.074 ± 0.019 | 0.070 ± 0.023 | 0.263 ± 0.008 | 0.028 ± 0.005 | 0.133 ± 0.014
mediamill | 0.556 ± 0.003 | 0.505 ± 0.003 | 0.474 ± 0.004 | 0.508 ± 0.003 | 0.465 ± 0.004 | 0.520 ± 0.004 | 0.514 ± 0.003 | 0.458 ± 0.034 | 0.533 ± 0.001
medical | 0.753 ± 0.030 | 0.755 ± 0.029 | 0.759 ± 0.026 | 0.714 ± 0.024 | 0.708 ± 0.077 | 0.668 ± 0.019 | 0.698 ± 0.013 | 0.458 ± 0.034 | 0.798 ± 0.016
scene | 0.760 ± 0.008 | 0.591 ± 0.026 | 0.674 ± 0.008 | 0.588 ± 0.009 | 0.602 ± 0.015 | 0.746 ± 0.024 | 0.617 ± 0.019 | 0.427 ± 0.037 | 0.653 ± 0.015
slashdot | 0.772 ± 0.013 | 0.737 ± 0.012 | 0.643 ± 0.011 | 0.774 ± 0.017 | 0.745 ± 0.005 | 0.745 ± 0.020 | 0.682 ± 0.029 | 0.581 ± 0.012 | 0.778 ± 0.015
Avg rank | 3.5833(2) | 3.8333(4) | 3.7500(3) | 5.8333(7) | 6.5417(8) | 6.0417(6) | 4.5000(5) | 8.1667(9) | 2.7500(1)
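As a reference for the measures reported above, the example-based and ranking-based metrics can be reproduced with standard tooling. The snippet below is an illustrative sketch on a toy prediction; the matrices, the 0.5 threshold, and the use of scikit-learn are ours and are not part of the reported experimental pipeline, and One Error is computed by hand since scikit-learn has no built-in counterpart.

    # Illustrative sketch of the evaluation measures used in Table 2 on a toy
    # prediction; the data and threshold are made up and serve only as an example.
    import numpy as np
    from sklearn.metrics import (hamming_loss, f1_score, label_ranking_loss,
                                 coverage_error, label_ranking_average_precision_score)

    Y_true = np.array([[1, 0, 1, 0],             # ground-truth labels (2 instances, 4 labels)
                       [0, 1, 1, 0]])
    Y_score = np.array([[0.9, 0.2, 0.6, 0.1],    # real-valued label confidences
                        [0.3, 0.8, 0.4, 0.2]])
    Y_pred = (Y_score >= 0.5).astype(int)        # thresholded hard predictions

    print("Hamming Loss:", hamming_loss(Y_true, Y_pred))          # fraction of misclassified labels
    print("Ranking Loss:", label_ranking_loss(Y_true, Y_score))   # fraction of mis-ordered label pairs
    print("Coverage:", coverage_error(Y_true, Y_score))           # scikit-learn counts from 1; papers
                                                                  # often subtract 1 and/or normalize
    print("Average Precision:", label_ranking_average_precision_score(Y_true, Y_score))
    print("Micro F1:", f1_score(Y_true, Y_pred, average="micro")) # F1 over the pooled label matrix
    top = Y_score.argmax(axis=1)                                  # top-ranked label per instance
    print("One Error:", np.mean(Y_true[np.arange(len(Y_true)), top] == 0))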
Table 3. Summary of the Friedman statistics F_F (k = 9, N = 12) and the critical value at significance level α = 0.05 in terms of each evaluation measure (k: number of comparing algorithms; N: number of datasets).
Metric | F_F | Critical Value
Hamming Loss | 49.0944 | 2.0454
Ranking Loss | 64.9111 | 2.0454
One Error | 53.1111 | 2.0454
Coverage | 78.3778 | 2.0454
Average Precision | 55.3000 | 2.0454
Micro F1 | 39.0833 | 2.0454
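The statistics in Table 3 can be recomputed from the average ranks in Table 2. The snippet below is a minimal sketch (not the authors' code) of the standard Friedman test workflow; the rank vector is taken from the Hamming Loss rows of Table 2, and the critical value is the F quantile with (k − 1, (k − 1)(N − 1)) = (8, 88) degrees of freedom.

    # Minimal sketch (not the authors' code) of the Friedman test behind Table 3.
    import numpy as np
    from scipy.stats import f

    k, N, alpha = 9, 12, 0.05                  # comparing algorithms, datasets, significance level

    # Average ranks under Hamming Loss, taken from Table 2 (LIFT ... RGLC).
    avg_ranks = np.array([3.2500, 5.0833, 5.4583, 5.1250, 4.8750,
                          3.8750, 7.2917, 8.2500, 1.7917])

    chi2_f = 12 * N / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4)
    # chi2_f reproduces the 49.0944 reported for Hamming Loss in Table 3.

    f_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)        # Iman-Davenport F-distributed variant
    critical = f.ppf(1 - alpha, k - 1, (k - 1) * (N - 1))  # F(8, 88) quantile, about 2.0454

    print(f"chi2_F = {chi2_f:.4f}, F_F = {f_f:.4f}, critical value = {critical:.4f}")
    # Either way the statistic exceeds the reported critical value of 2.0454, so the
    # null hypothesis of equal performance across the nine algorithms is rejected.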
Table 4. Comparison of RGLC (control algorithm) against the remaining approaches. The test statistics z_j and p-values are determined by the Holm test at significance level α = 0.05. Algorithms that are statistically inferior to RGLC are shown in bold.
Hamming Loss
j | algorithm | z_j | p | α/(k − j + 1)
2 | fRAkEL | −5.7765 | 0.000000 | 0.006
3 | MCGM | −4.9194 | 0.000001 | 0.007
4 | MLTSVM | −3.2796 | 0.001040 | 0.008
5 | Glocal | −2.9814 | 0.002869 | 0.010
6 | LLSF | −2.9442 | 0.003238 | 0.013
7 | HNOML | −2.7578 | 0.005819 | 0.017
8 | MDFS | −1.8634 | 0.062407 | 0.025
9 | LIFT | −1.3044 | 0.192106 | 0.050

Ranking Loss
j | algorithm | z_j | p | α/(k − j + 1)
2 | MLTSVM | −4.9566 | 0.000000 | 0.006
3 | MCGM | −3.9876 | 0.000067 | 0.007
4 | fRAkEL | −3.9131 | 0.000091 | 0.008
5 | Glocal | −1.5279 | 0.126537 | 0.010
6 | LLSF | −1.1925 | 0.233065 | 0.013
7 | MDFS | −0.4099 | 0.681879 | 0.017
8 | HNOML | −0.2981 | 0.765627 | 0.025
9 | LIFT | 0.5218 | 1.000000 | 0.050
One Error
j | algorithm | z_j | p | α/(k − j + 1)
2 | MCGM | −4.4348 | 0.000009 | 0.006
3 | HNOML | −3.5031 | 0.000460 | 0.007
4 | MDFS | −3.0559 | 0.002244 | 0.008
5 | LLSF | −2.6087 | 0.009089 | 0.010
6 | Glocal | −1.7143 | 0.086474 | 0.013
7 | LIFT | −1.0434 | 0.296763 | 0.017
8 | MLTSVM | −0.6708 | 0.502348 | 0.025
9 | fRAkEL | 1.2671 | 1.000000 | 0.050

Coverage
j | algorithm | z_j | p | α/(k − j + 1)
2 | MCGM | −5.8883 | 0.000000 | 0.006
3 | MLTSVM | −5.2175 | 0.000000 | 0.007
4 | Glocal | −4.6212 | 0.000004 | 0.008
5 | LIFT | −3.8759 | 0.000106 | 0.010
6 | MDFS | −3.5778 | 0.000347 | 0.013
7 | fRAkEL | −1.8634 | 0.062407 | 0.017
8 | LLSF | −0.7454 | 0.456057 | 0.025
9 | HNOML | −0.3727 | 0.709388 | 0.050
Average Precision
j | algorithm | z_j | p | α/(k − j + 1)
2 | MCGM | −5.7392 | 0.000000 | 0.006
3 | MLTSVM | −4.8448 | 0.000001 | 0.007
4 | fRAkEL | −4.1740 | 0.000030 | 0.008
5 | LLSF | −2.7951 | 0.005189 | 0.010
6 | MDFS | −2.5342 | 0.011270 | 0.013
7 | HNOML | −2.4224 | 0.015418 | 0.017
8 | Glocal | −2.3479 | 0.018881 | 0.025
9 | LIFT | −0.6336 | 0.526373 | 0.050

Micro F1
j | algorithm | z_j | p | α/(k − j + 1)
2 | fRAkEL | −4.8448 | 0.000001 | 0.006
3 | HNOML | −3.3914 | 0.000695 | 0.007
4 | MDFS | −2.9442 | 0.003238 | 0.008
5 | Glocal | −2.7578 | 0.005819 | 0.010
6 | MCGM | −1.5652 | 0.117525 | 0.013
7 | LIFT | −0.9690 | 0.332564 | 0.017
8 | MLTSVM | −0.8944 | 0.371093 | 0.025
9 | LLSF | −0.7454 | 0.456057 | 0.050
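The z values in Table 4 can be reproduced from the average ranks in Table 2 with the usual post-hoc comparison that accompanies the Friedman test, z = (R_RGLC − R_j) / sqrt(k(k+1)/(6N)), followed by Holm's step-down thresholds α/(k − j + 1). The snippet below is a minimal sketch (not the authors' code) that replays this for the Hamming Loss column; for example, fRAkEL gives (1.7917 − 8.2500)/1.1180 ≈ −5.7765, matching the table.

    # Minimal sketch (not the authors' code) of the Holm step-down comparison of
    # RGLC against the other algorithms, using average ranks from Table 2 (Hamming Loss).
    import numpy as np
    from scipy.stats import norm

    k, N, alpha = 9, 12, 0.05
    ranks = {"LIFT": 3.2500, "LLSF": 5.0833, "MLTSVM": 5.4583, "Glocal": 5.1250,
             "HNOML": 4.8750, "MDFS": 3.8750, "MCGM": 7.2917, "fRAkEL": 8.2500}
    r_control = 1.7917                              # average rank of RGLC

    se = np.sqrt(k * (k + 1) / (6 * N))             # standard error of average-rank differences
    stats = sorted(((name, (r_control - r) / se) for name, r in ranks.items()),
                   key=lambda t: abs(t[1]), reverse=True)

    for j, (name, z) in enumerate(stats, start=2):  # j = 2..k, most extreme comparison first
        p = 2 * norm.sf(abs(z))                     # two-sided p-value
        threshold = alpha / (k - j + 1)             # Holm's adjusted significance level
        print(f"{j}: {name:7s} z = {z:+.4f}  p = {p:.6f}  alpha/(k-j+1) = {threshold:.3f}")
    # Holm's procedure rejects from the top of this list and stops at the first
    # comparison whose p-value exceeds its threshold.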
Table 5. Comparison of each component in RGLC. Best results are in bold. ↓: the smaller the better; ↑: the larger the better.
Hamming Loss (↓)
Dataset | RGLC | RGLC-LF | RGLC-LL | RGLC-GC | RGLC-LC
art | 0.060(2) | 0.062(5) | 0.061(4) | 0.060(2) | 0.060(2)
bibtex | 0.012(1) | 0.014(5) | 0.013(3) | 0.013(3) | 0.013(3)
business | 0.025(1) | 0.028(4) | 0.028(4) | 0.027(2) | 0.028(4)
corel5k | 0.009(2) | 0.009(2) | 0.010(4.5) | 0.009(2) | 0.010(4.5)
enron | 0.057(1) | 0.073(5) | 0.072(4) | 0.071(3) | 0.068(2)
genbase | 0.002(3) | 0.001(1) | 0.003(5) | 0.002(3) | 0.002(3)
health | 0.033(1) | 0.039(5) | 0.037(3.5) | 0.036(2) | 0.037(3.5)
languagelog | 0.015(1) | 0.018(5) | 0.017(4) | 0.016(2.5) | 0.016(2.5)
mediamill | 0.032(1) | 0.033(3.5) | 0.033(3.5) | 0.033(3.5) | 0.033(3.5)
medical | 0.011(2) | 0.018(5) | 0.011(2) | 0.011(2) | 0.012(4)
scene | 0.089(1) | 0.121(5) | 0.116(4) | 0.110(2.5) | 0.110(2.5)
slashdot | 0.016(1) | 0.022(5) | 0.017(3) | 0.017(3) | 0.017(3)
Avg. Rank | 1.417(1) | 4.208(5) | 3.708(4) | 2.542(2) | 3.125(3)

Ranking Loss (↓)
Dataset | RGLC | RGLC-LF | RGLC-LL | RGLC-GC | RGLC-LC
art | 0.143(1) | 0.156(5) | 0.153(4) | 0.146(3) | 0.144(2)
bibtex | 0.074(1) | 0.131(5) | 0.085(2.5) | 0.085(2.5) | 0.089(4)
business | 0.045(1) | 0.051(5) | 0.050(4) | 0.046(2) | 0.047(3)
corel5k | 0.172(2) | 0.161(1) | 0.199(5) | 0.177(3) | 0.195(4)
enron | 0.089(1) | 0.128(5) | 0.125(3.5) | 0.122(2) | 0.125(3.5)
genbase | 0.003(4) | 0.002(1.5) | 0.003(4) | 0.002(1.5) | 0.003(4)
health | 0.061(1) | 0.081(5) | 0.075(4) | 0.069(2) | 0.070(3)
languagelog | 0.131(1) | 0.226(5) | 0.178(3) | 0.187(4) | 0.179(2)
mediamill | 0.054(1) | 0.056(2.5) | 0.056(2.5) | 0.057(4) | 0.058(5)
medical | 0.023(3.5) | 0.039(5) | 0.019(1) | 0.022(2) | 0.023(3.5)
scene | 0.093(1) | 0.102(5) | 0.095(3.5) | 0.094(2) | 0.095(3.5)
slashdot | 0.057(1) | 0.070(5) | 0.069(4) | 0.068(3) | 0.067(2)
Avg. Rank | 1.542(1) | 4.167(5) | 3.417(4) | 2.583(2) | 3.292(3)
One Error (↓)
Dataset | RGLC | RGLC-LF | RGLC-LL | RGLC-GC | RGLC-LC
art | 0.448(1) | 0.468(5) | 0.458(2) | 0.460(3) | 0.459(4)
bibtex | 0.374(2) | 0.491(5) | 0.364(1) | 0.380(3) | 0.383(4)
business | 0.116(1) | 0.122(4.5) | 0.122(4.5) | 0.118(2.5) | 0.118(2.5)
corel5k | 0.644(1) | 0.647(2) | 0.652(5) | 0.650(3) | 0.651(4)
enron | 0.230(1) | 0.283(5) | 0.264(2) | 0.279(4) | 0.266(3)
genbase | 0.001(1) | 0.006(3) | 0.003(3.5) | 0.002(2) | 0.003(3.5)
health | 0.254(1) | 0.272(5) | 0.262(4) | 0.255(2) | 0.256(3)
languagelog | 0.676(2) | 0.743(5) | 0.665(1) | 0.689(4) | 0.678(3)
mediamill | 0.151(1) | 0.153(2) | 0.169(5) | 0.167(3.5) | 0.167(3.5)
medical | 0.142(1) | 0.162(5) | 0.155(4) | 0.147(2) | 0.149(3)
scene | 0.258(1) | 0.281(5) | 0.269(3) | 0.276(4) | 0.268(2)
slashdot | 0.092(1) | 0.116(5) | 0.108(4) | 0.104(2) | 0.105(3)
Avg. Rank | 1.167(1) | 4.292(5) | 3.250(4) | 2.917(2) | 3.208(3)

Coverage (↓)
Dataset | RGLC | RGLC-LF | RGLC-LL | RGLC-GC | RGLC-LC
art | 0.211(1) | 5.96(5) | 0.225(4) | 0.215(3) | 0.214(2)
bibtex | 0.149(1) | 33.15(5) | 0.163(3) | 0.161(2) | 0.169(4)
business | 0.086(1) | 2.864(5) | 0.094(4) | 0.087(2) | 0.088(3)
corel5k | 0.384(1) | 129.5(5) | 0.445(4) | 0.405(2) | 0.439(3)
enron | 0.240(1) | 17.44(5) | 0.314(3) | 0.310(2) | 0.319(4)
genbase | 0.015(4) | 0.348(5) | 0.014(2.5) | 0.012(1) | 0.014(2.5)
health | 0.143(4) | 4.595(5) | 0.135(3) | 0.127(1) | 0.129(2)
languagelog | 0.143(1) | 20.58(5) | 0.188(2) | 0.198(4) | 0.193(3)
mediamill | 0.151(1) | 19.28(5) | 0.180(2) | 0.185(3) | 0.187(4)
medical | 0.037(3) | 2.496(5) | 0.031(1) | 0.038(4) | 0.036(2)
scene | 0.089(1) | 0.599(5) | 0.091(2) | 0.092(3) | 0.093(4)
slashdot | 0.055(1) | 2.317(5) | 0.065(4) | 0.064(3) | 0.062(2)
Avg. Rank | 1.667(1) | 5.000(5) | 2.875(3) | 2.500(2) | 2.958(4)
Average Precision (↑)
Dataset | RGLC | RGLC-LF | RGLC-LL | RGLC-GC | RGLC-LC
art | 0.626(1) | 0.606(5) | 0.614(3.5) | 0.614(3.5) | 0.618(2)
bibtex | 0.589(1) | 0.433(5) | 0.580(2) | 0.573(3) | 0.570(4)
business | 0.882(1) | 0.873(4.5) | 0.873(4.5) | 0.878(2) | 0.876(3)
corel5k | 0.298(1) | 0.283(5) | 0.289(4) | 0.292(2) | 0.291(3)
enron | 0.695(1) | 0.648(5) | 0.662(2) | 0.658(4) | 0.661(3)
genbase | 0.996(1.5) | 0.994(4) | 0.995(3) | 0.996(1.5) | 0.993(5)
health | 0.787(1) | 0.762(5) | 0.772(4) | 0.779(2) | 0.778(3)
languagelog | 0.408(2) | 0.346(5) | 0.416(1) | 0.401(4) | 0.405(3)
mediamill | 0.698(1) | 0.693(2) | 0.680(5) | 0.683(3) | 0.681(4)
medical | 0.893(1) | 0.862(5) | 0.886(4) | 0.887(2.5) | 0.887(2.5)
scene | 0.845(1) | 0.829(5) | 0.839(2.5) | 0.836(4) | 0.839(2.5)
slashdot | 0.885(1) | 0.867(5) | 0.875(2.5) | 0.875(2.5) | 0.878(4)
Avg. Rank | 1.208(1) | 4.625(5) | 3.167(3) | 2.833(2) | 3.250(4)

Micro F1 (↑)
Dataset | RGLC | RGLC-LF | RGLC-LL | RGLC-GC | RGLC-LC
art | 0.394(1) | 0.349(5) | 0.353(2) | 0.338(3.5) | 0.338(3.5)
bibtex | 0.293(4) | 0.254(5) | 0.348(1) | 0.305(3) | 0.334(2)
business | 0.710(2) | 0.707(4.5) | 0.708(3) | 0.714(1) | 0.707(4.5)
corel5k | 0.123(2) | 0.056(4) | 0.188(1) | 0.037(5) | 0.090(3)
enron | 0.521(1) | 0.457(5) | 0.464(4) | 0.467(3) | 0.480(2)
genbase | 0.991(1) | 0.985(2) | 0.972(5) | 0.976(3.5) | 0.976(3.5)
health | 0.594(2) | 0.582(5) | 0.596(1) | 0.593(3) | 0.590(4)
languagelog | 0.133(2) | 0.134(1) | 0.127(3) | 0.089(4) | 0.071(5)
mediamill | 0.533(1) | 0.509(2) | 0.490(5) | 0.493(3) | 0.492(4)
medical | 0.798(1) | 0.703(5) | 0.794(2.5) | 0.794(2.5) | 0.779(4)
scene | 0.653(1) | 0.543(5) | 0.581(4) | 0.628(2) | 0.622(3)
slashdot | 0.778(1) | 0.773(2) | 0.767(5) | 0.769(4) | 0.772(3)
Avg. Rank | 1.583(1) | 3.792(5) | 3.042(2) | 3.125(3) | 3.458(4)
Table 6. Runtime (in seconds) for learning with 60% noisy features and missing labels. Best results are in bold.
Dataset | LIFT | LLSF | MLTSVM | Glocal | HNOML | MDFS | MCGM | fRAkEL | RGLC
art | 162.4(9) | 0.337(2) | 87.41(8) | 26.34(3) | 50.16(5) | 56.06(6) | 65.80(7) | 0.317(1) | 27.97(4)
bibtex | 1033(9) | 5.432(2) | 630.5(8) | 349.3(5) | 370.6(6) | 253.8(4) | 246.4(3) | 1.658(1) | 482.9(7)
business | 201.1(9) | 0.420(2) | 99.48(8) | 24.24(3) | 45.95(5) | 46.66(6) | 77.64(7) | 0.168(1) | 30.04(4)
corel5k | 466.2(9) | 5.469(2) | 425.9(8) | 106.3(5) | 83.31(4) | 133.7(6) | 160.7(7) | 4.298(1) | 72.57(3)
enron | 22.02(7) | 0.559(1) | 5.107(4) | 35.04(8) | 8.611(5) | 5.007(3) | 10.77(6) | 0.608(2) | 81.94(9)
genbase | 2.463(6) | 0.546(2) | 1.089(4) | 15.73(8) | 2.783(7) | 0.620(3) | 2.007(5) | 0.293(1) | 46.28(9)
health | 538.0(9) | 0.647(2) | 94.44(8) | 21.69(3) | 64.63(6) | 50.77(4) | 72.75(7) | 0.284(1) | 51.47(5)
mediamill | 1555(9) | 9.209(1) | 454.1(8) | 154.9(3) | 438.9(7) | 435.5(6) | 224.3(5) | 12.60(2) | 198.8(4)
medical | 8.858(7) | 1.313(2) | 2.958(4) | 30.12(9) | 5.576(5) | 1.552(3) | 7.183(6) | 0.347(1) | 29.87(8)
scene | 11.94(9) | 0.143(1) | 2.131(3) | 4.158(5) | 7.412(8) | 5.917(7) | 5.451(6) | 0.906(2) | 4.143(4)
slashdot | 36.72(7) | 0.459(1) | 15.55(3) | 15.10(2) | 40.08(8) | 21.69(5) | 17.29(4) | 24.45(6) | 59.77(9)
Total & Rank | 7.50(9) | 1.50(1) | 5.50(7) | 4.50(4) | 5.50(7) | 4.42(3) | 5.25(5) | 1.58(2) | 5.50(7)