Mathematics
  • Article
  • Open Access

3 February 2023

Semi-Supervised Multi-Label Dimensionality Reduction Learning by Instance and Label Correlations

1 Yunnan Key Lab of Computer Technology Applications, Kunming University of Science and Technology, Kunming 650500, China
2 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
3 School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85287, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Mathematical Optimization in Pattern Recognition, Machine Learning and Data Mining

Abstract

The label learning mechanism is challenging to integrate into the training model of the multi-label feature space dimensionality reduction problem, which makes most current multi-label dimensionality reduction methods purely supervised. Many methods focus only on label correlations and ignore the instance interrelations between the original feature space and the low-dimensional space. Additionally, very few techniques consider how to constrain the projection matrix to identify specific and common features in the feature space. In this paper, we propose a new approach of semi-supervised multi-label dimensionality reduction learning by instance and label correlations (SMDR-IC, in short). Firstly, we reformulate MDDM, which incorporates label correlations, as a least-squares problem so that the label propagation mechanism can be effectively embedded into the model. Secondly, we investigate instance correlations using the k-nearest neighbor technique, and then present the $l_1$-norm and $l_{2,1}$-norm regularization terms to identify the specific and common features of the feature space. Experiments on a large collection of public multi-label data sets show that SMDR-IC performs better than other related multi-label dimensionality reduction methods.

1. Introduction

Multi-label learning tasks are frequently accompanied by high-dimensional feature spaces, as are other machine learning paradigms. When learning directly from large-scale, high-dimensional data, algorithms usually perform poorly in classification [1]. First, the overabundance of redundant and irrelevant information in high-dimensional space makes it more difficult for the model to identify the interclass structure of the data; second, the multicollinearity between the feature attributes of high-dimensional data results in poor generalization ability; and third, in high-dimensional feature spaces, traditional distances (such as the Euclidean distance) cannot measure the manifold structure between samples, even though the Euclidean distance plays a significant role in the majority of classifiers. The term "curse of dimensionality" [2] is used to describe these problems. As a result, dimensionality reduction becomes an important preprocessing step for multi-label learning on high-dimensional data.
Contrary to traditional single-label learning tasks, which presuppose that the labels of instances are mutually exclusive, the labels of multi-label instances are inter-correlated. In multi-label image classification, for instance, "desert" and "camel" frequently coexist, whereas "sea" does not frequently coexist with "desert". This fact motivates people to infer unknown labels of samples from known labels by relying on the correlations between labels. Unfortunately, it is challenging to incorporate such a label-learning mechanism into the model to estimate the labels of unlabeled instances due to the particularity of feature dimensionality reduction. Because of this, the majority of multi-label dimensionality reduction techniques developed recently are supervised methods that require enough labeled instances [3,4,5,6]. These supervised approaches, however, fail to take into account the fact that, in many real-world applications, it is difficult to annotate enough unlabeled instances because of the high labeling costs. As a consequence, labeling a sizable training dataset is typically impractical in real situations [7]. Unlabeled instances, on the contrary, are typically available and plentiful. Consequently, several semi-supervised multi-label dimensionality reduction techniques have been developed (see [8,9,10]), which can enhance learning performance by efficiently combining a large number of unlabeled instances with a scarce number of labeled instances.
Most of these semi-supervised dimensionality reduction techniques start by developing various label propagation methods based on label correlations and then apply them to the k-nearest neighbor (kNN) graph to generate soft labels for the unlabeled instances. With the labeled training samples thus expanded, the projection matrix from the high-dimensional space to the low-dimensional space is trained under the assumption that the between-class distance of the sample features is maximized and their within-class distance is minimized. This approach, often known as MLDA (multi-label linear discriminant analysis) [6], or one of its extensions, is the core of these methods. The MLDA framework, despite being able to integrate label correlations, ignores the instance correlations between the original feature space and the low-dimensional space.
Given that multi-label data with high-dimensional features contain a significant amount of redundant and irrelevant information, it is necessary to selectively extract specific and common features of the feature space while reducing dimensions, and to remove the negative effects of unimportant features. This is comparable to feature extraction and feature selection. The $l_{2,1}$-norm is frequently employed in multi-label feature selection because it can choose distinguishing features for all instances with joint sparsity (each feature has a lower score for all instances or a higher score for all instances) [11,12]. The drawback of the $l_{2,1}$-norm is equally clear, though: it does not take into account the distinctive features or the redundant correlation of features [13]. To compensate for the shortcomings of the $l_{2,1}$-norm and improve the projection matrix's ability to identify within-class features of samples, we use the $l_1$-norm as a model regularization term to learn the highly sparse specific features of the low-dimensional feature space of samples.
In this paper, we propose a novel method, namely semi-supervised multi-label dimensionality reduction learning by instance and label correlations (SMDR-IC), based on dependence maximization. This method effectively utilizes the information from both labeled and unlabeled instances, simultaneously considers label and instance correlations, and concurrently incorporates the specific and common features of the feature space. Our major contributions are summarized as follows.
The Hilbert–Schmidt Independence Criterion (HSIC) [14] has been mathematically shown to maximize the dependence between the original feature description and the associated class label. Motivated by this, we use the matrix factorization technique to reconstruct the HSIC empirical estimator in MDDM [4] into a least squares problem, enabling the label propagation mechanism to be seamlessly incorporated into the dimensionality reduction learning model.
Consideration is given to the instance correlations. In order to use instance correlations in dimensionality reduction, we introduce a new assumption, which states that if two instances have a high degree of correlation in the original feature space, they should also have a high degree of correlation in the low-dimensional feature space. The instance correlations are assessed using the k-nearest neighbor approach.
Through the use of the $l_1$-norm and $l_{2,1}$-norm regularization terms to select the appropriate features, the specific features and the common features of the feature space are simultaneously investigated in our method, which helps to enhance the performance of dimensionality reduction.
The rest of the paper is organized as follows. The related work is briefly reviewed in Section 2. Section 3 introduces the details of our proposed SMDR-IC method. Experimental results are analyzed in Section 4. Finally, Section 5 concludes this paper.

3. Materials and Methods

In this section, we first go over some important notation and symbols, and thereafter elaborate on our proposed SMDR-IC method.

3.1. Preliminaries

Let $X = [x_1^T; \ldots; x_l^T; x_{l+1}^T; \ldots; x_n^T] \in \mathbb{R}^{n \times d}$ be a data set of $n$ $d$-dimensional instances, where the first $l$ instances are labeled and the remaining $u$ are unlabeled, with $l + u = n$. $L = \{1, 2, \ldots, c\}$ denotes the label set, where $c$ is the number of possible labels of each instance. Drawing on [9], an additional $(c+1)$-th label is appended to the label set in order to detect outliers. Define the initial label matrix $Y = [y_1^T; \ldots; y_l^T; y_{l+1}^T; \ldots; y_n^T] = [Y_l; Y_u] \in \mathbb{R}^{n \times (c+1)}$, where $Y_l$ denotes the initial labels of the labeled instances and $Y_u$ denotes the initial labels of the unlabeled instances. For the labeled instances, $Y_{ij} = 1$ if the $i$-th instance is labeled as $j$, and $Y_{ij} = 0$ otherwise. For the unlabeled instances, $Y_{ij} = 1$ if $j = c + 1$, and $Y_{ij} = 0$ otherwise. We denote the predicted label matrix as $F = [F_1^T; \ldots; F_l^T; F_{l+1}^T; \ldots; F_n^T] = [F_l; F_u] \in \mathbb{R}^{n \times (c+1)}$, where $F_i \in \mathbb{R}^{c+1}$ $(1 \le i \le n)$ are column vectors and $0 \le F_{ij} \le 1$. $F_l$ denotes the predicted labels of the labeled instances and $F_u$ denotes the predicted labels of the unlabeled instances.
The objective is to learn a projection matrix $P \in \mathbb{R}^{d \times t}$ that projects an instance $x$ from the original feature space $\mathbb{R}^d$ to a lower-dimensional representation $z \in \mathbb{R}^t$,
$$z = x^T P,$$
where $t \ll d$.

3.2. Obtaining Soft Label by Label Propagation

3.2.1. Neighborhood Graph Construction

To accomplish label propagation, a graph consisting of labeled and unlabeled instances is built to evaluate the similarities among neighboring instances. The weighted adjacency matrix $W$ is defined over a $k$NN graph of the $n$ instances as follows:
$$W_{ij} = \begin{cases} 1, & \text{if } x_i \in k\mathrm{NN}(x_j) \text{ or } x_j \in k\mathrm{NN}(x_i), \\ 0, & \text{otherwise}, \end{cases}$$
where $k\mathrm{NN}(x_i)$ contains the $k$-nearest neighbors of $x_i$ computed by the Euclidean distance. Because of its simplicity and wide applicability, the weight in Equation (6) is simply set to a 0-1 weight. This adjacency matrix $W$ can also be obtained with other weight schemes (e.g., the Gaussian heat kernel) and distance settings.
We normalize W as a stochastic matrix W ˜ to ensure that the sum of the transitional probabilities from i to the other nodes of the graph equals 1, as follows:
$$\tilde{W} = D^{-1} W,$$
where $D = \mathrm{diag}(d_{11}, \ldots, d_{nn})$ and $d_{ii} = \sum_{j=1}^{n} W_{ij}$. $\tilde{W}_{ij}$ can be considered as the probability of a transition from node $i$ to node $j$ along the edge between them.
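To make the construction concrete, the following is a minimal sketch (ours, not the authors' code) of the 0-1 $k$NN adjacency matrix $W$ in Equation (6) and its row normalization $\tilde{W} = D^{-1}W$ in Equation (7), written in Python with NumPy; the function and variable names are illustrative.

```python
import numpy as np

def knn_graph(X, k=10):
    """Build the symmetric 0-1 kNN adjacency W and its row-stochastic version W_tilde."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = np.sum(X**2, axis=1, keepdims=True) - 2 * X @ X.T + np.sum(X**2, axis=1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-neighbors
    knn_idx = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest neighbors
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), knn_idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                       # W_ij = 1 if x_i in kNN(x_j) OR x_j in kNN(x_i)
    D_inv = np.diag(1.0 / W.sum(axis=1))         # D = diag(d_11, ..., d_nn)
    return W, D_inv @ W                          # W and W_tilde = D^{-1} W
```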

3.2.2. Label Propagation

Multi-label learning differs from single-label. The former assumes that instances’ labels are inter-correlated, while the latter assumes that instance labels are independent of one another. To convert the single-label propagation version to the multi-label propagation version, we first normalize the initial label matrix:
$$\tilde{Y}_{ij} = \begin{cases} \dfrac{1}{|Y_i|}, & \text{if } Y_{ij} = 1, \\ 0, & \text{otherwise}, \end{cases}$$
where $|Y_i|$ denotes the number of labels to which the instance $x_i$ belongs.
The following equation updates the probability that instance x i has the j-th class label:
$$F_{ij}(t+1) = \lambda_i \sum_{k=1}^{n} \tilde{W}_{ik} F_{kj}(t) + (1 - \lambda_i)\,\tilde{Y}_{ij}.$$
Obviously, in each iteration, the probability of each instance propagates partially from its neighbors and partially from its own labels. We divide the matrices $\tilde{W}$, $\tilde{Y}$, and $F$ into blocks corresponding to the labeled and unlabeled instances:
$$\tilde{W} = \begin{bmatrix} \tilde{W}_{ll} & \tilde{W}_{lu} \\ \tilde{W}_{ul} & \tilde{W}_{uu} \end{bmatrix}, \qquad \tilde{Y} = \begin{bmatrix} \tilde{Y}_l \\ \tilde{Y}_u \end{bmatrix}, \qquad F = \begin{bmatrix} F_l \\ F_u \end{bmatrix}.$$
For the labeled instances, we fix the labels as $F_l = \tilde{Y}_l$ and $\lambda_l = 0$. For the unlabeled instances, the iteration can be written as follows:
$$F_u(t+1) = I_{\lambda_u}\tilde{W}_{ul}F_l(t) + I_{\lambda_u}\tilde{W}_{uu}F_u(t) + (I - I_{\lambda_u})\tilde{Y}_u,$$
where $I \in \mathbb{R}^{u \times u}$ is an identity matrix and $I_{\lambda_u} \in \mathbb{R}^{u \times u}$ is a diagonal matrix with diagonal elements $\lambda_u$. Since $F_l(t) = \tilde{Y}_l$ and $F_u(0) = \tilde{Y}_u$, we have:
$$F_u(t+1) = \sum_{i=0}^{t}\big(I_{\lambda_u}\tilde{W}_{uu}\big)^i I_{\lambda_u}\tilde{W}_{ul}\tilde{Y}_l + \big(I_{\lambda_u}\tilde{W}_{uu}\big)^{t+1}\tilde{Y}_u + \sum_{i=0}^{t}\big(I_{\lambda_u}\tilde{W}_{uu}\big)^i(I - I_{\lambda_u})\tilde{Y}_u.$$
The sequence $\{F_u(t)\}$ is convergent; the convergence analysis is presented in [9]. After a finite number of iterations, $F_u$ converges to
$$F_u = \big(I - I_{\lambda_u}\tilde{W}_{uu}\big)^{-1}\big(I_{\lambda_u}\tilde{W}_{ul}\tilde{Y}_l + (I - I_{\lambda_u})\tilde{Y}_u\big).$$
It is easily found that the sum of each row of $F_u$ is equal to 1. This means that the elements in $F$ are probability values and $F_{ij}$ can be viewed as the posterior probability of the instance $x_i$ belonging to the $j$-th class. In particular, $F_{i,c+1}$ represents the probability of the instance $x_i$ being an outlier. After obtaining the predicted label $F_{ij}$ for each instance $x_i$, we define these labels $F_{ij}$ $(1 \le j \le c)$ as soft labels.
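As an illustration of this closed-form solution, the following sketch (our Python rendering, not the authors' implementation) computes the converged soft labels $F_u$ directly from the partitioned matrices; `W_tilde` and `Y_tilde` are assumed to come from the construction above, and `lam_u` plays the role of $\lambda_u$.

```python
import numpy as np

def propagate_labels(W_tilde, Y_tilde, n_labeled, lam_u=0.99):
    """Closed-form propagation: F_u = (I - I_lam W_uu)^(-1) (I_lam W_ul Y_l + (I - I_lam) Y_u)."""
    u = W_tilde.shape[0] - n_labeled
    W_ul = W_tilde[n_labeled:, :n_labeled]   # transitions from unlabeled to labeled nodes
    W_uu = W_tilde[n_labeled:, n_labeled:]   # transitions among unlabeled nodes
    Y_l, Y_u = Y_tilde[:n_labeled], Y_tilde[n_labeled:]
    I = np.eye(u)
    I_lam = lam_u * I                        # diagonal matrix with lambda_u on the diagonal
    F_u = np.linalg.solve(I - I_lam @ W_uu, I_lam @ W_ul @ Y_l + (I - I_lam) @ Y_u)
    return np.vstack([Y_l, F_u])             # labels of labeled instances stay fixed
```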

3.3. Dimensionality Reduction

In this subsection, we utilize the soft labels to learn the projection matrix $P$. Firstly, we detail how to incorporate the label propagation mechanism and label correlations into MDDM to construct a least-squares formulation; then we integrate instance correlations as well as the specific and common features of the feature space into our approach. Finally, we discuss how the proposed dimensionality reduction model is optimized.

3.3.1. Design of the Semi-Supervised Mode

It is evident from label propagation Equations (6) and (9) that the label propagation measures the soft labels of unlabeled instances by the similarity of instance features, whereas label correlations are almost never employed. Therefore, to improve the performance of the propagation mechanism, we take the label correlations into consideration. According to the previous work [6], the correlation, namely cosine similarity, between two distinct label classes is expressed as follows:
$$C_{kl} = \cos\big(Y^{(k)}, Y^{(l)}\big) = \frac{\big\langle Y^{(k)}, Y^{(l)}\big\rangle}{\big\|Y^{(k)}\big\|\,\big\|Y^{(l)}\big\|}.$$
Then, we have $\tilde{F}_u = F_u C$ and $\tilde{F} = [F_l; \tilde{F}_u]$. The resulting soft label matrix is then used to compute the label kernel matrix $L$ in Section 2.2, effectively transforming MDDM into a semi-supervised technique. $L$ can be rewritten as:
$$L = \tilde{F}\tilde{F}^T.$$
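The following short sketch illustrates this step in Python; computing the cosine similarities from the labeled block $Y_l$ and the small epsilon guard are our assumptions, and the names are illustrative.

```python
import numpy as np

def label_correlation_kernel(F_l, F_u, Y_l, eps=1e-12):
    """Cosine label correlations C (Eq. (15)), adjusted soft labels, and label kernel L."""
    col_norms = np.linalg.norm(Y_l, axis=0) + eps       # ||Y^(k)|| for each label column
    C = (Y_l.T @ Y_l) / np.outer(col_norms, col_norms)  # C_kl = cos(Y^(k), Y^(l))
    F_u_tilde = F_u @ C                                 # spread soft labels across correlated classes
    F_tilde = np.vstack([F_l, F_u_tilde])               # F_tilde = [F_l; F_u C]
    L = F_tilde @ F_tilde.T                             # label kernel L = F_tilde F_tilde^T
    return F_tilde, L
```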

3.3.2. Reformulating MDDM as a Least Squares Problem

To reformulate MDDM as a least squares problem, we first define two necessary matrices $S_1$, $S_2$ as follows:
$$S_1 = \mu X^T X + (1 - \mu)I, \qquad S_2 = X^T H L H X.$$
Then, the generalized eigenvalue problem in Equation (4) can be written as $S_2 p = \lambda S_1 p$. If $S_1$ is invertible (when $\mu \neq 1$, $S_1$ must be invertible), the optimal $P$ is given by the eigenvectors of $S_1^{-1} S_2$ corresponding to the largest eigenvalues. Now, we apply the matrix decomposition technique to decompose $S_1^{-1} S_2$. The singular value decomposition of $X$ is defined as:
$$X = U\,\mathrm{diag}(\Sigma_t, 0)\,V^T,$$
where $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{d \times d}$ are orthogonal matrices and $t = \mathrm{rank}(X)$; $\Sigma_t \in \mathbb{R}^{t \times t}$ is a diagonal matrix whose diagonal elements are the singular values of $X$, and $\mathrm{diag}(\Sigma_t, 0) \in \mathbb{R}^{n \times d}$ is a matrix whose first $t$ diagonal elements are the singular values of $X$ and whose other entries are 0.
Let $U = [U_1, U_2]$ and $V = [V_1, V_2]$, where $U_1 \in \mathbb{R}^{n \times t}$, $U_2 \in \mathbb{R}^{n \times (n - t)}$, $V_1 \in \mathbb{R}^{d \times t}$, and $V_2 \in \mathbb{R}^{d \times (d - t)}$. Then $S_1$, $S_2$ can be rewritten as:
$$S_1 = V_1\big[\mu\Sigma_t^2 + (1 - \mu)I\big]V_1^T, \qquad S_2 = V_1\Sigma_t U_1^T H\tilde{F}\tilde{F}^T H U_1\Sigma_t V_1^T.$$
According to Equation (18), we can calculate $S_1^{-1} S_2$ as:
$$S_1^{-1} S_2 = V_1\big[\mu\Sigma_t^2 + (1 - \mu)I\big]^{-1}\Sigma_t U_1^T H\tilde{F}\tilde{F}^T H U_1\Sigma_t V_1^T.$$
We define a diagonal matrix B as follows:
$$B = \big[\mu\Sigma_t^2 + (1 - \mu)I\big]^{-\frac{1}{2}}.$$
Notice that $B$ is invertible and $B = B^T$. According to Equations (19) and (20), we rewrite $S_1^{-1} S_2$ as:
$$S_1^{-1} S_2 = V_1 B B^T\Sigma_t U_1^T H\tilde{F}\tilde{F}^T H U_1\Sigma_t B B^{-1}V_1^T.$$
Denote $T = \tilde{F}^T H U_1\Sigma_t B \in \mathbb{R}^{c \times t}$, and let $T = P_1\Lambda P_2^T$ be the singular value decomposition of $T$, where $P_1 \in \mathbb{R}^{c \times c}$, $P_2 \in \mathbb{R}^{t \times t}$, and $\Lambda \in \mathbb{R}^{c \times t}$ is a diagonal matrix. Then we have:
$$S_1^{-1} S_2 = V_1 B P_2\Lambda^T P_1^T P_1\Lambda P_2^T B^{-1}V_1^T = V_1 B P_2\tilde{\Lambda}P_2^T B^{-1}V_1^T,$$
where $\tilde{\Lambda} = \Lambda^T\Lambda$. Thus, the solution of problem Equation (4), which consists of the eigenvectors corresponding to the eigenvalues of $S_1^{-1} S_2$, is provided by the following equation:
$$P = V_1 B P_2.$$
Consider the following least squares problem:
$$\min_{P}\ \|XP - Z\|_F^2,$$
where we assume that both the observation matrix X and the target matrix Z are centered. The optimal solution of Equation (24) is given by [39]:
$$P^{*} = (X^T X)^{\dagger} X^T Z,$$
where $(\cdot)^{\dagger}$ is the pseudo-inverse of a matrix. If we set $Z = U_1\Sigma_t B P_2$, then we have:
$$P^{*} = V_1\Sigma_t^{-2}V_1^T V_1\Sigma_t U_1^T U_1\Sigma_t B P_2 = V_1 B P_2,$$
which is exactly the same formula as in Equation (23). This implies that the MDDM formulation in Equation (4) is equivalent to the least-squares formulation in Equation (24). On the basis of this equivalence, we can attach additional constraint conditions, such as instance correlations and sparsity constraints, as regularization terms.
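A minimal sketch of how the target matrix $Z = U_1\Sigma_t B P_2$ can be assembled from the thin SVD of $X$, together with the least-squares solution in Equation (25), is given below; it assumes $X$ is already centered and `F_tilde` is the correlation-adjusted soft label matrix, and the tolerance used for the numerical rank is our choice.

```python
import numpy as np

def least_squares_target(X, F_tilde, mu=0.5, rank_tol=1e-10):
    """Construct Z = U_1 Sigma_t B P_2 and the least-squares solution P = (X^T X)^+ X^T Z."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n               # centering matrix used in the MDDM term
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    t = int(np.sum(s > rank_tol))                     # numerical rank of X
    U1, sigma_t = U[:, :t], s[:t]
    B = 1.0 / np.sqrt(mu * sigma_t**2 + (1 - mu))     # diagonal of B = [mu Sigma_t^2 + (1-mu) I]^(-1/2)
    T = F_tilde.T @ H @ U1 @ np.diag(sigma_t * B)     # T = F_tilde^T H U_1 Sigma_t B
    _, _, P2t = np.linalg.svd(T)                      # right singular vectors give P_2^T
    Z = U1 @ np.diag(sigma_t * B) @ P2t.T             # target matrix Z = U_1 Sigma_t B P_2
    P = np.linalg.pinv(X.T @ X) @ (X.T @ Z)           # least-squares solution, Eq. (25)
    return Z, P
```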

3.3.3. Incorporating Instance Correlations

In our dimensionality reduction approach, we not only consider the label correlations as shown in Equations (15) and (18) but also incorporate the instance correlations. Previous classification algorithms [40,41] incorporate instance correlations by assuming that two instances $x_i$ and $x_j$ may be related in the label space if they are correlated in the feature space. In fact, instances in multi-label dimensionality reduction problems should also maintain the interrelation of their features and labels before and after dimensionality reduction. As a result, if two instances $x_i$ and $x_j$ are correlated in the original feature space, we assume they will also be related in the low-dimensional feature space.
Instead of evaluating the instance correlations using the cosine similarity, we adopt the k-nearest neighbor (kNN) mechanism to reduce the effects of noisy and redundant features. Thus, the weighted adjacency matrix W in Equation (6) can be exploited to define the following regularization term for assessing the interrelation of feature space:
$$\min_{P}\ \sum_{i,j}^{n} W_{ij}\big\|x_i^T P - x_j^T P\big\|^2 = \mathrm{tr}\big((XP)^T L(XP)\big) = \mathrm{tr}(O^T LO),$$
where $O = XP$ is the output low-dimensional matrix and $L = W^{*} - W$ denotes the $n \times n$ Laplacian matrix of $W$. $W^{*}$ is a diagonal matrix with $W^{*}_{ii} = \sum_{j=1}^{n} W_{ij}$. After incorporating this regularization, we can rewrite our objective function in Equation (24) as follows:
$$\min_{P}\ \frac{1}{2}\|XP - Z\|_F^2 + \frac{\alpha}{2}\mathrm{tr}(O^T LO).$$
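The regularizer itself is simple to express; the sketch below (illustrative names, ours) forms the Laplacian $L = W^{*} - W$ from the kNN adjacency and evaluates $\mathrm{tr}(O^T LO)$ for a given projection.

```python
import numpy as np

def instance_correlation_penalty(X, P, W):
    """Evaluate tr(O^T L O) with O = XP and L the graph Laplacian of W."""
    W_star = np.diag(W.sum(axis=1))   # W* is diagonal with W*_ii = sum_j W_ij
    L = W_star - W                    # Laplacian matrix of W
    O = X @ P                         # low-dimensional output O = XP
    return np.trace(O.T @ L @ O)      # weighted sum of pairwise projection distances (up to a constant factor)
```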

3.3.4. Incorporating Specific and Common Features

Furthermore, two norm regularization terms that constrain the sparsity of the matrix $P$ are also incorporated into our technique to enhance the performance of the dimensionality reduction. One is the $l_1$-norm, which can enforce sparsity among all elements in $P$ and shrink some parameters to zero, allowing specific features in the original feature space to be selected. The other is the $l_{2,1}$-norm, which can guarantee the sparsity of $P$ in rows and is therefore advantageous for choosing common features in the original feature space. For instance, in Figure 1, when the sparse projection matrix $P$ with five rows and three columns is used to reduce the dimensions of instances $x_1$ and $x_2$, the first feature of the projected instances $z_1$ and $z_2$ is dominated by the original features $f_1$ and $f_4$, the second feature is made up of the original features $f_1$, $f_2$, and $f_3$, and the third feature is determined by $f_2$ and $f_5$. These can be thought of as the specific features of each dimension of the low-dimensional feature space, and we employ the $l_1$-norm to give $P$ this property. Meanwhile, features $f_1$ and $f_2$ contribute to the first, second, and third features of $z_1$ and $z_2$; they can be regarded as common features of the original feature space, and we leverage the $l_{2,1}$-norm to capture these common features.
Figure 1. An explanation of specific and common features of feature space.
The final objective function of our proposed approach can be rewritten as Equation (29) after incorporating these two regularization terms.
$$\min_{P}\ \frac{1}{2}\|XP - Z\|_F^2 + \frac{\alpha}{2}\mathrm{tr}(O^T LO) + \beta\|P\|_1 + \gamma\|P\|_{2,1},$$
where $\alpha$, $\beta$, and $\gamma$ are constant coefficients.

3.3.5. Optimization

Although Equation (29) is a convex optimization problem, the objective function is not smooth because of the non-smoothness of the $l_1$-norm and $l_{2,1}$-norm regularization terms. To address this non-smooth optimization problem, we first relax $\|P\|_{2,1}$ to $\mathrm{tr}(P^T AP)$ following [11], where $A$ is a $d \times d$ diagonal matrix whose $i$-th diagonal value is $A_{ii} = \frac{1}{2\|P_i\|_2}$. Then, to handle the $l_1$-norm regularization term, we employ the accelerated proximal gradient (APG) method.
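For reference, the relaxation matrix $A$ can be computed as below; the epsilon guard against all-zero rows is our assumption rather than part of the original formulation.

```python
import numpy as np

def l21_relaxation_matrix(P, eps=1e-8):
    """Diagonal matrix A with A_ii = 1 / (2 ||P_i||_2), used to relax ||P||_{2,1} to tr(P^T A P)."""
    row_norms = np.linalg.norm(P, axis=1)          # ||P_i||_2 for each row of P
    return np.diag(1.0 / (2.0 * row_norms + eps))  # d x d diagonal matrix
```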
In the general accelerated proximal gradient method, a convex optimization problem can be defined as:
$$\min_{P \in \mathcal{H}}\ F(P) = f(P) + g(P),$$
where $\mathcal{H}$ indicates a real Hilbert space. $f(P)$ and $g(P)$ are both convex, but they are smooth and non-smooth, respectively. The gradient $\nabla f$ is also Lipschitz continuous, i.e., $\|\nabla f(P_1) - \nabla f(P_2)\|_F^2 \le L_f\|\Delta P\|_F^2$, where $\Delta P = P_1 - P_2$ and $L_f$ is the Lipschitz constant. Proximal gradient algorithms, rather than directly minimizing $F(P)$, minimize a sequence of separable quadratic approximations to $F(P)$, denoted as:
$$Q_{L_f}(P, P^{(m)}) = f(P^{(m)}) + \big\langle\nabla f(P^{(m)}),\, P - P^{(m)}\big\rangle + \frac{L_f}{2}\big\|P - P^{(m)}\big\|_F^2 + g(P).$$
Let $G^{(m)} = P^{(m)} - \frac{1}{L_f}\nabla f(P^{(m)})$; then
$$P^{*} = \arg\min_{P}\ Q_{L_f}(P, P^{(m)}) = \arg\min_{P}\ g(P) + \frac{L_f}{2}\big\|P - G^{(m)}\big\|_F^2.$$
According to Equations (29) and (30), $f(P)$ and $g(P)$ are defined as follows:
$$f(P) = \frac{1}{2}\|XP - Z\|_F^2 + \frac{\alpha}{2}\mathrm{tr}(O^T LO) + \gamma\|P\|_{2,1},$$
$$g(P) = \beta\|P\|_1.$$
According to Equation (33), we can calculate $\nabla f(P)$ as:
$$\nabla f(P) = X^T XP - X^T Z + \alpha X^T LXP + \gamma AP.$$
According to Equations (32)–(34), the projection matrix P can be optimized by
$$P^{*} = \arg\min_{P}\ Q_{L_f}(P, P^{(m)}) = \arg\min_{P}\ \frac{L_f}{2}\big\|P - G^{(m)}\big\|_F^2 + g(P) = \arg\min_{P}\ \frac{1}{2}\big\|P - G^{(m)}\big\|_F^2 + \frac{\beta}{L_f}\|P\|_1.$$
Lin et al. [42] showed that setting $P^{(m)} = P_m + \frac{b_{m-1} - 1}{b_m}(P_m - P_{m-1})$ for a sequence $\{b_m\}$ satisfying $b_{m+1}^2 - b_{m+1} \le b_m^2$ improves the convergence rate to $O(m^{-2})$, where $P_m$ is the result of $P$ at the $m$-th iteration and $P^{(m)}$ is the intermediate variable at the $m$-th iteration. Additionally, for the $l_1$-norm regularization term $g(P)$, if $\mathcal{H}$ is a normed space endowed with the Frobenius norm $\|\cdot\|_F$ and $g(\cdot)$ is the $l_1$-norm, then $P_{m+1}$ is generated by soft-thresholding the entries of $G^{(m)}$ as:
$$P_{m+1} = S_{\varepsilon}\big[G^{(m)}\big] = \arg\min_{P}\ \frac{1}{2}\big\|P - G^{(m)}\big\|_F^2 + \varepsilon\|P\|_1,$$
where $S_{\varepsilon}[\omega]$ is the soft-thresholding operator, $\omega \in \mathbb{R}$ and $\varepsilon > 0$, defined as:
$$S_{\varepsilon}[\omega] = \begin{cases}\omega - \varepsilon, & \text{if } \omega > \varepsilon, \\ \omega + \varepsilon, & \text{if } \omega < -\varepsilon, \\ 0, & \text{otherwise}.\end{cases}$$
Then, in each iteration, $P_{m+1}$ can be obtained by the following soft-thresholding operation:
$$P_{m+1} = S_{\beta/L_f}\big[G^{(m)}\big].$$
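In code, the soft-thresholding operator acts element-wise; a minimal vectorized sketch (ours) is:

```python
import numpy as np

def soft_threshold(G, eps):
    """Element-wise S_eps: omega - eps if omega > eps, omega + eps if omega < -eps, 0 otherwise."""
    return np.sign(G) * np.maximum(np.abs(G) - eps, 0.0)
```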
Here, we calculate the Lipschitz constant $L_f$. Given $P_1$ and $P_2$, according to Equation (35), we have:
$$\big\|\nabla f(P_1) - \nabla f(P_2)\big\|_F^2 = \big\|X^T X\Delta P + \alpha X^T LX\Delta P + \gamma A\Delta P\big\|_F^2 \le 3\big(\|X^T X\Delta P\|_F^2 + \alpha\|X^T LX\Delta P\|_F^2 + \gamma\|A\Delta P\|_F^2\big) \le 3\big(\|X^T X\|_2^2 + \alpha\|X^T LX\|_2^2 + \gamma\|A\|_2^2\big)\|\Delta P\|_F^2,$$
where $\Delta P = P_1 - P_2$. Obviously, $L_f$ is given by
$$L_f = 3\big(\|X^T X\|_2^2 + \alpha\|X^T LX\|_2^2 + \gamma\|A\|_2^2\big).$$
We fix the value of  A , which is determined by the initial value of  P , to ensure that L f will always be a constant value.
Algorithm 1 summarizes the pseudo-code of the SMDR-IC method. This algorithm produces the projection matrix $P$, which maps the features of instances from the $d$-dimensional to the $t$-dimensional space, where $t = \mathrm{rank}(X)$ and $t \le d$. If the dimensionality is to be reduced to $r$ $(r < t)$, the first $r$ columns of the projection matrix $P$ can be chosen.
Algorithm 1 SMDR-IC: Semi-supervised Multi-label Dimensionality Reduction Learning by Instance and Label Correlations
Require: Feature matrix $X \in \mathbb{R}^{n \times d}$, label matrix $Y \in \mathbb{R}^{n \times c}$, and parameters $k$, $\mu$, $\alpha$, $\beta$, $\gamma$, $\sigma$
Ensure: Projection matrix $P \in \mathbb{R}^{d \times t}$, $t = \mathrm{rank}(X)$
1: $F_l \leftarrow Y$, $F_u \leftarrow 0$;
2: Calculate the neighborhood graph matrix $W$ for label propagation and instance correlations by using k-nearest neighbors;
3: Obtain the soft label matrix $F$;
4: Calculate the label correlation matrix $C$ and obtain the label matrix $\tilde{F}$;
5: Calculate the target matrix $Z$;
6: $b_0 \leftarrow 0$, $b_1 \leftarrow 1$, $m \leftarrow 1$;
7: $P_0 \leftarrow 0$, $P_1 \leftarrow (X^T X + \sigma I)^{-1} X^T Z$;
8: Calculate the diagonal matrix $A$;
9: Calculate the Lipschitz constant $L_f$;
10: repeat
    $P^{(m)} \leftarrow P_m + \frac{b_{m-1} - 1}{b_m}(P_m - P_{m-1})$;
    $G^{(m)} \leftarrow P^{(m)} - \frac{1}{L_f}\nabla f(P^{(m)})$;
    $P_{m+1} \leftarrow S_{\beta/L_f}(G^{(m)})$;
    $b_{m+1} \leftarrow \frac{1 + \sqrt{4 b_m^2 + 1}}{2}$;
    Calculate the diagonal matrix $A_{m+1}$;
    until the stop criterion is reached;
11: $P \leftarrow P_m$;
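For concreteness, the following self-contained sketch mirrors the structure of Algorithm 1 in Python; it assumes the target matrix $Z$, the graph Laplacian $L$, and the tuning parameters have been prepared as described above, and the value of sigma, the tolerance, and the stopping rule are illustrative choices rather than the authors' exact settings.

```python
import numpy as np

def smdr_ic(X, Z, L, alpha, beta, gamma, sigma=1e-3, max_iter=1000, tol=1e-6):
    """APG loop of Algorithm 1 (sketch): returns the projection matrix P."""
    d, t = X.shape[1], Z.shape[1]
    soft = lambda G, e: np.sign(G) * np.maximum(np.abs(G) - e, 0.0)           # S_eps
    diag_A = lambda P: np.diag(1.0 / (2.0 * np.linalg.norm(P, axis=1) + 1e-8))
    P_prev = np.zeros((d, t))
    P_curr = np.linalg.solve(X.T @ X + sigma * np.eye(d), X.T @ Z)            # step 7: warm start
    A = diag_A(P_curr)
    L_f = 3 * (np.linalg.norm(X.T @ X, 2)**2                                  # step 9: Lipschitz constant
               + alpha * np.linalg.norm(X.T @ L @ X, 2)**2
               + gamma * np.linalg.norm(A, 2)**2)
    b_prev, b_curr = 0.0, 1.0
    for _ in range(max_iter):
        V = P_curr + (b_prev - 1) / b_curr * (P_curr - P_prev)                # extrapolation P^(m)
        grad = X.T @ (X @ V - Z) + alpha * X.T @ (L @ (X @ V)) + gamma * A @ V
        P_next = soft(V - grad / L_f, beta / L_f)                             # proximal l1 step
        b_prev, b_curr = b_curr, (1 + np.sqrt(4 * b_curr**2 + 1)) / 2
        A = diag_A(P_next)                                                    # refresh A_{m+1}
        if np.linalg.norm(P_next - P_curr, 'fro') < tol:                      # illustrative stop criterion
            P_curr = P_next
            break
        P_prev, P_curr = P_curr, P_next
    return P_curr
```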
An alternative optimization to the accelerated proximal gradient method is the alternating direction method of multipliers (ADMM), which has the advantage that f ( P ) and g ( P ) are completely independent, so they can both be non-smooth.

4. Results

4.1. Benchmark Data Sets

To verify the efficiency of our proposed method, we conduct experiments on fifteen publicly available real-world data sets from Mulan (http://mulan.sourceforge.net/datasets-mlc.html, accessed on 5 March 2022), the statistics of which are summarized in Table 1. These data sets are divided into four categories: music, image, biology, and text. Emotions is a multi-label music data set, Scene and Corel5k are multi-label image data sets, Yeast is a multi-label biology data set, and the remaining are subsets of Yahoo, which is a multi-label text (web) data set. All data sets are standardized to have a mean of zero and a variance of one.
Table 1. Data statistics (“#instances” represents the total amount of instances, “#Attributes” indicates the number of features, including both the numeric and nominal features, “#Labels” is the number of class labels, and “Cardinality” means the average number of labels per instance).

4.2. Evaluation Metrics

The performances of different dimensionality reduction methods are evaluated by employing seven widely used evaluation metrics: Hamming Loss, Ranking Loss, Average Precision (AvgPrec), OneError, Macro-F1 (MacroF1), Micro-F1 (MicroF1), and Coverage. These evaluation criteria are given in [4,43] along with detailed descriptions.
For convenience, we modify Hamming Loss, Ranking Loss, and OneError to 1-Hamming, 1-Ranking, and 1-OneError, respectively, so that higher values always mean better for the first six evaluation metrics, and lower values indicate better performance for Coverage.

4.3. Comparison Methods and Parameters Settings

The previously mentioned multi-label dimensionality reduction methods, CCA [5], MDDM [4], and six MLDA variants (wMLDAb [25], wMLDAe [26], wMLDAc [6], wMLDAf [27], wMLDAd [28], and MLDA-LC [29]), are all taken into account to compare the performance of SMDR-IC. These supervised methods can only be trained on the labeled portion of the training instances since they require labeled instances. MLDA-LC, specifically, adopts an instance correlation measure similar to our technique to constrain the consistency between the class-wise within-class distances of instances in the low-dimensional feature space and the original feature space, allowing the LDA framework to capture local structures. Moreover, three semi-supervised approaches, SSMLDR [9], SMDRdm [10], and NMLSDR [38], are compared with SMDR-IC. NMLSDR is a recently proposed dimensionality reduction method for noisy multi-label data. For a fair comparison, we replace NMLSDR's noise-coping label propagation strategy with the label propagation mechanism used in our approach.
We begin by projecting the high-dimensional instances into a low-dimensional space using these dimensionality reduction methods. The widely used ML-kNN [43] classifier is then applied to classify the instances in the low-dimensional space.
In the experiments, we set $k = 10$ to construct the $k$NN neighborhood graph. To be more convincing, the label propagation parameters are set the same as those in SSMLDR, i.e., $\lambda_l = 0$ for labeled instances and $\lambda_u = 0.99$ for unlabeled instances. $\mu$ in Equation (16) is set to $0.5$, as in MDDM [4], to constrain the orthogonality of the objective matrix. The regularization parameters $\alpha$, $\beta$, and $\gamma$ are tuned by a grid search over $\{10^{i} \mid i = -5, -4, \ldots, -1\}$ and the best results are reported. All comparison method parameters were chosen based on their publications' recommendations.
The experiment was performed using Matlab R2020a on a desktop PC with an Intel(R) Core(TM) i7-7700k 4.20GHz CPU, 16GB RAM, and NVIDIA GeForce GTX 1080 8G GPU, with a 64-bit Windows 10 operating system.

4.4. Experimental Results

In our experiments, we randomly partition 70% of instances as the training set and 30% of instances as the test set. In the training set, we randomly select 20% of instances as the labeled set and the remaining instances as the unlabeled set to learn a projection matrix P , which is used to project the d-dimensionality training and test sets to an r-dimensionality representation. To avoid the random effect, the random partitioning and selection are repeated 10 times on each dataset for each comparing method, with the averaged results reported.
Due to the restrictions of the MLDA framework, the five methods wMLDAb, wMLDAe, wMLDAc, wMLDAf, and wMLDAd have a target dimension of at most $c - 1$, and MLDA-LC reduces the feature dimension to at most the rank of the initial feature matrix. Therefore, we set the dimension of the projected features to $t = c - 1$ for all approaches.
Using the training set after dimensionality reduction, we train an ML-kNN classifier and validate its performance on the low-dimensionality test set. As shown in Table 2, SMDR-IC demonstrates excellent effectiveness across the seven measures on all data sets. Indeed, SMDR-IC achieves the best results on more than five of the evaluation criteria on 11 data sets, while the remaining outcomes are only marginally inferior.
Table 2. Experimental results of 20% labeled instances on fifteen data sets in terms of seven evaluation metrics.
In the meantime, we randomly selected more labeled instances to evaluate the effectiveness of SMDR-IC. As shown in Table 3 and Table 4, SMDR-IC performs better when 50% of the training set's instances are labeled than when 20% are. With 70% labeled instances, SMDR-IC shows higher performance than with 20% labeled instances and somewhat lower performance than with 50% labeled instances, but it still ranks first on average. This outcome is primarily due to the fact that the merits of the supervised approaches start to emerge as the fraction of labeled instances rises sufficiently.
Table 3. Experimental results of 50% labeled instances on fifteen data sets in terms of seven evaluation metrics.
Table 4. Experimental results of 70% labeled instances on fifteen data sets in terms of seven evaluation metrics.
Regarding the experimental results reported in Table 2, Table 3 and Table 4, the semi-supervised methods outperform the supervised methods because the supervised methods obtain the geometric structure of instances by using only labeled instances, whereas the semi-supervised methods leverage both labeled and unlabeled instances to obtain a more comprehensive geometric structure. Furthermore, because the label propagation mechanism of the NMLSDR method is replaced by that of the SMDR-IC method, the comparison results demonstrate that learning of instance similarity, common features, and specific features all play significant roles in capturing the geometric structure of instances.
To investigate further the performance differences between the compared methods and the proposed approach SMDR-IC, the necessary analyses were carried out using the Friedman test [44]. Table 5 lists the Friedman statistics $F_F$ for each evaluation metric and the critical values. The null hypothesis that all comparison algorithms perform equally well is rejected for each evaluation metric at significance level $\alpha = 0.05$. The proposed approach, SMDR-IC, is viewed as the control method, and the Bonferroni–Dunn test is utilized as the post-hoc test [44]. The critical distance (CD), defined as $\mathrm{CD} = q_a\sqrt{\frac{K(K+1)}{6N}}$, is used to assess the average ranking differences between any two algorithms. Here, $q_a = 3.2680$ ($K = 12$, $N = 15$), and $\mathrm{CD} = 4.3025$. Figure 2 depicts the CD diagrams for each evaluation metric. The red line in each subfigure shows that the distances between SMDR-IC and certain comparison methods are less than CD, indicating statistical similarity. We observe that SMDR-IC and MDDM have significant statistical similarities for Ranking Loss, Average Precision, OneError, and Coverage, which is consistent with MDDM being the baseline method of SMDR-IC. Overall, SMDR-IC ranks second under MacroF1 and first under the other six evaluation metrics.
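As a quick check, substituting the stated values into the Bonferroni–Dunn formula (the arithmetic below is ours) reproduces the reported critical distance:
$$\mathrm{CD} = q_a\sqrt{\frac{K(K+1)}{6N}} = 3.2680\sqrt{\frac{12 \times 13}{6 \times 15}} = 3.2680 \times \sqrt{1.7333} \approx 3.2680 \times 1.3166 \approx 4.3025.$$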
Table 5. Friedman statistics F F and the critical value of each evaluation metric.
Figure 2. Bonferroni–Dunn test for SMDR-IC and other compared techniques. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) MacroF1. (f) MicroF1. (g) Coverage.
We also selected four domain data sets (Emotions, Scene, Yeast, and Arts) and five comparing methods, including supervised and semi-supervised to further study the performance of SMDR-IC under different target dimensionalities. Figure 3, Figure 4, Figure 5 and Figure 6 show the average results concerning 1-Hamming, 1-Ranking, AvgPrec, and Coverage, respectively. It can also be seen that SMDR-IC almost always outperforms other comparing methods on these evaluation metrics. These results further demonstrate the effectiveness of SMDR-IC for multi-label dimensionality reduction.
Figure 3. Results of 1-Hamming on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 4. Results of 1-Ranking on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 5. Results of AvgPrec on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 6. Results of Coverage on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.

4.5. Sensitivity Analysis of Parameters

Three crucial parameters, $\alpha$, $\beta$, and $\gamma$, are present in our dimensionality reduction model. $\alpha$ controls the contribution of the instance correlation embedding, while $\beta$ and $\gamma$ regulate the sparsity of the projection matrix. It is therefore necessary to conduct a sensitivity analysis of these parameters for SMDR-IC. All parameter values are taken from $[10^{-7}, 10^{-6}, \ldots, 10^{2}, 10^{3}]$ on four domain data sets, including Emotions, Scene, Yeast, and Arts. We tune one parameter at a time in steps of one order of magnitude, while fixing the other parameters at their best settings.
Figure 7, Figure 8 and Figure 9 show the influence of the three parameters on the four data sets. Parameter $\alpha$ controls the instance correlations, and a larger value means a higher importance of instance correlations. We can clearly see that when $\alpha$ increases from $10^{-7}$ to $10^{0}$, the performance tends to increase slowly and remain stable, but when $\alpha$ exceeds 1, performance begins to decline, slowly on Yeast and Arts and rapidly on Emotions and Scene. This is explained by the fact that the influence of instance correlations becomes more significant as the $\alpha$ value increases, potentially limiting the influence of other factors and resulting in poor performance. Parameters $\beta$ and $\gamma$ control the sparsity of specific features and common features, respectively. The trends are similar to those of $\alpha$: with increasing values of $\beta$ and $\gamma$, the performance tends to increase and remain stable. Nevertheless, when $\beta$ or $\gamma$ exceeds $10^{1}$, the performance begins to decline, slowly on Yeast and Arts, and rapidly on Emotions and Scene.
Figure 7. Sensitivity analysis of parameter α on four data sets for five evaluation metrics. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) Coverage.
Figure 8. Sensitivity analysis of parameter β on four data sets for five evaluation metrics. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) Coverage.
Figure 9. Sensitivity analysis of parameter γ on four data sets for five evaluation metrics. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) Coverage.

4.6. Convergence and Time Complexity Analysis

The accelerated proximal gradient method (APG), a special version of the gradient descent method, is used to solve optimization problems with non-differentiable objective functions. The SMDR-IC optimization model Equation (29) can be transformed into the standard model solved by APG. For its convergence theory, see [42].
Figure 10 illustrates how the objective function value of the SMDR-IC dimensionality reduction model changes on the four data sets as the number of iterations increases. It is clear that the value of the objective function declines rapidly as the iterations proceed. More specifically, the objective function value approaches a fixed value after around 50 iterations on Emotions, and after about 150, 40, and 2500 iterations on Scene, Yeast, and Arts, respectively. In conclusion, the APG method converges steadily when solving the SMDR-IC dimensionality reduction model.
Figure 10. Convergence analysis of proposed method on four data sets. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
The complexity analysis of the SMDR-IC method is as follows. Constructing the kNN graph matrix $W$ and computing the label correlation matrix $C$ cost $O(n^2 d)$ and $O(lc^2)$, respectively, where $n$ is the number of labeled and unlabeled instances, $d$ is the data dimensionality, $l$ is the number of labeled instances, and $c$ is the number of labels. Obtaining the soft labels costs approximately $O(n^2 c)$. The cost of calculating the target matrix $Z$ is dominated by the SVD of $X$, which is $O(n^2 d + nd^2)$. Initializing $P$ costs $O(nd^2 + d^3)$, and calculating the Lipschitz constant $L_f$ costs $O(d^3)$. The cost of each iteration is dominated by calculating the gradient of $f(P)$, which is $O(nd^2 + n^2 d)$. As a consequence, the total time complexity of SMDR-IC is $O(m(n^2 d + nd^2) + d^3)$, where $m$ is the number of iterations.
Table 6 lists the calculation time of each method on four data sets. According to Table 6, we observe the following. Firstly, the semi-supervised dimensionality reduction methods take longer than the supervised ones. This is because semi-supervised methods use more instances than supervised ones in the learning process: they use both labeled and unlabeled instances, whereas supervised methods use only labeled instances, and they additionally need to obtain soft labels for the unlabeled instances. Secondly, compared with the other semi-supervised dimensionality reduction methods, SMDR-IC achieves similar efficiency on data sets with medium data volumes but lower efficiency on data sets with small and large data volumes, although most of the compared methods take a longer time. This is because the final result of SMDR-IC is obtained by a gradient-based optimization method, and the number of iterations required for convergence varies across data sets, making it difficult to find a universally applicable setting.
Table 6. Calculation time (s) of different methods on four data sets.

5. Conclusions

In this paper, we introduced a novel method, namely semi-supervised multi-label dimensionality reduction learning by instance and label correlations (SMDR-IC), which effectively utilizes the information from both labeled and unlabeled instances through label propagation. SMDR-IC exploits instance correlations by assuming that if two instances are correlated in the original feature space, they will also be related in the low-dimensional feature space. The label correlations are also taken into account through the least-squares reformulation. Furthermore, the $l_1$-norm and $l_{2,1}$-norm regularization terms are utilized to select specific features and common features of the feature space, respectively. Finally, extensive experiments on fifteen data sets show that our method outperforms other well-established multi-label dimensionality reduction methods.
The main shortcoming of the SMDR-IC technique is that the projection mapping $\phi$ considered is a linear operator, which may not be able to fully capture the manifold structure of the high-dimensional space. In future research, we will therefore concentrate on developing more effective nonlinear projection operators and on studying classification methods coupled with dimensionality reduction techniques. In addition, SMDR-IC is slightly disadvantaged in terms of efficiency, mainly because it is difficult to find universally applicable convergence criteria for the iterative optimization across different data sets. Iterative convergence criteria or more efficient solution methods applicable to different data sets will therefore be studied further in future work.

Author Contributions

Conceptualization, formal analysis, writing—original draft preparation, R.L. and J.D. (Jiaxing Du); methodology, software, validation, writing—review and editing, R.L., J.D. (Jiaman Ding) and L.J.; resources, supervision, funding acquisition, Y.C. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the open fund of the Yunnan Key Laboratory of Computer Technology Applications, and in part by the National Natural Science Foundation of China (No. 12063002 and 62262035).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, L.; Ji, S.; Ye, J. Multi-Label Dimensionality Reduction; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
  2. Bellman, R. Dynamic programming and Lagrange multipliers. Proc. Natl. Acad. Sci. USA 1956, 42, 767. [Google Scholar] [CrossRef] [PubMed]
  3. Siblini, W.; Kuntz, P.; Meyer, F. A review on dimensionality reduction for multi-label classification. IEEE Trans. Knowl. Data Eng. 2021, 33, 839–857. [Google Scholar] [CrossRef]
  4. Zhang, Y.; Zhou, Z.H. Multilabel dimensionality reduction via dependence maximization. ACM Trans. Knowl. Discov. Data (TKDD) 2010, 4, 1–21. [Google Scholar] [CrossRef]
  5. Hardoon, D.R.; Szedmak, S.; Shawe-Taylor, J. Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 2004, 16, 2639–2664. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, H.; Ding, C.; Huang, H. Multi-label linear discriminant analysis. In Proceedings of the Computer Vision—ECCV, Heraklion, Greece, 5–11 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 126–139. [Google Scholar]
  7. Kong, X.; Ng, M.K.; Zhou, Z.H. Transductive multilabel learning via label set propagation. IEEE Trans. Knowl. Data Eng. 2011, 25, 704–719. [Google Scholar] [CrossRef]
  8. Qian, B.; Davidson, I. Semi-supervised dimension reduction for multi-label classification. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; pp. 569–574. [Google Scholar]
  9. Guo, B.; Hou, C.; Nie, F.; Yi, D. Semi-supervised multi-label dimensionality reduction. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 919–924. [Google Scholar]
  10. Yu, Y.; Wang, J.; Tan, Q.; Jia, L.; Yu, G. Semi-supervised multi-label dimensionality reduction based on dependence maximization. IEEE Access 2017, 5, 21927–21940. [Google Scholar] [CrossRef]
  11. Nie, F.; Huang, H.; Cai, X.; Ding, C. Efficient and robust feature selection via joint l2,1-norms minimization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Volume 23, pp. 1813–1821. [Google Scholar]
  12. Hu, L.; Li, Y.; Gao, W.; Zhang, P.; Hu, J. Multi-label feature selection with shared common mode. Pattern Recognit. 2020, 104, 107344. [Google Scholar] [CrossRef]
  13. Li, J.; Li, P.; Hu, X.; Yu, K. Learning common and label-specific features for multi-Label classification with correlation information. Pattern Recognit. 2022, 121, 108259. [Google Scholar] [CrossRef]
  14. Gretton, A.; Bousquet, O.; Smola, A.; Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings of the International Conference on Algorithmic Learning Theory, Singapore, 8–11 October 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 63–77. [Google Scholar]
  15. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef] [PubMed]
  16. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef]
  17. Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Process. Syst. 2001, 14, 585–591. [Google Scholar]
  18. Nie, F.; Xu, D.; Tsang, I.W.H.; Zhang, C. Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction. IEEE Trans. Image Process. 2010, 19, 1921–1932. [Google Scholar] [PubMed]
  19. Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407. [Google Scholar] [CrossRef]
  20. Yu, K.; Yu, S.; Tresp, V. Multi-label informed latent semantic indexing. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, 15–19 August 2005; pp. 258–265. [Google Scholar]
  21. Hotelling, H. Relations between two sets of variates. Biometrika 1936, 28, 321–377. [Google Scholar] [CrossRef]
  22. Sun, L.; Ji, S.; Ye, J. Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 194–200. [Google Scholar]
  23. Pacharawongsakda, E.; Theeramunkong, T. A two-stage dual space reduction framework for multi-label classification. In Proceedings of the Trends and Applications in Knowledge Discovery and Data Mining, Delhi, India, 11 May 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 330–341. [Google Scholar]
  24. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  25. Park, C.H.; Lee, M. On applying linear discriminant analysis for multi-labeled problems. Pattern Recognit. Lett. 2008, 29, 878–887. [Google Scholar] [CrossRef]
  26. Chen, W.; Yan, J.; Zhang, B.; Chen, Z.; Yang, Q. Document transformation for multi-label feature selection in text categorization. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, 28–31 October 2007; pp. 451–456. [Google Scholar]
  27. Lin, X.; Chen, X.W. KNN: Soft relevance for multi-label classification. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; pp. 349–358. [Google Scholar]
  28. Xu, J. A weighted linear discriminant analysis framework for multi-label feature extraction. Neurocomputing 2018, 275, 107–120. [Google Scholar] [CrossRef]
  29. Yuan, Y.; Zhao, K.; Lu, H. Multi-label linear discriminant analysis with locality consistency. In Proceedings of the Neural Information Processing, Montreal, QC, Canada, 8–13 December 2014; Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 386–394. [Google Scholar]
  30. Shu, X.; Lai, D.; Xu, H.; Tao, L. Learning shared subspace for multi-label dimensionality reduction via dependence maximization. Neurocomputing 2015, 168, 356–364. [Google Scholar] [CrossRef]
  31. Gönen, M. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning. Pattern Recognit. Lett. 2014, 38, 132–141. [Google Scholar] [CrossRef]
  32. Yu, T.; Zhang, W. Semisupervised multilabel learning with joint dimensionality reduction. IEEE Signal Process. Lett. 2016, 23, 795–799. [Google Scholar] [CrossRef]
  33. Blaschko, M.B.; Shelton, J.A.; Bartels, A.; Lampert, C.H.; Gretton, A. Semi-supervised kernel canonical correlation analysis with application to human fMRI. Pattern Recognit. Lett. 2011, 32, 1572–1583. [Google Scholar] [CrossRef]
  34. Li, H.; Li, P.; Guo, Y.J.; Wu, M. Multi-label dimensionality reduction based on semi-supervised discriminant analysis. J. Cent. South Univ. Technol. 2010, 17, 1310–1319. [Google Scholar] [CrossRef]
  35. Hubert, M.; Van Driessen, K. Fast and robust discriminant analysis. Comput. Stat. Data Anal. 2004, 45, 301–320. [Google Scholar] [CrossRef]
  36. Croux, C.; Dehon, C. Robust linear discriminant analysis using S-estimators. Can. J. Stat. 2001, 29, 473–493. [Google Scholar] [CrossRef]
  37. Hubert, M.; Rousseeuw, P.J.; Van Aelst, S. High-breakdown robust multivariate methods. Stat. Sci. 2008, 23, 92–119. [Google Scholar] [CrossRef]
  38. Mikalsen, K.Ø.; Soguero-Ruiz, C.; Bianchi, F.M.; Jenssen, R. Noisy multi-label semi-supervised dimensionality reduction. Pattern Recognit. 2019, 90, 257–270. [Google Scholar] [CrossRef]
  39. Golub, G.H.; Van Loan, C.F. Matrix Computations; JHU Press: Baltimore, MD, USA, 2013. [Google Scholar]
  40. Han, H.; Huang, M.; Zhang, Y.; Yang, X.; Feng, W. Multi-label learning with label specific features using correlation information. IEEE Access 2019, 7, 11474–11484. [Google Scholar] [CrossRef]
  41. Huang, S.J.; Zhou, Z.H. Multi-Label Learning by Exploiting Label Correlations Locally; AAAI Press: Palo Alto, CA, USA, 2012. [Google Scholar]
  42. Lin, Z.; Ganesh, A.; Wright, J.; Wu, L.; Chen, M.; Ma, Y. Fast Convex Optimization Algorithms for Exact Recovery of a Corrupted Low-Rank Matrix; Report no. UILU-ENG-09-2214, DC-246; Coordinated Science Laboratory: Urbana, IL, USA, 2009; Available online: https://hdl.handle.net/2142/74352 (accessed on 20 December 2022).
  43. Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007, 40, 2038–2048. [Google Scholar] [CrossRef]
  44. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
