Multi-Label Classification Based on Low Rank Representation for Image Annotation

Annotating remote sensing images is challenging because the annotation process is labor intensive and requires expert knowledge, especially when images can be annotated with multiple semantic concepts (or labels). To automatically annotate these multi-label images, we introduce an approach called Multi-Label Classification based on Low Rank Representation (MLC-LRR). MLC-LRR first applies low rank representation in the feature space of images to compute a low rank constrained coefficient matrix, and then adapts the coefficient matrix to define a feature-based graph that captures the global relationships between images. Next, it applies low rank representation in the label space of labeled images to construct a semantic graph. Finally, these two graphs are exploited to train a graph-based multi-label classifier. To validate the performance of MLC-LRR against other related graph-based multi-label methods in annotating images, we conduct experiments on a publicly available multi-label remote sensing image dataset (Land Cover). We perform additional experiments on five real-world multi-label image datasets to further investigate the performance of MLC-LRR. The empirical study demonstrates that MLC-LRR annotates images better than the compared methods across various evaluation criteria; it can also effectively exploit the global structure and label correlations of multi-label images.


Introduction
Remote sensing image annotation [1] aims to assign one or several predefined labels (semantic concepts) to a remote sensing image. It is the basis of remote sensing image indexing for organizing and locating images of interest in a large database. With the development of satellite sensor technology, remote sensing images can be easily accumulated. However, manually annotating so many images is time consuming and expensive, or even infeasible. Therefore, it is urgent to develop techniques that can effectively and efficiently annotate remote sensing images. Most existing remote sensing image classification methods [2-5] assume that each image is annotated with only one label from a set of candidate labels and aim to assign only one label to an image. However, in real-world applications, a remote sensing image can and should be annotated with multiple labels due to the mixture of multiple signals, the interactions of photons with matter, the atmosphere and phenomena attributed to the physical properties of light [6]. For example, in Figure 1, the left image is annotated with two labels, "Giza Pyramids" and "Egypt", and the right image is tagged with the labels "Train Station", "Zurich" and "Switzerland". Given that, it is necessary to take into account the multi-label characteristics of remote sensing images and to assign a set of relevant labels to these images, instead of a single label. Recently, multi-label classification [7,8], which studies problems where an instance is annotated with a set of labels, has been applied to annotate multi-label remote sensing images [1,9-11] and has shown appealing performance in remote sensing applications. The multi-label classification framework has also been applied to numerous other real-world applications [12]. One key challenge [7,8] in multi-label classification is how to make use of label correlations.
As shown in Figure 1, since the left image is annotated with "Giza Pyramids", it is more likely to be annotated with the label "Egypt" than with "Switzerland". Existing strategies for exploiting label correlations can be roughly divided into three categories based on the order of correlations [8]. First-order strategies [13] assume that labels are independent and ignore the intertwined effects of other labels. Second-order strategies [14] capture pairwise relationships between labels. However, in certain real-world applications, label correlations go beyond the second-order assumption. High-order strategies [15] impose the influence of all other labels on each label and have stronger correlation modeling capabilities than first-order and second-order strategies, but they are computationally more demanding and less scalable. Various methods have been proposed to exploit label correlations for multi-label classification, and they have corroborated that label correlations can improve the accuracy of multi-label classification. A comprehensive coverage of them is beyond the scope of this paper; readers can refer to [7,8] and the references therein.
Recently, graph-based classification methods have been introduced to remote sensing image classification for their simplicity and effectiveness [3,16,17]. A common basic assumption of these methods is that the constructed graph is well structured. However, it is difficult to construct a graph that faithfully describes the relationship between instances, especially in image annotation, since each image can be represented by various types of features (e.g., color histograms and color layout) from different aspects. Moreover, all of these methods assume that each image is annotated with only one label. In this paper, a novel graph-based multi-label classification approach called Multi-Label Classification based on Low Rank Representation for image annotation (MLC-LRR) is proposed. Unlike traditional graph construction techniques, we take advantage of LRR for graph construction in both the feature space and the label space of images. Specifically, MLC-LRR first computes the coefficient matrix of images (including both labeled and unlabeled images) in the feature space by LRR and then makes use of this matrix to define a feature-based graph. In addition, MLC-LRR constructs a semantic graph by applying LRR again in the label space of labeled images to explore the global relationships among labels. Next, MLC-LRR fuses the feature-based graph and the semantic graph into a graph-based multi-label classifier to annotate unlabeled images. The main contributions of this paper are listed as follows:
1. We apply graph-based multi-label classification to annotate remote sensing images associated with multiple concepts (labels).
2. We exploit LRR for graph construction in the feature space and label space of images, respectively.
3. The semantic graph constructed in the label space can effectively capture global label correlation and improve the accuracy of image annotation.
4. The proposed MLC-LRR can take advantage of limited labeled images and abundant unlabeled images and shows improved performance compared to other related methods on annotating images.
The remainder of this paper is organized as follows. Section 2 briefly reviews related work on graph-based multi-label classification. Section 3 describes the proposed approach, including: (i) feature-based graph construction via LRR in the image feature space; (ii) semantic graph construction using LRR in the image label space; (iii) graph-based multi-label classification. In Section 4, we present the image datasets used, the experimental results and discussions. Conclusions and future work are provided in Section 5.

Related Work
Graph-based multi-label classification [18,19] models all instances (or images) in a graph $G = (V, E, W)$, where $V$ is a vertex set in which each vertex represents an image, $E$ is the set of edges and $W$ is a nonnegative adjacency matrix storing the weights of edges (or similarities) between images. Wang et al. [20] proposed an $\ell_1$ graph-based sparse coding method for image annotation called MSC. MSC constructs a label-space graph based on the overlap of label vectors. In particular, if two images are annotated with the same labels, then the similarity between them is set to one, otherwise zero. After that, MSC takes the available labels of images as features and constructs an $\ell_1$ graph based on sparse representation coefficients regularized by an $\ell_1$ norm. Next, MSC incorporates the label-space graph and the $\ell_1$ graph into a linear embedding framework to project these images into a low dimensional space. For a query image, MSC first projects the image into the low dimensional space and then computes the corresponding sparse representation coefficients with respect to labeled images. In the end, MSC predicts the labels of the query image via a linear combination of the label vectors of labeled images, weighted by the coefficients. However, MSC, like the solutions in [1,9-11], is a supervised approach that requires sufficient labeled images for sparse representation and dimensionality reduction. Collecting sufficient labeled images is very time consuming and expensive, and we often have scarce labeled images and a large amount of unlabeled images. Therefore, many researchers have applied semi-supervised learning techniques [21-23] to leverage limited labeled images and abundant unlabeled images for annotating large-scale unlabeled remote sensing images, but these techniques still assume that each remote sensing image is annotated with a single label.
Recently, graph-based semi-supervised multi-label classification, an important branch of semi-supervised learning, has been widely applied to annotating images [18,24,25] for its flexibility and ease of application. Chen et al. [18] proposed a graph-based semi-supervised classifier for multi-label classification. Specifically, this method first constructs an instance level graph in the feature space and constructs another category level graph by using cosine similarity in the label space. Then, it combines these two graphs into a regularization framework and makes a prediction for unlabeled instances by solving a Sylvester equation [26]. Yu et al. [27] proposed a Transductive Multi-label Classifier (TMC) on a directed bi-relational graph that takes both instances and labels as nodes. This graph contains three kinds of edges: between instances, between labels and between instances and labels. TMC predicts the labels of unlabeled instances based on Random Walk with Restarts (RWR) [28] on the bi-relational graph. Wang et al. [29] proposed a graph-based multi-label classification method (FCML) for multi-functional protein function prediction. FCML uses the Green function [30] to incorporate the function-function correlations based on the theory of Reproducing Kernel Hilbert Space (RKHS) to infer protein functions. Kong et al. [19] proposed a kNN graph-based transductive multi-label classifier called Tram. Tram introduces the label composition concept and assumes that similar instances should have a similar label composition. It formulates transductive multi-label classification as an optimization problem of estimating label composition and provides a closed-form solution. Wang et al. [31] proposed Dynamic Label Propagation (DLP) to infer the labels of unlabeled images by using an $\ell_2$ graph, which is constructed by exploiting the neighborhood images of an image and the neighborhood images of its reciprocal neighbors.
However, none of the aforementioned graph construction methods adequately exploits the global structure of data. Low Rank Representation (LRR) was recently introduced for graph construction in remote sensing image classification [32,33]. It is recognized that LRR is an appropriate approach to explore the global structure and global mixture of the subspace structure of images [34]. Jing et al. [25] proposed a graph-based low-rank mapping method for multi-label classification. This method constructs a kNN graph on both labeled and unlabeled instances and defines a mapping matrix as a linear transformation from the feature space to the label space. Then, it incorporates the mapping matrix and the kNN graph into a multi-label classification framework based on manifold regularization [35], but it does not explicitly utilize label correlations.
In this paper, we introduce a Multi-Label Classifier based on Low Rank Representation (MLC-LRR) for automatically annotating images. MLC-LRR takes advantage of low rank representation for graph construction and constructs a feature-based graph and a semantic graph. The feature-based graph is constructed in the feature space of labeled and unlabeled images, while the semantic graph is defined in the label space of labeled images. Next, MLC-LRR makes use of both the feature-based graph and the semantic graph to predict the labels of unlabeled images.

Methodology
In this section, we briefly introduce LRR, the construction of the feature-based graph and the semantic graph, and transductive multi-label classification based on the feature-based graph and the semantic graph derived from labeled images. Before that, we give some notation that will be used throughout the paper. Let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$ be a set of images, where $x_i \in \mathbb{R}^d$ is the feature vector of the i-th image. $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{N \times C}$ is the label indicator matrix for these $N$ images, where $C$ is the number of distinct labels of these images. $y_i \in \mathbb{R}^C$ is the label vector of the i-th image: if image $i$ is annotated with label $c$, $y_{ic} = 1$; otherwise, $y_{ic} = 0$. Without loss of generality, suppose that among the $N$ images in $X$, the first $l$ images are labeled and the remaining $u$ images are unlabeled, with $N = l + u$. Our goal is to use all of the images in $X$ to train a graph-based multi-label classifier and to predict the labels of the unlabeled images.

Low Rank Representation for Feature-Based Graph Construction
Graph-based semi-supervised classification depends on a well-structured graph [36]. However, in real applications, it is difficult to construct a graph that effectively and correctly captures the relationship between images. In this section, we take advantage of LRR to construct a feature-based graph and to capture the global relationships among images. Given that $X \in \mathbb{R}^{d \times N}$ represents the feature space of both labeled and unlabeled images, where each column corresponds to an image, each image can be viewed as a linear combination of bases from a dictionary $A = [a_1, a_2, a_3, \ldots, a_M] \in \mathbb{R}^{d \times M}$. Similar to the work in [37], we set $A = X$ in this paper. LRR encodes each image by a linear combination of the bases in $A$ as follows:

$$X = A Z_1 \qquad (1)$$

where $Z_1 \in \mathbb{R}^{N \times N}$ is the coefficient matrix, with each column $Z_1(\cdot, i) \in \mathbb{R}^N$ being the representation coefficient vector for image $x_i$ with respect to the $N$ images. Entry $Z_1(j, i)$ can be viewed as the contribution of $x_j$ to the reconstruction of $x_i$ with $A$ as the dictionary. Different from sparse representation [38], which may not capture the global relationship of images in $X$, LRR enforces $Z_1$ to be low rank and solves the following optimization problem:

$$\min_{Z_1} \operatorname{rank}(Z_1) \quad \text{s.t.} \quad X = A Z_1 \qquad (2)$$

Here, $X$ is reconstructed by the low rank constrained matrix $Z_1$. Equation (2) is called low rank representation [34]. Since non-negativity can guarantee the physical meaning of graph weights and often results in better performance for graph construction [39], the constraint $Z_1 \geq 0$ is added, similar to Zhuang et al. [40]. In the iterative solution process of nonnegative LRR, negative values in $Z_1$ are replaced with zeros in each iteration.
Due to the discrete nature of the rank function, Equation (2) is in general NP-hard [41]. One popular approach is to replace the rank function by the trace norm (or nuclear norm) [42], so that Equation (2) can be relaxed to Equation (3) as follows:

$$\min_{Z_1} \|Z_1\|_* \quad \text{s.t.} \quad X = A Z_1 \qquad (3)$$

where $\|Z_1\|_*$ denotes the nuclear norm of $Z_1$, which is the sum of the singular values of $Z_1$ [43]. Recently, Zhang et al. [44] proved that Equations (2) and (3) have a closed-form solution, and Equation (3) can be further relaxed to take into account noisy features as follows:

$$\min_{Z_1, E} \|Z_1\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = A Z_1 + E \qquad (4)$$

where $\|E\|_{2,1} = \sum_{j=1}^{N} \sqrt{\sum_{i=1}^{d} E_{ij}^2}$ is called the $\ell_{2,1}$-norm and the parameter $\lambda$ is used to trade off the effect of the low-rank part and the noise tolerance part. To solve Equation (4), Liu et al. [34] exploited the well-known alternating direction method [45]. However, this method suffers from $O(n^3)$ computational complexity due to matrix multiplication and matrix inversion. Moreover, the alternating direction method introduces auxiliary variables and constraints and thus slows down convergence. Such a heavy computational load prevents LRR from being applied to large-scale datasets. Fortunately, Lin et al. [46] introduced a Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) to accelerate the solution of LRR. LADMAP belongs to the alternating direction method of multipliers family [45]; it solves LRR by linearizing the quadratic penalty term and adding a proximal term when solving the sub-problems. In addition, LADMAP represents $Z_1$ by its skinny singular value decomposition and utilizes an advanced functionality of the PROPACK [47] package to accelerate the solving process. The time complexity of LADMAP for LRR is $O(rn^2)$, where $r$ is the rank of the optimal $Z_1$, since no full-rank matrix multiplication is required. Next, each column of $Z_1$ is normalized via $Z_1(\cdot, i) = Z_1(\cdot, i) / \|Z_1(\cdot, i)\|_2$, and then each negative entry of $Z_1$ is set to zero.
Since LRR jointly finds the low rank coefficient matrix $Z_1$ for all images in $X$, here, similar to [48], we adapt $Z_1$ to define a feature-based graph whose weighted adjacency matrix is $W$, where

$$W = (Z_1 + Z_1^T)/2$$
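The graph construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the LADMAP solver used in the paper: it covers only the noise-free problem of Equation (3) with A = X, whose minimizer is known to be V V^T, where V holds the right singular vectors of the skinny SVD of X. The function names (`lrr_noise_free`, `feature_graph`) are hypothetical, and the final symmetrization is assumed here by analogy with the semantic graph S = (Z_2 + Z_2^T)/2.

```python
import numpy as np

def lrr_noise_free(X, tol=1e-10):
    """Closed-form minimizer of ||Z||_* s.t. X = X Z (noise-free LRR, A = X)."""
    # Skinny SVD: keep only directions with non-negligible singular values.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s.max()))
    V = Vt[:r].T                 # N x r right singular vectors
    return V @ V.T               # N x N coefficient matrix of rank r

def feature_graph(X):
    """Turn LRR coefficients into a nonnegative, symmetric adjacency matrix."""
    Z = lrr_noise_free(X)
    # Normalize each column to unit l2 norm, then zero out negative entries.
    norms = np.linalg.norm(Z, axis=0)
    Z = Z / np.where(norms > 0, norms, 1.0)
    Z = np.maximum(Z, 0.0)
    # Symmetrization is an assumption that mirrors the definition of S.
    return (Z + Z.T) / 2.0

rng = np.random.default_rng(0)
# Toy data: 6 images with d = 5 features lying in a 2-dimensional subspace.
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))
Z1 = lrr_noise_free(X)
W = feature_graph(X)
```

For noisy features, Equation (4) has to be solved iteratively instead; the post-processing in `feature_graph` would stay the same.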

Low Rank Representation for Semantic Graph Construction
Graph construction is important for graph-based semi-supervised learning approaches, and many graph construction methods have been proposed [16,17,38]. All of these methods mainly construct a graph in the feature space and focus on single-label classification, where each image is restricted to be annotated with only one label. However, remote sensing images can often be associated with multiple labels. For example, one can think of two images with the same blue background; the first one is annotated with "blue color" and "ship", while the latter is tagged with "blue color" and "airplane".
It is easy to see that these two images are similar to each other in the feature space, since they have the same background, which may account for most of the feature similarity. However, these two images differ from each other from the semantic perspective: the first image describes a ship on the sea, while the second image describes an airplane flying in the sky.
Therefore, directly calculating the similarity between two images based on their feature vectors could not distinguish polysemous labels with different concepts in different images. To improve the performance of graph-based multi-label classification, it is necessary to develop techniques that can additionally utilize semantic information for graph construction. Most multi-label learning methods [7,8] additionally make use of pairwise (or high-order) label correlations to improve the performance; for example, pairwise label correlation computed by cosine similarity [29] and empirical conditional probability [14], high-order label correlation by classifier chain [49] and a random label set [50]. Different from these popular techniques, some multi-label classifiers take each label as a node and construct a hybrid graph with label nodes and images to represent the intra-relationship between images, between labels and inter-relationships between images [27]. The semantic information of an image can be additionally encoded by the labels annotated to that image. These label correlation-based methods do not take into account all of the labels annotated to an image, and thus, they do not make proper use of the semantic relationship between images. The semantic similarity between two multi-label images can be derived from the labels associated with these two images. Wang et al. [20] proposed a reconstruction-based semantic graph by utilizing sparse representation in the label space of labeled images. Nevertheless, the graph is constructed by separately treating each labeled image with respect to other labeled images; thus, it may be ineffective in capturing the global semantic relationship between images. Here, we construct a semantic graph to capture the global semantic similarity between images by reusing LRR on labeled images in Y. 
Label vectors of labeled images are reconstructed with respect to $Y$ by LRR (similar to Equation (4)) as a whole, and then the low rank coefficients are used to define a semantic graph $S \in \mathbb{R}^{N \times N}$, where $S = (Z_2 + Z_2^T)/2$ and $Z_2$ is the low rank representation coefficient matrix with respect to the labeled images in $Y$. In practice, we only utilize labeled images to construct the semantic graph between the $l$ labeled images. To be consistent with the feature-based graph $W$, we extend $S$ to an $N \times N$ matrix, in which the upper left $l \times l$ sub-matrix is initialized by the low rank coefficients in $Z_2$ and all other entries are set to zero.
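The padding of the semantic graph to size N × N can be illustrated as follows. This is a sketch under assumptions: it replaces the noisy LRR of Equation (4) with the noise-free closed form (the projection onto the row space of the label matrix), and the function name `semantic_graph` is hypothetical.

```python
import numpy as np

def semantic_graph(Y_labeled, N, tol=1e-10):
    """Build the N x N semantic graph S from the l x C labels of labeled images."""
    l = Y_labeled.shape[0]
    # Columns of Y_labeled.T are label vectors; noise-free LRR with the data
    # as its own dictionary has the minimizer V V^T (V: right singular vectors).
    _, s, Vt = np.linalg.svd(Y_labeled.T.astype(float), full_matrices=False)
    r = int(np.sum(s > tol * s.max()))
    Z2 = Vt[:r].T @ Vt[:r]
    # Only the upper-left l x l block is filled; all other entries stay zero.
    S = np.zeros((N, N))
    S[:l, :l] = (Z2 + Z2.T) / 2.0
    return S

# Three labeled images (the first two share both labels) among N = 5 images.
Y_labeled = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [0, 0, 1]])
S = semantic_graph(Y_labeled, N=5)
```

In this toy case the first two images, which carry identical label sets, receive a mutual weight of 0.5, while the third image is connected only to itself.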

Graph-Based Multi-Label Classification
To this end, we introduce a graph-based multi-label classifier that uses the feature-based graph and the semantic graph for multi-label classification. At first, we define a matrix $F = [f_1, f_2, \ldots, f_N] \in \mathbb{R}^{N \times C}$, where $f_i \in \mathbb{R}^C$ is the predicted likelihood of the i-th image with respect to the $C$ labels. Here, we consider a graph-based multi-label classifier as follows:

$$\Psi(F) = \Psi_1(F) + \alpha \Psi_2(F) + \beta \Psi_3(F) \qquad (5)$$

where the first term is the empirical loss function measuring the approximation error between the annotated multi-label images and the predicted likelihoods, the second term is the regularization on the global structure of labeled and unlabeled images and the last term takes advantage of the semantic relationship between labeled images. $\alpha$ and $\beta$ are two parameters that balance these three terms. The first term is defined as follows:

$$\Psi_1(F) = \sum_{i=1}^{l} \|f_i - y_i\|_2^2 = \operatorname{tr}\left((F - Y)^T H (F - Y)\right) \qquad (6)$$

where $\operatorname{tr}(\cdot)$ is the matrix trace operator and $H \in \mathbb{R}^{N \times N}$ is a diagonal matrix with $H_{ii} = 1$ if $x_i$ is labeled and $H_{ii} = 0$ otherwise. The second term of Equation (5) can be computed as:

$$\Psi_2(F) = \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \|f_i - f_j\|_2^2 = \operatorname{tr}(F^T L F) \qquad (7)$$

where $W_{ij}$ is the weight of the edge between images $x_i$ and $x_j$ [51], $L = D - W$ is the graph Laplacian and $D$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$. Similar to the assumption in [52], in this paper, we assume that the label of an image can be reconstructed by other related images, while the reconstruction coefficients are derived from low rank representation in the label space. Based on this assumption, the last term of Equation (5) is defined as:

$$\Psi_3(F) = \sum_{i=1}^{N} \Big\| f_i - \sum_{j=1}^{N} S_{ij} f_j \Big\|_2^2 = \operatorname{tr}(F^T M_L F) \qquad (8)$$

where $S \in \mathbb{R}^{N \times N}$ is the (symmetric) adjacency matrix of the semantic graph defined in Section 3.2 and $S_{ij}$ represents the similarity of label vectors $y_i$ and $y_j$. $I \in \mathbb{R}^{N \times N}$ is an identity matrix, and $M_L = (I - S)(I - S)^T$. The motivation for the last term is to replenish possible missing labels of labeled images by taking advantage of the semantic graph. If a labeled image has some missing labels and its semantic neighbors are annotated with these labels, this term can replenish the missing labels of that image to some extent.
In addition, by working jointly with the second term on the right of Equation (5), the available labels and replenished labels of labeled images can further propagate to unlabeled images, and thus MLC-LRR can more completely predict the labels of unlabeled images. Based on Equations (6)-(8), we can rewrite Equation (5) as:

$$\Psi(F) = \operatorname{tr}\left((F - Y)^T H (F - Y)\right) + \alpha \operatorname{tr}(F^T L F) + \beta \operatorname{tr}(F^T M_L F) \qquad (9)$$

The reason to minimize the first term in Equation (9) is to force the predicted outputs $f_i$ to be similar to the original labels. The motivation for minimizing the second term is to ensure that images with similar feature vectors have similar outputs. The last term is motivated by the label reconstruction assumption [52], where the labels of an image can be reconstructed from those of other images. In other words, if the label vectors $y_i$ and $y_j$ of two images are similar to each other, then $S_{ij}$ has a large value. In this way, the third term can capture and employ semantic information between images.
Equation (9) can be solved by taking the partial derivative of $\Psi(F)$ with respect to $F$ as follows:

$$\frac{\partial \Psi(F)}{\partial F} = 2H(F - Y) + 2\alpha L F + 2\beta M_L F \qquad (10)$$

Letting $\partial \Psi(F) / \partial F = 0$, we obtain the analytic solution of $F$ as:

$$F = (H + \alpha L + \beta M_L)^{-1} H Y \qquad (11)$$

Given that $X$ and $Y$ are known, $L = D - W$ and $M_L = (I - S)(I - S)^T$, $F$ is mainly determined by $W$ and $S$. Here, $W$ and $S$ are the weighted adjacency matrices of the graphs constructed by LRR in the feature space and label space of images, respectively. Figure 2 briefly lists the procedure of the MLC-LRR algorithm.
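The analytic solution can be computed with a single linear solve. The sketch below assumes the closed form F = (H + αL + βM_L)^{-1} H Y obtained by setting the gradient of the objective to zero, with L = D − W; the function name `mlc_lrr_predict`, the toy graphs and the values of α and β are hypothetical and chosen only for illustration.

```python
import numpy as np

def mlc_lrr_predict(W, S, Y, n_labeled, alpha=0.5, beta=0.5):
    """Solve (H + alpha*L + beta*M_L) F = H Y for the likelihood matrix F."""
    N = W.shape[0]
    # H selects the labeled images (assumed to be the first n_labeled rows).
    H = np.diag((np.arange(N) < n_labeled).astype(float))
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    I = np.eye(N)
    M = (I - S) @ (I - S).T                 # M_L = (I - S)(I - S)^T
    # A linear solve is preferred over forming the explicit inverse.
    return np.linalg.solve(H + alpha * L + beta * M, H @ Y)

# Toy problem: images 0 and 1 are labeled, images 2 and 3 are unlabeled.
W = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.0, 1.0, 0.1, 0.0]])
S = np.zeros((4, 4))
S[:2, :2] = np.eye(2)                       # labeled images have disjoint labels
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
F = mlc_lrr_predict(W, S, Y, n_labeled=2)
```

In this toy graph, image 2 is strongly connected to labeled image 0, so its predicted likelihood for the first label exceeds that for the second, and symmetrically for image 3.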

Results and Discussion
We conduct experiments on Land Cover and five other publicly-available multi-label image datasets, Flags [53], Scene [13], Corel5k [54], MIRFlickr [55] and ESPGame [56], to validate the performance of our proposed MLC-LRR and compare it with five representative and related graph-based multi-label classifiers: MSC [20], TMC [27], FCML [29], Tram [19] and DLP [31]. MSC is a supervised multi-label classifier, and the other four are semi-supervised multi-label classifiers. These classifiers have been introduced in Section 2.
Land Cover is a multi-label remote sensing image dataset collected by Karalas et al. [11]. This dataset combines real satellite data from the Moderate Resolution Imaging Spectroradiometer instrument and high spatial resolution ground data from the CORINE Land Cover (CLC) project supported by the European Environment Agency. We use the same features and labels as suggested in [11] for image annotation; each image is represented with 57 features with respect to 20 distinct labels, as depicted in Table 1.
Flags is a multi-label toy image dataset with 194 images in seven object classes. Each image is represented by a vector with 19 features. Scene includes 2407 images in six object classes; each image has 294 features. Corel5k, MIR Flickr and ESPGame are three representative and popular image datasets. Similar to MLR-GL [57], we remove the images that are annotated with fewer than three labels from Corel5k and the images that are tagged with fewer than five labels from MIR Flickr and ESPGame. Each image in the last three datasets is represented by dense SIFT descriptors [58]. The statistics of the datasets used for experiments are given in Table 2.
In the experiments, unless otherwise specified, we set the parameters of the compared methods as suggested by the authors in the original papers or code. As for MLC-LRR, we perform cross-validation on the Flags and Scene datasets in preliminary experiments by varying α and β from 0.01 to 1 with a step size of 0.01. Results show that MLC-LRR yields stable performance around

Evaluation Metrics
Performance evaluation for multi-label image annotation is somewhat complicated, as each image is annotated with a set of labels instead of a single one. Various metrics have been used to evaluate the performance of multi-label classification. Given C different labels, all of the compared methods produce a predicted likelihood vector with respect to the C labels. In this paper, we use five widely-used evaluation metrics (RankLoss, AvgPrec, Coverage, MicroAvg and AUC) [7,8] to quantitatively compare the performance of these multi-label classification methods in annotating images.
RankLoss evaluates the average fraction of misordered label pairs, i.e., pairs in which an irrelevant label of an image is ranked ahead of a relevant label. The smaller the value of RankLoss, the better the performance. Its formal definition is:

$$\mathrm{RankLoss} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|Y_i||\bar{Y}_i|} \left| \{ (k, j) : f(x_i, k) \leq f(x_i, j),\ (k, j) \in Y_i \times \bar{Y}_i \} \right|$$

where $n$ is the number of evaluated images, $1 \leq k, j \leq C$, $f(x_i, k)$ is the predicted likelihood of label $k$ for image $x_i$, $\bar{Y}_i$ is the complementary set of $Y_i$ and $Y_i$ is the known label set of the i-th image.
AvgPrec evaluates the average fraction of relevant labels ranked ahead of a particular label $Y_{ik} \in Y_i$. The larger the value of AvgPrec, the better the performance. Its formal definition is:

$$\mathrm{AvgPrec} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|Y_i|} \sum_{Y_{ik} \in Y_i} \frac{\left| \{ Y_{ij} \in Y_i : \mathrm{rank}(x_i, Y_{ij}) \leq \mathrm{rank}(x_i, Y_{ik}) \} \right|}{\mathrm{rank}(x_i, Y_{ik})}$$

where $n$ is the number of evaluated images and $\mathrm{rank}(x_i, Y_{ik})$ returns the rank (from largest to lowest) of $Y_{ik}$ in $f(x_i)$.
MicroAvg evaluates both the micro average of precision and the micro average of recall with equal importance. The bigger the value, the better the performance. Its formal definition is:

$$\mathrm{MicroAvg} = \frac{2 \sum_{c=1}^{C} |p_c \cap y_c|}{\sum_{c=1}^{C} |p_c| + \sum_{c=1}^{C} |y_c|}$$

where $y_c$ is the set of images annotated with label $c$ and $p_c$ is the set of images predicted to have label $c$. MicroAvg requires the predicted likelihood vector to be converted into a binary indicator vector. Here, we consider the labels corresponding to the $r$ largest entries of $f(x_i)$ as the predicted labels of image $x_i$, where $r$ is determined as the average number of labels (rounded to the next integer) of annotated images. From Table 1, r for Land Cover, Flags, Scene, Corel5k, MIR Flickr and ESPGame is 3, 4, 3, 4, 8 and 7, respectively.
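RankLoss as described above can be computed directly from the predicted likelihoods. The following is a minimal sketch using the standard definition, in which ties between a relevant and an irrelevant label are counted as misordered; the function name `rank_loss` and the toy data are hypothetical.

```python
import numpy as np

def rank_loss(F, Y):
    """Average fraction of (relevant, irrelevant) label pairs that are misordered."""
    losses = []
    for f, y in zip(F, Y):
        rel, irr = f[y == 1], f[y == 0]
        if rel.size == 0 or irr.size == 0:
            continue  # the fraction is undefined without both kinds of labels
        # A pair is misordered when the relevant score is not strictly larger.
        misordered = np.sum(rel[:, None] <= irr[None, :])
        losses.append(misordered / (rel.size * irr.size))
    return float(np.mean(losses))

F_toy = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.8, 0.3]])
Y_toy = np.array([[1, 0, 1],
                  [1, 0, 0]])
loss = rank_loss(F_toy, Y_toy)   # first image: 0 of 2 pairs, second: 2 of 2
```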
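A compact way to evaluate AvgPrec is to rank the labels of each image once and then look at the ranks of its relevant labels. The sketch below follows the standard average precision definition; the function name `average_precision` and the toy data are hypothetical.

```python
import numpy as np

def average_precision(F, Y):
    """Mean over images of the precision measured at each relevant label."""
    scores = []
    for f, y in zip(F, Y):
        relevant = np.flatnonzero(y == 1)
        if relevant.size == 0:
            continue
        # rank[c] is the 1-based position of label c when sorted by score.
        order = np.argsort(-f, kind="stable")
        rank = np.empty(f.size, dtype=int)
        rank[order] = np.arange(1, f.size + 1)
        rel_ranks = np.sort(rank[relevant])
        # The k-th best relevant label has exactly k relevant labels at or above it.
        scores.append(np.mean(np.arange(1, relevant.size + 1) / rel_ranks))
    return float(np.mean(scores))

F_toy = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.8, 0.3]])
Y_toy = np.array([[1, 0, 1],
                  [1, 0, 0]])
avg_prec = average_precision(F_toy, Y_toy)   # (1 + 1/3) / 2
```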
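The top-r binarization and the micro-averaged score can be sketched as follows, assuming that MicroAvg is the micro-averaged F1 measure (the harmonic mean of micro precision and micro recall). The function name `micro_f1_top_r` and the toy data are hypothetical.

```python
import numpy as np

def micro_f1_top_r(F, Y, r):
    """Micro-averaged F1 after predicting the r highest-scoring labels per image."""
    # Binarize: mark the r largest entries of each row of F as predicted labels.
    P = np.zeros_like(Y)
    top = np.argsort(-F, axis=1, kind="stable")[:, :r]
    np.put_along_axis(P, top, 1, axis=1)
    tp = np.sum(P * Y)                      # true positives pooled over labels
    if tp == 0:
        return 0.0
    return float(2.0 * tp / (P.sum() + Y.sum()))

F_toy = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.8, 0.3]])
Y_toy = np.array([[1, 0, 1],
                  [1, 0, 0]])
# Average label count is 1.5, so r = 2 after rounding up.
micro = micro_f1_top_r(F_toy, Y_toy, r=2)   # 2 * 2 / (4 + 3)
```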
The adapted Area Under the Curve (AUC) was proposed and utilized in [57]. AUC firstly sorts the predicted likelihood scores vector for each image in descending order; it then varies the number of predicted labels from one to C and computes the receiver operating characteristic curve by calculating the true positive rate and the false positive rate for each number of predicted labels. It finally computes the area under the curve to evaluate the performance of multi-label classification.
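One possible reading of this procedure is sketched below. It is an assumption-laden illustration rather than the exact metric of [57]: true and false positives are pooled over all images for each number k of predicted labels, the curve is anchored at the origin and the area is obtained with the trapezoidal rule. The function name `adapted_auc` is hypothetical.

```python
import numpy as np

def adapted_auc(F, Y):
    """Area under the ROC curve traced by predicting the top-k labels, k = 1..C."""
    # Relevance indicators reordered by descending score for every image.
    order = np.argsort(-F, axis=1, kind="stable")
    Y_sorted = np.take_along_axis(Y, order, axis=1)
    # Column k-1 holds the pooled counts when the top-k labels are predicted.
    tp = np.cumsum(Y_sorted, axis=1).sum(axis=0)
    fp = np.cumsum(1 - Y_sorted, axis=1).sum(axis=0)
    tpr = np.concatenate(([0.0], tp / Y.sum()))
    fpr = np.concatenate(([0.0], fp / (Y.size - Y.sum())))
    # Trapezoidal rule over consecutive (fpr, tpr) points.
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

F_toy = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.8, 0.3]])
Y_toy = np.array([[1, 0, 1],
                  [1, 0, 0]])
auc = adapted_auc(F_toy, Y_toy)
```

In the toy data one image is ranked perfectly and the other is ranked in reverse, so the pooled curve follows the diagonal.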
Coverage evaluates how many steps are needed, on average, to move down the ranked label list of an image to cover all of its relevant labels. Its formal definition is:

$$\mathrm{Coverage} = \frac{1}{n} \sum_{i=1}^{n} \max_{Y_{ik} \in Y_i} \mathrm{rank}(x_i, Y_{ik}) - 1$$

where $n$ is the number of evaluated images. Obviously, different from the above four metrics, Coverage can be greater than one, and the smaller the value of Coverage, the better the performance is.
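Coverage only needs the worst rank among the relevant labels of each image. The sketch below uses the standard definition (deepest relevant rank minus one, averaged over images); the function name `coverage` and the toy data are hypothetical.

```python
import numpy as np

def coverage(F, Y):
    """Average number of steps down the ranked list needed to cover all relevant labels."""
    depths = []
    for f, y in zip(F, Y):
        relevant = np.flatnonzero(y == 1)
        if relevant.size == 0:
            continue
        # rank[c] is the 1-based position of label c when sorted by score.
        order = np.argsort(-f, kind="stable")
        rank = np.empty(f.size, dtype=int)
        rank[order] = np.arange(1, f.size + 1)
        depths.append(rank[relevant].max() - 1)
    return float(np.mean(depths))

F_toy = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.8, 0.3]])
Y_toy = np.array([[1, 0, 1],
                  [1, 0, 0]])
cov = coverage(F_toy, Y_toy)   # deepest relevant ranks are 2 and 3
```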
To maintain consistency with AUC, MicroAvg and AvgPrec, we use 1 − RankLoss instead of RankLoss. Thus, the larger the value of these metrics (except Coverage), the better the performance is. We want to remark that these metrics evaluate the performance of multi-label image annotation from different aspects; it is difficult for an approach to consistently perform better than other methods across all of these metrics.

Experimental Results on Annotating Remote Sensing Images
In this section, we conduct experiments on the Land Cover dataset to investigate the performance of MLC-LRR on annotating remote sensing images by using both labeled and unlabeled images, and we compare MLC-LRR with the other methods. We randomly partition the images of the Land Cover dataset into two sets; one is used as labeled images, and the other is used as unlabeled images for validation. Here, we consider two different label ratios: 10% and 15%. A ratio of 10% (or 15%) means that we randomly select 10% (or 15%) of the images in a dataset as the labeled training set and take the remaining images as unlabeled images to be annotated by the compared methods. Table 3 reports the results of these methods on the Land Cover dataset. To reduce any random effect, for each fixed label ratio, we repeat independent experiments for each method 10 times and report the average results. In the table, •/• indicates whether MLC-LRR is statistically (pairwise t-test at 95% significance level) superior/inferior to the compared methods under a particular evaluation metric. ↓ along with Coverage means the lower the value, the better the performance. We additionally investigate the performance of MLC-LRR with a normalized graph Laplacian and introduce MLC-LRR L , which is adapted from MLC-LRR by replacing $L$ in Equation (7) with the normalized graph Laplacian matrix $L = I - D^{-1/2} W D^{-1/2}$. Similar to the previous experimental protocol, we randomly select some images of the Land Cover dataset as labeled images and take the remaining images as unlabeled images for annotation. Figure 3 reports the results of MLC-LRR and MLC-LRR L with respect to MicroAvg, 1-RankLoss, AvgPrec and Coverage, respectively. These reported results are the average of ten independent experiments for each fixed label ratio (from 10% to 35% with a step size of 5%).
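The normalized variant can be illustrated with a short sketch. It assumes the standard symmetric normalized graph Laplacian, I − D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix of W; the function name `normalized_laplacian` and the toy graph are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2} of adjacency W."""
    d = W.sum(axis=1)
    # Guard against isolated vertices (zero degree) to avoid division by zero.
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

W_toy = np.array([[0.0, 1.0, 0.5],
                  [1.0, 0.0, 0.0],
                  [0.5, 0.0, 0.0]])
L_norm = normalized_laplacian(W_toy)
eigvals = np.linalg.eigvalsh(L_norm)
```

Unlike the unnormalized Laplacian D − W, the eigenvalues of this matrix always lie in [0, 2], which makes the regularization strength less sensitive to vertex degrees.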

The Benefit of the Semantic Graph
In this subsection, to further study the effectiveness of MLC-LRR in employing the semantic graph S, we conduct another set of experiments on a variant of MLC-LRR: MLC-LRR F , which is adapted from MLC-LRR by only using the feature-based graph W. Similar to the previous experimental protocol, we randomly draw 10% to 35% of the images of the Land Cover dataset as labeled images and use the remaining images as unlabeled ones for annotation. Ten independent experiments are conducted under each label ratio (10% to 35% with a step size of 5%), and the mean value of each evaluation metric is recorded under each label ratio. Figure 4 shows the performance of MLC-LRR and MLC-LRR F with respect to MicroAvg, 1-RankLoss, AUC, AvgPrec and Coverage, respectively.

Experimental Results on Other Multi-Label Image Datasets
In this section, we conduct experiments on five public image datasets (Flags, Scene, Corel5k, MIR Flickr and ESPGame) to validate the performance of MLC-LRR in annotating multi-label images and compare it with other related methods. Here, we randomly select 30% of the images of each dataset as the labeled training set and take the remaining images as the unlabeled images. Table 4 reports the results of the compared methods across five evaluation metrics: MicroAvg, 1-RankLoss, AvgPrec, AUC and Coverage. In the table, •/• indicates whether MLC-LRR is statistically (pairwise t-test at the 95% confidence level) superior/inferior to the compared method under a particular evaluation metric. ↓ next to Coverage means that the lower the value, the better the performance.
As we can see from Table 3, the performance of all methods increases with the number of labeled images. Compared with the five related methods, MLC-LRR performs better than or comparably to them in most cases. In summary, out of 10 configurations (1 dataset × 5 evaluation metrics × 2 settings of label ratios), MLC-LRR always outperforms MSC, DLP, TMC and FCML, outperforms Tram in nine configurations and ties with it in the remaining one. These results demonstrate the advantage of MLC-LRR in annotating multi-label remote sensing images. MSC is an inductive multi-label classifier. Both MSC and MLC-LRR explicitly utilize label correlation and semantic information for graph construction: MSC constructs an l 1 graph in the label space, whereas MLC-LRR constructs a semantic graph by low rank representation in the label space. However, MLC-LRR always outperforms MSC. This is principally because MSC assumes that a large number of labeled images is available, and it exploits only labeled images, discarding the abundant unlabeled images in the training process.
Moreover, constructing a graph by low rank representation can better capture the global relationship among images than sparse representation. These results hint that unlabeled images can be used to boost the performance of multi-label classification. DLP is a dynamic label propagation method. In fact, MLC-LRR also predicts the labels of unlabeled images by label propagation. However, DLP is outperformed by MLC-LRR in many cases. One possible reason is that DLP only makes use of the relationship between images in the feature space without using label correlations. In contrast, MLC-LRR additionally utilizes semantic information among images by low rank representation. Both Tram and MLC-LRR employ unlabeled images, but Tram is outperformed by MLC-LRR. There are two possible reasons: (i) Tram does not explicitly utilize label correlation; (ii) Tram constructs a kNN graph in the feature space to capture the relationship between images, while MLC-LRR constructs a feature-based graph by low-rank representation. It may not be reliable to construct a predefined kNN graph in the feature space for automatically annotating remote sensing images. These observations suggest that the feature-based graph adopted by MLC-LRR can effectively capture the global relationship of remote sensing images.
TMC constructs a directed bi-relational graph to capture three types of relationships: the relationship between labels, the relationship between images and the association between images and labels. Both TMC and MLC-LRR utilize unlabeled images and explicitly employ label correlation, but TMC is outperformed by MLC-LRR in many cases. A possible reason is that TMC considers only pairwise label correlation, whereas MLC-LRR exploits LRR to explore and employ high-order label correlation. FCML is another transductive multi-label classifier that takes advantage of pairwise label correlation. Both FCML and MLC-LRR explicitly employ label correlations; however, MLC-LRR significantly outperforms FCML. The likely cause is that FCML utilizes cosine similarity to capture the relationship between different labels, whereas MLC-LRR constructs a semantic graph by LRR to capture the global relationship among labels, which can effectively capture the global semantic relationship among images. These results not only corroborate the effectiveness of LRR in capturing the global structure of remote sensing images, but also indicate the significance of the semantic graph in capturing the correlation among labels and the semantic relationship between images.
Experimental results in Figure 3 also support the effectiveness of the semantic graph in utilizing the semantic relationship between images. From this figure, we can see that the performance of MLC-LRR and MLC-LRR F increases with the label ratio (from 10% to 35%). Both MLC-LRR and MLC-LRR F utilize unlabeled remote sensing images for training; however, MLC-LRR performs significantly better than MLC-LRR F in all evaluation metrics because it additionally employs the semantic graph. In other words, the semantic graph can further improve the performance of MLC-LRR F. Taking 1-RankLoss in Figure 3b as an example, as the proportion of labeled images increases from 10% to 35%, MLC-LRR F increases by 1.00%, while MLC-LRR increases by 5.19%. The additional improvement of MLC-LRR with respect to MLC-LRR F is attributed to the semantic graph. These results suggest that the semantic graph derived from the labels of annotated remote sensing images can be used to boost the performance of image annotation. In addition, these results justify our motivation to combine the feature-based graph and the semantic graph for image annotation.
As shown in Figure 4, the performance of MLC-LRR and MLC-LRR L increases with the label ratio. However, MLC-LRR always performs better than MLC-LRR L across five evaluation metrics. This is possibly because the unnormalized graph Laplacian is more suitable for the Land Cover dataset. Another interesting observation is that MLC-LRR L generally has smaller variance than MLC-LRR. This observation suggests that the normalized graph Laplacian provides additional stability.

Results Analysis on Other Multi-Label Images
From Table 4, we can also observe that MLC-LRR performs better than (or comparably to) the five other related graph-based multi-label classifiers in annotating multi-label images in most cases. The overall observation is similar to that in Table 3. In particular, out of 25 configurations (5 datasets × 5 evaluation metrics), MLC-LRR outperforms MSC, DLP, Tram and TMC in 88.00%, 72.00%, 44.00% and 72.00% of cases, ties with them in 12.00%, 24.00%, 20.00% and 24.00% of cases and loses to DLP, Tram and TMC in 4.00%, 40.00% and 4.00% of cases, respectively. MLC-LRR outperforms FCML in all configurations, as on the Land Cover dataset. We also observe that MLC-LRR performs significantly better than MSC, DLP, Tram and TMC in 100.00%, 73.33%, 60.00% and 73.33% of cases on the three high-dimensional image datasets: Corel5k, MIR Flickr and ESPGame. MLC-LRR on average improves them by 32.00%, 16.00%, 28.00% and 16.00% on these three high-dimensional image datasets, respectively. Taking the experimental results on MIR Flickr in Table 4 as an example, we can observe that MLC-LRR achieves the best (or comparable) performance among all compared methods across various evaluation metrics. These results demonstrate the advantage of MLC-LRR in annotating high-dimensional multi-label images.
We also observe that, compared with the five other related graph-based methods, the performance of MLC-LRR increases as the average number of labels per image (Avg) increases. This is principally because label correlation becomes more prominent as the average number of labels per image increases; thus, the semantic graph of MLC-LRR can more effectively capture the semantic relationship between images. This fact implies that the semantic graph is useful for capturing label correlation and for image annotation, which is consistent with the results on Land Cover images shown in Figure 3.
Another interesting observation is that MLC-LRR performs similarly to Tram on the Flags and Scene datasets, but it outperforms Tram on the Corel5k, MIR Flickr and ESPGame datasets. In practice, both MLC-LRR and Tram take advantage of the label composition concept for graph-based multi-label classification. Tram constructs a kNN graph by the Euclidean distance between images in the feature space, while MLC-LRR constructs a feature-based graph by low-rank representation. These contrasting results can be attributed to the dimensionality of images in the first two datasets being much lower than that in the last three datasets; there may be no significant difference between the kNN graph and the feature-based graph on these two datasets. This fact indicates that MLC-LRR is more suitable for annotating high-dimensional multi-label images.
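The win/tie/loss decisions summarized in this analysis rest on a paired t-test over repeated runs. The following is a minimal sketch of such a decision rule, assuming ten paired runs per configuration (the number of runs on these five datasets is not stated in this section) and a two-sided test at the 5% level. The critical value 2.262 is the tabulated Student's t quantile for 9 degrees of freedom; the scores are made-up placeholders, not results from the paper.

```python
import math

# Two-sided critical value of Student's t at the 5% level for 9 degrees of freedom
# (valid only for ten paired runs).
T_CRIT_DF9 = 2.262

def paired_t_statistic(a, b):
    """Paired t statistic for two equal-length lists of per-run scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        # All differences are identical: no evidence if they are zero, otherwise unbounded.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def win_tie_loss(ours, theirs, higher_is_better=True, t_crit=T_CRIT_DF9):
    """Classify one configuration as 'win', 'tie' or 'loss' for the first method."""
    t = paired_t_statistic(ours, theirs)
    if not higher_is_better:  # e.g. Coverage, where lower values are better
        t = -t
    if t > t_crit:
        return "win"
    if t < -t_crit:
        return "loss"
    return "tie"

# Hypothetical per-run scores for two methods (placeholders for illustration).
ours = [0.80, 0.81, 0.79, 0.82, 0.80, 0.81, 0.80, 0.79, 0.82, 0.81]
theirs = [0.75, 0.77, 0.74, 0.76, 0.76, 0.75, 0.76, 0.74, 0.77, 0.75]
outcome = win_tie_loss(ours, theirs)
```

Counting these outcomes over all dataset and metric pairs yields percentages of the kind reported above; for a metric such as Coverage, `higher_is_better=False` flips the direction of the comparison.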

Toy Examples
In this section, we conduct another experiment to visually compare MLC-LRR and the five other related methods in annotating images from the IAPRTC-12 [59] image dataset. IAPRTC-12 contains 291 distinct labels, and each image is described by dense SIFT features and represented by a 1000-dimensional vector. We filter out rare labels by keeping the 30% most frequent labels and remove the images that are assigned fewer than six labels. In the end, we obtain 3207 images with 88 labels, which we partition into two parts. The first part accounts for 30% of the images and is used as labeled instances; the second part accounts for the remaining 70% of the images and is used as unlabeled instances, whose labels are predicted by the compared methods. In order to transform the predicted likelihood vector into a binary indicator vector, we take the labels corresponding to the r largest entries of f (x) as the predicted labels of image x, where r is determined as the maximum number of labels of the annotated images. Here, r is fixed as 10.
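The conversion from a likelihood vector to a binary indicator vector can be sketched as follows. This is a minimal illustration; the function name, the tie-breaking rule (lower label index first) and the example scores are assumptions.

```python
def top_r_indicator(scores, r):
    """Return a 0/1 vector with ones at the positions of the r largest scores."""
    r = min(r, len(scores))
    # Sort label indices by descending score; ties are broken by the lower index.
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    indicator = [0] * len(scores)
    for j in ranked[:r]:
        indicator[j] = 1
    return indicator

# Hypothetical likelihoods f(x) for five labels; keep the two most likely ones.
f_x = [0.1, 0.9, 0.4, 0.7, 0.2]
predicted = top_r_indicator(f_x, 2)
```

In the experiment described above, the same rule would be applied with r = 10 over the 88 retained labels.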
From Table 5, we can observe that MLC-LRR performs better than (or comparably to) the other compared methods in annotating images. In summary, out of 23 (9 + 7 + 7) labels in these three images, MLC-LRR, MSC, Tram, TMC, FCML and DLP correctly predict 21, 15, 14, 16, 2 and 19 of them, respectively.
Table 5. Predicted labels for exemplar images in IAPRTC-12 [59]. The ground-truth labels are highlighted in bold font, and the potentially correct labels are in italic font.
Another interesting observation is that the compared methods also assign additional correct labels to these images. For example, "lamp", "building", "man" and "sweater" are not in the ground-truth label set of the first image; "lamp", "man" and "hair" are not in the ground-truth label set of the second image; and "building" is not in the ground-truth label set of the last image. Nevertheless, "lamp", "building", "man" and "sweater" are assigned to the first image, "lamp", "man" and "hair" to the second image, and "building" to the last image by the compared methods. When these additional labels are taken into account, MLC-LRR also produces more correct labels than the other compared methods.
These examples show that the proposed method has the potential to provide more complete annotations of images than the other compared methods. These toy examples also indicate the potential application of MLC-LRR in collaborative tagging systems (also known as folksonomy) [60], for example, folksonomy-based personalized searches [61], folksonomy-based recommender systems [62] and folksonomy-based social media services [63,64].

Conclusions
In this paper, we take advantage of low-rank representation for graph construction and introduce a graph-based Multi-Label Classifier based on Low Rank Representation (MLC-LRR) for annotating remote sensing images associated with multiple labels. Unlike existing methods that focus on the single-label problem and construct graphs only in the feature space, we also construct a semantic graph based on LRR in the label space. Our empirical comparison with five related methods on the Land Cover remote sensing images shows that MLC-LRR performs significantly better than these methods. In addition, we find that the synergy of the two graphs constructed by LRR in the feature space and in the label space achieves better performance than using the graph in the feature space alone. Extra experimental results on five multi-label image datasets and a case study again show that MLC-LRR performs better than the compared methods in annotating multi-label images. In the future, we want to investigate the effectiveness of LRR with other graph-based multi-label classifiers and design more effective algorithms for remote sensing image annotation.