High-Resolution Satellite Image Classification Using Multi-Task Joint Sparse and Low-Rank Representation

Scene classification plays an important role in the intelligent processing of High-Resolution Satellite (HRS) remotely sensed images. In HRS image classification, multiple features, e.g., shape, color, and texture features, are employed to represent scenes from different perspectives. Accordingly, effective integration of multiple features usually results in better performance than methods based on a single feature in the interpretation of HRS images. In this paper, we introduce a multi-task joint sparse and low-rank representation model to combine the strengths of multiple features for HRS image interpretation. Specifically, a multi-task learning formulation is applied to simultaneously consider sparse and low-rank structures across multiple tasks. The proposed model is optimized as a non-smooth convex optimization problem using an accelerated proximal gradient method. Experiments on two public scene classification datasets demonstrate that the proposed method achieves remarkable performance and improves upon the state-of-the-art methods in the respective applications.


Introduction
With the rapid development of remote sensing techniques over recent years, High-Resolution Satellite (HRS) images are becoming increasingly available, thus enabling us to study earth observations in greater detail. However, despite enhanced resolution, these details often suffer from the spectral uncertainty problem stemming from an increase of the intra-class variance and a decrease of the inter-class variance [1], and from the curse of dimensionality problem resulting from the small ratio between the number of training samples and features [2]. Taking into account these characteristics, HRS image classification methods have evolved from pixel-oriented methods to object-oriented methods and achieved precise object recognition [3][4][5]. Object-oriented feature extraction methods cluster homogeneous pixels and take advantage of both local and global properties [6]. The successful development of feature extraction technologies for HRS satellite images has increased the usefulness of remote sensing applications in environmental and land resource management, security and defense issues, urban planning, etc.
Scene representation and recognition of HRS satellite images is a challenging task given the ambiguity and variability of scenes, and has attracted much attention in recent years [7][8][9][10]. Scene classification is aimed at automatically labeling an image from a set of semantic categories [11][12][13]. In this paper, the term "scenes" refers to separated sub-blocks split from a large satellite image. Scenes often contain multiple land-cover objects having a specific semantic meaning, such as an agricultural area, residential area, mobile home park, and golf course in a satellite image. These high-level latent semantic concepts make it difficult to recognize HRS satellite scenes. As a consequence, the main problem in HRS satellite scene interpretation is bridging semantic gaps [14]. Semantic-based scene classification has been widely applied in HRS image scene interpretation [15,16]. It is usually difficult to understand and recognize scene categories because of the high complexity of spatial and structural patterns in massive HRS satellite images [17]. Therefore, feature representation of each scene is a key step and is in high demand for accurate scene classification.
To obtain meaningful features for scene classification, many descriptors have been developed in recent years. Features such as color distributions describing the reflective spectral information [18,19], textures reflecting a specific, spatially repetitive pattern of surfaces [20,21], and structures containing macroscopic relationships between objects [22,23] have been widely used in HRS satellite image classification; but none of the feature descriptors has the same discriminating power for all classes of scene. For example, features based on color information might perform well when classifying forest and desert, while a classifier for residential areas should be invariant to the actual color of the scenes. Therefore, instead of using a single feature modality for all classes, adaptively fusing a set of diverse and complementary feature modalities might more accurately and precisely discriminate a class from all others.
There are two general fusion strategies within the machine learning approach to semantic scene analysis, namely early fusion and late fusion. The former combines cues prior to feature extraction [24,25], and the latter first extracts features separately and then combines them at the classifier stage [26,27]. Both early and late fusion methods can be used to classify an HRS image, because satellite scene classes exhibit both dependency and independency among multiple features [6,28]. Because different features may have different scales, hard combinations such as concatenation may cause redundancy and degrade efficiency and performance. Recent studies on Multiple Kernel Learning (MKL) [29], which fuses different features through a combination of multiple similarity functions, show that it can effectively improve the classification performance [30,31]. Several combination methods inspired by MKL have been proposed, varying from linear to nonlinear, and from the same type of kernel to different types of kernels [32,33].
In contrast to this family of work, Yuan et al. [34] proposed a Multi-Task Joint Sparse Representation and Classification (MTJSRC) framework for visual recognition in a regularized Multi-Task Learning (MTL) framework. The idea behind MTL is that, when the tasks to be learned are similar or related in some sense, it may be advantageous to take these cross-task relations into account in the model. Experimental results have demonstrated the effectiveness of such a framework [35,36]. The MTJSRC framework was motivated by the success of multi-task joint sparse linear regression and the Sparse Representation Classification (SRC) [37] approaches, which have been applied in HRS satellite image classification and achieved excellent performance [38,39]. Based on the knowledge transferring mechanism in MTL [40] and the collaborative representation mechanism in SRC [41], MTJSRC can deal with the "lack of samples" problem for high-dimensional signal recognition [38]. The MTJSRC method can learn a common subset of features for all tasks through joint sparsity regularization [42] by penalizing the sum of $\ell_2$ norms of the blocks of coefficients associated with each covariate group across different classification problems. From the perspective of linear regression, MTJSRC was inspired by Multi-Task Joint Covariate Selection (MTJCS), which can be regarded as a combination of group Least Absolute Shrinkage and Selection Operator (LASSO) [43] and multi-task LASSO [44]. Li et al. [38] introduced the MTJSRC paradigm for hyperspectral image classification and achieved competitive performance. However, the multiple learning tasks in MTJSRC can also be coupled using a set of shared factors possessing low-rank structure [45]. For example, satellite scene images with different labels may share a similar background under a low-rank structure. Chen et al. [46] demonstrated the effectiveness of an MTL formulation that considers the sparse and low-rank patterns from multiple related tasks.
Inspired by the existing works in this field, we present a Multi-Task Joint Sparse and Low-rank Representation and Classification (MTJSLRC) method for HRS images. In this paper, the term "multi-task" means that several linear representation models are simultaneously estimated through regularization on parameters across all the models. For example, when classifying scenes, we obtain K different linear representation models from K different visual features (e.g., texture, shape, and color). The joint sparsity and low-rank structure are enforced by imposing the $\ell_{1,2}$-norm penalty as proposed in [40,42] and the trace norm penalty as in the previously developed approaches of [47,48]. The objective function of MTJSLRC consists of a squared reconstruction error term and two convex but non-smooth ($\ell_{1,2}$-norm and trace norm) regularization terms. We transform the model and then use the Accelerated Proximal Gradient (APG) method [49] to solve this non-smooth convex optimization problem. Similar to MTJSRC, classification is ruled in favor of the class that has the lowest total reconstruction error accumulated from all the tasks [34]. Extensive experiments show that our method takes advantage of multiple features and thus overcomes the over-fitting problem produced by the high-dimensional stacked feature space and the "lack of samples". In our framework, a low-rank constraint is applied to reduce redundancy and correlation in highly correlated tasks for HRS satellite image classification.
The contribution of this study lies in the combination of multiple features based on MTL, SRC, and low-rank representation. We found that the multi-task joint sparse and low-rank representation is a simple yet effective way to combine multiple complementary features to improve the HRS image classification accuracy. We overcome the problem of incoherent sparse and low-rank patterns by considering multiple related features, and by decomposing the model parameters into a joint sparsity-inducing component and a low-rank component. Specifically, we employ an $\ell_{1,2}$-norm regularization term to enforce group sparsity in the model parameters and identify the essential discriminative features for effective HRS image classification; meanwhile, we use a trace-norm constraint to encourage the low-rank structure, capturing the underlying relationship among the tasks for improved generalization performance. We employ the APG method to solve the resulting non-smooth convex optimization problem.
The remainder of this paper is organized as follows: Section 2 introduces the proposed MTJSLRC framework for HRS image classification. The experimental results and analysis are presented in Section 3. In Section 4, some concluding remarks and prospects for future work close the paper.
Notations: For any matrix $X \in \mathbb{R}^{p \times q}$, let $x_{ij}$ be the entry in the $i$-th row and $j$-th column of $X$; $\|X\|_0$ denotes the $\ell_0$-norm, which counts the number of non-zero entries in $X$; let $\|X\|_1 = \sum_{i}\sum_{j} |x_{ij}|$ denote the $\ell_1$-norm; and let $\|X\|_*$ denote the nuclear norm, which is the sum of all the singular values of $X$.
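As a quick numerical illustration of the norms used throughout the paper, the following minimal NumPy sketch evaluates them on a small matrix. The function names are hypothetical helpers introduced here for illustration; the $\ell_1$-norm is taken entrywise, and the block-wise $\ell_{1,2}$ penalty is assumed to be the sum of Frobenius norms of row blocks.

```python
import numpy as np

def l0_norm(X):
    """Number of non-zero entries of X."""
    return int(np.count_nonzero(X))

def l1_norm(X):
    """Entrywise l1-norm: sum of absolute values of all entries."""
    return float(np.abs(X).sum())

def nuclear_norm(X):
    """Nuclear (trace) norm: sum of the singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

def l12_norm(blocks):
    """Sum of Frobenius norms over a list of blocks (assumed joint-sparsity penalty)."""
    return float(sum(np.linalg.norm(B) for B in blocks))

X = np.array([[3.0, 0.0],
              [0.0, -4.0]])
print(l0_norm(X), l1_norm(X), nuclear_norm(X), l12_norm([X[:1], X[1:]]))
```

For a diagonal matrix the singular values are the absolute values of the diagonal entries, so the entrywise $\ell_1$-norm and the nuclear norm coincide in this toy case; in general they differ.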

The Proposed Method
In this section, we describe the MTJSLRC framework for the representation and classification of HRS image scenes with multiple feature representations. We also present in detail the optimization method for this framework, which relies on the APG algorithm.

The MTJSLRC Framework
The working mechanism of the proposed method is depicted in Figure 1. In the preprocessing stage, multiple feature modalities are extracted for all the training images of each class. Given a test image, the same features as those of the training images are extracted. Each feature is represented as a linear combination of the corresponding training features in a joint sparse and low-rank way. In this paper, we focus on the usage of the $\ell_{1,2}$-norm penalty and the $\ell_1$-norm across the low-rank constraint to enforce joint sparsity and low-rank structure across representation tasks. Thus, the objective function consists of a squared reconstruction error term, a non-smooth $\ell_{1,2}$-norm regularization term, and a non-smooth $\ell_1$-norm across the low-rank regularization term. In order to use the APG method [34,49] for optimization, we transform the model into a combination of a smooth convex term and a non-smooth term. Once the representation coefficients are estimated, the category can be decided according to the overall reconstruction error of each individual class.
Each sub-dictionary $X_m$ can model a convex set for a specific class, and the collaborative dictionary $X \in \mathbb{R}^{d \times n}$, made up of all the sub-dictionaries $X_m$, maps each feature vector into a new dimensional space corresponding to the dictionary. Given a testing image feature $Y \in \mathbb{R}^{d}$, the optimization problem of the sparse linear representation model is described as follows:

$$\hat{w} = \arg\min_{w} \|w\|_0 \quad \text{s.t.} \quad \|Y - Xw\|_2 \le \varepsilon, \tag{1}$$

where $\varepsilon$ denotes the noise level parameter. It is known that problem (1) is NP-hard. Previous research results [50] show that, under mild assumptions, this problem can be relaxed as the following objective function:

$$\hat{w} = \arg\min_{w} \|w\|_1 \quad \text{s.t.} \quad \|Y - Xw\|_2 \le \varepsilon. \tag{2}$$

This optimization problem is convex and the optimal solution $\hat{w}$ can be computed efficiently. Then, for classification, the class of the image feature $Y$ can be determined by minimizing the reconstruction error $r_m$ (the error between $Y$ and the linear reconstruction from the training images in the $m$-th class) as follows:

$$\hat{m} = \arg\min_{m} r_m(Y), \qquad r_m(Y) = \|Y - X_m \hat{w}_m\|_2, \tag{3}$$

where $\hat{w}_m$ denotes the components of $\hat{w}$ corresponding to class $m$. In the study of face recognition, SRC is expressed as the model (2) together with the decision rule (3).

Peer-reviewed version available at Remote Sens. 2017, 9, 10; doi:10.3390/rs9010010
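The SRC pipeline described here (sparse coding over the collaborative dictionary, then class-wise reconstruction error) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the penalized (Lagrangian) form of the $\ell_1$ relaxation, $\min_w \frac{1}{2}\|Y - Xw\|_2^2 + \lambda\|w\|_1$, solved with plain iterative soft-thresholding; `src_classify`, `lam`, and `n_iter` are hypothetical names and values chosen for the example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def src_classify(X, labels, y, lam=0.01, n_iter=200):
    """X: (d, n) dictionary of training features, labels: (n,) class ids,
    y: (d,) test feature. Returns (predicted class, sparse code)."""
    w = np.zeros(X.shape[1])
    # Step size = 1 / Lipschitz constant of the gradient of the squared loss.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    classes = np.unique(labels)
    # Reconstruction error using only the coefficients of each class.
    residuals = [np.linalg.norm(y - X[:, labels == c] @ w[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))], w

# Toy dictionary: two atoms per class, columns normalized to unit length.
X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.1, 0.0, 0.1, 0.0]])
X = X / np.linalg.norm(X, axis=0)
labels = np.array([0, 0, 1, 1])
pred, code = src_classify(X, labels, np.array([0.95, 0.05, 0.05]))
print(pred)
```

No classifier is trained: adding new reference samples only means appending columns to `X` and entries to `labels`.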

Multi-Task Joint Sparse Representation Classification
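The SRC model was originally developed for a single feature, and the MTJSRC model extends it to visual recognition based on multiple features and multiple instances. Suppose there are $K$ modalities of features for all the training samples from $M$ classes, and let $X^k \in \mathbb{R}^{d_k \times n}$ be the training feature matrix for each modality index $k = 1, \cdots, K$. Then, we denote $X_m^k \in \mathbb{R}^{d_k \times n_m}$ as the $n_m$ columns of $X^k$ associated with the $m$-th class. For a testing image, let $Y = \{y^{kl} \in \mathbb{R}^{d_k},\ k = 1, \cdots, K,\ l = 1, \cdots, L\}$ be the ensemble of $L$ different instances (e.g., multiple transformations of an HRS scene) with the same $K$ modalities of features as the training images. For each testing image feature $y^{kl}$, we write the representation vector as $w^{kl} = [(w_1^{kl})^T, \cdots, (w_M^{kl})^T]^T$, in which $w_m^{kl} \in \mathbb{R}^{n_m}$ is restricted to class $m$. Let us define the coefficients associated with class $m$ as $W_m = [w_m^{11}, \cdots, w_m^{KL}] \in \mathbb{R}^{n_m \times KL}$. Thus, the multi-task joint covariate selection model in sparse learning [42] seeks to solve the following optimization problem:

$$\hat{W} = \arg\min_{W} f(W) + \alpha P(W), \tag{4}$$

where $\alpha$ is the regularization coefficient and the expressions of $f(W)$ and $P(W)$ are defined respectively as

$$f(W) = \frac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{L} \Big\| y^{kl} - \sum_{m=1}^{M} X_m^k w_m^{kl} \Big\|_2^2, \tag{5}$$

$$P(W) = \sum_{m=1}^{M} \|W_m\|_F. \tag{6}$$

This optimization problem can be solved by the APG method [49]. Given the optimal coefficient matrix $\hat{W}$, we can approximately recover each testing feature $y^{kl}$ as $X^k \hat{w}^{kl}$. The class can be decided by the lowest reconstruction error accumulated over all the $K \times L$ tasks:

$$\hat{m} = \arg\min_{m} \sum_{k=1}^{K} \sum_{l=1}^{L} \big\| y^{kl} - X_m^k \hat{w}_m^{kl} \big\|_2^2. \tag{7}$$

The model (4) together with the decision rule (7) is known as MTJSRC in the study of visual classification [34].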
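Two ingredients of MTJSRC lend themselves to a short sketch: the class-level shrinkage that realizes joint sparsity (a whole class block of coefficients is kept or discarded together) and the decision rule that accumulates reconstruction error over all $K \times L$ tasks. The NumPy code below is a hedged illustration; the data layout (`X_blocks[k][m]`, task column index `k * L + l`) and the function names are assumptions made for the example, and the penalty is assumed to be the sum of Frobenius norms of the class blocks.

```python
import numpy as np

def group_soft_threshold(W_blocks, tau):
    """Shrink every class block W_m (n_m x tasks) by tau in Frobenius norm;
    blocks whose norm is below tau are set to zero as a whole."""
    out = []
    for Wm in W_blocks:
        norm = np.linalg.norm(Wm)
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append(scale * Wm)
    return out

def classify_by_total_residual(X_blocks, Y, W_blocks):
    """X_blocks[k][m]: (d_k, n_m) training features of class m for modality k.
    Y[k]: (d_k, L) test instances for modality k.
    W_blocks[m]: (n_m, K*L) coefficients, task column index = k*L + l."""
    K, M = len(X_blocks), len(W_blocks)
    L = Y[0].shape[1]
    errors = np.zeros(M)
    for m in range(M):
        for k in range(K):
            coef = W_blocks[m][:, k * L:(k + 1) * L]          # (n_m, L)
            errors[m] += np.sum((Y[k] - X_blocks[k][m] @ coef) ** 2)
    return int(np.argmin(errors)), errors

# Block with norm 5 is shrunk by 20%; block with norm 0.5 is zeroed out.
shrunk = group_soft_threshold([np.array([[3.0, 4.0]]),
                               np.array([[0.3, 0.4]])], tau=1.0)
print(shrunk)
```

The shrinkage acts on a class block across all tasks at once, which is what couples the otherwise independent representation problems.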

Multi-Task Joint Sparse and Low-rank Representation Classification
The MTJSRC model described in the previous section considers the sparse patterns from multiple related tasks (visual recognition based on multiple features and instances). However, in HRS image classification, the underlying predictive classifiers lie in a hypothesis space with some low-rank structure, because of the redundancy and correlation in highly correlated tasks. In this paper, we consider both the sparse and low-rank patterns for HRS image classification based on multiple features and instances to improve performance.

Class-Level Joint Sparse and Low-rank Regularization
The formulation of problem (4) improves the independent learning model (2) to a joint learning model by imposing a class-level sparsity-inducing term. For multi-class classification, it can be useful to represent a testing image by a few training samples from a common class. To encourage the low-rank structure in the model coefficients, we impose a class-level rank-constraint term to capture the underlying relationship among the tasks for improved generalization performance. Therefore, the representations of multiple features and instances may share certain class-level sparse and low-rank patterns.
To consider the low-rank structure within class $m$, we apply a rank constraint over $W_m$. We employ the $\ell_1$-norm across the rank constraints of $W_m$ to reduce the redundancy in highly correlated tasks for HRS image classification. We denote the class-level rank constraint term as follows:

$$\Gamma(W) = \sum_{m=1}^{M} \operatorname{rank}(W_m). \tag{8}$$

We propose to solve the following multi-task joint sparse and low-rank representation model:

$$\hat{W} = \arg\min_{W} f(W) + \alpha P(W) + \beta \Gamma(W), \tag{9}$$

where the expressions of $f(W)$, $P(W)$ and $\Gamma(W)$ are given in (5), (6) and (8), respectively, and $\alpha$ and $\beta$ are the regularization coefficients that balance the strength of the general loss component and the regularization terms. Problem (9), however, is non-convex and the solution may not be unique due to the rank constraint in $\Gamma(W)$, which can be regarded as the $\ell_0$-norm of the singular value matrix.
To make the problem tractable, we relax the rank operator with the nuclear norm and rewrite the model as follows:

$$\hat{W} = \arg\min_{W} f(W) + \alpha P(W) + \beta Q(W), \tag{10}$$

where $Q(W)$ is the following $\ell_1$-norm across the nuclear norms:

$$Q(W) = \sum_{m=1}^{M} \|W_m\|_*. \tag{11}$$

The classification rule of our model is therefore identical to that of MTJSRC. We call the model (10) together with the decision rule (7) MTJSLRC, namely multi-task joint sparse and low-rank representation and classification.

Optimization Algorithm
Problem (10) is difficult to solve directly because of the two non-smooth convex regularization terms $P(W)$ and $Q(W)$. For a general minimization problem whose objective is composed of a smooth convex term and a non-smooth convex term, Nesterov et al. [49] proposed the APG method, which achieves an $O(1/t^2)$ rate of convergence. Chen et al. [51] applied a nearly unified treatment using existing APG methods to group/multi-task joint sparse learning. Similar to [51], Yuan et al. implemented an APG optimization procedure for MTJSRC [34]. In this paper, we solve problem (10) by transforming it into a combination of a smooth convex term and a non-smooth term, and then apply the APG algorithm to optimize it. We adopt Moreau proximal smoothing [52] for the nuclear norm regularization term in $Q(W)$. More formally, the nuclear norm term $\beta\|W_m\|_*$ is approximated by the Moreau approximation

$$\Phi_\mu(W_m) = \min_{Z} \Big( \frac{\mu}{2} \|W_m - Z\|_F^2 + \beta \|Z\|_* \Big), \tag{12}$$

where $\mu$ is the smoothing parameter. $\Phi_\mu(W_m)$ is convex and smooth with respect to $W_m$, and the gradient can be computed as

$$\nabla \Phi_\mu(W_m) = \mu \big( W_m - G^*(W_m) \big), \tag{13}$$

where $G^*(W_m) = \arg\min_{Z} \big( \frac{\mu}{2} \|W_m - Z\|_F^2 + \beta \|Z\|_* \big)$. The closed-form expression of $G^*(W_m)$ can be determined using the soft-threshold operation on the singular values of $W_m$ [48], and the gradient can be written as

$$\nabla \Phi_\mu(W_m) = \mu \big( W_m - U \Sigma_\tau V^T \big), \tag{14}$$

where $W_m = U \Sigma V^T$ is the singular value decomposition of $W_m$, $\Sigma_\tau$ is diagonal with $(\Sigma_\tau)_{ii} = \max(0, \Sigma_{ii} - \tau)$, and $\tau = \beta/\mu$. Therefore, we apply the following smoothing function to the class-level rank constraint term $Q(W)$, and the approximation is:

$$\Omega_\mu(W) = \sum_{m=1}^{M} \Phi_\mu(W_m). \tag{15}$$

$\Omega_\mu(W)$ is convex and smooth because each $\Phi_\mu(W_m)$ is convex and smooth, and the gradient is:

$$\nabla_{W_m} \Omega_\mu(W) = \nabla \Phi_\mu(W_m), \quad m = 1, \cdots, M. \tag{16}$$

We replace the nuclear norm in model (10) with its Moreau approximation (12) and obtain the approximated objective with only one non-smooth term.
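The singular-value soft-thresholding step and the resulting gradient of the smoothed nuclear norm can be sketched as follows. This is an illustrative NumPy sketch under one assumed scaling of the Moreau approximation, namely $\min_Z \frac{\mu}{2}\|W - Z\|_F^2 + \beta\|Z\|_*$, for which the threshold is $\beta/\mu$; the names `svt` and `moreau_nuclear` are hypothetical.

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: soft-threshold the singular values of W by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def moreau_nuclear(W, beta, mu):
    """Value and gradient of  min_Z (mu/2) ||W - Z||_F^2 + beta ||Z||_*  (assumed scaling)."""
    Z = svt(W, beta / mu)                       # minimizer G*(W)
    value = (0.5 * mu * np.linalg.norm(W - Z) ** 2
             + beta * np.linalg.svd(Z, compute_uv=False).sum())
    grad = mu * (W - Z)                         # gradient of the smooth envelope
    return value, grad

W = np.diag([3.0, 1.0])
value, grad = moreau_nuclear(W, beta=1.0, mu=1.0)
print(value)
print(grad)
```

With singular values 3 and 1 and a threshold of 1, the minimizer has singular values 2 and 0, so the envelope replaces the kink of the nuclear norm at zero with a quadratic while keeping a linear growth for large singular values.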
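$$\hat{W} = \arg\min_{W} f(W) + \alpha P(W) + \Omega_\mu(W). \tag{17}$$

We define the smooth component in (17) as $H(W) = f(W) + \Omega_\mu(W)$. The objective function can then be seen as the summation of a smooth term $H(W)$ and a non-smooth term $\alpha P(W)$:

$$\hat{W} = \arg\min_{W} H(W) + \alpha P(W). \tag{18}$$

Then, we can use the APG optimization algorithm to solve problem (18). Algorithm 1 summarizes the details of optimization and classification. Each iteration consists of the generalized gradient mapping step and the aggregation forward step. In the generalized gradient mapping step, we update $W^{(t+1)}$ using the current matrix $V^{(t)}$ as follows:

$$W^{(t+1)} = \arg\min_{W} \frac{1}{2} \Big\| W - \big( V^{(t)} - \eta \nabla H(V^{(t)}) \big) \Big\|_F^2 + \eta \alpha P(W), \tag{19}$$

where $\eta$ is the step-size parameter. The solution of this problem, as shown in [53], is:

$$W_m^{(t+1)} = \max\Big( 0,\ 1 - \frac{\eta \alpha}{\|U_m^{(t)}\|_F} \Big) U_m^{(t)}, \qquad U^{(t)} = V^{(t)} - \eta \nabla H(V^{(t)}). \tag{20}$$

Then, as shown in [53], we apply the aggregation forward step to update $V^{(t+1)}$ as follows:

$$V^{(t+1)} = W^{(t+1)} + \frac{t}{t+3} \big( W^{(t+1)} - W^{(t)} \big). \tag{21}$$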
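The two steps of each APG iteration (a gradient step on the smooth part followed by class-level shrinkage, then a momentum-style aggregation) and the final decision rule can be put together in a compact sketch. The NumPy code below is a hedged illustration, not the authors' Algorithm 1: the data layout, the function name `mtjslrc`, the momentum weight `t / (t + 3)`, the step size derived from a Lipschitz bound, the smoothing scaling with threshold `beta / mu`, and the default parameter values are all assumptions made for this example.

```python
import numpy as np

def svt(W, tau):
    """Soft-threshold the singular values of W by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def mtjslrc(Xs, Ys, labels, alpha=0.1, beta=1.0, mu=1.0, n_iter=10):
    """Xs[k]: (d_k, n) training features, Ys[k]: (d_k, L) test instances,
    labels: (n,) class id of every training column. Returns (class, W)."""
    K, L, n = len(Xs), Ys[0].shape[1], labels.size
    classes = np.unique(labels)
    rows = [np.flatnonzero(labels == c) for c in classes]
    # Step size from a Lipschitz bound of the smooth part (loss + smoothed nuclear norm).
    eta = 1.0 / (max(np.linalg.norm(X, 2) ** 2 for X in Xs) + mu)

    def grad_smooth(V):
        G = np.empty_like(V)
        for k in range(K):                        # gradient of the squared loss
            cols = slice(k * L, (k + 1) * L)
            G[:, cols] = Xs[k].T @ (Xs[k] @ V[:, cols] - Ys[k])
        for r in rows:                            # gradient of the smoothed nuclear norm
            G[r] += mu * (V[r] - svt(V[r], beta / mu))
        return G

    W = np.zeros((n, K * L))
    V = W.copy()
    for t in range(n_iter):
        U = V - eta * grad_smooth(V)              # generalized gradient mapping
        W_new = np.zeros_like(U)
        for r in rows:                            # class-level (group) soft-thresholding
            norm = np.linalg.norm(U[r])
            if norm > eta * alpha:
                W_new[r] = (1.0 - eta * alpha / norm) * U[r]
        V = W_new + (t / (t + 3.0)) * (W_new - W)  # aggregation forward step
        W = W_new

    errors = np.zeros(len(classes))               # total reconstruction error per class
    for i, r in enumerate(rows):
        for k in range(K):
            cols = slice(k * L, (k + 1) * L)
            errors[i] += np.sum((Ys[k] - Xs[k][:, r] @ W[r][:, cols]) ** 2)
    return classes[int(np.argmin(errors))], W

# Toy problem: 2 modalities, 2 classes with 2 training samples each, L = 2 instances.
A = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.1, 0.0, 0.1, 0.0]])
A = A / np.linalg.norm(A, axis=0)
B = A[::-1].copy()                                # second modality: rows permuted
labels = np.array([0, 0, 1, 1])
pred, W_hat = mtjslrc([A, B], [A[:, :2].copy(), B[:, :2].copy()], labels)
print(pred, W_hat.shape)
```

The default of ten iterations reflects the observation in the paper that the algorithm need not be run to convergence for classification purposes.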

Time complexity analysis
Due to the iterative nature of MTJSLRC, the computational complexity depends on two factors: the number of iterations before convergence and the time consumed at each iteration. As in MTJSRC, the objective of our proposed model is to minimize the reconstruction error of a testing image; therefore, it is not necessary to run the algorithm until convergence to obtain the best recognition performance. We thus consider the dominant computational cost at each iteration of Algorithm 1, which comes from the calculation of the gradients (25) and (26) in step 2. Let T be the average number of iterations for the running of Algorithm 1; the total number of floating-point operations (flops) for the gradient estimation of (25) in step 2 is  (    + 2  ) as estimated in [34]. The time-consuming parts of (26) are the SVD of the matrix $W_m^{(t)}$ and the product $U \Sigma_\tau V^T$ in $\nabla\Phi_\mu(W_m^{(t)})$. The costs of the two terms are typically O ( Ms ) and O ( 2 (  ) 2   ) flops, respectively, where s is the computation time for the SVD of $W_m^{(t)}$. The total flops consumed by the gradient estimation in (24) is typically O (  + 2 (  ) 2   ). The time consumed in the other steps is negligible in comparison to that of the gradient estimation in step 2.

Experiments and Analysis
In this section, we provide the experimental setups and discuss the results on two public datasets. We conducted several groups of experiments to evaluate the feature combination capability and effectiveness of MTJSLRC for HRS image classification.

Experimental Setup
We evaluated our proposed MTJSLRC method on two public land-use scene datasets, which were:


UC Merced Land Use Dataset. The UC Merced dataset (UCM) [10] is one of the first ground truth datasets derived from publicly available high-resolution overhead imagery; it was manually extracted from aerial orthoimagery downloaded from the United States Geological Survey (USGS) National Map. This dataset contains 21 typical land-use scene categories, each of which consists of 100 images measuring 256 × 256 pixels with a pixel resolution of 30 cm in the red-green-blue color space. Figure 2 shows two examples of ground truth images from each class in this dataset. The classification of the UCM dataset is challenging because of the high inter-class similarity among categories such as medium residential and dense residential areas.

WHU-RS Dataset. The WHU-RS dataset [54] is a new publicly available dataset in which all the images are collected from Google Earth (Google Inc.). This dataset consists of 950 images with a size of 600 × 600 pixels distributed among 19 scene classes. Examples of ground truth images are shown in Figure 3. It can be seen that, compared to the UCM dataset, the scene categories in the WHU-RS dataset are more complicated due to the variation in scale, resolution, and viewpoint-dependent appearance.

For each test image, we utilize four types of transforms to obtain multiple instances: zooming in by a factor of 1.2, flipping left to right, and rotating five degrees clockwise and counterclockwise. Therefore, we utilize L = 4 instances for each test image in the MTJSRC and MTJSLRC models. We give an overview of the features used in our experiments, and refer to the corresponding publications for more details:

Bag of Visual Words (BoVW). We extracted Scale-Invariant Feature Transform (SIFT) descriptors [18] using a dense regular grid on the image, with image patches of 16 × 16 pixels over a grid with a spacing of eight pixels [22]. The visual vocabulary containing 600 entries was formed by k-means clustering of a random subset of patches from the training set.

Multi-Segmentation-based correlaton (MS-based correlaton) [8]. SIFT descriptors are extracted on a regular grid with a spacing of eight pixels and a patch size of 16 × 16 pixels. The segmentation size is set at six, and the numbers of segments were $\{2^2, 2^3, 2^4, 2^5, 2^6, 2^7\}$. The MS-based correlograms were quantized into 300 MS-based correlatons using k-means.

Dense words (including PhowGray and PhowColor) [11]. PhowGray was modeled using rotationally invariant SIFT descriptors computed on a regular grid with a step of five pixels at four scales (5, 7, 9, and 12 pixel radii), zeroing the low-contrast pixels. The descriptors were subsequently quantized into a vocabulary of 600 visual words generated by k-means clustering. PhowColor is the color version of PhowGray that stacks SIFT descriptors for each HSV color channel.

Self-SIMilarity features (SSIM). SSIM descriptors [12] were extracted on a regular grid at steps of five pixels. We acquired each descriptor by computing the correlation map of a 5 × 5 pixel patch in a window with a radius of 40 pixels, and quantizing it into 3 radial bins and 10 angular bins. In this way, we obtained a set of 30-dimensional descriptor vectors. These descriptors were then quantized into 600 visual words.

We computed all but the MS-based correlaton features in a spatial pyramid as proposed in [22]. A pyramid representation consists of several levels obtained by partitioning the image into increasingly fine non-overlapping sub-regions and computing histograms of the features found inside each sub-region. The features of each level were concatenated to build the final descriptor. We computed a three-level pyramid of spatial histograms for each feature channel. In the experiments, we divided each dataset into training and testing sets 10 times to obtain reliable results, and all the final results, as well as the per-category classification accuracy rates, were recorded as the mean and standard deviation of these 10 runs.
The features were computed using open source code [55]. All experiments in this work were implemented in Matlab 8.0 on Windows 10, and run on a workstation equipped with 4 Intel quad-core 3.3 GHz CPUs and 16 GB of memory.
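The spatial pyramid representation described above (histograms of visual words over increasingly fine non-overlapping sub-regions, concatenated across levels) can be sketched in NumPy as follows. This is an illustrative sketch only: it assumes that level $l$ splits the image into a $2^l \times 2^l$ grid, applies no per-level weighting, and L1-normalizes the final descriptor; `spatial_pyramid_histogram` and the `word_map` layout are hypothetical.

```python
import numpy as np

def spatial_pyramid_histogram(word_map, vocab_size, levels=3):
    """word_map: 2-D integer array holding the visual-word index of every
    dense-grid location. Returns the concatenated, L1-normalized histogram."""
    H, W = word_map.shape
    parts = []
    for level in range(levels):
        cells = 2 ** level                          # cells per side at this level
        row_edges = np.linspace(0, H, cells + 1).astype(int)
        col_edges = np.linspace(0, W, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                block = word_map[row_edges[i]:row_edges[i + 1],
                                 col_edges[j]:col_edges[j + 1]]
                # Histogram of visual words inside this sub-region.
                parts.append(np.bincount(block.ravel(), minlength=vocab_size))
    hist = np.concatenate(parts).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy word map on a 4 x 4 grid with a vocabulary of 3 words.
toy_map = np.arange(16).reshape(4, 4) % 3
h = spatial_pyramid_histogram(toy_map, vocab_size=3)
print(h.shape)
```

Under these assumptions a three-level pyramid has 1 + 4 + 16 = 21 sub-regions, so a 600-word vocabulary would give a 12,600-dimensional descriptor per feature channel.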

Explanation of Feature Combination
We used the UCM dataset to demonstrate the feature combination capability of MTJSLRC. For each image, we set K = 2 for the feature combination test, using the SSIM and BoVW features. These two features are complementary in terms of the co-occurrence of local patches and appearance. We used L = 4 instances for each test image by transformation, and obtained K × L = 2 × 4 representation tasks. The number of training images was varied as $n_m \in \{10, 20, 30, 40, 50, 60, 70, 80, 90\}$ per category, with the remaining images used for testing.
Figure 4 shows the classification accuracy of the individual features with SRC and of their combination with MTJSRC and MTJSLRC. The MTL-based models, including MTJSRC and MTJSLRC, improved the performance through effective feature combination. We can see that the performance improved as the training ratio increased, since more data were available for model training. Moreover, the average accuracy approached 80% when the number of training images per category was 20. This indicates that SRC and MTL can deal with the "lack of samples" problem in HRS image recognition. Compared with the MTJSRC model, our MTJSLRC method improved the classification accuracy only slightly for a low number of tasks. The low-rank structure had no significant effect relative to MTJSRC when the rank of the class-level coefficient matrix $W_m$ was less than or equal to the number of tasks. In Figure 5, several example classifications are provided for the case in which the number of training images per category was $n_m = 40$. In Figure 5(a), we can see cases in which the BoVW feature fails because of the relatively ordinary appearance of the scenes, resulting in misclassification into similar categories. In contrast, the SSIM feature uses the geometric shape information in these six samples, and thus classifies them correctly. However, the SSIM feature failed on some of the other samples shown in Figure 5(b), which the BoVW feature recognizes correctly, when the samples lacked obvious spatial arrangements and their backgrounds were cluttered. We fused the benefits of these two complementary features by combining them to improve the classification performance.

Parameter Effect
We investigated the effect of the number of iterations on classification performance (Figure 6). Although our optimization algorithm has been proved to converge to the global minimum with the optimal rate $O(1/t^2)$ in [34], it does not guarantee a monotonic decrease in the objective value. Reaching convergence therefore requires running the algorithm for several hundred iterations. However, in our experiments, we found that convergence is not necessary for good classification performance. The results in Figure 6 show that the performance first increases and then gradually drops as the number of iterations increases. The best performance on the two datasets consistently occurs at about 10 iterations. This behavior reflects the fact that the objective of our proposed method is to minimize the reconstruction error on a testing image, whereas methods based on classifier training, such as SVM, directly optimize the classification error on the training data. Thus, instead of running Algorithm 1 until convergence, we can achieve sufficient classification performance within a few iterations.

Figure 6. Classification performance of MTJSLRC against the number of iterations on the UCM and WHU-RS datasets
There are two other parameters that affect the classification performance, namely the regularization coefficients for the class-level sparsity and low-rank constraints. We analyze the effects of these parameters on the classification accuracy to choose their optimal values. These regularization coefficients determine the relative strength of the loss and regularization terms. Intuitively, there is a trade-off between the sparse structure and the low-rank structure. Let us consider two special cases of our formulation: when α = 0, the problem degenerates to a model with only a low-rank structure that learns a small number of shared features among tasks; when β = 0, the problem degenerates into a model with only a sparse structure among tasks. To take advantage of both properties, we adjust α and β to balance the sparse and low-rank structures.
We tested a series of α and β values on the UCM and WHU-RS datasets, and the classification results are shown in Figure 7. For both datasets, the sparse regularization parameter was selected from the range α ∈ {0, 0.1, 0.2, …, 1}, and the low-rank regularization parameter from β ∈ {0, 1, 2, …, 30}. From Figure 7, we can observe that MTJSLRC achieves the best results at most of the settings for these two datasets. This verifies the ability and benefit of MTJSLRC when simultaneously learning low-rank and sparse structures from multiple tasks. For the low-rank regularization coefficient, although the classification accuracy on the UCM and WHU-RS datasets fluctuates, it shows an overall trend that first improves, then reaches its maximum, and then gradually decreases. The optimal low-rank regularization coefficient was around 25 for the UCM dataset and 20 for the WHU-RS dataset for most of the sparse regularization parameter values. This demonstrates the significance of the low-rank structure for these multiple feature combination tasks based on MTL and SRC. The classification performance with respect to the sparse regularization parameter α was relatively smooth compared with that for the low-rank regularization coefficient. The overall optimal α was around 0.1 for both datasets. To better visualize this phenomenon, we fixed α = 0.1 to examine the effects of the low-rank regularization parameter β on these two datasets. As shown in Figure 8, the trend in the classification accuracy is not easy to see. This is probably because the convergence of our objective function to the minimizer is not guaranteed, and the objective value does not decrease monotonically. On the whole, however, the performance first improves and then gradually drops as β increases, and the best performance occurs at β = 24 for the UCM dataset and β = 18 for the WHU-RS dataset. The results show clearly that the multiple tasks in MTJSLRC share one low-dimensional feature space, which is assumed to have a low-rank structure in this paper. The low-rank regularization parameter β indeed had a substantial impact on the final performance, and overlooking the low-rank structure for these two datasets would have negatively affected the results.

Classification Results
We applied MTJSLRC to HRS image classification on the UCM and WHU-RS datasets. In addition, to further illustrate the effect of our method, we compared our MTJSLRC method with the following methods:


Feature combination based on independent SRC. This method can be seen as a simplification of the MTJSLRC method without the joint sparsity and low-rank structure across tasks, so the coefficients $\hat{w}$ are learned independently by SRC.


Feature combination based on MTJSRC. This method enforces the joint sparsity across tasks but ignores the low-rank structure in the multiple feature space.


The representative multiple kernel learning (MKL) method. The kernel matrices are computed as exp(−χ²(x, x′)/μ), where μ is set to the mean value of the pairwise χ² distances on the training set.
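The kernel used for the MKL baseline can be sketched as follows. This is an illustrative NumPy sketch rather than the original implementation. It assumes non-negative histogram features, the common form χ²(x, x′) = Σ (x_i − x′_i)² / (x_i + x′_i) without a 1/2 factor, and that the mean is taken over distinct training pairs. The function names are hypothetical.

```python
import numpy as np

def chi2_distance_matrix(X, Y, eps=1e-12):
    """Pairwise chi-square distances between rows of X and rows of Y.

    Assumes non-negative features; eps guards against 0/0 for empty bins.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None, :] - Y[None, :, :]
    total = X[:, None, :] + Y[None, :, :]
    return np.sum(diff ** 2 / (total + eps), axis=2)

def chi2_kernel(X_train, X_other=None):
    """exp(-chi2(x, x') / mu), with mu the mean pairwise training distance."""
    D_train = chi2_distance_matrix(X_train, X_train)
    n = D_train.shape[0]
    # Mean over distinct pairs (off-diagonal entries); this is an assumption
    # about what "pairwise" means in the text.
    mu = D_train[~np.eye(n, dtype=bool)].mean()
    if X_other is None:
        D = D_train
    else:
        D = chi2_distance_matrix(X_other, X_train)
    return np.exp(-D / mu)
```

Calling `chi2_kernel(X_train)` gives the training kernel, and `chi2_kernel(X_train, X_test)` gives the test-versus-training kernel using the same μ.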
The classification accuracy of our MTJSLRC method, along with baselines and results from several state-of-the-art methods on the UCM dataset, is shown in Table 1. The results on single features are listed in Table 1(a). We can observe that SRC-based methods yield accuracies comparable to SVM on single features. The results of the feature combination methods are tabulated in Table 1(b). All feature combination methods dramatically improve classification performance, and our MTJSLRC-based algorithm is slightly better than the SRC-based combination method, the MTJSRC-based method, and the MKL method. The independent SRC combination, a simplification of the MTJSRC- and MTJSLRC-based methods, is competitive with MKL. By considering the joint sparsity across different tasks, the MTJSRC-based algorithm is superior to the independent SRC combination method and even better than MKL, but slightly inferior to our MTJSLRC method, which takes into account the low-rank structure across multiple tasks. Like the SRC-based combination and MTJSRC-based methods, our MTJSLRC method does not require any classifier training procedure. It is thus flexible in practice, and novel reference samples can be introduced without additional effort to update the classifier.

The classification performance for individual classes on the UCM and WHU-RS datasets using our proposed MTJSLRC method with the optimal parameters described previously is shown in the confusion matrices in Figures 9 and 10. As observed, there is some confusion between certain scenes in the UCM dataset. The samples identified as storage tanks display the greatest confusion because their color, spatial, and texture information is likely to be confused with that of baseball diamonds, buildings, intersections, forests, golf courses, airplane fields, and mobile home parks. The most confused pair was medium residential and dense residential, with a misclassification rate reaching 12% because of the strong similarity of these scenes. The features used in our research were therefore not sufficient for separating these scenes, and additional features must be included in future work. The classification results on the WHU-RS dataset are illustrated in Figure 10. Based on the fusion of visual features, deserts, football fields, parks, ponds, mountains, and viaducts achieve the best results, above 97%; residential areas are confused with commercial areas, and industrial areas with residential areas. This likely results from the strong similarity of these scenes, which intuitively gives rise to weaker performance.
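The per-class analysis above reads off a row-normalized confusion matrix and its largest off-diagonal entry. The sketch below shows one way to compute both. It is illustrative only: the function names are hypothetical, class labels are assumed to be integers 0..n_classes−1, and no data from the paper are used.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """C[i, j] counts samples of true class i predicted as class j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C

def row_normalize(C):
    """Convert counts to per-class rates; empty rows stay all zero."""
    rows = C.sum(axis=1, keepdims=True)
    return C / np.maximum(rows, 1)

def most_confused_pair(C):
    """Return (true_class, predicted_class, rate) for the largest off-diagonal rate."""
    R = row_normalize(C)
    np.fill_diagonal(R, 0.0)  # ignore correct classifications
    i, j = np.unravel_index(np.argmax(R), R.shape)
    return int(i), int(j), float(R[i, j])
```

The diagonal of `row_normalize(C)` gives the per-class accuracy, which is the quantity reported when a class is said to exceed a given percentage.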

Running Time
In this experiment, we analyzed the running times of the different models on the UCM and WHU-RS datasets. As shown in Table 3, the per-query times of our method were 0.37 s for the UCM dataset and 0.38 s for the WHU-RS dataset, compared with 0.09 s and 0.12 s for the SRC combination method, and 0.1 s and 0.12 s for the MTJSRC method. The running time of the MKL method was much longer than the others on account of the required training phase.

Discussion
HRS image classification plays an important role in understanding remotely sensed images. In our work, we built a multi-task joint sparse and low-rank representation for HRS image classification. Our objective is to improve the classification accuracy by fusing multiple features and instances. The experimental results on the UCM and WHU-RS datasets indicate that the proposed MTJSLRC model is competitive with the state-of-the-art methods.
From the experiments on feature combination illustrated in Figure 4, we observe that the multi-task joint sparse representation is a simple yet effective way to fuse multiple complementary visual features and instances to improve the classification accuracy. By considering the low-rank structure, our MTJSLRC model achieves slightly more accurate results than the MTJSRC model for multiple tasks. The performance is competitive even when the number of samples for learning is small. This benefits from MTL, as it transfers knowledge from one task to another.

We tested three important parameters of the MTJSLRC method in our experiments. As shown in Figure 6, we found that full convergence is not necessary and that the algorithm can achieve good classification performance within several iterations. This means that our proposed method requires less time overall and hence is very competitive. We see from Figure 7 and Figure 8 that the two regularization parameters for the sparse structure and the low-rank structure have an impact on the final performance. As the low-rank regularization parameter increases, performance first improves and then gradually drops. The variation in performance with the joint sparse regularization parameter is relatively stable for the two datasets discussed in this paper. Our experiments show that a low-rank regularization parameter in the range 20 to 25 gives the best accuracy, and that a joint sparse regularization parameter of 0.1 is sufficient for good performance. Table 1 and Table 2 show that our method can fuse multiple complementary visual features and instances to improve classification accuracy. The proposed MTJSLRC method achieves better classification results than the MTJSRC method, which ignores the low-rank structure across tasks, and is slightly superior to MKL.
The proposed MTJSLRC method is competitive with several representative state-of-the-art approaches because it fuses multiple complementary features and instances while considering the sparse and low-rank structure across tasks. However, our MTJSLRC method is inferior in terms of computational speed when compared to the state-of-the-art methods, since an SVD is required when computing the optimal solution. In view of the computational complexity, we use 4 instances, generated by transformation, for each test image. In future work, we plan to improve MTJSLRC by developing more efficient optimization schemes that allow more instances, in order to add robustness and cope with variations in scale, translation, and rotation.
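To make the cost of the SVD step concrete, the sketch below shows the two proximal operators that typically appear in proximal gradient methods for this kind of model. It rests on assumptions not confirmed in this section: that the low-rank term is a nuclear-norm penalty and that joint sparsity is an ℓ2,1 penalty on the rows of the coefficient matrix. The names `svt` and `row_shrink` are hypothetical, and this is not the authors' solver.

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm.

    Requires a full (thin) SVD of W, which dominates the per-iteration cost.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return (U * s_shrunk) @ Vt

def row_shrink(W, tau):
    """Row-wise shrinkage: proximal operator of tau * l2,1 norm (rows).

    Rows whose Euclidean norm is at most tau are set to zero, which yields
    a sparsity pattern shared by all columns (tasks).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```

The row-wise shrinkage is linear in the size of the matrix, whereas the SVD grows much faster with its dimensions, which is consistent with the slower per-query time reported for MTJSLRC relative to MTJSRC.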

Conclusions
This paper presents the MTJSLRC algorithm for HRS image scene classification. In the MTL framework, both the low-rank structure and the sparse structure are important but quite different in nature. We note that the multi-task joint sparse and low-rank representation is a simple yet effective way to fuse multiple complementary features and instances. Compared to the MTJSRC method, which only considers the sparse structure, our proposed method can improve classification performance by learning low-rank and sparse structures simultaneously. Experiments on the UCM and WHU-RS datasets indicate that our method is competitive with several representative state-of-the-art approaches. Similar to the SRC and MTJSRC methods, our proposed method is free of classifier training, which makes it convenient to introduce novel reference samples and update the classifier. On the whole, multi-task joint sparse and low-rank representation is a promising method for scene classification with multiple features and/or instances in terms of accuracy and computational cost. In future work, we will consider additional texture, shape, or structural features that are more appropriate for HRS image scene classification. Accelerating the algorithm is another research direction, given its practical significance.

Figure 1. Flowchart of the proposed approach for HRS scene classification.

Figure 3. Example ground truth images of each scene category in the WHU-RS dataset.

Figure 4. Classification results on the UCM dataset. The MTL-based models, MTJSRC and MTJSLRC, outperform each single-task SRC model. The MTJSLRC model is only slightly better than MTJSRC owing to the low number of tasks.

Figure 5. Example classification on feature combination (with   = 40). (a) The SSIM feature classifies correctly but the BoVW feature makes an incorrect decision; (b) The SSIM feature classifies incorrectly but the BoVW feature makes the right decision. For the images in (a) and (b), our MTJSLRC model makes the right decision.

Figure 7. Classification performance of MTJSLRC against regularization parameters α and β. The x-axis (left) represents α, the y-axis (right) represents β, and the z-axis (vertical) is average classification accuracy. (a) Effect on the UCM dataset; (b) Effect on the WHU-RS dataset.

Figure 8. Classification performance of MTJSLRC against low-rank regularization parameter β while sparse regularization parameter α = 0.01. The x-axis represents the low-rank regularization coefficient β, and the y-axis is average classification accuracy.

Figure 9. Confusion matrix for the MTJSLRC method on the UCM dataset.

Figure 10. Confusion matrix for the MTJSLRC method on the WHU-RS dataset.

Table 2 .
Table 2(a) lists the results on single features, which indicate that SRC methods are competitive with SVM for single features on this dataset. Table 2(b) shows the results of the feature combination methods. Our algorithm performs comparably to the MKL method and is superior to the independent SRC combination and MTJSRC methods.

Table 3. Running time comparison (in seconds).