Joint Sparse and Low-Rank Multitask Learning with Laplacian-Like Regularization for Hyperspectral Classification

Abstract: Multitask learning (MTL) has recently provided significant performance improvements in supervised classification of hyperspectral images (HSIs) by incorporating shared information across multiple tasks. However, the original MTL cannot effectively exploit both the local and global structures of the HSI, and the class label information is not fully used. Moreover, although mathematical morphology (MM) has attracted considerable interest in feature extraction of HSIs, it remains challenging to sufficiently utilize the multiple morphological profiles obtained by various structuring elements (SEs). In this paper, we propose a joint sparse and low-rank MTL method with Laplacian-like regularization (termed sllMTL) for hyperspectral classification by utilizing three-dimensional morphological profile (3D-MP) features. The main steps of the proposed method are twofold. First, the 3D-MPs are extracted by the 3D-opening and 3D-closing operators, with different SEs adopted to yield multiple 3D-MPs. Second, the sllMTL performs classification by taking the 3D-MPs as features of different tasks. In the sllMTL, joint sparse and low-rank structures are exploited to capture the task specificity and relatedness, respectively, and a Laplacian-like regularization is added to make full use of the label information of the training samples. Experiments on three datasets demonstrate that the OA of the proposed method is at least about 2% higher than that of other state-of-the-art methods with very limited training samples.


Introduction
By simultaneously acquiring hundreds of contiguous narrow spectral wavelengths for each image pixel, hyperspectral remote sensors generate three-dimensional (3D) hyperspectral cubes containing rich spectral and spatial information. Thanks to the high spectral and spatial resolution, hyperspectral images (HSIs) have opened up new avenues for remote sensing applications, including earth observation [1], environmental science [2], agriculture [3], mineral identification [4], etc. Supervised classification [5][6][7][8], which aims at assigning an accurate class label to each pixel in the HSI using representative training samples, plays an important role in those applications. However, it still faces several challenges. For example, a large number of labeled samples is crucial to producing good classification results due to the Hughes phenomenon, but in reality, it is extremely difficult, or even impossible, to identify the labels of many samples. Other difficulties, such as the fact that the spectral signature of an identical material may vary while different materials may have similar signatures, also negatively affect the classification performance.
Much work has been carried out in the literature to address the above-mentioned dilemmas. The first category is to design reliable classifiers, among which the support vector machine (SVM) [9,10] is one of the state-of-the-art methods. The success of the SVM stems from the fact that it is robust to the high spectral dimension by using kernel tricks. Variations of the SVM-based methods have been proposed to improve the classification performance, such as incorporating spatial information by composite kernels (SVMCK) [11] or multiple kernels [12], combining the kernel matrices of a family of SVM classifiers by ensemble learning [13], and semi-supervised extensions [14,15] that exploit both labeled and unlabeled samples. However, the kernel tricks in the SVM require proper tuning of parameters. Sparse representation-based classification (SRC), pioneered by Wright et al. [16] for face recognition, has become another widely used hyperspectral classifier [17][18][19][20][21][22][23][24][25]. The SRC is well suited to classifying HSIs because it relies on the assumption that the spectral signatures of hyperspectral pixels belonging to the same class approximately lie in the same low-dimensional subspace, so that an unknown test sample can be sparsely represented by a linear combination of a few atoms from the whole training dictionary. Haq et al. [26] propose a homotopy-based SRC to alleviate the problems introduced by high-dimensional data with too few labeled samples. The contextual information of neighboring pixels can also be incorporated to compensate for the lack of sufficient training samples [18,19,22,23]. Multitask learning (MTL) [27][28][29][30][31], which can learn shared information across multiple tasks, has also been demonstrated to be very powerful. In [28], several complementary features (i.e., spectral, gradient, Gabor texture and shape) are combined in an MTL fusion to enhance the discrimination of hyperspectral pixels, while kernel joint collaborative representation with adaptively weighted MTL is proposed in [29] to obtain desirable classification results. An SVM-based MTL method is proposed in [31] to effectively utilize the rich information included in the extracted two-dimensional (2D) Gabor features. Although many contributions have improved the classification performance of MTL, existing MTL methods rarely exploit both the task specificity and relatedness, and the label information of the training samples is also not fully utilized.
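The residual-based decision rule that SRC-style classifiers share can be sketched in a few lines. The sketch below is illustrative only: it substitutes class-wise least squares for a genuine sparse (l1) solver, and the function name `src_like_label` is ours, not from any cited work.

```python
import numpy as np

def src_like_label(y, dictionaries):
    """Return the index of the class whose sub-dictionary reconstructs y
    with the smallest residual. A least-squares stand-in for a sparse
    solver: real SRC would solve an l1-regularized coding problem."""
    residuals = []
    for D in dictionaries:  # one sub-dictionary per class; columns are training samples
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals.append(np.linalg.norm(y - D @ coef))
    return int(np.argmin(residuals))
```

If the class subspaces are well separated, the correct class attains a near-zero residual while the others do not, which is exactly the subspace assumption stated above.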
The second category is to increase the class separability by feature extraction/selection. Apart from spectral profiles, the spatial context can be adopted to improve the classification results. Many spatial filters have been proposed over the past few years, such as edge-preserving filtering [5,32] and mathematical morphology (MM) processing [33,34], where the spatial-domain features are extracted by edge-preserving filters or opening/closing operators. In [32], edge-preserving filtering is conducted on the probability maps (i.e., probability representations of the classification maps), and the label of a test sample is determined by the maximum probability. In [5], a principal component analysis (PCA)-based edge-preserving method is proposed to highlight the separability of pixels. This simple yet powerful filtering approach has achieved impressive results for hyperspectral classification. The MM-based method proposed in [33,34] is termed the extended morphological profile (EMP), where opening and closing operators with different sizes of structuring elements (SEs) are performed on the first several principal components. It is worth stressing that an HSI is naturally formed as a 3D cube consisting of one spectral dimension and two spatial dimensions. However, the above-mentioned methods ignore the 3D nature of the HSI.
In this paper, we propose a joint sparse and low-rank MTL method with Laplacian-like regularization (abbreviated as sllMTL) for classification of the HSI by utilizing the 3D morphological profiles (3D-MPs). A visual illustration of the proposed method is shown in Figure 1; it consists of two major steps. In the feature extraction step, the 3D-MPs [35] of the HSI are obtained by the 3D-opening and 3D-closing operators with various SEs. Each SE produces two corresponding 3D-MPs (one obtained by the 3D-opening and the other generated by the 3D-closing). The significant difference between the existing 3D-MPs-based method [35] and our proposed method is that, rather than simply stacking multiple 3D-MPs together, we make full use of the 3D-MPs cubes simultaneously in an MTL manner. In the classification step, the sllMTL is proposed to exploit the joint sparse and low-rank structures by treating each 3D-MP as the feature of a specific task. Both sparse and low-rank constraints are imposed on the representation coefficients so as to capture the task specificity and relatedness. A Laplacian-like regularization is also adopted to take advantage of the class label information. Finally, the class label of the test sample is determined by the minimal total residual.
The advantages of the proposed method lie in the following two aspects:

• 3D-MPs are adopted for feature extraction of the HSI cube. Compared to the vector/image-based methods, we respect the 3D nature of the HSI and take the HSI cube as a whole entity to simultaneously extract the spectral-spatial features.

• Different from the existing 3D-MPs-based method, we do not simply stack multiple 3D-MPs but fully exploit the spectral-spatial feature of each 3D-MP in a novel MTL framework.

The remainder of this paper is organized as follows. Section 2 introduces two types of 3D mathematical morphology methods (i.e., 3D-opening and 3D-closing operators) to obtain multiple 3D-MPs. Section 3 describes the proposed sllMTL for hyperspectral classification in detail. Section 4 shows the experimental results, and discussions and conclusions are drawn in Sections 5 and 6.

Multiple Morphological Profiles Extracted by 3D Mathematical Morphology
Although the EMP has been successfully employed in feature extraction of the HSI, it ignores the 3D nature of the hyperspectral cube and thus cannot fully investigate the 3D spectral-spatial dependence among pixels. In this section, 3D mathematical operators (i.e., 3D-opening and 3D-closing) are applied for multiple morphological profile extraction of the HSI.
MM was initially proposed as a rigorous theoretical framework for analyzing the spatial relationships of different pixels by using SEs with given shapes and sizes as input. The relationships and structural character of an image can be captured by sliding the SEs over the image. In analogy to the 2D MM operators, the 3D extensions learn the structural characters through the movement of 3D-SEs in the image. The major advantage of the 3D-SEs lies in that they can effectively explore the spectral-spatial structure of 3D images (e.g., the HSI). Erosion and dilation are the two basic morphological operators. Suppose the original HSI cube is represented as I ∈ R^(m×n×b), where m, n and b refer to the number of rows, columns and spectral bands, respectively; the value of the cube I at a given coordinate (x_0, y_0, z_0) is I(x_0, y_0, z_0) ∈ R; B is a 3D-SE; and B(x_0, y_0, z_0) indicates the set of positions covered by B when centered at the coordinate (x_0, y_0, z_0) of I. The 3D-erosion ε_B^3D(I(x_0, y_0, z_0)) of the cube I at the coordinate (x_0, y_0, z_0) with the 3D-SE B is determined by the minimum value of the pixels inside B(x_0, y_0, z_0):

  ε_B^3D(I(x_0, y_0, z_0)) = (I ⊖ B)(x_0, y_0, z_0) = min_{(x, y, z) ∈ B(x_0, y_0, z_0)} I(x, y, z),

where "⊖" indicates the 3D-erosion operator. The 3D-dilation δ_B^3D(I(x_0, y_0, z_0)) is the dual operator, which switches the minimum to the maximum:

  δ_B^3D(I(x_0, y_0, z_0)) = (I ⊕ B)(x_0, y_0, z_0) = max_{(x, y, z) ∈ B(x_0, y_0, z_0)} I(x, y, z),

where "⊕" indicates the 3D-dilation operator. The 3D-erosion expands objects of the HSI cube that are darker than their neighborhoods, while the 3D-dilation shrinks them, and vice versa for objects that are brighter than their neighborhoods. The 3D-opening and 3D-closing operators are derived from combinations of 3D-erosion and 3D-dilation. In greater detail, the 3D-opening γ_B^3D(I) is calculated by the 3D-erosion of the HSI cube I with B followed by the 3D-dilation with B:

  γ_B^3D(I) = I ∘ B = (I ⊖ B) ⊕ B,

where "∘" indicates the 3D-opening operator.
Similarly, the 3D-closing φ_B^3D(I) is obtained by the 3D-dilation of the HSI cube I with B followed by the 3D-erosion with B:

  φ_B^3D(I) = I • B = (I ⊕ B) ⊖ B,

where "•" indicates the 3D-closing operator. The multiple 3D-MPs of I in our proposed method are generated by performing a series of 3D-opening and 3D-closing operators with different 3D-SEs. As an example, Figure 2 depicts a set of 3D-SEs with various shapes (e.g., cube, sphere) and sizes (e.g., side length L set to 3, 5 and 7). Each 3D-SE produces two corresponding 3D-MPs by means of the 3D-opening and 3D-closing, respectively.
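The 3D-opening and 3D-closing above are ordinary grayscale morphology applied to the whole cube, so they can be reproduced with SciPy's grey-morphology routines. A minimal sketch (the function name and default side lengths are our choices, not the authors' code):

```python
import numpy as np
from scipy import ndimage

def morphological_profiles_3d(cube, side_lengths=(3, 5, 7)):
    """Return a list of 3D-MPs of an HSI cube (rows x cols x bands):
    for each cubic 3D-SE, one profile from 3D-opening (erosion then
    dilation) and one from 3D-closing (dilation then erosion)."""
    profiles = []
    for L in side_lengths:
        se = np.ones((L, L, L), dtype=bool)  # cubic 3D-SE of side length L
        profiles.append(ndimage.grey_opening(cube, footprint=se))
        profiles.append(ndimage.grey_closing(cube, footprint=se))
    return profiles
```

Opening is anti-extensive and closing is extensive, so each opening profile lies at or below the input cube and each closing profile at or above it, matching the darken/brighten behavior described above. Sphere-shaped SEs are obtained by passing a spherical boolean mask as the footprint.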

Joint Sparse and Low-Rank Multitask Learning with Laplacian-Like Regularization
In this section, we propose the sllMTL method to exploit both the local and global structures of the HSI by taking the 3D-MPs obtained from the 3D-opening and 3D-closing morphological operators as features of multiple tasks. Both sparse and low-rank constraints are considered in the sllMTL to capture the task specificity and relatedness. Moreover, a Laplacian-like regularization is added to improve the classification performance by taking full advantage of the label information of the training samples.

Suppose K 3D-MPs are generated by the 3D morphological operators, so that there are K tasks in the sllMTL. For each task index k = 1, 2, ..., K, let X^k = [X_1^k, X_2^k, ..., X_J^k] ∈ R^(m_k×P) denote the training feature matrix, in which X_j^k ∈ R^(m_k×p_j), j = 1, 2, ..., J, is related to the jth class, J indicates the total number of classes contained in the HSI, m_k refers to the dimensionality of the kth task (since no feature reduction is performed, m_k equals the number of spectral bands b), p_j denotes the number of training samples belonging to class j, and Σ_{j=1}^J p_j = P is the total number of training samples. Given the unlabeled test sample Y = [Y^1, Y^2, ..., Y^K], the sllMTL method is formulated as the solution to the following problem

  min_Z (1/2) Σ_{k=1}^K ||Y^k − X^k Z^k||_2^2 + ||Z||_* + λ||Z||_{1,2} + β tr(Z^T L Z),  (5)

where Z ∈ R^(P×K) denotes the coefficient matrix to be calculated (Z^k is its kth column), and joint sparse and low-rank constraints are added to Z: ||Z||_* refers to the trace norm (also known as the nuclear norm, i.e., the sum of the singular values of a matrix), which identifies the shared characters among multiple tasks; ||Z||_{1,2} indicates the row-sparse constraint (the sum of the l2 norms of the rows of Z), which identifies the task specificities; L is the Laplacian-like matrix; and tr(·) is the trace of a matrix, i.e., the sum of the elements on the main diagonal.

The matrix L is generated from the natural assumption that the coefficients Z_p and Z_p', p, p' = 1, 2, ..., P (the pth and p'th rows of Z), belonging to the same class are closer, or more similar, than those belonging to different classes. This assumption can be realized in terms of similarity; a reasonable way is to minimize the following function

  (1/2) Σ_{p,p'=1}^P W_pp' ||Z_p − Z_p'||_2^2 = tr(Z^T L Z),

where W_pp' = 1 if the pth and p'th training samples belong to the same class and 0 otherwise, D is a diagonal matrix with D_pp = Σ_{p'=1}^P W_pp', and L is the Laplacian-like matrix obtained by L = D − W. To give an intuitive explanation, Figure 3 visually shows the W used in the three HSI datasets. It should be mentioned that the label information of the training samples is adopted to obtain the matrix W, which is then utilized to generate L (L = D − W) in Equation (5); therefore, the training samples are utilized more fully in the proposed method. Moreover, although we add a Laplacian-like regularization to the joint sparse and low-rank multitask learning method, this does not imply that the Laplacian-like regularization is more effective than the sparse and low-rank regularizations; rather, it is complementary to them. Since the low-rank and sparsity priors have achieved great success in HSI classification, we also use the joint sparse and low-rank regularizations (i.e., ||Z||_* and ||Z||_{1,2}) in the proposed method (see Equation (5)).

To solve Equation (5), we first introduce two auxiliary matrices P and Q to make the optimization problem separable and to simplify the solution procedure. In this regard, the problem in Equation (5) can be rewritten as

  min_{Z,P,Q} (1/2) Σ_{k=1}^K ||Y^k − X^k Z^k||_2^2 + ||P||_* + β tr(P^T L P) + λ||Q||_{1,2}, s.t. P = Z, Q = Z.

The linearized alternating direction method with adaptive penalty (LADMAP) [36] can be adopted to solve this problem, whose augmented Lagrangian function yields

  L(Z, P, Q) = (1/2) Σ_{k=1}^K ||Y^k − X^k Z^k||_2^2 + ||P||_* + β tr(P^T L P) + λ||Q||_{1,2} + <M_1, P − Z> + <M_2, Q − Z> + (μ/2)(||P − Z||_F^2 + ||Q − Z||_F^2),  (8)

where μ > 0 is a penalty parameter, the M_i are Lagrange multipliers, Z^k, P^k and Q^k denote the kth columns of Z, P and Q, respectively, and ||·||_F is the Frobenius norm, i.e., the square root of the sum of the absolute squares of the matrix elements. The variables Z, P and Q can be updated alternately by minimizing the augmented Lagrangian function (Equation (8)) with the other variables fixed. In case the augmented Lagrangian function is difficult to minimize with respect to a specific variable, we linearize the corresponding smooth component. Specifically, the updating process is as follows.
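As a concrete aside, the matrices W, D and L = D − W described above follow mechanically from the training-label vector; a minimal sketch (the function name is ours):

```python
import numpy as np

def laplacian_from_labels(labels):
    """Build W (W[p, p'] = 1 iff training samples p and p' share a class),
    the diagonal degree matrix D, and the Laplacian-like matrix L = D - W."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))
    return W, D, D - W
```

The quadratic form tr(Z^T L Z) then equals half the W-weighted sum of squared row differences of Z, so it vanishes exactly when same-class rows of Z coincide.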
Fix the other variables; then the variable P in the (t + 1)th iteration can be updated by minimizing L(Z^(t), P, Q^(t)). This subproblem does not have a direct closed-form solution because of the smooth component

  q(P) = β tr(P^T L P) + <M_1^(t), P − Z^(t)> + (μ/2)||P − Z^(t)||_F^2.

Inspired by the LADMAP method, minimizing L(Z^(t), P, Q^(t)) can be approximately substituted by solving the following problem

  P^(t+1) ≈ arg min_P ||P||_* + <∇_P q(P^(t)), P − P^(t)> + (η/2)||P − P^(t)||_F^2
          = arg min_P ||P||_* + (η/2)||P − (P^(t) − (1/η)∇_P q(P^(t)))||_F^2,  (11)

where q(P) is approximated by its linearization <∇_P q(P^(t)), P − P^(t)> at P^(t) plus a proximal term (η/2)||P − P^(t)||_F^2, and ∇_P q(P^(t)) denotes the gradient of q w.r.t. P. As long as η > 2β||L||_2 + μ, the above substitution is valid. Equation (11) then has a closed-form solution (singular value thresholding) formulated as

  P^(t+1) = U diag((π_1 − 1/η)_+, ..., (π_r − 1/η)_+) V^T,  (12)

where U and V are matrices with orthogonal columns, (·)_+ = max(·, 0), and r and π_i refer to the rank and singular values of P^(t) − (1/η)∇_P q(P^(t)), respectively. Fix the others; then the variable Q in the (t + 1)th iteration can be updated by

  Q^(t+1) = arg min_Q λ||Q||_{1,2} + (μ/2)||Q − (Z^(t) − M_2^(t)/μ)||_F^2,  (13)

which has the closed-form solution given by the row-shrinkage thresholding operator with threshold λ/μ.
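The two closed-form proximal steps above, singular value thresholding for the trace-norm subproblem and row-wise shrinkage for the l1,2 subproblem, are generic operators and can be sketched as follows (this is a generic sketch, not the authors' code):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*.
    Shrinks every singular value of A by tau and discards the negatives."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def row_shrink(A, tau):
    """Row-shrinkage thresholding: the proximal operator of tau * ||.||_{1,2}.
    Scales each row of A toward zero by tau in l2 norm; short rows vanish."""
    out = np.zeros_like(A)
    norms = np.linalg.norm(A, axis=1)
    keep = norms > tau
    out[keep] = A[keep] * ((norms[keep] - tau) / norms[keep])[:, None]
    return out
```

Both operators shrink toward zero, which is why sufficiently small singular values (rows) are eliminated, producing the low-rank (row-sparse) structure the regularizers encode.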
Finally, fix the others; then the variable Z in the (t + 1)th iteration can be updated by minimizing L(Z, P^(t+1), Q^(t+1)), which is a smooth least-squares problem (Equation (14)). According to Equation (14), the kth column of Z^(t+1) yields

  Z^k,(t+1) = ((X^k)^T X^k + 2μI)^(−1) ((X^k)^T Y^k + μ(P^k,(t+1) + Q^k,(t+1)) + M_1^k,(t) + M_2^k,(t)),  (15)

where I is an identity matrix.
The above-mentioned equations are the major steps for updating the variables Z, P and Q. The detailed procedure for solving the optimization problem (Equation (5)) in the proposed sllMTL method is summarized in Algorithm 1, in which μ is adjusted using an adaptive updating strategy [36] for faster convergence. Once the coefficient matrix Z is obtained, the label of the unlabeled test sample Y is determined by the minimal total residual

  class(Y) = arg min_{j=1,...,J} Σ_{k=1}^K ||Y^k − X_j^k Z_j^k||_2,  (16)

where X_j^k and Z_j^k are the parts of X^k and Z^k associated with the jth (j = 1, 2, ..., J) class.
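The minimal-total-residual rule of Equation (16) sums one class-wise residual per task; a minimal sketch (names and array shapes are illustrative assumptions, not the authors' interface):

```python
import numpy as np

def label_by_total_residual(Y_list, X_list, z_list, class_index):
    """For each class j, sum over the K tasks the residual of reconstructing
    Y^k from the class-j columns of X^k with the matching coefficients,
    then return the class with the smallest total residual."""
    classes = np.unique(class_index)
    totals = []
    for j in classes:
        mask = class_index == j  # columns (training samples) of class j
        r = 0.0
        for Y, X, z in zip(Y_list, X_list, z_list):
            r += np.linalg.norm(Y - X[:, mask] @ z[mask])
        totals.append(r)
    return classes[int(np.argmin(totals))]
```

Because the residuals are summed over the K tasks, a class only wins when it reconstructs the test sample well across all 3D-MPs, not just in a single one.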
1: Initialize: Z^(0), P^(0), Q^(0), the Lagrange multipliers and µ; set µ_max = 10^6 and t = 0
2: while convergence is not attained do
3:   Fix the others and update the variable P by Equation (12)
4:   Fix the others and update the variable Q by Equation (13)
5:   Fix the others and update the variable Z by Equation (15)
6:   Update the Lagrange multipliers M_1, M_2 and M_3
7:   Update the parameter µ by µ = min(µ_max, ρµ)
8:   Update t by t = t + 1
9: end while
10: Determine the class label of Y by Equation (16)

Dataset Description
In the experiments, we verify the performance of the proposed hyperspectral classification method on three benchmark datasets, i.e., the Indian Pines data, the University of Pavia data and the Salinas data. All three hyperspectral datasets used in the experiments have been preprocessed by distortion correction. Details of each dataset are given below.

Experimental Design
To evaluate the effectiveness of the proposed method for hyperspectral classification, we compare our method with several baseline approaches, including the EMP [34], the 3D discrete wavelet transform (3D-DWT) [37], the SVM [9], the SVMCK [11] and MTL [27]. The original spectral features (abbreviated as "Spec") are also taken into consideration in the experiments. It is worth noting that "Spec", EMP, 3D-DWT and 3D-MPs are feature extraction methods, whereas SVM, SVMCK, MTL and sllMTL are classification methods. To investigate the performance of the proposed method in both feature extraction and classification, we compare the 3D-MPs with "Spec", EMP and 3D-DWT by fixing the classifier as sllMTL. Moreover, the sllMTL is compared with the SVM, SVMCK and MTL by utilizing the 3D-MPs-based features.
In greater detail, the 3D-MPs-based feature extraction method is validated by comparison with other widely used methods, i.e., the original "Spec", the multiple features obtained by the EMP, as well as the multiple features generated by the 3D-DWT. As to the above-mentioned feature extraction methods: (1) the original "Spec" features can be viewed as a special case of the sllMTL classifier with K = 1. (2) The EMP here does not use PCA for dimensionality reduction but obtains the morphological profiles by performing the opening/closing operators with various SEs on all of the bands. (3) The 3D-DWT-based multiple cubes, consisting of subcubes with different scales, frequencies and orientations, are also used for comparison. (4) The 3D-MPs-based multiple cubes are obtained by directly performing the 3D-opening and 3D-closing on the HSI cubes using different 3D-SEs. The 3D-SEs adopted in the 3D-MPs are of cube and sphere shapes. Since there are large areas of homogeneous regions in the Indian Pines data and Salinas data, the side lengths of the 3D-SEs are set to L = 5, 9, 13, 17 and 21. Since many narrow regions are present in the University of Pavia data, smaller side lengths of 3D-SEs are chosen, i.e., L = 3, 5, 9, 13 and 17.
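The cube- and sphere-shaped 3D-SEs can be generated as boolean masks of side length L; a minimal sketch (the helper names are ours):

```python
import numpy as np

def cube_se(L):
    """Cubic 3D structuring element of side length L."""
    return np.ones((L, L, L), dtype=bool)

def sphere_se(L):
    """Spherical 3D structuring element inscribed in an L x L x L cube:
    voxels within radius (L - 1) / 2 of the center voxel."""
    r = (L - 1) / 2.0
    zz, yy, xx = np.mgrid[:L, :L, :L]
    return (xx - r) ** 2 + (yy - r) ** 2 + (zz - r) ** 2 <= r ** 2
```

The sphere is a strict subset of the cube of the same side length, so the two shapes probe the scene with genuinely different spatial supports.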
On the other hand, we test the performance of the sllMTL by fixing the input features as 3D-MPs and comparing the sllMTL with the SVM, SVMCK and MTL. Since no nonlinear kernel is used in the proposed sllMTL, we adopt the linear kernel in the SVM, SVMCK and MTL for consistency. For the SVM, which cannot handle the multiple-feature-based scenario, the multiple 3D-MPs are stacked together to form a single cube. For the SVMCK, the composite weight is set to 1/K for all of the 3D-MPs cubes. The penalty term in both SVM and SVMCK is set to 60. For the MTL, the regularization parameter λ is selected from [0.01, 0.05, 0.1, 0.2, ..., 1]. For the sllMTL, three situations are taken into account: (1) λ = β = 0 (abbreviated as sllMTL_1), i.e., only the low-rank constraint is considered; (2) λ ≠ 0, β = 0 (abbreviated as sllMTL_2), i.e., the joint sparse and low-rank constraints are considered; (3) λ ≠ 0, β ≠ 0 (abbreviated as sllMTL_3), i.e., all of the sparse, low-rank and Laplacian-like constraints are considered. In sllMTL_2 and sllMTL_3, the nonzero parameters (λ or β) are chosen from [0.01, 0.05, 0.1, 0.2, ..., 1]. Moreover, all the classification experiments are conducted on the aforementioned three hyperspectral datasets (see Section 4.1). Since class labels are difficult to identify in reality, a very limited number of labeled samples is utilized in the experiments. That is, 1% of the samples per class are randomly selected as training samples and the rest are used for testing in the Indian Pines data and Salinas data. In case 1% of the samples of some class amounts to fewer than 2, we randomly choose 2 samples as training samples. Regarding the University of Pavia data, whose available training samples (see Figure 5c) are separate from the whole ground truth data, 1% of the samples per class from the available training samples are randomly chosen for training. It should be highlighted that the experimental results in the tables are the average results of 10 independent trials to avoid possible bias. Three popular indexes (overall accuracy (OA), average accuracy (AA) and kappa coefficient (κ)) are adopted to compare the different methods quantitatively.
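The sampling protocol above (roughly 1% per class, with a floor of two samples per class) can be sketched as follows; the function name and seed handling are our own choices:

```python
import numpy as np

def stratified_split(labels, frac=0.01, min_per_class=2, seed=0):
    """Randomly pick about `frac` of the samples of every class (at least
    `min_per_class`) as training indices; the remainder is the test set."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n = max(min_per_class, int(round(frac * idx.size)))
        train.extend(rng.choice(idx, size=n, replace=False))
    train = np.array(sorted(train))
    test_idx = np.setdiff1d(np.arange(labels.size), train)
    return train, test_idx
```

Averaging over several seeds reproduces the paper's protocol of reporting the mean of 10 independent trials.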

Experimental Results
In this section, the experimental results are provided to assess the performance of the 3D-MPs for spectral-spatial feature extraction and the sllMTL for hyperspectral classification. To this end, the 3D-MPs are compared with "Spec", EMP and 3D-DWT. "Spec" utilizes the original spectral features, the EMP can incorporate spatial features, while the 3D-DWT and 3D-MPs respect the 3D nature of the HSI and exploit spectral-spatial features. As an example, Figure 7 depicts the 3D-MPs obtained by 3D-opening with both cube and sphere 3D-SEs of side length L = 5, while Figure 8 plots the 3D-MPs of the 150th band.

Second, the 3D-based feature extraction methods (i.e., 3D-DWT and 3D-MPs) achieve comparable or better performance than the 2D-based one (i.e., EMP). For instance, as observed from Table 5, the OA, AA and κ of the 3D-DWT are slightly better than those of the EMP, while the OA, AA and κ of the 3D-MPs are 4.17%, 4.19% and 5.60% higher than those of the EMP, respectively. The classification results of the 3D-DWT and 3D-MPs are also comparable to or better than those of the EMP in Tables 4 and 6. The reason for the good results of the 3D-DWT and 3D-MPs-based feature extraction methods is that they respect the 3D nature of the HSI cube by taking it as a whole entity to extract the spectral-spatial features.
Third, the 3D-MPs yield the best classification performance among all the feature extraction methods when fixing sllMTL_3 as the classifier. For instance, in the Indian Pines data (see Table 4), the OA of the 3D-MPs is 20.01%, 3.15% and 2.71% higher than those of "Spec", EMP and 3D-DWT, respectively. In the University of Pavia data (see Table 5), the OA of the 3D-MPs is 10.83%, 4.17% and 3.89% higher than those of "Spec", EMP and 3D-DWT, respectively. In the Salinas data (see Table 6), the OA of the 3D-MPs is 5.97%, 2.10% and 1.84% higher than those of the other feature extraction methods. Similarly, the AA and κ of the 3D-MPs are also much higher than those of their counterparts on the three HSI datasets. This demonstrates that multiple features obtained by varying the shapes and sizes of the SEs offer additional complementary information and thus effectively improve the performance of hyperspectral classification.
Fourth, the classification results of the SVM are inferior to the others. For instance, it can be observed from Table 4 that the OA of the SVM is about 2% to 7% lower than those of the other classifiers (i.e., SVMCK, MTL, sllMTL_1, sllMTL_2 and sllMTL_3), and the AA (or κ) of the SVM is also about 0.4% to 9% (or 2% to 8%) lower than those of its competitors. Notably, the classification results on the University of Pavia data (see Table 5) and the Salinas data (see Table 6) exhibit similar properties. This is due to the fact that the SVM cannot cope with the multiple-feature-based scenario but simply stacks the multiple 3D-MPs together to generate a single cube, while the other methods can make full use of the multiple 3D-MPs features by taking each 3D-MPs cube as the feature of a task.
Fifth, we compare the multitask-based classifiers (i.e., MTL, sllMTL_1, sllMTL_2 and sllMTL_3) by utilizing the 3D-MPs-based features. To provide intuitive insight, the coefficient matrices of a test sample from class 4 (i.e., corn) located at (35, 5) in the Indian Pines data are compared in Figure 12. The x-axis labels the task number and the y-axis labels the representation number. We can see from Figure 12 that, for an instance of a certain class, distinctive training features are active in the sllMTL_2 and sllMTL_3. The joint sparse and low-rank constraints in the sllMTL_2 and sllMTL_3 can effectively capture the task specificity and relatedness, and thus alleviate the misclassification problem. The normalized residuals of the same sample from class 4 (i.e., corn) located at (35, 5) are plotted in Figure 13, from which one can observe that, by utilizing the 3D-MPs features, the sllMTL_2 and sllMTL_3 yield more accurate classification results than the MTL and sllMTL_1. Moreover, Tables 4-6 show that the sllMTL_2 and sllMTL_3 outperform the MTL and sllMTL_1, and that the sllMTL_3 performs much better than the sllMTL_2, since a Laplacian-like regularization is added in the sllMTL_3 to take full advantage of the label information of the training samples. In addition, as displayed in Figures 9-11, the classification maps of the sllMTL_3 are closer to the ground truth (see Figures 4b, 5b and 6b) than those of the other methods.
Finally, it is worth noting that the proposed method performs much better than those in previous work (e.g., [17,34]). Although we use the same hyperspectral datasets as [17,34], the numbers of training and test samples are completely different. For instance, in the Indian Pines data, around 10% of the labeled samples are chosen for training in [17], while only 1% of the samples per class are selected as training samples in this paper. Therefore, it is not surprising that the OA of our proposed method is much lower than the results shown in Table 2 of [17]. As can be seen in Figure 5 of [17], the OA is lower than 75% when only 1% of the labeled samples in each class are chosen for training, while the OA of 3D-MPs-sllMTL_3 reaches 88.55% with the same number of training samples. Similarly, in the University of Pavia data, the number of training samples is 3921 in [34], while very limited training samples (1% of the available 3921 training samples, i.e., about 39 samples in total) are chosen in this paper. It is observed from Table 5 that the classification performance of the 3D-MPs is better than that of the EMP with the same classifier.

Statistical Significance Analysis of the Results
To discuss whether the proposed method significantly outperforms its counterparts, we apply McNemar's test to evaluate the statistical significance of the accuracy improvement. McNemar's test is a non-parametric test based upon the standardized normal test statistic

  Z = (f_12 − f_21) / sqrt(f_12 + f_21),

where f_ij indicates the number of samples classified correctly by classifier i but incorrectly by classifier j. This test allows two methods to be compared; in this paper, we compare the 3D-MPs-sllMTL_3 with the other methods one by one. If the absolute value of the McNemar's statistic is greater than 1.96 (i.e., |Z| > 1.96), the difference in accuracy between the ith and jth classifiers is considered statistically significant at the 5% level of significance. The average McNemar's values Z of the proposed method versus the other methods on the aforementioned three experimental datasets are depicted in Figure 14. Figure 14 clearly shows that Z is larger than 1.96 in all cases; therefore, the 3D-MPs-sllMTL_3 performs significantly better than the other methods according to McNemar's test.
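The standardized McNemar statistic only needs the per-sample correctness of the two classifiers; a minimal sketch (the function name is ours):

```python
import numpy as np

def mcnemar_z(correct_i, correct_j):
    """Z = (f12 - f21) / sqrt(f12 + f21), where f12 counts samples that
    classifier i gets right and classifier j gets wrong, and vice versa."""
    ci = np.asarray(correct_i, dtype=bool)
    cj = np.asarray(correct_j, dtype=bool)
    f12 = int(np.sum(ci & ~cj))
    f21 = int(np.sum(~ci & cj))
    disagreements = f12 + f21
    return (f12 - f21) / np.sqrt(disagreements) if disagreements else 0.0
```

|Z| > 1.96 then corresponds to significance at the 5% level under the standard normal approximation, the criterion used above.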

Sensitivity Analysis of the Parameters
Another important issue is the sensitivity of the parameters in the proposed 3D-MPs-sllMTL_3. There are two key parameters: λ and β. Their impacts are shown in Figure 15, whose x-axis and y-axis span the value ranges of the corresponding parameters and whose z-axis denotes the OA (%) on the various hyperspectral datasets. It can be seen from Figure 15 that the classification accuracies for the three datasets generally improve slowly as the parameters λ and β increase, and then begin to decrease after reaching their maximum values. When λ and β are very small, the low-rank constraint dominates the optimization process and the sparse and Laplacian-like regularizations are weakened; their influence becomes stronger as λ and β increase. Based on the above analysis, it is better to strike a tradeoff among the three constraints so as to achieve satisfactory classification performance.

Influence Analysis of the 3D-SEs
Since one can extract different structural features of the HSI by applying a series of 3D-SEs, it is interesting to analyze the influence of the 3D-SEs with different shapes and sizes. As an example, we discuss the influence of the 3D-SEs in the Indian Pines data. Both cube and sphere shapes with side lengths L = 5, 9, 13, 17 and 21 are adopted in the experiments. We set K = 1 in the sllMTL_3 and analyze the influence of the 3D-SEs by classifying each 3D-MP feature separately. The classification results of the different 3D-MPs are shown in Figure 16, from which we can observe that the sphere (closing) with L = 5 provides the lowest classification accuracy, while the cube (opening) with L = 17 achieves the highest classification accuracy among all the 3D-MPs. However, Table 4 shows that the combination of multiple 3D-MPs gives much better performance than using each 3D-MP feature separately. The reason is that a single 3D-MP can only reflect the characteristics of the HSI in one aspect, while multiple features can represent the characteristics of the HSI more effectively from different perspectives.

Conclusions
In this paper, we have proposed a 3D-MPs-based MTL framework, namely 3D-MPs-sllMTL, for hyperspectral image classification. On the one hand, the 3D-opening and 3D-closing operators are applied for feature extraction from the HSI cubes. Multiple SEs with different shapes and sizes are used to extract spectral-spatial features from different aspects. Compared with the vector/image-based methods in the literature, the proposed 3D-MPs-based method respects the 3D nature of the HSI by treating the HSI cube as a whole entity. On the other hand, the sllMTL method is proposed to exploit the joint sparse and low-rank structures by taking each 3D-MP as the features of a specific task. Compared with existing MTL methods, the proposed sllMTL can capture both the specificities and the shared factors of the tasks by adopting the joint sparse and low-rank constraints. The Laplacian-like regularization further improves the classification performance by making full use of the class label information. We have validated the superiority of the 3D-MPs-sllMTL approach on three benchmark hyperspectral datasets and compared our method with a number of alternatives, i.e., the original "Spec", EMP, 3D-DWT, SVM, SVMCK and MTL. The experimental results consistently demonstrate the effectiveness of the proposed method. Quantitatively, by utilizing the 3D-MPs features, the OA of sllMTL3 is at least about 4%, 3% and 2% higher than that of SVM, SVMCK and MTL, respectively. By utilizing sllMTL3, the OA of 3D-MPs is at least about 6%, 2% and 2% higher than that of "Spec", EMP and 3D-DWT, respectively. Note that the sllMTL assumes an equal contribution of all 3D-MPs-based features in the classification process; a future research direction is to investigate the importance of different features by assigning different weights to them. Extending the sllMTL to its kernel versions is also a promising research topic for further improving the classification performance. Moreover, other kinds of images (e.g., sequences of 2D images or medical images) can also be treated as 3D cubes, whose 3D-MPs features can be extracted by the 3D MM. However, the sllMTL cannot be directly applied to classify those images, because the pixels in such images usually come from different classes; one should extract and arrange the features from the same class together before applying the sllMTL. Therefore, applying the proposed method to the classification of other images is also an interesting area for future work.
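The 3D-MPs extraction summarized above can be sketched with standard grey-scale morphology. The following is a minimal illustration using scipy.ndimage; the cube and sphere SEs of side length L = 5 follow the paper, while the function names, the random toy cube and the stacking layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def make_se(shape, L=5):
    """Build a 3D structuring element of side length L ("cube" or "sphere")."""
    if shape == "cube":
        return np.ones((L, L, L), dtype=bool)
    # Sphere: voxels within radius L//2 of the center of an L x L x L grid
    r = L // 2
    zz, yy, xx = np.mgrid[-r:r + 1, -r:r + 1, -r:r + 1]
    return zz**2 + yy**2 + xx**2 <= r**2

def morphological_profiles_3d(cube, ses):
    """Stack 3D-opening and 3D-closing outputs, one pair per SE."""
    profiles = []
    for se in ses:
        profiles.append(ndimage.grey_opening(cube, footprint=se))
        profiles.append(ndimage.grey_closing(cube, footprint=se))
    return np.stack(profiles)  # shape: (2 * n_SEs, rows, cols, bands)

# Toy HSI cube: 20 x 20 pixels, 10 bands (real data, e.g., 145 x 145 x 200)
cube = np.random.rand(20, 20, 10)
ses = [make_se("cube", L=5), make_se("sphere", L=5)]
mps = morphological_profiles_3d(cube, ses)
print(mps.shape)  # (4, 20, 20, 10)
```

Because opening and closing operate on the full 3D footprint, the filtering couples the spectral and spatial neighborhoods, which is what distinguishes 3D-MPs from band-by-band 2D morphological profiles.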

Figure 1. Schematic illustration of the proposed HSI classification method.

Figure 3. The W used in (a) Indian Pines data, (b) University of Pavia data and (c) Salinas data.

Algorithm 1: LADMAP for Solving the Optimization Problem (see Equation (5)) in the Proposed sllMTL Method.
Require: The test feature matrix Y; the training feature matrices X_k, k = 1, 2, . . ., K; and the parameters λ and β.
Ensure: The solution Z and the class label of each unlabeled test sample in Y.
1: Initialize: Z
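Equation (5) is not reproduced in this excerpt, but a joint sparse and low-rank MTL objective of the kind Algorithm 1 solves typically takes the following generic form. This is an illustrative sketch only: λ and β are the trade-off parameters named in the algorithm, Y and X_k are the test and per-task training feature matrices, L is a Laplacian matrix built from the training labels, and the weight γ on the Laplacian-like term is a hypothetical extra parameter; the paper's exact Equation (5) may differ.

```latex
\min_{Z}\;
\sum_{k=1}^{K} \bigl\| Y - X_k Z_k \bigr\|_F^2
\;+\; \lambda \, \| Z \|_{1}
\;+\; \beta \, \| Z \|_{*}
\;+\; \gamma \, \operatorname{tr}\!\bigl( Z L Z^{\top} \bigr)
```

Here the l1 norm encourages task-specific sparsity, the nuclear norm promotes a low-rank (shared) structure across tasks, and the trace term pulls coefficients of same-class training samples together, which is the usual role of a Laplacian-like regularizer.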

Figure 4. Indian Pines data. (a) Three-band false color composite and (b) ground truth data with 16 classes.

Figure 5. University of Pavia data. (a) Three-band false color composite, (b) ground truth data and (c) available training samples.

Figure 6. Salinas data. (a) Three-band false color composite and (b) ground truth data with 16 classes.

Figure 8 plots the 3D-MPs of the 150th band. It can be observed from Figures 7 and 8 that the 3D-MPs highlight the structural characteristics of the HSI cube and provide smoother features in both the spectral and spatial domains.

Figure 7. 3D-MPs obtained by 3D-opening with both cube and sphere SEs of side length L = 5. (a) The original Indian Pines data, (b) the 3D-MPs obtained by the cube SE, and (c) the 3D-MPs obtained by the sphere SE.

Figure 10. Classification maps of the University of Pavia data obtained by different methods.

Figure 11. Classification maps of the Salinas data obtained by different methods.

Figure 15. The impact of parameters λ and β on the OA in (a) Indian Pines data, (b) University of Pavia data and (c) Salinas data.

1. Indian Pines data: the first dataset was gathered by the National Aeronautics and Space Administration's (NASA) Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor on 12 June 1992 over the Indian Pines test site in Northwest Indiana. It consists of 145 × 145 pixels and 220 spectral reflectance bands covering the wavelength range of 0.4-2.5 µm. The number of bands is reduced to 200 by removing the noisy and water-vapor absorption bands (bands 104-108, 150-163, and 220). The spatial resolution is about 20 m per pixel. Figure 4 depicts the three-band false color composite image together with its corresponding ground truth. The data contains 16 land-cover classes of interest and 10,366 labeled pixels, distributed unevenly from 20 to 2468 per class, which poses a big challenge to the classification problem. Table 1 displays the detailed number of samples of each class, and the background color represents the different classes of land-covers.
2. University of Pavia data: the second dataset was captured by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor on 8 July 2002 over an urban area surrounding the University of Pavia, Italy. It contains 610 × 340 pixels with a spatial resolution of 1.3 m per pixel. The original dataset has 115 spectral channels covering the range from 0.43 to 0.86 µm. The 12 noisiest bands are removed before the experiments, leaving 103 bands. Nine classes of interest are considered in this dataset. The color composite image, the ground truth data and the available training samples are depicted in Figure 5. As shown in Table 2, there are more than 900 pixels in each class, but the available training samples of each class are fewer than 600. In analogy to the Indian Pines data, the background color in Table 2 also agrees with that in Figure 5b,c.
3. Salinas data: the third dataset was acquired by the AVIRIS sensor on 8 October 1998 over the Salinas Valley, Southern California, USA. There are 224 bands in the original dataset, and 20 bands (bands 108-112, 154-167, and 224) are removed due to water absorption. The size of each band is 512 × 217 pixels with a spatial resolution of 3.7 m per pixel. The color composite image and the ground truth are plotted in Figure 6. This dataset contains 16 classes of ground truth, and the detailed number of samples in each class is listed in Table 3, whose background color corresponds to the different classes of land-covers.

Table 1. Number of samples (NoS) used in the Indian Pines data.
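The band-removal preprocessing described above amounts to simple index masking. Below is a minimal sketch for the Indian Pines case; the band indices (104-108, 150-163, 220) are taken from the text and treated as 1-based, while the synthetic random cube and the variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for the 145 x 145 x 220 Indian Pines cube
cube = np.random.rand(145, 145, 220)

# Noisy and water-vapor absorption bands (1-based, as listed in the text)
bad_1based = list(range(104, 109)) + list(range(150, 164)) + [220]
bad = [b - 1 for b in bad_1based]  # convert to 0-based array indices

# Keep every band not in the removal list
keep = np.setdiff1d(np.arange(cube.shape[-1]), bad)
clean = cube[..., keep]
print(clean.shape)  # (145, 145, 200)
```

The same pattern applies to the Salinas removal list (bands 108-112, 154-167, and 224), which drops 20 of the 224 bands.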

Table 2. NoS and available training samples (NoATS) used in the University of Pavia data.

Table 3. NoS used in the Salinas data.

Table 4. Classification accuracy (%) of different methods for the Indian Pines data; bold values indicate the best result in each row.

Table 5. Classification accuracy (%) of different methods for the University of Pavia data; bold values indicate the best result in each row.

Table 6. Classification accuracy (%) of different methods for the Salinas data; bold values indicate the best result in each row.