Local Matrix Feature-Based Kernel Joint Sparse Representation for Hyperspectral Image Classification

Abstract: Hyperspectral image (HSI) classification is one of the hot research topics in the field of remote sensing. The performance of HSI classification greatly depends on the effectiveness of feature learning or feature design. Traditional vector-based spectral–spatial features have shown good performance in HSI classification. However, when the number of labeled samples is limited, the performance of these vector-based features is degraded. To fully mine the discriminative features in the small-sample case, a novel local matrix feature (LMF) was designed to reflect both the correlation between spectral pixels and the similarity between spectral bands in a local spatial neighborhood. In particular, the LMF is a linear combination of a local covariance matrix feature and a local correntropy matrix feature, where the former describes the correlation between spectral pixels and the latter measures the similarity between spectral bands. Based on the constructed LMFs, a simple Log-Euclidean distance-based linear kernel is introduced to measure the similarity between them, and an LMF-based kernel joint sparse representation (LMFKJSR) model is proposed for HSI classification. Due to the superior performance of region covariance and correntropy descriptors, the proposed LMFKJSR shows better results than existing vector-feature-based and matrix-feature-based support vector machine (SVM) and JSR methods on three well-known HSI data sets in the case of limited labeled samples.


Introduction
Hyperspectral images (HSIs) contain hundreds of continuous spectral bands, which can provide a large amount of information for different types of applications, such as military target detection, mineral identification, fine agriculture, natural resource surveys, and so on [1]. In these applications, classification is usually needed. HSI classification assigns a land cover label to each pixel in an HSI based on a model built from the available training samples, and it has been one of the hot research topics in the field of remote sensing [2,3].
The performance of HSI classification greatly depends on the effectiveness of feature learning or feature design [4][5][6][7][8][9]. Because HSI contains spectral and spatial information, different types of spectral features, spatial features, and spectral-spatial joint features have been designed. The spectral characteristics that provide reflections of materials at specific spectral bands can be directly used for classification [10,11]. Early HSI classification methods usually use the spectral features or dimension-reduced spectral features [10,11]. Due to the spatial local homogeneous property, different spatial features have been designed to describe spatial textural or structural information [5]. The typical spatial features include morphological profiles [4], Gabor features [12], and local binary pattern (LBP) features [13]. Integrating the rich spectral and spatial information of HSI, many spatial-spectral joint feature extraction and classification methods have been proposed [2,14], such as composite kernels-based methods [15][16][17][18], joint representation-based methods [19][20][21][22], and deep learning methods [23,24]. Typical deep learning methods are convolutional neural network (CNN)-based methods [25], such as 3-D CNN [25], attention-based adaptive spectral-spatial kernel improved residual network (A2S2K-ResNet) [24], and CNN with a noise-inclined module and denoise framework (NoiseCNN) [26].
These vector-based spectral–spatial features have shown good performance in HSI classification. However, when the number of available labeled samples is limited, these vector-based features are usually no longer effective. There is an urgent need to develop advanced feature-extraction methods that can fully exploit the correlation and similarity in the spectral and spatial domains. To describe the spectral correlation in a spatial local neighborhood, local covariance matrix features have been designed [7,27]. The local covariance descriptor computes the covariance matrix of samples in a local region and uses this matrix as a feature that reflects the correlation of samples in the region [28,29]. The covariance feature is a matrix feature whose size depends only on the dimensionality of the data; therefore, it can be computed on regions with different sizes or shapes [29]. Fang et al. proposed a local covariance matrix representation (LCMR) method for spatial–spectral feature extraction of HSIs [7], where a covariance matrix of neighboring pixels in a refined local spatial neighborhood is computed as the feature for SVM classification with Log-Euclidean kernels. In [7], the original HSI data are first preprocessed by the maximum noise fraction (MNF) method to reduce the dimensionality. Rather than using MNF, Yang et al. first performed extended multi-attribute profile (EMAP) transformations to reduce the dimensionality of the original HSI and then extracted covariance matrix features on the EMAP dimension-reduced data for kernel-based joint sparse representation (KJSR) classification [27]. Considering the nonlinearity between HSI pixels, Zhang et al. proposed a local correntropy matrix representation (LCEM) method for HSI classification [30]. Correntropy is a robust similarity metric that can handle nonlinear and non-Gaussian data [31]. Peng et al. used the correntropy metric to replace the least-squares metric in the JSR model, and the resulting model is robust to both band noise and inhomogeneous pixels [20]. In [30], the correntropy matrix feature represents the correlation between spectral bands in a spatial local region and has shown excellent classification performance.
In recent decades, the JSR-based classification method has attracted much attention due to its simplicity and effectiveness [19][20][21][32]. Exploiting the similarity of neighboring pixels, JSR uses a common training dictionary to sparsely and linearly represent all neighboring pixels simultaneously and computes the class reconstruction residual for classification [19]. To cope with nonlinear problems, KJSR methods perform JSR in a kernel-induced feature space [33]. Traditional KJSR operates on vector-based features. To exploit region features, Yang et al. used the region covariance matrix feature to replace the vector feature and proposed a Log-Euclidean kernel-based KJSR (LogEKJSR) method [27]. Although different linear and nonlinear Log-Euclidean kernels are considered, the nonlinear representation ability of the covariance matrix feature itself is still insufficient [30].
In previous works [7,30], although both the covariance and correntropy matrices have been introduced to reflect the relations of features in a local region, they are different features. The covariance matrix mainly measures the correlation between neighboring pixels, while the correntropy matrix measures the nonlinear similarity (low-level and high-level similarities) between spectral variables. To make full use of local features from the pixel and variable aspects, we combine the local covariance and correntropy matrix features and design a novel local matrix feature (LMF) for a region in this paper. In the construction of the LMF, a new logarithmic transformed feature of the covariance or correntropy matrix is first defined to transfer the matrix distance computation in the Riemannian manifold to a general distance in the Euclidean space. Then, the LMF is designed as a linear combination of the local covariance matrix feature and the local correntropy matrix feature, where the former describes the correlation between spectral pixels and the latter measures the similarity between spectral bands. Based on the constructed LMFs, a simple Log-Euclidean distance-based linear kernel is introduced to measure the similarity between them, and an LMF-based kernel joint sparse representation (LMFKJSR) model is proposed for HSI classification. Due to the superior performance of region covariance and correntropy descriptors, the proposed LMFKJSR shows better results than existing vector-feature-based and matrix-feature-based support vector machine (SVM) and JSR methods on three well-known HSI data sets in the case of limited labeled samples. Figure 1 shows the framework of the proposed local matrix feature-based kernel joint sparse representation (LMFKJSR) method. The maximum noise fraction (MNF) method is first used to reduce the dimensionality of the original HSI data. Then, spatial local neighborhoods are constructed and local matrix features (LMFs) are extracted.
By computing kernels for the LMFs and performing KJSR, the HSI pixels can be classified.

Figure 1. The framework of the proposed LMFKJSR algorithm, which mainly consists of three steps, i.e., maximum noise fraction-based dimensionality reduction, local matrix feature extraction, and matrix kernel-based JSR classification.

Maximum Noise Fraction
Considering that the original HSI has high dimensionality and may contain different types of noise, the maximum noise fraction (MNF) method is used to reduce the dimensionality of HSIs and to denoise them [7,30,34]. It finds a linear transformation matrix A that reduces the dimensionality of the original data such that the signal-to-noise ratio of the lower-dimensional data is maximized.
Given observed data X ∈ R^{N×D} with N observations and D variables, assume that X can be represented as the sum of an ideal signal X_0 and a noise term E, i.e., X = X_0 + E, where X_0 and E are uncorrelated. Then, the covariance of X is

Σ_X = Σ_{X_0} + Σ_E.

The linear transformation matrix A ∈ R^{D×d} is obtained by solving the following problem:

max_A tr(A^T Σ_X A) / tr(A^T Σ_E A).  (2)

Equation (2) can be transformed into a generalized eigenvalue problem, and its solution consists of the eigenvectors corresponding to the first d largest eigenvalues of (Σ_E)^{−1} Σ_X. The dimension-reduced data are Y = XA. In the experiments, the noise covariance Σ_E is first estimated by the minimum/maximum autocorrelation factor method [7,34], and the dimension d is set to 25.
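As a minimal sketch, the MNF projection above can be implemented with a generalized eigendecomposition. The function name `mnf`, the plain sample-covariance estimate of Σ_X, and the use of `scipy.linalg.eigh` are illustrative assumptions; the paper's minimum/maximum autocorrelation factor estimator for Σ_E is not reproduced here, so the noise covariance is taken as an input.

```python
import numpy as np
from scipy.linalg import eigh

def mnf(X, Sigma_E, d=25):
    """Sketch of maximum noise fraction (MNF) dimensionality reduction.

    X:       (N, D) observed data matrix.
    Sigma_E: (D, D) estimated noise covariance (assumed given, e.g., from
             the minimum/maximum autocorrelation factor method).
    Returns the (N, d) dimension-reduced data Y = X A.
    """
    Xc = X - X.mean(axis=0)                   # center the data
    Sigma_X = (Xc.T @ Xc) / (X.shape[0] - 1)  # sample covariance of X
    # Generalized eigenproblem Sigma_X a = lambda Sigma_E a; eigh returns
    # eigenvalues in ascending order, so reverse and keep the top d.
    w, V = eigh(Sigma_X, Sigma_E)
    A = V[:, ::-1][:, :d]                     # top-d generalized eigenvectors
    return Xc @ A
```

The columns of A maximize the signal-to-noise ratio in the transformed space, matching the eigenvectors of (Σ_E)^{−1} Σ_X described above.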
As the focus of this manuscript is the local matrix feature representations, we use a simple way to eliminate the effect of spatially inhomogeneous pixels by directly deleting several inhomogeneous neighboring pixels [27]. Given a pixel z, a w_1 × w_1 spatial window centered at z is first determined. Then, the distance between each spatial neighboring pixel and the center pixel z is computed. By sorting the distances in ascending order, the first m_1 pixels with the smallest distances are retained to construct a new local neighborhood (i.e., w_1² − m_1 inhomogeneous pixels are deleted).
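The neighborhood refinement above can be sketched as follows; the function name `refined_neighborhood` and the use of the Euclidean distance to the center pixel are assumptions for illustration (the paper does not specify the distance metric in this chunk).

```python
import numpy as np

def refined_neighborhood(cube, row, col, w1=9, m1=70):
    """Keep the m1 neighbors most similar to the center pixel.

    cube: (H, W, d) dimension-reduced HSI; w1 is an odd window size.
    Returns an (m1, d) array of retained neighboring pixels, i.e.,
    up to w1*w1 - m1 inhomogeneous pixels are discarded.
    """
    r = w1 // 2
    patch = cube[max(0, row - r):row + r + 1, max(0, col - r):col + r + 1]
    pixels = patch.reshape(-1, cube.shape[2])
    center = cube[row, col]
    dist = np.linalg.norm(pixels - center, axis=1)  # distance to center pixel
    keep = np.argsort(dist)[:m1]                    # m1 smallest distances
    return pixels[keep]
```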

Local Covariance Matrix Representation
The local covariance descriptor was originally proposed to extract second-order features from local image patches [37,38]. The covariance descriptor reflects the correlation of local features and shows good discriminative performance for image classification [7,30,37].
For a local region R that contains m pixels z_1, z_2, · · · , z_m with z_i ∈ R^d, we can represent it as a matrix Z_R = [z_1, · · · , z_m]^T ∈ R^{m×d}. The covariance matrix of size d × d between these pixels is computed as

Σ_{Z_R} = (1/(m − 1)) Σ_{i=1}^{m} (z_i − μ)(z_i − μ)^T,

where μ = (1/m) Σ_{i=1}^{m} z_i is the mean pixel of the region.
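The local covariance matrix feature is a direct sample covariance over the region's pixels; a minimal sketch (function name `local_covariance` is illustrative):

```python
import numpy as np

def local_covariance(Z):
    """Local covariance matrix feature of a region.

    Z: (m, d) matrix whose rows are the m pixels of the region.
    Returns the d x d sample covariance Sigma_{Z_R}.
    """
    Zc = Z - Z.mean(axis=0)              # subtract the mean pixel mu
    return (Zc.T @ Zc) / (Z.shape[0] - 1)
```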

Local Correntropy Matrix Representation
Given two random variables u and v, the correntropy between them is defined as [31]:

V(u, v) = E[κ_σ(u − v)],  (5)

where E is the expectation operator and κ_σ(·) is the kernel function, which is usually the Gaussian kernel κ_σ(u − v) = exp(−(u − v)²/(2σ²)). In the case of the Gaussian kernel function, the Taylor series expansion of Equation (5) gives

V(u, v) = Σ_{n=0}^{∞} ((−1)^n / (2^n σ^{2n} n!)) E[(u − v)^{2n}].

The correntropy with the Gaussian kernel thus contains all the even-order moment information of u − v and hence can reflect the high-level similarities between variables.
In practice, the joint probability density function of u and v is often unknown, so the expectation in Equation (5) cannot be computed directly. Given a finite number of samples {(u_i, v_i)}_{i=1}^{m}, the correntropy can be estimated by the following empirical correntropy:

V̂(u, v) = (1/m) Σ_{i=1}^{m} κ_σ(u_i − v_i).

Correntropy can be used to measure the nonlinear similarity between variables. In the local region R or matrix Z_R, we denote z_{i,p} as the p-th component of pixel z_i and b_p = [z_{1,p}, · · · , z_{m,p}]^T ∈ R^{m×1} as a spectral variable or vector. If we regard the spectral vectors b_p and b_q as two variables, then the correntropy between them can be defined as [30]:

C(p, q) = (1/m) Σ_{i=1}^{m} κ_σ(z_{i,p} − z_{i,q}).

The local correntropy matrix representation of the features in the local region R is the d × d matrix C_{Z_R} = [C(p, q)]_{p,q=1}^{d}.
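The empirical correntropy matrix between all pairs of spectral bands can be computed in a vectorized way; a sketch assuming the Gaussian kernel (the name `local_correntropy` and the default `sigma` are illustrative):

```python
import numpy as np

def local_correntropy(Z, sigma=1.0):
    """Empirical correntropy matrix between spectral bands of a region.

    Z: (m, d) region matrix; column p is the spectral variable b_p.
    Entry (p, q) is the mean over i of exp(-(z_{i,p} - z_{i,q})^2 / (2 sigma^2)).
    """
    diff = Z[:, :, None] - Z[:, None, :]  # (m, d, d) pairwise band differences
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2)).mean(axis=0)
```

Note that the diagonal entries equal 1 by construction (each band is perfectly similar to itself), and the matrix is symmetric.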

Local Matrix Feature
To make the covariance matrix Σ_{Z_R} or correntropy matrix C_{Z_R} strictly positive definite, regularization is applied to the original matrix as [7,27]: C ← C + 10^{−3} · trace(C) · I, where I is the identity matrix. To measure the distance between symmetric positive definite (SPD) matrices, the logarithmic operation is performed on the matrices, and the Log-Euclidean distance between two SPD matrices C_1 and C_2 is [37,38]:

d_{LE}(C_1, C_2) = ||log(C_1) − log(C_2)||_F.  (10)

If we consider log(C_1) as the feature corresponding to the matrix C_1, then Equation (10) measures the Euclidean distance between the logarithmic transformed features of points in a Riemannian manifold [38].
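For an SPD matrix, the matrix logarithm can be computed through its eigendecomposition, which gives a simple sketch of the regularization and the Log-Euclidean distance above (the function names are illustrative):

```python
import numpy as np

def spd_log(C, eps=1e-3):
    """Regularize a symmetric matrix to SPD and take its matrix logarithm.

    Applies C <- C + eps * trace(C) * I, then computes log(C) via the
    eigendecomposition C = V diag(w) V^T, so log(C) = V diag(log w) V^T.
    """
    C = C + eps * np.trace(C) * np.eye(C.shape[0])
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(C1, C2):
    """Log-Euclidean distance: Frobenius norm of log(C1) - log(C2)."""
    return np.linalg.norm(spd_log(C1) - spd_log(C2), 'fro')
```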
Although both the covariance and correntropy matrices can reflect the relations of features in a local region, they are different features. The covariance matrix Σ_{Z_R} mainly measures the correlation between neighboring pixels, while the correntropy matrix C_{Z_R} measures the nonlinear similarity (low-level and high-level similarities) between spectral variables. To make full use of local features from the pixel and variable aspects, we combine the local covariance and correntropy matrix features and design a new local matrix feature for a region R as

L_R = μ log(Σ_{Z_R}) + (1 − μ) log(C_{Z_R}),  (11)

where μ is a weighting parameter. For a testing pixel h, all pixels in a spatial w_2 × w_2 neighborhood centered at h form a matrix H = [h_1, · · · , h_T] (T = w_2²). In the joint sparse representation (JSR) model [19], all neighboring pixels are assumed to be similar and can be simultaneously represented by a common dictionary as

H = XB + E,  (12)

where X = [x_1, · · · , x_M] is a dictionary matrix consisting of all training pixels, B is a row-sparse coefficient matrix, and E is the representation error. The simultaneous orthogonal matching pursuit (SOMP) algorithm can be used to solve it as [19]:

min_B ||H − XB||_F  s.t.  ||B||_{row,0} ≤ K,

where ||B||_{row,0} refers to the number of non-zero rows of B, and K is a parameter reflecting the sparsity level.
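Putting the pieces together, the local matrix feature of a region is the weighted combination of the log-covariance and log-correntropy features in Equation (11). A self-contained sketch follows; the default `mu` and `sigma` values and the helper `_spd_log` are illustrative assumptions, and μ = 1 recovers the pure covariance feature while μ = 0 recovers the pure correntropy feature.

```python
import numpy as np

def _spd_log(C, eps=1e-3):
    """Regularize to SPD (C + eps*trace(C)*I) and take the matrix log."""
    C = C + eps * np.trace(C) * np.eye(C.shape[0])
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def local_matrix_feature(Z, mu=0.7, sigma=1.0):
    """Local matrix feature of a region Z (m x d), per Equation (11):
    L = mu * log(covariance) + (1 - mu) * log(correntropy)."""
    Zc = Z - Z.mean(axis=0)
    cov = (Zc.T @ Zc) / (Z.shape[0] - 1)          # correlation between pixels
    diff = Z[:, :, None] - Z[:, None, :]          # band-wise differences
    corr = np.exp(-diff ** 2 / (2 * sigma ** 2)).mean(axis=0)  # band similarity
    return mu * _spd_log(cov) + (1 - mu) * _spd_log(corr)
```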

Kernel Joint Sparse Representation (KJSR)
It is clear that Equation (12) only performs linear representations of the neighboring pixels. To measure nonlinear relations between the neighboring pixels and the training dictionary, the kernel method is used [33]. A feature map φ is used to project all pixels into a feature space, and the kernel-based JSR (KJSR) model is

φ(H) = φ(X)B + E,

where φ(H) = [φ(h_1), · · · , φ(h_T)] and φ(X) = [φ(x_1), · · · , φ(x_M)]. Similar to the JSR, the matrix B can be solved by the following optimization problem [33]:

min_B ||φ(H) − φ(X)B||_F  s.t.  ||B||_{row,0} ≤ K.

Local Matrix Feature Based Kernel Joint Sparse Representation (LMFKJSR)
In the JSR-based model, the local neighborhood should first be constructed for the joint representation. Here, we use the same strategy as in Section 2.2 to construct a spatial local neighborhood [27]. That is, in a w_2 × w_2 window centered at a testing pixel h, the m_2 most similar pixels (i.e., h_1, · · · , h_{m_2}) are picked to form the local neighborhood pixel set. Then, we generate the local matrix features of these neighboring pixels as L_{h_k} (k = 1, · · · , m_2). For the training pixels x_1, · · · , x_M, the corresponding local matrix features are L_{x_i} (i = 1, · · · , M).
By performing the KJSR on the local matrix features, one obtains

Φ(L_H) = Φ(L_X)B + E,

where Φ(L_X) = [Φ(L_{x_1}), · · · , Φ(L_{x_M})] is the feature representation of the training set and B = [β_1, · · · , β_{m_2}] is the sparse representation coefficient matrix. The row-sparse matrix B can be obtained by solving the following problem:

min_B ||Φ(L_H) − Φ(L_X)B||_F  s.t.  ||B||_{row,0} ≤ K.  (17)

For solving problem (17), a key step is the computation of the correlation between Φ(L_{x_i}) and Φ(L_{h_j}):

κ(L_{x_i}, L_{h_j}) = tr(L_{x_i}^T L_{h_j}),

where tr is the matrix trace operator and the linear kernel is used. Denote K_{X,H} ∈ R^{M×m_2} as the kernel matrix between the training samples and the neighboring pixels, whose (i, j)-th entry is κ(L_{x_i}, L_{h_j}), and K_{X,X} ∈ R^{M×M} as the kernel matrix for the training samples with (i, j)-th entry κ(L_{x_i}, L_{x_j}). The sparse coefficient matrix B can be solved by the kernel-based SOMP algorithm [33]. Then, the reconstruction residual of the c-th class can be computed as

r_c(h) = ||Φ(L_H) − Φ(L_X)_{Ω_c} B_{Ω_c}||_F,

where Ω_c is the index set of the selected atoms belonging to class c. Based on the reconstruction residuals, the testing pixel h is classified into the class with the minimal residual:

class(h) = arg min_c r_c(h).

The pseudo-code of LMFKJSR is shown in Algorithm 1.
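The linear kernel between LMFs reduces to a Frobenius inner product, since tr(L_1^T L_2) is the sum of the elementwise products of the two matrices. A sketch of computing the kernel matrices K_{X,H} and K_{X,X} that feed the kernel-based SOMP (the function name is illustrative; the SOMP solver itself is not reproduced here):

```python
import numpy as np

def lmf_linear_kernel(feats_a, feats_b):
    """Log-Euclidean linear kernel between two lists of LMFs.

    kappa(L1, L2) = trace(L1^T L2), i.e., the Frobenius inner product,
    computed for all pairs by flattening each matrix into a vector.
    Returns K with K[i, j] = kappa(feats_a[i], feats_b[j]).
    """
    Fa = np.stack([L.ravel() for L in feats_a])  # each LMF as a row vector
    Fb = np.stack([L.ravel() for L in feats_b])
    return Fa @ Fb.T
```

With training features L_{x_i} and neighborhood features L_{h_j}, `lmf_linear_kernel` applied to the two lists yields K_{X,H}, and applied to the training list with itself yields K_{X,X}.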
CovKJSR and CEKJSR are KJSR classifiers performed on local covariance and correntropy matrix features, respectively. CovKJSR and CEKJSR are special cases of the proposed LMFKJSR (CEKJSR is also our proposed method). A2S2K and NoiseCNN are recently proposed deep learning HSI classification methods. The class-specific accuracy (CA), average accuracy (AA), overall accuracy (OA), and κ coefficient on the testing set were used for comparison.
Following [27], in the local neighborhood construction for local feature representation and joint sparse representation, the window sizes w_1 and w_2 are both set to 9, and the numbers of similar pixels are set as m_1 = 70 and m_2 = 30, respectively. The sparsity level in KJSR is set as K = 40.

Results from IP
For the IP data, 1% of the labeled samples per class were randomly selected for training (in total, 115 training samples), and the other samples were used for testing. All methods were randomly run ten times, and the averaged classification results are reported in Table 1. From the results, we can see that: (1) Among the three SVM-based classifiers, the matrix-feature-based classifiers (LCMR and LCEM) show much better results than the vector-feature-based SVM-CK. This demonstrates that the local covariance or correntropy matrix feature representation is more effective than the vector feature representation. In addition, due to its strong nonlinear similarity representation ability, LCEM shows much better results than LCMR. (2) The recently proposed deep HSI classification methods (A2S2K and NoiseCNN) show poor results due to the limited number of training samples. In particular, for Classes 7 and 9 with only three training samples, the accuracy of each method is lower than 50%. To achieve satisfactory results, deep learning methods usually need a large number of training samples. (3) By mining nonlinear relations between pixels, KJSR improves JSR. By further selecting similar pixels in the spatial neighborhood based on self-paced learning, SPKJSR improves KJSR. By exploiting matrix representations, CovKJSR and CEKJSR improve the traditional vector-feature-based KJSRs. CEKJSR shows better results than CovKJSR.
(4) Comparing CovKJSR with LCMR (or CEKJSR with LCEM), it can be seen that the KJSR methods show better results than the SVM-based methods on these data. For the IP data, there are many large homogeneous regions, and region-based characteristics can be used to improve the classification performance. Different from LCMR and LCEM, which only use the region-based matrix features, CovKJSR and CEKJSR use region-based characteristics in both the feature and classification parts. Therefore, the KJSR methods show relatively better results. (5) By combining the local covariance and correntropy matrix features, the proposed LMFKJSR improves on both methods and provides the best results. This demonstrates that the local covariance and correntropy features are complementary. (6) On the subclasses of "Corn" (Classes 2, 3, 4) and "Soybean" (Classes 10, 11, 12), the proposed LMFKJSR provides overall better results than the other methods (i.e., the best results on Classes 2, 10, and 11, and the second best results on Classes 3 and 4). The results demonstrate that local matrix feature representations exploiting both the spectral correlation with covariance features and the band similarity with correntropy features are more effective in distinguishing the subtle differences between similar materials.
The classification maps of different methods are shown in Figure 5. By examining the highlighted elliptic and rectangular regions in the CovKJSR, CEKJSR, and LMFKJSR maps, we can find that LMFKJSR combines the advantages of CovKJSR and CEKJSR: CovKJSR shows better results in the elliptic region, while CEKJSR is much better in the rectangular region, and LMFKJSR shows consistently better results in both regions. In general, the classification map of LMFKJSR is more consistent with the ground-truth map.

Results from UP
Because the UP data have a large number of samples, only 0.1% of the labeled samples per class were randomly selected for training (in total, 50 training samples), and the other samples were used for testing. The averaged classification results over ten runs are recorded in Table 2. With only 50 training samples, the traditional JSR-based methods showed worse results than the SVM-based methods because the dictionary representation ability is insufficient when training samples are limited (i.e., the number of dictionary atoms is small). The deep learning methods produce very poor results because of the lack of training samples. By combining the local covariance and correntropy matrix features, the proposed LMFKJSR method provides the best results. Compared with the KJSR-based methods, LMFKJSR improves the OA by 10% and the κ coefficient by about 13%. Although LMFKJSR provides the best results on only two classes, it has the highest AA. This shows that LMFKJSR can generate more consistent and stable results across different classes. Figure 6 shows the classification maps of different methods, where the proposed LMFKJSR produces a relatively better map than the other methods, with little "salt and pepper" noise.

Results from SA
For the SA data, only 0.1% of the labeled samples per class were randomly selected for training (in total, 66 training samples), and the other samples were used for testing. The averaged classification results over ten runs are recorded in Table 3. Except for NoiseCNN, all methods provide OAs higher than 80%. LMFKJSR and A2S2K provide the best and second best results, respectively. The classification of Classes 8 and 15 (i.e., "Grapes untrained" and "Vinyard untrained") is relatively difficult for the SA data. The traditional JSR methods show poor results on Class 15, while the matrix-feature-based KJSR methods improve on the traditional JSR methods by almost 20% in OA. From the classification maps in Figure 7, it can be seen that Classes 8 and 15 are located in the upper left of the image and are spatially adjacent; our proposed LMFKJSR provides better results on these two classes.

Here, the effect of the number of training samples on different methods is analyzed. For IP, the ratios of labeled samples per class are set as 1%, 2%, 3%, 4%, and 5%. As UP and SA have more labeled samples than IP, their ratios of labeled samples per class are relatively smaller, i.e., 0.1%, 0.2%, 0.3%, 0.4%, and 0.5%. The OAs of different methods as the ratio of training samples per class changes are shown in Figure 8. It can be seen that the proposed LMFKJSR shows consistently better results than the other methods for different numbers of training samples.

Figure 9 shows the OA of LMFKJSR versus the MNF dimension d. It can be clearly seen that the OA dramatically increases as the dimension d increases and becomes stable when d is larger than 25. In the experiments, the dimension d is set as 25.

The Effect of the Weighting Coefficient
The proposed LMFKJSR exploits the local matrix feature, which is a combination of the local covariance matrix feature and the local correntropy matrix feature. The combination coefficient µ in Equation (11) measures the relative importance of the local covariance and correntropy features. Figure 10 shows the effect of the parameter µ on LMFKJSR, where µ changes from 0 to 1 with an increment of 0.1. The best µ values for the three data sets are 0.9, 0.7, and 0.7, respectively. It should be noted that LMFKJSR reduces to CEKJSR and CovKJSR in the cases of µ = 0 and µ = 1, respectively. The OA of LMFKJSR at the optimal parameter is obviously better than that of either CEKJSR or CovKJSR.

Conclusions
In this paper, a local matrix-feature-based kernel joint sparse representation (LMFKJSR) model has been proposed for hyperspectral image classification. In the proposed LMFKJSR, a novel local matrix feature (LMF) is designed to reflect both the correlation between spectral pixels and the spectral bands. In detail, the local matrix feature is a linear combination of the local covariance matrix feature and the local correntropy matrix feature, where the former can describe the correlation between spectral pixels and the latter measures the similarity between spectral bands in a local spatial neighborhood. Based on the constructed LMFs, a simple linear kernel is introduced to measure the similarity between them, and a KJSR model is performed for classification. Compared with existing vector-feature-based and matrix-feature-based SVM and JSR methods, the proposed LMFKJSR shows better results on three well-known HSI data sets.