Land Cover Classification from Hyperspectral Images via Weighted Spatial-Spectral Kernel Collaborative Representation with Tikhonov Regularization

Abstract: Precise and timely classification of land cover types plays an important role in land resources planning and management. In this paper, nine kinds of land cover types in the acquired hyperspectral scene are classified based on the kernel collaborative representation method. To reduce the spectral shift caused by the adjacency effect when mining spatial-spectral features, a correlation coefficient-weighted spatial filtering operation is proposed. By introducing this operation into the kernel collaborative representation method with Tikhonov regularization (KCRT) and the discriminative KCRT (DKCRT) method, the weighted spatial-spectral KCRT (WSSKCRT) and weighted spatial-spectral DKCRT (WSSDKCRT) methods are constructed for land cover classification. Furthermore, aiming at the difficulty of pixel labeling in hyperspectral images, this paper attempts to establish an effective land cover classification model in the case of small-size labeled samples. The proposed WSSKCRT and WSSDKCRT methods are compared with four methods: KCRT, DKCRT, KCRT with composite kernel (KCRT-CK), and joint DKCRT (JDKCRT). The experimental results show that the proposed WSSKCRT method achieves the best classification performance, and that WSSKCRT and WSSDKCRT outperform KCRT-CK and JDKCRT, respectively, obtaining an OA over 94% with only 540 labeled training samples. This indicates that the proposed weighted spatial filtering operation can effectively alleviate the spectral shift caused by the adjacency effect and can effectively classify land cover types under the situation of small-size labeled samples.


Introduction
Accurate classification of land cover types is the key foundation of land cover mapping. Precise and timely updates of land cover mapping information provide an important basis for decision-making in land resource planning and management, environmental protection, precision agriculture, landscape pattern analysis, and so on [1][2][3]. Although the traditional land cover information collection method based on field surveys can provide accurate land cover details, it costs a great deal of manpower and time, and it cannot be carried out under some environmental conditions [1]. With the rapid development of sensor technology, various remote sensing detection technologies are emerging, and remote sensing has become one of the most important means of land cover mapping, because it can efficiently and contactlessly obtain ground object information on a large scale [4,5]. In the past few years, researchers have utilized various remote sensing data to classify and map land cover types with satisfactory results, including satellite or airborne RGB images [6,7], multispectral images [8,9], hyperspectral images [10,11], synthetic aperture radar [12,13], and multi-source remote sensing images composed of the above data sources [14,15]. Among these technologies, hyperspectral images can provide abundant spectral and spatial information about land cover objects, because they contain hundreds of narrow and continuous spectral bands [16,17]. Therefore, they have attracted extensive attention in research on land cover classification and mapping.
Land cover classification using hyperspectral images essentially assigns predefined label information to each pixel in the image. In recent years, the collaborative representation classification (CRC) model, first developed for face recognition [18], has been widely used in hyperspectral images for land cover classification and mapping. On the one hand, CRC utilizes a dictionary composed of all labeled training samples to linearly represent each test sample without considering any prior distribution of samples [19]. On the other hand, CRC employs a closed-form solution derived from ℓ2-norm minimization to solve for the dictionary representation coefficients of each test sample, which possesses higher operational efficiency and better classification performance than the sparse representation classification (SRC) model [20].
To improve the classification performance of collaborative representation (CR)-based models in hyperspectral images, Li et al. introduced a distance-weighted Tikhonov regularization into the original nearest-subspace classification (NSC, also called the pre-partitioning CR model) and CRC (also called the post-partitioning CR model), defined as NRS [21] and CRT [22], respectively. However, in real hyperspectral images, the sample data often present a nonlinear structure, and the linear representation of CR models cannot fully capture it [19]. To solve this problem, many researchers utilize the kernel trick to project the sample data into a nonlinear high-dimensional feature space, where the separability of samples is improved [19,[22][23][24]. For example, Li et al. incorporated the Gaussian radial basis function (RBF) kernel into the CRT method, denoted as KCRT, which effectively improved the separability of land cover types in hyperspectral images [22]. Based on this work, Ma et al. proposed a discriminative kernel collaborative representation method with Tikhonov regularization (DKCRT) for land cover classification, which makes the kernel collaborative representations of different land cover types more discriminative in hyperspectral images [23].
Furthermore, many studies have shown that combining spatial and spectral features can effectively improve the performance of CR models for land cover classification in hyperspectral images [22][23][24][25][26][27]. Among them, a spatial filtering operation that directly averages each pixel (the central pixel) with its spatial neighborhood pixels is a frequently used way to mine spatial-spectral features, as in KCRT with composite kernel (KCRT-CK) [22] and joint DKCRT (JDKCRT) [23]. In hyperspectral images, although each central pixel and its adjacent pixels belong to the same class with high probability, the neighborhood usually includes some pixels of classes different from that of the central pixel. Moreover, when acquiring the spectral information of ground objects in the same scene, the hyperspectral sensor simultaneously collects the direct reflection power from the central pixel and the indirect diffuse reflection powers from its adjacent pixels. Therefore, the spectral curve of the central pixel exhibits a spectral shift caused by these adjacent pixels, which is called the adjacency effect [28]. If the spatial adjacent pixels are averaged directly, the reconstructed central pixel (i.e., the spatial-spectral features) will contain a large amount of noise caused by this spectral shift, which degrades the performance of CR models for land cover classification. For each central pixel, the adjacent pixels of different classes increase its spectral shift, while the adjacent pixels of the same class help to reduce it [29].
Inspired by reference [29], this paper proposes a weighted spatial-spectral kernel collaborative representation method with Tikhonov regularization that mines spatial-spectral features for land cover classification, instead of directly averaging each central pixel with its spatial neighborhood pixels.
In addition, machine learning algorithms, especially deep learning, usually need sufficient labeled training samples to establish robust hyperspectral land cover classification models. However, in practical hyperspectral applications, labeling pixels is very difficult and usually consumes a great deal of manpower and time [30]. Therefore, the lack of labeled samples is a great challenge in hyperspectral image classification [31]. To address this, this paper attempts to use the proposed method to establish an effective land cover classification model in the case of small-size labeled samples.
The main contributions of this paper are as follows: (1) A correlation coefficient-weighted spatial filtering operation is proposed to mine spatial-spectral features, which effectively reduces the spectral shift of the reconstructed central pixel. (2) By introducing the weighted spatial filtering operation into the KCRT and DKCRT methods, the weighted spatial-spectral KCRT (WSSKCRT) and weighted spatial-spectral DKCRT (WSSDKCRT) methods are proposed for land cover classification. (3) With optimized parameters, the proposed methods can effectively classify land cover types from hyperspectral images in the case of small-size labeled samples.

Data Collection
The hyperspectral scene in the experiment was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor mounted on a flight platform over Pavia University in northern Italy. The spatial size of the scene is 610 × 340 pixels, with a high spatial resolution of 1.3 m, and it originally consists of 115 spectral bands. After removing 12 bands with high noise and water absorption, the remaining 103 bands, ranging from 0.43 to 0.86 µm, are used for the establishment and analysis of land cover classification models. The scene contains nine land cover types with 42,776 labeled pixels: Asphalt, Meadows, Gravel, Trees, Painted metal sheets, Bare Soil, Bitumen, Self-Blocking Bricks, and Shadows. The false-color image and ground truth of the acquired hyperspectral scene are shown in Figure 3a,b, respectively.
In hyperspectral scenes, each pixel represents a sample of one class. To satisfy the situation of small-size labeled samples, 60 labeled pixels in each class are randomly selected as training samples and the remaining pixels are used as test samples. The specific division of samples is shown in Table 1.

Principle of the Original KCRT Method
Suppose a hyperspectral scene contains C classes with N labeled training samples, expressed as X = [x_1, x_2, ..., x_N] ∈ R^(d×N), where d is the dimension of the hyperspectral data (i.e., the number of spectral bands). The training set of the lth class (l = 1, 2, ..., C) is denoted as X_l = [x_{l,1}, x_{l,2}, ..., x_{l,N_l}] ∈ R^(d×N_l), where N_l is the number of training samples in the lth class, i.e., ∑_{l=1}^C N_l = N. The essential idea of KCRT is to map each sample into a kernel-induced high-dimensional feature space through a nonlinear mapping function Φ, enhancing the class separability. Each mapped test sample Φ(y) is then linearly represented over the dictionary constructed by the mapped training samples Φ(X) = [Φ(x_1), Φ(x_2), ..., Φ(x_N)]. According to the definition of kernel functions [22,23], the inner product of any two mapped samples can be expressed as a kernel function, which must satisfy Mercer's conditions; the kernel used in KCRT is the Gaussian radial basis function (RBF), k(x_i, x_j) = exp(−||x_i − x_j||_2^2 / (2σ^2)). The representation coefficient vector α is obtained by solving

α* = argmin_α ||Φ(y) − Φ(X)α||_2^2 + λ ||Γ_{Φ(y)} α||_2^2,

where λ is a global regularization parameter that balances the residual term against the regularization term, and the Tikhonov regularization matrix Γ_{Φ(y)} in the kernel feature space is diagonal with entries

[Γ_{Φ(y)}]_{i,i} = [k(y, y) − 2k(x_i, y) + k(x_i, x_i)]^{1/2}, i = 1, 2, ..., N.
Then, the representation coefficient vector α can be calculated with the closed-form solution

α = (K + λ Γ_{Φ(y)}^T Γ_{Φ(y)})^{−1} k(X, y),

where K = Φ(X)^T Φ(X) ∈ R^(N×N) is the Gram matrix of the training samples and k(X, y) = [k(x_1, y), k(x_2, y), ..., k(x_N, y)]^T ∈ R^(N×1). Finally, the obtained representation coefficient vector α ∈ R^(N×1) is divided into C class-specific representation coefficient vectors α_1, α_2, ..., α_C according to the label information in the training set. The mapped class-specific training samples Φ(X_l) and the corresponding coefficient vector α_l are used to reconstruct the test sample y, and the class with the minimal reconstruction error is assigned to y:

class(y) = argmin_{l=1,...,C} ||Φ(y) − Φ(X_l) α_l||_2^2 = argmin_{l=1,...,C} [k(y, y) − 2 α_l^T k(X_l, y) + α_l^T K_l α_l], (5)

where Φ(X_l) = [Φ(x_{l,1}), Φ(x_{l,2}), ..., Φ(x_{l,N_l})] is the kernel sub-dictionary constructed from the training samples of the lth class, K_l = Φ(X_l)^T Φ(X_l) ∈ R^(N_l×N_l) is the Gram matrix of the lth class, and k(X_l, y) = [k(x_{l,1}, y), k(x_{l,2}, y), ..., k(x_{l,N_l}, y)]^T ∈ R^(N_l×1).
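The KCRT pipeline described above (RBF kernel, Tikhonov matrix, closed-form solve, minimum-reconstruction-error decision) can be sketched in NumPy. This is an illustrative reimplementation of the formulas, not the authors' code; the kernel width sigma, the regularization weight lam, and the toy data are assumptions:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the columns of A (d x m) and B (d x n)."""
    d2 = (np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :]
          - 2.0 * A.T @ B)
    return np.exp(-d2 / (2.0 * sigma**2))

def kcrt_classify(X, labels, y, lam=1e-3, sigma=1.0):
    """Classify a test pixel y (length d) with KCRT given the dictionary X (d x N)."""
    y = y[:, None]
    K = rbf_kernel(X, X, sigma)                 # Gram matrix, N x N
    kXy = rbf_kernel(X, y, sigma).ravel()       # k(X, y), length N
    kyy = 1.0                                   # k(y, y) = 1 for the RBF kernel
    # Tikhonov weights: Gamma_ii = ||Phi(y) - Phi(x_i)|| in the kernel space
    gamma2 = kyy - 2.0 * kXy + np.diag(K)
    alpha = np.linalg.solve(K + lam * np.diag(gamma2), kXy)  # closed-form solution
    # minimum kernel-space reconstruction error over the classes (formula (5))
    errs = {}
    for l in np.unique(labels):
        idx = np.where(labels == l)[0]
        a_l = alpha[idx]
        errs[l] = kyy - 2.0 * a_l @ kXy[idx] + a_l @ K[np.ix_(idx, idx)] @ a_l
    return min(errs, key=errs.get)
```

Note that the classifier never needs Φ explicitly: every term in the residual expands into kernel evaluations, which is why the Gram matrix K and the vector k(X, y) suffice.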

Principle of the Original DKCRT Method
On the basis of the KCRT method, the DKCRT method adds a discriminative regularization term that accounts for the correlation among different classes, so as to enhance class separability. The optimization problem for the representation coefficient vector α in the kernel feature space takes the form

α* = argmin_α ||Φ(y) − Φ(X)α||_2^2 + λ ||Γ_{Φ(y)} α||_2^2 + β ∑_{l=1}^C ||Φ(y) − Φ(X_l) α_l||_2^2,

where β is a positive regularization parameter controlling the contribution of the discriminative regularization term. The closed-form solution for α can be expressed as

α = (K + λ Γ_{Φ(y)}^T Γ_{Φ(y)} + β Q)^{−1} (1 + β) k(X, y),

where Q is the block-diagonal matrix assembled from the class-specific Gram matrices:

Q = blockdiag(K_1, K_2, ..., K_C) ∈ R^(N×N).

As with the KCRT method, the test sample y is classified using formula (5).
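A minimal sketch of the DKCRT coefficient solve, assuming (as in the reconstruction above) that Q is the block-diagonal matrix of class-specific Gram matrices; with β = 0 it reduces to the KCRT solution:

```python
import numpy as np

def dkcrt_alpha(K, kXy, gamma2, labels, lam=1e-3, beta=1e-2):
    """DKCRT representation coefficients (sketch).
    K: N x N Gram matrix; kXy: k(X, y), length N;
    gamma2: squared Tikhonov weights, length N; labels: class label per column."""
    N = K.shape[0]
    # Q: block-diagonal matrix built from the class-specific Gram matrices K_l
    Q = np.zeros((N, N))
    for l in np.unique(labels):
        idx = np.where(labels == l)[0]
        Q[np.ix_(idx, idx)] = K[np.ix_(idx, idx)]
    A = K + lam * np.diag(gamma2) + beta * Q
    return np.linalg.solve(A, (1.0 + beta) * kXy)
```

The (1 + β) factor on the right-hand side follows from differentiating the objective: both the global residual and the discriminative term contribute a −2 k(X, y) gradient component.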

Principle of the Original KCRT-CK and JDKCRT Methods
KCRT and DKCRT use only the spectral features of hyperspectral images to classify land cover types, while KCRT-CK and JDKCRT combine spatial and spectral features to improve classification accuracy. Both KCRT-CK and JDKCRT mine spatial-spectral features using a spatial filtering operation that directly averages each pixel (the central pixel) with its spatial neighborhood pixels. Specifically, suppose that Ψ = {x_{0,0}, x_{0,1}, ..., x_{0,n×n−1}} is the spatial neighborhood pixel set corresponding to the central pixel x_{0,0} under a window of n × n; note that Ψ contains the central pixel x_{0,0} itself. The mean of all samples in Ψ is calculated as

x̄_0 = (1 / (n × n)) ∑_{i=0}^{n×n−1} x_{0,i},

where x̄_0 is the reconstructed central pixel (i.e., the spatial-spectral features of x_{0,0}). In this way, all pixels in the hyperspectral scene are traversed. After that, the KCRT and DKCRT methods are used to classify the land cover types in the scene.
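The plain averaging operation above can be sketched for a whole image cube; edge pixels are handled here by replicate padding, which is one common convention (an assumption, since the text does not specify border handling):

```python
import numpy as np

def mean_spatial_filter(cube, n=3):
    """Average each pixel with its n x n spatial neighborhood (edge-padded).
    cube: H x W x d hyperspectral image; returns the filtered cube."""
    pad = n // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(cube, dtype=float)
    # sum the n*n shifted copies of the image, then divide by the window size
    for di in range(n):
        for dj in range(n):
            out += padded[di:di + cube.shape[0], dj:dj + cube.shape[1], :]
    return out / (n * n)
```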

Principle of the Proposed WSSKCRT and WSSDKCRT Methods
As introduced in the Introduction, the spatial neighborhood of a central pixel usually includes some pixels of classes different from that of the central pixel, and these different-class pixels increase the spectral shift of the central pixel. Therefore, the central pixel reconstructed by directly averaging adjacent pixels (i.e., the spatial-spectral features) contains a large amount of noise caused by spectral shift. To solve this problem, a correlation coefficient-weighted spatial filtering operation is proposed in this paper and introduced into the KCRT and DKCRT methods, yielding WSSKCRT and WSSDKCRT, respectively. The specific formulation is as follows. As before, suppose the spatial neighborhood pixel set of a central pixel x_{0,0} under a window of n × n is Ψ = {x_{0,0}, x_{0,1}, ..., x_{0,n×n−1}}. Firstly, the correlation coefficient between each pixel in Ψ and the central pixel is calculated. The results can be expressed as R = {r_{0,0}, r_{0,1}, ..., r_{0,n×n−1}}, where r_{0,0} is the correlation coefficient of the central pixel with itself (i.e., 1), and r_{0,i} is the correlation coefficient between the central pixel x_{0,0} and the spatial neighborhood pixel x_{0,i} (i = 1, 2, ..., n × n − 1). The larger the absolute value of the correlation coefficient between a neighborhood pixel x_{0,i} and the central pixel x_{0,0}, the higher the probability that it belongs to the same class as the central pixel, so that neighborhood pixel should be assigned a larger weight.
To prevent negative correlation values from influencing the weights, the absolute values of the obtained correlation coefficients are normalized to the range 0-1 and used as the weights of the corresponding neighborhood pixels:

w_{0,i} = |r_{0,i}| / ∑_{j=0}^{n×n−1} |r_{0,j}|, i = 0, 1, ..., n × n − 1.

The reconstructed central pixel x̄_0 can then be calculated as

x̄_0 = ∑_{i=0}^{n×n−1} w_{0,i} x_{0,i}.

All pixels in the hyperspectral scene are traversed using this weighted spatial filtering operation to mine spatial-spectral features. Finally, the land cover types in the scene are classified by the KCRT and DKCRT methods.
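For one window, the weighted reconstruction can be sketched as follows. Pearson correlation is assumed as the correlation coefficient, and the weights are the normalized absolute correlations as in the formulas above:

```python
import numpy as np

def weighted_spatial_filter_pixel(neighborhood):
    """Correlation-weighted reconstruction of one central pixel (sketch).
    neighborhood: m x d array of the n*n pixels in the window, with the
    central pixel in row 0 (as in the set Psi above)."""
    center = neighborhood[0]
    # Pearson correlation between each neighbor spectrum and the central spectrum
    r = np.array([np.corrcoef(center, p)[0, 1] for p in neighborhood])
    w = np.abs(r)
    w = w / w.sum()            # normalize |r| so the weights sum to 1
    return w @ neighborhood    # weighted average = reconstructed central pixel
```

Because the weights sum to 1, a neighborhood of pixels perfectly correlated with the center reduces to the plain mean, while weakly correlated (likely different-class) neighbors are down-weighted.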

Hyperspectral Data Preprocessing
In a hyperspectral scene, the spectral curves of the same ground object usually exhibit amplitude shift due to differences in geometric structure and smoothness, which degrades the performance of classification models [29]. To illustrate amplitude shift, pixels of three land cover types, i.e., Asphalt, Gravel, and Bare Soil, are selected from the acquired hyperspectral scene, with 60 pixels randomly selected per type. The spectral response curves are shown in Figure 1a-c. It can be seen that the shape and trend of the spectral curves of the same ground object are basically invariant, but the reflectance level (i.e., spectral amplitude) differs obviously, which is the amplitude shift mentioned above. To alleviate it, the amplitude normalization (AN) method proposed in reference [29] is used to preprocess the original hyperspectral data. The principle of AN is as follows: assuming that x_i is a pixel of one class in the hyperspectral scene, its amplitude is normalized by

x̂_i = x_i / ((1/d) ∑_{b=1}^d x_{ib}), (12)

where x̂_i is the pixel preprocessed by AN, x_{ib} represents the reflectance of the bth band, and d represents the number of bands. The amplitudes of all ground object pixels in the acquired hyperspectral scene are normalized using formula (12). As shown in Figure 1d-f, the spectral curves of the same ground object become compact and concentrated after AN preprocessing, which indicates that the amplitude shift is effectively alleviated. Therefore, the spectral data preprocessed by the AN method are used for subsequent modeling and analysis.
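A minimal sketch of the AN step, under the reading of formula (12) above (each spectrum divided by its mean reflectance over the d bands):

```python
import numpy as np

def amplitude_normalize(pixels):
    """Amplitude normalization: divide each spectrum by its mean reflectance.
    pixels: m x d array of m pixel spectra over d bands."""
    mean_amp = pixels.mean(axis=1, keepdims=True)
    return pixels / mean_amp
```

With this normalization, two spectra that differ only by a multiplicative amplitude factor map to the same curve, which is exactly the effect shown in Figure 1d-f.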

Parameter Optimization
To verify the effectiveness of the proposed WSSKCRT and WSSDKCRT methods for land cover classification, the classification performance is compared with that of the KCRT, DKCRT, KCRT-CK, and JDKCRT methods. In addition, to ensure the fairness of the experiment, the classification performance of all methods is compared under the corresponding optimal parameters.
There are three main parameters (i.e., λ, β, and the spatial filtering window size T) that significantly impact the classification performance of the above-mentioned methods: λ is the main parameter for KCRT; λ and β are the main parameters for DKCRT; λ and T are the main parameters for KCRT-CK and WSSKCRT; and λ, β, and T are the main parameters for JDKCRT and WSSDKCRT. These parameters are optimized using 540 labeled training samples randomly selected from the hyperspectral scene and a five-fold cross-validation strategy. During optimization, λ and β are chosen from the candidate set {10^−7, 10^−6, 10^−5, 10^−4, 10^−3, 10^−2, 10^−1, 1}, and the window size T is chosen from {3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11}. The classification performance of each method under different parameters is shown in Figure 2. For JDKCRT and WSSDKCRT, three parameters (λ, β, and T) are optimized simultaneously, so surfaces of different colors represent the corresponding window sizes T, as shown in Figure 2c,e. An asterisk (*) marks the position of the optimal parameters for each method in the three-dimensional graphs. The optimal parameter settings for each method are listed in Table 2.
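The grid search with five-fold cross-validation described above can be sketched generically. The scoring callback, data, and grids here are placeholders standing in for training and evaluating one of the methods on one fold:

```python
import itertools
import numpy as np

def five_fold_grid_search(score_fn, X, y, lam_grid, T_grid, n_splits=5, seed=0):
    """Pick (lam, T) maximizing mean cross-validated accuracy (sketch).
    score_fn(X_tr, y_tr, X_va, y_va, lam, T) -> accuracy; a placeholder
    for fitting and scoring one candidate parameter setting on one fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_splits)
    best, best_score = None, -np.inf
    for lam, T in itertools.product(lam_grid, T_grid):
        scores = []
        for k in range(n_splits):
            va = folds[k]
            tr = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            scores.append(score_fn(X[tr], y[tr], X[va], y[va], lam, T))
        mean = float(np.mean(scores))
        if mean > best_score:
            best, best_score = (lam, T), mean
    return best, best_score
```

A third loop over the β grid would handle the three-parameter methods (JDKCRT and WSSDKCRT) in the same way.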



Land Cover Classification
The above-mentioned methods classify the land cover types in the acquired hyperspectral scene under their corresponding optimal parameters. Individual class accuracy, overall accuracy (OA), average accuracy (AA), and the kappa statistic (Kappa) are employed to evaluate the classification performance of each method. To avoid random error and bias, each method is run 10 times; in each run, 60 pixels per class are randomly selected as training samples and the remaining pixels are taken as test samples. The average of the results of these 10 runs is taken as the final classification accuracy. The classification results of each method are shown in Table 3 and Figure 3, with the best results highlighted in Table 3. It can be seen from Table 3 that the proposed WSSKCRT method achieves the best classification performance, with an OA, AA, and Kappa of 95.69%, 95.56%, and 0.9429, respectively. Additionally, the classification map obtained by WSSKCRT exhibits the least classification noise, as shown in Figure 3h. Moreover, the classification performance of WSSKCRT and WSSDKCRT is better than that of KCRT-CK and JDKCRT, respectively, which indicates that the proposed weighted spatial filtering operation can effectively alleviate the spectral shift caused by the adjacency effect when mining the spatial-spectral features of hyperspectral images. DKCRT and KCRT show the worst classification performance because they do not consider spatial features, and their classification maps contain more classification noise, as shown in Figure 3c,d.
In addition, all methods utilize only 540 labeled training samples (60 training samples per class) to establish the land cover classification models and classify the remaining 42,236 ground object samples. In this case, the proposed WSSKCRT and WSSDKCRT methods achieve promising classification performance with an OA over 94%, which indicates that the proposed methods can effectively classify land cover types under the situation of small-size labeled samples.

Conclusions
In this paper, land cover types are classified using hyperspectral images and the kernel collaborative representation method. The conclusions are summarized as follows: (1) The proposed WSSKCRT method achieves the best classification results, with an OA, AA, and Kappa of 95.69%, 95.56%, and 0.9429, respectively. (2) WSSKCRT and WSSDKCRT outperform KCRT-CK and JDKCRT, respectively, which indicates that the proposed weighted spatial filtering operation can effectively alleviate the spectral shift caused by the adjacency effect when mining the spatial-spectral features of hyperspectral images. (3) The WSSKCRT and WSSDKCRT methods obtain an OA over 94% with only 540 labeled training samples, which indicates that the proposed methods can effectively classify land cover types under the situation of small-size labeled samples.
The experimental results show that the proposed WSSKCRT and WSSDKCRT methods can effectively alleviate the spectral shift caused by the adjacency effect and can effectively classify land cover types under the situation of small-size labeled samples. However, like traditional collaborative representation methods, WSSKCRT and WSSDKCRT use the labeled training samples of all classes to construct the dictionary that represents and classifies each test sample; classes irrelevant to a test sample may therefore degrade the classification performance of the collaborative representation models to some extent. In follow-up research, we will focus on exploring an appropriate nearest-neighbor collaborative representation mechanism, that is, using only the classes nearest to each test sample to represent and classify it, so as to eliminate irrelevant classes and further improve classification performance. In addition, the proposed methods achieve effective classification of land cover types in a single hyperspectral scene with uniform spatial resolution and a relatively small size; their classification performance on hyperspectral scenes with different spatial resolutions and larger regions needs further analysis and study.

Conflicts of Interest:
The authors declare no conflict of interest.