Spectral-Similarity-Based Kernel of SVM for Hyperspectral Image Classification

Abstract: Spectral similarity measures can be regarded as potential metrics for kernel functions, and can be used to generate spectral-similarity-based kernels. However, spectral-similarity-based kernels have not received significant attention from researchers. In this paper, we propose two novel spectral-similarity-based kernels based on the spectral angle mapper (SAM) and spectral information divergence (SID) combined with the radial basis function (RBF) kernel: the power spectral angle mapper RBF (Power-SAM-RBF) and normalized spectral information divergence-based RBF (Normalized-SID-RBF) kernels. First, we prove these spectral-similarity-based kernels to be Mercer's kernels. Second, we analyze their efficiency in terms of local and global kernels. Finally, we consider three hyperspectral datasets to analyze the effectiveness of the proposed spectral-similarity-based kernels. Experimental results demonstrate that the Power-SAM-RBF and SAM-RBF kernels can obtain an impressive performance, particularly the Power-SAM-RBF kernel. For example, when the ratio of the training set is 20%, the kappa coefficient of the Power-SAM-RBF kernel (0.8561) is 1.61%, 1.32%, and 1.23% higher than that of the RBF kernel on the Indian Pines, University of Pavia, and Salinas Valley datasets, respectively. We present three conclusions. First, the superiority of the Power-SAM-RBF kernel compared to other kernels is evident. Second, the Power-SAM-RBF kernel can provide an outstanding performance when the similarity between spectral signatures in the same hyperspectral dataset is either extremely high or extremely low. Third, the Power-SAM-RBF kernel provides even greater benefits compared to other commonly used kernels when the sizes of the training sets increase. In future work, multiple kernels combined with the spectral-similarity-based kernel are expected to provide better hyperspectral classification.


Introduction
Hyperspectral data, which span the visible to infrared spectrum and cover hundreds of bands, can provide important spectral information regarding land cover. Hyperspectral sensors record the collected information as a series of images; these images provide the spatial distribution of solar radiation reflected from a point of observation [1]. Such a high-dimensional spectral feature space is suitable for a wide range of applications, including land-cover classification [1], ground target detection [2], anomaly detection [3], and spectral unmixing [4].
The high dimensionality of hyperspectral data also represents a significant challenge for image classification [5,6]. Classification performance is strongly affected by the dimensionality of the feature space (e.g., the Hughes phenomenon [7]). This problem can typically be alleviated by employing feature extraction to reduce the dimensionality of the hyperspectral images (HSIs) while maintaining as much valuable information as possible. Conventional statistical approaches, such as k-nearest neighbors, maximum likelihood (ML) or Bayes classification [8,9], and random forests [10], are then used to perform HSI classification.
Two impressive families of methods for HSI classification are kernel-based methods and spectral similarity measures; both are relatively robust to the Hughes phenomenon. Kernel-based methods, such as support vector machines (SVMs) [11], kernel Fisher discriminant (KFD) analysis [12], support vector clustering (SVC) [13], the regularized AdaBoost (Reg-AB) algorithm [14], and other kernel-based methods [15,16], exhibit strong robustness to the Hughes phenomenon and provide elegant ways to handle nonlinear problems [7]. Such methods have attracted significant attention because they provide superior and stable performance for HSI classification. Among these methods, SVMs are the best suited for high-dimensional data classification when the training samples are limited [17,18].
The key to the SVM method lies in the kernel function, which has attracted attention for its ability to solve nonlinear problems and which determines the mapping between the input space and the high-dimensional feature space. Commonly used kernels include the linear, polynomial, radial basis function (RBF), and sigmoid kernels, although other single kernel functions are used in specific applications. For example, the Fisher kernel [12,19] uses the gradient of the log-likelihood with respect to the parameters of a generative model as a feature for discriminative classifiers [20]. Lodhi et al. [21] proposed a string subsequence kernel for categorizing text documents. Additionally, Wahba et al. [22] proposed an analysis-of-variance kernel that defines joint kernels from existing kernels. Other kernels include the Matérn kernel [23], histogram intersection (HI) kernel [24], and Laplacian kernel [25]. In HSI classification, a discrete space model has been combined with an SVM [26], and Xia [27] proposed a rotation-based SVM.
However, some kernels are limited by the complexity of images. Therefore, a number of multiple-kernel methods have been developed for disease prediction [28], electroglottograph signal classification [29], anomaly detection [30], genomic data mining [31], and kinship verification [32]. Multiple-kernel-based SVMs have also been widely applied to HSI classification [33,34], because a single kernel is rarely able to fit complex data structures [35]. For example, subspace multiple-kernel learning (MKL) [36] uses a subspace method to obtain the weights of the base kernels in a linear combination. Nonlinear MKL learns an optimal combined kernel from predefined linear kernels to achieve better inter-scale and inter-structural similarity among extended morphological profiles [37]. Other MKL methods include sparse MKL [38], class-specific MKL [39], and ensemble MKL [40].
Researchers have also used spectral similarity measures as kernel functions for SVMs in HSI classification. Mercier and Lennon [54] proposed two mixture kernels based on the spectral angle mapper (SAM)-based RBF (SAM-RBF) and spectral information divergence (SID)-based RBF (SID-RBF) kernels. Fauvel et al. [55] also used the SAM-RBF kernel for HSI classification; their results indicated that the SAM-RBF kernel is inferior to the RBF kernel. However, we experimentally determined that spectral-similarity-based kernels still have certain advantages for HSI classification. We also propose two novel types of kernels for HSI classification based on spectral similarity measures.
In this study, we first prove that both the SAM-RBF and SID-RBF kernels fulfill Mercer's conditions and that the two newly proposed spectral-similarity-based kernels are also Mercer's kernels. Second, we compare the efficiencies of the spectral-similarity-based kernels in terms of local and global kernels. Finally, we employ these kernels in an SVM on three hyperspectral datasets in classification experiments, where the classification accuracies and the effects of the similarity between spectral signatures and of the size of the training set are analyzed in detail.

Support Vector Machines
In this section, the SVM model is briefly reviewed; a detailed description can be found in [11]. The SVM model attempts to classify samples by tracing a maximum-margin separating hyperplane in the kernel space. Given a nonlinear mapping function $\phi(x)$, the discriminant function associated with the separating hyperplane is defined as follows:
$$f(x) = \langle w, \phi(x) \rangle + b,$$
where $w$ is the vector normal to the hyperplane and $b$ is the bias, which determines the offset of the hyperplane from the origin of the coordinate system. Because maximizing the margin between the samples and the hyperplane is equivalent to minimizing the norm of $w$, an SVM aims to solve the following problem:
$$\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i\left(\langle w, \phi(x_i) \rangle + b\right) \geq 1 - \xi_i, \;\; \xi_i \geq 0,$$
where $C$ controls the generalization capabilities of the classifier, and $\xi_i$ is a positive slack variable allowing permitted errors to be considered.
The above optimization problem can be reformulated through a Lagrange function whose multipliers are found by means of dual optimization, leading to a quadratic programming (QP) solution [11]. The solution is identified by solving the Lagrangian dual problem defined as follows:
$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \;\; 0 \leq \alpha_i \leq C,$$
where $\alpha_i$ is a Lagrange multiplier. A kernel function $K(x_i, x_j)$ is defined as follows:
$$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle.$$
A nonlinear SVM can then be defined whenever the kernel function $K(x_i, x_j)$ satisfies Mercer's conditions. The popular kernels are defined as follows. For the linear kernel,
$$K(x_i, x_j) = \langle x_i, x_j \rangle;$$
for the polynomial kernel,
$$K(x_i, x_j) = \left(\langle x_i, x_j \rangle + c\right)^d;$$
for the radial basis function (RBF) kernel,
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right).$$
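To make the formulation concrete, the following minimal sketch (our illustration, not code from this study) trains SVMs with the three kernels listed above using scikit-learn; the arrays X and y are random stand-ins for spectra and labels, and the hyperparameter values are arbitrary.

```python
# Minimal sketch: SVMs with the linear, polynomial, and RBF kernels.
# X: (n_samples, n_bands) spectra; y: integer class labels (toy stand-ins here).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.random((100, 200)), rng.integers(0, 3, 100)

linear_svm = SVC(kernel="linear", C=100).fit(X, y)
poly_svm = SVC(kernel="poly", degree=3, coef0=1.0, C=100).fit(X, y)
# scikit-learn's RBF uses exp(-gamma * ||x_i - x_j||^2), so gamma = 1/(2*sigma^2).
rbf_svm = SVC(kernel="rbf", gamma=1.0 / (2 * 0.5**2), C=100).fit(X, y)

print(rbf_svm.predict(X[:5]))
```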

Mercer's Kernels
The flexibility of the SVM is mainly attributed to its formulation in terms of the kernel function. A kernel function can be viewed as a similarity measure in the feature space corresponding to a mapping of the data into a high-dimensional space [56]. The kernel function determines how the data are mapped into the high-dimensional space and hence how separable the data become; therefore, exploring more efficient kernels is important for classification. A kernel function must satisfy Mercer's condition [57], and Mercer's theorem provides the criterion for judging whether a kernel is a Mercer's kernel. Mercer's theorem and the properties of Mercer's kernels are as follows.

Mercer's theorem: Let $\mathcal{X}$ be a Hilbert space. Suppose $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a continuous symmetric function in $L^2(\mathcal{X}^2)$. Then, there exist a mapping $\Phi$ and an expansion $K(x, y) = \sum_i \Phi(x)_i \Phi(y)_i$ if and only if, for any $g(x)$ such that $\int g(x)^2\,dx$ is finite, we have
$$\iint K(x, y)\, g(x)\, g(y)\, dx\, dy \geq 0; \qquad (9)$$
equivalently, the Gram matrix $[K(x_i, x_j)]$ is positive semidefinite for any finite set $x_1, x_2, \ldots, x_n \in \mathcal{X}$.

Mercer's condition is an important requirement for obtaining a global solution for an SVM. It is nontrivial to check Mercer's condition directly, as indicated by Equation (9). However, it has been proven that a positive definite kernel is equivalent to a dot-product kernel [56]. In other words, any kernel that can be expressed as
$$K(x_i, x_j) = \sum_{p} c_p \langle x_i, x_j \rangle^p, \qquad (10)$$
where the $c_p$ are positive real coefficients and the series is uniformly convergent, satisfies Mercer's condition [58].
When proving or proposing a novel Mercer's kernel, several properties of such kernels are useful.
Property 1: If $K_1, K_2, K_3, \ldots$ are Mercer's kernels and $K(x_i, x_j) = \lim_{n \to \infty} K_n(x_i, x_j)$, then $K$ is a valid Mercer's kernel.

Property 2: If $K_1, K_2$ are Mercer's kernels, $a_1 \geq 0$, $a_2 \geq 0$, and $K(x_i, x_j) = a_1 K_1(x_i, x_j) + a_2 K_2(x_i, x_j)$, then $K$ is a valid Mercer's kernel.

Property 3: If $K_1, K_2$ are Mercer's kernels and $K(x_i, x_j) = K_1(x_i, x_j)\, K_2(x_i, x_j)$, then $K$ is a valid Mercer's kernel.
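These properties can be complemented by a numerical sanity check: on any finite sample, a Mercer's kernel must yield a positive semidefinite Gram matrix. The sketch below (our illustration; the function name, tolerance, and toy data are assumptions) tests this via the eigenvalues of the symmetrized Gram matrix.

```python
# Numerical check of Mercer's condition on a finite sample:
# a valid kernel must produce a positive semidefinite (PSD) Gram matrix.
import numpy as np

def is_psd_gram(kernel, X, tol=1e-8):
    """Build K[i, j] = kernel(X[i], X[j]) and test whether K is PSD."""
    n = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    eigvals = np.linalg.eigvalsh((K + K.T) / 2)  # symmetrize against round-off
    return eigvals.min() >= -tol

rng = np.random.default_rng(1)
X = rng.random((50, 10))
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2 * 0.5**2))
print(is_psd_gram(rbf, X))  # True: the RBF kernel is a Mercer's kernel
```

Note that a negative eigenvalue proves a kernel is not a Mercer's kernel, whereas passing the test on one sample is only evidence, not a proof.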

Spectral-Similarity-Based Kernels and Proofs
Kernel functions can be viewed as metrics or similarity measures in the feature space corresponding to a mapping of the data into a high-dimensional space [56]. Spectral similarity measures are used in HSI analysis to weigh the similarity and discrimination between a target and a reference spectral signature. Therefore, a spectral similarity measure is a type of metric in the spectral feature space.

Given two spectral vectors $A = (A_1, A_2, \ldots, A_n)^T$ and $B = (B_1, B_2, \ldots, B_n)^T$, the spectral angle mapper (SAM) and spectral information divergence (SID) can be defined as follows. SAM:
$$\text{SAM}(A, B) = \cos^{-1}\left(\frac{\langle A, B \rangle}{\|A\|\,\|B\|}\right).$$
SID:
$$\text{SID}(A, B) = D(p\|q) + D(q\|p), \qquad D(p\|q) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i},$$
where $p = (p_1, p_2, p_3, \cdots, p_n)^T$ and $q = (q_1, q_2, q_3, \cdots, q_n)^T$ are the probability vectors derived from $A$ and $B$, respectively. Additionally, $p_i$ and $q_i$ are defined as follows:
$$p_i = \frac{A_i}{\sum_{k=1}^{n} A_k}, \qquad q_i = \frac{B_i}{\sum_{k=1}^{n} B_k}.$$
Mercier and Lennon [54] and Fauvel et al. [55] used the SAM and SID to obtain new kernel functions. However, they did not present the details of their proofs. Here, we provide proofs that these kernels satisfy Mercer's condition.
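Before turning to the propositions, the two measures can be illustrated numerically. The following sketch implements SAM and SID exactly as defined above; the toy spectra are assumed strictly positive so that the probability vectors and logarithms are well defined.

```python
# SAM and SID between two (strictly positive) spectral vectors.
import numpy as np

def sam(a, b):
    """Spectral angle between vectors a and b, in radians."""
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))  # clip guards round-off

def sid(a, b):
    """Symmetric spectral information divergence D(p||q) + D(q||p)."""
    p, q = a / a.sum(), b / b.sum()
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

a = np.array([0.21, 0.33, 0.45, 0.40])
b = np.array([0.20, 0.30, 0.50, 0.42])
print(sam(a, b), sid(a, b))
```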
Proposition 1. Given a pair of training samples $x_i, x_j \in \mathcal{X}$, the SAM-RBF kernel function, defined as
$$K(x_i, x_j) = \exp\left(-\frac{\text{SAM}(x_i, x_j)^2}{2\sigma^2}\right),$$
is a Mercer's kernel.

Proof. Here, $K_n(x_i, x_j) = \frac{\langle x_i, x_j \rangle}{\|x_i\|\,\|x_j\|}$ is a normalized linear kernel, meaning it is also a Mercer's kernel. Let $K_n$ denote this kernel and expand $\cos^{-1}(K_n)$ as a Taylor series in $K_n$. Then, according to Properties 2 and 3, $\cos(\cdot)$ can also be expanded using Taylor's formula. Therefore, based on Properties 2 and 3, it can be proven that the spectral angle mapper-based RBF (SAM-RBF) kernel function is a Mercer's kernel.

Proposition 2. Given a pair of training samples $x_i, x_j \in \mathcal{X}$, the spectral information divergence-based RBF (SID-RBF) kernel function, defined as
$$K(x_i, x_j) = \exp\left(-\frac{\text{SID}(x_i, x_j)}{2\sigma^2}\right), \qquad (20)$$
is a Mercer's kernel.
Proof. According to Equation (14), $K(x_i, x_j)$ in Equation (20) can be rewritten as
$$K(x_i, x_j) = \exp\left(-\frac{K_{i,i} - K_{i,j} - K_{j,i} + K_{j,j}}{2\sigma^2}\right), \qquad (21)$$
where
$$K_{a,b} = \left\langle \frac{x_a}{S_a}, \log\frac{x_b}{S_b} \right\rangle, \qquad S_i = \sum_{n} x_i(n), \quad S_j = \sum_{n} x_j(n).$$
Therefore, $K(x_i, x_j)$ in Equation (21) can be divided into four power exponents involving $K_{i,i}$, $K_{i,j}$, $K_{j,i}$, and $K_{j,j}$. The first exponent, $K_{i,i}$ in Equation (22), is a Mercer's kernel according to Property 2. Similarly, $K_{i,j}$, $K_{j,i}$, and $K_{j,j}$ can also be considered Mercer's kernels. Therefore, $K(x_i, x_j)$ is a Mercer's kernel according to Property 2.
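A sketch of the two kernels follows, using the forms reconstructed above (a squared spectral angle and a plain SID inside the RBF envelope); if the exact exponents in the original equations differ, only the two return expressions change. The Gram-matrix helper shows one way to plug such kernels into an SVM via scikit-learn's precomputed-kernel interface.

```python
# SAM-RBF and SID-RBF kernels (reconstructed forms) with SVC(kernel="precomputed").
import numpy as np
from sklearn.svm import SVC

def sam_rbf(a, b, sigma=0.5):
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return np.exp(-angle**2 / (2 * sigma**2))

def sid_rbf(a, b, sigma=0.5):
    p, q = a / a.sum(), b / b.sum()
    sid = np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
    return np.exp(-sid / (2 * sigma**2))

def gram(kernel, XA, XB):
    """Pairwise Gram matrix between the rows of XA and XB."""
    return np.array([[kernel(a, b) for b in XB] for a in XA])

rng = np.random.default_rng(0)
X, y = rng.random((60, 50)) + 0.01, rng.integers(0, 2, 60)  # positive toy spectra
clf = SVC(kernel="precomputed", C=100).fit(gram(sam_rbf, X, X), y)
print(clf.score(gram(sam_rbf, X, X), y))
```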

Proposed Kernels
Spectral similarity measures are used to weigh the similarity between a target and a reference spectral signature; a spectral similarity measure can therefore be regarded as a metric and used as a kernel function for an SVM. Moreover, because spectral similarity measures are commonly used in hyperspectral image classification, they have high potential to improve hyperspectral image classification when used as kernel functions. Here, we propose two modified spectral-similarity-based kernels based on the SAM-RBF and SID-RBF kernels.

Proposition 3. A modified kernel, called the Power-SAM-RBF kernel, defined as
$$K(x_i, x_j) = \exp\left(-\frac{\text{SAM}(x_i, x_j)^t}{2\sigma^2}\right), \qquad t \in \mathbb{R},$$
is a Mercer's kernel.
Proof. According to Proof 1, we must prove that $K(x_i, x_j)^t$, where $t \in \mathbb{R}$, is a Mercer's kernel.

In Equation (10), the exponent $p$ is a positive integer. Because the base function is a Mercer's kernel, it remains to show that $K(x_i, x_j)^t$ with a real exponent $t$ is also a Mercer's kernel. The integer part of the exponent is handled directly by Property 3, and for the remaining part the Taylor expansion of $\frac{1}{K(x_i, x_j)}$ can be used, so that Equation (29) can be rewritten as a series of terms $K(x_i, x_j)^{r_i}$ with $r_i \in \mathbb{Z}$.

According to Properties 2 and 3, $\frac{1}{K(x_i, x_j)}$ is also a Mercer's kernel. Finally, the function in Proposition 3 can be used as a Mercer's kernel for an SVM.
Compared to the SAM-RBF kernel, this modified kernel has one additional parameter that must be optimized, which gives it the potential to outperform the SAM-RBF kernel.

Proposition 4. A modified kernel, called the Normalized-SID-RBF kernel, defined as in Equation (31), in which each term $K_{i,j}$ of the SID-RBF kernel is replaced by its normalized form given in Equation (32), is a Mercer's kernel.

Proof. According to Equation (32), the normalized $K_{j,j}$ is the normalized function of $\left\langle \frac{x_j}{S_j}, \log\frac{x_j}{S_j} \right\rangle$, which is a Mercer's kernel; therefore, the normalized $K_{j,j}$ is also a Mercer's kernel. Similarly, the normalized $K_{j,i}$, $K_{i,i}$, and $K_{i,j}$ are Mercer's kernels. We can then infer that the Normalized-SID-RBF kernel $K(x_i, x_j)$ in Equation (31) is a Mercer's kernel, according to Proof 2.
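The following sketch implements the Power-SAM-RBF kernel under the reconstructed form $\exp(-\text{SAM}(x_i, x_j)^t / (2\sigma^2))$; the parameter values are arbitrary. It is vectorized so that scikit-learn can call it directly as a custom kernel.

```python
# Power-SAM-RBF kernel (reconstructed form) as a scikit-learn callable kernel.
import numpy as np
from sklearn.svm import SVC

def power_sam_rbf(XA, XB, sigma=0.5, t=1.5):
    """Gram matrix of the Power-SAM-RBF kernel between row-spectra XA and XB."""
    na = np.linalg.norm(XA, axis=1, keepdims=True)
    nb = np.linalg.norm(XB, axis=1, keepdims=True)
    cos_angle = np.clip((XA @ XB.T) / (na * nb.T), -1.0, 1.0)
    angle = np.arccos(cos_angle)  # pairwise spectral angles
    return np.exp(-angle**t / (2 * sigma**2))

rng = np.random.default_rng(2)
X, y = rng.random((80, 200)), rng.integers(0, 2, 80)  # toy stand-ins
clf = SVC(kernel=lambda A, B: power_sam_rbf(A, B, sigma=0.5, t=1.5), C=100).fit(X, y)
print(clf.score(X, y))
```

In practice, σ and t would be tuned jointly (in this study, by PSO, as described in the experimental setup).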

Kernel Efficiency
A kernel function is essential in determining the efficiency of an SVM model in its application. Smits and Jordaan [59] divided kernels into two classes: local and global kernels. Local kernels, which act on the data in the neighborhood of the kernel's center point, have better interpolation ability than global kernels but fail at longer-range extrapolation; global kernels, in which data points far from one another still influence the kernel values, perform better than local kernels in terms of extrapolation. Given a two-dimensional vector $x = (x_1, x_2)^T$, a test input point $(2, 2)$, and the kernel range $x_1 \in [0, 10]$, $x_2 \in [0, 10]$, the polynomial, RBF, SAM-RBF, SID-RBF, and proposed Power-SAM-RBF and Normalized-SID-RBF kernels are examined for the analysis of kernel efficiency. First, the polynomial kernel behaves as a typical global kernel, whereas the RBF kernel is a typical local kernel. Second, the spectral-similarity-based kernels, namely SAM-RBF and SID-RBF, are illustrated in Figure 2, which reveals that they combine the characteristics of both local and global kernels. The SAM-RBF kernel response increases overall as $x_1$ and $x_2$ increase; in this regard, it is similar to a global kernel. However, it also exhibits distinct local-kernel characteristics along the directions of $x_1$ and $x_2$. It should be noted that these local-kernel properties are sensitive to the parameter $\sigma$: when $\sigma$ increases from 0.2 to 1.0, as shown in Figure 2a-c, the shape of the SAM-RBF kernel exhibits a significant change in the gradient of the "watershed." As shown in Figure 2d-f, the SID-RBF kernel displays a distinct peak response; it therefore also possesses local-kernel characteristics, although weaker than those of the SAM-RBF kernel.
Third, the Power-SAM-RBF kernel requires two parameters, $\sigma$ ($\sigma > 0$) and $t$ ($t \in \mathbb{R}$), to control its performance. It resembles a global kernel in that its response increases as $x_1$ and $x_2$ increase; meanwhile, it has local-kernel characteristics, because the response along the direction of $[x_1, x_2]$ is higher than elsewhere. Therefore, it strikes a good balance between interpolation and extrapolation capability. From this, we can conclude the following: 1. The global-kernel characteristics become weaker and the local-kernel characteristics become stronger as the power parameter $t$ increases. For example, comparing Figure 3a,d, the saddle shape along the watershed tends to shrink as $t$ increases. 2. With an increasing parameter $\sigma$, the Power-SAM-RBF kernel exhibits more global-kernel and fewer local-kernel characteristics. As shown in Figure 3a-c, the response of the kernel becomes less pronounced as $\sigma$ increases.
As shown in Figure 4, the Normalized-SID-RBF kernel also has global-kernel characteristics, because its response increases as $x_1$ and $x_2$ increase. At the same time, it has local-kernel characteristics, with the response along certain directions being higher than others. The Normalized-SID-RBF kernel has more distinct global-kernel characteristics than the SID-RBF kernel. Debnath and Takahashi [60] claimed that a normalized kernel achieves better performance than the original kernel. However, regarding its local-kernel characteristics, the direction of its ridge trends toward one of the dimensions, such that data in the other dimensions are ignored. This indicates that some features in the original data may not be fully exploited during model training.
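The response surfaces discussed in this section can be reproduced with a short script. The sketch below (our illustration) evaluates the SAM-RBF kernel against the fixed test point (2, 2) over the grid $x_1, x_2 \in [0, 10]$; plotting the resulting surface reveals the "watershed" ridge along the direction of the test point.

```python
# Kernel response surface against the fixed test point (2, 2), as in Figures 2-4.
import numpy as np

def sam_rbf_point(x, z, sigma=0.6):
    cos_angle = np.dot(x, z) / (np.linalg.norm(x) * np.linalg.norm(z))
    return np.exp(-np.arccos(np.clip(cos_angle, -1, 1)) ** 2 / (2 * sigma**2))

test_point = np.array([2.0, 2.0])
# Start the grid slightly above zero to avoid the undefined angle at the origin.
x1, x2 = np.meshgrid(np.linspace(0.01, 10, 200), np.linspace(0.01, 10, 200))
surface = np.vectorize(lambda a, b: sam_rbf_point(np.array([a, b]), test_point))(x1, x2)

# The maximum lies along x1 = x2, where the spectral angle to (2, 2) is zero.
print(surface.max(), surface.min())
```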

Indian Pines
This dataset, which was acquired by the AVIRIS sensor, represents agricultural information from the Indian Pines test site in Northwestern Indiana, USA. After 20 water absorption bands are discarded, the image has a size of 145 × 145 × 200. The spatial resolution is 20 m per pixel, and the spectral coverage ranges from 0.4 to 2.5 µm. The scene contains 16 reference classes of crops (e.g., corn, soybean, and wheat); however, only nine classes were selected for our experiments, because these nine classes contain sufficiently many samples for model training (Table 1). Figure 5a,b present a color composite of the Indian Pines image and the corresponding ground truth data, respectively.

Experimental Setup
We evaluated the spectral-similarity-based kernels used for HSI classification using the following experimental settings:

• Training sample selection: 5%, 10%, 15%, and 20% of the samples were randomly selected from the ground truth data as training samples.

• Classification accuracies: Five iterations of each classification experiment were conducted, and the mean and variance of the overall accuracy (OA), average accuracy (AA), and kappa coefficient were used for the evaluation. Additionally, the producer's accuracy (PA) was used for the analysis of the Indian Pines experiments. If $p_i$ is the number of correctly classified samples of the $i$th class, $t_i$ is the number of samples of the $i$th class in the ground-truth data, and $N$ is the number of classes, then the OA and AA can be defined as
$$\text{OA} = \frac{\sum_{i=1}^{N} p_i}{\sum_{i=1}^{N} t_i}, \qquad \text{AA} = \frac{1}{N} \sum_{i=1}^{N} \frac{p_i}{t_i},$$
and the kappa coefficient is computed from the confusion matrix as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement (the OA) and $p_e$ is the chance agreement estimated from the row and column marginals (see the sketch after this list).

• Methods: Six kernels for the SVM method, namely Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF, were employed in the classification experiments. The range of the kernel parameters coef0 and γ was [0.01, 2000], and the range of the power parameter t was [0.01, 5].

• Parameter optimization: We applied the particle swarm optimization (PSO) method to optimize the parameters of the SVM. The parameter settings for the PSO method, including the acceleration constants, maximum number of generations, and swarm size, are listed in Table 2.
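The three accuracy metrics can be computed from a confusion matrix as in the following sketch (scikit-learn's cohen_kappa_score offers an equivalent kappa); the toy matrix is an assumption for illustration.

```python
# OA, AA, and kappa coefficient from a confusion matrix.
import numpy as np

def oa_aa_kappa(conf):
    """conf[i, j] = number of samples of class i predicted as class j."""
    total = conf.sum()
    per_class = np.diag(conf) / conf.sum(axis=1)  # producer's accuracy per class
    oa = np.diag(conf).sum() / total              # overall accuracy
    aa = per_class.mean()                         # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

conf = np.array([[50, 2, 1], [3, 40, 5], [0, 4, 45]])  # toy confusion matrix
print(oa_aa_kappa(conf))
```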

Results for the Indian Pines Dataset
Table 3 presents a comparison of all kernels on the Indian Pines dataset in terms of OA, AA, and kappa coefficient with different ratios (5%, 10%, 15%, and 20%) of the training set.
The Power-SAM-RBF kernel generally performs better than the other kernels, particularly in terms of OA and the kappa coefficient. When the proportion of training data is high, it obtains the highest AA among all kernels. The SAM-RBF kernel can be regarded as the second best among all kernels considered. The only case in which the RBF kernel performs best is in terms of AA with a small proportion of training samples; overall, RBF is the third-best kernel. The SID-RBF and Normalized-SID-RBF kernels perform worse than the other four kernels for all proportions of training data.
Regarding the spectral-similarity-based kernels, the Power-SAM-RBF and SAM-RBF kernels yield impressive performance for all proportions of the training set, particularly for high proportions of training data (15% or 20%). For all proportions of the training set, these two kernels outperform the other kernels in terms of OA and kappa coefficient. When the proportion of the training set is greater than 10%, the AAs of these two kernels rapidly exceed those of the RBF kernel. However, the performance of the SID-RBF and Normalized-SID-RBF kernels is less promising on the Indian Pines dataset for all proportions of training samples.
Figure 8 plots the curves of OA, AA, and kappa coefficient of the classification results of all kernels with proportions of training data ranging from 5% to 20%.The superiority of the Power-SAM-RBF and SAM-RBF kernels becomes more obvious as the proportion of training data increases.
Considering the Power-SAM-RBF kernel as an example: regarding the kappa coefficient, when the proportion of training data is 5%, the value for the Power-SAM-RBF kernel (0.7389) is 0.8% higher than that of the RBF kernel (0.7309); when the proportion of training data is 20%, the value for the Power-SAM-RBF kernel (0.8561) is 1.61% higher than that of the RBF kernel (0.8400). The improvement in terms of OA is 0.85% (Power-SAM-RBF kernel: 78.05%; RBF kernel: 77.20%) for a proportion of 5%, and 1.38% (Power-SAM-RBF kernel: 87.80%; RBF kernel: 86.42%) for a proportion of 20%.

Results from the University of Pavia Dataset
Table 4 reveals that the Power-SAM-RBF kernel also yields the best performance among all kernels on the University of Pavia dataset. It achieves the highest OA, AA, and kappa coefficient for all proportions of training data. The Normalized-SID-RBF kernel yields the worst performance.
Regarding the performance of the spectral-similarity-based kernels, both the Power-SAM-RBF and SAM-RBF kernels achieve promising performance. The SID-RBF kernel performs worse than the Linear and RBF kernels when the proportion of training data is small; as the proportion increases, the accuracy of the SID-RBF kernel improves significantly. For example, when the proportion of training data is 20%, its OA (92.31%), AA (90.35%), and kappa coefficient (0.8977) are distinctly higher than those of the Linear kernel (OA: 91.28%; AA: 87.40%; kappa coefficient: 0.8832) and close to those of the RBF kernel, while its AA is significantly higher than that of the RBF kernel (89.71%). The Normalized-SID-RBF kernel still underperforms, although its performance on the University of Pavia dataset is better than that on the Indian Pines dataset.
Figure 9 presents the curves of OA, AA, and kappa coefficient for all kernels and all proportions of training data on the University of Pavia dataset.The results reveal similar trends to those of the Indian Pines dataset.Overall, higher accuracies are achieved as the proportion of training data increases.Additionally, the Power-SAM-RBF and SAM-RBF kernels consistently provide the best performance.
The final comparison in Figure 9 is against the Linear and RBF kernels. Here, the superiority of the Power-SAM-RBF and SAM-RBF kernels also tends to increase with the proportion of training data. When the proportion of training data is 5%, the improvements of the Power-SAM-RBF kernel over the RBF kernel in terms of OA, AA, and kappa coefficient are only 0.43%, 1.49%, and 0.58%, respectively; when the proportion of training data is 20%, the improvements are 0.97%, 1.79%, and 1.33%, respectively.

Results for the Salinas Valley Dataset
As shown in Table 5, the Power-SAM-RBF kernel generally obtains the best performance on the Salinas HSI.For small proportions of training samples, the Power-SAM-RBF kernel performs better than the other kernels in terms of OA and kappa coefficient, but not in terms of AA, for which the Linear kernel exhibits the best performance.The SAM-RBF kernel achieves good classification results but not better than those of the Power-SAM-RBF kernel.The SID-RBF and Normalized-SID-RBF kernels exhibit the worst performance among all kernels for all proportions of training data.
Regarding the spectral-similarity-based kernels, the Power-SAM-RBF and SAM-RBF kernels achieve impressive performance, particularly for high proportions of training data. When the proportion of training data reaches 20%, the AA of the Power-SAM-RBF kernel (96.96%) is greater than those of the Linear (96.83%) and RBF (96.38%) kernels. The OA and kappa coefficient of the Power-SAM-RBF kernel are 1.09% and 1.23% higher, respectively, than those of the commonly used RBF kernel. The OA and kappa coefficient of the SAM-RBF kernel are also higher than those of the Linear and RBF kernels. The performance of the SID-RBF and Normalized-SID-RBF kernels on the Salinas Valley dataset remains poor for all proportions of training data.
Similar to the experiment on the Indian Pines dataset, the Power-SAM-RBF kernel does not perform best when the proportion of training data is small. However, as shown in Figure 10, the superiority of the Power-SAM-RBF kernel over the other kernels increases with the proportion of training data. When the proportion of training data is 5%, the OA, AA, and kappa coefficient of the Power-SAM-RBF kernel are lower than those of the RBF kernel; when the proportion is 20%, the Power-SAM-RBF kernel outperforms the RBF kernel in terms of OA, AA, and kappa coefficient by 1.09%, 0.58%, and 1.23%, respectively.

Effects of Similarity in Spectral Signatures
We noted that the improvement in AA of the Power-SAM-RBF and SAM-RBF kernels over the Linear and RBF kernels differs markedly from the improvement in OA across the datasets. In the experiments on the Indian Pines and Salinas Valley datasets, the Power-SAM-RBF kernel exhibited stronger superiority over the RBF kernel in terms of OA than in terms of AA. For example, when the proportion of training data is 20%, the OA of the Power-SAM-RBF kernel is 1.38% higher than that of the RBF kernel, whereas its AA is only 0.50% higher. However, on the University of Pavia dataset, the superiority of the Power-SAM-RBF kernel over the RBF kernel in terms of OA is less than that in terms of AA: when the proportion of training data is 20%, the OA of the Power-SAM-RBF kernel is 0.97% higher than that of the RBF kernel, while its AA is 1.71% higher.
This indicates that the differences in kernel performance between the Indian Pines/Salinas Valley datasets and the University of Pavia dataset are related to the original spectral signatures of these datasets. Figure 11 illustrates the average spectral signature of each class over all labeled pixels in the ground-truth data: Figure 11a,c show the signatures of the Indian Pines and Salinas Valley data, and Figure 11b those of the University of Pavia data. The spectral similarity between classes is higher in the Indian Pines/Salinas Valley datasets than in the University of Pavia dataset, and it is on these two datasets that the OA advantage of the Power-SAM-RBF and SAM-RBF kernels is largest. This indicates that the Power-SAM-RBF and SAM-RBF kernels are well suited to HSIs in which the similarity between class signatures is either very high or very low; the superiority of these kernels over the RBF kernel generally becomes more pronounced as the similarity between the spectral signatures moves toward either extreme. To further validate the relationship between spectral similarity and classification accuracy, we consider the Indian Pines experimental results with proportions of training data of 5% and 20% as an example and compare the performance of the Power-SAM-RBF kernel with that of the commonly used RBF kernel. The sums of the five experimental confusion matrices for the Indian Pines dataset with proportions of training data of 5% and 20% are listed in Tables 6 and 7, respectively. We define the similarity between the spectral signatures of a pair of classes using the one-norm as follows:
$$SS_{\text{pair}}(i, j) = \left\| S_{\text{mean}\_i} - S_{\text{mean}\_j} \right\|_1,$$
where $S_{\text{mean}\_i}$ and $S_{\text{mean}\_j}$ are the average spectral signatures of the $i$th and $j$th classes, respectively (note that a smaller $SS_{\text{pair}}$ corresponds to a higher similarity). The similarities between the spectral signatures of each pair of classes are listed in Table 8.
According to the similarity of the spectral signatures, we divided the class pairs into three groups: high ($SS_{\text{pair}} < 2 \times 10^3$), medium ($2 \times 10^3 < SS_{\text{pair}} < 10 \times 10^3$), and low ($SS_{\text{pair}} > 10 \times 10^3$) similarity. Given a confusion matrix $T$, the entry $T_{i,j}$ represents the number of samples of class $C_i$ misclassified as class $C_j$. Based on the confusion matrices of the Power-SAM-RBF and RBF kernels, we define $\text{Ratio}_{\text{PSR\_RBF}}(i, j)$ to describe the improvement of the Power-SAM-RBF kernel over the RBF kernel as follows:
$$\text{Ratio}_{\text{PSR\_RBF}}(i, j) = \frac{\text{PSR}_{i,j}}{\text{RBF}_{i,j}},$$
where $\text{PSR}_{i,j}$ is the number of misclassified samples between classes $i$ and $j$ using the Power-SAM-RBF kernel, and $\text{RBF}_{i,j}$ is the corresponding number using the RBF kernel. When $\text{Ratio}_{\text{PSR\_RBF}}(i, j)$ is lower than 1.0, the Power-SAM-RBF kernel outperforms the RBF kernel; otherwise, the RBF kernel outperforms the Power-SAM-RBF kernel.
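The two statistics can be computed as in the following sketch (our illustration; mean_sigs, the symmetric treatment of the pair (i, j), and the NaN/Inf handling are assumptions consistent with the description above).

```python
# Pairwise signature similarity and the Power-SAM-RBF vs. RBF misclassification ratio.
import numpy as np

def ss_pair(mean_sigs, i, j):
    """One-norm distance between the mean spectral signatures of classes i and j."""
    return np.abs(mean_sigs[i] - mean_sigs[j]).sum()

def ratio_psr_rbf(conf_psr, conf_rbf, i, j):
    """Misclassified samples between classes i and j: Power-SAM-RBF over RBF."""
    psr = conf_psr[i, j] + conf_psr[j, i]
    rbf = conf_rbf[i, j] + conf_rbf[j, i]
    if rbf == 0:
        return np.inf if psr > 0 else np.nan  # such pairs are omitted from Figure 12
    return psr / rbf
```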
The calculated $\text{Ratio}_{\text{PSR\_RBF}}(i, j)$ results are plotted in Figure 12. Because the number of misclassified samples is zero for some pairs of classes, the corresponding results may be not a number (NaN) or infinity (Inf); Figure 12 does not present these values. There are seven and nine such missing points in Figure 12a,b, respectively. Regardless, one can see that most of the class pairs (17 for the 5% training set and 18 for the 20% training set) have values less than or equal to 1.0, indicating that the Power-SAM-RBF kernel is generally superior to the RBF kernel. Further details regarding this analysis are provided below.
As shown in Figure 12a,b, most of the $\text{Ratio}_{\text{PSR\_RBF}}(i, j)$ results are less than 1.0 when $SS_{\text{pair}}$ is less than $2 \times 10^3$ or greater than $10 \times 10^3$. Specifically, all $\text{Ratio}_{\text{PSR\_RBF}}(i, j)$ results for which the similarity of the corresponding class pair is greater than $10 \times 10^3$ are less than or equal to 1.0. This indicates that the Power-SAM-RBF kernel outperforms the RBF kernel when the similarity of class pairs is high or low; when the similarity of class pairs is moderate, the Power-SAM-RBF kernel is inferior to the RBF kernel. The quadratic fitting curves also validate this phenomenon. Overall, the Power-SAM-RBF kernel is superior to the RBF kernel for either extremely high or extremely low similarities between the spectral signatures of class pairs, whereas for moderate similarities it is inferior to the RBF kernel.

Effects of the Sizes of the Training Set
The experimental results for the three hyperspectral datasets discussed above support the observation that the superiority of the Power-SAM-RBF and SAM-RBF kernels over the Linear and RBF kernels becomes more evident as the size of the training set increases. In this section, we analyze the experimental results for the Indian Pines dataset again by comparing the Power-SAM-RBF and RBF kernels with different proportions of training samples. The numbers of samples for each class are listed in Table 9.
Table 10 lists the average PAs of each class for the RBF and Power-SAM-RBF kernels with proportions of training data of 5% and 20%. When the proportion of training data is 5%, the PAs of classes C4, C6, C7, and C9 for the Power-SAM-RBF kernel are higher than those for the RBF kernel. With a 20% proportion of training data, the PAs of C2, C6, C7, and C9 for the Power-SAM-RBF kernel are higher than those for the RBF kernel. This indicates that the number of classes for which the Power-SAM-RBF kernel is superior does not increase with the number of training samples. However, one can see that the PAs of C1, C3, C4, C5, and C8 for the Power-SAM-RBF kernel are close to those for the RBF kernel. To quantify the superiority of the Power-SAM-RBF kernel over the RBF kernel as the number of training samples increases, we define two indexes. Let $\text{ACC}_{K,n}$ be the accuracy of kernel $K$ with $n\%$ training samples for one class. The index $P_{K,K'}$ represents the ratio of the accuracy improvement of kernel $K$ to that of another kernel $K'$ when the number of training samples increases, and the index $S_{K,K'}$ represents the D-value (difference) of the superiority of kernel $K$ over kernel $K'$ when the number of training samples increases:
$$P_{K,K'} = \frac{\text{ACC}_{K,n'} - \text{ACC}_{K,n}}{\text{ACC}_{K',n'} - \text{ACC}_{K',n}}, \qquad (34)$$
$$S_{K,K'} = \left(\text{ACC}_{K,n'} - \text{ACC}_{K',n'}\right) - \left(\text{ACC}_{K,n} - \text{ACC}_{K',n}\right). \qquad (35)$$
If $P_{K,K'} > 1$, the accuracy improvement of kernel $K$ is greater than that of kernel $K'$ when the proportion of training samples increases from $n$ to $n'$. If $S_{K,K'} > 0$, the advantage of kernel $K$ over kernel $K'$ with $n'\%$ training samples is greater than that with $n\%$ training samples. In Figure 13, we plot the curves of $P_{K,K'}$ and $S_{K,K'}$ for the Power-SAM-RBF and RBF kernels as the proportion of training samples increases from 5% to 20%, according to Table 10 and Equations (34) and (35).
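Under the reconstructed Equations (34) and (35), the two indexes can be computed as follows (a sketch with toy accuracy values; acc[K][n] denotes the per-class accuracy of kernel K with n% training samples).

```python
# P and S indexes comparing two kernels as the training proportion grows.
def p_index(acc, K, K2, n=5, n2=20):
    """Ratio of the accuracy gains of kernels K and K2 from n% to n2% training data."""
    return (acc[K][n2] - acc[K][n]) / (acc[K2][n2] - acc[K2][n])

def s_index(acc, K, K2, n=5, n2=20):
    """Change in the margin of kernel K over K2 between the two proportions."""
    return (acc[K][n2] - acc[K2][n2]) - (acc[K][n] - acc[K2][n])

acc = {"PSR": {5: 0.78, 20: 0.88}, "RBF": {5: 0.77, 20: 0.86}}  # toy values
print(p_index(acc, "PSR", "RBF"), s_index(acc, "PSR", "RBF"))
```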
Figure 13 indicates that the superiority of the Power-SAM-RBF kernel over the RBF kernel is more pronounced for classes with few original samples. The $P_{K,K'}$ values of the Power-SAM-RBF kernel versus the RBF kernel for classes C2, C3, C5, C6, and C8 are all above 1.0, and the corresponding $S_{K,K'}$ values are all above zero; both indexes therefore indicate the superiority of the Power-SAM-RBF kernel over the RBF kernel for these classes. As shown in Table 9, the sample numbers of C1, C7, and C9 are all above 1000 and are the highest among the nine classes. Therefore, the advantage of the Power-SAM-RBF kernel over the RBF kernel grows with the number of training samples primarily when the number of original samples is small.

Conclusions
In this study, we proposed two novel spectral-similarity-based kernels (the Power-SAM-RBF and Normalized-SID-RBF kernels). Additionally, we demonstrated that four spectral-similarity-based kernels, namely the two proposed kernels, the SAM-RBF kernel, and the SID-RBF kernel, satisfy Mercer's condition. Furthermore, a comparative analysis of these spectral-similarity-based kernels indicated that they have the characteristics of both local and global kernels. The SID-RBF and Normalized-SID-RBF kernels are non-isotropic; the direction of their ridge trends toward one of the dimensions, such that data in the other dimensions are ignored. The Power-SAM-RBF and SAM-RBF kernels, which are isotropic, provide higher efficiency than the SID-RBF and Normalized-SID-RBF kernels.
HSIs of Indian Pines, the University of Pavia, and Salinas Valley were used as experimental datasets. The results obtained using different proportions of the data for training revealed that the Power-SAM-RBF and SAM-RBF kernels achieve enhanced performance compared to the Linear, RBF, SID-RBF, and Normalized-SID-RBF kernels. The superiority of these two kernels, particularly the Power-SAM-RBF kernel, becomes more pronounced as the proportion of training data increases. For Indian Pines, with 20% of the data used for training, the OA, AA, and kappa coefficient of the Power-SAM-RBF kernel reach the highest values of 87.80%, 88.24%, and 0.8561, respectively; for the University of Pavia, 93.86%, 91.50%, and 0.9182; and for Salinas Valley, 94.04%, 96.96%, and 0.9336. Furthermore, we presented an in-depth comparative analysis of the efficiency of the Power-SAM-RBF kernel in terms of the similarity of spectral signatures and the size of the training set. First, according to the differences in the characteristics of the spectral signatures among the three hyperspectral datasets, we found that the superiority of the Power-SAM-RBF and SAM-RBF kernels over other kernels becomes more pronounced when a dataset has either extremely high or extremely low similarity among the spectral features of each class. The confusion matrices for the Power-SAM-RBF and RBF kernels in the Indian Pines experiment also confirmed this rule, based on the analysis of three groups with different similarities of spectral signatures. Second, the PAs in the experimental results for the Indian Pines dataset with different numbers of training samples revealed that the performance gain of the Power-SAM-RBF kernel over the RBF kernel becomes more pronounced as the proportion of training samples increases.
In summary, three main conclusions can be drawn from this study. First, the spectral-similarity-based kernels discussed in this paper satisfy Mercer's condition, and the Power-SAM-RBF and SAM-RBF kernels for the SVM method can achieve significantly enhanced performance in HSI classification, particularly the Power-SAM-RBF kernel. Second, either extremely high or extremely low similarities between the spectral signatures of different classes may yield better performance for the Power-SAM-RBF kernel compared to the other kernels. Finally, the Power-SAM-RBF kernel achieves even greater classification superiority with a larger training set compared to other kernels. Nevertheless, the performance gains of the proposed kernels over the commonly used kernels remain modest. Therefore, in a future study, we will employ spectral-similarity-based kernels in multiple-kernel methods to validate their efficiency in HSI classification, and we will continue to explore more effective novel kernels for HSI classification.

Figure 3 .
Figure 3. Power-SAM-RBF kernel characteristics with different parameters of t and σ: (a) t = 0.2, σ = 0.2; (b) t = 0.2, σ = 0.6; (c) t = 0.2, σ = 1.0; (d) t = 2.2, σ = 0.2; (e) t = 2.2, σ = 0.6; (f) t = 2.2, σ = 1.0.

Figure 5 .
Figure 5. (a) False color hyperspectral remote sensing image over the Indian Pines test site (using bands 50, 27, and 17). (b) Ground truth of the labeled area with nine classes of land cover: Corn-notill, Corn-mintill, Grass-pasture, Grass-trees, Hay-windrowed, Soybean-notill, Soybean-mintill, Soybean-clean, and Woods.

University of Pavia
This dataset was acquired by the ROSIS instrument over the University of Pavia, Pavia, Italy, in 2001. The image has a pixel resolution of 610 × 340, spectral coverage ranging from 0.43 to 0.86 µm, and a spatial resolution of 1.3 m per pixel. After discarding noisy and water absorption bands, the remaining bands were used in the experiments.

Figure 11 .
Figure 11. Average spectral signature of each class for all labeled pixels in the ground-truth data for the (a) Indian Pines, (b) University of Pavia, and (c) Salinas Valley datasets.

Figure 13 .
Figure 13. Curves of (a) $P_{K,K'}$ (the ratio of the accuracy improvement with kernel $K$ to that with another kernel $K'$) and (b) $S_{K,K'}$ (the D-value of the superiority of kernel $K$ over another kernel $K'$) for the Power-SAM-RBF kernel versus the RBF kernel on the Indian Pines dataset.

Table 1 .
Ground truth classes for the Indian Pines dataset and their corresponding numbers of samples.

Table 2 .
Parameter settings for the PSO method.

Table 6 .
Summation of five experimental results in the confusion matrix for the Indian Pines dataset with a proportion of training data of 5%.

Table 7 .
Summation of five experimental results in the confusion matrix for the Indian Pines dataset with a proportion of training data of 20%.

Table 8 .
Similarity (×10³) between the spectral signatures of each pair of classes. Note that the high-similarity group is shown in green, the medium-similarity group in yellow, and the low-similarity group in red.

Table 9 .
Ground truth classes for the Indian Pines dataset and their corresponding numbers of samples.

Table 10 .
Average producer's accuracy (PA) for each class with the RBF and Power-SAM-RBF kernels with 5% and 20% proportions of training data.