Article

Spectral-Similarity-Based Kernel of SVM for Hyperspectral Image Classification

1 State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, China
2 Department of Geographical Information Science, Hohai University, Nanjing 210098, China
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(13), 2154; https://doi.org/10.3390/rs12132154
Submission received: 23 April 2020 / Revised: 18 June 2020 / Accepted: 23 June 2020 / Published: 6 July 2020

Abstract

Spectral similarity measures can be regarded as potential metrics for kernel functions and can be used to generate spectral-similarity-based kernels. However, spectral-similarity-based kernels have not received significant attention from researchers. In this paper, we propose two novel spectral-similarity-based kernels based on the spectral angle mapper (SAM) and spectral information divergence (SID) combined with the radial basis function (RBF) kernel: the power spectral angle mapper RBF (Power-SAM-RBF) and normalized spectral information divergence-based RBF (Normalized-SID-RBF) kernels. First, we prove that these spectral-similarity-based kernels are Mercer's kernels. Second, we analyze their efficiency in terms of local and global kernels. Finally, we consider three hyperspectral datasets to analyze the effectiveness of the proposed spectral-similarity-based kernels. Experimental results demonstrate that the Power-SAM-RBF and SAM-RBF kernels can achieve an impressive performance, particularly the Power-SAM-RBF kernel. For example, when the ratio of the training set is 20%, the kappa coefficient of the Power-SAM-RBF kernel (0.8561) is 1.61%, 1.32%, and 1.23% higher than that of the RBF kernel on the Indian Pines, University of Pavia, and Salinas Valley datasets, respectively. We present three conclusions. First, the superiority of the Power-SAM-RBF kernel over the other kernels is evident. Second, the Power-SAM-RBF kernel can provide an outstanding performance when the similarity between spectral signatures in the same hyperspectral dataset is either extremely high or extremely low. Third, the Power-SAM-RBF kernel provides even greater benefits compared to other commonly used kernels as the size of the training set increases. In future work, multiple-kernel methods combined with spectral-similarity-based kernels are expected to provide better hyperspectral classification.

1. Introduction

Hyperspectral data, which span the visible to infrared spectrum and cover hundreds of bands, can provide important spectral information regarding land cover. Hyperspectral sensors record the collected information as a series of images; these images provide the spatial distribution of solar radiation reflected from a point of observation [1]. Such a high-dimensional spectral feature space is suitable for a wide range of applications, including land-cover classification [1], ground target detection [2], anomaly detection [3], and spectral unmixing [4].
The high dimensionality of hyperspectral data also represents a significant challenge for image classification [5,6]. Classification performance is strongly affected by the dimensionality of the feature space (e.g., the Hughes phenomenon [7]). This problem can typically be mitigated by employing feature extraction to reduce the dimensionality of the hyperspectral images (HSIs) while retaining as much valuable information as possible. Conventional statistical approaches, such as k-nearest neighbors, maximum likelihood (ML) or Bayes classification methods [8,9], and random forests [10], are then used to perform HSI classification.
Two effective families of methods for HSI classification are kernel-based methods and spectral similarity measures; neither is affected by the Hughes phenomenon. Kernel-based methods, such as support vector machines (SVMs) [11], kernel Fisher discriminant (KFD) analysis [12], support vector clustering (SVC) [13], the regularized AdaBoost (Reg-AB) algorithm [14], and others [15,16], are robust to the Hughes phenomenon and provide elegant ways to handle nonlinear problems [7]. Such methods have attracted significant attention because they provide superior and stable performance for HSI classification. Among them, SVMs are the best suited for high-dimensional data classification when the training samples are limited [17,18].
The key to the SVM method lies in the kernel function, which has attracted attention for its ability to handle nonlinear problems and which determines the mapping between the input space and a high-dimensional feature space. Commonly used kernels include the linear, polynomial, radial basis function (RBF), and sigmoid kernels, although other single kernel functions have been designed for specific applications. For example, the Fisher kernel [12,19] uses the gradient of the log-likelihood with respect to the parameters of a generative model as a feature for discriminative classifiers [20]. Lodhi et al. [21] proposed a string subsequence kernel for categorizing text documents. Additionally, Wahba et al. [22] proposed an analysis of variance kernel that defines joint kernels from existing kernels. Other kernels include the Matérn kernel [23], histogram intersection (HI) kernel [24], and Laplacian kernel [25]. In HSI classification, a discrete space model and an SVM were combined for classification [26], and Xia et al. [27] proposed a rotation-based SVM ensemble for HSI classification.
However, single kernels are limited by the complexity of images. Therefore, a number of multiple-kernel methods have been developed for disease prediction [28], EEG signal classification [29], anomaly detection [30], genomic data mining [31], and kinship verification [32]. Multiple-kernel-based SVMs have also been widely applied to HSI classification [33,34], because a single kernel is rarely able to fit complex data structures [35]. For example, subspace multiple-kernel learning (MKL) [36] uses a subspace method to obtain the weights of the base kernels in a linear combination. Nonlinear MKL learns an optimal combined kernel from predefined linear kernels to achieve better inter-scale and inter-structural similarity among extended morphological profiles [37]. Other MKL methods include sparse MKL [38], class-specific MKL [39], and ensemble MKL [40].
Spectral similarity measures are used to measure the spectral similarity between target and reference spectral signatures and to implement HSI classification. Such measures are also unaffected by the Hughes phenomenon. Commonly used spectral similarity measures include the spectral angle mapper (SAM) [41], spectral information divergence (SID) [42], spectral correlation mapper (SCM) [43], spectral gradient angle (SGA) [44], Euclidean distance (ED) [45], and SID×tan(SAM) and SID×sin(SAM) [46]. Wang et al. [47,48] proposed frequency-domain-based spectral similarity measures for HSI classification. Such measures can be used for anomaly detection [49], crop monitoring [50,51,52], and land cover classification [53].
Researchers have also used spectral similarity measures as kernel functions for SVMs in HSI classification. Mercier and Lennon [54] proposed two mixture kernels, the SAM-based RBF (SAM-RBF) kernel and the SID-based RBF (SID-RBF) kernel. Fauvel et al. [55] also used the SAM-RBF kernel for HSI classification; their results indicated that the SAM-RBF kernel is inferior to the RBF kernel. However, we experimentally determined that spectral-similarity-based kernels still offer certain advantages for HSI classification, and we propose two novel types of kernels for HSI classification based on spectral similarity measures.
In this study, we first prove that both the SAM-RBF and SID-RBF kernels fulfill Mercer’s conditions and that the two newly proposed spectral-similarity-based kernels are also Mercer’s kernels. Second, we compare the efficiencies of the spectral-similarity-based kernels in terms of local and global kernels. Finally, we employ these kernels in SVM on three hyperspectral datasets in classification experiments, where the classification accuracies and effects of the similarity between the spectral signatures and sizes of the training sets are analyzed in detail.

2. Support Vector Machines

In this section, the SVM model is briefly reviewed. A detailed description can be found in [11]. The SVM model attempts to classify samples by tracing a maximum-margin separating hyperplane in the kernel space. Given a nonlinear mapping function $\phi(\mathbf{x})$, the discriminant function associated with the separating hyperplane is defined as follows:
$$f(\mathbf{x}) = \mathbf{w}^{T}\phi(\mathbf{x}) + b,$$
where $\mathbf{w}$ is the vector normal to the hyperplane and $b$ is the bias term, which determines the offset of the hyperplane from the origin. Because maximizing the margin between the samples and the hyperplane is equivalent to minimizing the norm of $\mathbf{w}$, an SVM aims to solve the following problem:
$$\min_{\mathbf{w},\,\xi_i,\,b}\ \left\{ \frac{1}{2}\|\mathbf{w}\|_2^2 + C\sum_i \xi_i \right\} \quad \text{s.t.}\quad y_i\!\left(\mathbf{w}^{T}\phi(\mathbf{x}_i) + b\right) \geq 1 - \xi_i,\ \ \xi_i \geq 0,\ \ i = 1, 2, 3, \ldots, m,$$
where $C$ controls the generalization capability of the classifier, and $\xi_i$ is a positive slack variable that allows classification errors to be tolerated.
The optimization problem above can be reformulated through a Lagrange function whose multipliers can be found by means of dual optimization, leading to a quadratic programming (QP) solution [11]. The solution can be identified by solving the Lagrangian dual problem defined as follows:
$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\, \phi(\mathbf{x}_i)^{T}\phi(\mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i = 0,\ \ \alpha_i \geq 0,\ \ i = 1, 2, 3, \ldots, m,$$
where $\alpha_i$ is a Lagrange multiplier. A kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ is defined as follows:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^{T}\phi(\mathbf{x}_j).$$
Then, a nonlinear SVM can be defined when the kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ satisfies Mercer's condition. Popular kernels are defined as follows:
For the linear kernel,
$$K(\mathbf{x}_i, \mathbf{x}_j) = \langle \mathbf{x}_i, \mathbf{x}_j\rangle;$$
For the polynomial kernel,
$$K(\mathbf{x}_i, \mathbf{x}_j) = \left(a\langle \mathbf{x}_i, \mathbf{x}_j\rangle + b\right)^{d};$$
For the radial basis function (RBF) kernel,
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|_2^2}{2\sigma^2}\right).$$
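To make the role of the kernel function concrete, the following minimal sketch shows how a kernel can be supplied to an off-the-shelf SVM as a precomputed Gram matrix (assuming NumPy and scikit-learn are available; the function name, parameter values, and toy data are illustrative, not from the original study). The spectral-similarity-based kernels introduced in Section 3 can be substituted in exactly the same way.

```python
# Minimal sketch (not from the paper): supplying a custom kernel to an SVM via
# scikit-learn's precomputed-kernel interface. Data and parameter values are toy.
import numpy as np
from sklearn.svm import SVC

def rbf_gram(X, Y, sigma=1.0):
    """Gram matrix of the RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    # Squared Euclidean distances between every row of X and every row of Y.
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

# Toy data standing in for hyperspectral pixels (n_samples x n_bands).
rng = np.random.default_rng(0)
X_train, y_train = rng.random((40, 10)), rng.integers(0, 2, 40)
X_test = rng.random((10, 10))

clf = SVC(kernel="precomputed", C=10.0)
clf.fit(rbf_gram(X_train, X_train, sigma=0.5), y_train)    # square train Gram matrix
pred = clf.predict(rbf_gram(X_test, X_train, sigma=0.5))   # rows: test, cols: train
```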

3. Methods

3.1. Mercer’s Kernels

The flexibility of the SVM is mainly attributed to its formulation in terms of the kernel function. A kernel function can be viewed as a similarity measure in the feature space corresponding to a mapping of the data into a high-dimensional space [56]. The kernel function determines how the data are mapped into this high-dimensional space and, consequently, how separable the data become. Therefore, exploring more efficient kernels is important for classification. A kernel function must satisfy Mercer's condition [57], and Mercer's theorem provides this condition for verifying whether a given function is a Mercer's kernel. Mercer's theorem and the properties of Mercer's kernels are as follows:
Mercer's theorem: Let $X$ be a Hilbert space. Suppose $K : X \times X \rightarrow \mathbb{R}$ is a continuous symmetric function in $L^2(X^2)$. Then, there is a mapping $\Phi$ and an expansion
$$K(\mathbf{x}_i, \mathbf{x}_j) = \sum_{n}\Phi(\mathbf{x}_i)_n\,\Phi(\mathbf{x}_j)_n$$
if and only if, for any $g(\mathbf{x})$ such that $\int g(\mathbf{x})^2\,d\mathbf{x}$ is finite, we have
$$\iint K(\mathbf{x}_i, \mathbf{x}_j)\,g(\mathbf{x}_i)\,g(\mathbf{x}_j)\,d\mathbf{x}_i\,d\mathbf{x}_j \geq 0,$$
where $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in X$.
Mercer’s condition is an important requirement for obtaining a global solution for an SVM. It is nontrivial to check Mercer’s condition, as indicated by Equation (9). However, it has been proven that a positive definite kernel is equivalent to a dot product kernel [56]. In other words, any kernel that can be expressed as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \sum_{p=0}^{\infty} c_p\,\langle \mathbf{x}_i, \mathbf{x}_j\rangle^{p},$$
where the $c_p$ are positive real coefficients and the series is uniformly convergent, satisfies Mercer's condition [58].
When proving or proposing a novel Mercer's kernel, the following properties of Mercer's kernels are useful.
Property 1: If $K_1, K_2, K_3, \ldots$ are Mercer's kernels and $K(\mathbf{x}_i, \mathbf{x}_j) = \lim_{n \to \infty} K_n(\mathbf{x}_i, \mathbf{x}_j)$, then $K$ is a valid Mercer's kernel.
Property 2: If $K_1, K_2$ are Mercer's kernels, $a_1 \geq 0$, $a_2 \geq 0$, and $K(\mathbf{x}_i, \mathbf{x}_j) = a_1 K_1(\mathbf{x}_i, \mathbf{x}_j) + a_2 K_2(\mathbf{x}_i, \mathbf{x}_j)$, then $K$ is a valid Mercer's kernel.
Property 3: If $K_1, K_2$ are Mercer's kernels and $K(\mathbf{x}_i, \mathbf{x}_j) = K_1(\mathbf{x}_i, \mathbf{x}_j)\,K_2(\mathbf{x}_i, \mathbf{x}_j)$, then $K$ is a valid Mercer's kernel.

3.2. Spectral-Similarity-Based Kernels and Proofs

Kernel functions can be viewed as metrics or similarity measures in the feature space corresponding to a mapping of the data into a high-dimensional space [56]. Spectral similarity measures are used in HSI analysis to quantify the similarity and discrimination between target and reference spectral signatures. Therefore, a spectral similarity measure is a type of metric in the spectral feature space. Given two spectral vectors $\mathbf{A} = (A_1, A_2, A_3, \ldots, A_n)^T$ and $\mathbf{B} = (B_1, B_2, B_3, \ldots, B_n)^T$, the spectral angle mapper (SAM) and spectral information divergence (SID) are defined as follows:
SAM:
$$\mathrm{SAM}(\mathbf{A}, \mathbf{B}) = \cos^{-1}\!\left(\frac{\langle \mathbf{A}, \mathbf{B}\rangle}{\|\mathbf{A}\|\cdot\|\mathbf{B}\|}\right),$$
SID:
$$\mathrm{SID}(\mathbf{A}, \mathbf{B}) = D(\mathbf{A}\,\|\,\mathbf{B}) + D(\mathbf{B}\,\|\,\mathbf{A}),$$
where
$$D(\mathbf{A}\,\|\,\mathbf{B}) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i}, \qquad D(\mathbf{B}\,\|\,\mathbf{A}) = \sum_{i=1}^{n} q_i \log\frac{q_i}{p_i},$$
and $\mathbf{p} = (p_1, p_2, p_3, \ldots, p_n)^T$ and $\mathbf{q} = (q_1, q_2, q_3, \ldots, q_n)^T$ are the probability vectors derived from $\mathbf{A}$ and $\mathbf{B}$, respectively. Here, $p_i$ and $q_i$ are defined as
$$p_i = \frac{A_i}{\sum_{i=1}^{n} A_i}, \qquad q_i = \frac{B_i}{\sum_{i=1}^{n} B_i}.$$
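As a concrete illustration, the two measures can be computed directly from the definitions above. The sketch below assumes NumPy and strictly positive spectral vectors (so that the probability vectors and logarithms are well defined); the names and example spectra are illustrative.

```python
# Minimal sketch of the SAM and SID measures defined above.
# Assumptions: NumPy is available and the spectra are strictly positive.
import numpy as np

def sam(a, b):
    """Spectral angle mapper: arccos of the normalized inner product."""
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def sid(a, b):
    """Spectral information divergence: symmetric KL divergence between the
    spectra after normalization to probability vectors p and q."""
    p, q = a / np.sum(a), b / np.sum(b)
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

# Example: similarity between two synthetic spectral signatures.
a = np.array([0.12, 0.15, 0.22, 0.30, 0.28])
b = np.array([0.10, 0.16, 0.20, 0.33, 0.25])
print(sam(a, b), sid(a, b))
```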
Mercier and Lennon [54] and Fauvel et al. [55] used the SAM and SID to obtain new kernel functions. However, they did not present detailed proofs. Here, we prove that these kernels satisfy Mercer's condition.
Proposition 1.
Given a pair of training samples $\mathbf{x}_i, \mathbf{x}_j \in X$, the SAM-RBF kernel function defined as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\cos^{-1}\!\left(\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}\right)}{2\sigma^2}\right)$$
is a Mercer’s kernel.
Proof. 
Here, $\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}$ is a normalized linear kernel, meaning it is also a Mercer's kernel. Let $K_n = \frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}$, with $|K_n| < 1$, and let the Taylor expansion of $\cos^{-1}(K_n)$ be expressed as follows:
$$\cos^{-1}(K_n) = \frac{\pi}{2} - \left(K_n + \frac{1}{2}\cdot\frac{K_n^3}{3} + \frac{1\cdot 3}{2\cdot 4}\cdot\frac{K_n^5}{5} + \cdots\right), \quad |K_n| < 1.$$
Then, according to Properties 2 and 3, $\cos^{-1}(K_n) = \cos^{-1}\!\left(\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}\right)$ is a Mercer's kernel. Let $K_{\arccos} = \cos^{-1}\!\left(\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}\right)$. Similarly, $\exp\!\left(-\frac{K_{\arccos}}{2\sigma^2}\right)$ can also be expanded using Taylor's formula as follows:
$$\exp\!\left(-\frac{K_{\arccos}}{2\sigma^2}\right) = 1 - \frac{K_{\arccos}}{2\sigma^2} + \frac{K_{\arccos}^2}{2!\,(2\sigma^2)^2} - \cdots + \frac{(-1)^n K_{\arccos}^n}{n!\,(2\sigma^2)^n} + \cdots.$$
Therefore, based on Properties 2 and 3, it can be proven that the spectral angle mapper-based RBF (SAM-RBF) kernel function is a Mercer’s kernel.  □
Proposition 2.
Given a pair of training samples $\mathbf{x}_i, \mathbf{x}_j \in X$, the spectral information divergence-based RBF (SID-RBF) kernel function defined as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{D(\mathbf{x}_i\,\|\,\mathbf{x}_j) + D(\mathbf{x}_j\,\|\,\mathbf{x}_i)}{2\sigma^2}\right)$$
is a Mercer’s kernel.
Proof. 
According to Equation (14), $K(\mathbf{x}_i, \mathbf{x}_j)$ in Equation (20) can be rewritten as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{K_{j,j} + K_{i,i} - K_{j,i} - K_{i,j}}{2\sigma^2}\right),$$
where
$$K_{i,i} = \left\langle \frac{\mathbf{x}_i}{S_i},\, \log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\rangle, \qquad K_{j,j} = \left\langle \frac{\mathbf{x}_j}{S_j},\, \log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\rangle,$$
$$K_{i,j} = \left\langle \frac{\mathbf{x}_i}{S_i},\, \log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\rangle, \qquad K_{j,i} = \left\langle \frac{\mathbf{x}_j}{S_j},\, \log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\rangle,$$
and $S_i = \sum_{n} x_i(n)$ and $S_j = \sum_{n} x_j(n)$.
Therefore, $K(\mathbf{x}_i, \mathbf{x}_j)$ in Equation (21) can be decomposed into four power exponents involving $K_{i,i}$, $K_{i,j}$, $K_{j,i}$, and $K_{j,j}$. The first term $K_{i,i}$ in Equation (22) can be rewritten as follows:
$$\left\langle \frac{\mathbf{x}_i}{S_i},\, \log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\rangle = \frac{1}{S_i}\left\langle \mathbf{x}_i,\, \log(\mathbf{x}_i) - \log(S_i)\right\rangle = \frac{1}{S_i}\left(\left\langle \mathbf{x}_i, \log(\mathbf{x}_i)\right\rangle - \left\langle \mathbf{x}_i, \log(S_i)\right\rangle\right).$$
One can see that $K_{i,i}$ is a Mercer's kernel according to Property 2. Similarly, $K_{i,j}$, $K_{j,i}$, and $K_{j,j}$ can also be considered Mercer's kernels. Therefore, $K(\mathbf{x}_i, \mathbf{x}_j)$ is a Mercer's kernel according to Property 2.  □
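Both kernels therefore amount to wrapping a spectral similarity measure in an RBF-style exponential. A minimal sketch of the corresponding Gram matrices is given below, assuming NumPy and the sam and sid helpers from the earlier sketch; the resulting matrices can be passed to an SVM with a precomputed kernel as shown in Section 2.

```python
# Minimal sketch: Gram matrix for a similarity-measure-based RBF kernel,
# covering both SAM-RBF (measure=sam) and SID-RBF (measure=sid).
# Assumes NumPy and the sam/sid helpers from the previous sketch.
import numpy as np

def similarity_gram(X, Y, measure, sigma):
    """K[i, j] = exp(-measure(X[i], Y[j]) / (2 sigma^2))."""
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, xi in enumerate(X):
        for j, yj in enumerate(Y):
            K[i, j] = np.exp(-measure(xi, yj) / (2.0 * sigma**2))
    return K

# Usage with an SVC(kernel="precomputed"), as in the earlier sketch:
# K_train = similarity_gram(X_train, X_train, sam, sigma=0.6)   # SAM-RBF
# K_train = similarity_gram(X_train, X_train, sid, sigma=0.2)   # SID-RBF
```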

3.3. Proposed Kernels

Spectral similarity measures are used to quantify the similarity between a target and a reference spectral signature. Therefore, such a measure can be considered a metric and used as a kernel function for an SVM. Moreover, because spectral similarity measures are commonly used in hyperspectral image classification, they have high potential to improve classification performance when used as kernel functions. Here, we propose two modified spectral-similarity-based kernels based on the SAM-RBF and SID-RBF kernels.
Proposition 3.
A modified kernel, called the Power-SAM-RBF kernel and defined as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\cos^{-1}\!\left(\left(\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}\right)^{t}\right)}{2\sigma^2}\right), \quad t > 0,\ t \in \mathbb{R},$$
is a Mercer’s kernel.
Proof. 
According to Proof 1, we must only prove that $\left(\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}\right)^{t}$, where $t \in \mathbb{R}$, is a Mercer's kernel. In Equation (10), the power $p$ takes integer values. Because $\frac{\langle \mathbf{x}_i, \mathbf{x}_j\rangle}{\|\mathbf{x}_i\|\cdot\|\mathbf{x}_j\|}$ is a Mercer's kernel, we need to show that $K(\mathbf{x}_i, \mathbf{x}_j)^{t}$, where $t \in \mathbb{R}$, is also a Mercer's kernel. This expression can be rewritten as follows:
$$K(\mathbf{x}_i, \mathbf{x}_j)^{t} = K(\mathbf{x}_i, \mathbf{x}_j)^{a} \cdot \left(\frac{1}{K(\mathbf{x}_i, \mathbf{x}_j)}\right)^{b},$$
where $t \in \mathbb{R}$ and $a, b = 1, 2, 3, \ldots, N$. Additionally, the Taylor expansion of $\frac{1}{K(\mathbf{x}_i, \mathbf{x}_j)}$ can be expressed as
$$\frac{1}{K(\mathbf{x}_i, \mathbf{x}_j)} = 1 - \left(K(\mathbf{x}_i, \mathbf{x}_j) - 1\right) + \left(K(\mathbf{x}_i, \mathbf{x}_j) - 1\right)^2 - \cdots.$$
Then, Equation (29) can be rewritten as
$$\frac{1}{K(\mathbf{x}_i, \mathbf{x}_j)} = r_0 + r_1 K(\mathbf{x}_i, \mathbf{x}_j) + r_2 K(\mathbf{x}_i, \mathbf{x}_j)^2 + \cdots,$$
where $r_i \in \mathbb{Z}$.
According to Properties 2 and 3, $\frac{1}{K(\mathbf{x}_i, \mathbf{x}_j)}$ is a Mercer's kernel. Additionally, $K(\mathbf{x}_i, \mathbf{x}_j)^{a} \cdot \left(\frac{1}{K(\mathbf{x}_i, \mathbf{x}_j)}\right)^{b}$ is also a Mercer's kernel. Finally, the function in Proposition 3 can be used as a Mercer's kernel for an SVM.  □
Compared to the SAM-RBF kernel, this modified kernel has one additional parameter that must be optimized, which gives it the potential to outperform the SAM-RBF kernel.
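A minimal sketch of the Power-SAM-RBF kernel is given below (assuming NumPy and nonnegative spectra; it follows the reconstruction of Proposition 3 in which the power t is applied to the normalized inner product, so it is an illustration rather than a definitive implementation and should be checked against the published equation).

```python
# Minimal sketch of the Power-SAM-RBF kernel (illustrative; assumes NumPy and
# nonnegative spectra, with the power t applied to the normalized inner product
# as in the reconstruction of Proposition 3 above).
import numpy as np

def power_sam_rbf(a, b, sigma, t):
    # Normalized linear kernel of Proof 1, raised to the power t.
    k = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    angle = np.arccos(np.clip(k**t, -1.0, 1.0))   # spectral angle of the powered kernel
    return np.exp(-angle / (2.0 * sigma**2))      # RBF envelope of width sigma
```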
Proposition 4.
A modified kernel, called the Normalized-SID-RBF kernel and defined as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{K_{j,j} - K_{j,i} + K_{i,i} - K_{i,j}}{2\sigma^2}\right),$$
where
$$K_{j,j} = \frac{\left\langle \frac{\mathbf{x}_j}{S_j},\, \log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\rangle}{\left\|\frac{\mathbf{x}_j}{S_j}\right\|\cdot\left\|\log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\|}, \qquad K_{j,i} = \frac{\left\langle \frac{\mathbf{x}_j}{S_j},\, \log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\rangle}{\left\|\frac{\mathbf{x}_j}{S_j}\right\|\cdot\left\|\log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\|},$$
$$K_{i,i} = \frac{\left\langle \frac{\mathbf{x}_i}{S_i},\, \log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\rangle}{\left\|\frac{\mathbf{x}_i}{S_i}\right\|\cdot\left\|\log\!\left(\frac{\mathbf{x}_i}{S_i}\right)\right\|}, \qquad K_{i,j} = \frac{\left\langle \frac{\mathbf{x}_i}{S_i},\, \log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\rangle}{\left\|\frac{\mathbf{x}_i}{S_i}\right\|\cdot\left\|\log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\|},$$
is a Mercer's kernel.
Proof. 
According to Equation (32), $K_{j,j}$ is the normalized form of $\left\langle \frac{\mathbf{x}_j}{S_j}, \log\!\left(\frac{\mathbf{x}_j}{S_j}\right)\right\rangle$, which is a Mercer's kernel. Therefore, $K_{j,j}$ is also a Mercer's kernel. Similarly, $K_{j,i}$, $K_{i,i}$, and $K_{i,j}$ are Mercer's kernels. We can then infer that the Normalized-SID-RBF kernel $K(\mathbf{x}_i, \mathbf{x}_j)$ in Equation (31) is a Mercer's kernel, according to Proof 2.  □
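Analogously, the Normalized-SID-RBF kernel of Proposition 4 can be sketched as follows (assuming NumPy and strictly positive spectra so that the logarithms are defined; the helper names are illustrative).

```python
# Minimal sketch of the Normalized-SID-RBF kernel of Proposition 4
# (illustrative; assumes NumPy and strictly positive spectra).
import numpy as np

def normalized_sid_rbf(a, b, sigma):
    p, q = a / np.sum(a), b / np.sum(b)           # x_i / S_i and x_j / S_j

    def ncos(u, v):
        # Cosine-normalized inner product used for each K term.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    k_jj, k_ji = ncos(q, np.log(q)), ncos(q, np.log(p))
    k_ii, k_ij = ncos(p, np.log(p)), ncos(p, np.log(q))
    return np.exp(-(k_jj - k_ji + k_ii - k_ij) / (2.0 * sigma**2))
```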

3.4. Kernel Efficiency

A kernel function is essential for determining the efficiency of an SVM model in its application. Smits and Jordaan [59] divided kernels into two classes: local and global kernels. Local kernels, which act only on data in the neighborhood of the kernel's center point, have a better interpolation ability than global kernels but fail to achieve longer-range extrapolation, whereas global kernels, which allow data points far from one another to influence the kernel value as well, perform better than local kernels in terms of extrapolation. Given a two-dimensional vector $\mathbf{x} = (x_1, x_2)^T$, a test input point $(2, 2)$, and a kernel range $x_1 \in [0, 10]$, $x_2 \in [0, 10]$, the polynomial, RBF, SAM-RBF, SID-RBF, and proposed Power-SAM-RBF and Normalized-SID-RBF kernels are presented for the analysis of kernel efficiency.
First, Figure 1 presents the polynomial and RBF kernels within the neighborhood of the test input point. Consistent with Reference [59], one can see that polynomial (global) kernels have an advantage for extrapolation and that RBF (local) kernels have an advantage for interpolation.
Second, the spectral-similarity-based kernels, namely SAM-RBF and SID-RBF, are illustrated in Figure 2, which reveals that they combine the characteristics of both local and global kernels. The SAM-RBF kernel response increases overall as $x_1$ and $x_2$ increase; in this regard, it is similar to a global kernel. However, it also exhibits distinct local-kernel characteristics along the direction of $x_1$ or $x_2$. It should be noted that the local-kernel properties are sensitive to the parameter $\sigma$. When $\sigma$ increases from 0.2 to 1.0, as shown in Figure 2a–c, the shape of the SAM-RBF kernel exhibits a significant change in the gradient of the "watershed."
As shown in Figure 2d–f, there is a distinct appearance in the form of a peak response for the SID-RBF kernel. Therefore, it also possesses the characteristics of a local kernel, which is weaker than the SAM-RBF kernel.
Third, the Power-SAM-RBF kernel requires two parameters, $\sigma$ ($\sigma > 0$) and $t$ ($t \in \mathbb{R}$), to control its behavior. It is similar to a global kernel in that its response increases as $x_1$ and $x_2$ increase; meanwhile, it has the characteristics of a local kernel, because the response along the vector $[x_1, x_2]$ is higher than in other directions. Therefore, it achieves a good balance between interpolation and extrapolation capabilities. Accordingly, we can conclude the following:
  • The characteristics of the global kernel become weaker and those of the local kernel become stronger as the power parameter $t$ increases. For example, comparing Figure 3a,d, the saddle shape along the watershed tends to shrink as $t$ increases.
  • As $\sigma$ increases, the Power-SAM-RBF kernel exhibits more characteristics of a global kernel and fewer characteristics of a local kernel. As shown in Figure 3a–c, the response of the kernel becomes less pronounced as $\sigma$ increases.
As shown in Figure 4, the Normalized-SID-RBF kernel also has the characteristics of a global kernel, because its response increases as $x_1$ and $x_2$ increase. Meanwhile, it also has the characteristics of a local kernel, with the response along certain directions being higher than along others. The Normalized-SID-RBF kernel has more distinct global-kernel characteristics than the SID-RBF kernel. Debnath and Takahashi [60] claimed that a normalized kernel achieves better performance than the original kernel. However, regarding its local-kernel characteristics, the direction of its ridge trends toward one of the dimensions, such that data in the other dimensions are ignored. This indicates that some features in the original data may not be fully exploited during model training.

4. Experimental Results

4.1. Dataset Description

4.1.1. Indian Pines

This dataset, which was acquired by the AVIRIS sensor, covers agricultural land at the Indian Pines test site in Northwestern Indiana, USA. The original image contains 220 spectral bands; after 20 water absorption bands are discarded, the image has a size of 145 × 145 × 200. The spatial resolution is 20 m per pixel, and the spectral coverage ranges from 0.4 to 2.5 μm. It contains 16 reference classes of crops (e.g., corn, soybean, and wheat). However, only nine classes were selected for our experiments, because these nine classes (Table 1) contain more samples than the others and thus provide sufficient data for model training. Figure 5a,b present a color composite of the Indian Pines image and the corresponding ground-truth data, respectively.

4.1.2. University of Pavia

This dataset was acquired by the ROSIS sensor over the University of Pavia, Pavia, Italy, in 2001. The image has a size of 610 × 340 pixels, spectral coverage ranging from 0.43 to 0.86 μm, and a spatial resolution of 1.3 m per pixel. After discarding noisy and water absorption bands, 103 spectral bands are retained. Figure 6a,b present a false color composite of the University of Pavia image and the corresponding ground-truth data, including nine classes of interest (Table 1).

4.1.3. Salinas Valley

These images were collected by the AVIRIS sensor with a spatial resolution of 3.7 m per pixel over Salinas Valley, California, USA. The image size is 512 × 217 pixels with 224 spectral bands. In our experiment, only 204 spectral bands were used after discarding noisy and water absorption bands. A total of 16 ground-truth classes (Table 1) were considered. The false color composite of bands 50, 30, and 20 and the ground-truth map are presented in Figure 7a,b.

4.2. Experimental Setup

We evaluated the spectral-similarity-based kernels used for HSI classification using the following experimental settings:
  • Training sample selection: 5%, 10%, 15%, and 20% of the samples were randomly selected from the ground-truth data as training samples.
  • Classification accuracies: Five classification experiments were conducted for each setting, and the mean and variance of the overall accuracy (OA), average accuracy (AA), and kappa coefficient were used for the evaluation. Additionally, the producer's accuracy (PA) was used to analyze the experiments on the Indian Pines dataset. If $p_i$ is the number of correctly classified samples of the $i$th class, $t_i$ is the number of samples of the $i$th class in the ground-truth data, and $N$ is the number of classes, then the OA, AA, and kappa coefficient can be defined as follows (a code sketch of this evaluation protocol is given after this list):
    $$\mathrm{OA} = \frac{\sum_{i=1}^{N} p_i}{\sum_{i=1}^{N} t_i}, \qquad \mathrm{AA} = \frac{1}{N}\sum_{i=1}^{N}\frac{p_i}{t_i}, \qquad \mathrm{Kappa} = \frac{\mathrm{OA} - \frac{\sum_{i=1}^{N} p_i \times t_i}{n}}{1 - \frac{\sum_{i=1}^{N} p_i \times t_i}{n}}.$$
  • Methods: Six kernels for the SVM, namely the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels, were employed in the classification experiments. The range of the kernel parameters coef0 and $\gamma$ was [0.01, 2000], and the range of the power parameter $t$ was [0.01, 5].
  • Parameter optimization: We applied particle swarm optimization (PSO) to optimize the parameters of the SVM. The parameter settings for the PSO method, including the acceleration constants, maximum number of generations, and swarm size, are listed in Table 2.
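A minimal sketch of one evaluation run under this setup is given below (assuming NumPy and scikit-learn; X holds the labeled pixels, y their class labels, and gram is any of the Gram-matrix builders sketched in Section 3). The PSO parameter search used in the paper is replaced here by fixed parameter values purely for brevity.

```python
# Minimal sketch of one random-split evaluation (illustrative, not the paper's code).
# Assumptions: NumPy/scikit-learn; gram(A, B) returns the kernel matrix between
# the rows of A and B, e.g. one of the Gram builders sketched in Section 3.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score, confusion_matrix

def evaluate(X, y, gram, train_ratio=0.20, seed=0, C=100.0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_ratio, stratify=y, random_state=seed)
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(gram(X_tr, X_tr), y_tr)
    pred = clf.predict(gram(X_te, X_tr))

    cm = confusion_matrix(y_te, pred)
    oa = np.trace(cm) / cm.sum()                  # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # mean per-class (producer's) accuracy
    kappa = cohen_kappa_score(y_te, pred)
    return oa, aa, kappa

# Five repetitions at a 20% training ratio, as in the experimental setup:
# scores = [evaluate(X, y, lambda A, B: similarity_gram(A, B, sam, 0.6), 0.20, s)
#           for s in range(5)]
```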

4.3. Results for the Indian Pines Dataset

Table 3 presents a comparison of all kernels on the Indian Pines dataset in terms of the OA, AA, and kappa coefficient for different training-set ratios (5%, 10%, 15%, and 20%).
The Power-SAM-RBF kernel generally performs better than the other kernels, particularly in terms of OA and kappa coefficient. When the percentage of training is high, it obtains the highest AAs among all kernels. The SAM-RBF kernel can be regarded as the second-best among all kernels considered. The only time the RBF kernel performs best is in terms of AA with a small proportion of training samples. Regardless, RBF is the third-best kernel overall. The SID-RBF and Normalized-SID-RBF kernels perform worse than the other four kernels for all proportions of training data.
Regarding the spectral-similarity-based kernels, the Power-SAM-RBF and SAM-RBF kernels yield impressive performance for all proportions of the training set, particularly for high proportions of training data (15% or 20%). For all proportions of the training set, these two kernels outperform the other kernels in terms of OA and kappa coefficient. When the proportion of the training set is greater than 10%, the AAs of these two kernels rapidly exceed those of the RBF kernel. However, the performance of the SID-RBF and Normalized-SID-RBF kernels is less promising on the Indian Pines dataset for all proportions of training samples.
Figure 8 plots the curves of OA, AA, and kappa coefficient of the classification results of all kernels with proportions of training data ranging from 5 % to 20 % . The superiority of the Power-SAM-RBF and SAM-RBF kernels becomes more obvious as the proportion of training data increases.
Considering the Power-SAM-RBF kernel as an example, in terms of the kappa coefficient, when the proportion of training data is 5%, the value for the Power-SAM-RBF kernel (0.7389) is 0.8% higher than that for the RBF kernel (0.7309). When the proportion of training data is 20%, the value for the Power-SAM-RBF kernel (0.8561) is 1.61% higher than that for the RBF kernel (0.8400). The improvement in terms of OA is 0.85% (Power-SAM-RBF kernel 78.05%, RBF kernel 77.20%) for a proportion of 5% and 1.38% (Power-SAM-RBF kernel 87.80%, RBF kernel 86.42%) when the proportion is 20%.

4.4. Results from the University of Pavia Dataset

Table 4 reveals that the Power-SAM-RBF kernel also yields the best performance among all kernels for the University of Pavia dataset, Pavia, Italy. It achieves the highest OAs, AAs, and kappa coefficients for all proportions of training data. The Normalized-SID-RBF kernel yields the worst performance.
Regarding the performance of the spectral-similarity-based kernels, both the Power-SAM-RBF and SAM-RBF kernels achieve promising results. The SID-RBF kernel performs worse than the Linear and RBF kernels when the proportion of training data is small. As the proportion increases, the accuracy of the SID-RBF kernel improves significantly. For example, when the proportion of training data is 20%, its OA (92.31%), AA (90.35%), and kappa coefficient (0.8977) are distinctly higher than those of the Linear kernel (OA: 91.28%; AA: 87.40%; kappa coefficient: 0.8832) and close to those of the RBF kernel, although its AA is significantly higher than that of the RBF kernel (89.71%). While the Normalized-SID-RBF kernel still underperforms, its performance on the University of Pavia dataset is better than that on the Indian Pines dataset.
Figure 9 presents the curves of OA, AA, and kappa coefficient for all kernels and all proportions of training data on the University of Pavia dataset. The results reveal similar trends to those of the Indian Pines dataset. Overall, higher accuracies are achieved as the proportion of training data increases. Additionally, the Power-SAM-RBF and SAM-RBF kernels consistently provide the best performance.
The final comparison in Figure 9 is with the Linear and RBF kernels. Here, the superiority of the Power-SAM-RBF and SAM-RBF kernels again tends to increase with the proportion of training data. When the proportion of training data is 5%, the OA, AA, and kappa coefficient of the Power-SAM-RBF kernel are only 0.43%, 1.49%, and 0.58% higher than those of the RBF kernel, respectively. When the proportion of training data is 20%, the improvements of the Power-SAM-RBF kernel compared to the RBF kernel in terms of OA, AA, and the kappa coefficient are 0.97%, 1.79%, and 1.33%, respectively.

4.5. Results for the Salinas Valley Dataset

As shown in Table 5, the Power-SAM-RBF kernel generally obtains the best performance on the Salinas HSI. For small proportions of training samples, the Power-SAM-RBF kernel performs better than the other kernels in terms of OA and kappa coefficient, but not in terms of AA, for which the Linear kernel exhibits the best performance. The SAM-RBF kernel achieves good classification results but not better than those of the Power-SAM-RBF kernel. The SID-RBF and Normalized-SID-RBF kernels exhibit the worst performance among all kernels for all proportions of training data.
Regarding the spectral-similarity-based kernels, the Power-SAM-RBF and SAM-RBF kernels achieve impressive performance, particularly for high proportions of training data. When the proportion of training data reaches 20%, the AA of the Power-SAM-RBF kernel (96.96%) is greater than those of the Linear (96.83%) and RBF (96.38%) kernels. The OA and kappa coefficient of the Power-SAM-RBF kernel are 1.09% and 1.23% higher, respectively, than those of the commonly used RBF kernel. The OA and kappa coefficient of the SAM-RBF kernel are also higher than those of the Linear and RBF kernels. The performance of the SID-RBF and Normalized-SID-RBF kernels on the Salinas Valley dataset remains poor for all proportions of training data.
Similar to the experiment on the Indian Pines dataset, the Power-SAM-RBF kernel does not perform the best when the percentage of the proportion of training data is small. However, as shown in Figure 10, the superiority of the Power-SAM-RBF compared to the other kernels increases as the proportion of training data increases. When the proportion of training data is 5 % , the OA, AA, and kappa coefficient of the Power-SAM-RBF kernel are lower than those of the RBF kernel. However, when the proportion of training data is 20 % , the Power-SAM-RBF kernel outperforms the RBF-kernel in terms of OA, AA, and kappa coefficient by 1.09 % , 0.58 % , and 1.23 % , respectively.

4.6. Effects of Similarity in Spectral Signatures

We noted that the improvement in AA of the Power-SAM-RBF and SAM-RBF kernels over the Linear and RBF kernels differs markedly from the improvement in OA across the datasets. In the experiments on the Indian Pines and Salinas Valley datasets, the Power-SAM-RBF kernel exhibited stronger superiority over the RBF kernel in terms of OA than in terms of AA. For example, when the proportion of training data is 20%, the OA of the Power-SAM-RBF kernel is 1.38% higher than that of the RBF kernel, whereas its AA is only 0.50% higher. However, on the University of Pavia dataset, the superiority of the Power-SAM-RBF kernel over the RBF kernel in terms of OA is less than that in terms of AA. When the proportion of training data is 20%, the OA of the Power-SAM-RBF kernel is 0.97% higher than that of the RBF kernel, while its AA is 1.71% higher.
This indicates that the differences in kernel performance between the Indian Pines/Salinas Valley datasets and the University of Pavia dataset are related to the original spectral signatures of these datasets. Figure 11 illustrates the average spectral signature of each class, computed from all labeled pixels in the ground-truth data. The differences between the spectral signatures in the Indian Pines/Salinas Valley data can be clearly observed, as shown in Figure 11a,c, as can those in the University of Pavia data, as shown in Figure 11b. The higher spectral similarity of the Indian Pines/Salinas Valley datasets compared to that of the University of Pavia dataset indicates that the Power-SAM-RBF and SAM-RBF kernels are well suited to HSIs with low spectral similarity between classes. As a result, we can conclude that the superiority of the Power-SAM-RBF and SAM-RBF kernels compared with the RBF kernel generally becomes more pronounced as the discrimination between the spectral signatures increases.
To further validate the relationship between spectral similarity and classification accuracy, we consider the Indian Pines experimental results with proportions of training data of 5 % and 20 % as an example to compare the performances of the Power-SAM-RBF kernel and the commonly used RBF kernel. The sums of the five experimental results in the confusion matrices for the Indian Pines dataset with proportions of training data of 5 % and 20 % are listed in Table 6 and Table 7, respectively.
We define the similarity between the spectral signatures of a pair of classes using the one-norm as follows:
$$SS_{pair} = \left\| \mathbf{S}_{mean\_i} - \mathbf{S}_{mean\_j} \right\|_1,$$
where $\mathbf{S}_{mean\_i}$ and $\mathbf{S}_{mean\_j}$ are the average spectral signatures of the $i$th and $j$th classes, respectively. The similarities between the spectral signatures of each pair of classes are listed in Table 8. According to these similarities, we divided the class pairs into three groups: high ($SS_{pair} < 2\times 10^3$), medium ($2\times 10^3 < SS_{pair} < 10\times 10^3$), and low ($SS_{pair} > 10\times 10^3$) similarity groups.
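The pairwise similarity and the grouping can be sketched as follows (assuming NumPy; the threshold values are taken from the text above, while the function names are illustrative).

```python
# Minimal sketch (assumes NumPy): one-norm similarity between class-mean spectra
# and the high/medium/low grouping, with thresholds taken from the text above.
import numpy as np

def ss_pair(mean_i, mean_j):
    """One-norm distance between the average spectral signatures of two classes."""
    return np.sum(np.abs(mean_i - mean_j))

def similarity_group(value):
    if value < 2e3:
        return "high"       # SS_pair < 2x10^3
    if value < 10e3:
        return "medium"     # 2x10^3 <= SS_pair <= 10x10^3
    return "low"            # SS_pair > 10x10^3
```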
Given a confusion matrix $T$, the entry $T_{i,j}$ represents the number of samples of class $C_i$ misclassified as class $C_j$. Based on the confusion matrices $T$ of the Power-SAM-RBF and RBF kernels, we define $Ratio_{PSR\_RBF}(i,j)$ to describe the improvement of the Power-SAM-RBF kernel over the RBF kernel as follows:
$$Ratio_{PSR\_RBF}(i,j) = \frac{PSR_{i,j}}{RBF_{i,j}},$$
where P S R i , j is the number of misclassified samples between classes i and j using the Power-SAM-RBF kernel, and R B F i , j is the number of misclassified samples between classes i and j using the RBF kernel. When R a t i o P S R _ R B F ( i , j ) is lower than 1.0, this indicates that the Power-SAM-RBF kernel outperforms the RBF kernel. Otherwise, it indicates that the RBF kernel outperforms the Power-SAM-RBF kernel.
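A minimal sketch of this ratio is given below (assuming NumPy; here the misclassifications "between classes i and j" are taken as the sum of both off-diagonal confusion-matrix entries, which is one plausible reading of the definition above).

```python
# Minimal sketch (assumes NumPy): Ratio_PSR_RBF(i, j) from the confusion matrices
# of the Power-SAM-RBF (cm_psr) and RBF (cm_rbf) kernels. NaN/Inf results occur
# when a pair has no misclassified samples, as discussed below.
import numpy as np

def ratio_psr_rbf(cm_psr, cm_rbf, i, j):
    # Misclassifications "between classes i and j" taken here as both directions.
    psr = cm_psr[i, j] + cm_psr[j, i]
    rbf = cm_rbf[i, j] + cm_rbf[j, i]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.float64(psr) / np.float64(rbf)
```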
The calculated $Ratio_{PSR\_RBF}(i,j)$ results are plotted in Figure 12. Because the number of misclassified samples is zero for some pairs of classes, the results may be not a number (NaN) or infinity (Inf); Figure 12 does not present these cases, so seven and nine points are missing in Figure 12a,b, respectively. Regardless, one can see that most of the class pairs (17 for the 5% training set and 18 for the 20% training set) have values less than or equal to 1.0. This indicates that the Power-SAM-RBF kernel is generally superior to the RBF kernel. Further details regarding this analysis are provided below.
As shown in Figure 12a,b, most of the $Ratio_{PSR\_RBF}(i,j)$ results are less than 1.0 when $SS_{pair}$ is less than $2\times 10^3$ or greater than $10\times 10^3$. Specifically, all $Ratio_{PSR\_RBF}(i,j)$ results for which $SS_{pair}$ of the corresponding class pair is greater than $10\times 10^3$ are less than or equal to 1.0. This indicates that the Power-SAM-RBF kernel outperforms the RBF kernel when the similarity of a class pair is either high or low. When the similarity of a class pair is moderate, the Power-SAM-RBF kernel is inferior to the RBF kernel. The quadratic fitting curves also validate this phenomenon.
Overall, the Power-SAM-RBF kernel is superior to the RBF kernel with either extremely high or low similarities between the spectral signatures of class pairs, whereas with moderate similarities of class pairs, it is inferior to the RBF kernel.

4.7. Effects of the Sizes of the Training Set

The experimental results for the three hyperspectral datasets discussed above indicate that the superiority of the Power-SAM-RBF and SAM-RBF kernels over the Linear and RBF kernels becomes more evident as the size of the training set increases. In this section, we analyze the experimental results for the Indian Pines dataset again by comparing the Power-SAM-RBF and RBF kernels with different proportions of training samples. The number of samples in each class is listed in Table 9.
Table 10 lists the average PAs of each class for the RBF and Power-SAM-RBF kernels with proportions of training data of 5% and 20%. When the proportion of training data is 5%, the PAs of classes C4, C6, C7, and C9 for the Power-SAM-RBF kernel are higher than those for the RBF kernel. With a 20% proportion of training data, the PAs of classes C2, C6, C7, and C9 for the Power-SAM-RBF kernel are higher than those for the RBF kernel. This indicates that the number of classes for which the Power-SAM-RBF kernel outperforms the RBF kernel does not increase with the number of training samples. However, one can see that the PAs of C1, C3, C4, C5, and C8 for the Power-SAM-RBF kernel are close to those for the RBF kernel.
To examine the superiority of the Power-SAM-RBF kernel over the RBF kernel as the number of training samples increases, we define two indexes. Let $ACC_{K,n}$ be the accuracy of kernel $K$ for one class with $n\%$ training samples. The index $P_{K,K'}$ represents the ratio of the accuracy improvement of kernel $K$ to that of another kernel $K'$ when the number of training samples increases, and the index $S_{K,K'}$ represents the difference in the superiority of kernel $K$ over kernel $K'$ between the two training-set sizes. Therefore, $P_{K,K'}$ and $S_{K,K'}$ can be defined as follows:
$$P_{K,K'} = \frac{ACC_{K,n'} - ACC_{K,n}}{ACC_{K',n'} - ACC_{K',n}},$$
$$S_{K,K'} = \left(ACC_{K,n'} - ACC_{K',n'}\right) - \left(ACC_{K,n} - ACC_{K',n}\right).$$
If $P_{K,K'} > 1$, the accuracy improvement of kernel $K$ is greater than that of kernel $K'$ when the number of training samples increases from $n\%$ to $n'\%$. If $S_{K,K'} > 0$, the superiority of kernel $K$ over kernel $K'$ with $n'\%$ training samples is greater than that with $n\%$ training samples. In Figure 13, we plot the curves of $P_{K,K'}$ and $S_{K,K'}$ for the Power-SAM-RBF and RBF kernels when the proportion of training samples increases from 5% to 20%, according to Table 8 and Equations (34) and (35).
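The two indexes can be sketched as follows (plain Python; the argument names are illustrative, with K the Power-SAM-RBF kernel and K′ the RBF kernel in the analysis above).

```python
# Minimal sketch of the two indexes (plain Python). acc_k_n and acc_k_n2 are the
# per-class accuracies of kernel K at the smaller (n%) and larger (n'%) training
# ratios; acc_kp_n and acc_kp_n2 are the same quantities for the reference kernel K'.
def p_index(acc_k_n, acc_k_n2, acc_kp_n, acc_kp_n2):
    """Ratio of the accuracy gains of K and K' as the training set grows."""
    return (acc_k_n2 - acc_k_n) / (acc_kp_n2 - acc_kp_n)

def s_index(acc_k_n, acc_k_n2, acc_kp_n, acc_kp_n2):
    """Difference between the superiority of K over K' at n'% and at n%."""
    return (acc_k_n2 - acc_kp_n2) - (acc_k_n - acc_kp_n)

# Example: Power-SAM-RBF (K) versus RBF (K') for one class, using its producer's
# accuracies at 5% and 20% training data:
# p = p_index(acc_psr_5, acc_psr_20, acc_rbf_5, acc_rbf_20)
# s = s_index(acc_psr_5, acc_psr_20, acc_rbf_5, acc_rbf_20)
```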
Figure 13 indicates that when a class has few original samples, the superiority of the Power-SAM-RBF kernel over the RBF kernel is more pronounced. The $P_{K,K'}$ values of the Power-SAM-RBF kernel versus the RBF kernel for classes C2, C3, C5, C6, and C8 are all above 1.0. Additionally, the corresponding $S_{K,K'}$ values for these classes are all above zero. Therefore, both $P_{K,K'}$ and $S_{K,K'}$ indicate the superiority of the Power-SAM-RBF kernel over the RBF kernel for classes C2, C3, C5, C6, and C8. As shown in Table 9, the sample numbers of C1, C7, and C9 are all above 1000 and are the highest among the nine classes. Therefore, the gain of the Power-SAM-RBF kernel over the RBF kernel in response to an increasing number of training samples is most evident when the number of original samples is small.

5. Conclusions

In this study, we proposed two novel spectral-similarity-based kernels: the Power-SAM-RBF and Normalized-SID-RBF kernels. Additionally, we demonstrated that four spectral-similarity-based kernels, namely the two proposed kernels, the SAM-RBF kernel, and the SID-RBF kernel, satisfy Mercer's condition. Furthermore, a comparative analysis of these spectral-similarity-based kernels indicated that they have the characteristics of both local and global kernels. The SID-RBF and Normalized-SID-RBF kernels are non-isotropic; therefore, the direction of their ridge trends toward one of the dimensions, such that data in the other dimensions are ignored. The Power-SAM-RBF and SAM-RBF kernels, which are isotropic, provide higher efficiency than the SID-RBF and Normalized-SID-RBF kernels.
HSIs of the Indian Pines, University of Pavia, and Salinas Valley scenes were used as experimental datasets. The results obtained with different proportions of training data revealed that the Power-SAM-RBF and SAM-RBF kernels achieve enhanced performance compared to the Linear, RBF, SID-RBF, and Normalized-SID-RBF kernels. The superiority of these two kernels, particularly the Power-SAM-RBF kernel, becomes more pronounced as the proportion of training data increases. When the percentage of training data is 20%, the Power-SAM-RBF kernel achieves the highest OA, AA, and kappa coefficient on all three datasets: 87.80%, 88.24%, and 0.8561 on Indian Pines; 93.86%, 91.50%, and 0.9182 on the University of Pavia; and 94.04%, 96.96%, and 0.9336 on Salinas Valley, respectively.
Furthermore, we presented an in-depth comparative analysis of the efficiency of the Power-SAM-RBF kernel in terms of the similarity of spectral signatures and the size of the training set. First, according to the differences in the characteristics of the spectral signatures among the three hyperspectral datasets, we found that the superiority of the Power-SAM-RBF and SAM-RBF kernels over the other kernels becomes more pronounced when a dataset has either extremely high or extremely low similarity among the spectral signatures of its classes. The confusion matrices of the Power-SAM-RBF and RBF kernels in the Indian Pines experiment also confirmed this rule, based on the analysis of three groups with different similarities of spectral signatures. Second, the PAs in the experimental results for the Indian Pines dataset with different numbers of training samples revealed that the performance gain of the Power-SAM-RBF kernel over the RBF kernel becomes more pronounced as the proportion of training samples increases.
In summary, there are three main conclusions to be drawn from this study. First, the spectral-similarity-based kernels discussed in this paper satisfy Mercer's condition, and the Power-SAM-RBF and SAM-RBF kernels can achieve significantly enhanced performance in HSI classification with an SVM, particularly the Power-SAM-RBF kernel. Second, either extremely high or extremely low similarity between the spectral signatures of different classes may yield better performance for the Power-SAM-RBF kernel compared to the other kernels. Finally, the Power-SAM-RBF kernel achieves even greater classification superiority with a larger training set compared to the other kernels. Nevertheless, the improvement in classification performance provided by the proposed kernels is not dramatic. Therefore, in a future study, we will employ spectral-similarity-based kernels in multiple-kernel methods to validate their efficiency for HSI classification. Meanwhile, we will make efforts to explore more effective novel kernels for HSI classification.

Author Contributions

K.W. and L.C. had the original idea for the study and drafted the manuscript. B.Y. contributed to revision and polishing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 41771358), Natural Science Foundation of Jiangsu Province, China (Grant No. BK20140842), and Fundamental Research Funds for the Central Universities (Grant No. 2014B03514).


Conflicts of Interest

The authors declare there are no conflicts of interest regarding the publication of this paper.

References

  1. Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE 2013, 101, 652–675. [Google Scholar] [CrossRef] [Green Version]
  2. Zhang, L.; Zhang, L.; Tao, D.; Huang, X. Sparse transfer manifold embedding for hyperspectral target detection. IEEE Trans. Geosci. Remote Sens. 2014, 52, 1030–1043. [Google Scholar] [CrossRef]
  3. Du, B.; Zhang, L. Random-selection-based anomaly detector for hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1578–1589. [Google Scholar] [CrossRef]
  4. Bioucas-Dias, J.M.; Plaza, A.; Dobigeon, N.; Parente, M.; Du, Q.; Gader, P.; Chanussot, J. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 354–379. [Google Scholar] [CrossRef] [Green Version]
  5. Landgrebe, D. Hyperspectral image data analysis. IEEE Signal Process. Mag. 2002, 19, 17–28. [Google Scholar] [CrossRef]
  6. Shaw, G.; Manolakis, D. Signal processing for hyperspectral image exploitation. CVGIP Graph. Model Image Process. 2002, 19, 12–16. [Google Scholar] [CrossRef]
  7. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63. [Google Scholar] [CrossRef] [Green Version]
  8. Jia, X.; Richards, J.A. Segmented principal components transformation for efficient hyperspectral remote sensing image display and classification. IEEE Trans. Geosci. Remote Sens. 1999, 37, 538–542. [Google Scholar]
  9. Maghsoudi, Y.; Zoej, M.J.V.; Collins, M. Using class-based feature selection for the classification of hyperspectral data. Int. J. Remote Sens. 2011, 32, 4311–4326. [Google Scholar] [CrossRef]
  10. Ham, J.; Chen, Y.; Crawford, M.M.; Ghosh, J. Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 492–501. [Google Scholar] [CrossRef] [Green Version]
  11. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: NewYork, NY, USA, 1995. [Google Scholar]
  12. Dundar, M.M.; Landgrebe, A. A cost-effective semisupervised classifier approach with kernels. IEEE Trans. Geosci. Remote Sens. 2004, 42, 264–270. [Google Scholar] [CrossRef]
  13. Ben-Hur, A.; Horn, D.; Siegelmann, H.; Vapnik, V. Support vector clustering. Mach. Learn. Res. 2001, 2, 125–137. [Google Scholar] [CrossRef]
  14. Rätsch, G.; Schökopf, B.; Smola, A.; Mika, S.; Onoda, T.; Müller, K.R. Robust ensemble learning. In Advances in Large Margin Classifiers; Smola, A., Bartlett, P., Schölkopf, B., Schuurmans, D., Eds.; MIT Press: Cambridge, MA, USA, 1999. [Google Scholar]
  15. Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362. [Google Scholar] [CrossRef]
  16. Damodaran, B.B.; Courty, N.; Lefèvre, S. Sparse Hilbert Schmidt Independence Criterion and Surrogate-Kernel-Based Feature Selection for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2385–2398. [Google Scholar]
  17. Fauvel, M.; Benediktsson, J.A.; Chanussot, J.; Sveinsson, J.R. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Trans. Geosci. Remote Sens. 2008, 46, 3804–3814. [Google Scholar] [CrossRef] [Green Version]
  18. Li, J.; Marpu, P.R.; Plaza, A.; Bioucas-Dias, J.M.; Benediktsson, J.A. Generalized composite kernel framework for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4816–4829. [Google Scholar] [CrossRef]
  19. Gartner, T. A survey of kernels for structured data. CM SIGKDD Explor. 2003, 5, 49–58. [Google Scholar] [CrossRef]
  20. Jaakkola, T.; Haussler, D. Exploiting generative models in discriminative classifiers. Adv. Neural Inf. Process. Syst. 1999, 10, 487–493. [Google Scholar]
  21. Lodhi, H.; Saunders, C.; Shawe-Taylor, J.; Cristianini, N.; Watkins, C. Text classification using string kernels. J. Mach. Learn. Res. 2002, 2, 419–444. [Google Scholar]
  22. Wahba, G.; Wang, Y.; Gu, C.; Klein, R.; Klein, B. Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy. Ann. Stat. 1995, 23, 1865–1895. [Google Scholar]
  23. Matérn, B. Spatial Variation; Springer: New York, NY, USA, 1960. [Google Scholar]
  24. Swain, M.; Ballard, D. Color indexing. Int. J. Comput. Vis. 1991, 7, 11–32. [Google Scholar] [CrossRef]
  25. Boughorbel, S.; Tarel, J.P.; Boujemaa, N. Conditionally positive definite kernels for SVM based image recognition. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2005), Amsterdam, The Netherlands, 6 July 2005; pp. 113–116. [Google Scholar]
  26. Xie, L.; Li, G.; Xiao, M.; Peng, L.; Chen, Q. Hyperspectral Image Classification Using Discrete Space Model and Support Vector Machines. IEEE Geosci. Remote Sens. Lett. 2017, 14, 374–378. [Google Scholar] [CrossRef]
  27. Xia, J.; Chanussot, J.; Du, P.; He, X. Rotation-Based Support Vector Machine Ensemble in Classification of Hyperspectral Data With Limited Training Samples. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1519–1531. [Google Scholar] [CrossRef]
  28. Collazos-Huertas, D.; Cardenas-Pena, D.; Castellanos-Dominguez, G. Instance-Based Representation Using Multiple Kernel Learning for Predicting Conversion to Alzheimer Disease. Int. J. Neural Syst. 2019, 29, 1850042-1–1850042-8. [Google Scholar] [CrossRef] [PubMed]
  29. Dai, M.; Wang, S.; Zheng, D.; Na, R.; Zhang, S. Domain Transfer Multiple Kernel Boosting for Classification of EEG Motor Imagery Signals. IEEE Access 2019, 7, 49951–49960. [Google Scholar] [CrossRef]
  30. Gautam, C.; Balaji, R.; Sudharsan, K.; Tiwari, A.; Ahuja, K. Localized Multiple Kernel learning for Anomaly Detection: One-class Classification. Knowl. Based Syst. 2019, 165, 241–252. [Google Scholar] [CrossRef] [Green Version]
  31. Wilson, C.M.; Li, K.; Yu, X.; Kuan, P.; Wang, X. Multiple-kernel learning for genomic data mining and prediction. BMC Bioinform. 2019, 20, 241–252. [Google Scholar] [CrossRef] [Green Version]
  32. Zhao, Y.; Song, Z.; Zheng, F. Learning a Multiple Kernel Similarity Metric for kinship verification. Inf. Sci. 2017, 430, 247–260. [Google Scholar] [CrossRef]
  33. Wu, Y.; Yang, X.; Plaza, A.; Qiao, F.; Gao, L.; Zhang, B.; Cui, Y. Approximate Computing of Remotely Sensed Data: SVM Hyperspectral Image Classification as a Case Study. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 5806–5818. [Google Scholar] [CrossRef]
34. Liu, L.; Huang, W.; Wang, C. Hyperspectral Image Classification With Kernel-Based Least-Squares Support Vector Machines in Sum Space. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1144–1157.
35. Gu, Y.; Chanussot, J.; Jia, X.; Benediktsson, J.A. Multiple Kernel Learning for Hyperspectral Image Classification: A Review. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6547–6565.
36. Wang, Q.; Gu, Y.; Tuia, D. Discriminative multiple kernel learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3912–3927.
37. Gu, Y.; Liu, T.; Jia, X.; Benediktsson, J.A.; Chanussot, J. Nonlinear Multiple Kernel Learning with Multiple-Structure-Element Extended Morphological Profiles for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3235–3247.
38. Gu, Y.; Gao, G.; Zuo, D.; You, D. Model selection and classification with multiple kernel learning for hyperspectral images via sparsity. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2119–2130.
39. Liu, T.; Gu, Y.; Jia, X.; Benediktsson, J.A.; Chanussot, J. Class-specific sparse multiple kernel learning for spectral–spatial hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7351–7365.
40. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
41. Kruse, F.A.; Lefkoff, A.B.; Boardman, J.W.; Heidebrecht, K.B.; Shapiro, A.T.; Barloon, P.J.; Goetz, A.F.H. The spectral image processing system (SIPS)–interactive visualization and analysis of imaging spectrometer data. Remote Sens. Environ. 1993, 44, 145–163.
42. Chang, C.I. An information-theoretic approach to spectral variability, similarity, and discrimination for hyperspectral image analysis. IEEE Trans. Inf. Theory 2000, 46, 1927–1932.
43. De Carvalho, J.; Meneses, O.A. Spectral correlation mapper (SCM): An improvement on the spectral angle mapper (SAM). In Proceedings of the Summaries of the 9th JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 23–25 February 2000; JPL Publication: Pasadena, CA, USA, 2000; Volume 2, pp. 00–18.
44. Estrada, F.J.; Jepson, A.D. Spectral gradients: A material descriptor invariant to geometry and incident illumination. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 861–867.
45. Richards, J.A.; Jia, X. Remote Sensing Digital Image Analysis, an Introduction; Springer: Berlin, Germany, 1999.
46. Du, Y.; Chang, C.I.; Ren, H.; D’Amico, F.M.; Jensen, J.O. New hyperspectral discrimination measure for spectral characterization. Opt. Eng. 2004, 43, 1777–1786.
47. Wang, K.; Gu, X.; Yu, T.; Lin, J.; Wu, G.; Li, X. Segmentation of high-resolution remotely sensed imagery combining spectral similarity with phase congruency. J. Infrared Millim. Waves 2013, 32, 73–79.
48. Wang, K.; Yong, B.; Gu, X.; Xiao, P.; Zhang, X. Spectral similarity measure using frequency spectrum for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 130–134.
49. He, Y.; Liu, D.; Yi, S. Recursive spectral similarity measure-based band selection for anomaly detection in hyperspectral imagery. J. Opt. 2010, 13, 015401.
50. Yang, C.H.; Everitt, J.H. Using spectral distance, spectral angle and plant abundance derived from hyperspectral imagery to characterize crop yield variation. Precis. Agric. 2012, 13, 62–75.
51. Kumar, M.N.; Seshasai, M.V.R.; Prasad, K.S.V.; Kamala, V.; Ramana, K.V.; Dwivedi, R.S.; Roy, P.S. Nonparametric weighted feature extraction for classification. Int. J. Remote Sens. 2011, 32, 4041–4053.
52. Chauhan, H.J.; Mohan, B.K. Effectiveness of SID as Spectral Similarity Measure to Develop Crop Spectra from Hyperspectral Image. J. Indian Soc. Remote Sens. 2018, 46, 1853–1862.
53. Zhang, W.; Li, W.; Zhang, C.; Li, X. Incorporating Spectral Similarity Into Markov Chain Geostatistical Cosimulation for Reducing Smoothing Effect in Land Cover Postclassification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1082–1095.
54. Mercier, G.; Lennon, M. Support vector machines for hyperspectral image classification with spectral-based kernels. In Proceedings of the IEEE Geoscience and Remote Sensing Symposium, Toulouse, France, 21–25 July 2003; Volume 1, pp. 288–290.
55. Fauvel, M.; Chanussot, J.; Benediktsson, J.A. Evaluation of kernels for multiclass classification of hyperspectral remote sensing data. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France, 14–19 May 2006; Volume 2, pp. 813–816.
56. Boughorbel, S. Kernels for Image Classification with Support Vector Machines. Ph.D. Thesis, Université de Paris-Sud, Orsay, France, 2005.
57. Mercer, J. Functions of Positive and Negative Type, and Their Connection with the Theory of Integral Equations. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 1909, 209, 415–446.
58. Burges, C.J.C. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 1998, 2, 121–167.
59. Smits, G.F.; Jordaan, E.M. Improved SVM regression using mixtures of kernels. In Proceedings of the International Joint Conference on Neural Networks, Honolulu, HI, USA, 12–17 May 2002; pp. 2785–2790.
60. Debnath, R.; Takahashi, G. Kernel selection for the support vector machine. IEICE Trans. Inf. Syst. 2004, E87-D, 2903–2904.
Figure 1. Polynomial and radial basis function (RBF) kernel representations. Polynomial kernels of degree (a) 2, (b) 3, and (c) 5; RBF kernels with the parameter σ of (d) 0.2, (e) 0.6, and (f) 1.0.
Figure 2. Spectral angle mapper (SAM)-RBF kernel and spectral information divergence (SID)-RBF kernel representation. SAM-RBF kernel with parameter σ values of (a) 0.2, (b) 0.6, and (c) 1.0. SID-RBF kernel with parameter σ values of (d) 0.05, (e) 0.2, and (f) 0.9.
Figure 3. Power-SAM-RBF kernel characteristics with different parameters of t and σ .
Figure 4. Normalized-SID-RBF kernel representation with parameter σ values of (a) 0.05, (b) 0.2, and (c) 0.9.
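Figures 1–4 illustrate how the spectral-similarity-based kernels reshape the standard RBF response. As a minimal sketch of how such kernel surfaces can be evaluated, the snippet below computes SAM [41], SID [42], and RBF-style kernels built on them. The forms exp(−SAM²/(2σ²)) and exp(−SID/(2σ²)) are assumptions following the spectral-based kernels of [54]; the exact Power-SAM-RBF and Normalized-SID-RBF definitions are those given in the methods section of this paper.

```python
import numpy as np

def sam(x, y):
    """Spectral angle (radians) between two spectra [41]."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def sid(x, y, eps=1e-12):
    """Spectral information divergence [42] of two non-negative spectra."""
    p = x / (x.sum() + eps) + eps
    q = y / (y.sum() + eps) + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def sam_rbf(x, y, sigma=0.6):
    """Assumed SAM-RBF form: exp(-SAM(x, y)^2 / (2 * sigma^2))."""
    return np.exp(-sam(x, y) ** 2 / (2.0 * sigma ** 2))

def sid_rbf(x, y, sigma=0.2):
    """Assumed SID-RBF form: exp(-SID(x, y) / (2 * sigma^2))."""
    return np.exp(-sid(x, y) / (2.0 * sigma ** 2))

# Two toy "spectra" with similar shape but different brightness:
x = np.array([0.10, 0.22, 0.35, 0.30])
y = 1.5 * x + 0.01
print(sam_rbf(x, y), sid_rbf(x, y))  # both close to 1: shape-based similarity
```

Because both measures depend on spectral shape rather than magnitude, the two toy spectra above map to kernel values near 1 even though their Euclidean distance is large.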
Figure 5. (a) False color hyperspectral remote sensing image over the Indian Pines test site (using bands 50, 27, and 17). (b) Ground truth of the labeled area with nine classes of land cover: Corn-notill, Corn-mintill, Grass-pasture, Grass-trees, Hay-windrowed, Soybean-notill, Soybean-mintill, Soybean-clean, and Woods.
Figure 6. (a) False color hyperspectral data over the University of Pavia (using bands 105, 63, and 29). (b) Ground truth of the labeled area with nine classes of land cover: Asphalt, meadows, gravel, trees, painted metal sheets, bare soil, bitumen, self-blocking bricks, and shadows.
Figure 7. (a) False color hyperspectral image (HSI) over Salinas Valley (using bands 68, 30, and 18); (b) Ground truth of the labeled area with 16 classes of land cover: Broccoli green weeds 1, Broccoli green weeds 2, Fallow, Fallow rough plow, Fallow smooth, Stubble, Celery, Grapes untrained, Soil vineyard develop, Corn senesced green weeds, Lettuce romaine 4 wk, Lettuce romaine 5 wk, Lettuce romaine 6 wk, Lettuce romaine 7 wk, Vineyard untrained, and Vineyard vertical trellis.
Figure 8. Curves of the (a) OA, (b) AA, and (c) kappa coefficient for the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels with proportions of training data of 5 % , 10 % , 15 % , and 20 % for the Indian Pines dataset.
Figure 9. Curves of the (a) OA, (b) AA, and (c) kappa coefficient for the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels for proportions of training data of 5 % , 10 % , 15 % , and 20 % for the University of Pavia dataset.
Figure 10. Curves of (a) OA, (b) AA, and (c) kappa coefficient for the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels with proportions of training data of 5 % , 10 % , 15 % , and 20 % for the Salinas Valley dataset.
Figure 11. Average spectral signature of each class for all labeled pixels in the ground-truth data for the (a) Indian Pines, (b) University of Pavia, and (c) Salinas Valley datasets.
Figure 12. Curves of Ratio_PSR_RBF(i, j) for proportions of training data of (a) 5% and (b) 20%.
Figure 13. Curves of (a) P_{K,K′} (the ratio of the accuracy improvement with the kernel K to that with another kernel K′) and (b) S_{K,K′} (the D-value of the superiority of the kernel K over another kernel K′) for the Power-SAM-RBF kernel versus the RBF kernel for the Indian Pines dataset.
Table 1. Land-cover class labels for the Indian Pines, University of Pavia, and Salinas Valley datasets.
Label | Indian Pines | University of Pavia | Salinas Valley
C1 | Corn-notill | Asphalt | Broccoli green weeds 1
C2 | Corn-mintill | Meadows | Broccoli green weeds 2
C3 | Grass-pasture | Gravel | Fallow
C4 | Grass-trees | Trees | Fallow rough plow
C5 | Hay-windrowed | Painted metal sheets | Fallow smooth
C6 | Soybean-notill | Bare Soil | Stubble
C7 | Soybean-mintill | Bitumen | Celery
C8 | Soybean-clean | Self-Blocking Bricks | Grapes untrained
C9 | Woods | Shadows | Soil vineyard develop
C10 | - | - | Corn senesced green weeds
C11 | - | - | Lettuce romaine 4 wk
C12 | - | - | Lettuce romaine 5 wk
C13 | - | - | Lettuce romaine 6 wk
C14 | - | - | Lettuce romaine 7 wk
C15 | - | - | Vineyard untrained
C16 | - | - | Vineyard vertical trellis
Table 2. Parameter settings for the PSO method.
Parameter | Value
Acceleration constants c1 and c2 | 1.5 and 1.7
Maximal number of generations MaxGen | 5
Swarm scale SizePop | 10
Inertia weight wV | 1
Constriction factor k | 1
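Table 2 lists the particle swarm optimization (PSO) settings used for kernel parameter selection. The sketch below shows how these settings might drive a search over the SVM parameters; the fitness function and search ranges are hypothetical placeholders, since in practice the objective is the cross-validated accuracy of the SVM under the candidate parameters.

```python
import numpy as np

# Settings taken from Table 2:
C1, C2 = 1.5, 1.7       # acceleration constants
MAX_GEN = 5             # maximal number of generations
SIZE_POP = 10           # swarm scale
W, K = 1.0, 1.0         # inertia weight and constriction factor

def fitness(params):
    log_c, log_sigma = params
    # Placeholder objective (assumption): replace with the cross-validated error
    # of the SVM trained with C = 10**log_c and kernel width sigma = 10**log_sigma.
    return (log_c - 1.0) ** 2 + (log_sigma + 0.5) ** 2

rng = np.random.default_rng(0)
lo, hi = np.array([-2.0, -2.0]), np.array([3.0, 1.0])   # search range in log10 space
pos = rng.uniform(lo, hi, size=(SIZE_POP, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)]

for _ in range(MAX_GEN):
    r1, r2 = rng.random((SIZE_POP, 1)), rng.random((SIZE_POP, 1))
    vel = K * (W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos))
    pos = np.clip(pos + vel, lo, hi)
    val = np.array([fitness(p) for p in pos])
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)]

print("best (log10 C, log10 sigma):", gbest)
```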
Table 3. Overall accuracies (OA), average accuracies (AA), and kappa coefficients of the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels on the Indian Pines dataset.
Training Sample | Metric | Linear Kernel | RBF Kernel | SAM-RBF Kernel | Power-SAM-RBF Kernel | SID-RBF Kernel | Normalized-SID-RBF Kernel
5% | OA | 75.33 ± 1.25 | 77.20 ± 1.33 | 77.50 ± 0.72 | 78.05 ± 0.76 | 72.90 ± 3.69 | 73.28 ± 1.99
5% | AA | 75.55 ± 1.23 | 78.07 ± 1.26 | 76.03 ± 0.97 | 76.65 ± 0.96 | 72.17 ± 3.52 | 71.61 ± 3.01
5% | Kappa | 0.7079 ± 0.0142 | 0.7309 ± 0.0152 | 0.7317 ± 0.0090 | 0.7389 ± 0.0092 | 0.6803 ± 0.0420 | 0.6830 ± 0.0245
10% | OA | 80.58 ± 0.54 | 82.64 ± 0.45 | 83.09 ± 0.42 | 83.49 ± 0.76 | 77.10 ± 1.87 | 73.19 ± 4.47
10% | AA | 80.95 ± 0.75 | 83.23 ± 0.81 | 82.58 ± 0.67 | 83.10 ± 1.11 | 77.23 ± 2.12 | 72.73 ± 4.30
10% | Kappa | 0.7783 ± 0.0193 | 0.8020 ± 0.0188 | 0.8042 ± 0.0068 | 0.8091 ± 0.0053 | 0.7408 ± 0.0384 | 0.6912 ± 0.0619
15% | OA | 82.16 ± 0.52 | 84.37 ± 0.60 | 85.42 ± 0.27 | 85.73 ± 0.36 | 77.15 ± 1.01 | 75.13 ± 1.94
15% | AA | 82.88 ± 0.39 | 85.34 ± 0.64 | 85.66 ± 0.19 | 86.11 ± 0.43 | 76.91 ± 0.80 | 74.27 ± 1.90
15% | Kappa | 0.7892 ± 0.0060 | 0.8154 ± 0.0073 | 0.8276 ± 0.0031 | 0.8314 ± 0.0043 | 0.7311 ± 0.0112 | 0.7068 ± 0.0226
20% | OA | 83.72 ± 0.50 | 86.42 ± 0.40 | 87.44 ± 0.57 | 87.80 ± 0.48 | 79.13 ± 0.86 | 77.46 ± 2.45
20% | AA | 84.68 ± 0.90 | 87.74 ± 0.57 | 87.83 ± 0.44 | 88.24 ± 0.33 | 79.04 ± 0.70 | 76.30 ± 3.02
20% | Kappa | 0.8079 ± 0.0064 | 0.8400 ± 0.0050 | 0.8518 ± 0.0067 | 0.8561 ± 0.0056 | 0.7548 ± 0.0100 | 0.7341 ± 0.0289
Table 4. OAs, AAs, and kappa coefficients of the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels on the University of Pavia dataset.
Training Sample | Metric | Linear Kernel | RBF Kernel | SAM-RBF Kernel | Power-SAM-RBF Kernel | SID-RBF Kernel | Normalized-SID-RBF Kernel
5% | OA | 89.95 ± 0.16 | 90.97 ± 1.97 | 90.77 ± 0.32 | 91.40 ± 0.23 | 88.88 ± 0.57 | 83.96 ± 1.68
5% | AA | 85.31 ± 0.58 | 86.65 ± 5.23 | 87.09 ± 0.50 | 88.14 ± 0.45 | 86.74 ± 0.62 | 82.91 ± 1.18
5% | Kappa | 0.8651 ± 0.0022 | 0.8791 ± 0.0275 | 0.8761 ± 0.0045 | 0.8849 ± 0.0033 | 0.8520 ± 0.0076 | 0.7864 ± 0.0215
10% | OA | 90.88 ± 0.12 | 92.01 ± 1.39 | 92.43 ± 0.30 | 92.75 ± 0.17 | 91.12 ± 0.54 | 83.23 ± 0.64
10% | AA | 87.12 ± 0.35 | 87.95 ± 3.60 | 89.56 ± 0.41 | 90.08 ± 0.28 | 89.16 ± 0.71 | 83.60 ± 1.22
10% | Kappa | 0.8780 ± 0.0016 | 0.8929 ± 0.0194 | 0.8989 ± 0.0040 | 0.9032 ± 0.0024 | 0.8819 ± 0.0071 | 0.7773 ± 0.0089
15% | OA | 90.95 ± 0.13 | 93.01 ± 1.25 | 93.13 ± 0.07 | 93.56 ± 0.07 | 91.84 ± 0.27 | 84.53 ± 0.90
15% | AA | 87.11 ± 0.31 | 90.00 ± 2.58 | 90.32 ± 0.19 | 91.00 ± 0.14 | 89.98 ± 0.36 | 84.29 ± 0.76
15% | Kappa | 0.8790 ± 0.0017 | 0.9065 ± 0.0171 | 0.9083 ± 0.0010 | 0.9142 ± 0.0010 | 0.8916 ± 0.0036 | 0.7946 ± 0.0119
20% | OA | 91.28 ± 0.13 | 92.89 ± 0.89 | 93.46 ± 0.19 | 93.86 ± 0.08 | 92.31 ± 0.45 | 84.61 ± 0.63
20% | AA | 87.40 ± 0.33 | 89.71 ± 1.76 | 90.84 ± 0.44 | 91.50 ± 0.06 | 90.35 ± 0.45 | 84.99 ± 0.74
20% | Kappa | 0.8832 ± 0.0018 | 0.9049 ± 0.0124 | 0.9128 ± 0.0025 | 0.9182 ± 0.0010 | 0.8977 ± 0.0059 | 0.7958 ± 0.0086
Table 5. OAs, AAs, and kappa coefficients of the Linear, RBF, SAM-RBF, Power-SAM-RBF, SID-RBF, and Normalized-SID-RBF kernels on the Salinas Valley dataset.
Training Sample | Metric | Linear Kernel | RBF Kernel | SAM-RBF Kernel | Power-SAM-RBF Kernel | SID-RBF Kernel | Normalized-SID-RBF Kernel
5% | OA | 92.10 ± 0.28 | 91.39 ± 0.27 | 91.15 ± 0.24 | 91.84 ± 0.15 | 88.96 ± 0.50 | 58.42 ± 4.83
5% | AA | 95.65 ± 0.23 | 95.01 ± 0.26 | 94.73 ± 0.19 | 95.35 ± 0.31 | 93.36 ± 0.68 | 60.00 ± 3.68
5% | Kappa | 0.9118 ± 0.0031 | 0.9039 ± 0.0030 | 0.9012 ± 0.0027 | 0.9090 ± 0.0017 | 0.8768 ± 0.0057 | 0.5410 ± 0.0506
10% | OA | 92.73 ± 0.13 | 92.46 ± 0.68 | 92.55 ± 0.26 | 93.19 ± 0.28 | 90.13 ± 0.16 | 58.30 ± 5.58
10% | AA | 96.47 ± 0.11 | 95.98 ± 0.46 | 95.88 ± 0.27 | 96.35 ± 0.25 | 94.25 ± 0.21 | 60.57 ± 3.96
10% | Kappa | 0.9189 ± 0.0014 | 0.9159 ± 0.0077 | 0.9169 ± 0.0030 | 0.9241 ± 0.0031 | 0.8899 ± 0.0017 | 0.5398 ± 0.0600
15% | OA | 93.05 ± 0.08 | 92.65 ± 0.55 | 93.32 ± 0.39 | 93.72 ± 0.14 | 89.62 ± 1.34 | 58.02 ± 6.51
15% | AA | 96.70 ± 0.13 | 96.16 ± 0.29 | 96.41 ± 0.19 | 96.68 ± 0.10 | 94.74 ± 0.70 | 59.43 ± 4.32
15% | Kappa | 0.9224 ± 0.0009 | 0.9180 ± 0.0062 | 0.9255 ± 0.0044 | 0.9300 ± 0.0015 | 0.8844 ± 0.0149 | 0.5366 ± 0.0692
20% | OA | 93.09 ± 0.07 | 92.95 ± 0.37 | 93.53 ± 0.36 | 94.04 ± 0.12 | 89.67 ± 0.69 | 63.06 ± 8.99
20% | AA | 96.83 ± 0.07 | 96.38 ± 0.13 | 96.60 ± 0.30 | 96.96 ± 0.15 | 95.18 ± 0.27 | 61.17 ± 5.31
20% | Kappa | 0.9229 ± 0.0008 | 0.9213 ± 0.0042 | 0.9278 ± 0.0041 | 0.9336 ± 0.0014 | 0.8850 ± 0.0076 | 0.5896 ± 0.0969
Table 6. Sum of the confusion matrices over five experimental runs for the Indian Pines dataset with a proportion of training data of 5%.
Predicted Class
C1C2C3C4C5C6C7C8C9
Ground truth classRBF kernelC14608287151545831219540
C22532376350589842610
C3151119808827211235106
C46071332301132121
C5209022590000
C64016521423103989300
C79607494655662989492651
C8255224911023659714830
C9001795500115774
Power-SAM-RBF kernelC14529126618545315221260
C232920951303211553250
C38517941285791940235
C44053334812116922
C5207422560100
C633623165831711008480
C773828835582548299071270
C84682334509876712400
C900437800005889
Table 7. Sum of the confusion matrices over five experimental runs for the Indian Pines dataset with a proportion of training data of 20%.
Predicted Class
C1C2C3C4C5C6C7C8C9
Ground truth classRBF kernelC147541064110245569210
C2241239202012583900
C373180720218282421
C45092874022208
C5204019010300
C6216301790290371410
C76322723231242782821420
C8588911803014920250
C900463200604976
Power-SAM-RBF kernelC14682723130278591710
C21782494020105101260
C352177544510282338
C410292864046016
C5003619001000
C625013111103107486120
C74261443242336087101030
C8137857502920119060
C900205500004985
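The OA, AA, and kappa values in Tables 3–5, and the per-class accuracies in Table 10, can be derived from confusion matrices such as those in Tables 6 and 7. A minimal sketch, assuming rows correspond to ground truth and columns to predicted classes:

```python
import numpy as np

def summary_metrics(cm):
    """OA, AA, and kappa from a confusion matrix (rows = ground truth, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total                                  # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)                   # per-class accuracy
    aa = per_class.mean()                                      # average accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Toy 3-class example (not taken from the paper's tables):
cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 5, 40]]
print(summary_metrics(cm))
```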
Table 8. Similarity (×10³) between the spectral signatures of each pair of classes. Note that the high-similarity group is shown in green, the medium-similarity group is shown in yellow, and the low-similarity group is shown in red.
Class | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9
C1 | - | 1.7 | 11.4 | 7.9 | 5.0 | 2.6 | 1.7 | 2.5 | 13.2
C2 | - | - | 11.0 | 7.4 | 4.8 | 1.0 | 0.6 | 0.9 | 12.7
C3 | - | - | - | 4.1 | 6.8 | 11.5 | 11.4 | 11.1 | 2.1
C4 | - | - | - | - | 4.6 | 7.9 | 7.9 | 7.5 | 5.6
C5 | - | - | - | - | - | 5.4 | 5.2 | 4.9 | 8.6
C6 | - | - | - | - | - | - | 1.1 | 0.6 | 13.3
C7 | - | - | - | - | - | - | - | 1.1 | 13.3
C8 | - | - | - | - | - | - | - | - | 12.8
C9 | - | - | - | - | - | - | - | - | -
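A table like Table 8 can be obtained by applying a spectral similarity measure to the class-mean signatures of Figure 11. The sketch below uses SID as the measure purely for illustration (an assumption) and toy class means in place of the actual signatures; the measure and scaling used for Table 8 are those defined earlier in the paper.

```python
import numpy as np

def sid(x, y, eps=1e-12):
    """Spectral information divergence of two non-negative spectra [42]."""
    p = x / (x.sum() + eps) + eps
    q = y / (y.sum() + eps) + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def pairwise_table(mean_spectra, measure=sid):
    """Upper-triangular matrix of a spectral similarity measure between class-mean spectra."""
    n = len(mean_spectra)
    out = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = measure(mean_spectra[i], mean_spectra[j])
    return out

# Toy class means (3 classes, 5 bands) standing in for the Figure 11 signatures:
means = np.array([[0.10, 0.20, 0.35, 0.30, 0.25],
                  [0.12, 0.22, 0.36, 0.31, 0.26],
                  [0.40, 0.38, 0.20, 0.15, 0.10]])
print(np.round(pairwise_table(means) * 1e3, 1))  # scaled by 1e3 for readability
```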
Table 9. Ground truth classes for the Indian Pines dataset and their corresponding numbers of samples.
Label | Class | Samples
C1 | Corn-notill | 1428
C2 | Corn-mintill | 830
C3 | Grass-pasture | 483
C4 | Grass-trees | 730
C5 | Hay-windrowed | 478
C6 | Soybean-notill | 972
C7 | Soybean-mintill | 2455
C8 | Soybean-clean | 593
C9 | Woods | 1265
Table 10. Average product accuracy (PA) for each class with the RBF and Power-SAM-RBF kernels with 5% and 20% proportions of training data.
Class | RBF Kernel (5%) | RBF Kernel (20%) | Power-SAM-RBF Kernel (5%) | Power-SAM-RBF Kernel (20%)
C1 | 67.91 | 83.26 | 66.75 | 82.00
C2 | 60.30 | 72.05 | 53.17 | 75.12
C3 | 86.27 | 93.63 | 78.17 | 92.00
C4 | 95.90 | 98.42 | 96.62 | 98.08
C5 | 99.52 | 99.53 | 99.38 | 99.48
C6 | 67.24 | 74.63 | 68.71 | 79.87
C7 | 76.75 | 84.34 | 84.97 | 88.70
C8 | 52.68 | 85.44 | 44.05 | 80.42
C9 | 96.07 | 98.34 | 97.99 | 98.52