Classification of Hyperspectral Images by SVM Using a Composite Kernel by Employing Spectral, Spatial and Hierarchical Structure Information

Abstract: In this paper, we introduce a novel classification framework for hyperspectral images (HSIs) by jointly employing spectral, spatial, and hierarchical structure information. In this framework, the three types of information are integrated into the SVM classifier in the form of multiple kernels. Specifically, the spectral kernel is constructed from each pixel's vector value in the original HSI, and the spatial kernel is modeled using the extended morphological profile method due to its simplicity and effectiveness. To accurately characterize hierarchical structure features, the techniques of the Fisher-Markov selector (FMS), marker-based hierarchical segmentation (MHSEG), and algebraic multigrid (AMG) are combined. First, the FMS algorithm is applied to the original HSI for feature selection to produce its spectral subset. Then, the multigrid structure of this subset is constructed using the AMG method. Subsequently, the MHSEG algorithm is exploited to obtain a hierarchy consisting of a series of segmentation maps. Finally, the hierarchical structure information is represented by these segmentation maps. The main contribution of this work is to present an effective composite kernel for HSI classification by utilizing spatial structure information at multiple scales. Experiments were conducted on two hyperspectral remote sensing images to validate that the proposed framework achieves better classification results than several popular kernel-based classification methods in terms of both qualitative and quantitative analysis. Specifically, the proposed classification framework achieves an overall accuracy 13.46–15.61% higher on average than the standard SVM classifier under different training sets.


Introduction
With the rapid development of hyperspectral sensors, current hyperspectral images (HSIs) contain rich spectral and spatial information. Therefore, different objects can be accurately recognized from HSIs using various classification algorithms for different applications, such as geological survey [1], mineral mapping [2], fine agricultural research [3,4], environmental monitoring [5], etc.
HSI classification is one of the most popular problems in the field of remote sensing and has attracted much attention, but it faces the following challenges [6][7][8]: First, it is very difficult to acquire sufficient labeled samples. Second, information redundancy and the Hughes phenomenon are inevitable due to the high-dimensional features represented by hundreds of spectral bands. Finally, HSIs are often corrupted by different types of noise and dominated by mixed pixels. To address these problems, many researchers have resorted to pixel-wise methods that classify each pixel of an HSI into a certain class using its spectral information alone [9][10][11][12][13][14]. Among them, the SVM [10,15] and multinomial logistic regression (MLR) [16][17][18] are the two most commonly used techniques. However, these methods often produce much "salt-and-pepper" noise in the classification maps because they do not consider spatial neighborhoods, and their classification performance cannot be improved further.
This difficulty has been largely overcome by the emergence of spectral-spatial classification methods [19]. Generally, these methods can be divided into three categories. In the first category, spatial information is integrated with spectral information by using composite kernels [20][21][22][23]. There are many methods for spatial feature extraction, such as mean filtering [20], area filtering [24], Gabor filtering [25], the gray-level co-occurrence matrix [26], edge-preserving filtering (EPF) [27], and extended morphological profiles (EMPs) [28]. In the second category, the integration of the spectral and spatial information is first performed by image segmentation algorithms, such as mean-shift [29], watershed [30], hierarchical segmentation [31,32], minimum spanning forest [33], graph cut [34,35], and superpixel [36] approaches. Then, the final classification map is produced by combining the pixel-wise classification map and the unsupervised segmentation map via a majority voting algorithm. In the third category, the two types of information are jointly included in the classification process using Markov random field (MRF) models. By applying the maximum a posteriori (MAP) decision rule, HSI classification can be effectively solved by minimizing a MAP-MRF energy function. The combination of SVM and MRF-based models is a common scheme [37][38][39][40][41][42][43].
Kernel-based classification methods have been very popular for HSI classification because they can effectively deal with the intractable issues of the curse of dimensionality, limited labeled samples, and noise corruption. The SVM algorithm using a single kernel (e.g., linear, polynomial, or Gaussian radial basis function (RBF)) has been widely used for image classification. To perform HSI classification, several SVM techniques using spectral-spatial kernels have been presented. For instance, Camps-Valls et al. [20] formulated a general framework of multiple kernels exploiting both spectral and spatial information, where the spatial information is defined using basic statistical measures within a fixed-size window in the image. The selection of a suitable window size is a challenging problem because spatial structures extracted from such a region cannot be accurately represented. To solve this problem, adaptive neighborhood systems based on morphological filtering and area filtering have been considered. On the one hand, Fauvel et al. [44] applied feature extraction to the original HSI and its EMPs, respectively, and performed SVM classification using the RBF kernel with spectral-spatial stacked vectors. Li et al. [45] developed an MLR framework using generalized composite kernels (GCK), where the spatial information is likewise represented by EMPs. The spatial information obtained by such methods is highly dependent on the structuring element (SE) of the morphological operators. On the other hand, Fauvel et al. [22] proposed an improved SVM using a customized spectral-spatial kernel, where the spatial information is modeled as the median value on the adaptive neighborhood of each pixel defined by morphological area filtering. The result achieved by such a method is very sensitive to the predefined number of areas. Recently, superpixel-based techniques have been applied to HSI classification by Shutao Li's research group. Fang et al. [46] presented an effective SVM classifier characterized by a superpixel-based composite kernel, where three types of information (spectral, intra-superpixel, and inter-superpixel) are combined, and the superpixel map is obtained using the entropy rate superpixel (ERS) algorithm. Meanwhile, texture features are crucial for object classification of HSIs. Later, we introduced an alternative SVM classifier featuring a spectral-texture kernel [23], where the textural information is modeled for each superpixel with its local spectral histogram. The number of superpixels is data-dependent and greatly influences classification results. Lu et al. [47] developed an effective HSI classification framework by integrating multiple feature-induced kernels into an SVM classifier, where subpixel, pixel, and superpixel features are combined. More recently, Peng et al. [48] improved the spectral-spatial composite kernel by embedding label information with an ideal regularization technique. However, the information extracted from the label domain cannot describe spatial structures well.
In this paper, we develop a novel SVM classification framework with spectral, spatial, and hierarchical kernels (SVM-SSHK), in which the spectral, spatial, and hierarchical structure information in HSIs is integrated into the SVM classifier in the form of multiple kernels. Specifically, the spectral kernel is constructed from each pixel's vector value in the original HSI, and the spatial kernel is modeled using the EMP method due to its simplicity and effectiveness. To accurately characterize hierarchical structure features, the techniques of the Fisher-Markov selector (FMS), marker-based hierarchical segmentation (MHSEG), and algebraic multigrid (AMG) are combined. First, the FMS algorithm is applied to the original image for feature selection to produce its spectral subset. Then, the multigrid structure of this subset is constructed using the AMG method. Subsequently, the MHSEG algorithm is exploited to obtain a hierarchy consisting of a series of segmentation maps. Finally, the hierarchical structure information is modeled by these segmentation maps. The main contribution of this work is to present an effective composite kernel framework for HSI classification by utilizing spatial structure information at multiple scales. The previously mentioned kernel-based approaches cannot simultaneously capture salient and fine structures in the image with a predefined number of regions. In contrast, the proposed framework obtains a hierarchical representation of the spatial structure information in HSIs. Furthermore, this hierarchical structure depends only on the original HSI, avoiding the problem of choosing a neighborhood system or the size of a region (e.g., an area or a superpixel).
The remainder of the paper is organized as follows. Section 2 reviews some related techniques. Section 3 introduces the proposed classification framework characterized by a spectral-spatial-hierarchical kernel. Section 4 reports experimental results in comparison with popular HSI classification methods and discusses some related issues. The last section presents concluding remarks and future work.

Related Techniques
Let x ≡ {x_1, x_2, ..., x_N} ∈ R^B represent an HSI containing N B-band pixel vectors, y ≡ {y_1, y_2, ..., y_N} ∈ L^N the final classification result over the label set L ≡ {L_1, ..., L_Z}, and n_i the number of training samples of class L_i (i = 1, 2, ..., Z).

Spatial Information with EMPs
In spectral-spatial classification methods, the first step is to extract feature bands to model the spectral information of hyperspectral images (HSIs) by dimensionality reduction, which minimizes redundant information and improves computational efficiency. To this end, several popular approaches have been used, such as principal component analysis (PCA) [28], independent component analysis (ICA) [49], kernel PCA [50], decision boundary feature extraction, nonparametric weighted feature extraction, and Bhattacharyya distance feature selection [51]. In this work, the widely used PCA transform was adopted to produce the EMP for three reasons. First, almost all of the spectral information in HSIs can be represented by the first three or four principal components (PCs). Second, object boundaries in HSIs are better preserved in the resulting PCs [27]. Finally, the EMP was originally constructed using PCA [7].
The main idea of the EMP is to reconstruct the spatial information through morphological (opening/closing) operators while preserving the boundaries of the image. Let k and n be the total numbers of selected principal components (PCs) and morphological operators, respectively, ψ and η the opening and closing operations, and I a gray-level image. We can build the morphological profile (MP) for each PC as follows:

MP(I) = [η_n(I), ..., η_1(I), I, ψ_1(I), ..., ψ_n(I)],  (1)

so that, for each PC, the MP is a (2n + 1)-band image. Then, the MPs are stacked to obtain the EMP:

EMP = [MP(PC_1), MP(PC_2), ..., MP(PC_k)],  (2)

where the EMP is a stacked vector with dimensionality k(2n + 1) that includes both the spectral and the spatial information of the HSI. In fact, we could extract EMPs for all of the spectral bands, or for some selected bands, without PCA, but this causes the following limitations. First, considerable redundancy can be observed in the resulting B(2n + 1)-band image, where B is the number of spectral bands of the HSI, which may decrease the classification accuracies. Second, the classification process must handle such high-dimensional data at a much greater computational cost.
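For illustration, the EMP construction above can be sketched as follows. This is a simplified version under stated assumptions: scipy's grayscale opening/closing with growing square structuring elements stands in for the disk-shaped SEs (and any by-reconstruction variants) commonly used in practice, and the function name `build_emp` is ours.

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing
from sklearn.decomposition import PCA

def build_emp(hsi, k=3, n_ops=4):
    """Build an extended morphological profile (EMP) sketch.

    hsi: (H, W, B) hyperspectral cube. Returns (H, W, k*(2*n_ops+1)):
    for each of the k PCs, n_ops closings, the PC itself, and n_ops
    openings are stacked, matching the k(2n+1) dimensionality of Eq. (2).
    """
    H, W, B = hsi.shape
    pcs = PCA(n_components=k).fit_transform(hsi.reshape(-1, B))
    pcs = pcs.reshape(H, W, k)
    profiles = []
    for i in range(k):
        pc = pcs[..., i]
        # closings with decreasing SE size, the PC itself, then openings
        closings = [grey_closing(pc, size=2 * s + 1) for s in range(n_ops, 0, -1)]
        openings = [grey_opening(pc, size=2 * s + 1) for s in range(1, n_ops + 1)]
        profiles.extend(closings + [pc] + openings)
    return np.stack(profiles, axis=-1)

emp = build_emp(np.random.rand(20, 20, 30), k=3, n_ops=4)
print(emp.shape)  # (20, 20, 27), since 3 * (2*4 + 1) = 27
```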

Band Selection with FMS
In many research fields, supervised classification requires feature selection. Given a set of test samples, the selected features are used to assign a class label to each sample. Feature selection and subspace methods are widely used for dimensionality reduction [52][53][54][55]. For instance, Cheng et al. [56] presented the FMS algorithm for the feature selection of high-dimensional data, whose basic idea is to find the optimal subset of features that maximizes inter-class separability and minimizes intra-class variations in a higher-dimensional kernel space. By employing certain kernel functions, such as the polynomial kernel, the feature selection problem can be solved efficiently using MRF optimization techniques. In the original space, denote the within-class, between-class, and total scatter matrices by S_w, S_b, and S_t:

S_w = Σ_{j=1}^{Z} Σ_{i=1}^{n_j} (x_i^(j) − m_j)(x_i^(j) − m_j)^T,  S_b = Σ_{j=1}^{Z} n_j (m_j − m)(m_j − m)^T,  S_t = S_w + S_b,

where x_i^(j) is the ith training sample in class L_j, and m_j and m represent the sample means of class L_j and of the whole training set, respectively. In the kernel space, the corresponding scatter matrices are denoted by S̃_w, S̃_b, and S̃_t, whose traces can be calculated as follows:

Tr(S̃_t) = Tr(K) − (1/n) Sum(K),  Tr(S̃_w) = Tr(K) − Σ_{i=1}^{Z} (1/n_i) Sum(K^(i)),  Tr(S̃_b) = Tr(S̃_t) − Tr(S̃_w),

where Tr(·) and Sum(·) are the trace and summation operators, respectively, and K and K^(i) are kernel matrices of size n × n and n_i × n_i, computed on the whole training set and on class L_i, respectively. The feature selector is represented by α = [α_1, α_2, ..., α_B]^T ∈ {0, 1}^B, where α_k = 1 indicates that the kth feature is selected and α_k = 0 that it is not. The selected features from the vector x are defined as follows:

x(α) = x ∘ α,  (11)

where ∘ is the Hadamard product. Substituting (9) and (10) with (11), K and K^(i) can be expressed as functions of α. In this way, the previously mentioned scatter matrices can be defined as functions of α, i.e., Tr(S̃_w(α)), Tr(S̃_b(α)), and Tr(S̃_t(α)). The aim of feature selection is to maximize the class separation for the most discriminative capability of the variables. Following the spirit of Fisher, the following optimization function can be obtained:

max_α  Tr(S̃_b(α)) − λ Tr(S̃_w(α)),  (14)

where λ is a parameter balancing the two terms. Equation (14) is a special case of the Markov problem without a pairwise interaction term and admits an optimal solution. In this work, we compute a coefficient for each band of the HSI to indicate its significance by using the FMS algorithm. The higher the coefficient, the more significant the corresponding band. In this way, we obtain the most relevant spectral bands.
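The band-scoring idea can be sketched as follows. Note the assumptions: this is a per-band, linear Fisher-style score (between-class over within-class scatter), not the full FMS, which couples bands through a kernel and solves the selection by MRF optimization; the function name `fisher_band_scores` is ours.

```python
import numpy as np

def fisher_band_scores(X, y):
    """Per-band Fisher-style scores: between-class over within-class scatter.

    X: (n_samples, B) spectra; y: (n_samples,) class labels.
    A simplified, linear, per-band stand-in for the FMS criterion.
    """
    m = X.mean(axis=0)                      # global mean per band
    sb = np.zeros(X.shape[1])
    sw = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        sb += len(Xc) * (mc - m) ** 2       # between-class scatter
        sw += ((Xc - mc) ** 2).sum(axis=0)  # within-class scatter
    return sb / (sw + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.repeat([0, 1], 50)
X[:50, 2] += 3.0                            # make band 2 discriminative
scores = fisher_band_scores(X, y)
print(int(np.argmax(scores)))               # 2
```

Ranking bands by such a score and keeping the top ones mirrors the "higher coefficient, more significant band" interpretation given above.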

Hierarchical Representation of HSIs
To construct a scale-space representation of an HSI u = [u_1, u_2, ..., u_N], a vector-valued anisotropic diffusion PDE can be used [57,58]:

∂u/∂t = div( g(|∇u_σ|) ∇u ),

where u_σ is obtained by convolving u with a Gaussian kernel of standard deviation σ, and g(·) is the diffusivity evaluated at |∇u_σ|. Recently, AMG has been used for the multiscale representation of HSIs because it can construct a hierarchical representation of a problem from a fine grid to a coarse grid, with the linear system solved effectively on the coarsest grid [59]. In this work, the proposed framework exploits all of the vertices in the multigrid structure as markers.
According to the work of [60], we can construct a "pyramid" multigrid structure of HSIs, as shown in Figure 1, where each grid s = 1, 2, ..., S is described by a weighted graph (V^s, E^s), in which V^s and E^s are the sets of vertices and edges, respectively, and the weight g_ij of (i, j) ∈ E^s expresses the similarity between the pixels u^s_i and u^s_j in V^s. Initially, the first graph (V^0, E^0) is built from the original HSI, where V^0 denotes the set of vertices, whose size equals that of the HSI, and E^0 represents the set of weighted edges connecting each vertex to its four neighbors. In our method, the initial weights g^0_ij of (i, j) ∈ E^0 are computed using the diffusivity of the anisotropic diffusion partial differential equation:

g^0_ij = g(θ_ij) = exp(−(θ_ij/β)^2),

where θ is an indicator of the image edge strength using the Euclidean distance (ED) or the spectral angle mapper (SAM) between two pixel vectors, and β denotes a gradient threshold.
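A small sketch of this initial graph construction follows. Assumptions: the exponential (Perona-Malik-style) diffusivity is one plausible choice of g, and the helper name `edge_weights` is ours; only the weights on right- and down-pointing 4-neighbor edges are returned, which covers every undirected edge once.

```python
import numpy as np

def edge_weights(hsi, beta=0.01, metric="ed"):
    """4-neighbor edge weights g_ij = exp(-(theta/beta)^2).

    theta is the Euclidean distance ("ed") or spectral angle ("sam")
    between neighboring pixel vectors of the (H, W, B) cube.
    Returns (right_weights, down_weights): (H, W-1) and (H-1, W) arrays.
    """
    def theta(a, b):
        if metric == "sam":
            cos = (a * b).sum(-1) / (
                np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-12)
            return np.arccos(np.clip(cos, -1.0, 1.0))
        return np.linalg.norm(a - b, axis=-1)

    right = np.exp(-(theta(hsi[:, :-1], hsi[:, 1:]) / beta) ** 2)
    down = np.exp(-(theta(hsi[:-1, :], hsi[1:, :]) / beta) ** 2)
    return right, down

r, d = edge_weights(np.random.rand(8, 8, 10), beta=1.0)
print(r.shape, d.shape)  # (8, 7) (7, 8)
```

Similar pixel pairs (small θ) receive weights near 1; strong edges (large θ relative to β) receive weights near 0, as the diffusivity intends.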
The main steps of building the multigrid structure are summarized as follows [60].
Step 1: Consecutively select a new set of vertices V^(l+1) from V^l. To build the AMG multigrid structure, the authors of [60] introduced a mass m_i for each vertex, which measures the number of pixels assigned to a given vertex selected for the next grid and is initialized as m^0_i = 1. The first vertex of V^(l+1) is the vertex in V^l with the greatest mass. The remaining vertices in V^l are sorted in decreasing order of mass. Then, a new vertex i is iteratively selected if it satisfies the following condition [10]:

Σ_{j ∈ V^(l+1)} g^l_ij < υ Σ_{j ∈ V^l} g^l_ij,

where υ is a threshold value with 0 ≤ υ ≤ 1, and V^l \ V^(l+1) indicates the set difference between V^l and V^(l+1). In the multigrid structure, each vertex has a mass value, and the masses in the (l + 1)th grid are calculated as follows:

m^(l+1)_j = m^l_j + Σ_{i ∈ V^l \ V^(l+1)} w^l_ij m^l_i,

where m^l_i and m^(l+1)_i are the masses in the lth and (l + 1)th grids, respectively, and w^l_ij weighs how much vertex i ∈ V^l \ V^(l+1) depends on vertex j ∈ V^(l+1).

Step 2: Connect the vertices in V^(l+1) to obtain E^(l+1). The matrix of diffusivities is obtained using the Galerkin operator [61], where I_cf and I_fc denote the restriction and interpolation operators. According to the Galerkin operator, the weights in the (l + 1)th grid are computed as follows:

G^(l+1) = I_cf G^l I_fc,

and E^(l+1) is obtained by connecting the pairs of vertices in V^(l+1) with non-zero weights in G^(l+1). By iteratively performing these two steps, an S-level multigrid structure of the HSI is constructed. The markers correspond to pixels of the smoothed image u^t determined by the positions of the vertices in the coarse grid. The smoothed spectra can be considered as averages of spectrally similar and spatially adjacent pixels, which decreases noise and improves the representation of the different objects in the HSI. In this work, the vertices at each grid level are used for the subsequent region-growing algorithm.

AMG-MHSEG Algorithm
As described in [32], we presented an AMG-MHSEG classification framework for HSIs. The advantages of this framework are summarized as follows. First, marker selection is performed using an AMG-derived approach, which is more effective than the classification-derived methods proposed by Tarabalka et al. [31]. The selection of markers in the classification-derived methods depends highly on the performance of the pixel-wise classifiers. Moreover, the selected markers may differ greatly due to the random selection of training samples. These difficulties always cause uncertainty in the classification maps. In contrast, the markers selected by the AMG-derived approach are determined only by the structure features of the HSI. Second, the combination of the multigrid representation of HSIs and the MHSEG algorithm provides multiscale segmentation maps. The main steps of the AMG-MHSEG algorithm are given in Algorithm 1.

Algorithm 1: AMG-MHSEG

1. Input a hyperspectral image and construct an undirected graph as the finest grid.
2. AMG Relaxation:
   - At the finest grid level, perform a Gauss-Seidel relaxation to solve (I − τG^0)X^0 = u with an initial guess image u, and compute the residual F^0 = (I − τG^0)X^0 − u.
   - At the coarser grid level l (0 < l ≤ S), perform a Gauss-Seidel relaxation to solve the residual equation (I − τG^l)X^l = F^l with an initial guess of 0, and compute the error and then the residual.
3. AMG Coarse-Grid Correction: Select the set of vertices in V^l for V^(l+1) to obtain F^(l+1) for the coarser grid level l + 1.
4. If l ≤ S, go to step 2; otherwise, go to the next step.
5. Initialize the vertices in grid l as markers for the subsequent HSEG algorithm by assigning each vertex a non-zero marker label and each pixel as a separate region.
6. Perform the M-HSEG algorithm using the markers obtained from grid l of the hyperspectral image:
   (a) Calculate the dissimilarity criterion (DC) values between all pairs of spatially adjacent regions. Note that DC values are only calculated between a markered pixel and a non-markered pixel.
   (b) Merge the pair of adjacent regions that has the smallest DC value.
   (c) Stop when no more merging is possible, i.e., when the DC value is NaN.
7. Obtain the resultant segmentation maps for the subsequent classification.
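The marker-based merging step of the algorithm above can be sketched as follows. Assumptions: the DC is simplified to the Euclidean distance between an unlabeled pixel's vector and its already-labeled neighbor's vector (the actual HSEG criterion merges regions and allows non-adjacent merging), and the function name `grow_from_markers` is ours.

```python
import heapq
import numpy as np

def grow_from_markers(img, markers):
    """Minimal marker-based region growing, a sketch of the M-HSEG step.

    Each unmarked pixel is merged into the label of a spatially adjacent
    labeled pixel, always processing the smallest-dissimilarity pair first.
    img: (H, W, B) cube; markers: (H, W) ints, 0 = unmarked. Returns labels.
    """
    H, W, _ = img.shape
    labels = markers.copy()
    heap = []

    def push_neighbors(r, c):
        # queue the 4-neighbors of a labeled pixel with their DC values
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and labels[nr, nc] == 0:
                dc_val = float(np.linalg.norm(img[nr, nc] - img[r, c]))
                heapq.heappush(heap, (dc_val, nr, nc, int(labels[r, c])))

    for r in range(H):
        for c in range(W):
            if labels[r, c] != 0:
                push_neighbors(r, c)

    while heap:                       # smallest DC value merged first
        _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] == 0:
            labels[r, c] = lab
            push_neighbors(r, c)
    return labels

img = np.zeros((2, 4, 1)); img[:, 2:] = 1.0        # two flat halves
markers = np.zeros((2, 4), int)
markers[0, 0], markers[0, 3] = 1, 2                # one marker per half
print(grow_from_markers(img, markers))             # halves get labels 1 and 2
```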

The Proposed Classification Framework
In this section, the classical SVM classifier with the spectral-spatial kernel is first described. Then, the integration of the spectral, spatial, and hierarchical structure information into a composite kernel framework is presented. Figure 2 illustrates the schematic diagram of the SVM-SSHK method.

Spectral-Spatial Kernel
Let us consider an HSI that contains B-band vectors x^SPE ≡ {x^SPE_1, x^SPE_2, ..., x^SPE_N} ∈ R^(B×N), together with the hierarchical segmentation map x^HIE ≡ {x^HIE_1, x^HIE_2, ..., x^HIE_N} ∈ R^(S×N). The supervised SVM classifier is widely used for statistical classification and regression analysis due to its characteristics of geometric margin maximization and empirical error minimization [62]. Because HSIs are not linearly separable, the pixels are mapped from x^SPE into a reproducing kernel Hilbert space by a mapping function φ(x^SPE) to construct the hyperplane. The decision function can then be defined as follows:

f(x) = sgn( Σ_q α_q y_q K(x, z_q) + b ),

where α = {α_1, α_2, ..., α_q} is the set of coefficients associated with the support vectors z_q, b is the bias of the decision function f, and K(x_i, x_j) = φ(x_i)^T φ(x_j). For HSI classification using SVM, the Gaussian RBF kernel is the most widely employed spectral kernel, measuring the similarity between two pixels. The typical spectral kernel can be defined as follows:

K^SPE(x^SPE_i, x^SPE_j) = exp( −||x^SPE_i − x^SPE_j||^2 / (2σ^2) ),

where σ is the width of the RBF kernel. Similarly, the spatial kernel can be constructed using the RBF kernel. Specifically, for two vectors x^SPA_i and x^SPA_j, the spatial kernel is defined as follows:

K^SPA(x^SPA_i, x^SPA_j) = exp( −||x^SPA_i − x^SPA_j||^2 / (2σ^2) ).

As stated in [20,63], if k_1 and k_2 are two kernels, then µ_1 k_1 + µ_2 k_2 is a new kernel for µ_1, µ_2 ≥ 0. According to this property, Camps-Valls et al. [20] formulated an SVM classifier with a spectral-spatial kernel for HSI classification, and this composite kernel is given as follows:

K^SPE−SPA(x_i, x_j) = µ K^SPE(x^SPE_i, x^SPE_j) + (1 − µ) K^SPA(x^SPA_i, x^SPA_j),  (27)

where µ is a weight balancing the spectral kernel and the spatial one. The authors of [20] performed spatial feature extraction for each pixel by computing the mean and variance within a fixed-size window. The SVM classifier with the composite kernel (27) can effectively combine the spectral and spatial information and achieve better results than using the spectral kernel alone. However, the spatial structure information may not be well represented for classification within such a predefined region.
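The composite kernel of Eq. (27) plugs directly into an SVM via a precomputed Gram matrix. The sketch below uses scikit-learn's `SVC(kernel="precomputed")` with synthetic spectral and spatial features (all data and the weight µ = 0.4 are illustrative, not the paper's settings).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def composite_kernel(A_spe, B_spe, A_spa, B_spa, mu=0.4, gamma=1.0):
    """K = mu * K_SPE + (1 - mu) * K_SPA: the weighted-summation
    composite kernel of Eq. (27), with Gaussian RBFs for both parts."""
    return (mu * rbf_kernel(A_spe, B_spe, gamma=gamma)
            + (1 - mu) * rbf_kernel(A_spa, B_spa, gamma=gamma))

rng = np.random.default_rng(1)
n = 60
X_spe = rng.normal(size=(n, 10))       # stand-in spectral vectors
X_spa = rng.normal(size=(n, 6))        # stand-in spatial (e.g., EMP) features
y = (X_spe[:, 0] + X_spa[:, 0] > 0).astype(int)
tr, te = np.arange(0, 40), np.arange(40, n)

# train Gram matrix is (n_train, n_train); test rows are vs. the train set
K_train = composite_kernel(X_spe[tr], X_spe[tr], X_spa[tr], X_spa[tr])
K_test = composite_kernel(X_spe[te], X_spe[tr], X_spa[te], X_spa[tr])
clf = SVC(kernel="precomputed", C=10.0).fit(K_train, y[tr])
preds = clf.predict(K_test)
print(preds.shape)  # (20,)
```

Summing two positive-definite kernels with nonnegative weights keeps the Gram matrix valid, which is exactly the closure property cited from [20,63] above.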

The SVM-SSHK Method
In this work, we propose an effective SVM classifier characterized by three kernels, computed on the pixels from the original, feature, and hierarchical spaces to extract the spectral, spatial, and hierarchical structure features, respectively. In the proposed framework, the spectral features are extracted directly from each pixel's vector value in the original HSI, and the spatial feature extraction is performed using the EMP method due to its simplicity and effectiveness. As addressed in the previous section, the spectral information in HSIs can be represented by a limited number of PCs, which means that the spatial information of the HSI can be projected into a lower-dimensional space after the PCA transform. To construct the EMP of the HSI, we first define the MP for each PC instead of each spectral band, and then stack the MPs of all the PCs to produce the final EMP. Specifically, the PCA transform is first applied to the original HSI for feature extraction. Then, the first three PCs are used as a feature image to obtain the EMP, where each pixel is a stacked vector according to Equation (2).
To remedy the shortcomings of the spatial feature extraction, the hierarchical structure information can be used as a supplement to the spatial features. Based on our previous study [32], the hierarchical structure information helps to improve HSI classification accuracies. As proposed in [32], the AMG method is very effective for modeling the spatial structure information because the multigrid structure can serve as a hierarchical representation of HSIs. To construct the hierarchical kernel, the FMS algorithm is applied to the original HSI for feature selection to obtain its spectral subset. Then, the multigrid representation of this subset is built using the AMG-based method. Next, the AMG-MHSEG algorithm is performed on each grid to obtain the corresponding segmentation map. Finally, these maps are combined to produce a stacked vector x^HIE for each pixel, whose entries are the cluster labels at the different grids. The proposed hierarchical kernel K^HIE is constructed on these stacked hierarchical vectors in the same Gaussian RBF form as the spectral and spatial kernels. To exploit the spectral, spatial, and hierarchical structure information for HSI classification, composite kernels are considered for combining the information. In this work, we present a weighted summation kernel, as follows:

K^SPE−SPA−HIE(x_i, x_j) = µ_SPE K^SPE(x^SPE_i, x^SPE_j) + µ_SPA K^SPA(x^SPA_i, x^SPA_j) + µ_HIE K^HIE(x^HIE_i, x^HIE_j),  (29)

where µ_SPE, µ_SPA, and µ_HIE are weights indicating the contribution of each type of feature information to HSI classification, under the condition µ_SPE + µ_SPA + µ_HIE = 1. For clarity, the SVM-SSHK method is summarized in Algorithm 2.
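The stacking of segmentation maps and the weighted three-kernel sum can be sketched as follows. Assumptions: the helper names are ours, the data are toy values, and applying an RBF kernel directly to raw region labels is a simplification; labels are categorical, so in practice one would encode them (e.g., as same-region indicators) before using a distance-based kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def stack_hierarchy(seg_maps):
    """Stack S segmentation maps (each (H, W) of region labels) into
    per-pixel hierarchical vectors x_HIE of length S: one row per pixel."""
    return np.stack([s.ravel().astype(float) for s in seg_maps], axis=1)

def sshk(X_spe, X_spa, X_hie, mu=(0.3, 0.1, 0.6), gamma=1.0):
    """Weighted-summation kernel of Eq. (29); the weights sum to 1,
    so the diagonal of the combined Gram matrix stays at 1."""
    m1, m2, m3 = mu
    return (m1 * rbf_kernel(X_spe, gamma=gamma)
            + m2 * rbf_kernel(X_spa, gamma=gamma)
            + m3 * rbf_kernel(X_hie, gamma=gamma))

# two toy segmentation maps of a 2x2 image at two scales
segs = [np.array([[0, 0], [1, 1]]), np.array([[0, 0], [0, 1]])]
X_hie = stack_hierarchy(segs)                  # 4 pixels, S = 2 scales
rng = np.random.default_rng(0)
K = sshk(rng.normal(size=(4, 5)), rng.normal(size=(4, 3)), X_hie)
print(K.shape)  # (4, 4)
```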

Algorithm 2: SVM-SSHK
Input: An original hyperspectral image u, the available training samples, required number of segmentation maps S, the time step size τ, Gaussian scale σ, the gradient threshold β, the critical threshold υ and the number of morphological operators n.
Step 2: Obtain the first three PCs of u. Step 3: Construct the EMP by computing the MPs for all of the PCs from Step 2, as described in Section 2.1.
Step 4: Perform the FMS algorithm on u for feature selection to produce its spectral subset u1 with the most relevant spectral bands as described in Section 2.2.
Step 5: For i = 1, 2, . . ., S: (a) Construct the ith grid of u1 using the procedures described in Section 2.3. (b) Select all the vertices in the ith grid as markers for the HSEG algorithm and initialize each vertex with a non-zero marker label. (c) Obtain the ith segmentation map by using the MHSEG algorithm described in Algorithm 1.
Step 7: Construct the spectral, spatial and hierarchical kernels as described in Section 3.2.
Step 8: Apply the SVM classifier with the proposed SSHK kernel in (29) to classify u using the training samples by choosing the optimal C and γ.
Step 9: Obtain the final classification map.

Image Description
The effectiveness of the SVM-SSHK method was validated using two hyperspectral remote sensing images: the AVIRIS Indian Pines (IP) image and the ROSIS-03 University of Pavia (UP) image. The 145 × 145 IP image was acquired over northwestern Indiana, USA, and its ground truth data (GTD) include 16 agricultural classes; the 610 × 340 UP image was acquired over an urban area in Pavia, Italy, and its GTD comprise nine classes. In this work, two spectral subsets of the IP and UP images, with 185 and 103 bands, respectively, are used in our experiments, because the discarded bands lie in the water absorption spectrum or are too noisy. Figure 3 shows the RGB color composites of the two images and their GTD. Note that the background class in the two HSIs was removed from further consideration in the following experiments.

Experimental Settings
To evaluate the performance of the SVM-SSHK method, seven state-of-the-art kernel-based classification methods were selected for comparison, including SVM, EMP [28], EPF [27], SVM using a composite kernel (SVM-CK) [20], MLR-GCK [45], and the two superpixel-based classifiers using a spectral-spatial kernel (SC-SSK) [22] and multiple kernels (SC-MK) [46]. The overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ) were used for quantitative evaluation. Before presenting the experimental results, a brief description of the parameter settings and related issues is provided. To fix the optimal parameter settings for each method, we tuned the parameters within certain ranges based on the original references to obtain the best classification performance, comparable to the classification results reported in those references for the IP and UP images with the same number of training samples. The parameter settings for each method are as follows: (1) The SVM algorithm with the RBF kernel was used by all of the methods except MLR-GCK, and the optimal C and γ for each method were obtained by five-fold cross-validation over the ranges 2^−5 to 2^15 and 2^−15 to 2^5, respectively.
(2) For EMP, the first three PCs were used for building the MPs, which were computed using a flat disk-shaped SE with radius from 1 to 15 with a step size of 2. (3) For EPF, the first PC was used as a guidance image, a local 5 × 5 window was used for the joint bilateral filter, and the two Gaussian scales were fixed as δ_s = 3 and δ_r = 0.2. (4) For SVM-CK, the weight was fixed as µ = 0.4, and a local 5 × 5 window was used for each pixel to compute the mean and variance. (5) For MLR-GCK, the spectral and spatial variances were fixed as σ_SPE = 1.5 and σ_SPA = 2, respectively, and λ = 10^−5. (6) For SC-SSK, the two parameters were fixed as µ = 0.8 and σ = 0.8, and the number of superpixels was fixed as 200 and 3500 for the IP and UP images, respectively. (7) For SC-MK, the three weights were fixed as µ_Spec = 0.2, µ_IntraS = 0.4, and µ_InterS = 0.4, respectively, and the number of superpixels was fixed as 200.
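The five-fold cross-validation over C ∈ [2^−5, 2^15] and γ ∈ [2^−15, 2^5] described above can be sketched with scikit-learn's `GridSearchCV`. The data here are synthetic, and the grids are coarsened (step of 2^4) to keep the example fast; the full search would use every power of two in the stated ranges.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))          # stand-in spectral features
y = (X[:, 0] > 0).astype(int)          # easily separable toy labels

param_grid = {"C": 2.0 ** np.arange(-5, 16, 4),      # 2^-5 .. 2^15
              "gamma": 2.0 ** np.arange(-15, 6, 4)}  # 2^-15 .. 2^5
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(sorted(search.best_params_))     # ['C', 'gamma']
```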

Experimental Settings
To evaluate the performance of the SVM-SSHK method, seven state-of-the-art kernel-based classification methods were selected for comparison: SVM, EMP [28], EPF [27], SVM using a composite kernel (SVM-CK) [20], MLR-GCK [45], and two superpixel-based classifiers using a spectral-spatial kernel (SC-SSK) [22] and multiple kernels (SC-MK) [46]. The overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ) were used for quantitative evaluation. Before presenting the experimental results, a brief description of the parameter settings and related issues is provided. To fix the optimal parameter settings for each method, we tuned the parameters over a certain range based on the original references to obtain the best classification performance, which is comparable to the classification results reported in those references for the IP and UP images with the same number of training samples. The parameter settings for each method are as follows: (1) The SVM algorithm with the RBF kernel was used by all of the methods except MLR-GCK, and the optimal C and γ for each method were obtained by five-fold cross validation over the ranges 2^−5 to 2^15 and 2^−15 to 2^5, respectively. (2) For EMP, the first three PCs were used for building the MPs, which were computed using a flat disk-shaped SE with radius from 1 to 15 with a step size of 2. (3) For EPF, the first PC was used as the guidance image, a local 5 × 5 window was used for the joint bilateral filter, and the two Gaussian scales were fixed as δs = 3 and δr = 0.2. (4) For SVM-CK, the weight was fixed as µ = 0.4, and a local 5 × 5 window was used for each pixel to compute the mean and variance. (5) For MLR-GCK, the spectral and spatial variances were fixed as σSPE = 1.5 and σSPA = 2, respectively, and λ = 10^−5. (6) For SC-SSK, the two parameters were fixed as µ = 0.8 and σ = 0.8, and the number of superpixels was fixed as 200 and 3500 for the IP and UP images, respectively. (7) For SC-MK, the three weights were fixed as µSpec = 0.2, µIntraS = 0.4, and µInterS = 0.4, and the number of superpixels was fixed as 200.

In our experiments, we randomly divided the GTD into training and test sets, following the scheme in [46] by setting the number of training samples M per class from 15 to 40 with a step size of 5 and using the remaining samples for testing. For minority classes in the IP image whose total number of labeled samples is less than M, the labeled samples were divided equally into training and test samples. Table 1 reports the percentage of the total samples (pixels) used for training and testing for the two HSIs under different values of M. The classification experiments with each training set were repeated 10 times for reliable evaluation of the results.
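The five-fold cross-validation over exponential grids of C and γ described in setting (1) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the synthetic data and the coarsened grid step (2^2 instead of every power of 2) stand in for the real HSI pixel vectors and the full grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate grids as described: C in 2^-5..2^15, gamma in 2^-15..2^5
# (coarse steps of 2^2 here to keep the search small).
param_grid = {
    "C": [2.0 ** p for p in range(-5, 16, 2)],
    "gamma": [2.0 ** p for p in range(-15, 6, 2)],
}

# Stand-in spectral features; in practice these would be HSI pixel vectors.
X, y = make_classification(n_samples=200, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)

# Five-fold cross validation over the C/gamma grid for an RBF-kernel SVM
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

The selected pair (C, γ) would then be reused to train the final classifier on the full training set.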

The IP Image
In the first experiment, we reported the classification results for M = 40 in Table 2 to show the contribution of each kernel in the proposed method. The weights µSPE = 0.3, µSPA = 0.1, and µHIE = 0.6 were used in KSPE−SPA−HIE; τ = 1, σ = 0.1, υ = 0.3, β = 0.01, and S = 11 were used by the AMG-MHSEG algorithm; and PCs 1-3 and n = 8 were used for the construction of the EMP. For the IP image, the 30 most relevant spectral bands were selected by the FMS algorithm. Table 2 shows that the hierarchical structure information can further increase the discriminative capability of the SVM classifier. Specifically, SVM with KSPE−HIE increases the OA, AA, and κ by 10.31%~15.77%, 6.21%~9.81%, and 11.7%~17.82%, respectively, compared to SVM with KSPE. Furthermore, SVM with KSPE−SPA−HIE improves the OA, AA, and κ over the other kernels in this table by 0.61%~13.65%, 0.25%~8.26%, and 0.69%~15.45% on average, respectively. The improvement of KSPE−SPA−HIE over the other kernels in Table 2 demonstrates that the combination of the spectral, spatial, and hierarchical kernels generates better classification results than single or double kernels in terms of OA, AA, and κ. Finally, the SVM classifier with KSPE−SPA−HIE achieves the highest CAs, above 90%, for 12 of the 16 classes.

In the second experiment, we applied each classification method to the IP image under different training sets. Table 3 lists the classification results, and the last row of this table records the average rank for each method. All of the accuracies in the same row of this table are ranked in descending order, and the average rank is defined as the mean of the rankings in the same column. We can observe from Table 3 that using composite or multiple kernels in the SVM classifier combines the spectral and spatial information well and yields higher results in all of the cases than the single feature-stacked kernel methods, including SVM and EMP, except for EPF, which obtains a lower average rank (4.94) than that of SVM-CK. The average rank values of SVM-CK and MLR-GCK are 5.72 and 4, respectively; the superpixel-based methods SC-SSK and SC-MK are better than these two methods and achieve similar performance, with average ranks of 2.5 and 2.67, respectively. SVM-SSHK outperforms the other methods in terms of OA, AA, and κ for the different numbers of training samples, and its average rank reaches 1.33.

Figure 4 illustrates some classification maps produced by the different methods with 40 training samples per class, corresponding to Table 3 with M = 40. The noise in the SVM classification map in Figure 4a is clearly visible and is largely removed by the other kernel methods, which confirms that spatial information is significant for improving the classification results. However, the noise effect is still observed in the two classes Soybeans-no till and Soybeans-min till in the EMP and MLR-GCK results. The classification maps are improved by SVM-CK and SC-SSK, which remove the noise in these two classes. Nevertheless, the edges of the image are corrupted with noise by EPF and SVM-CK because they use a fixed-size window for feature extraction. The adaptive neighborhood system of SC-SSK solves this problem of SVM-CK but cannot completely remove the noise effect. The SC-MK and SVM-SSHK classification maps are comparable and much better than the others, and by comparison less noise and fewer classification errors are seen in the SVM-SSHK result.
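The average-rank statistic defined above (accuracies ranked in descending order within each row, then averaged down each method's column) can be computed as in this short sketch; the toy table and function name are illustrative only, and ties are not handled:

```python
import numpy as np

def average_ranks(acc_table):
    """acc_table: rows = (metric, training-set) cases, cols = methods.
    Each row is ranked in descending order of accuracy (rank 1 = best);
    a method's average rank is the mean of its ranks over all rows."""
    acc = np.asarray(acc_table, dtype=float)
    # argsort of -acc gives, per row, the column order from best to worst;
    # inverting that order yields each column's rank within the row.
    order = np.argsort(-acc, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(acc.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, acc.shape[1] + 1)
    return ranks.mean(axis=0)

# Hypothetical accuracies for three methods over two cases
table = [[90.0, 95.0, 92.0],
         [88.0, 94.0, 93.0]]
print(average_ranks(table))  # -> [3. 1. 2.]
```

A lower average rank therefore indicates a method that more consistently places near the top across metrics and training-set sizes.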

The UP Image
Similarly, the classification results for M = 40 are recorded in Table 4 to evaluate the contribution of each kernel in the SVM-SSHK method. The weights µSPE = 0.2, µSPA = 0.6, and µHIE = 0.2 were used in KSPE−SPA−HIE; τ = 1, σ = 0.1, υ = 0.2, β = 0.01, and S = 13 were used by the AMG-MHSEG algorithm; and PCs 1-3 and n = 8 were used for the construction of the EMP. For the UP image, the 30 most relevant spectral bands were selected by the FMS algorithm. It can be observed from Table 4 that SVM with KSPE−HIE increases the OA, AA, and κ by 6.51%~13.33%, 5.73%~9.13%, and 8.42%~16.66%, respectively, compared to SVM with KSPE. Furthermore, SVM with KSPE−SPA−HIE improves the OA, AA, and κ over the others in this table by 3.16%~16.56%, 1.47%~11.52%, and 4.12%~21.08% on average, respectively. In addition, SVM with KSPE−SPA−HIE obtains the highest CAs, above 96%, for all of the classes of the UP image except Self-Blocking Bricks.
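The composite kernel evaluated in Tables 2 and 4 is a weighted sum of per-feature Gram matrices fed to a precomputed-kernel SVM. A minimal sketch, assuming scikit-learn and random stand-in features in place of the actual spectral, EMP, and hierarchical features:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 60
# Stand-ins for the three per-pixel feature sets (spectral vector,
# EMP spatial features, hierarchical-segmentation features);
# the real features come from the FMS/EMP/AMG-MHSEG pipeline.
X_spe = rng.normal(size=(n, 30))
X_spa = rng.normal(size=(n, 20))
X_hie = rng.normal(size=(n, 10))
y = rng.integers(0, 3, size=n)

mu_spe, mu_spa, mu_hie = 0.2, 0.6, 0.2  # kernel weights, summing to 1
gamma = 0.1

# Composite kernel: weighted sum of per-feature RBF Gram matrices
K = (mu_spe * rbf_kernel(X_spe, gamma=gamma)
     + mu_spa * rbf_kernel(X_spa, gamma=gamma)
     + mu_hie * rbf_kernel(X_hie, gamma=gamma))

clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)  # training-set prediction, just to show the interface
```

Because each summand is a valid (positive semi-definite) kernel and the weights are non-negative, the weighted sum is itself a valid kernel, which is what allows it to be plugged into a standard SVM.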

Next, we applied each classification method to the UP image under different training sets; the classification results are listed in Table 5. In this table, the average rank of SVM is the lowest at 8, the same as in Table 3. EMP, EPF, and SVM-CK performed HSI classification with similar average rank values of 5.38, 5.94, and 5.56, taking the fifth, seventh, and sixth positions in this table, respectively. The remaining methods, which use composite or multiple kernels, achieve better average ranks than the aforementioned methods: the average ranks of SC-SSK, MLR-GCK, and SC-MK are 4.72, 3.28, and 2.11, respectively. The proposed SVM-SSHK method achieves the best classification accuracies for all training-set sizes in terms of OA, AA, and κ. The improvement of SSHK over the other composite or multiple kernels indicates that introducing hierarchical structure information further improves the discriminative capability of kernel methods.

Figure 5 shows the classification results corresponding to Table 5 with M = 40. From this figure, we can see that the SVM classification map is corrupted with much noise. Some pixels belonging to Meadows are incorrectly assigned the Bare Soil label in the EMP classification map. This problem is partially resolved by SVM-CK and MLR-GCK, which generate better classification results in Figure 5d,e, respectively. The EPF and SC-SSK classification maps are smoother, but several misclassified areas are produced in the middle and bottom of the image. SC-MK improved on the SC-SSK classification map by largely correcting such areas, but it also caused classification errors in other parts of the image: for instance, two large areas of the classes Asphalt and Meadows in the GTD were labeled Bare Soil and Self-Blocking Bricks in the upper-left and right of the image, respectively. SVM-SSHK better discriminates all of the objects, with only very few pixels given false class labels.
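The three evaluation metrics used throughout (OA, AA, and κ) can be derived from a confusion matrix as follows; the 2 × 2 matrix is a made-up example:

```python
import numpy as np

def oa_aa_kappa(conf):
    """Overall accuracy, average accuracy, and kappa coefficient from a
    confusion matrix with rows = true classes, columns = predicted classes."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                       # fraction correct overall
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))    # mean per-class accuracy
    # Chance agreement from the row/column marginals, as in Cohen's kappa
    pe = np.sum(conf.sum(axis=0) * conf.sum(axis=1)) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

conf = [[50, 10],
        [ 5, 35]]
oa, aa, kappa = oa_aa_kappa(conf)
```

OA weights every test pixel equally, AA weights every class equally, and κ discounts the agreement expected by chance, which is why all three are reported together for imbalanced scenes like the IP image.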


Discussion
As mentioned in Sections 2 and 3, some parameters must be set in the SVM-SSHK method. All of our experiments on HSIs, including those not reported here, confirmed that the number of morphological operators n and the selected PCs play an important role in the construction of the EMP, and that the critical threshold υ greatly influences the hierarchical information extraction.
Furthermore, the weights in the spectral-spatial-hierarchical kernel have a significant impact on the classification performance of the proposed method. In this section, the impact of all of the previously mentioned parameters is further analyzed to better understand the application of the SVM-SSHK method to HSI classification.

Impact of n
To exploit the spatial kernel in the proposed framework, the number of opening/closing operators (n) should be appropriately selected. In this subsection, the impact of n on the performance of the SVM-SSHK method is analyzed first. Experiments were performed on the IP and UP images with M = 40, and the parameter settings were the same as in the previous experiments in Sections 4.3.1 and 4.3.2. Table 6 lists the classification accuracies obtained by the proposed framework under different values of n. In this table, the highest classification accuracies for the IP image are obtained when n = 8, while the OA, AA, and κ for the UP image are very stable at around 98.1%, 98.6%, and 97.5% when n ≥ 8, with the highest OA, AA, and κ achieved at n = 16, 14, and 12, respectively. A large value of n means that more MPs must be computed for spatial information extraction. To ensure the computational efficiency of the SVM-SSHK method, we fixed this parameter as n = 8 for both the IP and UP images.
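A morphological profile of the kind parameterized by n can be sketched as below, using plain grey-scale opening/closing from SciPy as a stand-in for the opening/closing by reconstruction used in the actual EMP; the disk radii follow the settings above (1 to 2n − 1 with a step size of 2):

```python
import numpy as np
from scipy.ndimage import grey_closing, grey_opening

def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def morphological_profile(pc, n=8):
    """Stack n openings and n closings of a principal-component image,
    using disk SEs of radius 1, 3, ..., 2n - 1, plus the original band.
    Plain grey opening/closing stands in here for the opening/closing
    by reconstruction used in the actual EMP."""
    layers = [pc]
    for r in range(1, 2 * n, 2):
        fp = disk(r)
        layers.append(grey_opening(pc, footprint=fp))
        layers.append(grey_closing(pc, footprint=fp))
    return np.stack(layers, axis=-1)  # shape (H, W, 2n + 1)

pc1 = np.random.default_rng(0).random((64, 64))
mp = morphological_profile(pc1, n=8)  # 17 layers for one PC
```

Repeating this for each of the first three PCs and concatenating the results along the last axis yields the extended morphological profile used for the spatial kernel.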

Impact of Different Number of PCs
To further inspect the most appropriate number of PCs, three combinations were analyzed for spatial information extraction. Experiments were performed on the two HSIs with M = 40, and the parameter settings were the same as in the previous experiments in Sections 4.3.1 and 4.3.2. Table 7 lists the classification accuracies obtained by the proposed framework under different numbers of PCs. In this table, as the number of PCs increases, meaning that more spatial information is exploited for constructing the EMP of the HSI, improved classification accuracies are obtained. For instance, the SVM-SSHK method using the first three PCs increases the OA by 0.84% and 0.88% for the IP image, and by 0.43% and 4.74% for the UP image, compared with using PC 1 + PC 2 and PC 1 alone, respectively. For conciseness and efficiency, the first three PCs were used for spatial information extraction.
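Extracting the leading PCs from an HSI cube for EMP construction can be sketched as follows (assuming scikit-learn; the random cube is a stand-in for a real image):

```python
import numpy as np
from sklearn.decomposition import PCA

def first_pcs(cube, n_components=3):
    """Flatten an (H, W, B) hyperspectral cube to pixels x bands,
    project onto the leading principal components, and reshape back
    to (H, W, n_components) images for EMP construction."""
    h, w, b = cube.shape
    pcs = PCA(n_components=n_components).fit_transform(cube.reshape(-1, b))
    return pcs.reshape(h, w, n_components)

cube = np.random.default_rng(0).random((32, 32, 50))  # toy stand-in HSI
pcs = first_pcs(cube, n_components=3)
```

Each output channel is then treated as a grey-scale image on which the morphological profile is computed; the components are ordered by explained variance, so PC 1 carries the most spatial structure.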

Impact of υ
To determine the impact of υ, experiments were performed on the IP and UP images with M = 40, and the parameter settings were the same as in the previous experiments in Sections 4.3.1 and 4.3.2. Table 8 provides the classification accuracies obtained by the proposed framework under different values of υ. As υ increases from 0.05, the variation of the classification accuracies is very similar for the two HSIs. Specifically, the highest OA, AA, and κ of 95.86%, 97.12%, and 95.25% for the IP image and 98.1%, 98.73%, and 97.49% for the UP image are achieved when υ = 0.3 and υ = 0.2, respectively. These settings were therefore used in the previous comparison experiments to ensure that SVM-SSHK achieves its optimal results.

Impact of Weights
In the SVM-SSHK method, the weights in KSPE−SPA−HIE critically determine the classification performance, since their values indicate the contributions of the spectral, spatial, and hierarchical structure information to classification. An appropriate combination of their values may obtain better results. To examine the interaction effect of µSPE, µSPA, and µHIE, we could perform a four-dimensional (4-D) analysis to evaluate the influence of these three weights on our method's performance. Based on the constraint µSPE + µSPA + µHIE = 1, we converted this 4-D analysis into the problem of analyzing different combinations of µSPE and µSPA in terms of classification accuracies. Figure 6 illustrates the three-dimensional (3-D) plot of the classification accuracies as µSPE and µSPA change from 0 to 1 with a step size of 0.1. Several conclusions can be drawn from this figure.

First, for the IP image, if µSPE = 0, the proposed framework includes only the two kernels KSPA and KHIE; in this case, the OA, AA, and κ range over 91.01%~95.59%, 93.53%~96.91%, and 89.69%~94.95%, respectively. If µSPA = 0, the framework includes only the two kernels KSPE and KHIE; in this case, the OA, AA, and κ range over 82.06%~95.1%, 88.12%~96.65%, and 79.51%~94.38%, respectively. Specifically, the OA, AA, and κ of 91.52%, 94.7%, and 90.31% are achieved when µSPE = µSPA = 0. For the UP image, the OA, AA, and κ range over 61.28%~98.05%, 70.91%~98.68%, and 52.53%~97.42% when µSPE = 0, respectively, and over 61.28%~92.86%, 70.91%~95.27%, and 52.53%~90.65% when µSPA = 0, respectively. Specifically, very poor OA, AA, and κ of 61.28%, 70.91%, and 52.53% are obtained when µSPE = µSPA = 0.
Second, an appropriate selection of µSPE, µSPA, and µHIE yields the best classification accuracies. For instance, the highest OA, AA, and κ for the IP image reach 95.86%, 97.12%, and 95.25% with µSPE = 0.3, µSPA = 0.1, and µHIE = 0.6, and for the UP image reach 98.14%, 98.75%, and 97.53% with µSPE = 0.1, µSPA = 0.7, and µHIE = 0.2, respectively. Compared with Tables 2 and 4, this confirms again that the combination of the spectral, spatial, and hierarchical kernels is essential to producing better classification accuracies than single or double kernels in the SVM classifier.
Finally, the SVM-SSHK method demonstrates very stable classification performance for most combinations of µSPE and µSPA. According to Figure 6, there are 66 combinations of the two weights in total. For the IP image, the SVM-SSHK method obtains OA, AA, and κ higher than 92%, 94%, and 90% for 53 of the 66 (80.30%) parameter settings, respectively. For the UP image, the proposed method achieves OA, AA, and κ higher than 95%, 95%, and 90% for 40 of the 66 (60.61%) parameter settings, respectively.
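The 66 weight combinations mentioned above follow from enumerating µSPE and µSPA on a 0.1 grid under the constraint µSPE + µSPA + µHIE = 1 with non-negative weights, as this short sketch verifies:

```python
# Enumerate (mu_spe, mu_spa) on a 0.1 grid with mu_hie = 1 - mu_spe - mu_spa,
# keeping only non-negative weights; this yields the 66 combinations
# evaluated in Figure 6.
combos = []
for i in range(11):           # mu_spe = 0.0, 0.1, ..., 1.0
    for j in range(11 - i):   # mu_spa such that mu_spe + mu_spa <= 1
        mu_spe, mu_spa = i / 10.0, j / 10.0
        mu_hie = 1.0 - mu_spe - mu_spa
        combos.append((mu_spe, mu_spa, round(mu_hie, 10)))

print(len(combos))  # -> 66
```

The count is the triangular number 11 + 10 + ... + 1 = 66, one point per lattice position on the two-simplex with a 0.1 grid spacing.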

Conclusions
In this paper, we presented an effective classification framework that integrates spectral, spatial, and hierarchical structure information into the SVM classifier via multiple kernels. In this framework, the spectral kernel is constructed directly from the original HSI, the spatial kernel is modeled using the EMP method, and the hierarchical kernel is introduced by combining the FMS and AMG-MHSEG techniques. The main advantage of the proposed framework is that it utilizes spatial structure information at multiple scales for HSI classification. Experimental results on two benchmark HSIs confirmed the following conclusions: (1) the combination of the spectral, spatial, and hierarchical kernels in the SVM-SSHK method generates better classification results than any single or double combination of these three kernels; (2) the SVM-SSHK method achieves the most accurate classification results under different training sets compared with the popular kernel-based classification methods; specifically, SVM-SSHK is 0.02-15.24% and 0.08-15.61% higher than the other methods on average in terms of OA for the IP and UP images, respectively; (3) SVM-SSHK demonstrates stable classification performance for most settings of the three kernel weights. In conclusion, the SVM-SSHK method is very promising for improving the classification of hyperspectral images. In the future, further studies will explore more efficient SVMs with multiple kernels.

Algorithm 1 :
AMG-MHSEG. Input: an original hyperspectral image u and the coarsest grid level S. Output: segmentation maps.

Figure 3 .
Figure 3. Hyperspectral images and the corresponding ground truth data (GTD). (a) A false color composite image (bands 47, 23, and 13) of the Indian Pines (IP) image and (b) its GTD; (c) a false color composite image (bands 103, 56, and 31) of the University of Pavia (UP) image and (d) its GTD.


Figure 4 .
Figure 4 illustrates some classification maps by the different methods with 40 training samples per class, corresponding to Table 3 with M = 40.
Figure 6 .
Figure 6 illustrates the three-dimensional (3-D) plot of the classification accuracies with the change of µSPE and µSPA from 0 to 1 with a step size of 0.1.

Table 1 .
The percentage of the total pixels used as training and test for the IP and UP images under different values of M.

Table 2 .
Classification Results [Mean Accuracy (%) ± Standard Deviation] by the SVM Classifier with the Spectral, Spatial and Hierarchical Kernels for the IP Image. The best accuracies are indicated in bold in each row.

Table 3 .
Classification Results [Mean Accuracy (%) ± Standard Deviation] by Different Methods using Different Numbers of Training Samples for the IP Image. The best accuracies are indicated in bold in each row.

Table 4 .
Classification Results [Mean Accuracy (%) ± Standard Deviation] by the SVM Classifier with the Spectral, Spatial and Hierarchical Kernels for the UP Image. The best accuracies are indicated in bold in each row.

Table 5 .
Classification Results [Mean Accuracy (%) ± Standard Deviation] by Different Methods using Different Numbers of Training Samples for the UP Image. The best accuracies are indicated in bold in each row.

Table 6 .
Classification accuracy (%) by the SVM-SSHK method under different values of n for the IP and UP images. The best accuracies are indicated in bold in each column.

Table 7 .
Classification accuracy (%) by the SVM-SSHK method under different numbers of PCs for the IP and UP images.

Table 8 .
Classification accuracy (%) by the SVM-SSHK method under different values of υ for the IP and UP images. The best accuracies are indicated in bold in each column.