Spectral-Spatial Classification of Hyperspectral Image Based on Kernel Extreme Learning Machine

Extreme learning machine (ELM) is a classifier based on a single-hidden-layer feedforward neural network that has attracted significant attention in computer vision and pattern recognition due to its fast learning speed and strong generalization. In this paper, we propose to integrate spectral-spatial information for hyperspectral image classification and exploit the benefits of using spatial features for the kernel-based ELM (KELM) classifier. Specifically, Gabor filtering and multihypothesis (MH) prediction preprocessing are the two approaches employed for spatial feature extraction. Gabor features have recently been applied successfully to hyperspectral image analysis due to their ability to represent useful spatial information. MH prediction preprocessing makes use of the spatial piecewise-continuous nature of hyperspectral imagery to integrate spectral and spatial information. The proposed Gabor-filtering-based KELM classifier and MH-prediction-based KELM classifier have been validated on two real hyperspectral datasets. Classification results demonstrate that the proposed methods outperform the conventional pixel-wise classifiers as well as Gabor-filtering-based support vector machine (SVM) and MH-prediction-based SVM in challenging small training sample size conditions. OPEN ACCESS Remote Sens. 2014, 6 5796


Introduction
Hyperspectral imagery (HSI) captures reflectance values over a wide range of electromagnetic spectra for each pixel in the image. This rich spectral information allows for distinguishing or classifying materials with subtle differences in their reflectance signatures. HSI classification plays an important role in many remote-sensing applications, being a theme common to environmental mapping, crop analysis, plant and mineral exploration, and biological and chemical detection, among others [1].
Over the last two decades, many machine learning techniques, including artificial neural networks (ANNs) and support vector machines (SVMs), have been successfully applied to hyperspectral image classification (e.g., [2][3][4][5]). In particular, neural architectures have demonstrated great potential to model mixed pixels, which result from the low spatial resolution of hyperspectral cameras and from multiple scattering [3]. However, there are several limitations involved with ANNs that use the back-propagation algorithm, the most popular technique, as the learning algorithm. Neural network model development for hyperspectral data is a computationally expensive procedure since hyperspectral images are typically represented as three-dimensional cubes with hundreds of spectral channels [6]. In addition, ANNs require a good deal of hyperparameter tuning, such as the number of hidden layers, the number of nodes in each layer, the learning rate, etc. In recent years, SVM-based approaches have been extensively used for hyperspectral image classification since SVMs have often been found to outperform traditional statistical and neural methods, such as the maximum likelihood and the multilayer perceptron neural network classifiers [5]. Furthermore, SVMs have demonstrated excellent performance for classifying hyperspectral data when a relatively low number of labeled training samples is available [4,5,7]. However, the SVM parameters (i.e., regularization and kernel parameters) have to be tuned for optimal classification performance.
Extreme learning machine (ELM) [8], as an emerging learning technique, belongs to the class of single-hidden-layer feed-forward neural networks (SLFNs). Traditionally, a gradient-based method such as the back-propagation algorithm is used to train such networks. ELM randomly generates the hidden node parameters and analytically determines the output weights instead of iterative tuning, which makes the learning extremely fast. ELM is not only computationally efficient but also tends to achieve similar or even better generalization performance than SVMs. However, ELM can produce a large variation in classification accuracy with the same number of hidden nodes due to the randomly assigned input weights and bias. In [9], kernel extreme learning machine (KELM), which replaces the hidden layer of ELM with a kernel function, was proposed to solve this problem. It is worth noting that the kernel function used in KELM does not need to satisfy Mercer's theorem and that KELM provides a unified solution to multiclass classification problems.
The utilization of ELM for hyperspectral image classification has been fairly limited in the literature. In [10], ELM and optimally pruned ELM (OP-ELM) were applied to soybean variety classification in hyperspectral images. In [11], ELM was used for land cover classification and achieved classification accuracies comparable to those of a back-propagation neural network on the two datasets considered. KELM was used in [12] for multi- and hyperspectral remote-sensing image classification. The results indicate that KELM is similar to, or more accurate than, SVMs in terms of classification accuracy and offers notably low computational cost. However, in these works, ELM was employed as a pixel-wise classifier, which means that only the spectral signature was exploited while the spatial information at neighboring locations was ignored. Yet, for HSI, it is highly probable that two adjacent pixels belong to the same class. Considering both spectral and spatial information has been verified to improve the HSI classification accuracy significantly [13,14]. There are two major categories of methods that utilize spatial features: those that extract some type of spatial features (e.g., texture, morphological profiles, and wavelet features), and those that directly use pixels in a small neighborhood for joint classification, assuming that these pixels usually share the same class membership. In the first category (feature dimensionality increased), Gabor features have recently been used successfully for hyperspectral image classification [15][16][17][18] due to their ability to represent useful spatial information. In [15,16], three-dimensional (3-D) Gabor filters were applied to hyperspectral images to extract 3-D Gabor features; in [17,18], two-dimensional (2-D) Gabor features were extracted in a principal component analysis (PCA)-projected subspace. In our previous work [19], a preprocessing algorithm based on multihypothesis (MH) prediction was proposed to integrate spectral and spatial information for noise-robust hyperspectral image classification, which falls into the second category (feature dimensionality not increased). In addition, object-based classification approaches (e.g., [20][21][22]) are important methods in spectral-spatial classification as well. These approaches group the spatially adjacent pixels into homogeneous objects and then perform classification on objects as the minimum processing unit [20].
In this paper, we investigate the benefits of using spatial features (i.e., Gabor features and MH prediction) for the KELM classifier under the small sample size (SSS) condition. Two real hyperspectral datasets are employed to validate the proposed classification methods. We demonstrate that Gabor-filtering-based KELM and MH-prediction-based KELM yield superior classification performance over the conventional pixel-wise classifiers (e.g., SVM and KELM) as well as Gabor-filtering-based SVM and MH-prediction-based SVM in challenging small training sample size conditions. In addition, the proposed KELM-based methods are faster than the SVM-based methods since KELM trains and tests much faster than the traditional SVM.
The remainder of this paper is organized as follows. Section 2 introduces the Gabor filter, MH prediction for spatial feature extraction, the KELM classifier, and our proposed methods. Section 3 presents the hyperspectral data and experimental setup as well as a comparison of the proposed methods with some traditional techniques. Finally, Section 4 makes several concluding remarks.

Gabor Filter
Gabor filters are bandpass filters which have been successfully applied in a variety of image processing and machine vision applications [23][24][25][26]. A 2-D Gabor function is an oriented complex sinusoidal grating modulated by a 2-D Gaussian envelope. In a 2-D coordinate system (a, b), the Gabor filter, which has a real component and an imaginary one, can be represented as

$$G_{\delta,\theta,\psi,\sigma,\gamma}(a,b) = \exp\left(-\frac{a'^2 + \gamma^2 b'^2}{2\sigma^2}\right)\exp\left(j\left(2\pi\frac{a'}{\delta} + \psi\right)\right) \quad (1)$$

where

$$a' = a\cos\theta + b\sin\theta \quad (2)$$

$$b' = -a\sin\theta + b\cos\theta \quad (3)$$

and where δ represents the wavelength of the sinusoidal factor and θ represents the orientation separation angle of the Gabor kernels (see Figure 1). Note that we need only consider θ in the interval [0°, 180°] since symmetry makes other directions redundant. ψ is the phase offset, σ is the standard deviation of the Gaussian envelope, and γ is the spatial aspect ratio (the default value is 0.5 in [27]) specifying the ellipticity of the support of the Gabor function. ψ = 0 and ψ = π/2 return the real part and the imaginary part of the Gabor filter, respectively. Parameter σ is determined by δ and the spatial frequency bandwidth bw as

$$\sigma = \frac{\delta}{\pi}\sqrt{\frac{\ln 2}{2}}\cdot\frac{2^{bw}+1}{2^{bw}-1} \quad (4)$$
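To make the role of each filter parameter concrete, the following minimal NumPy sketch builds a complex 2-D Gabor kernel. It assumes the common parameterization in which σ = (δ/π)·sqrt(ln 2 / 2)·(2^bw + 1)/(2^bw − 1); the function names, the kernel half-width rule, and the wavelength value used in the demo are hypothetical choices made for this illustration and are not taken from the paper.

```python
import numpy as np

def sigma_from_bandwidth(delta, bw):
    """Gaussian standard deviation from wavelength delta and bandwidth bw (octaves)."""
    return (delta / np.pi) * np.sqrt(np.log(2) / 2) * (2**bw + 1) / (2**bw - 1)

def gabor_kernel(delta, theta, bw=1.0, gamma=0.5, psi=0.0, half_size=None):
    """Complex 2-D Gabor kernel sampled on a square grid (illustrative sketch)."""
    sigma = sigma_from_bandwidth(delta, bw)
    if half_size is None:
        # Assumption: cover about three standard deviations of the wider envelope axis.
        half_size = int(np.ceil(3 * sigma / gamma))
    coords = np.arange(-half_size, half_size + 1)
    # b varies along rows, a varies along columns.
    b, a = np.meshgrid(coords, coords, indexing="ij")
    # Rotate the coordinate system by the orientation angle theta.
    a_rot = a * np.cos(theta) + b * np.sin(theta)
    b_rot = -a * np.sin(theta) + b * np.cos(theta)
    envelope = np.exp(-(a_rot**2 + (gamma * b_rot)**2) / (2 * sigma**2))
    carrier = np.exp(1j * (2 * np.pi * a_rot / delta + psi))
    return envelope * carrier

# Eight equally spaced orientations in [0, pi): an assumption for this demo only.
bank = [gabor_kernel(delta=8.0, theta=k * np.pi / 8) for k in range(8)]
```

In a sketch like this, each kernel in `bank` could then be convolved with a principal-component image, using either the real part or the magnitude of the complex response as the spatial feature.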

MH Prediction for Spatial Features Extraction
In our previous work [19], a spectral-spatial preprocessing algorithm based on MH prediction was proposed. It was motivated by our earlier success at applying MH prediction in compressed-sensing image and video reconstruction [28], single-image super-resolution [29], and hyperspectral image reconstruction from random projections [30]. The algorithm is driven by the idea that, for each pixel in a hyperspectral image, its neighboring pixels will likely share similar spectral characteristics or have the same class membership since HSI commonly contains homogeneous regions. Therefore, each pixel in a hyperspectral image may be represented by a linear combination of its neighboring pixels. Specifically, multiple predictions, or hypotheses, for a pixel of interest are drawn from spatially surrounding pixels. These predictions are then combined to yield a composite prediction that approximates the pixel of interest.
Consider a hyperspectral dataset with M pixels $X = \{x_m\}_{m=1}^{M}$ in $\mathbb{R}^N$ (N is the dimensionality or number of spectral bands). For a pixel of interest x, the objective is to find an optimal linear combination of all possible predictions to represent x. The optimal representation can be formulated as

$$\hat{w} = \arg\min_{w} \|x - Zw\|_2^2 \quad (5)$$

Here, $Z = [z_1, \ldots, z_K] \in \mathbb{R}^{N \times K}$ is a hypothesis matrix whose columns are K hypotheses generated from all neighboring pixels of x within a d × d spatial search window, and $w \in \mathbb{R}^{K \times 1}$ is a vector of weighting coefficients corresponding to the K hypotheses in Z. Because, in most cases, the dimensionality of the hypotheses is not equal to the number of hypotheses, i.e., N ≠ K, Tikhonov regularization [31] is used to regularize the least-squares problem of (5). Then, the weight vector $\hat{w}$ is calculated according to

$$\hat{w} = \arg\min_{w} \|x - Zw\|_2^2 + \lambda^2\|\Gamma w\|_2^2 \quad (6)$$

where Γ is the Tikhonov matrix and λ is the regularization parameter. The Γ term allows the imposition of prior knowledge on the solution. Specifically, a diagonal Γ is used in the form of

$$\Gamma = \mathrm{diag}\left(\|x - z_1\|_2, \ldots, \|x - z_K\|_2\right) \quad (7)$$

where $z_1, \ldots, z_K$ are the columns of Z. Each diagonal term in Γ measures the distance between the pixel of interest and a hypothesis. With this structure of Γ, hypotheses which are dissimilar from the pixel of interest x are given less weight than those which are similar. The weight vector $\hat{w}$ can be calculated in closed form as

$$\hat{w} = \left(Z^T Z + \lambda^2 \Gamma^T \Gamma\right)^{-1} Z^T x \quad (8)$$

Therefore, an approximation to x, i.e., the predicted pixel, is calculated as

$$\hat{x} = Z\hat{w} \quad (9)$$

For each pixel in X, a corresponding predicted pixel can be generated via (9), resulting in a predicted dataset $\hat{X} = \{\hat{x}_m\}_{m=1}^{M}$ in $\mathbb{R}^N$. Furthermore, once a predicted dataset $\hat{X}$ is generated through MH prediction, it can be used as the current input dataset, i.e., a new X, to repeat the MH prediction process in an iterative fashion. The predicted dataset, which effectively integrates spectral and spatial information, is then used for classification.
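The prediction of a single pixel can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes a diagonal Tikhonov matrix whose entries are the Euclidean distances between the pixel and each hypothesis, and a closed-form solution of the form (ZᵀZ + λ²ΓᵀΓ)⁻¹Zᵀx. The function name `mh_predict_pixel` and the toy data are hypothetical.

```python
import numpy as np

def mh_predict_pixel(x, Z, lam=1.5):
    """Tikhonov-regularized multihypothesis prediction of one pixel.

    x : (N,) spectrum of the pixel of interest
    Z : (N, K) matrix whose columns are the K neighboring hypotheses
    """
    # Diagonal of the Tikhonov matrix: distance between x and each hypothesis.
    gamma_diag = np.linalg.norm(Z - x[:, None], axis=0)
    # Closed-form weights: (Z^T Z + lam^2 Gamma^T Gamma)^{-1} Z^T x.
    A = Z.T @ Z + (lam**2) * np.diag(gamma_diag**2)
    w = np.linalg.solve(A, Z.T @ x)
    # Predicted pixel is the weighted combination of the hypotheses.
    return Z @ w, w

rng = np.random.default_rng(0)
Z = rng.random((5, 3))          # toy data: N = 5 bands, K = 3 hypotheses
x = Z[:, 0].copy()              # the first hypothesis equals the pixel itself
x_hat, w = mh_predict_pixel(x, Z)
```

In this toy case the first hypothesis has zero distance to the pixel, so it carries no penalty and receives essentially all of the weight, which is the behavior the distance-based Γ is meant to encourage.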

Kernel Extreme Learning Machine
ELM was originally developed from feed-forward neural networks [8,32]. More recently, KELM generalized ELM from an explicit activation function to an implicit mapping function, which can produce better generalization in most applications.
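For C classes, let us define $y_k \in \{0, 1\}$, $1 \le k \le C$. A row vector $y = [y_1, \ldots, y_k, \ldots, y_C]$ indicates the class that a sample belongs to. For example, if $y_k = 1$ and the other elements in y are zero, then the sample belongs to the kth class. Given P training samples $\{x_i, y_i\}_{i=1}^{P}$ belonging to C classes, where $x_i \in \mathbb{R}^N$ and $y_i \in \mathbb{R}^C$, the output function of an ELM having L hidden neurons can be represented as

$$f_L(x_i) = \sum_{j=1}^{L} \beta_j h(w_j \cdot x_i + e_j) = y_i, \quad i = 1, \ldots, P \quad (10)$$

where h(·) is a nonlinear activation function (e.g., the sigmoid function), $\beta_j \in \mathbb{R}^C$ is the weight vector connecting the jth hidden neuron and the output neurons, $w_j \in \mathbb{R}^N$ is the weight vector connecting the jth hidden neuron and the input neurons, and $e_j$ is the bias of the jth hidden neuron. $w_j \cdot x_i$ denotes the inner product of $w_j$ and $x_i$. With P equations, Equation (10) can be written compactly as

$$H\beta = Y \quad (11)$$

where $Y = [y_1^T \ldots y_P^T]^T \in \mathbb{R}^{P \times C}$, $\beta = [\beta_1^T \ldots \beta_L^T]^T \in \mathbb{R}^{L \times C}$, and H is the hidden layer output matrix of the neural network:

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_P) \end{bmatrix} = \begin{bmatrix} h(w_1 \cdot x_1 + e_1) & \cdots & h(w_L \cdot x_1 + e_L) \\ \vdots & \ddots & \vdots \\ h(w_1 \cdot x_P + e_1) & \cdots & h(w_L \cdot x_P + e_L) \end{bmatrix} \quad (12)$$

Here, $h(x_i) = [h(w_1 \cdot x_i + e_1), \ldots, h(w_L \cdot x_i + e_L)]$ is the output of the hidden neurons with respect to the input $x_i$, which maps the data from the N-dimensional input space to the L-dimensional feature space.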
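In most cases, the number of hidden neurons is much smaller than the number of training samples, i.e., $L \ll P$, and the smallest-norm least-squares solution of Equation (11) proposed in [8] is defined as

$$\beta' = H^{\dagger} Y \quad (13)$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix H [33]. The Moore-Penrose generalized inverse of H can be calculated as $H^{\dagger} = H^T (H H^T)^{-1}$ [9]. For better stability and generalization, a positive value $1/\rho$ is added to the diagonal elements of $H H^T$. Therefore, we have the output function of ELM

$$f_L(x_i) = h(x_i)\beta = h(x_i) H^T \left(\frac{I}{\rho} + H H^T\right)^{-1} Y \quad (14)$$

In ELM, a feature mapping $h(x_i)$ is usually known to users. If a feature mapping is unknown to users, a kernel matrix for ELM can be defined as follows:

$$\Omega_{ELM} = H H^T: \quad \Omega_{ELM_{q,t}} = h(x_q) \cdot h(x_t) = K(x_q, x_t) \quad (15)$$

Thus, the output function of KELM can be written as

$$f_L(x_i) = \begin{bmatrix} K(x_i, x_1) \\ \vdots \\ K(x_i, x_P) \end{bmatrix}^T \left(\frac{I}{\rho} + \Omega_{ELM}\right)^{-1} Y \quad (16)$$

The label of the input data is determined by the index of the output node with the largest value.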
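The kernel form of the classifier reduces to one linear solve at training time and one kernel evaluation at test time. The sketch below is a minimal NumPy illustration under stated assumptions, not the implementation from the ELM website: it assumes an RBF kernel parameterized as exp(−γ‖u − v‖²), writes the regularization constant as `rho`, and uses hypothetical function names and toy data.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for all rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kelm_train(X, labels, n_classes, rho=100.0, gamma=0.5):
    """Solve (I/rho + Omega) alpha = Y for the output coefficients alpha."""
    Y = np.eye(n_classes)[labels]              # one-hot targets, shape (P, C)
    omega = rbf_kernel(X, X, gamma)            # kernel matrix, shape (P, P)
    return np.linalg.solve(np.eye(len(X)) / rho + omega, Y)

def kelm_predict(X_test, X_train, alpha, gamma=0.5):
    """Label = index of the output node with the largest value."""
    scores = rbf_kernel(X_test, X_train, gamma) @ alpha
    return scores.argmax(axis=1)

# Toy data: two well-separated clusters, two samples each.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
alpha = kelm_train(X_train, y_train, n_classes=2)
pred = kelm_predict(np.array([[0.2, 0.5], [5.1, 5.4]]), X_train, alpha)
```

Because no random hidden-layer weights are drawn, repeated runs on the same data give the same coefficients, which is the stability advantage of the kernel variant over basic ELM noted earlier.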

Proposed Spectral-Spatial Kernel Extreme Learning Machine
A Gabor filter can capture some physical structures of an object in an image, such as specific orientation information, using a spatial convolution kernel. Previous work [15][16][17][18] has applied spectral-spatial features extracted by Gabor filters to hyperspectral image classification. Following the recent research in [17,18], a two-dimensional Gabor filter is considered to exploit the useful information in a PCA-projected subspace. The Gabor features and the original spectral features are simply concatenated. Each spatial feature (Gabor feature) vector and spectral feature vector is normalized to have unit ℓ2 norm before feature concatenation or stacking. We note that applying the Gabor filter to a subset of the original bands obtained by band selection [34] could equally be employed. The Gabor-filtering-based KELM is denoted as Gabor-KELM. We also employ MH prediction as the preprocessing step of the KELM classifier, which is denoted as MH-KELM. The proposed spectral-spatial KELM framework is illustrated in Figure 2.
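The normalization and stacking step described above can be sketched as follows. This is a minimal illustration only: the function name `stack_features`, the small constant `eps` guarding against division by zero, and the feature sizes in the demo (200 spectral bands, 80 Gabor features) are assumptions chosen for the example.

```python
import numpy as np

def stack_features(spectral, spatial, eps=1e-12):
    """Normalize each pixel's spectral and spatial vectors to unit l2 norm, then concatenate.

    spectral : (M, N) array, one spectral signature per pixel
    spatial  : (M, F) array, one Gabor feature vector per pixel
    """
    spectral_n = spectral / (np.linalg.norm(spectral, axis=1, keepdims=True) + eps)
    spatial_n = spatial / (np.linalg.norm(spatial, axis=1, keepdims=True) + eps)
    return np.hstack([spectral_n, spatial_n])

rng = np.random.default_rng(0)
# Toy data: 4 pixels, 200 spectral bands, 80 spatial features (illustrative sizes).
features = stack_features(rng.random((4, 200)), rng.random((4, 80)))
```

Normalizing the two blocks separately keeps either feature type from dominating the kernel distance merely because of its scale or dimensionality.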

Experiments
In this section, we compare the classification performance of the proposed Gabor-KELM and MH-KELM with SVM, KELM, Gabor-filtering-based SVM (Gabor-SVM), and MH-prediction-based SVM (MH-SVM). SVM with a radial basis function (RBF) kernel is implemented using the libsvm package [35]. For KELM with an RBF kernel, we use the implementation available from the ELM website [36].

Data Description and Experimental Setup
We validate the effectiveness of the proposed methods, i.e., Gabor-KELM and MH-KELM, using two hyperspectral datasets. The first HSI dataset in our tests was acquired using NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor and was collected over northwest Indiana's Indian Pines test site in June 1992. This scene represents a vegetation-classification scenario with 145 × 145 pixels in the 0.4-2.45 μm region of the visible and infrared spectrum with a spatial resolution of 20 m. For this dataset, spectral bands {104-108, 150-163, 220}, which correspond to water-absorption bands, are removed, resulting in 200 spectral bands. The original Indian Pines dataset consists of 16 ground-truth land-cover classes.
The second dataset used in our experiments, University of Pavia, is an urban scene acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) [37]. The image scene, covering the city of Pavia, Italy, was collected under the HySens project managed by DLR (the German Aerospace Agency) [38]. The ROSIS sensor generates 115 spectral bands ranging from 0.43 to 0.86 μm and has a spatial resolution of 1.3 m per pixel; the image contains 610 × 340 pixels. The dataset consists of 103 spectral bands after the 12 noisiest bands are removed. The labeled ground truth of this dataset comprises 9 classes. The class descriptions and sample distributions for both the Indian Pines and University of Pavia datasets are given in Tables 1 and 2. Both datasets, and their corresponding ground truth maps, are obtained from the publicly available website [39] of the Computational Intelligence Group from the Basque University (UPV/EHU). False-color images of the two datasets are displayed in Figure 3.
For the Indian Pines dataset, some of the classes contain a small number of samples. For example, the Oats class has only 20 samples. In one of our experiments, we sort the 16 classes according to the number of samples in each class in ascending order and conduct a separate set of experiments with the last nine classes, allowing for more training samples from a statistical viewpoint [5]. The class numbers of the nine classes are highlighted in boldface in Table 1. The SSS condition is examined in the experiments that follow: for example, if we select 20 labeled samples per class (180 in total) for training, all remaining labeled samples are the ones to be classified. Each classification experiment is repeated for 10 trials with different training and testing samples, and the overall classification accuracy is averaged over the 10 trials. The University of Pavia dataset is processed similarly, the only difference being that we first choose 900 samples at random from each class to form the total sample set (8100 in total) for each trial. Then, the training and testing samples are chosen randomly from each class of the total sample set for classification. This procedure is used since some classes of the University of Pavia dataset contain significantly more samples than other classes, which might bias the accuracy. In order to have a fair comparison, the number of samples per class should be equal or similar.
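The per-class random selection of training samples described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' MATLAB code; the function name `split_per_class` and the toy label vector are hypothetical.

```python
import numpy as np

def split_per_class(labels, n_train, rng):
    """Pick n_train random samples per class for training; the rest are for testing."""
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Draw without replacement so that no sample is selected twice.
        train_idx.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.concatenate(train_idx)
    test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
    return train_idx, test_idx

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(9), 100)      # toy labels: 9 classes, 100 samples each
train_idx, test_idx = split_per_class(labels, n_train=20, rng=rng)
```

Repeating the call with a fresh random state for each of the 10 trials would give the different training and testing sets over which the accuracy is averaged.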
All experiments are carried out using MATLAB (except SVM, which is implemented in C) on an Intel i7 Quadcore 2.63-GHz machine with 6 GB of RAM.

Parameter Tuning
First of all, we study the parameters of the Gabor filter for hyperspectral images. In our work, eight orientations beginning at 0, as shown in Figure 1, are considered. According to Equation (4), δ and bw are the two parameters of the Gabor filter to be investigated. We test different values of δ and bw, as shown in Figure 4a for the Indian Pines dataset and Figure 4b for the University of Pavia dataset. Figure 4 illustrates the classification accuracy of the proposed Gabor-KELM versus varying δ and bw. Note that for Gabor-KELM in the experiment, we empirically choose the first 10 principal components (PCs) of both datasets, which account for over 99% of the total variation in the images. From the results, we set δ and bw for both experimental datasets to 26 and 1, respectively.
An important parameter involved in MH prediction is the search-window size d used in hypothesis generation. We analyze the effect of the search-window size in terms of the overall classification accuracy as well as the execution time of the algorithm. A set of window sizes, d ∈ {3, 5, 7, 9, 11, 13}, is used for testing. From Figure 5, we can see that the classification accuracies are similar when the window size is between 9 × 9 and 13 × 13. We also find that using d = 11 takes over twice the execution time of d = 9 but does not yield any significant gains in classification accuracy. Specifically, Table 3 shows the execution time of one iteration of MH prediction for various search-window sizes. In all the experiments, two iterations of MH prediction are used.
Another important parameter is λ, which controls the relative effect of the Tikhonov regularization term in the optimization of Equation (6). Many approaches, such as the L-curve [40], the discrepancy principle, and generalized cross-validation (GCV), have been presented in the literature for finding an optimal value for such a regularization parameter. Here, we find an optimal λ by examining a set of values as shown in Figure 6, which presents the overall classification accuracy with different values of λ for MH prediction. One can see that the classification accuracy is quite stable over the interval λ ∈ [1, 2]. As a result, in all the experiments reported here, we use λ = 1.5.

Classification Results
The SSS problem is one of the most fundamental and challenging issues in hyperspectral image classification. In practice, the number of available labeled samples is often insufficient for hyperspectral images. Thus, we investigate the classification accuracy of the aforementioned classifiers as a function of the number of labeled samples, varying from 20 to 40 per class. To avoid any bias, all the experiments are repeated 10 times, and we report the averaged classification accuracy as well as the corresponding standard deviation. In all experiments, unless otherwise specified, the tuning parameters of KELM (RBF kernel parameters) and the parameters of the competing method (SVM) are chosen as those that maximize the training accuracy by means of five-fold cross-validation to avoid over-fitting. The performance of the proposed spectral-spatial KELM methods is shown in Tables 4 and 5 for the two experimental datasets.
For each individual classifier, using Gabor features or MH prediction significantly improves the classification accuracy at all training sample sizes compared with classifying with the original spectral signature only. For example, in Table 4, Gabor-SVM has 26.9% higher accuracy than SVM, MH-SVM has 21.8% higher accuracy than SVM, Gabor-KELM has 24.7% higher accuracy than KELM, and MH-KELM has 24.1% higher accuracy than KELM when there are 20 labeled samples per class for training for the Indian Pines dataset. Moreover, for the Indian Pines dataset, KELM employing spatial features (Gabor features or MH prediction) achieved better classification performance than SVM employing spatial features. Especially for the MH-prediction-based methods, the accuracy of the proposed MH-KELM is consistently about 5% higher than that of MH-SVM at all sample sizes. For the University of Pavia dataset, in terms of classification accuracy, Gabor-KELM outperforms Gabor-SVM, and MH-KELM outperforms MH-SVM. It is interesting to notice that the performance of Gabor-KELM is close to that of MH-KELM for both datasets, which demonstrates that KELM has better generalization than SVM. Based on the results shown in Tables 4 and 5, we further perform the standard McNemar's test [41], which is based on a standardized normal test statistic

$$Z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}}$$

where $f_{12}$ indicates the number of samples classified correctly by classifier 1 and simultaneously misclassified by classifier 2, and $f_{21}$ is defined analogously with the roles of the two classifiers exchanged.
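As a small worked illustration of the statistic, the sketch below computes Z = (f12 − f21)/sqrt(f12 + f21) from two prediction vectors. The function name `mcnemar_z` and the toy labels are hypothetical, and returning 0 when the two classifiers never disagree is a convention assumed for this example.

```python
import numpy as np

def mcnemar_z(y_true, pred1, pred2):
    """Standardized McNemar statistic Z = (f12 - f21) / sqrt(f12 + f21)."""
    ok1 = pred1 == y_true
    ok2 = pred2 == y_true
    f12 = np.sum(ok1 & ~ok2)   # classifier 1 right, classifier 2 wrong
    f21 = np.sum(~ok1 & ok2)   # classifier 1 wrong, classifier 2 right
    if f12 + f21 == 0:
        return 0.0             # assumed convention: the classifiers never disagree
    return (f12 - f21) / np.sqrt(f12 + f21)

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
pred1 = np.array([0, 1, 1, 0, 2, 0, 2, 1])   # hypothetical classifier 1
pred2 = np.array([0, 0, 1, 1, 2, 2, 0, 1])   # hypothetical classifier 2
z = mcnemar_z(y_true, pred1, pred2)
```

Here f12 = 1 and f21 = 3, so Z = (1 − 3)/sqrt(4) = −1; the negative sign indicates that classifier 2 is correct more often on the samples where the two disagree, although |Z| < 1.96 means the difference would not be significant at the 95% level.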
The test is employed to verify the statistical significance of the accuracy improvement of the proposed methods. Tables 6 and 7 present the statistical significance from the standardized McNemar's test of the difference between the proposed KELM-based methods and the traditional SVM-based methods. In these two tables, classifier 1 is denoted as C1 and classifier 2 is denoted as C2. As listed in the tables, the difference in accuracy between the two methods is considered statistically significant at the 95% confidence level if |Z| > 1.96 and at the 99% confidence level if |Z| > 2.58. Moreover, the sign of Z indicates whether classifier 1 outperforms classifier 2 (Z > 0) or vice versa. We can observe that the results of McNemar's test for both datasets all have negative signs. This demonstrates that the KELM-based methods outperform the SVM-based methods, which confirms the conclusions obtained from the classification accuracies shown in Tables 4 and 5.
We also conduct an experiment using the whole scene of the two datasets. For the Indian Pines dataset, we randomly select 10% of the samples from each class (16 classes are used in this experiment) for training and the rest for testing. For the University of Pavia dataset, we use 1% of the samples from each class for training and the rest for testing. The classification accuracy for each class, the overall accuracy (OA), the average accuracy (AA), and Cohen's kappa coefficient (κ) are shown in Tables 8 and 9 for the two datasets, respectively. As can be seen from Tables 8 and 9, the proposed Gabor-KELM and MH-KELM have superior performance to the pixel-wise classifiers and outperform Gabor-SVM and MH-SVM. More importantly, we can see that employing the spatial features for classification can improve the accuracy under the SSS condition. For example, in Table 8, the classification accuracies for class 1 (four training samples), class 7 (two training samples), and class 9 (two training samples) improved by over 40% when the spatial information (i.e., Gabor features or MH prediction) was integrated into the KELM classifier. Due to the high cost of training data, such performance with few training samples is important in many applications. Hence, we conclude that the proposed Gabor-KELM and MH-KELM are very effective classification strategies for hyperspectral data analysis tasks under the SSS condition.
Figures 7 and 8 provide a visual inspection of the classification maps generated using the whole HSI scene for the Indian Pines dataset (145 × 145, including unlabeled pixels) and the University of Pavia dataset (610 × 340, including unlabeled pixels), respectively. As shown in the two figures, classification maps of the spectral-spatial classification methods are less noisy and more accurate than maps generated by the pixel-wise classification methods. Moreover, the spectral-spatial classification methods exhibit better spatial homogeneity than the pixel-wise classification methods. This homogeneity is observable within almost every labeled area.
Finally, we report the computational cost of the aforementioned classification methods using 20 labeled samples per class. All experiments are carried out using MATLAB on an Intel i7 Quadcore 2.63-GHz machine with 6 GB of RAM. The execution time for the two experimental datasets is listed in Table 10. For the spectral-spatial methods, we report the time for feature extraction and for classification (training and testing) separately. It should be noted that SVM is implemented in the libsvm package, which uses a MEX function to call a C program from MATLAB, while KELM is implemented purely in MATLAB. As can be seen in Table 10, in terms of the execution time of the pixel-wise classifiers, KELM is much faster than SVM even though SVM is implemented in C.
The spectral-spatial classifiers (i.e., the Gabor-filtering-based and MH-prediction-based classifiers) are, as expected, much slower than the pixel-wise classifiers because they carry the additional burden of spatial feature extraction (i.e., Gabor filtering on PCs, or MH prediction preprocessing). The MH-prediction-based methods are the most time-consuming ones since two iterations of MH prediction are used in the experiments and the weight vector has to be calculated for every pixel in the image according to Equation (8) during MH prediction. It is worth mentioning that the Gabor feature extraction procedure is performed independently on each PC, which means that Gabor feature extraction can be parallelized. Thus, the speed of Gabor feature extraction on PCs can be greatly improved.
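The three summary measures used for the whole-scene experiments (OA, AA, and the kappa coefficient) can all be derived from a confusion matrix. The following NumPy sketch shows one standard way to compute them; the function name `accuracy_measures` and the toy labels are hypothetical, and the code assumes that every class occurs at least once in the reference labels.

```python
import numpy as np

def accuracy_measures(y_true, y_pred, n_classes):
    """Return overall accuracy, average accuracy and Cohen's kappa."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: truth, columns: prediction
    total = cm.sum()
    oa = np.trace(cm) / total                   # fraction of correctly labeled samples
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean of per-class accuracies
    # Expected agreement by chance, from the row and column marginals.
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0])
oa, aa, kappa = accuracy_measures(y_true, y_pred, n_classes=2)
```

For this toy example the confusion matrix is [[3, 1], [2, 2]], giving OA = 0.625, AA = 0.625, and κ = 0.25. AA weights every class equally, which is why it is informative for scenes such as Indian Pines where some classes have very few samples.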

Conclusions
In this paper, we proposed to integrate spectral and spatial information to improve the performance of KELM classifier by using Gabor features and MH prediction preprocessing.Specifically, a simple two-dimensional Gabor filter was implemented to extract spatial features in the PCA-projected domain.MH prediction preprocessing makes use of the spatial piecewise-continuous nature of hyperspectral imagery to integrate spectral and spatial information.The proposed classification techniques, i.e., Gabor-KELM and MH-KELM, have been compared with the conventional pixel-wise classifiers, such as SVM and KELM, as well as Gabor-SVM and MH-SVM, under the SSS condition for hyperspectral data.Experimental results have demonstrated that the proposed methods can outperform the conventional pixel-wise classifiers as well as Gabor-filtering-based SVM and MH-prediction-based SVM in challenging small training sample size conditions.Specifically, the proposed spectral-spatial classification methods achieved over 16% and 9% classification accuracy improvement over the pixel-wise classification methods for the Indian Pines dataset and the University of Pavia dataset, respectively.MH-KELM outperformed MH-SVM by about 5% for the Indian Pines dataset and Gabor-KELM outperformed Gabor-SVM by about 1.3% for the University of Pavia dataset at all training sample sizes.Moreover, KELM exhibits very fast training and testing speed, which is an important attribute for hyperspectral analysis applications.Although the proposed methods carry additional burden on spatial feature extraction, the computational cost can be reduced by parallel computing.

Figure 4. Classification accuracy (%) versus varying δ and bw for the proposed Gabor-KELM using 20 labeled samples per class for (a) Indian Pines dataset; and (b) University of Pavia dataset.

Figure 5. Classification accuracy (%) versus varying search-window size (d) for the proposed MH-KELM using 20 labeled samples per class for two experimental datasets.

Figure 6. Classification accuracy (%) for the Indian Pines and University of Pavia datasets as a function of the MH-prediction regularization parameter λ for the proposed MH-KELM using 20 labeled samples per class. The search-window size for MH prediction is 9 × 9 (d = 9).

Figure 7. Thematic maps resulting from classification using 1018 training samples (10% per class) for the Indian Pines dataset with 16 classes. The overall classification accuracy of each algorithm is indicated in parentheses.

Figure 8. Thematic maps resulting from classification using 423 training samples (1% per class) for the University of Pavia dataset. The overall classification accuracy of each algorithm is indicated in parentheses.

Table 1. Per-class samples for the Indian Pines dataset.

Table 2. Per-class samples for the University of Pavia dataset.

Table 3. Execution time (s) for one iteration of MH prediction for the Indian Pines dataset as a function of search-window size d.

Table 4. Overall classification accuracy (%) (mean ± standard deviation over 10 trials) using a varying number of labeled training samples per class (ratio represents the proportion of labeled training samples to samples to be classified) for the Indian Pines dataset (nine classes).

Table 5. Overall classification accuracy (%) (mean ± standard deviation over 10 trials) using a varying number of labeled training samples per class (ratio represents the proportion of labeled training samples to samples to be classified) for the University of Pavia dataset.

Table 6. McNemar's test (Z) for the Indian Pines dataset (nine classes, 20 samples per class for training).

Table 7. McNemar's test (Z) for the University of Pavia dataset (180 training and 7920 testing samples).

Table 9. Classification accuracy (%) for the University of Pavia dataset (whole scene).

Table 10. Execution time for the Indian Pines dataset (nine classes, 180 training and 9054 testing samples) and the University of Pavia dataset (180 training and 7920 testing samples).