Article

Deep Kernel Extreme-Learning Machine for the Spectral–Spatial Classification of Hyperspectral Imagery

1 The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710200, China
2 The Department of Electronic and Computer Engineering, Mississippi State University, Starkville, MS 39762, USA
3 The First Institute of Oceanography, State Oceanic Administration, Qingdao 266000, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2018, 10(12), 2036; https://doi.org/10.3390/rs10122036
Submission received: 12 November 2018 / Revised: 8 December 2018 / Accepted: 12 December 2018 / Published: 14 December 2018

Abstract

Extreme-learning machines (ELMs) have attracted significant attention in hyperspectral image classification because of their extremely fast and simple training structure. However, their shallow architecture may not be capable of further improving classification accuracy. Recently, deep-learning-based algorithms have focused on deep feature extraction. In this paper, a deep neural-network-based kernel extreme-learning machine (KELM) is proposed. Furthermore, an effective spatial feature enhancement method, the guided filter with the first principal component (GFFPC), is also proposed. Consequently, a new classification framework derived from the deep KELM network and GFFPC is presented to generate deep spectral and spatial features. Experimental results demonstrate that the proposed framework outperforms some state-of-the-art algorithms at very low computational cost, which makes it suitable for real-time processing.

1. Introduction

Hyperspectral imagery in remote sensing, with hundreds of narrow spectral bands, is used in many applications, such as global environment monitoring, land-cover change detection, natural disaster assessment, and medical diagnosis [1,2,3,4]. Classification is a key information-extraction technique for hyperspectral imagery, which focuses on distinguishing physical objects by assigning each pixel a unique class label. With the development of machine learning, many algorithms based on statistical learning theory have been employed in this field. Traditional classification algorithms include k-nearest neighbors (KNN) [5], Bayesian models [6], and random forests [7]. One of the most important and well-known classifiers for hyperspectral image classification is the kernel-based support vector machine (KSVM), which can also be considered a neural network [8,9]. It provides superior classification performance by learning an optimal decision hyperplane that best separates different classes in a high-dimensional feature space through non-linear mapping. Popular kernels include the polynomial kernel and the Gaussian radial basis function (RBF).
Recently, deep neural networks (DNNs), which can learn high-level features hierarchically, have been highlighted in the literature [10,11,12,13]. DNNs have demonstrated their potential in classification; in particular, they have motivated successful applications of deep models in hyperspectral image classification that outperform other traditional classification methods. Typical deep-learning architectures include stacked autoencoders (SAEs) [14], deep belief networks (DBNs) [15], and convolutional neural networks (CNNs) [16,17]. Chen et al. first employed an SAE-based deep model to extract features of hyperspectral imagery and classified these features via logistic regression [18]. Then, Chen et al. used a DBN as a classifier to distinguish each pixel [19]. In addition, Li and Du proposed a hyperspectral classification model that combines an optimized DBN with a texture feature enhancement model, achieving superior classification accuracy [20]. In particular, owing to their local receptive fields, CNNs play a dominant role in visual processing tasks. Hu et al. employed a CNN to classify hyperspectral images directly in the spectral domain [21]. However, without enough training samples, the traditional CNN faces an over-fitting problem. Li et al. proposed a fully convolutional network for feature enhancement and obtained outstanding hyperspectral classification accuracy [22]. Nevertheless, due to their high computational and space complexity, the aforementioned algorithms are very time-consuming. Real-time application is the predominant trend in future hyperspectral processing, and most of the aforementioned algorithms may not meet the requirements of fast data processing and analysis.
The extreme-learning machine (ELM), a very fast and effective machine learning algorithm with a single hidden-layer feed-forward neural network, was proposed by Huang in [23]. The parameters between its input and hidden layers are simply random variables. The only parameters to be trained are the output weights, which can be easily estimated by a smallest-norm least-squares solution. Compared with traditional gradient-based back-propagation (BP) learning, ELM is computationally much more efficient than SVM and BP neural networks. Therefore, numerous works on ELM for hyperspectral classification [24,25,26,27] have already been carried out and have made notable contributions. However, because the weights and biases of ELM are randomly generated, it produces different results even with the same number of hidden nodes. The kernel-based ELM (KELM) [28], which employs a kernel function to replace the hidden layer of the ELM, was proposed to overcome this problem. In [29,30], KELM was used for hyperspectral image classification and obtained appreciable results. However, the feature representation ability of shallow networks is limited. Therefore, multilayer solutions are imperative. Inspired by multilayer perceptron (MLP) theory, Huang extended ELM to a multilayer ELM (ML-ELM) by using the ELM-based autoencoder (ELM-AE) for feature representation and extraction [31]. From the perspective of deep learning, stacking ELM-AEs into deep layers can further extract robust and abstract deep features. In [32], the ML-ELM and KELM were combined for EEG classification, where the former acted as a feature extractor and the latter as a classifier; however, the architecture of that method is complex and too many hyperparameters need to be determined. Additionally, several studies [33,34,35,36] have focused on integrating spatial and spectral information in hyperspectral imagery to assist classification, in which multiple features are extracted and employed for classification. For instance, Kang et al. used a guided filter to process the pixel-wise classification map of each class, using the first principal component (PC) or the first three PCs to capture the major spatial features [37].
In this paper, we first investigate the suitability and effectiveness of ML-ELM for hyperspectral classification. Then, to obtain the desired performance, we propose a deep kernel ELM (DKELM) algorithm to extract deep and robust features of hyperspectral imagery. In addition, spatial information obtained through different filters is added to further enhance the classification accuracy. The main contributions of this paper are summarized below:
  • ML-ELM is investigated and applied to hyperspectral classification for the first time.
  • A classification framework is proposed for hyperspectral classification that combines DKELM with a novel guided-filtering first-principal-component (GFFPC) spatial filter.
  • The DKELM model remains simple, because no randomly generated parameters are necessary and only the kernel parameter needs to be tuned in each layer. Furthermore, compared to ML-ELM, the number of nodes in each hidden layer does not need to be set, thanks to the kernel trick.
  • The proposed framework achieves superior performance with a very fast training speed, which is beneficial for real-time applications.
This paper is organized as follows. Section 2 briefly introduces the ELM, KELM, and ML-ELM (for convenience, we use MELM to denote ML-ELM in the following sections). Section 3 proposes our new framework to address the hyperspectral classification problem. Section 4 describes the datasets and parameter tuning. Experimental results are presented and discussed in Section 5. The conclusions are drawn in Section 6.

2. Related Works

2.1. ELM and KELM

The ELM is a single hidden-layer feed-forward neural network (SLFN), as depicted in Figure 1. It contains three layers: an input layer, a hidden layer, and an output layer. Please note that the hidden layer is non-linear because of the use of a non-linear activation function, whereas the output layer is linear, without an activation function.
Let x represent a training sample and f ( x ) be the output of the neural network. The SLFN with k hidden nodes can be represented by the following equation:
$$f_{\mathrm{ELM}}(\mathbf{x}) = \mathbf{B}^{T} G(\mathbf{w}, \mathbf{b}, \mathbf{x}),$$
where $G(\mathbf{w}, \mathbf{b}, \mathbf{x})$ denotes the hidden-layer activation function, $\mathbf{w}$ is the input weight matrix connecting the input layer to the hidden layer, $\mathbf{b}$ is the bias vector of the hidden layer, and $\mathbf{B} = [\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_m]$ is the weight matrix between the hidden layer and the output layer. For an ELM with n training samples, d input neurons (i.e., the number of bands), k hidden neurons, and m output neurons (i.e., m classes), Equation (1) becomes
$$\mathbf{t}_i = \mathbf{B}^{T} g\left(\langle \mathbf{w}_j, \mathbf{x}_i \rangle + b_j\right), \quad i = 1, 2, \ldots, n,$$
where $\mathbf{t}_i$ is the m-dimensional desired output vector for the i-th training sample $\mathbf{x}_i$, the d-dimensional $\mathbf{w}_j$ represents the j-th weight vector from the input layer to the j-th hidden neuron, and $b_j$ is the bias of the j-th hidden neuron. Here, $\langle \mathbf{w}_j, \mathbf{x}_i \rangle$ denotes the inner product of $\mathbf{w}_j$ and $\mathbf{x}_i$. The sigmoid function g is used as the activation function, so the output of the j-th hidden neuron is
$$g\left(\langle \mathbf{w}_j, \mathbf{x}_i \rangle + b_j\right) = \frac{1}{1 + \exp\!\left(-\left(\mathbf{w}_j^{T}\mathbf{x}_i + b_j\right)/(2\epsilon^{2})\right)},$$
where $\exp(\cdot)$ denotes the exponential function, and $\epsilon^{2}$ is the steepness parameter.
In matrix form, model (2) can be rearranged as
$$\mathbf{H}\mathbf{B} = \mathbf{T},$$
where $\mathbf{T} \in \mathbb{R}^{n \times m}$ is the target output and $\mathbf{B} \in \mathbb{R}^{k \times m}$. $\mathbf{H} = [\mathbf{h}(\mathbf{x}_1); \ldots; \mathbf{h}(\mathbf{x}_n)]$ is referred to as the hidden-layer output matrix of the ELM, with size $n \times k$, which can also be expressed as
$$\mathbf{H} = g(\mathbf{W}\mathbf{X} + \mathbf{b}) = \begin{bmatrix} g(\langle \mathbf{w}_1, \mathbf{x}_1 \rangle + b_1) & \cdots & g(\langle \mathbf{w}_k, \mathbf{x}_1 \rangle + b_k) \\ \vdots & \ddots & \vdots \\ g(\langle \mathbf{w}_1, \mathbf{x}_n \rangle + b_1) & \cdots & g(\langle \mathbf{w}_k, \mathbf{x}_n \rangle + b_k) \end{bmatrix}_{n \times k}.$$
Then, B can be estimated by a smallest norm least-squares solution:
$$\mathbf{B} = \mathbf{H}^{\dagger}\mathbf{T} = \mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T},$$
where $C$ is a regularization parameter. The ELM model can then be represented as
$$f_{\mathrm{ELM}}(\mathbf{x}) = \mathbf{h}(\mathbf{x})\,\mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T}.$$
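Before moving to the kernel variant, it may help to see how compact this closed-form training is in code. The following NumPy sketch is our own illustration; the function names (e.g., train_elm), the hidden-layer size, and C are placeholders, not values from the original implementation.

```python
import numpy as np

def train_elm(X, T, k=500, C=1e3, rng=np.random.default_rng(0)):
    """Minimal ELM: random hidden layer + regularized least-squares output weights.
    X: (n, d) training samples, T: (n, m) one-hot targets."""
    n, d = X.shape
    W = rng.standard_normal((d, k))          # random input weights w_j
    b = rng.standard_normal(k)               # random biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden-layer output matrix H, shape (n, k)
    # output weights from the regularized least-squares solution above
    B = H.T @ np.linalg.solve(np.eye(n) / C + H @ H.T, T)
    return W, b, B

def predict_elm(X, W, b, B):
    """Apply the trained ELM: class scores are h(x) B."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ B
```

The sketch solves the n × n regularized system exactly as in the formula above; for very large training sets an equivalent k × k formulation is usually preferred in practice.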
ELM can be extended to the kernel-based ELM (KELM) via the kernel trick. Let
$$\boldsymbol{\Omega} = \mathbf{H}\mathbf{H}^{T},$$
in which
$$\Omega_{i,j} = k(\mathbf{x}_i, \mathbf{x}_j),$$
where $\mathbf{x}_i$ and $\mathbf{x}_j$ represent the i-th and j-th training samples, respectively. Then, replacing $\mathbf{H}\mathbf{H}^{T}$ by $\boldsymbol{\Omega}$, the KELM can be written as
$$f_{\mathrm{KELM}}(\mathbf{x}) = \mathbf{h}(\mathbf{x})\,\mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}\right)^{-1}\mathbf{T},$$
where $f_{\mathrm{KELM}}(\mathbf{x})$ represents the output of the KELM model, and $\mathbf{h}(\mathbf{x})\mathbf{H}^{T} = [k(\mathbf{x}, \mathbf{x}_1), \ldots, k(\mathbf{x}, \mathbf{x}_n)]$.
Obviously, different from ELM, the most important characteristic of KELM is that the number of hidden nodes does not need to be set and there are no random feature mappings. Furthermore, owing to the kernel trick, the computing time is reduced compared with ELM.
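As a rough illustration of these formulas, a KELM with the Gaussian RBF kernel can be sketched in a few lines of NumPy. The names (rbf_kernel, train_kelm) and the default sigma and C below are our own illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def train_kelm(X, T, sigma=1.0, C=1e3):
    """KELM output weights: alpha = (I/C + Omega)^{-1} T with Omega_ij = k(x_i, x_j)."""
    Omega = rbf_kernel(X, X, sigma)
    return np.linalg.solve(np.eye(X.shape[0]) / C + Omega, T)

def predict_kelm(Xtest, Xtrain, alpha, sigma=1.0):
    """f_KELM(x) = [k(x, x_1), ..., k(x, x_n)] alpha."""
    return rbf_kernel(Xtest, Xtrain, sigma) @ alpha
```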

2.2. Multilayer Extreme-Learning Machines (MELM)

Figure 2 depicts the detailed structure of the ELM autoencoder (ELM-AE). ELM-AE represents features based on singular values. MELM is a multilayer neural network built by stacking multiple ELM-AEs.
Let $\mathbf{X}^{(i)} = [\mathbf{x}_1^{(i)}, \ldots, \mathbf{x}_n^{(i)}]$, where $\mathbf{x}_k^{(i)}$ is the i-th data representation of input $\mathbf{x}_k$, $k = 1, \ldots, n$. Suppose $\boldsymbol{\Lambda}^{(i)} = [\boldsymbol{\lambda}_1^{(i)}, \ldots, \boldsymbol{\lambda}_n^{(i)}]$ is the i-th transformation matrix, where $\boldsymbol{\lambda}_k^{(i)}$ is the transformation vector used for representation learning with respect to $\mathbf{x}_k^{(i)}$. According to Equation (4), $\mathbf{B}$ is replaced by $\boldsymbol{\Lambda}^{(i)}$ and $\mathbf{T}$ is replaced by $\mathbf{X}^{(i)}$ in MELM:
$$\mathbf{H}^{(i)}\boldsymbol{\Lambda}^{(i)} = \mathbf{X}^{(i)},$$
where $\mathbf{H}^{(i)}$ is the output matrix of the i-th hidden layer with respect to $\mathbf{X}^{(i)}$, and $\boldsymbol{\Lambda}^{(i)}$ can be solved by
$$\boldsymbol{\Lambda}^{(i)} = (\mathbf{H}^{(i)})^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}^{(i)}(\mathbf{H}^{(i)})^{T}\right)^{-1}\mathbf{X}^{(i)}.$$
Then
$$\mathbf{X}^{*} = g\left(\mathbf{X}^{(i)}(\boldsymbol{\Lambda}^{(i)})^{T}\right),$$
where $\mathbf{X}^{*}$ is the final representation of $\mathbf{X}^{(1)}$. $\mathbf{X}^{*}$ is used as the hidden-layer output to calculate the output weight $\boldsymbol{\beta}^{*}$, which can be computed as
$$\boldsymbol{\beta}^{*} = (\mathbf{X}^{*})^{\dagger}\mathbf{T} = (\mathbf{X}^{*})^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{X}^{*}(\mathbf{X}^{*})^{T}\right)^{-1}\mathbf{T}.$$
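To make the stacking concrete, the sketch below implements a single ELM-AE layer following the formulas above; building MELM amounts to feeding each layer's output into the next. The function name elm_ae_layer and the default sizes are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def elm_ae_layer(X, k=200, C=1e3, rng=np.random.default_rng(0)):
    """One ELM-AE: random hidden mapping, regularized reconstruction weights,
    and the transformed representation g(X Lambda^T)."""
    n, d = X.shape
    W = rng.standard_normal((d, k))
    b = rng.standard_normal(k)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                    # hidden output, (n, k)
    Lam = H.T @ np.linalg.solve(np.eye(n) / C + H @ H.T, X)   # transformation matrix, (k, d)
    X_next = 1.0 / (1.0 + np.exp(-(X @ Lam.T)))               # new representation, (n, k)
    return X_next, Lam
```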

3. The Proposed Framework for Hyperspectral Classification

In this section, we propose a new framework for hyperspectral classification that combines hyperspectral spatial features with the deep-layer-based KELM. The main procedure of the proposed framework is briefly depicted in Figure 3, from which we can see that it consists of three major parts: data normalization, spatial feature enhancement, and DKELM classification. The following subsections introduce these three procedures in detail.

3.1. Data Normalization

Let $\mathbf{X} \in \mathbb{R}^{N \times L}$ be the hyperspectral data, where N denotes the number of samples and L is the number of bands. Data normalization is a pre-processing step that standardizes each sample. For each sample,
$$\hat{\mathbf{x}}_i = \frac{\mathbf{x}_i - \mu_i}{\delta_i},$$
where $\hat{\mathbf{x}}_i$ is the normalized sample, $\mathbf{x}_i \in \mathbf{X}$, $i \in \{1, \ldots, N\}$, and $\mu_i$ and $\delta_i$ denote the mean and standard deviation of the sample, respectively. After this process, the data have zero mean and unit variance.
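A minimal sketch of this per-sample standardization (assuming samples are stored as rows, and with a small constant of our own added to guard against division by zero) is:

```python
import numpy as np

def normalize_samples(X, eps=1e-12):
    """Zero-mean, unit-variance scaling of each sample (row) of X."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / (sigma + eps)   # eps avoids dividing by zero for constant spectra
```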

3.2. Spatial Features Enhancement

In our proposed framework, we use Gaussian filters to extract spatial information; furthermore, a spatial feature enhancement method combining the guided filter (GF) with principal component analysis (PCA) is presented to enhance spatial information. Here, we introduce the GFFPC in detail.
The GF was proposed by He in 2012 [38]. Like the bilateral filter, it can be used as an edge-preserving filter, but it performs better near edges and is fast. As a local linear model, the GF assumes that the output image o is a linear transformation of the guidance image g in a window $W_k$ of size $\omega \times \omega$ centered at pixel k, where $\omega = 2r + 1$:
$$o_i = a_k g_i + b_k, \quad \forall i \in W_k,$$
where $o_i$ is the i-th pixel of the output image o and $g_i$ is the i-th pixel of the guidance image g. Obviously, this model ensures that $\nabla o = a_k \nabla g$, which means the output o has an edge only where g has an edge. To calculate the coefficients $a_k$ and $b_k$, the following energy function is minimized:
$$E(a_k, b_k) = \sum_{i \in W_k}\left[\left(a_k g_i + b_k - f_i\right)^{2} + \alpha a_k^{2}\right],$$
where $f_i$ is the i-th pixel of the input image, and $\alpha$ is a regularization parameter penalizing large $a_k$, which controls the degree of blurring of the GF. According to the energy function, the output image o should be as close as possible to the input image f, while preserving the texture information of the guidance image g through the linear model.
The solution to (16) can be obtained by linear regression as follows (Draper, 1981):
$$a_k = \frac{\frac{1}{|W|}\sum_{i \in W_k} g_i f_i - \mu_k \bar{f}_k}{v_k^{2} + \alpha},$$
$$b_k = \bar{f}_k - a_k \mu_k,$$
where $\mu_k$ and $v_k^{2}$ are the mean and variance of the guidance image g within the local window $W_k$, respectively, $|W|$ is the number of pixels in $W_k$, and $\bar{f}_k = \frac{1}{|W|}\sum_{i \in W_k} f_i$ is the mean value of f in the window.
The structure of GFFPC is shown in Figure 4. The original hyperspectral image is first processed by PCA, yielding a reconstructed dataset consisting of a new set of independent bands, the PCs. After that, the GF is applied to each band of the original dataset, with the first PC of the reconstructed dataset used as the gray-scale guidance image, since it carries most of the information of the hyperspectral image, including spatial features. The filtering output is a new hyperspectral data cube with more distinctive edges and texture features, which can help the subsequent hyperspectral classification.
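The sketch below is our interpretation of GFFPC in Python: a box-filter implementation of the guided filter applied band by band, with the first principal component as the guidance image. The function names and the use of SciPy's uniform_filter are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(f, g, r=3, alpha=1e-4):
    """Guided filter (He et al.): input band f, guidance image g, window radius r."""
    box = lambda img: uniform_filter(img, size=2 * r + 1)   # local mean over each window W_k
    mu_g, mu_f = box(g), box(f)
    var_g = box(g * g) - mu_g**2
    cov_gf = box(g * f) - mu_g * mu_f
    a = cov_gf / (var_g + alpha)        # a_k from the linear regression solution
    b = mu_f - a * mu_g                 # b_k
    return box(a) * g + box(b)          # average a_k, b_k over windows covering each pixel

def gffpc(cube, r=3, alpha=1e-4):
    """GFFPC sketch: guide every band of the HSI cube (H, W, L) by its first PC."""
    H, W, L = cube.shape
    X = cube.reshape(-1, L).astype(float)
    Xc = X - X.mean(0)
    pc1 = (Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]).reshape(H, W)  # first PC scores
    pc1 = (pc1 - pc1.min()) / (pc1.max() - pc1.min() + 1e-12)                # gray-scale guidance
    return np.stack([guided_filter(cube[:, :, i].astype(float), pc1, r, alpha)
                     for i in range(L)], axis=2)
```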

3.3. DKELM Classification

DKELM consists of several KELM autoencoders (KELM-AEs) stacked into deep layers. Thus, we first present a brief description of the KELM-AE.

3.3.1. KELM-AE

Figure 5 demonstrates the structure of the KELM-AE, which is very similar to the ELM-AE except for the kernel representation. The kernel operation in Figure 5 can be represented as
$$\boldsymbol{\Omega} = k(\tilde{\mathbf{x}}, \mathbf{x}_j) = \langle \phi(\tilde{\mathbf{x}}), \phi(\mathbf{x}_j) \rangle,$$
where $\tilde{\mathbf{x}}$ is referred to as a testing sample, $\mathbf{x}_j$ denotes the j-th training sample, and $\phi$ is the mapping function to the reproducing kernel Hilbert space (RKHS). As shown in Figure 5, the input matrix $\mathbf{X}^{(i)}$ is mapped into a kernel matrix $\boldsymbol{\Omega}^{(i)}$ through the kernel function $k^{(k)}(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^{2}/(2\sigma_k^{2})\right)$. In our proposed DKELM, we use the RBF kernel function with parameter $\sigma_k$.
Then, $\bar{\boldsymbol{\Lambda}}^{(i)}$ is employed to represent the i-th transformation matrix in the KELM-AE, which is similar to the ELM-AE in (10):
$$\boldsymbol{\Omega}^{(i)}\bar{\boldsymbol{\Lambda}}^{(i)} = \mathbf{X}^{(i)},$$
and then $\bar{\boldsymbol{\Lambda}}^{(i)}$ is calculated via
$$\bar{\boldsymbol{\Lambda}}^{(i)} = \left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}^{(i)}\right)^{-1}\mathbf{X}^{(i)}.$$
The data can then be represented through the final data transformation procedure using
$$\mathbf{X}^{(i+1)} = g\left(\mathbf{X}^{(i)}(\bar{\boldsymbol{\Lambda}}^{(i)})^{T}\right),$$
where g is still an activation function. The hidden-layer activation functions can be either linear or non-linear. In our proposed DKELM, we use non-linear activation functions, because distinct and abundant deep features can be learned and captured when non-linear activation functions are used between the KELM-AEs. The combination of dense and sparse representations is more expressive than purely linear learning. Compared to the ELM-AE, the number of hidden nodes does not need to be set in advance because of the kernel trick used in each hidden layer. The pseudocode of the KELM-AE is depicted in Algorithm 1.
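A minimal NumPy sketch of one KELM-AE step, reusing the rbf_kernel helper from the KELM sketch above, could look as follows; the names are ours, and the exact ordering of transposes in the authors' implementation may differ.

```python
import numpy as np

def kelm_ae_layer(X, sigma, C):
    """One KELM-AE: Omega = K(X, X), Lambda_bar = (I/C + Omega)^{-1} X,
    and the new representation g(X Lambda_bar^T) with a sigmoid activation."""
    n = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)                      # kernel matrix on the current representation
    Lam = np.linalg.solve(np.eye(n) / C + Omega, X)      # transformation matrix Lambda_bar
    X_next = 1.0 / (1.0 + np.exp(-(X @ Lam.T)))          # new data representation
    return X_next, Lam
```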

3.3.2. DKELM

DKELM can achieve universal approximation because it contains two separate learning procedures, the same as H-ELM [31]. Each pair of $\bar{\boldsymbol{\Lambda}}^{(i)}$ and $\mathbf{X}^{(i)}$ (in the i-th KELM-AE) can be computed via Equations (19) and (23), respectively. Finally, the final data representation $\mathbf{X}_{\mathrm{final}}^{*}$ is calculated and used as the training input to train a KELM classification model:
$$\boldsymbol{\Omega}_{\mathrm{final}}^{*}\boldsymbol{\beta} = \mathbf{T},$$
where $\boldsymbol{\Omega}_{\mathrm{final}}^{*}$ is obtained from $\mathbf{X}_{\mathrm{final}}^{*}$; the output weight $\boldsymbol{\beta}$ can then be calculated via
$$\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{final}}^{*}\right)^{-1}\mathbf{T}.$$
The procedure of DKELM is depicted in Algorithm 2, including the training and testing phases.
Algorithm 1 The pseudocode of KELM-AE.
Input: Input matrix $\mathbf{X}^{(i)}$, regularization parameter $C_i$, kernel parameter $\sigma_i$, activation function $g_i$.
Output: Transformation matrix $\bar{\boldsymbol{\Lambda}}^{(i)}$, new data representation $\mathbf{X}^{(i+1)}$.
Step 1: Calculate the kernel matrix $\Omega_{k,j}^{(i)} \leftarrow K(\mathbf{x}_k, \mathbf{x}_j, \sigma_i)$, where $\mathbf{x}_k$ and $\mathbf{x}_j$ are the k-th and j-th training samples, respectively.
Step 2: Calculate the output weight $\bar{\boldsymbol{\Lambda}}^{(i)} \leftarrow \left(\frac{\mathbf{I}}{C_i} + \boldsymbol{\Omega}^{(i)}\right)^{-1}\mathbf{X}^{(i)}$.
Step 3: Calculate the new data representation $\mathbf{X}^{(i+1)} \leftarrow g_i\left(\mathbf{X}^{(i)}(\bar{\boldsymbol{\Lambda}}^{(i)})^{T}\right)$.
Return: $\mathbf{X}^{(i+1)}$, $\bar{\boldsymbol{\Lambda}}^{(i)}$.
Algorithm 2: The pseudocode of DKELM.
  • Training Phase
Input: Input matrix $\mathbf{X}^{(1)}$, output matrix $\mathbf{T}$, regularization parameter $C_i$ for each hidden layer, kernel parameter $\sigma_i$ for each hidden layer, activation function $g_i$ for each hidden layer, and the number of layers N.
Output: Transformation matrix $\bar{\boldsymbol{\Lambda}}^{(i)}$ for each hidden layer, the final representation of the training samples $\mathbf{X}^{(N)}$, and the final output weight $\boldsymbol{\beta}$ of the output layer.
Step 1:
      for i = 1 : N − 1 do:
      Calculate $\left[\mathbf{X}^{(i+1)}, \bar{\boldsymbol{\Lambda}}^{(i)}\right] \leftarrow \mathrm{KELM\text{-}AE}(\mathbf{X}^{(i)}, C_i, \sigma_i, g_i)$
      end.
Step 2: $\mathbf{X}^{(N)} \leftarrow \mathbf{X}^{(i+1)}$
Step 3: Calculate $\Omega_{k,j}^{\mathrm{final}} \leftarrow K(\mathbf{x}_k^{(N)}, \mathbf{x}_j^{(N)}, \sigma_N)$, where $\mathbf{x}_k^{(N)}$ and $\mathbf{x}_j^{(N)}$ are the final representations of the training samples $\mathbf{x}_k$ and $\mathbf{x}_j$, respectively.
Step 4: Calculate the final output weight $\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C_N} + \boldsymbol{\Omega}^{\mathrm{final}}\right)^{-1}\mathbf{T}$.
Return: $\bar{\boldsymbol{\Lambda}}^{(1)}, \ldots, \bar{\boldsymbol{\Lambda}}^{(N-1)}$, $\mathbf{X}^{(N)}$, $\boldsymbol{\beta}$.
  • Testing Phase
Input: Input matrix of the testing samples $\mathbf{TX}^{(1)}$, outputs of the training phase $\bar{\boldsymbol{\Lambda}}^{(1)}, \ldots, \bar{\boldsymbol{\Lambda}}^{(N-1)}$, $\mathbf{X}^{(N)}$, $\boldsymbol{\beta}$.
Output: Output matrix of the testing samples $\mathbf{TY}$.
Step 1:
      for l = 1 : N − 1 do:
      Calculate the hidden representation $\mathbf{TX}^{(l+1)} \leftarrow g_l\left(\mathbf{TX}^{(l)}(\bar{\boldsymbol{\Lambda}}^{(l)})^{T}\right)$
      end.
Step 2: $\mathbf{TX}^{(N)} \leftarrow \mathbf{TX}^{(l+1)}$
Step 3: Calculate the kernel matrix $\Omega_{k,j}^{(N)} \leftarrow K(\mathbf{x}_k^{(N)}, \mathbf{tx}_j^{(N)}, \sigma_N)$, where $\mathbf{x}_k^{(N)}$ and $\mathbf{tx}_j^{(N)}$ are the final representations of the training sample $\mathbf{x}_k$ and the testing sample $\mathbf{tx}_j$, respectively.
Step 4: Calculate the final output of the DKELM: $\mathbf{TY} = (\boldsymbol{\Omega}^{(N)})^{T}\boldsymbol{\beta}$.
MELM employs the pseudoinverse to calculate the transformation matrix in each layer. In the KELM-AEs of DKELM, by contrast, the exact inverse of the invertible (regularized) kernel matrix is used to calculate $\bar{\boldsymbol{\Lambda}}^{(i)}$. This results in a theoretically exact reconstruction of $\mathbf{X}^{(i)}$, which reduces the error accumulation of the AEs to a certain degree. Consequently, DKELM can learn a better data representation and generalize better.
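Putting the pieces together, a compact sketch of DKELM training and prediction under our reading of Algorithm 2 (reusing rbf_kernel and kelm_ae_layer from the earlier sketches; this is an illustration, not the authors' released code) might read:

```python
import numpy as np

def train_dkelm(X, T, sigmas, C):
    """Stack KELM-AE layers with parameters sigmas[:-1], then train a KELM
    classifier with sigmas[-1]. X: (n, d) training data, T: (n, m) one-hot labels."""
    reps, lams = [X], []
    for s in sigmas[:-1]:                                  # N-1 KELM-AE layers
        X_next, Lam = kelm_ae_layer(reps[-1], s, C)
        reps.append(X_next)
        lams.append(Lam)
    Omega = rbf_kernel(reps[-1], reps[-1], sigmas[-1])     # final kernel on the deep features
    beta = np.linalg.solve(np.eye(X.shape[0]) / C + Omega, T)
    return lams, reps[-1], beta

def predict_dkelm(Xtest, lams, X_final, beta, sigmas):
    """Propagate test samples through the learned transformations, then apply
    the kernel classifier against the final training representation."""
    TX = Xtest
    for Lam in lams:                                       # same g(TX Lam^T) mapping as in training
        TX = 1.0 / (1.0 + np.exp(-(TX @ Lam.T)))
    K = rbf_kernel(TX, X_final, sigmas[-1])                # kernel between test and training reps
    return K @ beta                                        # class scores; argmax gives labels
```

For the three-kernel-layer configuration used in this paper, sigmas would contain two autoencoder kernel parameters and one classifier kernel parameter.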

4. Experiments

In this section, we design a series of experiments to evaluate our proposed hyperspectral framework combining spatial filters with DKELM. ELM, KELM, KSVM, and CNN are used as comparison algorithms. In addition, all these algorithms are combined with the GFFPC spatial feature enhancement method for further comparison. The evaluation criteria are the overall accuracy (OA) and the kappa coefficient. Three classic hyperspectral benchmark datasets and one self-collected hyperspectral dataset, namely Indian Pines, University of Pavia, Salinas, and Glycine ussuriensis, are used. Except for the CNN, all experiments are performed using MATLAB R2017b on a computer with a 3.2 GHz CPU and 8.0 GB RAM; the CNN algorithm is run on an NVIDIA Tesla K80 GPU.

4.1. Hyperspectral Datasets

The Indian Pines dataset was gathered by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over North-western Indiana. It contains 220 spectral bands covering the 0.4–2.45 μm region with a spatial resolution of 20 m. The image has 145 × 145 pixels and 200 bands after removing 20 noisy and water-absorption bands. The Indian Pines scene is about two-thirds agriculture and one-third forest. The groundtruth of this image is shown in Figure 6. Sixteen classes are contained, and their numbers of labeled samples are tabulated in Table 1. For Indian Pines, 10% to 50% of the labeled data of each class are selected randomly as training samples, and the remainder is used for testing.
The second dataset, University of Pavia, is an urban scene acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. The ROSIS sensor generates 115 spectral bands over 0.43 to 0.86 μm. This image scene has 610 × 340 pixels, and each pixel has 103 bands after noisy-band removal. The geometric resolution is 1.3 m per pixel. The nine groundtruth classes and the numbers of labeled samples are tabulated in Table 2. Figure 7 shows the groundtruth of University of Pavia, which is used as the reference image. For University of Pavia, 5% to 25% of the labeled data of each class are used as training samples to evaluate the proposed framework.
The third dataset, Salinas, was also collected by the 224-band AVIRIS sensor, over Salinas Valley, California. This image scene comprises 512 × 217 pixels. It has 204 bands after removing noisy and water-absorption bands. The groundtruth depicted in Figure 8 contains 16 classes, and the detailed numbers of samples are shown in Table 3. In the Salinas experiments, 5% to 25% of the labeled samples of each class are chosen randomly for training our proposed classification framework.
The last dataset, named the Glycine ussuriensis dataset, was collected over the Yellow River Delta National Nature Reserve, Qingdao, China. The image was acquired by a Nano-Hyperspec imaging system mounted on an unmanned aerial vehicle. This image scene comprises 355 × 266 pixels with 270 bands after removing noisy and water-absorption bands. The Glycine ussuriensis dataset contains four classes, shown in Figure 9 and Table 4. All four categories are plants; specifically, Glycine ussuriensis is a small-seeded species that grows on hills, roadsides, and shrublands at 100–800 m above sea level. Unlike the datasets mentioned before, the classes in Glycine ussuriensis have no strict geographical separation; for instance, the tarragon samples are surrounded by other samples. In the experiments, 5% to 25% of the labeled samples of each class are chosen randomly to evaluate our proposed classification framework.

4.2. Parameters Tuning and Setting

In our proposed framework, several parameters need to be tuned. Two user-specified parameters are required for GFFPC: the size of the local sliding window, denoted ω, and the regularization parameter, denoted α, where the latter determines the degree of blurring of the GF. In the simulations, ω is chosen from [3, 5, 7, 9, 11], and α is selected from [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]. Figure 10a shows the classification accuracies of DKELM over the ω subspace, with α fixed. It can be seen that better performance is achieved when ω equals 7 for all the datasets. Figure 10b indicates the impact of different α on the classification results. The OA first increases and then decreases as α decreases. Thus, we finally set α to 1e-4.
Figure 11 shows the accuracies obtained with different numbers of kernel layers in DKELM. We can see that with three kernel layers, DKELM already achieves superior classification performance: given the characteristics of the hyperspectral datasets, three kernel layers can already extract sufficiently refined and discriminative features for the classification task, while over-fitting occurs when the network is too deep with limited training samples. Thus, in our proposed framework, we set the number of kernel layers to 3.
The kernel function used in this paper is the RBF. The activation functions employed to connect different KELM-AEs are the sigmoid and ReLU functions. The sigmoid function constrains the output of each layer to the range from 0 to 1, as given in Equation (3). The ReLU function [39] can be expressed as
$$\mathrm{ReLU}(x) = \begin{cases} x, & x > 0, \\ 0, & x \le 0, \end{cases}$$
where x is the input. The ReLU leads to sparse learning, which can decrease the coupling among the parameters and alleviate the over-fitting problem. Therefore, the combination of the sigmoid and ReLU functions can learn deep features at different scales, which is beneficial to DKELM. Our proposed DKELM has three kernel layers; therefore, three main parameters need to be tuned: $\sigma_1$, $\sigma_2$ and $\sigma_3$, the parameters of the RBF kernel functions. To take full advantage of our framework, a grid-search algorithm is employed to tune $\sigma_1$, $\sigma_2$ and $\sigma_3$. Figure 12 depicts the detailed tuning procedures for the three parameters on the Indian Pines, University of Pavia, Salinas, and Glycine ussuriensis datasets, respectively. The vertical coordinate axis represents $\sigma_3$, and the two horizontal coordinate axes represent $\sigma_1$ and $\sigma_2$. The color bars in Figure 12 indicate the classification accuracy obtained with the different sets of values. The black circle with the best classification accuracy marks the values we finally chose. The final values employed for the four datasets are: {4e2, 3.6e3, 5.2e7}, {2.9e2, 1e5, 6e6}, {4e2, 5e4, 8e6} and {4e2, 3.6e3, 6e7}.
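A simple version of this grid search, reusing the DKELM sketch above and with illustrative candidate grids and validation split (not the exact grids used for Figure 12), is:

```python
import itertools
import numpy as np

def grid_search_sigmas(X_tr, T_tr, X_val, y_val, grids, C=1e3):
    """Exhaustively try (sigma1, sigma2, sigma3) triples and keep the one with
    the best validation overall accuracy (OA)."""
    best_oa, best_sigmas = -np.inf, None
    for sigmas in itertools.product(*grids):               # all parameter combinations
        lams, X_final, beta = train_dkelm(X_tr, T_tr, list(sigmas), C)
        scores = predict_dkelm(X_val, lams, X_final, beta, list(sigmas))
        oa = np.mean(scores.argmax(axis=1) == y_val)
        if oa > best_oa:
            best_oa, best_sigmas = oa, sigmas
    return best_sigmas, best_oa

# illustrative log-spaced grids for sigma1, sigma2, sigma3
grids = [np.logspace(1, 3, 5), np.logspace(3, 5, 5), np.logspace(5, 8, 5)]
```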

5. Experimental Results and Discussions

In this section, the proposed GFFPC and the novel classification model are assessed, and the related results are summarized and discussed at length. All the algorithms are run for 10 repetitions with different locations of the training samples for each dataset, and the mean results with the corresponding standard deviations are reported.

5.1. Discussion on the Proposed GFFPC

In the experiments, we first investigate the impact of different spatial filters on MELM and DKELM. Figure 13 illustrates the OA of different spatial filters combined with the MELM and DKELM algorithms. Origin denotes the performance obtained by the original MELM and DKELM. G_3 × 3 and G_5 × 5 denote using a Gaussian filter with a 3 × 3 and 5 × 5 window, respectively, before employing MELM and DKELM. GFFPC denotes applying GFFPC before MELM and DKELM.
From Figure 13, we can see that the performance with spatial filters is better than without them. Furthermore, GFFPC achieves superior performance to the Gaussian filters. Consequently, in our subsequent experiments, GFFPC is the recommended spatial filter for strongly enhancing spatial features.

5.2. Discussion on the Classification Results

Table 5 tabulates the performance achieved by the different classification algorithms and their combinations with the GFFPC spatial filter, using 10% of the labeled samples as training samples for Indian Pines. Although the OA of DKELM is slightly worse than that of CNN, our proposed DKELM-GFFPC framework outperforms the other classification algorithms. In addition, our proposed GFFPC enhances the classification performance by a wide margin. For instance, the OA of ELM-GFFPC is increased by 16.23%, and the OA of CNN-GFFPC is improved from 83.19% to 95.57%. In particular, the performances of MELM and DKELM are enhanced by 17.02% and 16.38%, respectively. For classes 1, 9 and 16 of Indian Pines, the accuracies increase from 83.33%, 66.67% and 98.33% to 100%, 94.44% and 100% when DKELM-GFFPC is used instead of DKELM. This phenomenon indicates that our proposed framework can benefit the performance of several small-size classes.
Table 6 lists the classification performance of the comparison algorithms and the proposed algorithm on the University of Pavia dataset, using 5% of the labeled samples as training samples. From the accuracies exhibited in Table 6, the OA of DKELM is improved by 13.95% by using the GFFPC spatial filter. The performance of the other algorithms is also enhanced to different degrees, from 2.6% to 11.29%. It can also be seen that after combining with GFFPC, the classification performance is greatly enhanced; in particular, DKELM-GFFPC achieves the best performance. Furthermore, the accuracies of small-size classes such as classes 5, 7 and 9 are improved by using GFFPC, implying that GFFPC can preserve more distinctive features of small-size classes.
Table 7 demonstrates the performance obtained by the different classification algorithms on the Salinas dataset, using 5% of the labeled samples as training samples. Compared with the other algorithms, the proposed DKELM-GFFPC is still the predominant one. In addition, the classification performance of ELM, KELM, KSVM, CNN, MELM and DKELM is increased by 5.86%, 7.81%, 6.21%, 6.43%, 6.61% and 6.22%, respectively. Furthermore, classes 11, 13 and 14 are small classes with few training samples, and their performance is greatly enhanced by the GFFPC spatial filter.
The classification accuracies achieved on the Glycine ussuriensis dataset are tabulated in Table 8. From Table 8, DKELM-GFFPC obtains better performance than the other classification frameworks. GFFPC still has a positive influence on the classification performance; for instance, the OAs of MELM and CNN are improved by 13.41% and 12.48% by adding the GFFPC spatial filter. Moreover, the classification accuracy of the class Glycine ussuriensis, which has the fewest labeled samples, is greatly enhanced by our proposed spatial filter and classification framework.
From Table 5, Table 6, Table 7 and Table 8, we can see that CNN, MELM and DKELM work better than the other traditional classifiers. Therefore, to further verify our proposed framework, we compare DKELM-GFFPC with CNN-GFFPC and MELM-GFFPC using different percentages of training samples in Figure 14. As the number of training samples increases, the classification performance improves gradually. Obviously, DKELM-GFFPC outperforms the other two classification frameworks regardless of the number of training samples. One of the most significant advantages of the proposed classification framework is its very fast training procedure. Thus, the average training times of the different algorithms are compared in Table 9. The ELM-based methods are faster than KSVM and CNN. The training times of DKELM and KELM depend on the number of training samples, while those of ELM and MELM depend on the number of hidden neurons; the more hidden neurons, the longer the training time, since higher model capacity is provided subject to the data complexity. CNN can achieve superior classification performance; nevertheless, its training time is nearly 135 to 411 times that of DKELM on the different experimental datasets. Therefore, DKELM is the most appealing, with the best performance and the least training time.
To illustrate the merits of our proposed classification framework and spatial filter from the perspective of visualization, Figure 15, Figure 16, Figure 17 and Figure 18 show the classification maps. Clearly, compared with the groundtruth shown in Figure 6, Figure 7, Figure 8 and Figure 9, the classification maps obtained by our proposed framework are the smoothest and clearest. Moreover, the classification maps achieved with the GFFPC spatial filter are evidently more distinct than those without it; in particular, the border pixels and the boundaries between different classes in DKELM-GFFPC are more distinct. Compared with the other classification methods, our proposed framework is better because less salt-and-pepper noise is contained in the classification maps.

6. Conclusions

In this work, the MELM algorithm is investigated and applied to hyperspectral classification for the first time. Then, the DKELM-GFFPC framework is proposed, consisting of GFFPC for enhancing spatial features and DKELM, a kernel version of MELM. Experimental results demonstrate that it outperforms other traditional algorithms; in particular, DKELM-GFFPC improves the accuracy of classes with small sample sizes to different degrees for each dataset. Moreover, compared with the Gaussian filter, the proposed GFFPC plays an important role in enhancing hyperspectral classification performance. Finally, our proposed classification framework achieves the highest classification accuracy at the lowest computational cost. Based on the above-mentioned advantages, we believe that the proposed hyperspectral classification framework based on the novel DKELM and GFFPC is well suited to processing hyperspectral data in practical applications with low computing cost, and furthermore in real-time applications.

Author Contributions

J.L. and Q.D. conceived and designed the study; B.X. performed the experiments; G.R. and R.S. shared part of the experiment data; J.L. and Y.L. analyzed the data; J.L. and B.X. wrote the paper. Y.L., Q.D. and R.S. reviewed and edited the manuscript. All authors read and approved the manuscript.

Acknowledgments

This work was partially supported by the Fundamental Research Funds for the Central Universities JB170109, General Financial Grant from the China Postdoctoral Science Foundation (no. 2017M623124) and Special Financial Grant from the China Postdoctoral Science Foundation (no. 2018T111019). It was also partially supported by the National Nature Science Foundation of China (no. 61571345, 91538101, 61501346 and 61502367) and the 111 project (B08038).

Conflicts of Interest

The authors declare that they have no financial or personal relationships with other people or organizations that could inappropriately influence this work, and this paper was not published before. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Shan, J.; Zhao, J.; Liu, L.; Zhang, Y.; Wang, X.; Wu, F. A novel way to rapidly monitor microplastics in soil by hyperspectral imaging technology and chemometrics. Environ. Pollut. 2018, 238, 121–129.
  2. Liu, K.; Su, H.; Li, X. Estimating High-Resolution Urban Surface Temperature Using a Hyperspectral Thermal Mixing (HTM) Approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 804–815.
  3. Haboudane, D.; Tremblay, N.; Miller, J.R.; Vigneault, P. Remote Estimation of Crop Chlorophyll Content Using Spectral Indices Derived From Hyperspectral Data. IEEE Trans. Geosci. Remote Sens. 2008, 46, 423–437.
  4. Pike, R.; Lu, G.; Wang, D.; Chen, Z.G.; Fei, B. A Minimum Spanning Forest-Based Method for Noninvasive Cancer Detection With Hyperspectral Imaging. IEEE Trans. Biomed. Eng. 2016, 63, 653–663.
  5. Li, W.; Du, Q.; Zhang, F.; Hu, W. Collaborative-Representation-Based Nearest Neighbor Classifier for Hyperspectral Imagery. IEEE Geosci. Remote Sens. Lett. 2015, 12, 389–393.
  6. Rankin, B.M.; Meola, J.; Eismann, M.T. Spectral Radiance Modeling and Bayesian Model Averaging for Longwave Infrared Hyperspectral Imagery and Subpixel Target Identification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6726–6735.
  7. Zhang, Y.; Cao, G.; Li, X.; Wang, B. Cascaded Random Forest for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1082–1094.
  8. Melgani, F.; Bruzzone, L. Support vector machines for classification of hyperspectral remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
  9. Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362.
  10. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep Learning-Based Classification of Hyperspectral Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
  11. Li, B.; Dai, Y.; He, M. Monocular depth estimation with hierarchical fusion of dilated CNNs and soft-weighted-sum inference. Pattern Recognit. 2018, 83, 328–339.
  12. Makantasis, K.; Doulamis, A.D.; Doulamis, N.D.; Nikitakis, A. Tensor-Based Classification Models for Hyperspectral Data Analysis. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6884–6898.
  13. Li, B.; Shen, C.; Dai, Y.; van den Hengel, A.; He, M. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1119–1127.
  14. Özdemir, A.O.B.; Gedik, B.E.; Çetin, C.Y.Y. Hyperspectral classification using stacked autoencoders with deep learning. In Proceedings of the 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lausanne, Switzerland, 24–27 June 2014; pp. 1–4.
  15. Zhong, P.; Gong, Z.; Li, S.; Schönlieb, C. Learning to Diversify Deep Belief Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3516–3530.
  16. Lee, H.; Kwon, H. Going Deeper With Contextual CNN for Hyperspectral Image Classification. IEEE Trans. Image Process. 2017, 26, 4843–4855.
  17. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251.
  18. Lin, Z.; Chen, Y.; Zhao, X.; Wang, G. Spectral-spatial classification of hyperspectral image using autoencoders. In Proceedings of the 2013 9th International Conference on Information, Communications and Signal Processing, Tainan, Taiwan, 10–13 December 2013; pp. 1–5.
  19. Chen, Y.; Zhao, X.; Jia, X. Spectral–Spatial Classification of Hyperspectral Data Based on Deep Belief Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392.
  20. Li, J.; Xi, B.; Li, Y.; Du, Q.; Wang, K. Hyperspectral Classification Based on Texture Feature Enhancement and Deep Belief Networks. Remote Sens. 2018, 10, 396.
  21. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 1–12.
  22. Li, J.; Zhao, X.; Li, Y.; Du, Q.; Xi, B.; Hu, J. Classification of Hyperspectral Imagery Using a New Fully Convolutional Neural Network. IEEE Geosci. Remote Sens. Lett. 2018, 15, 292–296.
  23. Huang, G.; Zhu, Q.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
  24. Li, J.; Du, Q.; Li, W.; Li, Y. Optimizing extreme learning machine for hyperspectral image classification. J. Appl. Remote Sens. 2015, 9, 097296.
  25. Jiang, M.; Cao, F.; Lu, Y. Extreme Learning Machine With Enhanced Composite Feature for Spectral-Spatial Hyperspectral Image Classification. IEEE Access 2018, 6, 22645–22654.
  26. Li, W.; Chen, C.; Su, H.; Du, Q. Local Binary Patterns and Extreme Learning Machine for Hyperspectral Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3681–3693.
  27. Zhou, Y.; Peng, J.; Chen, C.L.P. Extreme Learning Machine With Composite Kernels for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2351–2360.
  28. Huang, G.; Zhou, H.; Ding, X.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B 2012, 42, 513–529.
  29. Pal, M.; Maxwell, A.E.; Warner, T.A. Kernel-based extreme learning machine for remote-sensing image classification. Remote Sens. Lett. 2013, 4, 853–862.
  30. Chen, C.; Li, W.; Su, H.; Liu, K. Spectral-Spatial Classification of Hyperspectral Image Based on Kernel Extreme Learning Machine. Remote Sens. 2014, 6, 5795–5814.
  31. Tang, J.; Deng, C.; Huang, G. Extreme Learning Machine for Multilayer Perceptron. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 809–821.
  32. Ding, S.; Zhang, N.; Xu, X.; Guo, L.; Zhang, J. Deep Extreme Learning Machine and Its Application in EEG Classification. Math. Probl. Eng. 2015, 2015, 1–11.
  33. Ma, X.; Wang, H.; Geng, J. Spectral–Spatial Classification of Hyperspectral Image Based on Deep Auto-Encoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 4073–4085.
  34. Gao, W.; Peng, Y. Ideal Kernel-Based Multiple Kernel Learning for Spectral-Spatial Classification of Hyperspectral Image. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1051–1055.
  35. Patra, S.; Bhardwaj, K.; Bruzzone, L. A Spectral-Spatial Multicriteria Active Learning Technique for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5213–5227.
  36. Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in Spectral-Spatial Classification of Hyperspectral Images. Proc. IEEE 2013, 101, 652–675.
  37. Kang, X.; Li, S.; Benediktsson, J.A. Spectral–Spatial Hyperspectral Image Classification With Edge-Preserving Filtering. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2666–2677.
  38. He, K.; Sun, J.; Tang, X. Guided Image Filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1397–1409.
  39. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical Evaluation of Rectified Activations in Convolutional Network. arXiv 2015, arXiv:1505.00853.
Figure 1. The structure of ELM.
Figure 2. The structure of ELM-AE: both the input and output are x , and ELM-AE has the same solution as the original ELM. g d is the d-th hidden node for input x .
Figure 3. The main procedure of our proposed framework. $\bar{\boldsymbol{\Lambda}}^{(1)}$ is the transformation matrix of the first layer of DKELM, obtained from the KELM-AE of the input data. $\bar{\boldsymbol{\Lambda}}^{(i+1)}$ refers to the transformation matrix of the (i+1)-th hidden layer $h_{i+1}$ of DKELM, which is obtained by the KELM-AE of the i-th hidden layer $h_i$.
Figure 4. The structure of GFFPC.
Figure 5. The structure of KELM-AE.
Figure 6. The groundtruth of Indian Pines.
Figure 7. The groundtruth of University of Pavia.
Figure 8. The groundtruth of Salinas Dataset.
Figure 9. The groundtruth of Glycine Ussuriensis Dataset.
Figure 10. The impact of different GFFPC parameters on the classification results: (a) local sliding window size ω; (b) regularization parameter α.
Figure 11. The accuracies obtained with different numbers of kernel layers in DKELM.
Figure 12. Parameter tuning of DKELM on the (a) Indian Pines, (b) University of Pavia, (c) Salinas, and (d) Glycine ussuriensis datasets.
Figure 13. The performance of different spatial filters combined with MELM and DKELM in (a) Indian pines, (b) University of Pavia, (c) Salinas, and (d) Glycine ussuriensis Datasets.
Figure 14. The performance of CNN, MELM and DKELM combined with GFFPC using different numbers of training samples on the (a) Indian Pines, (b) University of Pavia, (c) Salinas, and (d) Glycine ussuriensis datasets.
Figure 15. Classification maps of the (a) ELM, (b) ELM + GFFPC, (c) KELM, (d) KELM + GFFPC, (e) KSVM, (f) KSVM + GFFPC, (g) CNN, (h) CNN + GFFPC, (i) MELM, (j) MELM + GFFPC, and the proposed (k) DKELM, and (l) DKELM + GFFPC for the Indian Pines.
Figure 16. Classification maps of the (a) ELM, (b) ELM + GFFPC, (c) KELM, (d) KELM + GFFPC, (e) KSVM, (f) KSVM + GFFPC, (g) CNN, (h) CNN + GFFPC, (i) MELM, (j) MELM + GFFPC, and the proposed (k) DKELM, and (l) DKELM + GFFPC for the University of Pavia.
Figure 17. Classification maps of the (a) ELM, (b) ELM + GFFPC, (c) KELM, (d) KELM + GFFPC, (e) KSVM, (f) KSVM + GFFPC, (g) CNN, (h) CNN + GFFPC, (i) MELM, (j) MELM + GFFPC, and the proposed (k) DKELM, and (l) DKELM + GFFPC for the Salinas dataset.
Figure 18. Classification maps of the (a) ELM, (b) ELM + GFFPC, (c) KELM, (d) KELM + GFFPC, (e) KSVM, (f) KSVM + GFFPC, (g) CNN, (h) CNN + GFFPC, (i) MELM, (j) MELM + GFFPC, and the proposed (k) DKELM, and (l) DKELM + GFFPC for the Glycine ussuriensis dataset.
Table 1. Labeled samples in Indian Pines dataset.
No. | Class | Samples
1 | ALFALFA | 46
2 | CORN-NOTILL | 1428
3 | CORN-MIN | 830
4 | CORN | 237
5 | GRASS/PASTURE | 483
6 | GRASS/TREES | 730
7 | GRASS/PASTURE-MOWED | 28
8 | HAY-WINDROWED | 478
9 | OATS | 20
10 | SOYBEAN-NOTILL | 972
11 | SOYBEAN-MIN | 2455
12 | SOYBEAN-CLEAN | 593
13 | WHEATS | 205
14 | WOODS | 1265
15 | BUILDING-GRASS-TREE-DRIVES | 386
16 | STONE-STEEL TOWERS | 93
TOTAL | | 10,249
Table 2. Labeled samples in University of Pavia.
No. | Class | Samples
1 | ASPHALT | 6631
2 | MEADOWS | 18,649
3 | GRAVEL | 2099
4 | TREES | 3064
5 | METAL SHEETS | 1345
6 | BARE SOIL | 5029
7 | BITUMEN | 1330
8 | BRICKS | 3682
9 | SHADOWS | 947
TOTAL | | 42,776
Table 3. Labeled samples in Salinas Dataset.
No. | Class | Samples
1 | BROCOLI_GREEN_WEEDS_1 | 2009
2 | BROCOLI_GREEN_WEEDS_2 | 3726
3 | FALLOW | 1976
4 | FALLOW_ROUGH_PLOW | 1394
5 | FALLOW_SMOOTH | 2678
6 | STUBBLE | 3959
7 | CELERY | 3579
8 | GRAPES_UNTRAINED | 11,271
9 | SOIL_VINYARD_DEVELOP | 6203
10 | CORN_SENESCED_GREEN_WEEDS | 3278
11 | LETTUCE_ROMAINE_4WK | 1068
12 | LETTUCE_ROMAINE_5WK | 1927
13 | LETTUCE_ROMAINE_6WK | 916
14 | LETTUCE_ROMAINE_7WK | 1070
15 | VINYARD_UNTRAINED | 7268
16 | VINYARD_VERTICAL_TRELLIS | 1807
TOTAL | | 54,129
Table 4. Labeled samples in Glycine Ussuriensis Dataset.
No. | Class | Samples
1 | SETARIA VIRIDIS | 8295
2 | TARRAGON | 33,985
3 | GLYCINE USSURIENSIS | 3446
4 | TAMARISK | 31,112
TOTAL | | 76,838
Table 5. Classification accuracy (%) of the comparison and proposed algorithms in Indian Pines.
No. | ELM | KELM | KSVM | CNN | MELM | DKELM | ELM-GFFPC | KELM-GFFPC | KSVM-GFFPC | CNN-GFFPC | MELM-GFFPC | DKELM-GFFPC
1 | 92.00 ± 2.24 | 100.00 ± 0.00 | 73.53 ± 1.25 | 75.00 ± 1.21 | 88.24 ± 4.67 | 83.33 ± 0.66 | 100.00 ± 0.00 | 100.00 ± 0.00 | 90.70 ± 0.76 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
2 | 70.79 ± 1.80 | 77.40 ± 2.24 | 78.37 ± 0.48 | 77.05 ± 0.77 | 71.46 ± 0.94 | 78.20 ± 3.65 | 84.48 ± 0.92 | 91.21 ± 0.85 | 95.30 ± 0.87 | 93.76 ± 0.41 | 96.48 ± 0.15 | 97.60 ± 1.22
3 | 69.13 ± 3.61 | 77.38 ± 1.14 | 74.34 ± 1.38 | 80.13 ± 0.56 | 70.75 ± 2.53 | 74.42 ± 1.52 | 90.15 ± 1.02 | 96.08 ± 0.57 | 94.24 ± 1.24 | 97.31 ± 0.96 | 95.75 ± 0.37 | 98.91 ± 0.88
4 | 61.58 ± 2.75 | 69.14 ± 3.57 | 59.49 ± 1.55 | 68.16 ± 2.66 | 74.44 ± 2.48 | 67.39 ± 2.07 | 93.67 ± 0.91 | 83.40 ± 1.17 | 83.68 ± 1.19 | 83.61 ± 0.34 | 97.70 ± 0.45 | 93.39 ± 1.38
5 | 84.91 ± 1.53 | 86.69 ± 0.97 | 86.40 ± 0.35 | 91.55 ± 0.83 | 85.56 ± 0.95 | 85.49 ± 1.31 | 94.35 ± 0.39 | 97.09 ± 0.55 | 96.26 ± 0.94 | 97.72 ± 0.64 | 99.01 ± 0.09 | 99.28 ± 0.06
6 | 89.73 ± 0.82 | 86.60 ± 0.12 | 90.84 ± 0.12 | 92.99 ± 0.49 | 90.50 ± 0.58 | 89.54 ± 0.13 | 96.89 ± 1.21 | 98.20 ± 0.99 | 99.85 ± 0.75 | 98.64 ± 0.26 | 100.00 ± 0.00 | 99.85 ± 0.05
7 | 80.00 ± 10.48 | 80.00 ± 0.00 | 70.00 ± 3.55 | 95.45 ± 1.80 | 94.12 ± 5.18 | 85.00 ± 1.17 | 95.65 ± 0.95 | 74.19 ± 1.33 | 71.88 ± 1.79 | 64.71 ± 2.40 | 82.76 ± 1.92 | 77.42 ± 2.22
8 | 94.21 ± 1.02 | 87.68 ± 1.03 | 96.09 ± 0.02 | 96.32 ± 2.11 | 91.59 ± 0.82 | 92.59 ± 0.64 | 99.06 ± 0.06 | 99.07 ± 0.65 | 99.77 ± 0.03 | 99.30 ± 0.57 | 99.77 ± 0.23 | 99.31 ± 0.18
9 | 66.67 ± 0.65 | 66.67 ± 3.00 | 75.00 ± 3.10 | 41.67 ± 6.78 | 100.00 ± 0.00 | 66.67 ± 0.55 | 0.00 ± 0.00 | 100.00 ± 0.00 | 70.83 ± 2.13 | 91.67 ± 1.52 | 100.00 ± 0.00 | 94.44 ± 0.64
10 | 69.73 ± 0.51 | 77.27 ± 2.14 | 74.97 ± 2.07 | 76.21 ± 3.66 | 78.16 ± 1.53 | 76.88 ± 0.69 | 91.95 ± 1.21 | 94.45 ± 0.50 | 94.32 ± 0.45 | 92.60 ± 0.82 | 97.37 ± 0.44 | 97.42 ± 0.42
11 | 70.60 ± 2.21 | 73.71 ± 0.29 | 77.35 ± 1.67 | 81.21 ± 0.37 | 79.67 ± 1.03 | 78.73 ± 1.22 | 93.58 ± 0.78 | 94.90 ± 0.42 | 98.86 ± 0.86 | 95.74 ± 1.21 | 99.28 ± 0.40 | 99.23 ± 0.45
12 | 74.34 ± 0.42 | 81.58 ± 0.31 | 81.02 ± 0.16 | 80.39 ± 1.16 | 76.89 ± 1.14 | 84.68 ± 0.47 | 85.56 ± 2.55 | 92.49 ± 0.91 | 94.12 ± 1.03 | 88.85 ± 0.92 | 95.90 ± 0.36 | 97.19 ± 1.44
13 | 96.77 ± 2.21 | 92.86 ± 0.48 | 86.26 ± 1.41 | 92.71 ± 0.21 | 94.27 ± 1.38 | 93.78 ± 0.58 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
14 | 88.75 ± 0.60 | 82.69 ± 1.38 | 92.49 ± 1.12 | 93.48 ± 1.07 | 91.35 ± 0.36 | 89.65 ± 2.31 | 99.91 ± 0.08 | 99.91 ± 0.06 | 99.12 ± 0.12 | 99.91 ± 0.05 | 99.82 ± 0.09 | 99.91 ± 0.04
15 | 63.76 ± 2.68 | 84.76 ± 0.61 | 69.58 ± 2.05 | 68.28 ± 2.35 | 88.04 ± 1.25 | 86.14 ± 1.45 | 95.39 ± 1.55 | 97.06 ± 1.21 | 96.43 ± 0.11 | 96.76 ± 0.82 | 99.04 ± 0.31 | 98.53 ± 0.36
16 | 98.39 ± 1.33 | 98.61 ± 0.00 | 98.57 ± 0.35 | 100.00 ± 0.00 | 100.00 ± 0.00 | 98.33 ± 0.08 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
OA | 76.70 ± 0.28 | 79.42 ± 0.88 | 81.06 ± 1.20 | 83.19 ± 0.85 | 81.24 ± 0.28 | 82.21 ± 0.56 | 92.93 ± 0.33 | 95.31 ± 0.39 | 96.64 ± 0.91 | 95.57 ± 0.49 | 98.26 ± 0.12 | 98.59 ± 0.40
Kappa | 73.25 ± 0.35 | 76.26 ± 0.44 | 78.34 ± 0.60 | 80.77 ± 0.98 | 78.52 ± 0.33 | 79.59 ± 0.78 | 91.92 ± 0.21 | 94.64 ± 0.22 | 96.17 ± 1.64 | 94.94 ± 0.36 | 98.02 ± 0.13 | 98.39 ± 0.29
Table 6. Classification accuracy (%) of the comparison and proposed algorithms in University of Pavia.
No. | ELM | KELM | KSVM | CNN | MELM | DKELM | ELM-GFFPC | KELM-GFFPC | KSVM-GFFPC | CNN-GFFPC | MELM-GFFPC | DKELM-GFFPC
1 | 89.22 ± 0.52 | 92.62 ± 1.02 | 93.87 ± 0.18 | 94.88 ± 0.71 | 90.69 ± 0.40 | 91.17 ± 0.29 | 89.03 ± 0.48 | 90.07 ± 0.87 | 95.47 ± 1.21 | 92.47 ± 0.50 | 94.74 ± 2.18 | 99.65 ± 0.34
2 | 81.88 ± 1.33 | 82.22 ± 0.19 | 95.97 ± 0.19 | 97.35 ± 1.09 | 85.54 ± 0.22 | 93.38 ± 0.59 | 92.69 ± 0.58 | 95.48 ± 0.49 | 99.48 ± 0.46 | 99.22 ± 0.76 | 97.94 ± 0.11 | 99.99 ± 0.01
3 | 75.20 ± 1.80 | 80.94 ± 0.10 | 84.90 ± 0.81 | 86.59 ± 2.55 | 78.31 ± 1.09 | 79.99 ± 2.08 | 98.23 ± 0.20 | 98.99 ± 0.09 | 98.94 ± 0.13 | 93.64 ± 0.90 | 95.45 ± 0.46 | 98.45 ± 0.50
4 | 83.84 ± 0.22 | 85.21 ± 0.14 | 97.84 ± 0.19 | 92.61 ± 0.15 | 88.05 ± 1.11 | 94.77 ± 1.32 | 85.28 ± 0.53 | 87.60 ± 1.01 | 94.07 ± 0.38 | 92.45 ± 0.60 | 88.01 ± 0.55 | 96.12 ± 0.74
5 | 99.45 ± 0.32 | 99.53 ± 0.08 | 98.99 ± 0.39 | 99.07 ± 0.06 | 99.84 ± 0.08 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
6 | 84.90 ± 7.79 | 92.49 ± 0.14 | 92.18 ± 1.14 | 93.00 ± 1.33 | 87.17 ± 0.29 | 90.38 ± 1.37 | 96.55 ± 0.92 | 98.38 ± 0.95 | 99.50 ± 0.08 | 98.88 ± 0.99 | 95.62 ± 2.12 | 99.71 ± 0.17
7 | 77.35 ± 0.12 | 86.61 ± 0.16 | 83.13 ± 2.48 | 87.45 ± 2.46 | 82.80 ± 0.47 | 91.47 ± 1.14 | 97.92 ± 0.88 | 99.92 ± 0.03 | 100.00 ± 0.00 | 95.82 ± 0.14 | 99.44 ± 0.09 | 99.84 ± 0.08
8 | 64.07 ± 0.13 | 71.14 ± 2.60 | 84.71 ± 0.87 | 83.26 ± 0.46 | 72.17 ± 0.22 | 74.51 ± 3.96 | 92.31 ± 1.01 | 96.02 ± 0.81 | 93.97 ± 1.43 | 92.20 ± 1.61 | 97.21 ± 1.59 | 98.23 ± 0.89
9 | 91.58 ± 0.67 | 91.29 ± 0.19 | 100.00 ± 0.00 | 100.00 ± 0.00 | 93.93 ± 2.00 | 91.42 ± 0.97 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.89 ± 0.07 | 99.89 ± 0.44 | 98.30 ± 0.33 | 98.33 ± 0.50
OA | 81.52 ± 0.90 | 84.02 ± 0.21 | 93.59 ± 0.63 | 94.12 ± 0.52 | 85.41 ± 0.19 | 90.51 ± 1.21 | 92.81 ± 0.69 | 95.03 ± 0.26 | 98.01 ± 0.28 | 96.72 ± 0.60 | 96.40 ± 0.21 | 99.36 ± 0.29
Kappa | 74.66 ± 1.36 | 78.06 ± 2.30 | 91.47 ± 0.14 | 92.21 ± 0.43 | 80.14 ± 0.28 | 87.31 ± 0.88 | 90.13 ± 0.59 | 93.35 ± 0.14 | 97.36 ± 0.31 | 95.65 ± 0.43 | 95.20 ± 0.15 | 99.15 ± 0.45
Table 7. Classification accuracy (%) of the comparison and proposed algorithms in the Salinas dataset.
No. | ELM | KELM | KSVM | CNN | MELM | DKELM | ELM-GFFPC | KELM-GFFPC | KSVM-GFFPC | CNN-GFFPC | MELM-GFFPC | DKELM-GFFPC
1 | 99.84 ± 0.16 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.78 ± 0.22 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
2 | 99.10 ± 0.64 | 98.25 ± 0.12 | 99.07 ± 0.05 | 99.24 ± 0.16 | 99.18 ± 0.08 | 99.49 ± 0.11 | 99.69 ± 0.18 | 99.97 ± 0.23 | 99.89 ± 0.07 | 99.69 ± 0.27 | 100.00 ± 0.00 | 100.00 ± 0.00
3 | 97.53 ± 1.01 | 98.81 ± 0.26 | 95.79 ± 0.33 | 94.56 ± 1.52 | 98.58 ± 0.98 | 97.55 ± 0.08 | 97.66 ± 1.05 | 99.79 ± 0.65 | 97.61 ± 1.18 | 97.81 ± 0.36 | 100.00 ± 0.00 | 100.00 ± 0.00
4 | 99.47 ± 0.30 | 99.54 ± 0.13 | 99.02 ± 0.21 | 97.20 ± 0.38 | 99.53 ± 0.09 | 99.54 ± 0.13 | 98.21 ± 0.91 | 98.73 ± 0.78 | 98.63 ± 0.51 | 98.14 ± 1.84 | 99.08 ± 0.08 | 99.02 ± 0.71
5 | 93.46 ± 1.04 | 96.86 ± 1.12 | 99.12 ± 0.20 | 99.35 ± 0.07 | 98.74 ± 0.55 | 98.78 ± 0.64 | 99.64 ± 0.33 | 99.76 ± 0.23 | 98.81 ± 0.59 | 99.72 ± 0.23 | 98.94 ± 0.14 | 99.45 ± 0.40
6 | 99.87 ± 0.03 | 99.95 ± 0.02 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.97 ± 0.01 | 99.95 ± 0.05 | 100.00 ± 0.00 | 99.87 ± 0.11 | 100.00 ± 0.00 | 99.97 ± 0.01
7 | 98.52 ± 1.09 | 99.41 ± 0.36 | 99.21 ± 0.35 | 99.94 ± 0.05 | 99.88 ± 0.09 | 99.59 ± 0.07 | 99.76 ± 0.18 | 99.88 ± 0.09 | 98.42 ± 0.41 | 99.79 ± 0.15 | 99.94 ± 0.02 | 99.97 ± 0.01
8 | 74.68 ± 1.45 | 75.78 ± 3.27 | 80.28 ± 1.26 | 76.98 ± 2.65 | 80.73 ± 0.58 | 82.42 ± 0.87 | 88.02 ± 1.23 | 98.76 ± 0.15 | 98.55 ± 0.62 | 97.67 ± 1.05 | 98.97 ± 0.37 | 99.91 ± 0.03
9 | 98.30 ± 0.28 | 98.68 ± 0.45 | 99.05 ± 0.45 | 98.05 ± 0.51 | 98.92 ± 0.07 | 98.92 ± 0.25 | 99.81 ± 0.12 | 99.95 ± 0.03 | 99.51 ± 0.36 | 99.49 ± 0.42 | 99.97 ± 0.02 | 99.85 ± 0.11
10 | 92.73 ± 0.05 | 95.85 ± 1.22 | 98.34 ± 0.62 | 94.87 ± 2.61 | 98.58 ± 0.36 | 98.42 ± 0.28 | 98.79 ± 0.52 | 99.17 ± 0.22 | 99.15 ± 0.80 | 98.67 ± 0.66 | 99.32 ± 0.58 | 99.20 ± 0.50
11 | 96.16 ± 0.07 | 97.33 ± 0.68 | 98.24 ± 0.06 | 92.69 ± 1.21 | 99.80 ± 0.19 | 98.83 ± 0.30 | 99.61 ± 0.07 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
12 | 92.99 ± 1.63 | 96.37 ± 0.78 | 98.12 ± 0.23 | 96.47 ± 0.97 | 99.29 ± 0.11 | 98.92 ± 1.30 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.78 ± 0.07 | 100.00 ± 0.00 | 100.00 ± 0.00
13 | 92.07 ± 0.42 | 92.21 ± 1.35 | 95.64 ± 1.22 | 93.44 ± 0.22 | 97.71 ± 0.22 | 99.07 ± 0.55 | 97.26 ± 1.28 | 99.53 ± 0.11 | 99.88 ± 0.09 | 97.59 ± 1.89 | 100.00 ± 0.00 | 100.00 ± 0.00
14 | 95.92 ± 0.68 | 96.73 ± 1.66 | 96.63 ± 0.55 | 94.37 ± 1.55 | 96.82 ± 0.44 | 96.87 ± 0.18 | 96.99 ± 0.81 | 97.16 ± 1.86 | 96.05 ± 0.69 | 97.01 ± 0.63 | 96.89 ± 1.59 | 98.51 ± 0.59
15 | 79.67 ± 0.38 | 82.36 ± 3.28 | 83.39 ± 0.26 | 85.64 ± 1.86 | 80.32 ± 1.06 | 84.17 ± 2.36 | 91.24 ± 1.39 | 95.85 ± 2.23 | 98.13 ± 0.97 | 91.67 ± 2.78 | 98.41 ± 0.14 | 99.80 ± 0.11
16 | 99.71 ± 0.18 | 99.82 ± 0.18 | 97.53 ± 0.78 | 98.66 ± 0.94 | 99.94 ± 0.24 | 99.77 ± 0.02 | 99.94 ± 0.02 | 99.94 ± 0.05 | 98.27 ± 0.39 | 99.94 ± 0.04 | 100.00 ± 0.00 | 99.94 ± 0.02
OA | 90.01 ± 0.46 | 91.17 ± 1.37 | 92.69 ± 0.59 | 91.48 ± 1.21 | 92.77 ± 0.28 | 93.58 ± 0.78 | 95.87 ± 0.25 | 98.98 ± 0.58 | 98.90 ± 0.54 | 97.91 ± 0.87 | 99.38 ± 0.25 | 99.80 ± 0.20
Kappa | 88.84 ± 0.52 | 90.13 ± 0.11 | 91.84 ± 0.45 | 90.48 ± 0.98 | 91.93 ± 0.31 | 92.83 ± 0.55 | 95.40 ± 0.19 | 98.87 ± 0.50 | 98.78 ± 0.34 | 97.68 ± 0.76 | 99.31 ± 0.31 | 99.78 ± 0.22
Table 8. Classification accuracy (%) of the comparison and proposed algorithms in the Glycine ussuriensis dataset.
No. | ELM | KELM | KSVM | CNN | MELM | DKELM | ELM-GFFPC | KELM-GFFPC | KSVM-GFFPC | CNN-GFFPC | MELM-GFFPC | DKELM-GFFPC
1 | 80.73 ± 1.81 | 80.01 ± 0.27 | 79.59 ± 0.95 | 80.65 ± 0.79 | 79.96 ± 2.67 | 83.30 ± 0.70 | 92.33 ± 0.69 | 92.74 ± 0.76 | 94.62 ± 0.70 | 93.69 ± 1.19 | 94.17 ± 0.75 | 96.34 ± 0.55
2 | 75.94 ± 2.90 | 77.35 ± 1.54 | 84.33 ± 1.48 | 82.58 ± 0.95 | 81.18 ± 1.75 | 82.64 ± 0.31 | 92.74 ± 0.31 | 95.02 ± 0.79 | 94.36 ± 0.75 | 94.75 ± 0.49 | 96.24 ± 0.25 | 97.80 ± 0.14
3 | 64.73 ± 1.90 | 84.26 ± 0.95 | 75.48 ± 0.80 | 77.37 ± 0.66 | 86.22 ± 2.74 | 86.65 ± 0.28 | 91.54 ± 0.95 | 92.06 ± 0.19 | 95.54 ± 1.28 | 97.39 ± 0.59 | 92.13 ± 0.51 | 96.76 ± 0.15
4 | 77.56 ± 2.12 | 85.26 ± 3.57 | 85.23 ± 1.14 | 83.55 ± 1.35 | 84.04 ± 1.39 | 86.52 ± 0.46 | 93.27 ± 0.30 | 95.77 ± 0.49 | 95.45 ± 1.68 | 95.48 ± 0.43 | 96.11 ± 0.70 | 97.09 ± 0.26
OA | 76.50 ± 0.63 | 80.79 ± 0.95 | 83.78 ± 0.42 | 82.56 ± 0.84 | 82.36 ± 0.65 | 84.38 ± 0.71 | 92.85 ± 0.44 | 94.94 ± 0.45 | 95.38 ± 0.66 | 95.04 ± 0.59 | 95.77 ± 0.89 | 97.30 ± 0.44
Kappa | 62.16 ± 0.97 | 69.16 ± 0.96 | 74.18 ± 0.91 | 72.09 ± 0.93 | 71.55 ± 0.71 | 74.85 ± 0.28 | 88.60 ± 0.38 | 91.93 ± 0.65 | 91.81 ± 1.16 | 94.94 ± 0.22 | 93.26 ± 0.96 | 95.70 ± 0.25
Table 9. Training time (in seconds) of different algorithms on the four datasets.
Dataset | ELM | KELM | KSVM | CNN | MELM | DKELM
10% Indian | 2.59 | 0.07 | 406.80 | 246.7 | 5.18 | 0.60
5% Pavia | 1.42 | 0.31 | 446.91 | 847.06 | 5.09 | 2.85
5% Salinas | 1.37 | 0.62 | 2094.14 | 679.35 | 3.10 | 5.02
5% Glycine ussuriensis | 6.91 | 1.51 | 4927.5 | 2966.41 | 7.67 | 13.52
