Article

Hyperspectral Image Classification via Deep Structure Dictionary Learning

Wenzheng Wang, Yuqi Han, Chenwei Deng and Zhen Li
1 School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
3 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(9), 2266; https://doi.org/10.3390/rs14092266
Submission received: 4 March 2022 / Revised: 28 April 2022 / Accepted: 4 May 2022 / Published: 8 May 2022
(This article belongs to the Special Issue Signal Processing Theory and Methods in Remote Sensing)

Abstract

The construction of diverse dictionaries for sparse representation in hyperspectral image (HSI) classification has been a hot topic over the past few years. However, compared with convolutional neural network (CNN) models, dictionary-based models cannot extract deeper spectral information, which reduces their performance for HSI classification. Moreover, dictionary-based methods have low discriminative capability, which leads to less accurate classification. To solve these problems, we propose a deep learning-based structure dictionary for HSI classification in this paper. The core ideas are threefold, as follows: (1) To extract the abundant spectral information, we incorporate deep residual neural networks into dictionary learning and represent input signals in the deep feature domain. (2) To enhance the discriminative ability of the proposed model, we optimize the structure of the dictionary and design a sharing constraint on the sub-dictionaries, so that the general and specific features of HSI samples can be learned separately. (3) To further enhance classification performance, we design two kinds of loss functions, namely a coding loss and a discriminating loss. The coding loss is used to realize the group sparsity of the code coefficients, so that within-class spectral samples can be represented compactly and effectively. The Fisher discriminating loss is used to enforce sparse representation coefficients with large between-class scatter. Extensive tests on hyperspectral datasets demonstrate that the developed method is effective and outperforms other existing methods.

1. Introduction

Hyperspectral images (HSIs) capture detailed spectral information by sampling hundreds of contiguous narrow spectral bands [1]. The greatly increased dimensionality of the data provides discriminative spectral information in each pixel, yet accurate analysis of various land covers in remote sensing remains highly challenging [2,3,4] because of intrinsic and extrinsic spectral variability. HSI feature learning [5,6,7,8], which aims to extract invariant characteristics, has therefore become a crucial step in HSI analysis and is widely used in different applications (e.g., classification, target detection, and image fusion). The focus of our framework is thus extracting an effective spectral feature for HSI classification.
Generally, existing HSI feature extraction (FE) methods can be classified into linear and nonlinear approaches. Common linear FE models include band-clustering and band-merging approaches [9,10], which split highly correlated spectral bands into several groups and extract a representative band or feature from each group. Such techniques have low computational cost and are widely used in real applications. Kumar et al. [9] proposed a band-clustering approach based on discriminative bases that considers all classes simultaneously. Rashwan et al. [10] used the Pearson correlation coefficient of adjacent bands to perform band splitting.
The other linear FE models are projection models, which linearly project or transform the spectral information into a lower-dimensional feature space. Principal component analysis (PCA) [11] projects the samples onto the eigenvectors of the covariance matrix to capture the maximum variance and has been widely used for hyperspectral analysis [12]. Green et al. [13] proposed the maximum noise fraction (MNF) model, which performs the projection under the maximum signal-to-noise ratio (SNR). Despite their low complexity, linear models have limited representation capability and fail to handle inherently nonlinear hyperspectral data.
A nonlinear model handles hyperspectral data with a nonlinear transformation. Such nonlinear features are likely to outperform linear features because class boundaries are often nonlinear. One widely used family of nonlinear models is kernel-based methods, which map the data into a higher-dimensional space to achieve better separability. Kernel versions of classical algorithms, i.e., kernel PCA (KPCA) [14] and kernel ICA (KICA) [15], have been proposed and applied to HSI classification [16] and change detection [17]. The support vector machine (SVM) [18] is a representative kernel-based approach and has shown effective performance in HSI classification [19]. Bruzzone et al. [20] proposed a hierarchical SVM to capture features in a semi-supervised manner. Recently, a spectral-spatial SVM-based multi-layer learning algorithm was designed for HSI classification [21]. However, these methods usually lack a theoretical foundation for kernel selection and may not produce satisfactory results in practical applications.
Deep learning is another nonlinear model with great potential for learning features [22]. Chen et al. designed composite frameworks [23,24] combining PCA, a deep learning architecture, and logistic regression, and used them to verify the suitability of the stacked auto-encoder (SAE) and the deep belief network (DBN) for HSI classification. Recurrent neural networks (RNNs) [25,26] process the whole set of spectral bands as a sequence and use a flexible network configuration for classifying HSIs. Rasti et al. [27] provided a technical overview of existing techniques for HSI classification, in particular deep learning models. Although deep learning models offer powerful information extraction ability, their discriminative ability still needs to be improved through the rational design of loss functions.
Dictionary-based methods have emerged in recent years for HSI feature learning. To extract features effectively, these methods represent high-dimensional spectral data as a combination of dictionary atoms with a low reconstruction error and a high sparsity level in the abundance matrix. Sparse representation-based classification (SRC) [28] constructs an unsupervised dictionary and was applied to HSI classification in [29], opening the way to classification based on dictionary coding. SRC performs impressively in face recognition and is robust to different types of noise [28]; however, its redundant atoms and disordered structure make it unsuitable for intricate HSI classification [29,30]. Yang et al. [31] constructed a class-specific dictionary to overcome the shortcomings of SRC, but it does not consider the discriminative ability between different coefficients, resulting in low classification accuracy. Yang et al. [32] proposed a more elaborate model called Fisher discrimination dictionary learning (FDDL), which uses the Fisher criterion to learn a structured dictionary, but this model is time-consuming and its reconstructive ability needs to be improved. Gu et al. [33] designed an efficient dictionary pair learning (DPL) model that replaces the sparsity constraint with a block-diagonal constraint to reduce the computational cost, but the linear projection of the analysis dictionary restricts the classification performance. Akhtar et al. [34] used a Bayesian framework to learn discriminative dictionaries for hyperspectral classification. Tu et al. combined a discriminative sub-dictionary with a multi-scale superpixel strategy and achieved a significant improvement in classification [35]. Dictionary-based methods are therefore promising for representing HSI features; however, the above dictionaries struggle to extract deeper spectral information and suffer from poor discriminative ability.
The latest works [36,37] incorporate a deep learning module into dictionary algorithms and achieve impressive results for target detection. Nevertheless, the code coefficients of these models require more powerful constraints, and the form of the combination needs to be improved. To address these issues, we propose a deep learning-based structure dictionary model for HSI classification in this paper. The main contributions of this paper are as follows:
(1)
We devise an effective feature learning framework that adopts convolutional neural networks (CNNs) to capture abundant spectral information and constructs a structure dictionary to classify HSI samples.
(2)
We design a novel shared constraint in terms of the sub-dictionaries. In this way, the common and specific features of HSI samples are learned separately, so that features are represented in a more discriminative manner.
(3)
We carefully design two kinds of loss functions, i.e., coding loss and discriminating loss, for the code coefficients to enhance the classification performance.
(4)
Extensive experiments conducted on several hyperspectral datasets demonstrate the superiority of the proposed method in terms of performance and efficiency in comparison with state-of-the-art techniques.
The rest of the paper is organized as follows. Section 2 first gives a short description of the experimental datasets, which are widely used in HSI classification, and then details the proposed method. Section 3 and Section 4 present the experimental results and the corresponding discussion to demonstrate the effectiveness of the proposed method. Finally, conclusions are drawn in Section 5.

2. Materials and Methodology

In this section, we first introduce the experimental datasets and then elaborate the framework for our deep learning-based structure dictionary method.

2.1. Experimental Datasets

Alongside the research progress in designing robust HSI classification algorithms, some researchers have devoted themselves to constructing publicly available datasets, which provide the community with fair comparisons across different algorithms. In this subsection, we briefly review the band range, image resolution, and classes of interest of the four popular datasets that are also employed in this article to compare our method with existing methods.
Center of Pavia [38]: This dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS), with 115 spectral bands ranging from 0.43 to 0.86 μm, over the urban area of Pavia. The noisy and water absorption bands were discarded by the authors of [38], yielding an HSI of dimension 1096 × 492 × 102 with nine land cover classes.
Botswana [39]: This dataset was captured by the NASA Earth Observing-1 (EO-1) satellite, with 145 bands ranging from 0.4 to 2.5 μm, over the Okavango Delta, Botswana. The dataset contains 1476 × 256 pixels and 14 classes of interest.
Houston 2013 [40]: This dataset was collected by the compact airborne spectrographic imager (CASI) with 144 bands ranging from 0.38 to 1.05 μm over the campus of the University of Houston and the neighboring urban area. The dataset contains 349 × 1905 pixels and 15 classes of interest.
Houston 2018 [41]: This dataset was acquired by the same CASI sensor, with 48 bands covering wavelengths between 0.38 and 1.05 μm, over the same region as Houston 2013. Houston 2018 contains 601 × 2384 pixels and 20 classes of interest.

2.2. Methodology

Dictionary learning aims to learn a set of atoms, called visual words in the computer vision community, a few of which can be linearly combined to approximate a given sample well [42]. However, the role of sparse coding in classification is still an open problem: the code coefficients of recent models require more powerful constraints, and the form of the combination needs to be improved. Therefore, we propose a deep learning-based structure dictionary model in this paper.
Figure 1 presents the pipeline of the developed framework, in which a CNN is constructed to encode the spectral information and a structured dictionary is established to classify HSIs. Spectral data are first encoded by the CNN model, in which residual networks are used to optimize the main network. The group-sparse code is obtained from the fully connected layer. Meanwhile, two loss functions, i.e., the coding loss and the Fisher discriminating loss, are calculated to optimize the code coefficients. More importantly, a discriminative dictionary is constructed through the reconstruction loss to enhance the discriminative ability of the developed model.

2.2.1. Residual Networks Encoder

Compared with common dictionary-based models using sparsity constraints, group sparsity models produce more compact and effective code coefficients, i.e., the coefficient values concentrate on the (block-)diagonal of the coefficient matrix [33], and achieve better results in classification applications. Therefore, we replace the sparsity constraint with a group sparsity constraint in the dictionary model and construct an encoder to achieve the group sparsity effect.
Suppose $\Gamma = [\Gamma_1, \ldots, \Gamma_i, \ldots, \Gamma_C] \in \mathbb{R}^{(N_A \cdot C) \times N}$ denotes the coding coefficients of the discriminative dictionary $D = [D_1, \ldots, D_i, \ldots, D_C] \in \mathbb{R}^{L \times (N_A \cdot C)}$ for the spectral samples $X = [X_1, \ldots, X_i, \ldots, X_C] \in \mathbb{R}^{L \times N}$. We want to construct an encoder $P = [P_1, \ldots, P_j, \ldots, P_C] \in \mathbb{R}^{(N_A \cdot C) \times N}$ and apply $P_j$ to project the spectral samples $X_i$ into a nearly null space, i.e.,

$$\Gamma_i^j = P_j X_i \approx 0, \quad \forall j \neq i. \tag{1}$$
Considering the poor performance of linear projection, a CNN encoder is designed to perform this transformation, which greatly enhances the performance of dictionary learning. He et al. [43] noted that deeper networks encounter a degradation problem: as the network depth increases, accuracy becomes saturated and then degrades rapidly. Therefore, residual networks are used to address the degradation problem and increase the convergence rate. As depicted in Figure 2, the building block of the residual networks (Figure 2a) contains two convolutional layers (Conv), two batch normalization layers (BN), and two leaky ReLU layers. The output $X_i^{(k+1)}$ of the residual block is calculated as follows:

$$X_i^{(k+1)} = X_i^{(k)} + F\left(X_i^{(k)}\right). \tag{2}$$
We explicitly let the stacked nonlinear layers fit the residual function $F(X_i^{(k)})$, and the original mapping is recast as $X_i^{(k)} + F(X_i^{(k)})$. The formulation of Equation (2) can be realized by feedforward neural networks with "shortcut connections" (Figure 2a). Shortcut connections [43,44] are those skipping one or more layers. The entire network can still be trained end-to-end by stochastic gradient descent (SGD) with backpropagation.
Based on this residual block, we construct a 15-layer encoder (in terms of convolutional layers), as presented in Figure 2b. We first employ a 1 × 5 convolutional layer to capture a large receptive field [45], and max pooling is used to select the salient responses. We then stack 7 residual blocks, in which 1 × 3 convolutional layers are used to obtain an effective receptive field. More importantly, the number of feature maps produced by the convolutional kernels increases from 16 to 128 to capture abundant spectral information. Finally, we adopt average pooling to compress the spectral information and employ a fully connected (FC) layer to output the coding coefficients; the output size of the FC layer depends on the product of the spectral band number and the sub-dictionary atom number. For backpropagation, we use SGD with a batch size of 8. The learning rate starts from 0.1, and the number of epochs is 500.
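To make the architecture above concrete, the following is a minimal PyTorch sketch of one residual block (Equation (2)) and the 1-D spectral encoder; it is our own illustrative reconstruction rather than the authors' released code. The layer sizes follow the description (a 1 × 5 entry convolution, max pooling, seven 1 × 3 residual blocks with feature maps growing from 16 to 128, average pooling, and an FC layer), while the class names, the exact channel schedule, the 1 × 1 widening convolutions, and the FC output size (here, the total number of dictionary atoms) are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """Conv-BN-LeakyReLU twice with an identity shortcut, i.e., X^(k+1) = X^(k) + F(X^(k))."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.LeakyReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class SpectralEncoder(nn.Module):
    """Maps a spectral vector to coding coefficients Gamma (group sparsity is encouraged by the losses)."""
    def __init__(self, n_classes, n_atoms, widths=(16, 32, 32, 64, 64, 128, 128)):
        super().__init__()
        layers = [nn.Conv1d(1, widths[0], kernel_size=5, padding=2), nn.MaxPool1d(kernel_size=2)]
        in_ch = widths[0]
        for w in widths:                      # seven residual blocks
            if w != in_ch:                    # assumed 1x1 convolution to widen channels between blocks
                layers.append(nn.Conv1d(in_ch, w, kernel_size=1))
                in_ch = w
            layers.append(ResidualBlock1D(w))
        layers.append(nn.AdaptiveAvgPool1d(1))
        self.features = nn.Sequential(*layers)
        # one coefficient block per class-specific sub-dictionary plus one for the shared D_com
        self.fc = nn.Linear(in_ch, (n_classes + 1) * n_atoms)

    def forward(self, x):                     # x: (batch, 1, n_bands)
        return self.fc(self.features(x).flatten(1))
```

Training of such an encoder would follow the settings stated above: SGD with a batch size of 8, an initial learning rate of 0.1, and 500 epochs.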
We apply the encoder P to extract spectral information from the HSI and enforce the code coefficients Γ to be group sparse. To enhance the discriminative ability of our model, we build a structured dictionary in which each sub-dictionary can be directly used to represent samples of a specific class, i.e., the dictionaries are interpretable. The structured dictionary is obtained as follows:
$$\{\Gamma, D\} = \arg\min_{D, \Gamma} \sum_{i=1}^{C} \left( \left\| X_i - D_i \Gamma_i \right\|_F^2 + \sum_{j=1, j \neq i}^{C} \left\| \Gamma_i^j \right\|_F^2 \right). \tag{3}$$
The first term of Equation (3) is the reconstruction loss, which is used to construct the structure dictionary; each sub-dictionary $D_i$ is learned from the samples of the $i$th class. The second term, $\sum_{i=1}^{C} \sum_{j=1, j \neq i}^{C} \left\| \Gamma_i^j \right\|_F^2$, is the coding loss. However, the structure of this dictionary still needs to be optimized so that the common and specific features of HSI samples can be learned separately. As depicted in Figure 3, the test samples contain common characteristics, i.e., features they share. To address this, we design a shared constraint for the sub-dictionaries: a shared ("com") sub-dictionary is built to describe the duplicated information (the common characteristics in Figure 3), so that the discriminative characteristics are "amplified" relative to the original ones.

2.2.2. Dictionary Learning

Here, we design a sub-dictionary $D_{com}$ to capture the class-shared characteristics as follows:
$$D = \{D_1, D_2, \ldots, D_C, D_{com}\}, \tag{4}$$
where $D_{com}$ denotes the shared (common) sub-dictionary. Each sub-dictionary (both specific and common) $D_i \in \mathbb{R}^{L \times N_A}$ contains $N_A$ atoms, and each atom is an $L \times 1$ column vector. The matrices of the specific and common sub-dictionaries are randomly initialized, and the corresponding atoms are continually updated according to the objective function. The objective function is modified as follows:
$$\{\Gamma, D\} = \arg\min_{D, \Gamma} \sum_{i=1}^{C} \left( \left\| X_i - D_i \Gamma_i - D_{com} \Gamma_{com} \right\|_F^2 + \sum_{j=1, j \neq i}^{C} \left\| \Gamma_i^j \right\|_F^2 \right), \tag{5}$$
where $\Gamma_{com}$ is the coding coefficient for the common sub-dictionary. With the term $D_{com}\Gamma_{com}$ included, the value of $\sum_{i=1}^{C} \sum_{j=1, j \neq i}^{C} \left\| \Gamma_i^j \right\|_F^2$ tends closer to zero, and the reconstructive ability of the structured dictionary improves accordingly. Meanwhile, the coding loss and the discriminating loss facilitate the construction of the shared (common) sub-dictionary.
In our framework, the dictionary is implemented as a single convolutional layer whose weights serve as the dictionary coefficients. The dictionary is therefore updated by the back-propagation (BP) algorithm, and the additional channels designed for the shared (common) sub-dictionary are also updated during BP. The main difference from an end-to-end CNN model is that we introduce a dictionary module containing the various sub-dictionaries to improve the discriminative ability; moreover, we design coding and discriminating losses for the intermediate variable (the code coefficients $\Gamma$), which is a very different algorithmic structure.
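As a sketch of how such a dictionary module can be realized inside the same deep learning framework, the snippet below stores $D = \{D_1, \ldots, D_C, D_{com}\}$ as one trainable weight matrix (playing the role of the single convolutional layer mentioned above) and evaluates the reconstruction term of Equation (5). The class name, tensor layout, and initialization scale are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StructuredDictionary(nn.Module):
    """D = {D_1, ..., D_C, D_com} held as a single trainable weight, updated by back-propagation."""
    def __init__(self, n_bands, n_classes, n_atoms):
        super().__init__()
        self.C, self.A = n_classes, n_atoms
        # columns 0 .. C*A-1 hold the class-specific atoms; the last A columns hold D_com
        self.D = nn.Parameter(0.01 * torch.randn(n_bands, (n_classes + 1) * n_atoms))

    def reconstruction_loss(self, x, gamma, labels):
        """sum_i || X_i - D_i Gamma_i - D_com Gamma_com ||_F^2  (the first term of Equation (5)).
        x: (batch, n_bands), gamma: (batch, (C+1)*A), labels: (batch,) integer class indices."""
        C, A = self.C, self.A
        d_com, g_com = self.D[:, C * A:], gamma[:, C * A:]
        loss = x.new_zeros(())
        for i in range(C):
            mask = labels == i
            if not mask.any():
                continue
            d_i = self.D[:, i * A:(i + 1) * A]
            g_i = gamma[mask, i * A:(i + 1) * A]
            recon = g_i @ d_i.T + g_com[mask] @ d_com.T   # D_i Gamma_i + D_com Gamma_com
            loss = loss + ((x[mask] - recon) ** 2).sum()
        return loss
```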

2.2.3. Loss Functions

To enhance the discriminative ability of the developed model, we design two kinds of loss functions for the code coefficients. First, we design the coding loss $\sum_{i=1}^{C} \sum_{j=1, j \neq i}^{C} \left\| \Gamma_i^j \right\|_F^2$ to realize fast and effective spectral data encoding.
Then, the Fisher discriminative loss $\mathrm{tr}(S_W(\Gamma)) - \mathrm{tr}(S_B(\Gamma))$ is designed to enhance the discriminative ability of our model, where $S_W$ and $S_B$ denote the within-class and between-class scatter of the coding coefficients. Meanwhile, the reconstruction loss $\sum_{i=1}^{C} \left\| X_i - D_i \Gamma_i - D_{com} \Gamma_{com} \right\|_F^2$ is used to build the structure dictionary. Overall, we apply three loss functions to optimize the classification performance.
Two kinds of discriminative loss functions can be implemented: one is the Fisher discriminative loss $\mathrm{tr}(S_W(\Gamma)) - \mathrm{tr}(S_B(\Gamma))$, and the other is the cross-entropy loss $\mathrm{softmax}(\Gamma)$. Both loss functions have been implemented, and we will provide two versions of the final program for researchers. In this paper, all results are computed with the cross-entropy loss. Figure 4 shows how the three loss values and the classification accuracy vary with the number of epochs: all loss functions converge, and our approach achieves excellent classification accuracy. Therefore, the final objective function is as follows:
$$\{\Gamma, D\} = \arg\min_{D, \Gamma} \; \lambda_1 \sum_{i=1}^{C} \left\| X_i - D_i \Gamma_i - D_{com} \Gamma_{com} \right\|_F^2 + \lambda_2 \sum_{i=1}^{C} \sum_{j=1, j \neq i}^{C} \left\| \Gamma_i^j \right\|_F^2 + \mathrm{softmax}(\Gamma), \tag{6}$$
where $\lambda_1 = 100$ and $\lambda_2 = 1$ are scalar constants; their settings are discussed in the experiments.
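Putting the pieces together, a compact training-loss sketch corresponding to Equation (6) is given below. It reuses the illustrative `SpectralEncoder` and `StructuredDictionary` classes from the earlier sketches; the `coding_loss` helper and the way the coefficients Γ are mapped to class logits (here, the energy of each class-specific coefficient block) are not specified in the text and are therefore our assumptions.

```python
import torch.nn.functional as F

LAMBDA_1, LAMBDA_2 = 100.0, 1.0          # weights of Equation (6)

def coding_loss(gamma, labels, n_classes, n_atoms):
    """sum_i sum_{j != i} || Gamma_i^j ||_F^2: penalize coefficients on wrong-class sub-dictionaries."""
    loss = gamma.new_zeros(())
    for i in range(n_classes):
        mask = labels == i
        if not mask.any():
            continue
        for j in range(n_classes):
            if j != i:
                loss = loss + (gamma[mask, j * n_atoms:(j + 1) * n_atoms] ** 2).sum()
    return loss

def class_logits(gamma, n_classes, n_atoms):
    # Assumption: score class i by the energy of its class-specific coefficient block.
    blocks = gamma[:, :n_classes * n_atoms].reshape(-1, n_classes, n_atoms)
    return (blocks ** 2).sum(dim=2)

def total_loss(x, labels, encoder, dictionary, n_classes, n_atoms):
    gamma = encoder(x.unsqueeze(1))                               # coding coefficients Gamma
    rec = dictionary.reconstruction_loss(x, gamma, labels)        # reconstruction term
    cod = coding_loss(gamma, labels, n_classes, n_atoms)          # coding term
    ce = F.cross_entropy(class_logits(gamma, n_classes, n_atoms), labels)
    return LAMBDA_1 * rec + LAMBDA_2 * cod + ce
```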

3. Experimental Results and Analysis

In this section, we quantitatively and qualitatively evaluate the classification performance of the proposed model on four public datasets, namely Center of Pavia [38], Botswana [39], Houston 2013 [40], and Houston 2018 [41]. We compare the proposed method with other existing algorithms, including SVM [18,21], FDDL [32], DPL [33], ResNet [43], AE [27], RNN [27], CNN [27], and CRNN [27], for HSI classification. We report the overall accuracy (OA), average accuracy (AA), and kappa coefficient [46] on the different datasets and present the corresponding classification maps. Furthermore, we analyze the classification performance in detail for each experimental dataset.
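For completeness, the three indices can be computed from a confusion matrix as in the following short helper (our own sketch, not tied to the authors' evaluation code); rows are assumed to hold the true classes and columns the predictions.

```python
import numpy as np

def oa_aa_kappa(conf):
    """conf[i, j] = number of samples of true class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                                   # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))                # mean per-class accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2   # expected chance agreement
    kappa = (oa - pe) / (1.0 - pe)                                # Cohen's kappa
    return oa, aa, kappa
```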

3.1. Sample Selection

We randomly choose 10% of the labeled samples in each dataset as training data. To overcome the class imbalance issue, we adopt a weighted sample generation strategy that makes the number of training samples equal across classes, as follows:
$$x_{new} = \alpha x_1 + (1 - \alpha) x_2, \tag{7}$$
where $x_{new}$ is a new sample generated by combining the samples $x_1$ and $x_2$, and $\alpha$ is a random constant between 0 and 1. The samples $x_1$ and $x_2$ are randomly selected from the same class of the training data. All compared methods are trained on the dataset balanced by this strategy.
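A minimal NumPy sketch of this balancing step (Equation (7)) is shown below; the function name and the way sample pairs are drawn are our own choices.

```python
import numpy as np

def balance_class(samples, target_count, rng=None):
    """samples: (n, n_bands) training spectra of a single class.
    Pads the class to `target_count` samples with convex mixes of random same-class pairs."""
    if rng is None:
        rng = np.random.default_rng()
    out = list(samples)
    while len(out) < target_count:
        x1, x2 = samples[rng.integers(len(samples), size=2)]
        alpha = rng.random()                         # alpha in [0, 1)
        out.append(alpha * x1 + (1.0 - alpha) * x2)  # x_new = alpha * x1 + (1 - alpha) * x2
    return np.stack(out[:target_count])
```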

3.2. Parameter Setting

In the proposed model, two groups of free parameters need to be adjusted: (1) the number of dictionary atoms and (2) the regularization parameters $\lambda_1$ and $\lambda_2$. These settings are critical to the performance of the model and are analyzed below.

3.2.1. Number of Dictionary Atoms

We set all sub-dictionaries $D_i \in \{D_1, D_2, \ldots, D_C, D_{com}\}$ to have the same number of atoms. The number of atoms is estimated on the Houston 2013 dataset, as depicted in Figure 5. The classification OA increases with the number of atoms per sub-dictionary when the atom number is below 8, whereas the changes in OA are marginal once the atom number exceeds 10. We therefore set the number of atoms to 8 for each dataset to train the developed model efficiently.

3.2.2. Constraint Coefficients $\lambda_1$ and $\lambda_2$

$\lambda_1$ and $\lambda_2$ balance the reconstruction and coding losses in Equation (6). We test the proposed model with $\lambda_1$ and $\lambda_2$ ranging from 0 to $10^3$ and select the values that give the highest accuracy. Figure 6 presents the visualized result on the Houston 2013 dataset. The OA decreases greatly as $\lambda_2$ increases beyond 10, which indicates that our model is sensitive to the coding loss. We set $\lambda_1 = 100$ and $\lambda_2 = 1$ for the classification of the Houston 2013 dataset.

3.3. Classification Performance Analysis for Different Datasets

3.3.1. Center of Pavia

Table 1 shows the classification results of the compared algorithms. Our method outperforms the other algorithms, especially on the Water, Tiles, and Bare Soil regions. Benefiting from the designed deep network and sub-dictionary framework, the proposed method is superior to the other dictionary learning-based method (i.e., FDDL). Compared with the CNN-based models, by incorporating the dictionary structure, our method shows better deep feature extraction ability and achieves the highest OA, AA, and kappa coefficient of 98.75%, 96.55%, and 97.85%, respectively. In addition, Figure 7 presents the confusion matrix of the developed model, which also indicates effective discrimination between the surface classes.
Figure 8 presents the classification maps acquired by the various methods on the Center of Pavia dataset. Figure 8a,h show the pseudo-color image and the ground truth map, and Figure 8b–g depict the corresponding classification maps of FDDL, DPL, ResNet, RNN, CNN, and the proposed model, respectively. Our method shows better visual performance than the dictionary-based and CNN-based algorithms, both in the smoothness within regions of the same material and in the sharpness of the edges between different materials. For better comparison, we highlight river bank and building regions with red and yellow rectangles in Figure 8. Within the red rectangle, all methods except the CNN model and our method incorrectly classify some Water samples as Bare Soil. In the yellow rectangle, our model achieves the smoothest classification of the Bare Soil class, whereas the CNN method confuses some samples with the Bitumen class, leading to an inferior score compared with our algorithm.

3.3.2. Botswana

Table 2 presents the class-specific classification accuracy on the Botswana dataset, where our algorithm achieves the highest accuracy in more than half of the classes (8 out of 14). In particular, our method performs better than the compared methods on the Reeds, Acacia Woodlands, and Exposed Soils classes: it is almost 6% higher than the others for the Reeds class and 9% higher for the Exposed Soils class. In addition, our algorithm outperforms the other methods in terms of OA, AA, and kappa coefficient with 92.20%, 92.71%, and 91.56%, respectively, and the corresponding confusion matrix in Figure 9 further demonstrates the superior discrimination capability of our model.
Figure 10 shows the classification maps for the Botswana dataset, where Figure 10a,h are the pseudo-color image and ground truth map, and Figure 10b–g are the corresponding classification results of FDDL, DPL, ResNet, RNN, CNN, and the proposed model. We mark the most frequently misclassified regions with yellow and red rectangles for clearer comparison. In the yellow rectangle, Acacia Woodlands and Acacia Shrublands are wrongly classified as Exposed Soils or Floodplain Grasses 1 by the other methods. In the red rectangles, the other methods almost lose the ability to distinguish between different land covers; e.g., the CNN method misclassifies the Island Interior class as the Acacia Shrublands class. In contrast, our approach removes noisy scattered points and yields smoother classification results without blurring the boundaries. This superior performance can be attributed to the effectiveness of the proposed structured dictionary learning model.

3.3.3. Houston 2013

The classification results achieved by the compared methods on the Houston 2013 dataset are presented in Table 3, and Figure 11 presents the confusion matrix of the developed model. As shown in Table 3, the classification accuracy of our method is more than 5% higher than the other methods on the Commercial, Road, Highway, and Parking Lot 1 classes, while it remains competitive on the remaining classes. For the Stressed Grass class, our method is only 0.4% lower than the AE method and 0.1% lower than the CNN method, but almost 2% higher on average than the other methods. The developed algorithm generally outperforms the other methods in feature extraction, with an OA, AA, and kappa of 95.39%, 94.99%, and 95.02%, respectively. Furthermore, the confusion matrix in Figure 11 also shows that the algorithm effectively distinguishes between surface classes.
Figure 12 depicts the classification maps generated by the compared approaches on the Houston 2013 dataset. Figure 12a,h show the pseudo-color image and ground truth map, and Figure 12b–g show the corresponding classification results. We label misclassified regions containing buildings and cars with yellow and red rectangles, respectively. According to the ground truth, the yellow region contains a small Parking Lot 1 area that is misclassified by most methods; in particular, the maps of DPL, RNN, and CNN contain many misclassified points, leading to unsatisfactory results. In the red region, the other algorithms show significant misclassification of the Parking Lot 2 class, and FDDL, DPL, and ResNet are unable to distinguish between Parking Lot 2 and Parking Lot 1. In contrast, owing to its robustness to local variations in the spectra, our method produces more complete and correct classification results, effectively removes salt-and-pepper noise from the classification map, and still preserves the significant objects and structures, yielding higher accuracy around and inside the parking lot.

3.3.4. Houston 2018

Table 4 and Figure 13 present the classification results of the compared algorithms and the confusion matrix of our model on the Houston 2018 dataset, respectively. The classification accuracy of every method decreases compared with the Houston 2013 dataset. This drop is most obvious for the Sidewalks and Crosswalks classes, whose highest scores are only 56.11% and 76.33%, respectively. We attribute the decrease to the larger number of land cover types and the reduced spectral information (only 48 spectral bands). In addition, our method ranks first in 9 out of the 20 land cover classes and outperforms the other algorithms by at least two percentage points in terms of OA, AA, and kappa. Furthermore, our model also shows strong discriminative ability across the 20 classes according to the confusion matrix in Figure 13.
Figure 14 highlights the superiority of the proposed method through the classification maps on the Houston 2018 dataset. Figure 14a,h show the pseudo-color image and ground truth map, and Figure 14b–g show the corresponding classification results of FDDL, DPL, ResNet, RNN, CNN, and the proposed model. For better visualization, we mark Commercial, Road, Major Thoroughfares, and Stressed Grass regions with yellow and red rectangles. In the yellow rectangles, FDDL and DPL classify Commercial as the Seats class and fail to identify the Road class, owing to their limited ability to extract deeper spectral information. In the red rectangle, RNN and CNN misclassify small Road regions as Cars, and some noisy points remain in the Commercial class. In contrast, our model produces smoother and more complete classification results.
Overall, our approach achieves the highest OA, AA, and kappa coefficient on the four benchmark datasets, which were collected by different sensors over various land covers. Compared with the other dictionary-based and CNN-based models, our model generally achieves smoother and more accurate classification results, for both large homogeneous land cover areas and irregular small-area classes. These experiments verify that, benefiting from the rationally designed structured classification layer (dictionary) and loss functions, our model can extract intrinsic, invariant spectral feature representations from HSIs and achieves more effective feature extraction.

4. Discussion

The previous section presented the results and analysis of the experiments on each dataset. In this section, we analyze and discuss the key factors that affect the performance of the algorithm and need to be considered in practical HSI classification applications.

4.1. Influence of Imbalanced Samples

Chang et al. [47] pointed out that imbalanced data have a significant impact on HSI classification. To explore the effect of imbalanced data on classification performance, we randomly choose 10% of the labeled samples of each class and perform an experiment on the Houston 2013 dataset with imbalanced data, as listed in Table 5. Imbalanced data reduce the classification performance, causing a decrease of about 3–4% in classification accuracy; nevertheless, our model still achieves the best performance when the training data are imbalanced, which confirms its strong classification ability.

4.2. Influence of Small Training Samples

To confirm the effectiveness of our framework in practical scenarios where samples are scarce, we reduce the number of training samples to 10, 20, and 30 per class. As listed in Table 6, the models generally suffer from unstable classification performance across classes. Overall, the proposed model achieves the best results for most of the indices, with an accuracy improvement of at least 4% compared with the CNN-based models. As expected, the classification results improve as the number of training samples increases.

4.3. Computational Cost

Another factor that affects the practical application of an HSI classification model is the efficiency of the algorithm. Therefore, we report the computational time of the compared algorithms.
All tests are performed on a desktop with an Intel Core i7-8700 CPU at 3.20 GHz and 16 GB of memory, using Python on the Windows 10 operating system. As shown in Table 7, the developed model runs slower than DPL, which applies a simple linear projection to extract spectral features; however, it achieves faster testing speed than the CNN-based models owing to its simpler convolutional structure.

5. Conclusions

In this paper, we propose a novel deep learning-based structure dictionary model to extract spectral features from HSIs. Specifically, a residual network is combined with a dictionary learning framework to strengthen the feature representation of the original data. Sub-dictionaries with a shared constraint are then introduced to extract the features common to samples with different class labels. Moreover, three kinds of loss functions are combined to enhance the discriminative ability of the overall model. Numerous tests were carried out on HSI datasets, and the qualitative and quantitative results show that the proposed feature learning model requires much less computation time than the SVM- and CNN-based models while achieving superior accuracy, demonstrating its potential in HSI classification tasks.

Author Contributions

Funding acquisition, Y.H. and C.D.; Methodology, W.W. and Z.L.; Supervision, C.D.; Validation, Z.L.; Writing original draft, W.W. and Y.H. All authors have read and agreed to the published version of this manuscript.

Funding

This study was supported by the China Postdoctoral Science Foundation under Grant 2021TQ0177 and the National Natural Science Foundation of China (NSFC) under Grant 62171040.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found in references [38,39,40,41].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ghamisi, P.; Plaza, J.; Chen, Y.; Li, J.; Plaza, A.J. Advanced spectral classifiers for hyperspectral images: A review. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–32. [Google Scholar] [CrossRef] [Green Version]
  2. Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. An augmented linear mixing model to address spectral variability for hyperspectral unmixing. IEEE Trans. Image Process. 2018, 28, 1923–1938. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Li, J.; Huang, X.; Gamba, P.; Bioucas-Dias, J.M.; Zhang, L.; Benediktsson, J.A.; Plaza, A. Multiple feature learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2014, 53, 1592–1606. [Google Scholar] [CrossRef] [Green Version]
  4. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4340–4354. [Google Scholar] [CrossRef]
  5. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. Joint and progressive subspace analysis (JPSA) with spatial-spectral manifold alignment for semi-supervised hyperspectral dimensionality reduction. IEEE Trans. Cybern. 2021, 51, 3602–3615. [Google Scholar] [CrossRef]
  6. Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5966–5978. [Google Scholar] [CrossRef]
  7. Hong, D.; Gao, L.; Yao, J.; Yokoya, N.; Chanussot, J.; Heiden, U.; Zhang, B. Endmember-guided unmixing network (EGU-Net): A general deep learning framework for self-supervised hyperspectral unmixing. IEEE Trans. Neural Netw. Learn. Syst. 2021. [Google Scholar] [CrossRef]
  8. Liu, X.; Deng, C.; Chanussot, J.; Hong, D.; Zhao, B. StfNet: A two-stream convolutional neural network for spatiotemporal image fusion. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6552–6564. [Google Scholar] [CrossRef]
  9. Kumar, S.; Ghosh, J.; Crawford, M.M. Best-bases feature extraction algorithms for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2001, 39, 1368–1379. [Google Scholar] [CrossRef] [Green Version]
  10. Rashwan, S.; Dobigeon, N. A split-and-merge approach for hyperspectral band selection. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1378–1382. [Google Scholar] [CrossRef] [Green Version]
  11. Jolliffe, I.T. Principal component analysis. Technometrics 2003, 45, 276. [Google Scholar]
  12. Senthilnath, J.; Omkar, S.; Mani, V.; Karnwal, N.; Shreyas, P. Crop stage classification of hyperspectral data using unsupervised techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 6, 861–866. [Google Scholar] [CrossRef]
  13. Green, A.A.; Berman, M.; Switzer, P.; Craig, M.D. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Trans. Geosci. Remote Sens. 1988, 26, 65–74. [Google Scholar] [CrossRef] [Green Version]
  14. Schölkopf, B.; Smola, A.; Müller, K.R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998, 10, 1299–1319. [Google Scholar] [CrossRef] [Green Version]
  15. Mei, F.; Zhao, C.; Wang, L.; Huo, H. Anomaly detection in hyperspectral imagery based on kernel ICA feature extraction. In Proceedings of the 2008 Second International Symposium on Intelligent Information Technology Application, Shanghai, China, 20–22 December 2008; Volume 1, pp. 869–873. [Google Scholar]
  16. Fauvel, M.; Chanussot, J.; Benediktsson, J.A. Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas. EURASIP J. Adv. Signal Process. 2009, 2009, 1–14. [Google Scholar] [CrossRef] [Green Version]
  17. Marchesi, S.; Bruzzone, L. ICA and kernel ICA for change detection in multispectral remote sensing images. In Proceedings of the 2009 IEEE International Geoscience and Remote Sensing Symposium, Cape Town, South Africa, 12–17 July 2009; Volume 2. [Google Scholar]
  18. Cortes, C.; Vapnik, V. Support vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  19. Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362. [Google Scholar] [CrossRef]
  20. Bruzzone, L.; Chi, M.; Marconcini, M. A novel transductive SVM for semisupervised classification of remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2006, 44, 3363–3373. [Google Scholar] [CrossRef] [Green Version]
  21. Zhao, C.; Liu, W.; Xu, Y.; Wen, J. A spectral-spatial SVM-based multi-layer learning algorithm for hyperspectral image classification. Remote Sens. Lett. 2018, 9, 218–227. [Google Scholar] [CrossRef]
  22. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709. [Google Scholar] [CrossRef] [Green Version]
  23. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
  24. Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on deep belief network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392. [Google Scholar] [CrossRef]
  25. Shi, C.; Pun, C.M. Multi-scale hierarchical recurrent neural networks for hyperspectral image classification. Neurocomputing 2018, 294, 82–93. [Google Scholar] [CrossRef]
  26. Hang, R.; Liu, Q.; Hong, D.; Ghamisi, P. Cascaded recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5384–5394. [Google Scholar] [CrossRef] [Green Version]
  27. Rasti, B.; Hong, D.; Hang, R.; Ghamisi, P.; Kang, X.; Chanussot, J.; Benediktsson, J.A. Feature extraction for hyperspectral imagery: The evolution from shallow to deep: Overview and toolbox. IEEE Geosci. Remote Sens. Mag. 2020, 8, 60–88. [Google Scholar] [CrossRef]
  28. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 210–227. [Google Scholar] [CrossRef] [Green Version]
  29. Chen, Y.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3973–3985. [Google Scholar] [CrossRef]
  30. Gao, L.; Yu, H.; Zhang, B.; Li, Q. Locality-preserving sparse representation-based classification in hyperspectral imagery. J. Appl. Remote Sens. 2016, 10, 042004. [Google Scholar] [CrossRef]
  31. Yang, M.; Zhang, L.; Yang, J.; Zhang, D. Metaface learning for sparse representation based face recognition. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 1601–1604. [Google Scholar]
  32. Yang, M.; Zhang, L.; Feng, X.; Zhang, D. Fisher discrimination dictionary learning for sparse representation. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 543–550. [Google Scholar]
  33. Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Projective dictionary pair learning for pattern classification. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
  34. Akhtar, N.; Mian, A. Nonparametric coupled Bayesian dictionary and classifier learning for hyperspectral classification. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4038–4050. [Google Scholar] [CrossRef]
  35. Tu, X.; Shen, X.; Fu, P.; Wang, T.; Sun, Q.; Ji, Z. Discriminant sub-dictionary learning with adaptive multiscale superpixel representation for hyperspectral image classification. Neurocomputing 2020, 409, 131–145. [Google Scholar] [CrossRef]
  36. Tang, H.; Liu, H.; Xiao, W.; Sebe, N. When dictionary learning meets deep learning: Deep dictionary learning and coding network for image recognition with limited data. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 2129–2141. [Google Scholar] [CrossRef] [PubMed]
  37. Tao, L.; Zhou, Y.; Jiang, X.; Liu, X.; Zhou, Z. Convolutional neural network-based dictionary learning for SAR target recognition. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1776–1780. [Google Scholar] [CrossRef]
  38. Liu, Q.; Zhou, F.; Hang, R.; Yuan, X. Bidirectional-convolutional LSTM based spectral-spatial feature learning for hyperspectral image classification. Remote Sens. 2017, 9, 1330. [Google Scholar] [CrossRef] [Green Version]
  39. Yang, H.L.; Crawford, M.M. Spectral and spatial proximity-based manifold alignment for multitemporal hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 51–64. [Google Scholar] [CrossRef]
  40. Debes, C.; Merentitis, A.; Heremans, R.; Hahn, J.; Frangiadakis, N.; Kasteren, T.; Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S. Hyperspectral and LiDAR Data Fusion: Outcome of the 2013 GRSS Data Fusion Contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2405–2418. [Google Scholar] [CrossRef]
  41. Xu, Y.; Du, B.; Zhang, L.; Cerra, D.; Pato, M.; Carmona, E.; Prasad, S.; Yokoya, N.; Hänsch, R.; Le Saux, B. Advanced Multi-Sensor Optical Remote Sensing for Urban Land Use and Land Cover Classification: Outcome of the 2018 IEEE GRSS Data Fusion Contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1709–1724. [Google Scholar] [CrossRef]
  42. Kong, S.; Wang, D. A brief summary of dictionary learning based approach for classification (revised). arXiv 2012, arXiv:1205.6544. [Google Scholar]
  43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  44. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  45. Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
  46. Sinha, B.; Yimprayoon, P.; Tiensuwan, M. Cohen’s Kappa Statistic: A Critical Appraisal and Some Modifications. Math. Calcutta Stat. Assoc. Bull. 2006, 58, 151–170. [Google Scholar] [CrossRef]
  47. Chang, C.I.; Ma, K.Y.; Liang, C.C.; Kuo, Y.M.; Chen, S.; Zhong, S. Iterative random training sampling spectral spatial classification for hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3986–4007. [Google Scholar] [CrossRef]
Figure 1. Workflow of the proposed feature extraction model.
Figure 2. Network architectures for our encoder: (a) a block of residual networks, (b) main structure of CNNs.
Figure 3. Overview of the built dictionary of the developed model. Shared constraints are used to describe the common features of all classes of HSI samples.
Figure 4. The loss function value of training samples and classification accuracy of the developed model versus the number of epochs.
Figure 5. The classification OA under different numbers of atoms for each sub-dictionary.
Figure 6. The classification OA under different regularization parameters $\lambda_1$ and $\lambda_2$.
Figure 7. The confusion matrix of the developed model on the Center of Pavia dataset.
Figure 8. Classification maps of the Center of Pavia dataset with methods in comparison: (a) pseudo-color image; (b) FDDL; (c) DPL; (d) ResNet; (e) RNN; (f) CNN; (g) Ours; (h) ground truth. The yellow and red rectangles correspond to building and water areas.
Figure 9. Confusion matrix of the developed model on the Botswana dataset.
Figure 10. Classification map for the Botswana dataset with methods in comparison: (a) pseudo-color image; (b) FDDL; (c) DPL; (d) ResNet; (e) RNN; (f) CNN; (g) ours; (h) ground truth. Rectangles colored as red and yellow represent mountain and grassland areas.
Figure 11. Confusion matrix of the developed model on the Houston 2013 dataset.
Figure 12. Classification map generated by Houston 2013 dataset with approaches in comparison: (a) pseudo-color image; (b) FDDL; (c) DPL; (d) ResNet; (e) RNN; (f) CNN; (g) ours; (h) ground truth. The rectangles with the colors of red and yellow represent the parking lot space and building area.
Figure 13. The confusion matrix of our model on the Houston 2018 dataset.
Figure 14. Classification maps of the Houston 2018 dataset with compared methods: (a) pseudo-color image; (b) FDDL; (c) DPL; (d) ResNet; (e) RNN; (f) CNN; (g) Ours; (h) ground truth. The yellow and red rectangles correspond to grassland and building areas.
Table 1. Classification Accuracy for Center of Pavia Dataset. The red bold fonts and blue italic fonts indicate the best and the second best performance.
Class | SVM | FDDL | DPL | ResNet | AE | RNN | CNN | CRNN | Ours
1 | 0.9866 | 0.9882 | 0.9856 | 0.9845 | 0.9997 | 0.9836 | 0.9966 | 0.9999 | 1.0000
2 | 0.6302 | 0.2319 | 0.3743 | 0.6641 | 0.9752 | 0.4118 | 0.7496 | 0.9861 | 0.9662
3 | 0.9708 | 0.9851 | 0.9682 | 0.9644 | 0.8884 | 0.9902 | 0.9669 | 0.8994 | 0.9579
4 | 0.5055 | 0.3760 | 0.2568 | 0.4877 | 0.8675 | 0.4646 | 0.5256 | 0.8500 | 0.8619
5 | 0.9969 | 0.9848 | 0.9729 | 0.9835 | 0.9680 | 0.9924 | 0.9905 | 0.9809 | 0.9785
6 | 0.6659 | 0.6944 | 0.8576 | 0.7035 | 0.9597 | 0.8335 | 0.9331 | 0.9696 | 0.9776
7 | 0.9163 | 0.8811 | 0.9143 | 0.9363 | 0.9443 | 0.9465 | 0.9503 | 0.9604 | 0.9556
8 | 0.9416 | 0.9595 | 0.9711 | 0.9504 | 0.9812 | 0.9794 | 0.9904 | 0.9961 | 0.9925
9 | 0.9965 | 0.9643 | 0.9825 | 0.9895 | 0.9980 | 0.9930 | 0.9874 | 0.9980 | 0.9995
OA | 0.9234 | 0.9057 | 0.9244 | 0.9289 | 0.9828 | 0.9331 | 0.9663 | 0.9864 | 0.9875
AA | 0.8456 | 0.7850 | 0.8093 | 0.8515 | 0.9535 | 0.8439 | 0.8989 | 0.9600 | 0.9655
kappa | 0.8927 | 0.8677 | 0.8937 | 0.9004 | 0.9704 | 0.9060 | 0.9524 | 0.9767 | 0.9785
Table 2. Classification Accuracy for Botswana Dataset. The red bold fonts and blue italic fonts indicate the best and the second best performance.
Class | SVM | FDDL | DPL | ResNet | AE | RNN | CNN | CRNN | Ours
1 | 0.9465 | 0.9712 | 0.9794 | 0.9835 | 0.8934 | 0.9346 | 0.9492 | 0.9529 | 1.0000
2 | 1.0000 | 0.8571 | 0.9341 | 0.9890 | 0.7126 | 0.9189 | 0.8333 | 0.9048 | 0.9136
3 | 0.8451 | 0.7920 | 0.8496 | 0.8274 | 0.9426 | 0.8366 | 0.9264 | 0.9770 | 0.9701
4 | 0.8918 | 0.7887 | 0.9175 | 0.8918 | 0.6111 | 0.7846 | 0.9323 | 0.9479 | 0.9709
5 | 0.7037 | 0.6831 | 0.7284 | 0.7572 | 0.7880 | 0.7704 | 0.8219 | 0.8200 | 0.8935
6 | 0.6831 | 0.6461 | 0.6379 | 0.6214 | 0.6552 | 0.6250 | 0.7861 | 0.7471 | 0.7222
7 | 0.9615 | 0.7479 | 0.9316 | 0.9017 | 0.9462 | 0.9234 | 0.9607 | 0.9735 | 0.9808
8 | 0.8852 | 0.9126 | 0.9836 | 0.9781 | 0.7784 | 0.8214 | 0.9005 | 0.9394 | 0.9816
9 | 0.7279 | 0.7032 | 0.6784 | 0.7739 | 0.7877 | 0.7651 | 0.7651 | 0.8750 | 0.9405
10 | 0.7321 | 0.4777 | 0.8348 | 0.8527 | 0.7919 | 0.7704 | 0.8071 | 0.8768 | 0.8543
11 | 0.7418 | 0.7564 | 0.8945 | 0.8836 | 0.7233 | 0.8404 | 0.8517 | 0.8897 | 0.9221
12 | 0.9080 | 0.8037 | 0.8834 | 0.9816 | 0.7353 | 0.7746 | 0.8580 | 0.7927 | 0.9379
13 | 0.5785 | 0.7810 | 0.8554 | 0.7397 | 0.8522 | 0.7371 | 0.8966 | 0.8899 | 0.8930
14 | 0.9070 | 0.6628 | 0.7907 | 0.7907 | 0.7468 | 0.7404 | 0.8901 | 0.7900 | 1.0000
OA | 0.8017 | 0.7515 | 0.8420 | 0.8444 | 0.7884 | 0.8017 | 0.8676 | 0.8846 | 0.9220
AA | 0.8223 | 0.7560 | 0.8500 | 0.8552 | 0.7832 | 0.8031 | 0.8699 | 0.8840 | 0.9271
kappa | 0.7854 | 0.7311 | 0.8289 | 0.8316 | 0.7706 | 0.7850 | 0.8566 | 0.8751 | 0.9156
Table 3. Classification Accuracy for Houston 2013 Dataset. The red bold fonts and blue italic fonts indicate the best and the second best performance.
Class | SVM | FDDL | DPL | ResNet | AE | RNN | CNN | CRNN | Ours
1 | 0.8890 | 0.9076 | 0.9831 | 0.9387 | 0.9166 | 0.9538 | 0.9224 | 0.9659 | 0.9920
2 | 0.9353 | 0.9477 | 0.9814 | 0.9752 | 0.9856 | 0.9628 | 0.9824 | 0.9443 | 0.9811
3 | 0.9586 | 0.9984 | 0.9825 | 0.9904 | 1.0000 | 0.9857 | 0.9888 | 0.9952 | 0.9892
4 | 0.8875 | 0.9446 | 0.8634 | 0.9598 | 0.9480 | 0.9714 | 0.9435 | 0.9962 | 0.9980
5 | 0.9284 | 0.9776 | 0.9902 | 0.9723 | 0.9563 | 0.9785 | 0.9663 | 0.9779 | 0.9930
6 | 0.8703 | 0.9829 | 0.9693 | 0.9590 | 0.9288 | 0.9249 | 0.9691 | 0.9898 | 0.9846
7 | 0.6261 | 0.7881 | 0.6996 | 0.7977 | 0.8341 | 0.7820 | 0.8567 | 0.9389 | 0.9369
8 | 0.7250 | 0.5188 | 0.6571 | 0.5634 | 0.7907 | 0.4223 | 0.7945 | 0.8488 | 0.9578
9 | 0.5510 | 0.6557 | 0.7329 | 0.7063 | 0.7158 | 0.7045 | 0.7269 | 0.8580 | 0.9152
10 | 0.6389 | 0.4244 | 0.8462 | 0.7747 | 0.7982 | 0.7738 | 0.7808 | 0.8489 | 0.9460
11 | 0.5117 | 0.4317 | 0.5926 | 0.7752 | 0.7840 | 0.8354 | 0.7889 | 0.8781 | 0.9008
12 | 0.5396 | 0.5315 | 0.6595 | 0.6036 | 0.7023 | 0.7450 | 0.7348 | 0.8550 | 0.9422
13 | 0.2766 | 0.5414 | 0.2884 | 0.6430 | 0.7911 | 0.5745 | 0.4879 | 0.6250 | 0.7313
14 | 0.9689 | 0.9948 | 0.9896 | 0.9896 | 0.9450 | 0.9908 | 0.9721 | 0.9807 | 0.9854
15 | 0.9545 | 0.9882 | 0.9848 | 0.9562 | 0.9966 | 0.9781 | 0.9351 | 0.9866 | 0.9943
OA | 0.7409 | 0.7476 | 0.8103 | 0.8255 | 0.8600 | 0.8280 | 0.8549 | 0.9127 | 0.9539
AA | 0.7508 | 0.7756 | 0.8147 | 0.8404 | 0.8729 | 0.8381 | 0.8579 | 0.9126 | 0.9499
kappa | 0.7199 | 0.7271 | 0.7949 | 0.8114 | 0.8485 | 0.8142 | 0.8431 | 0.9056 | 0.9502
Table 4. Classification Accuracy for Houston 2018 Dataset. The red bold fonts and blue italic fonts indicate the best and the second best performance.
Class | SVM | FDDL | DPL | ResNet | AE | RNN | CNN | CRNN | Ours
1 | 0.9922 | 0.9295 | 0.9813 | 0.9486 | 0.8319 | 0.7925 | 0.6037 | 0.8157 | 0.9399
2 | 0.9371 | 0.8008 | 0.9064 | 0.7504 | 0.9275 | 0.9277 | 0.9305 | 0.8898 | 0.9288
3 | 0.9821 | 1.0000 | 1.0000 | 1.0000 | 0.9968 | 0.9951 | 0.9968 | 0.9952 | 0.9984
4 | 0.9717 | 0.8972 | 0.9647 | 0.9367 | 0.9028 | 0.8421 | 0.8520 | 0.8738 | 0.9653
5 | 0.8774 | 0.8387 | 0.8902 | 0.7701 | 0.7485 | 0.5942 | 0.4048 | 0.7697 | 0.8604
6 | 0.9670 | 0.8305 | 0.9784 | 0.9754 | 0.9143 | 0.9493 | 0.8131 | 0.9073 | 0.9847
7 | 0.9208 | 0.9958 | 0.9958 | 0.9625 | 0.8885 | 0.9424 | 0.8131 | 0.9795 | 0.9625
8 | 0.7535 | 0.6829 | 0.7247 | 0.8043 | 0.6741 | 0.7473 | 0.6975 | 0.8328 | 0.8802
9 | 0.6341 | 0.4107 | 0.6423 | 0.8066 | 0.9432 | 0.9380 | 0.9835 | 0.9262 | 0.9277
10 | 0.4501 | 0.2693 | 0.3757 | 0.4452 | 0.6765 | 0.6803 | 0.6394 | 0.6583 | 0.7109
11 | 0.4591 | 0.3753 | 0.4358 | 0.4852 | 0.6809 | 0.6073 | 0.4301 | 0.6844 | 0.6975
12 | 0.5091 | 0.4499 | 0.5611 | 0.3416 | 0.2762 | 0.2680 | 0.1945 | 0.2513 | 0.3738
13 | 0.4700 | 0.2544 | 0.4235 | 0.4389 | 0.7362 | 0.7378 | 0.6273 | 0.7633 | 0.7184
14 | 0.8117 | 0.7870 | 0.8380 | 0.7528 | 0.7166 | 0.7144 | 0.6478 | 0.6986 | 0.8308
15 | 0.9643 | 0.7154 | 0.9387 | 0.9366 | 0.9409 | 0.9223 | 0.8968 | 0.9090 | 0.9819
16 | 0.8934 | 0.8179 | 0.8843 | 0.8129 | 0.8468 | 0.7942 | 0.7226 | 0.8878 | 0.9080
17 | 0.9621 | 0.8939 | 0.9848 | 0.9924 | 0.8618 | 0.9754 | 0.9034 | 0.9470 | 1.0000
18 | 0.8203 | 0.6360 | 0.7125 | 0.7161 | 0.5774 | 0.5450 | 0.4096 | 0.6660 | 0.8003
19 | 0.8912 | 0.6545 | 0.8899 | 0.6725 | 0.7739 | 0.7704 | 0.3747 | 0.8632 | 0.9185
20 | 0.9531 | 0.9438 | 0.9424 | 0.8110 | 0.9205 | 0.8817 | 0.6037 | 0.8902 | 0.9824
OA | 0.6646 | 0.4938 | 0.6498 | 0.7193 | 0.8352 | 0.8298 | 0.7433 | 0.8451 | 0.8667
AA | 0.8110 | 0.7092 | 0.8035 | 0.7679 | 0.7918 | 0.7813 | 0.6773 | 0.8105 | 0.8685
kappa | 0.5988 | 0.4277 | 0.5825 | 0.6478 | 0.7874 | 0.7798 | 0.6849 | 0.7979 | 0.8281
Table 5. Classification Results for Imbalanced Data on Houston 2013. The red bold fonts and blue italic fonts indicate the best and the second best performance.
Class No. | AE | RNN | CNN | CRNN | Ours
1 | 0.9449 | 0.9059 | 0.9230 | 0.9654 | 0.9654
2 | 0.9610 | 0.9442 | 0.9521 | 0.9731 | 0.9734
3 | 0.9777 | 0.9984 | 0.9823 | 0.9952 | 1.0000
4 | 0.9893 | 0.9571 | 0.9502 | 0.9595 | 0.9866
5 | 0.9776 | 0.9857 | 0.9338 | 0.9879 | 0.9839
6 | 0.8805 | 0.9795 | 0.9727 | 0.9861 | 0.9966
7 | 0.7539 | 0.8914 | 0.7038 | 0.9203 | 0.9247
8 | 0.6223 | 0.5955 | 0.7475 | 0.7830 | 0.9125
9 | 0.7232 | 0.7143 | 0.6891 | 0.7823 | 0.8784
10 | 0.8389 | 0.6769 | 0.7188 | 0.7935 | 0.9258
11 | 0.8156 | 0.5414 | 0.7443 | 0.7694 | 0.8687
12 | 0.7622 | 0.5595 | 0.6189 | 0.7658 | 0.8991
13 | 0.2931 | 0.4090 | 0.5517 | 0.6459 | 0.3191
14 | 0.9508 | 0.7332 | 0.8843 | 0.9343 | 0.9793
15 | 0.9832 | 0.9916 | 0.9848 | 0.9609 | 0.9916
OA | 0.8387 | 0.7892 | 0.8155 | 0.9754 | 0.9213
AA | 0.8316 | 0.7922 | 0.8238 | 0.8815 | 0.9070
kappa | 0.8255 | 0.7720 | 0.8002 | 0.8653 | 0.9148
Table 6. Classification Results for a Small Number of Training Samples on Houston 2013. The red bold fonts and blue italic fonts indicate the best and the second best performance.
Class No. | 10 Samples per Class | 20 Samples per Class | 30 Samples per Class
 | AE | RNN | CNN | CRNN | Ours | AE | RNN | CNN | CRNN | Ours | AE | RNN | CNN | CRNN | Ours
1 | 0.33 | 0.57 | 0.77 | 0.82 | 0.84 | 0.73 | 0.92 | 0.77 | 0.93 | 0.96 | 0.89 | 0.72 | 0.86 | 0.94 | 0.99
2 | 0.39 | 0.46 | 0.92 | 0.95 | 0.96 | 0.59 | 0.74 | 0.81 | 0.81 | 0.94 | 0.87 | 0.86 | 0.98 | 0.84 | 0.93
3 | 0.56 | 0.26 | 0.52 | 0.99 | 0.99 | 0.93 | 0.90 | 0.98 | 0.94 | 0.99 | 0.99 | 0.84 | 0.99 | 0.95 | 0.99
4 | 0.75 | 0.84 | 0.88 | 0.96 | 0.91 | 0.77 | 0.85 | 0.97 | 0.92 | 0.86 | 0.97 | 0.91 | 0.97 | 0.97 | 0.86
5 | 0.86 | 0.74 | 0.90 | 0.95 | 0.96 | 0.96 | 0.89 | 0.97 | 0.99 | 0.98 | 0.96 | 0.88 | 0.98 | 0.98 | 0.97
6 | 0.69 | 0.41 | 0.92 | 0.87 | 0.97 | 0.83 | 0.80 | 0.76 | 0.75 | 0.97 | 0.97 | 0.96 | 0.89 | 0.90 | 0.98
7 | 0.49 | 0.35 | 0.60 | 0.81 | 0.80 | 0.50 | 0.52 | 0.49 | 0.80 | 0.75 | 0.56 | 0.40 | 0.71 | 0.78 | 0.74
8 | 0.23 | 0.24 | 0.55 | 0.49 | 0.66 | 0.45 | 0.32 | 0.41 | 0.65 | 0.67 | 0.66 | 0.60 | 0.81 | 0.75 | 0.75
9 | 0.46 | 0.35 | 0.41 | 0.48 | 0.63 | 0.61 | 0.61 | 0.65 | 0.68 | 0.78 | 0.64 | 0.53 | 0.66 | 0.68 | 0.76
10 | 0.18 | 0.00 | 0.61 | 0.57 | 0.61 | 0.41 | 0.26 | 0.48 | 0.66 | 0.85 | 0.61 | 0.45 | 0.71 | 0.72 | 0.83
11 | 0.51 | 0.30 | 0.52 | 0.67 | 0.69 | 0.48 | 0.65 | 0.53 | 0.75 | 0.74 | 0.62 | 0.54 | 0.60 | 0.75 | 0.79
12 | 0.38 | 0.26 | 0.31 | 0.45 | 0.38 | 0.19 | 0.34 | 0.63 | 0.64 | 0.65 | 0.58 | 0.50 | 0.61 | 0.67 | 0.80
13 | 0.15 | 0.57 | 0.15 | 0.26 | 0.46 | 0.18 | 0.12 | 0.22 | 0.36 | 0.52 | 0.19 | 0.10 | 0.43 | 0.39 | 0.51
14 | 0.72 | 0.89 | 0.85 | 0.86 | 0.93 | 0.90 | 0.66 | 0.95 | 0.93 | 0.96 | 0.83 | 0.83 | 0.90 | 0.85 | 0.95
15 | 0.96 | 0.66 | 0.96 | 0.97 | 0.99 | 0.96 | 0.66 | 0.96 | 0.95 | 0.99 | 0.93 | 0.92 | 0.98 | 0.93 | 0.99
OA | 0.53 | 0.45 | 0.63 | 0.72 | 0.77 | 0.62 | 0.63 | 0.68 | 0.79 | 0.83 | 0.74 | 0.64 | 0.80 | 0.81 | 0.85
AA | 0.51 | 0.46 | 0.66 | 0.74 | 0.79 | 0.63 | 0.62 | 0.70 | 0.78 | 0.84 | 0.75 | 0.67 | 0.80 | 0.81 | 0.86
kappa | 0.49 | 0.41 | 0.61 | 0.70 | 0.75 | 0.61 | 0.61 | 0.66 | 0.77 | 0.82 | 0.72 | 0.61 | 0.78 | 0.79 | 0.84
Table 7. Computational Costs for Different Classification Methods (In Seconds).
Dataset | SVM | FDDL | DPL | ResNet | AE | RNN | CNN | CRNN | Ours
Pavia center | 6.8 | 346.1 | 3.4 × 10^-3 | 16.1 | 3.5 | 67.2 | 8.7 | 52.8 | 3.6
Botswana | 1.8 | 40.2 | 2.9 × 10^-4 | 0.8 | 0.3 | 3.5 | 0.4 | 3.5 | 0.2
Houston 2013 | 2.1 | 69.1 | 4.5 × 10^-4 | 2.5 | 0.8 | 13.8 | 1.6 | 13.8 | 0.5
Houston 2018 | 16.3 | 3310.8 | 1.3 × 10^-2 | 79.1 | 32.2 | 155.2 | 51.6 | 137.7 | 10.9
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
