Article

Spectral-Spatial Attention Rotation-Invariant Classification Network for Airborne Hyperspectral Images

1
Key Laboratory of Spectral Imaging Technology CAS, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710100, China
2
University of Chinese Academy of Sciences, Beijing 100049, China
3
School of Telecommunication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710061, China
*
Author to whom correspondence should be addressed.
Drones 2023, 7(4), 240; https://doi.org/10.3390/drones7040240
Submission received: 6 March 2023 / Revised: 22 March 2023 / Accepted: 28 March 2023 / Published: 29 March 2023
(This article belongs to the Special Issue Urban Features Extraction from UAV Remote Sensing Data and Images)

Abstract

An airborne hyperspectral imaging system is typically mounted on an aircraft or unmanned aerial vehicle (UAV) to capture ground scenes from an overhead perspective. Due to the rotation of the aircraft or UAV, the same region of land cover may be imaged from different viewing angles. While humans can accurately recognize the same objects from different viewing angles, classification methods based on spectral-spatial features for airborne hyperspectral images exhibit significant errors. Existing methods primarily incorporate image or feature rotation angles into the network to improve its accuracy in classifying rotated images. However, these methods introduce additional parameters that need to be manually determined, which may not be optimal for all applications. This paper presents a spectral-spatial attention rotation-invariant classification network for airborne hyperspectral images to address this issue. The proposed method does not require additional rotation angle parameters. The proposed framework has three modules: the band selection module, the local spatial feature enhancement module, and the lightweight feature enhancement module. The band selection module suppresses redundant spectral channels, while the local spatial feature enhancement module generates a multi-angle parallel feature encoding network to improve the discrimination of the center pixel. The multi-angle parallel feature encoding network also learns the position relationship between pixels, thus maintaining rotation invariance. The lightweight feature enhancement module is the last layer of the framework; it enhances important features and suppresses insignificant ones. In addition, a dynamically weighted cross-entropy loss is used as the loss function. This loss adjusts the model's sensitivity to samples of different categories according to the output in each training epoch.
The proposed method is evaluated on five airborne hyperspectral image datasets covering urban and agricultural regions. Compared with other state-of-the-art classification algorithms, the method achieves the best classification accuracy and is capable of effectively extracting rotation-invariant features for urban and rural areas.

1. Introduction

With the development of optical imaging technology and unmanned aerial vehicle (UAV) technology, airborne hyperspectral images (HSIs) have become increasingly abundant. An HSI differs from an RGB image in that it contains a large amount of both spectral and spatial information. The continuous spectral curves of an HSI can be used to identify various objects, as different objects have different spectral curves [1]. As a result, airborne hyperspectral images have been widely used in applications such as urban planning [2], agricultural monitoring [3], and disaster detection [4]. Table 1 shows the common airborne hyperspectral image datasets captured by UAVs or aircraft covering urban and agricultural areas.
HSI classification aims to predict the corresponding category for each pixel. Based on how features are extracted, HSI classification methods are roughly divided into traditional and deep learning methods. The traditional methods [7,8] typically extract hyperspectral spatial-spectral features using handcrafted features, followed by a feature-classifying module. The authors of [7] proposed using Independent Component Discriminant Analysis (ICDA) for classification. Cao et al. [9] used the three-dimensional discrete wavelet transform (3D-DWT) to extract the spatial-spectral feature for HSI classification. While traditional methods [10,11,12] have achieved good results, the handcrafted features are generally shallow features with limited feature representation capability, making it challenging to achieve satisfactory performance.
In recent years, deep learning methods have been the mainstream approach in HSI classification [13]. Hyperspectral images contain rich spectral information, with each category having unique spectral information [1]. Based on the information they use, deep learning methods are broadly divided into two categories: spectral feature methods [14,15,16] and spectral-spatial feature methods [17,18,19].
Spectral feature algorithms extract features along the 1D spectral dimension. For instance, Chen et al. [20] first applied deep learning to HSI classification. According to Hu et al.’s research [14], 1D convolution neural networks (CNNs) were employed to classify HSI based on spectral features. A novel recurrent neural network (RNN) module [21] was employed in HSI classification. Wu et al. [15] developed an RNN-based semi-supervised classifier for HSI classification. Hang et al. [22] utilized cascaded RNNs for HSI classification. This work used RNNs to model the sequence and effectively represent the relationship between adjacent spectral bands. However, while the spectral dimension can distinguish different land-cover categories, adjacent pixels in HSI may belong to the same land-cover categories [23,24].
In order to classify land-cover classes accurately, it is necessary to consider both spectral and spatial features [25]. Spatial-spectral feature methods [19,26,27] have been proposed to address the limitations of spectral feature methods. For example, Zhang et al. [4] learned contextual interaction features using inputs based on different regions. Song et al. [28] introduced residual learning and fused the outputs of hierarchical features for HSI classification. To speed up the forward propagation of 2D CNNs, Mei et al. [18] proposed a novel step activation quantization method. Since HSIs are 3D cubes, 3D CNNs have also been employed for HSI classification. Wei et al. [29] utilized edge-preserving side window filters as the convolution kernels. He et al. [30] proposed a multiscale 3D CNN for classification. A hybrid network [31] that combines 2D CNN and 3D CNN was presented to address the classification of HSIs. Mei et al. [17] employed an unsupervised 3D CNN autoencoder for HSI classification. A multiple spectral resolution 3D CNN [32] has also been introduced for classification. In addition, attention modules have been embedded in networks to extract spectral-spatial features in HSIs. Zheng et al. [13] proposed an attention mechanism to suppress redundant bands and improve classification accuracy. A novel spectral-spatial attention network [26] was introduced to capture the correlation of the pixels. In most cases [33,34,35,36], these attention modules are independent, which means they can be flexibly inserted into a network.
HSIs contain rich spectral information. Meanwhile, the 2D convolution neural networks have significantly affected computer vision, with applications including biomedical image classification [37], remote sensing image classification [3,38], change detection [39,40], and image deblurring [2]. However, when convolving along the spectral dimension of HSI, hundreds of bands need plenty of parameters. There is no doubt that this dramatically increases computational time and cost. The number of channels is usually reduced before using 2D convolution kernels for feature extraction and classification to solve this problem. The mainstream methods include two main types. One is to reduce the spectral dimension. For instance, [31,41] used PCA [42] and variants of PCA [16] to reduce the number of spectral channels. The authors of [43] utilized the enhancing transformation reduction (ETR) for reducing dimensionality and HSI classification. Another option is to suppress the redundant bands using spectral attention methods [25,26]. The spectral attention methods usually change the weights of each band.
The vision transformer (ViT) [44] has recently performed remarkably on some vision-related tasks. As a result, some studies [45,46,47] have attempted to apply ViT to hyperspectral classification. For instance, a novel local transformer [48] with an integrated spatial partition restore module is proposed for classification. He et al. [49] utilized a spatial-spectral transformer with a dense connection, which was proposed to capture sequential spectra relationships. Extended morphological profiles [50] were employed for HSI classification in a deep global-local transformer network. These transformer methods [50,51,52,53] process the hyperspectral images in a token style. Generally, the 3D HSI is divided into patches and treated as tokens. The transformer network extracts the features and relationships of these tokens for hyperspectral classification.
An airborne hyperspectral imaging system is typically mounted on an aircraft or UAV to capture ground scenes from an overhead perspective. As a result of the rotation of the UAV, HSIs of the same area are captured from different perspectives [13]. While spatial rotation does not typically degrade the classification accuracy of spectral-based methods, spectral feature methods do not perform as well as spectral-spatial feature methods, which are in turn sensitive to spatial rotation, as shown in Figure 1. For convenience, the input is set to have one spectral band. The kernel size is 3 × 3. The image size is 5 × 5. The stride and padding of the convolution are 1 and 0, respectively. Figure 1 shows that different features are extracted from the same image at different input angles using the same convolutional kernel. Transformer-based methods face similar problems: rotating the input image changes the output. Therefore, spatial-spectral methods perform poorly when the images are rotated. Several works have made meaningful attempts to address this issue. Tao et al. [54] utilized vector sorting to extract rotation-invariant features. Zheng et al. [13] used spectral convolution to extract spatial features to maintain rotation invariance. Chen et al. [55] used feature rotation to address the rotation invariance of UAV images. Figure 2 illustrates common methods for addressing rotation invariance. Figure 2A shows that image-level rotation may lose samples or change the image size. According to [55], let the coordinates of each feature be $(x, y)$. The rotation by $\theta$ degrees is expressed as:
$$
\tilde{x} = x\cos\theta - y\sin\theta, \qquad \tilde{y} = x\sin\theta + y\cos\theta \tag{1}
$$
where $\tilde{x}$ and $\tilde{y}$ denote the new coordinates of the rotated feature. However, without additional constraint conditions, the feature-level rotation may still lose features, as shown in Figure 2B. For instance, according to Equation (1), the feature at the coordinate position (−2, −2) will be lost after rotation, while the features at the coordinate positions (−2, 0) and (−1, 0) will have the same new coordinate position (−1, −1) after rotation, resulting in two overlapping features. Figure 2C shows that the proposed feature-level rotation is capable of effectively maintaining all features to address the problem of rotational invariance without the introduction of additional conditions.
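The loss and overlap described for Equation (1) can be checked numerically. The sketch below is only illustrative: the text does not state the rotation angle or how rotated coordinates are mapped back to the grid, so a 45° rotation and rounding to the nearest integer are assumed here, and the helper names `rotate_and_round` and `inside` are hypothetical.

```python
import math

def rotate_and_round(x, y, theta_deg=45.0):
    """Apply Equation (1), then snap the result to the nearest grid cell.

    The 45-degree default and the rounding rule are assumptions of this sketch.
    """
    t = math.radians(theta_deg)
    x_new = x * math.cos(t) - y * math.sin(t)
    y_new = x * math.sin(t) + y * math.cos(t)
    return round(x_new), round(y_new)

def inside(coord, half=2):
    """True if the coordinate lies on a 5 x 5 grid centered at the origin."""
    return all(-half <= v <= half for v in coord)

for point in [(-2, -2), (-2, 0), (-1, 0)]:
    target = rotate_and_round(*point)
    print(point, "->", target, "kept" if inside(target) else "lost")
```

Under these assumptions, (−2, 0) and (−1, 0) collapse onto the same cell and (−2, −2) leaves the 5 × 5 grid, which matches the behavior described for Figure 2B.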
A spectral-spatial attention rotation-invariant classification network (SSARIN) is presented based on the above issues. The SSARIN can address the issue of spatial rotation sensitivity in spectral-spatial feature methods of HSI classification. SSARIN is composed of a Band Selection (BS) module, a Local Spatial Feature Enhancement (LSFE) module, and a Lightweight Feature Enhancement (LWFE) module. The BS module is the initial component that reduces redundant spectral channels. The LSFE module generates a multi-angle parallel feature encoding network, which enhances the center pixel’s discrimination ability and learns the positional relationship between each pixel, ensuring rotation invariance. The LWFE module enhances significant features and suppresses insignificant ones as the final layer. At the same time, a dynamically weighted cross-entropy loss function is employed.
This paper has the following contributions.
1.
We present a spectral-spatial attention rotation-invariant classification network (SSARIN) that utilizes convolutional neural networks (CNNs) to extract spectral-spatial features. The SSARIN not only achieves good performance in HSI classification but is also a rotation-invariant network. Additionally, a dynamically weighted cross-entropy loss is introduced that considers the complexities of samples with different categories to improve classification accuracy.
2.
A local spatial feature enhancement module is proposed to address the issue of spatial rotation sensitivity. This module not only captures spatial-spectral features but also learns the position relationship between pixels. Doing so enhances the discriminative power of the center pixel and alleviates the impact of spatial rotation on classification accuracy.
The paper is divided into the following sections. Section 2 shows the proposed method in more detail. Section 3 discusses the experimental results. The discussion is presented in Section 4. Finally, a conclusion is drawn in Section 5.

2. Proposed Method

This section describes the various components of the proposed network in detail. The overview of the algorithm is shown in Section 2.1. Section 2.2 contains the details of the band selection module. The local spatial feature enhancement module is explained in Section 2.3. Section 2.4 introduces the lightweight feature enhancement. Finally, the loss function is reported in Section 2.5.

2.1. Overview

The HSI is a 3D cube [26]. Suppose that $X \in \mathbb{R}^{H \times W \times B}$ denotes the HSI, where $H \times W$ is the spatial size of the image and $B$ is the number of channels. $Y = (y_1, y_2, \ldots, y_C) \in \mathbb{R}^{1 \times 1 \times C}$ represents the land-cover categories, where $C$ denotes the number of classes. $Y$ is a one-hot label vector. Classification aims to assign a category to each pixel of the hyperspectral image.
Let $X_i \in \mathbb{R}^{p \times p \times b}$ represent a patch, a square area cut from the HSI $X_p$, where $X_p$ is the hyperspectral image after principal component analysis (PCA), $X_i$ denotes the $i$-th patch of $X_p$, and $p \times p$ is the spatial size. The pixel $x_i^c$ is the center pixel of the patch $X_i$. Each pixel in the HSI has a corresponding patch. Thus, the proposed SSARIN determines the class of the pixel $x_i$ based on the patch $X_i$.
Figure 3 shows the proposed HSI classification method, which mainly contains a band selection (BS) module, a local spatial feature enhancement (LSFE) module, and a lightweight feature enhancement (LWFE) module. Processing hundreds of bands requires many parameters, and many bands are redundant, so PCA is used to retain the primary spectral information and reduce the number of bands. PCA reduces the number of channels from $B$ to $b$; in this paper, $b$ is set to 50. This reduces the number of parameters while the HSI after PCA retains the primary information. Furthermore, a spectral attention mechanism, named band selection (BS) in this paper, is employed to recalibrate redundant spectral bands. The HSI patch $X_i \in \mathbb{R}^{p \times p \times b}$ is fed into the BS module, which suppresses redundant spectral channels. Then, a local spatial feature enhancement module is employed to extract the spectral-spatial features; the output of the LSFE module consists of rotation-invariant features. Finally, a lightweight feature enhancement module is leveraged to enhance essential features and suppress insignificant ones, improving classification accuracy. The core component of SSARIN is the LSFE module. Table 2 reports the details of the presented algorithm.
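The preprocessing described above (PCA from B to b channels, then one p × p patch per pixel) might look roughly like the following NumPy sketch. It is not the authors' code: the function names are hypothetical, PCA is computed with a plain SVD, and reflect padding at the image border is an assumption, since the text does not say how border pixels obtain a full patch.

```python
import numpy as np

def pca_reduce(cube, b):
    """Project an (H, W, B) cube onto its first b principal components."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B).astype(np.float64)
    flat -= flat.mean(axis=0)                      # center each band
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:b].T).reshape(H, W, b)

def extract_patch(cube, row, col, p):
    """Return the (p, p, b) patch whose center pixel is (row, col)."""
    r = p // 2
    # Reflect padding at the borders is an assumption of this sketch.
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    return padded[row:row + p, col:col + p, :]

rng = np.random.default_rng(0)
hsi = rng.random((20, 30, 103))        # toy stand-in for a real scene
reduced = pca_reduce(hsi, b=50)        # b = 50 as stated in the text
patch = extract_patch(reduced, row=0, col=5, p=9)
print(reduced.shape, patch.shape)
```

The center of the returned patch, `patch[p // 2, p // 2]`, is the pixel being classified, so the label of pixel (row, col) is paired with this patch during training.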

2.2. Band Selection Module

The band selection module contains three layers: one average pooling and two convolution layers. This module emphasizes the useful bands and suppresses the redundant ones by adaptive weights. The BS module recalibrates the spectral bands and adjusts the weight of each band. Figure 4 shows the details of the BS module. Table 3 lists detailed information on the BS module. The formulation of this module is defined as Equation (2):
$$
I^{S}_{P \times P} = \sigma\big(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(\mathrm{AP}(I_{P \times P}))))\big) \odot I_{P \times P} \tag{2}
$$
where $\mathrm{AP}(\cdot)$ represents the global average pooling; $\mathrm{Conv}(\cdot)$ denotes the 2D convolutional layer; $\mathrm{ReLU}(\cdot)$ denotes the activation function, which is defined as $\mathrm{ReLU}(x) = \max(0, x)$; $\sigma(\cdot)$ is the sigmoid activation function, which is formulated as $\sigma(x) = 1/(1 + e^{-x})$; $\odot$ denotes the channel-wise multiplication; $I_{P \times P}$ denotes the corresponding $p \times p$ image patch cropped from the original hyperspectral image; and $I^{S}_{P \times P}$ denotes the spectral-spatial feature after the band selection.
The BS module has the following functions. First, the module suppresses redundant bands by recalibrating the band weights. Second, the principal component analysis and the 1 × 1 convolution kernels need only a small number of parameters. Most importantly, the 1 × 1 convolution is rotation invariant [13].
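A minimal NumPy sketch of Equation (2) is given below, assuming bias-free 1 × 1 convolutions and a hypothetical bottleneck width `hidden_dim` (the actual layer sizes are listed in Table 3, which is not reproduced here). After global average pooling, a 1 × 1 convolution reduces to a matrix product on the pooled band vector.

```python
import numpy as np

def band_selection(patch, w1, w2):
    """Reweight the bands of a (P, P, b) patch following Equation (2)."""
    pooled = patch.mean(axis=(0, 1))                  # AP: global average pool -> (b,)
    hidden = np.maximum(0.0, w1 @ pooled)             # 1x1 Conv + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # 1x1 Conv + sigmoid, in (0, 1)
    return patch * weights                            # channel-wise multiplication

rng = np.random.default_rng(0)
b, hidden_dim = 50, 8          # hidden_dim is a hypothetical bottleneck width
w1 = rng.normal(size=(hidden_dim, b))
w2 = rng.normal(size=(b, hidden_dim))
patch = rng.random((9, 9, b))

out = band_selection(patch, w1, w2)
out_rot = band_selection(np.rot90(patch, k=1, axes=(0, 1)), w1, w2)
# Rotating the input only rotates the output; the band weights are unchanged.
print(np.allclose(out_rot, np.rot90(out, k=1, axes=(0, 1))))
```

Because the band weights depend only on the spatially pooled vector, a spatial rotation of the patch cannot change them, which is the property the text attributes to the BS module.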

2.3. Local Spatial Feature Enhancement

In order to obtain the class of the pixel $x_i$, both spectral and spatial features are taken into account [26,56]. Methods based on spatial-spectral features are widely used for HSI classification [19,57]. Meanwhile, adjacent samples may belong to the same class [58]. Thus, the pixels spatially adjacent to the center pixel $x_i^c$ can be used to help classify pixel $x_i$ [59]. However, methods based on spatial-spectral features are sensitive to spatial rotation [13]. Existing spatial-spectral feature methods do not sufficiently consider the position relationship between pixels. For the same area, rotation of the imaging device causes the collected hyperspectral images to have various viewing angles. The resulting change in spatial location between pixels leads to a decline in classification accuracy.
This paper proposes a simple and effective module named Local Spatial Feature Enhancement (LSFE) to solve the above problem. The LSFE module contains a rotate operator, a feature coding module, and an average pooling layer, as shown in Figure 5.
Specifically, each spatial-spectral feature is divided into eight non-overlapping regions using the center pixel as a reference, so the central angle of each region is $2\pi/8$. Let $I_{P \times P}^{S,\, i(2\pi)/8}$ denote the $i$-th region, where $i \in \{0, 1, \ldots, 7\}$. Then, each time, all regions are rotated by $2\pi/8$ to produce a new spectral-spatial feature. Seven rotations are needed to produce eight different spectral-spatial features.
Figure 6 shows an example. The blue area has a central angle of $2\pi/8$. The radius of this area is $r = \lceil P/2 \rceil$, where $P$ is the size of the spectral-spatial feature and $\lceil \cdot \rceil$ stands for rounding up. For instance, if the size of the spatial-spectral feature is 5 × 5, the radius of the rotation area is 3. After the region's size is determined, each rotation by $2\pi/8$ produces a new spatial-spectral feature. As shown in the second spatial-spectral feature in Figure 6, the blue region rotates to the corresponding position, and the other regions rotate similarly. Therefore, the original spatial-spectral feature generates eight spatial-spectral features. This approach has the following benefits. (1) It is intuitive and straightforward to understand. (2) It can extract the position relationship between pixels without changing the shape of the spatial-spectral features. (3) The new spatial-spectral features can be fed directly into the network to extract features. (4) It enhances the spatial features of the central pixel and improves the classification accuracy for the central pixel.
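The text does not spell out how a $2\pi/8$ sector rotation is discretized on the pixel grid. One plausible reading, assumed here and not confirmed by the source, treats the patch as concentric square rings around the center pixel: ring $k$ holds $8k$ pixels, so rotating by one sector is a cyclic shift of $k$ positions along that ring. The helper names are hypothetical, and the sketch does not use the radius $r$ defined above.

```python
import numpy as np

def ring_coords(c, k):
    """Clockwise coordinates of the square ring at distance k from center (c, c)."""
    top = [(c - k, c - k + j) for j in range(2 * k)]
    right = [(c - k + j, c + k) for j in range(2 * k)]
    bottom = [(c + k, c + k - j) for j in range(2 * k)]
    left = [(c + k - j, c - k) for j in range(2 * k)]
    return top + right + bottom + left            # 8 * k positions

def rotate_sectors(patch, steps):
    """Rotate a (P, P, ...) patch clockwise by steps * 2*pi/8 around its center."""
    c = patch.shape[0] // 2
    out = patch.copy()
    for k in range(1, c + 1):
        rows, cols = (np.array(v) for v in zip(*ring_coords(c, k)))
        # One sector spans k positions on ring k, so shift by steps * k.
        out[rows, cols] = np.roll(patch[rows, cols], steps * k, axis=0)
    return out

patch = np.arange(25).reshape(5, 5)
views = [rotate_sectors(patch, i) for i in range(8)]   # original plus 7 rotations
print(views[1])
```

Under this assumption every view is a permutation of the original pixels, so no feature is lost or duplicated, the patch shape is unchanged, and two sector steps coincide with an exact 90° rotation of the patch.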
After rotation, these spectral-spatial features are fed into a weight-shared feature coding module to obtain the corresponding spectral-spatial feature. The feature encoding network mainly includes two spatial attention layers and multi-layer feature extraction layers. Table 4 lists the detailed structures of the feature encoding network.
Each pixel surrounding the center pixel may have different effects on the center pixel. Thus, the weights of different pixels on the center pixel need to be recalibrated through the spatial attention mechanism. The spatial attention mechanism is shown in Figure 7. Different spectral curves represent different land-cover categories. Thus, spectral features can be used to change the pixel weights. The formula is as follows:
$$
F_m = \max(f(F)) \tag{3}
$$
$$
F_a = \mathrm{average}(f(F)) \tag{4}
$$
where $F_m$ and $F_a$ represent the features after max pooling and average pooling and $f(F)$ is the spectral-spatial feature. Next, these features are concatenated along the spectral dimension:
$$
F_c = \mathrm{concat}(F_m, F_a) \tag{5}
$$
where $F_c$ is the feature after concatenation. Then, spatial attention can be calculated as follows:
$$
F_{SpaA} = \sigma(\mathrm{Conv}(F_c))\, f(F) \tag{6}
$$
where $F_{SpaA}$ represents the spectral-spatial features after spatial attention. Then, this feature is fed to the multi-layer feature extraction layers. The output of the multi-layer feature extraction layers is $F_v^i$:
$$
F_v^i = \mathrm{FCM}\big(I_{P \times P}^{S,\, i(2\pi)/8}\big) \tag{7}
$$
where $\mathrm{FCM}(\cdot)$ denotes the weight-shared feature coding module, which includes two spatial attention layers and multi-layer feature extraction layers, and $F_v^i$ represents the output feature of the $i$-th branch. Finally, the $F_v^i$ are pooled into a feature vector, and the operation is defined with Equation (8):
$$
F_{LSFE} = \frac{1}{8}\sum_{i=0}^{7} F_v^i \tag{8}
$$
The output features of the LSFE module have the following functions. First, the output features extract the influence of surrounding pixels on the center pixel. Second, these features also contain the position relationship of each pixel and have rotation invariance.
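The following sketch strings Equations (3) to (8) together to show why averaging a weight-shared encoder over the eight rotated views gives a rotation-invariant output. Several parts are assumptions rather than the authors' design: the rotate operator is the ring-shift discretization (ring $k$ shifted by $k$ positions per sector step), pooling in Equations (3) and (4) is taken along the band axis, the attention convolution uses a hypothetical 3 × 3 kernel, and the multi-layer feature extraction layers of Table 4 are replaced by simply reading out the center-pixel vector.

```python
import numpy as np

def ring_coords(c, k):
    """Clockwise coordinates of the square ring at distance k from (c, c)."""
    top = [(c - k, c - k + j) for j in range(2 * k)]
    right = [(c - k + j, c + k) for j in range(2 * k)]
    bottom = [(c + k, c + k - j) for j in range(2 * k)]
    left = [(c + k - j, c - k) for j in range(2 * k)]
    return top + right + bottom + left

def rotate_sectors(patch, steps):
    """Assumed rotate operator: shift ring k by steps * k positions."""
    c = patch.shape[0] // 2
    out = patch.copy()
    for k in range(1, c + 1):
        rows, cols = (np.array(v) for v in zip(*ring_coords(c, k)))
        out[rows, cols] = np.roll(patch[rows, cols], steps * k, axis=0)
    return out

def conv_same(x, kernel):
    """Zero-padded cross-correlation of (P, P, C) with (K, K, C) -> (P, P)."""
    K, P = kernel.shape[0], x.shape[0]
    xp = np.pad(x, ((K // 2, K // 2), (K // 2, K // 2), (0, 0)))
    out = np.empty((P, P))
    for r in range(P):
        for col in range(P):
            out[r, col] = np.sum(xp[r:r + K, col:col + K, :] * kernel)
    return out

def spatial_attention(feat, kernel):
    f_m = feat.max(axis=2, keepdims=True)          # Eq. (3): max over bands
    f_a = feat.mean(axis=2, keepdims=True)         # Eq. (4): mean over bands
    f_c = np.concatenate([f_m, f_a], axis=2)       # Eq. (5)
    attn = 1.0 / (1.0 + np.exp(-conv_same(f_c, kernel)))
    return attn[:, :, None] * feat                 # Eq. (6)

def encode(view, kernel):
    """Stand-in for FCM: spatial attention, then the center-pixel vector."""
    c = view.shape[0] // 2
    return spatial_attention(view, kernel)[c, c]

def lsfe(patch, kernel):
    """Eq. (8): average the shared encoder over the eight rotated views."""
    return np.mean([encode(rotate_sectors(patch, i), kernel) for i in range(8)], axis=0)

rng = np.random.default_rng(0)
patch = rng.random((5, 5, 4))
kernel = rng.normal(size=(3, 3, 2))                # hypothetical 3 x 3 attention kernel
rotated = np.rot90(patch, k=-1, axes=(0, 1))       # 90-degree clockwise turn
print(np.allclose(lsfe(patch, kernel), lsfe(rotated, kernel)))
```

A single branch generally gives different outputs for the patch and its rotated copy, because the attention kernel is not symmetric. The pooled output agrees because rotating the input merely reorders the set of eight views that Equation (8) averages over.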

2.4. Lightweight Feature Enhancement

A lightweight feature enhancement module (LWFE) is proposed to enhance the output features of the local spatial feature enhancement module. This module mainly focuses on enhancing important features, suppressing insignificance features, and improving classification accuracy.
The LWFE is shown in Figure 5. The output feature of the LSFE is fed to the LWFE module. Table 5 lists the details of the LWFE module. Its formulation is defined with Equation (9):
$$
F^{LWFE}_{1 \times 1} = \mathrm{Ave}\big(\mathrm{ReLU}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(F^{LSFE}_{K \times K}))))\big) \tag{9}
$$
where $\mathrm{Ave}(\cdot)$ denotes the average pooling; $\mathrm{Conv}(\cdot)$ represents the 2D convolutional layer; $\mathrm{ReLU}(\cdot)$ is the activation function, which is defined as $\mathrm{ReLU}(x) = \max(0, x)$; $F^{LSFE}_{K \times K}$ denotes the output feature of the LSFE module, and $K \times K$ denotes the size of the feature; and $F^{LWFE}_{1 \times 1}$ represents the output feature of the LWFE module, which is also an enhanced feature.
The kernel size of all convolutional layers of LWFE is 1 × 1. The reasons for using a convolution kernel of this size are as follows. (1) This convolution kernel can reduce the network parameters while enhancing the features. (2) It also maintains rotational invariance. Therefore, the whole LWFE module also retains rotation invariance. The output feature of the local spatial feature enhancement module is rotation invariant, and the LWFE module is also rotation invariant, so the whole network is rotation invariant.
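Equation (9) can be sketched in a few lines of NumPy. A 1 × 1 convolution applies the same linear map at every pixel, so it is written here as a matrix product over the channel axis. The channel widths are hypothetical (the real ones are in Table 5) and biases are omitted for brevity.

```python
import numpy as np

def conv1x1(feat, weight):
    """1x1 convolution: the same linear map applied at every pixel."""
    return feat @ weight                      # (K, K, C_in) @ (C_in, C_out)

def lwfe(feat, w1, w2):
    """Equation (9): two 1x1 Conv + ReLU stages, then global average pooling."""
    h = np.maximum(0.0, conv1x1(feat, w1))
    h = np.maximum(0.0, conv1x1(h, w2))
    return h.mean(axis=(0, 1))                # Ave -> vector of length C_out

rng = np.random.default_rng(0)
feat = rng.random((5, 5, 32))                 # hypothetical K = 5, 32 channels
w1 = rng.normal(size=(32, 16))
w2 = rng.normal(size=(16, 16))

vec = lwfe(feat, w1, w2)
vec_rot = lwfe(np.rot90(feat, k=1, axes=(0, 1)), w1, w2)
print(np.allclose(vec, vec_rot))
```

Since every pixel is transformed independently and the result is averaged over all positions, any rearrangement of the pixels, including a rotation, leaves the output vector unchanged up to floating-point rounding.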
Finally, the enhanced feature is fed into a classifier to complete the classification.
$$
C_p = \max_{i} \frac{e^{F_i}}{\sum_{j}^{16} e^{F_j}} \tag{10}
$$
where $C_p$ is the predicted category, and $F_i$ and $F_j$ are the $i$-th and $j$-th features of the feature vector of the LWFE module, respectively; that is, the category with the highest probability value is the predicted category of the network.

2.5. Loss Function

Due to the imbalance of samples, the features of classes with few samples may be lost during network training, which leads to low classification accuracy for these classes. Therefore, this paper presents a dynamically weighted cross-entropy loss according to the complexities of samples with different categories to improve the accuracy.
The proposed dynamically weighted cross-entropy loss in the $m$-th training epoch is defined as Equation (11):
$$
J_m\big(x^{(t)}, y^{(t)} \mid \theta, \xi_{m,c}\big) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{c=1}^{C} \xi_{m,c}\cdot I\big(y^{(t)} = c\big)\,\log\frac{\exp\big(\theta_c^{\top} x^{(t)}\big)}{\sum_{k=1}^{C}\exp\big(\theta_k^{\top} x^{(t)}\big)} \tag{11}
$$
where $J_m$ represents the loss; $x^{(t)}$ and $y^{(t)}$ denote the $t$-th patch cube and the corresponding category label; $\theta$ denotes the parameters of the softmax layer; $T$ denotes the number of samples in a training batch; $C$ denotes the number of categories of the dataset; $I(\cdot)$ denotes the indicator function, which equals one if the condition is satisfied and zero otherwise; and $\xi_{m,c}$ denotes the weight coefficient of the $c$-th category in the $m$-th training epoch, defined by Equation (12):
$$
\xi_{m,c} = \frac{\sum_{k=1}^{C} A_{m-1,k}}{C \cdot A_{m-1,c} + \tau} \tag{12}
$$
where $A_{m,k}$ denotes the validated accuracy of the $k$-th category in the $m$-th training epoch and $\tau$ is a small constant to avoid dividing by zero, which is set to $10^{-9}$ in this work.
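A compact NumPy sketch of Equations (11) and (12) follows. It is illustrative only: `logits` stands in for the softmax-layer scores $\theta_c^{\top} x^{(t)}$, labels are integer class indices rather than one-hot vectors, and the function names are hypothetical. How the weights are initialized in the first epoch, before any validation accuracy exists, is not stated in the text and is left out.

```python
import numpy as np

def class_weights(prev_acc, tau=1e-9):
    """Equation (12): weights from the previous epoch's per-class accuracy."""
    prev_acc = np.asarray(prev_acc, dtype=np.float64)
    return prev_acc.sum() / (prev_acc.size * prev_acc + tau)

def weighted_cross_entropy(logits, labels, weights):
    """Equation (11) for a batch of T logit vectors and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_prob[np.arange(len(labels)), labels]       # log-probability of true class
    return -np.mean(weights[labels] * picked)

logits = np.array([[2.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
weights = class_weights([0.5, 1.0])      # class 0 was harder in the previous epoch
print(weights, weighted_cross_entropy(logits, labels, weights))
```

Each weight is the mean class accuracy divided by that class's accuracy, so a class that was classified poorly in the previous epoch contributes more to the loss in the current one, and equal accuracies reduce the loss to the ordinary cross-entropy.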
The proposed SSARIN consists of the above three modules BS, LSFE, and LWFE. The corresponding loss functions are proposed according to the network characteristics. We introduce the experiments to demonstrate the proposed algorithm in the following.

3. Results

In order to verify the performance of the different methods for HSI classification, extensive experiments have been carried out in this section. Section 3.1 introduces the datasets and evaluation metrics used in the experiments. Section 3.2 describes the compared methods and experimental design. Finally, the experimental results are presented in Section 3.3.

3.1. Data Description and Evaluation Metrics

Five public airborne hyperspectral image datasets (Indian Pines, Salinas, Pavia University, Pavia Center, and Houston) were used to evaluate the methods.

3.1.1. Data Description

  • Indian Pines (IP): As shown in Figure 8a, the IP dataset is composed of 145 × 145 pixels with 200 bands. Figure 8b shows the ground truth map of the IP dataset. This dataset has 16 different land-cover classes in agricultural areas and includes 10,249 samples. The numbers of training and testing pixels are listed in Table 6.
  • Salinas (SA): Figure 9a shows the pseudo-color image of the SA dataset. It contains 512 × 217 pixels with 204 bands. Similar to the IP dataset, this dataset has 16 categories and 54,129 samples in agricultural areas, as shown in Figure 9b. In total, 2% of the pixels are randomly selected as training data, and all samples are used as testing data. Table 7 lists the class names and the numbers of training and testing samples.
  • Pavia University (PU): The PU dataset includes 103 available spectral channels. Its height and width are 610 and 340 pixels. There are 42,776 samples from nine different land-cover categories in the PU dataset. Figure 10 shows the pseudo-color image and the ground truth classification map. Table 8 lists the training and testing data of the PU dataset.
  • Pavia Center (PC): The PC dataset comprises 1096 × 715 pixels with 102 available bands. Like the PU dataset, it includes nine classes, with 148,152 samples in an urban area. The pseudo-color image and the ground truth are shown in Figure 11. We randomly selected 0.5% of the data for each category as the training pixels, and the entire dataset is used as the test set. Table 9 shows the numbers of testing and training samples.
  • Houston: Houston is widely used as a benchmark dataset to evaluate the performance of HSI classification. It comprises 349 × 1905 pixels with 144 channels. There are 15 challenging land-cover classes and 15,029 samples. Table 10 lists the numbers of training and testing samples. The visualization of the image is given in Figure 12.

3.1.2. Evaluation Metrics

Three metrics were used to measure the performance of all algorithms. Let $M \in \mathbb{R}^{n \times n}$ denote the confusion matrix, where $n$ is the number of classes.
  • Average Accuracy (AA) is the mean of the per-class accuracies:
    $$
    AA = \frac{1}{n}\sum_{i=1}^{n} \frac{M_{i,i}}{\sum_{j=1}^{n} M_{i,j}} \tag{13}
    $$
  • Overall Accuracy (OA) denotes the ratio of the number of correctly classified samples to the total number of samples:
    $$
    OA = \frac{\sum_{i=1}^{n} M_{i,i}}{\sum_{i=1}^{n}\sum_{j=1}^{n} M_{i,j}} \tag{14}
    $$
  • Kappa coefficient ($\kappa$) measures the consistency between the predicted results and the ground truth:
    $$
    \kappa = \frac{OA - p_e}{1 - p_e} \tag{15}
    $$
    $$
    p_e = \frac{\sum_{k=1}^{n}\Big(\sum_{i=1}^{n} M_{i,k} \cdot \sum_{j=1}^{n} M_{k,j}\Big)}{\Big(\sum_{i=1}^{n}\sum_{j=1}^{n} M_{i,j}\Big)^{2}} \tag{16}
    $$
    where $M_{i,j}$ is the entry in the $i$-th row and $j$-th column of the matrix $M$, that is, the number of samples of the $i$-th category that are classified as the $j$-th class, and $\sum$ stands for summation. Larger values represent better results.
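The three metrics can be computed directly from a confusion matrix. The sketch below assumes, as in the definition of $M_{i,j}$ above, that rows index the true category and columns the predicted class; the function name and the toy matrix are illustrative.

```python
import numpy as np

def classification_metrics(M):
    """Return (OA, AA, kappa) for a confusion matrix with true classes as rows."""
    M = np.asarray(M, dtype=np.float64)
    total = M.sum()
    oa = np.trace(M) / total                                # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))                # mean per-class accuracy
    pe = np.sum(M.sum(axis=0) * M.sum(axis=1)) / total**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Toy two-class example: 8 of 10 and 9 of 10 samples classified correctly.
oa, aa, kappa = classification_metrics([[8, 2], [1, 9]])
print(oa, aa, kappa)
```

For this toy matrix, OA and AA are both 0.85 and the chance agreement is 0.5, giving a kappa of 0.7. On imbalanced datasets OA and AA usually differ, because OA is dominated by the large classes.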

3.2. Compared Methods and Experimental Design

This section briefly introduces the compared methods and the experimental design.

3.2.1. Compared Methods

Several representative and widely used deep learning algorithms are employed as compared methods. According to the networks used, these methods are divided into CNN-based and transformer-based methods. CNN-based networks include 1D CNN [14], 2D CNN [60], 3D CNN [61], RNN [21], SSRN [27], HybridSN [31], and RIAN [13]. Transformer-based methods include SF [45], SSFTT [46], and GAHT [47].
These algorithms are described as follows.
  • 1D CNN [14]: This method uses two 1D convolutional layers to extract features.
  • 2D CNN [60]: The spectral-spatial features are stacked into a 2D matrix. The matrix is treated as an image and fed into a CNN.
  • 3D CNN [61]: This method uses 3D convolutional layers to extract classification features.
  • RNN [21]: The authors use an RNN with a new activation function named parametric rectified tanh for HSI classification.
  • SSRN [27]: A spectral-spatial residual network is proposed to classify hyperspectral samples.
  • HybridSN [31]: This algorithm utilizes three layers of 3D CNN to extract spectral-spatial features. The output features are fed into a 2D CNN to classify hyperspectral pixels.
  • RIAN [13]: The center spectral attention module recalibrates the spectral channels of image patches. The rectified spatial attention modules extract spectral-spatial features. A residual network connects these modules.
  • SF [45]: A transformer-based backbone network.
  • SSFTT [46]: A 3D and a 2D convolution layer are employed to extract spectral-spatial features. The output features are fed into a Gaussian-weighted tokenizer for feature transformation. Finally, an encoder module is utilized for feature learning to classify HSI samples.
  • GAHT [47]: This work utilizes the hierarchical transformer network with the grouped pixel embedding module. This module confines the multi-head self-attention for extracting the spatial-spectral feature.

3.2.2. Experiment Design

Let $X \in \mathbb{R}^{H \times W \times N}$ be the original HSI, where $H$ and $W$ represent the height and width and $N$ denotes the number of channels. In the data preprocessing, all HSI datasets are normalized by:
$$X = \frac{X - \min(X)}{\max(X) - \min(X)}$$
where $\min(X)$ and $\max(X)$ represent the minimum and maximum values of the original HSI data $X$.
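The normalization step can be illustrated with a minimal sketch. It assumes the HSI cube is held as a nested Python list indexed `[row][col][band]` and that the minimum and maximum are taken over the whole cube; the function name `min_max_normalize` and the handling of a constant cube are illustrative assumptions, not taken from the paper's code.

```python
def min_max_normalize(cube):
    """Scale every value of an H x W x N cube into [0, 1] using the global min and max."""
    values = [v for row in cube for pixel in row for v in pixel]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant cube: avoid division by zero (assumption; the paper does not cover this case).
        return [[[0.0 for _ in pixel] for pixel in row] for row in cube]
    scale = hi - lo
    return [[[(v - lo) / scale for v in pixel] for pixel in row] for row in cube]


# Toy 2 x 2 cube with 2 bands.
cube = [[[10, 20], [30, 40]],
        [[50, 60], [70, 90]]]
normalized = min_max_normalize(cube)
```

After this step the smallest value in the cube maps to 0 and the largest to 1, so all bands share a common numeric range before patches are extracted.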
To ensure the best performance of the compared algorithms, we adopt the following experimental design. All the methods are implemented in PyTorch 1.12.1 with Python 3.9.13. An NVIDIA GeForce RTX 3090 graphics processing unit (GPU) with 24 GB of memory is used to accelerate the experiments. For the compared methods, the experimental settings are identical to those in the original literature and code, including the learning rate scheduler, the optimizer, and the HSI patches. In our method, the initial learning rate of the learning rate scheduler is 0.001, and it is multiplied by 0.6 after every 10 epochs. The Adaptive Moment Estimation (Adam) optimizer with its default settings is employed. Furthermore, a weight decay of 0.00005 is used when updating the training parameters. The training and testing batch size of all methods is 64. The number of training epochs is 200.
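The learning rate schedule described above is a step decay, which the following sketch writes out in plain Python. It assumes that epochs are counted from 0 and that the first multiplication by 0.6 takes effect once epoch 10 starts; the function name `learning_rate` is illustrative. In PyTorch, such a schedule would presumably be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.6)` on top of `torch.optim.Adam(..., lr=0.001, weight_decay=5e-5)`, although the text does not name the scheduler class.

```python
def learning_rate(epoch, initial_lr=0.001, gamma=0.6, step_size=10):
    """Learning rate used during `epoch` (0-indexed) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 200 training epochs.
schedule = [learning_rate(epoch) for epoch in range(200)]
```

Under these assumptions the rate stays at 0.001 for epochs 0 to 9, drops to 0.0006 for epochs 10 to 19, and has been multiplied by 0.6 nineteen times by the final epoch.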
Five airborne hyperspectral image datasets (IP, SA, PU, PC, and Houston) covering urban and agricultural regions are used to evaluate the algorithms. A different training proportion is employed for each dataset. For the IP and Houston datasets, 10% of the samples are randomly selected as training data, and all samples are used for testing. For the SA and PU datasets, the training proportion is 2%. For the PC dataset, 0.5% of the samples are selected for training. Table 6, Table 7, Table 8, Table 9 and Table 10 list the number of training and testing pixels of the five datasets.
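The random selection of training samples can be sketched as follows. This is only an illustration under several assumptions that the text does not state: sampling is done per class, label 0 marks unlabelled background, and at least one sample is kept for every class. The function name `sample_training_indices` is hypothetical.

```python
import math
import random


def sample_training_indices(labels, proportion, seed=0):
    """Pick a random `proportion` of the labelled pixels of each class; return sorted indices."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        if label == 0:  # 0 marks unlabelled background (assumed convention)
            continue
        by_class.setdefault(label, []).append(index)
    train = []
    for indices in by_class.values():
        count = max(1, math.ceil(proportion * len(indices)))  # keep at least one sample per class
        train.extend(rng.sample(indices, count))
    return sorted(train)


# Toy label vector: one background pixel and three classes of 50, 30, and 5 pixels.
labels = [0] + [1] * 50 + [2] * 30 + [3] * 5
train_idx = sample_training_indices(labels, proportion=0.10)
# As in the text, all labelled samples are used for testing.
test_idx = [i for i, label in enumerate(labels) if label != 0]
```

Fixing the seed makes the split reproducible, which matters when comparing methods on the same training pixels.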

3.3. Experimental Results

In this section, we analyze the experimental results of the methods on the public datasets in detail, covering the patch size, the ablation experiment, and the performance of each algorithm.

3.3.1. Size of HSI Patches

The size of the patch determines how much spatial information is used for classification. Therefore, the patch size has a crucial impact on classification accuracy. The effects of different spatial sizes are first explored in this experiment. A series of patch sizes {7, 9, 11, 13, 15} is considered.
As shown in Figure 13, the accuracy does not always improve as the patch size increases. For the IP and SA datasets, the accuracy of the proposed algorithm generally increases when the patch size grows from 9 to 15. The main reason is that the sample areas are regular and dense, so additional local spatial information provides more effective classification information. In other words, the IP and SA datasets have larger smooth regions [47]. Thus, the patch size is set to 15 × 15 for the SA dataset and to 13 × 13 for the IP dataset. In contrast, the images of the PU, PC, and Houston datasets contain small and separate regions of land cover. Thus, the OAs on these datasets drop when the patch size exceeds a dataset-specific upper limit. The patch size is set to 9 × 9 for the PC and Houston datasets and to 13 × 13 for the PU dataset.
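To make the role of the patch size concrete, the sketch below extracts the square neighbourhood around a center pixel. It assumes a cube stored as a nested list indexed `[row][col][band]` and zero-filling for pixels outside the image; the border handling and the name `extract_patch` are assumptions, since the text does not describe how border pixels are treated.

```python
def extract_patch(cube, row, col, size):
    """Return the size x size neighbourhood centred on (row, col); out-of-image pixels are zero-filled."""
    if size % 2 == 0:
        raise ValueError("patch size must be odd so that the patch has a centre pixel")
    half = size // 2
    height, width, bands = len(cube), len(cube[0]), len(cube[0][0])
    patch = []
    for r in range(row - half, row + half + 1):
        patch_row = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            patch_row.append(list(cube[r][c]) if inside else [0.0] * bands)
        patch.append(patch_row)
    return patch


# Toy 5 x 5 image with one band whose value encodes the pixel position.
cube = [[[float(10 * r + c)] for c in range(5)] for r in range(5)]
patch = extract_patch(cube, row=0, col=2, size=3)
```

A larger `size` brings in more neighbours of the center pixel, which helps in large smooth regions but mixes in pixels of other classes when land-cover regions are small.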

3.3.2. Ablation Experiment

The proposed method consists of three parts (BS, LSFE, and LWFE). In order to verify the effect of each part, ablation experiments are conducted. Details of the experiments are as follows:
  • Baseline network: This network only contains seven 2D convolution layers and one fully connected layer.
  • Spectral-Spatial Attention (SSA) network: The spectral and spatial attention modules are added to the baseline network.
  • Lightweight feature enhancement (LWFE) network: Based on the SSA network, the lightweight feature enhancement (LWFE) module is added to the network before the fully connected layer.
  • SSARIN: Based on the LWFE network, a local spatial feature enhancement (LSFE) module is added to the network. This module is the key to ensuring rotation invariance.
Table 11 and Table 12 show the results of the ablation experiment. As shown in Table 11, each module improves the classification OAs on the different datasets. Compared to the baseline network, SSA improves the OAs by 0.21%, 0.72%, 1.15%, 0.44%, and 0.09%. This indicates that the redundant bands do not provide useful classification information and can sometimes even reduce the classification accuracy. It also illustrates that the weights of the channels differ for the spectral-spatial features. Thus, the BS module suppresses the redundant channels by recalibrating the weights of the different bands.
The OAs of the LWFE network are 0.17%, 0.14%, 0.12%, 0.08%, and 0.22% better than those of the SSA network. The LWFE module is primarily used to boost the output features of the LSFE module to improve classification accuracy. The SSARIN network improves the OAs by only 0.06%, 0.03%, 0.02%, 0.05%, and 0.11% compared to the LWFE network. However, the LSFE module added to the LWFE network effectively maintains rotation invariance. Table 12 displays the OAs for different rotation degrees. When the inputs are rotated by different degrees, the OAs of the LWFE network drop on all datasets, while those of SSARIN remain stable. The ablation experiment thus indicates that the LSFE module both improves the classification accuracy and retains the rotational invariance of the features, and it demonstrates the role of each module and its impact on accuracy.

3.3.3. Performance of the Compared Methods

To evaluate the methods, the IP, SA, PU, PC, and Houston datasets are employed. The compared algorithms include 1D CNN [14], 2D CNN [60], 3D CNN [61], RNN [21], SSRN [27], HybridSN [31], RIAN [13], SF [45], SSFTT [46], and GAHT [47].
  1. Indian Pines: The OAs of 1D CNN, 2D CNN, 3D CNN, RNN, SSRN, HybridSN, RIAN, SF, SSFTT, GAHT, and SSARIN at different rotation degrees are listed in Table 13. When the rotation degree is 0, the spectral-spatial algorithms perform better than the spectral methods. Meanwhile, the difference in OA between the CNN-based and transformer-based models is insignificant. The OAs of 1D CNN, RNN, RIAN, and SSARIN are 85.73%, 78.73%, 94.56%, and 98.59%, respectively, at 0, 90, 180, and 270 degrees, so all four methods are rotation invariant. 1D CNN, RNN, and RIAN are rotation invariant because their convolution kernel sizes are all 1 × 1, and the 1 × 1 convolution is rotation invariant [13]. Among them, 1D CNN and RNN are based on spectral features only and are therefore not sensitive to spatial rotation. In contrast, SSARIN is based on spatial-spectral features and does not rely on 1 × 1 convolution to achieve rotation invariance. Instead, SSARIN obtains the position of the center pixel relative to the surrounding pixels through the LSFE module.
    The OAs of 2D CNN, 3D CNN, SSRN, HybridSN, SF, SSFTT, and GAHT decrease significantly at 90, 180, and 270 degrees. At a 90-degree rotation, the OAs of these methods decrease by 5.5%, 16.65%, 1.1%, 7.29%, 15.24%, 11.25%, and 18.03%, respectively. The main reason is that these spectral-spatial convolutions ignore the position information between pixels. Rotation therefore changes the pixel position information, which leads the network to classify incorrectly.
    To further evaluate the compared algorithms, the accuracy in each class is listed in Table 14 and Table 15. At 0 degrees, the proposed SSARIN achieves state-of-the-art performance compared with the other methods. It significantly improves the accuracy of the “Corn”, “Hay-Windrowed”, “Oats”, “Soybean-Notill”, and “Buildings-Grass-Trees-Drives” classes. At 90 degrees, it not only maintains the best accuracy in the above categories but also achieves the best accuracy in the “Corn-Notill”, “Corn-Mintill”, “Grass-Pasture”, “Soybean-Mintill”, and “Woods” categories. By comparison, SSRN achieves the best accuracy in seven classes without rotation but in only four classes after a 90-degree rotation.
    Figure 14 shows the classification maps on the IP dataset, which display the performance of each method intuitively. When the image is rotated, the edge information of the HSI patch changes. Therefore, the compared methods that do not sufficiently extract the spatial edge position information suffer a drop in test accuracy. By learning the spatial edge position information sufficiently, the proposed SSARIN obtains superior classification results and is invariant to rotation.
  2. Salinas: Table 16 reports the performance of these methods at different rotation degrees. Similar to the IP dataset, the OAs of 1D CNN, RNN, RIAN, and SSARIN at different angles are 91.61%, 88.83%, 97.13%, and 99.81%, respectively. SSARIN performs better than the other networks in terms of OA. When the angles are 90, 180, and 270 degrees, the OAs of 2D CNN, 3D CNN, SSRN, HybridSN, SF, SSFTT, and GAHT decrease significantly. At a rotation degree of 180, the OAs of these methods decrease by 2.93%, 11.32%, 0.58%, 1.8%, 5.34%, 6.24%, and 10.78%, respectively. This is a smaller drop than on the IP dataset. The reason is that the sample areas of the SA dataset are more regular and smooth than those of the IP dataset. According to the experiment, the transformer-based methods are more sensitive to rotation than the CNN-based methods. The reason is that the CNN-based convolutional layer focuses on local information, while the transformer-based approach focuses more on global information. Rotation changes the local information of the HSI, resulting in the transformer-based methods performing worse than the CNN-based methods.
    To further evaluate the compared algorithms, the accuracy in each category at 180 degrees is listed in Table 17. At 180 degrees, SSARIN maintains the best accuracy in 11 classes. Figure 15 shows the classification maps on the SA dataset, which represent the performance of each method intuitively. The compared methods perform weakly on the “Vinyard-Untrained” and “Grapes-Untrained” classes, whereas the proposed algorithm performs well on them. Moreover, the classification map of SSARIN is also smooth.
  3. Pavia University: Table 18 lists the OAs of all methods at various rotation angles. 1D CNN, RNN, and RIAN achieve rotation invariance because their convolution kernel sizes are all 1 × 1. The OAs of these methods at 0, 90, 180, and 270 degrees are 89.64%, 88.83%, and 98.05%, respectively, while the OA of SSARIN is 99.05%. When the rotation degree is 0, the performance of SSARIN is second only to that of SSRN. When the rotation degree changes, the OAs of 2D CNN, 3D CNN, SSRN, HybridSN, SF, SSFTT, and GAHT drop significantly. At a rotation degree of 270, the OAs of these methods decrease by 3.33%, 11.19%, 1%, 1.58%, 9.36%, 4.71%, and 5.6%, respectively.
    To further evaluate the compared algorithms, the quantitative indicators at a rotation of 270 degrees are shown in Table 19. SSARIN maintains the best accuracy in four classes. Meanwhile, it achieves the best OA, AA, and Kappa. Figure 16 shows the performance of each method intuitively. Furthermore, the classification map of SSARIN is smooth.
  4. Pavia Center: Table 20 lists the OAs of all the algorithms at different rotation degrees. The OAs of 1D CNN, RNN, RIAN, and SSARIN at 0, 90, 180, and 270 degrees are 97.44%, 97.34%, 98.35%, and 98.05%, respectively. All of these methods are rotation invariant. When the rotation degree is 0, SSARIN achieves the best performance. When the rotation degree changes, the OAs of 2D CNN, 3D CNN, SSRN, HybridSN, SF, SSFTT, and GAHT drop significantly. At a rotation degree of 90, the OAs of these methods decrease by 0.88%, 1.79%, 0.13%, 0.05%, 1.08%, 1.18%, and 0.86%, respectively. Compared to the IP, SA, and PU datasets, these algorithms show little accuracy degradation on the PC dataset. We attribute this to the small number of categories in the PC dataset and the concentration of its sample areas.
    To further evaluate the compared algorithms, the metrics at 90 degrees are shown in Table 21. SSARIN maintains the best OA, AA, and Kappa. The performance of each method is represented intuitively in Figure 17.
  5. Houston: The OAs of 1D CNN, 2D CNN, 3D CNN, RNN, SSRN, HybridSN, RIAN, SF, SSFTT, GAHT, and SSARIN at different rotation degrees are listed in Table 22. The OAs of 1D CNN, RNN, RIAN, and SSARIN are 91.96%, 91.90%, 97.33%, and 99.30%, respectively. At a rotation degree of 180, the OAs of 2D CNN, 3D CNN, SSRN, HybridSN, SF, SSFTT, and GAHT decrease by 11.48%, 6.87%, 0.76%, 0.64%, 2.78%, 1.48%, and 3.2%, respectively. To further evaluate the compared algorithms, the OA, AA, and Kappa at 180 degrees are shown in Table 23. SSARIN maintains the best accuracy in 12 classes. Meanwhile, it achieves the best OA, AA, and Kappa. Figure 18 shows intuitively that the classification map of SSARIN is smoother than those of the compared methods.
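The argument that a 1 × 1 convolution is unaffected by rotation, while a larger spatial kernel is not, can be checked on a toy patch. The sketch below is an illustration only: the patch values, the weights, and the Sobel-like 3 × 3 kernel are arbitrary choices, and the helper names are hypothetical rather than part of any compared network.

```python
def rotate90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def conv1x1(patch, weights):
    """1 x 1 convolution: each pixel's bands are combined independently of its neighbours."""
    return [[sum(w * v for w, v in zip(weights, pixel)) for pixel in row] for row in patch]


def conv3x3_center(grid, kernel):
    """Response of a 3 x 3 kernel placed on the centre of a 3 x 3 single-band grid."""
    return sum(kernel[i][j] * grid[i][j] for i in range(3) for j in range(3))


# Toy 3 x 3 patch with two bands per pixel.
patch = [[[1, 0], [2, 1], [3, 0]],
         [[4, 1], [5, 2], [6, 1]],
         [[7, 0], [8, 1], [9, 0]]]
weights = [2, -1]

feature = conv1x1(patch, weights)
feature_rot = conv1x1(rotate90(patch), weights)
# The 1 x 1 convolution commutes with rotation, so the centre response is unchanged.
same_center = feature[1][1] == feature_rot[1][1]

# A 3 x 3 kernel mixes neighbours by position, so its centre response changes after rotation.
band0 = [[pixel[0] for pixel in row] for row in patch]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
response = conv3x3_center(band0, kernel)
response_rot = conv3x3_center(rotate90(band0), kernel)
```

Here the 1 × 1 feature map of the rotated patch is simply the rotated feature map of the original patch, so the center-pixel feature and any spatially pooled statistic are identical, whereas the 3 × 3 response differs between the two orientations.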

4. Discussion

Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18 illustrate the classification maps of different methods on five datasets. Table 13, Table 14, Table 15, Table 16, Table 17, Table 18, Table 19, Table 20, Table 21, Table 22 and Table 23 detail the class accuracy, AA, OA, and kappa coefficient for these algorithms on corresponding datasets. Our algorithm not only delivers superior results, but also maintains consistent overall accuracy at varying rotation angles. Building upon this analysis, we explore the time efficiency of the models.
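For readers unfamiliar with the three summary metrics, the sketch below computes OA, AA, and the kappa coefficient from a confusion matrix. It assumes the standard definitions (rows are true classes, columns are predicted classes); the paper's own metric code is not shown in this section, and the toy matrix is invented for illustration.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(row[j] for row in confusion) for j in range(n)]
    # Overall accuracy: fraction of all samples on the diagonal.
    oa = sum(confusion[i][i] for i in range(n)) / total
    # Average accuracy: mean of the per-class recalls.
    aa = sum(confusion[i][i] / row_sums[i] for i in range(n)) / n
    # Kappa: agreement corrected for the chance agreement implied by the marginals.
    chance = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (oa - chance) / (1 - chance)
    return oa, aa, kappa


# Toy two-class confusion matrix (hypothetical counts).
confusion = [[50, 10],
             [5, 35]]
oa, aa, kappa = classification_metrics(confusion)
```

OA weights every sample equally, AA weights every class equally, and kappa discounts agreement that could arise by chance, which is why the three values can rank methods differently on imbalanced datasets.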
Table 24 outlines the training and testing time for each algorithm. Notably, 3D CNN consistently exhibits the fastest training times across all datasets. 1D CNN emerges as the model with the fastest testing times for each dataset, indicating that it performs well in terms of time efficiency during the testing phase. In contrast, SSARIN displays the longest training and testing times. There are two main reasons for this. (1) The network contains eight branches, resulting in a more complex structure. (2) When rotating the features, it is necessary to load the features from the GPU to the CPU for rotation and then reload the rotated features from the CPU back to the GPU.
Table 25 enumerates the parameters for each method. SSARIN possesses a relatively high parameter count (41,694,672), making it the second most complex model in this list. The main reason is that the network contains eight branches, and the structure of each branch is the same. Therefore, the network requires a larger number of parameters. This complexity is a factor in its longer training and testing times, as observed in the previous analysis.
1D CNN has the fewest parameters (74,196), indicating that it is the simplest model in terms of architecture. This simplicity contributes to the model’s previously observed fast testing times, as there are fewer parameters to compute during the testing phase. However, the trade-off is a limited capacity to capture complex patterns in the data, which impacts performance in classification. 2D CNN has the highest number of parameters (109,613,786), indicating that this model has the most complex architecture among the models listed. The main reason is that it uses multiple fully connected layers, and the fully connected layers contain many network nodes. This intricacy results in heightened computational demands during training and testing and increased processing times.
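The observation that fully connected layers dominate the parameter count follows directly from the standard parameter formulas, as the sketch below shows. The layer sizes are hypothetical and chosen only for illustration; they are not taken from 2D CNN or any other compared network.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters of a 2D convolution with a square kernel."""
    return kernel_size * kernel_size * in_channels * out_channels + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Number of learnable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical sizes: a 3 x 3 convolution with 64 input and 64 output channels,
# followed by a fully connected layer on a flattened 64 x 13 x 13 feature map.
conv = conv2d_params(in_channels=64, out_channels=64, kernel_size=3)
fc = linear_params(in_features=64 * 13 * 13, out_features=1024)
ratio = fc / conv
```

With these illustrative sizes the convolution has 36,928 parameters while the single fully connected layer has 11,076,608, roughly 300 times more, because its size scales with the full spatial extent of the feature map rather than with the kernel.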
The parameter counts and training times of the other algorithms do not differ significantly. Overall, although the proposed SSARIN requires more parameters and computation time than the compared methods, both remain within a reasonable and acceptable range.

5. Conclusions

This paper proposes a spectral-spatial attention rotation-invariant classification network for the airborne hyperspectral image. The SSARIN is specifically designed to explore rotation-invariant features for hyperspectral classification. It mainly contains a band selection module, a local spatial feature enhancement module, and a lightweight feature enhancement module.
In the data pre-processing stage, using PCA to reduce the spectral dimensions can effectively reduce the network parameters and training time. However, PCA is not mandatory. After pre-processing, the HSI patch is fed into the band selection module for feature extraction. The band selection (BS) module achieves redundant band suppression by recalibrating the weights of each band. Furthermore, a local spatial feature enhancement (LSFE) module is introduced to extract spectral-spatial features while maintaining rotational invariance. The LSFE module not only extracts spatial-spectral features but also records position information to maintain rotational invariance, providing a robust solution for hyperspectral classification. The proposed method is capable of extracting rotation-invariant spectral-spatial features without requiring additional parameters or constraints. Finally, a lightweight feature enhancement (LWFE) module enhances significant features and suppresses insignificant ones.
Extensive experiments conducted on five airborne hyperspectral image datasets demonstrate the superior performance of SSARIN compared to other methods, proving its robustness against spatial rotations. Moreover, SSARIN effectively extracts urban and countryside features, showcasing its versatility in various scenarios.
However, it is worth noting that the SSARIN network is more complex than the compared methods, resulting in increased computational time and a larger number of parameters. To address this issue, future research will focus on developing new lightweight rotational invariance features for hyperspectral classification, aiming to strike a balance between performance and computational efficiency.

Author Contributions

Conceptualization, Y.S., B.F. and N.W.; methodology, Y.S. and N.W.; software, B.F. and J.F.; validation, N.W., Y.C. and Y.S.; formal analysis, Y.S., Y.C. and N.W.; investigation, X.L. and G.Z.; resources, G.Z. and X.L.; data curation, G.Z.; writing—original draft preparation, Y.S. and B.F.; writing—review and editing, N.W., J.F. and G.Z.; visualization, N.W., Y.C. and J.F.; supervision, G.Z. and X.L.; project administration, G.Z. and X.L.; funding acquisition, G.Z. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Youth Innovation Promotion Association CAS, the National Natural Science Foundation of China (grant No. 42176182), and the Foundation of Shaanxi Province (grant No. 2023-YBGY-390).

Data Availability Statement

The code of the paper can be found at https://github.com/NanWangAC/SSARIN (accessed on 23 March 2023). Data are available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: Indian Pines, Salinas, Pavia University, and Pavia Center: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 1 March 2023). Houston: http://www.grss-ieee.org/community/technical-committees/data-fusion/2013-ieee-grss-data-fusion-contest (accessed on 1 March 2023).

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Zhang, L.; Zhang, L. Artificial intelligence for remote sensing data analysis: A review of challenges and opportunities. IEEE Geosci. Remote Sens. Mag. 2022, 10, 270–294.
  2. Fang, J.; Yuan, Y.; Lu, X.; Feng, Y. Robust space–frequency joint representation for remote sensing image scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7492–7502.
  3. Fang, J.; Cao, X. Multidimensional relation learning for hyperspectral image classification. Neurocomputing 2020, 410, 211–219.
  4. Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634.
  5. Xu, Y.; Gong, J.; Huang, X.; Hu, X.; Li, J.; Li, Q.; Peng, M. Luojia-HSSR: A high spatial-spectral resolution remote sensing dataset for land-cover classification with a new 3D-HRNet. Geo-Spat. Inf. Sci. 2022, 1–13.
  6. Cen, Y.; Zhang, L.; Zhang, X.; Wang, Y.; Qi, W.; Tang, S.; Zhang, P. Aerial hyperspectral remote sensing classification dataset of Xiongan New Area (Matiwan Village). J. Remote Sens. 2020, 24, 1299–1306.
  7. Licciardi, G.; Marpu, P.R.; Chanussot, J.; Benediktsson, J.A. Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles. IEEE Geosci. Remote Sens. Lett. 2011, 9, 447–451.
  8. Fang, L.; He, N.; Li, S.; Ghamisi, P.; Benediktsson, J.A. Extinction profiles fusion for hyperspectral images classification. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1803–1815.
  9. Cao, X.; Xu, L.; Meng, D.; Zhao, Q.; Xu, Z. Integration of 3-dimensional discrete wavelet transform and Markov random field for hyperspectral image classification. Neurocomputing 2017, 226, 90–100.
  10. Abdolmaleki, M.; Fathianpour, N.; Tabaei, M. Evaluating the performance of the wavelet transform in extracting spectral alteration features from hyperspectral images. Int. J. Remote Sens. 2018, 39, 6076–6094.
  11. Anand, R.; Veni, S.; Aravinth, J. Robust classification technique for hyperspectral images based on 3D-discrete wavelet transform. Remote Sens. 2021, 13, 1255.
  12. Sun, W.; Yang, G.; Peng, J.; Du, Q. Lateral-slice sparse tensor robust principal component analysis for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 107–111.
  13. Zheng, X.; Sun, H.; Lu, X.; Xie, W. Rotation-invariant attention network for hyperspectral image classification. IEEE Trans. Image Process. 2022, 31, 4251–4265.
  14. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image classification. J. Sens. 2015, 2015, 1–12.
  15. Wu, H.; Prasad, S. Semi-supervised deep learning using pseudo labels for hyperspectral image classification. IEEE Trans. Image Process. 2017, 27, 1259–1270.
  16. Makantasis, K.; Karantzalos, K.; Doulamis, A.; Doulamis, N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4959–4962.
  17. Mei, S.; Ji, J.; Geng, Y.; Zhang, Z.; Li, X.; Du, Q. Unsupervised spatial–spectral feature learning by 3D convolutional autoencoder for hyperspectral classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6808–6820.
  18. Mei, S.; Chen, X.; Zhang, Y.; Li, J.; Plaza, A. Accelerating convolutional neural network-based hyperspectral image classification by step activation quantization. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
  19. Wei, W.; Song, C.; Zhang, L.; Zhang, Y. Lightweighted Hyperspectral Image Classification Network by Progressive Bi-Quantization. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5501914.
  20. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
  21. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
  22. Hang, R.; Liu, Q.; Hong, D.; Ghamisi, P. Cascaded recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5384–5394.
  23. Luo, F.; Zhang, L.; Zhou, X.; Guo, T.; Cheng, Y.; Yin, T. Sparse-adaptive hypergraph discriminant analysis for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1082–1086.
  24. Jia, S.; Jiang, S.; Lin, Z.; Li, N.; Xu, M.; Yu, S. A survey: Deep learning for hyperspectral image classification with few labeled samples. Neurocomputing 2021, 448, 179–204.
  25. Sun, H.; Zheng, X.; Lu, X. A Supervised Segmentation Network for Hyperspectral Image Classification. IEEE Trans. Image Process. 2021, 30, 2810–2825.
  26. Sun, H.; Zheng, X.; Lu, X.; Wu, S. Spectral–Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3232–3245.
  27. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2017, 56, 847–858.
  28. Song, W.; Li, S.; Fang, L.; Lu, T. Hyperspectral image classification with deep feature fusion network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3173–3184.
  29. Wei, Y.; Zhou, Y. Spatial-aware network for hyperspectral image classification. Remote Sens. 2021, 13, 3232.
  30. He, M.; Li, B.; Chen, H. Multi-scale 3D deep convolutional neural network for hyperspectral image classification. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3904–3908.
  31. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 277–281.
  32. Xu, H.; Yao, W.; Cheng, L.; Li, B. Multiple spectral resolution 3D convolutional neural network for hyperspectral image classification. Remote Sens. 2021, 13, 1248.
  33. Lu, Z.; Xu, B.; Sun, L.; Zhan, T.; Tang, S. 3-D channel and spatial attention based multiscale spatial–spectral residual network for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4311–4324.
  34. Liu, H.; Li, W.; Xia, X.G.; Zhang, M.; Gao, C.Z.; Tao, R. Central attention network for hyperspectral imagery classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–15.
  35. Mei, S.; Li, X.; Liu, X.; Cai, H.; Du, Q. Hyperspectral image classification using attention-based bidirectional long short-term memory network. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
  36. Zhang, X.; Shang, S.; Tang, X.; Feng, J.; Jiao, L. Spectral partitioning residual network with spatial attention mechanism for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14.
  37. Liu, S.; Wang, Q.; Zhang, G.; Du, J.; Hu, B.; Zhang, Z. Using hyperspectral imaging automatic classification of gastric cancer grading with a shallow residual network. Anal. Methods 2020, 12, 3844–3853.
  38. Sun, H.; Li, S.; Zheng, X.; Lu, X. Remote Sensing Scene Classification by Gated Bidirectional Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 82–96.
  39. Liu, S.; Song, L.; Li, H.; Chen, J.; Zhang, G.; Hu, B.; Wang, S.; Li, S. Spatial weighted kernel spectral angle constraint method for hyperspectral change detection. J. Appl. Remote Sens. 2022, 16, 016503.
  40. Wang, N.; Shi, Y.; Yang, F.; Zhang, G.; Li, S.; Liu, X. Collaborative representation with multipurification processing and local salient weight for hyperspectral anomaly detection. J. Appl. Remote Sens. 2022, 16, 036517.
  41. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Cai, W.; Yang, N.; Wang, B. Multi-scale Receptive Fields: Graph Attention Neural Network for Hyperspectral Image Classification. Expert Syst. Appl. 2023, 223, 119858.
  42. Yue, J.; Zhao, W.; Mao, S.; Liu, H. Spectral–spatial classification of hyperspectral images using deep convolutional neural networks. Remote Sens. Lett. 2015, 6, 468–477.
  43. Dalal, A.A.; Cai, Z.; Al-qaness, M.A.; Alawamy, E.A.; Alalimi, A. ETR: Enhancing transformation reduction for reducing dimensionality and classification complexity in hyperspectral images. Expert Syst. Appl. 2023, 213, 118971.
  44. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  45. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15.
  46. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
  47. Mei, S.; Song, C.; Ma, M.; Xu, F. Hyperspectral image classification using group-aware hierarchical transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
  48. Xue, Z.; Xu, Q.; Zhang, M. Local transformer with spatial partition restore for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4307–4325.
  49. He, X.; Chen, Y.; Lin, Z. Spatial-spectral transformer for hyperspectral image classification. Remote Sens. 2021, 13, 498.
  50. Tan, X.; Gao, K.; Liu, B.; Fu, Y.; Kang, L. Deep global-local transformer network combined with extended morphological profiles for hyperspectral image classification. J. Appl. Remote Sens. 2021, 15, 038509.
  51. Hu, X.; Yang, W.; Wen, H.; Liu, Y.; Peng, Y. A lightweight 1-D convolution augmented transformer with metric learning for hyperspectral image classification. Sensors 2021, 21, 1751.
  52. Qing, Y.; Liu, W.; Feng, L.; Gao, W. Improved transformer net for hyperspectral image classification. Remote Sens. 2021, 13, 2216.
  53. He, J.; Zhao, L.; Yang, H.; Zhang, M.; Li, W. HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers. IEEE Trans. Geosci. Remote Sens. 2019, 58, 165–178.
  54. Tao, C.; Tang, Y.; Fan, C.; Zou, Z. Hyperspectral imagery classification based on rotation-invariant spectral–spatial feature. IEEE Geosci. Remote Sens. Lett. 2013, 11, 980–984.
  55. Chen, S.; Ye, M.; Du, B. Rotation Invariant Transformer for Recognizing Object in UAVs. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 2565–2574.
  56. Audebert, N.; Le Saux, B.; Lefèvre, S. Deep learning for classification of hyperspectral data: A comparative review. IEEE Geosci. Remote Sens. Mag. 2019, 7, 159–173.
  57. Hong, D.; He, W.; Yokoya, N.; Yao, J.; Gao, L.; Zhang, L.; Chanussot, J.; Zhu, X. Interpretable hyperspectral artificial intelligence: When nonconvex modeling meets hyperspectral remote sensing. IEEE Geosci. Remote Sens. Mag. 2021, 9, 52–87.
  58. Cao, X.; Zhou, F.; Xu, L.; Meng, D.; Xu, Z.; Paisley, J. Hyperspectral image classification with Markov random fields and a convolutional neural network. IEEE Trans. Image Process. 2018, 27, 2354–2367.
  59. Imani, M.; Ghassemian, H. An overview on spectral and spatial information fusion for hyperspectral image classification: Current trends and challenges. Inform. Fusion 2020, 59, 59–83.
  60. Luo, Y.; Zou, J.; Yao, C.; Zhao, X.; Li, T.; Bai, G. HSI-CNN: A novel convolution neural network for hyperspectral image. In Proceedings of the 2018 International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, China, 16–17 July 2018; pp. 464–469.
  61. Li, Y.; Zhang, H.; Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 2017, 9, 67.
Figure 1. An example of convolution results from different angles.
Figure 2. (A) Image-level rotation may lose samples or change the image size. (B) Without additional constraints, feature-level rotation may still lose features. (C) The proposed feature-level rotation retains all features without additional constraints.
Figure 3. The network architecture of the proposed method, named the spectral-spatial attention rotation-invariant classification network (SSARIN), mainly contains a band selection (BS) module, a local spatial feature enhancement (LSFE) module, and a lightweight feature enhancement (LWFE) module.
Figure 4. The architecture of the band selection (BS) module.
Figure 5. The architecture of the proposed local spatial feature enhancement (LSFE) module and the lightweight feature enhancement (LWFE) module.
Figure 6. An example of the spatial-spectral feature rotation.
Figure 7. The structure of the spatial attention module.
Figure 8. Indian Pines dataset. (a) Pseudo-color image. (b) Ground truth.
Figure 9. Salinas dataset. (a) Pseudo-color image. (b) Ground truth.
Figure 10. Pavia University dataset. (a) Pseudo-color image. (b) Ground truth.
Figure 11. Pavia Center dataset. (a) Pseudo-color image. (b) Ground truth.
Figure 12. Houston dataset. (a) Pseudo-color image. (b) Ground truth.
Figure 13. OAs (%) for different spatial sizes on the different datasets.
Figure 14. The results of all algorithms in testing samples of the IP dataset.
Figure 15. The results of all algorithms in testing samples of the SA dataset.
Figure 16. The results of all algorithms in testing samples of the PU dataset.
Figure 17. The results of all algorithms in testing samples of the PC dataset.
Figure 18. The results of all algorithms in testing samples of the Houston dataset.
Table 1. Common airborne hyperspectral image datasets.
| Dataset | Sensor | Platform | Bands | Range (nm) | Land-Cover Classes | Area |
|---|---|---|---|---|---|---|
| Indian Pines | AVIRIS | Aerial | 220 | 400–2500 | 16 | Agriculture |
| Luojia-HSSR [5] | AMMHS | Aerial | 249 | 390–980 | 23 | Urban |
| Houston | ITRES CASI | Aerial | 144 | 364–1046 | 15 | Urban |
| Pavia Center | ROSIS | Aerial | 102 | 430–860 | 9 | Urban |
| Matiwan [6] | VNIR | UAV | 250 | 400–1000 | 13 | Agriculture |
| WHU-Hi-Longkou | Headwall | UAV | 270 | 400–1000 | 9 | Agriculture |
| WHU-Hi-HanChuan | Headwall | UAV | 270 | 400–1000 | 16 | Agriculture |
| WHU-Hi-HongHu | Headwall | UAV | 270 | 400–1000 | 22 | Agriculture |
Table 2. The architecture of the proposed SSARIN.
| Module | Layers | Input Size | Output Size | Connected to |
|---|---|---|---|---|
| Input | Patch H | P × P × B | / | SpeA, SpeA-Conv |
| BS | SpeA | P × P × B | 1 × 1 × B | SpeA-Conv |
| | SpeA-Conv | P × P × B, 1 × 1 × B | P × P × B | Rotation |
| LSFE | Rotation | P × P × B | P × P × B | LSFE-Conv-1 |
| | LSFE-Conv-1 | P × P × B | P × P × 256 | SpaA-1, SpaA-Conv-1 |
| | SpaA-1 | P × P × 256 | P × P × 1 | SpaA-Conv-1 |
| | SpaA-Conv-1 | P × P × 256, P × P × 1 | P × P × 256 | LSFE-Conv-2 |
| | LSFE-Conv-2 | P × P × 256 | P × P × 128 | SpaA-2, SpaA-Conv-2 |
| | SpaA-2 | P × P × 128 | P × P × 1 | SpaA-Conv-2 |
| | SpaA-Conv-2 | P × P × 128, P × P × 1 | P × P × 128 | LSFE-Conv-3 |
| | LSFE-Conv-3 | P × P × 128 | K × K × 64 | Mean |
| | Mean | K × K × 64 | K × K × 64 | LSFE-Conv |
| LWFE | LSFE-Conv | K × K × 64 | K × K × 64 | AP |
| | AP | K × K × 64 | 1 × 1 × 64 | FC |
| Classifier | FC | 1 × 1 × 64 | 16 | LogSoftmax |
| | LogSoftmax | 16 | C | / |
Table 3. The detailed structure of the band selection (BS) module.
| Layers | Input Size | Output Size | Kernel Size |
|---|---|---|---|
| AP | P × P × N | 1 × 1 × N | / |
| Conv + ReLU | 1 × 1 × N | 1 × 1 × N/4 | 1 × 1 × N/4 |
| Conv | 1 × 1 × N/4 | 1 × 1 × N | 1 × 1 × N |
| σ(·) | 1 × 1 × N | 1 × 1 × N | / |
| | 1 × 1 × N / P × P × N | P × P × N | / |
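The layer sequence in Table 3 (average pooling, a channel-reducing convolution with ReLU, a channel-restoring convolution, and a sigmoid) resembles squeeze-and-excitation style channel attention. The following is a minimal pure-Python sketch under that reading, not the authors' implementation: the function name `band_attention`, the random weights, the omission of bias terms, and the final band-wise multiplication are all assumptions made for illustration.

```python
import math
import random


def band_attention(patch, w_reduce, w_expand):
    """Re-weight the N bands of a P x P x N patch (nested lists).

    w_reduce: (N/4) x N weights of the first 1x1 convolution (assumed, no bias).
    w_expand: N x (N/4) weights of the second 1x1 convolution (assumed, no bias).
    Returns (band weights, re-weighted patch).
    """
    p = len(patch)
    n = len(patch[0][0])
    # AP: average every band over the P x P window -> vector of length N.
    pooled = [
        sum(patch[i][j][b] for i in range(p) for j in range(p)) / (p * p)
        for b in range(n)
    ]
    # Conv + ReLU: N -> N/4.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, pooled))) for row in w_reduce]
    # Conv followed by sigmoid: N/4 -> N, each weight lies in (0, 1).
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Assumed final step: scale every pixel's band b by weights[b].
    scaled = [
        [[patch[i][j][b] * weights[b] for b in range(n)] for j in range(p)]
        for i in range(p)
    ]
    return weights, scaled


random.seed(0)
P, N = 3, 8  # toy sizes, chosen only for the example
patch = [[[random.random() for _ in range(N)] for _ in range(P)] for _ in range(P)]
w_reduce = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N // 4)]
w_expand = [[random.uniform(-1, 1) for _ in range(N // 4)] for _ in range(N)]
weights, scaled = band_attention(patch, w_reduce, w_expand)
```

The output keeps the P × P × N shape of the input, which matches the last row of Table 3; bands with weights near zero are suppressed.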
Table 4. The detailed structure of the local spatial feature enhancement (LSFE) module.
| Layers | Input Size | Output Size | Kernel Size | S/P ¹ |
|---|---|---|---|---|
| Conv + ReLU | P × P × N | P × P × 256 | 3 × 3 × 256 | 1/1 |
| SpaA ² | P × P × 256 | P × P × 256 | / | / |
| Conv + ReLU | P × P × 256 | P × P × 128 | 3 × 3 × 128 | 1/1 |
| SpaA | P × P × 128 | P × P × 128 | / | / |
| Conv + ReLU | P × P × 128 | P × P × 256 | 1 × 1 × 256 | / |
| Conv + ReLU | P × P × 256 | P × P × 512 | 3 × 3 × 512 | 1/1 |
| Conv + ReLU | P × P × 512 | P × P × 256 | 5 × 5 × 256 | 1/1 |
| Conv + ReLU | K × K × 256 | K × K × 128 | 3 × 3 × 128 | 1/1 |
| Conv + ReLU | K × K × 128 | K × K × 64 | 1 × 1 × 64 | / |

¹ S/P represents Stride/Padding; ² SpaA represents Spatial-Attention.
Table 5. The detailed structure of the lightweight feature enhancement (LWFE) module.
| Layers | Input Size | Output Size | Kernel Size |
|---|---|---|---|
| Conv + ReLU | K × K × 64 | K × K × 256 | 1 × 1 × 256 |
| Conv + ReLU | K × K × 256 | K × K × 64 | 1 × 1 × 64 |
| AP | K × K × 64 | 1 × 1 × 64 | / |
| FC | 1 × 1 × 64 | 1 × 16 | / |
Table 6. Training/testing samples of the Indian Pines (IP) dataset.
| Class No. | Class Name | Training | Testing |
|---|---|---|---|
| 1 | Alfalfa | 4 | 46 |
| 2 | Corn-Notill | 142 | 1428 |
| 3 | Corn-Mintill | 83 | 830 |
| 4 | Corn | 23 | 237 |
| 5 | Grass-Pasture | 48 | 483 |
| 6 | Grass-Trees | 73 | 730 |
| 7 | Grass-Pasture-Mowed | 2 | 28 |
| 8 | Hay-Windrowed | 47 | 478 |
| 9 | Oats | 2 | 20 |
| 10 | Soybean-Notill | 97 | 972 |
| 11 | Soybean-Mintill | 245 | 2455 |
| 12 | Soybean-Clean | 59 | 593 |
| 13 | Wheat | 20 | 205 |
| 14 | Woods | 126 | 1265 |
| 15 | Buildings-Grass-Trees-Drives | 38 | 386 |
| 16 | Stone-Steel-Towers | 9 | 93 |
| Total | - | 1018 | 10,249 |
Table 7. Training/testing samples of the Salinas (SA) dataset.
| Class No. | Class Name | Training | Testing |
|---|---|---|---|
| 1 | Brocoli-Green-Weeds-1 | 40 | 2009 |
| 2 | Brocoli-Green-Weeds-2 | 74 | 3726 |
| 3 | Fallow | 39 | 1976 |
| 4 | Fallow-Rough-Plow | 27 | 1394 |
| 5 | Fallow-Smooth | 53 | 2678 |
| 6 | Stubble | 79 | 3959 |
| 7 | Celery | 71 | 3579 |
| 8 | Grapes-Untrained | 225 | 11,271 |
| 9 | Soil-Vinyard-Develop | 124 | 6203 |
| 10 | Corn-Senesced-Green-Weeds | 65 | 3278 |
| 11 | Lettuce-Romaine-4wk | 21 | 1068 |
| 12 | Lettuce-Romaine-5wk | 38 | 1927 |
| 13 | Lettuce-Romaine-6wk | 18 | 916 |
| 14 | Lettuce-Romaine-7wk | 21 | 1070 |
| 15 | Vinyard-Untrained | 145 | 7268 |
| 16 | Vinyard-Vertical-Trellis | 36 | 1807 |
| Total | - | 1076 | 54,129 |
Table 8. Training/testing samples of the Pavia University (PU) dataset.
| Class No. | Class Name | Training | Testing |
|---|---|---|---|
| 1 | Asphalt | 132 | 6631 |
| 2 | Meadows | 372 | 18,649 |
| 3 | Gravel | 41 | 2099 |
| 4 | Trees | 61 | 3064 |
| 5 | Painted-Metal-Sheets | 26 | 1345 |
| 6 | Bare-Soil | 100 | 5029 |
| 7 | Bitumen | 26 | 1330 |
| 8 | Self-Blocking-Bricks | 73 | 3682 |
| 9 | Shadows | 18 | 947 |
| Total | - | 849 | 42,776 |
Table 9. Training/testing samples of the Pavia Center (PC) dataset.
| Class No. | Class Name | Training | Testing |
|---|---|---|---|
| 1 | Water | 131 | 65,971 |
| 2 | Trees | 15 | 7598 |
| 3 | Asphalt | 6 | 3090 |
| 4 | Self-Blocking-Bricks | 5 | 2685 |
| 5 | Bitumen | 13 | 6584 |
| 6 | Tiles | 18 | 9248 |
| 7 | Shadows | 14 | 7287 |
| 8 | Meadows | 85 | 42,826 |
| 9 | Bare-Soil | 5 | 2863 |
| Total | - | 292 | 148,152 |
Table 10. Training/testing samples of the Houston dataset.
| Class No. | Class Name | Training | Testing |
|---|---|---|---|
| 1 | Grass-Healthy | 125 | 1251 |
| 2 | Grass-Stressed | 125 | 1254 |
| 3 | Grass-Synthetic | 69 | 697 |
| 4 | Tree | 124 | 1244 |
| 5 | Soil | 124 | 1242 |
| 6 | Water | 32 | 325 |
| 7 | Residential | 126 | 1268 |
| 8 | Commercial | 124 | 1244 |
| 9 | Road | 125 | 1252 |
| 10 | Highway | 122 | 1227 |
| 11 | Railway | 123 | 1235 |
| 12 | Parking-Lot-1 | 123 | 1233 |
| 13 | Parking-Lot-2 | 46 | 469 |
| 14 | Tennis-Court | 42 | 428 |
| 15 | Running-Track | 66 | 660 |
| Total | - | 1496 | 15,029 |
Table 11. The OAs (%) of the ablation experiment on the public datasets.
| Dataset | Baseline | SSA | LWFE | SSARIN |
|---|---|---|---|---|
| IP | 98.15 | 98.36 | 98.53 | 98.59 |
| SA | 98.92 | 99.64 | 99.78 | 99.81 |
| PU | 97.73 | 98.91 | 99.03 | 99.05 |
| PC | 98.51 | 98.95 | 99.03 | 99.08 |
| Houston | 98.88 | 98.97 | 99.19 | 99.30 |
Table 12. OA (%) with different rotation angles for the ablation experiment on the public datasets.
| Rotation (°) | Network | IP | SA | PU | PC | Houston |
|---|---|---|---|---|---|---|
| 0 | LWFE | 98.53 | 99.78 | 99.03 | 99.03 | 99.19 |
| | SSARIN | 98.59 | 99.81 | 99.05 | 99.08 | 99.30 |
| 90 | LWFE | 95.98 | 99.37 | 98.51 | 98.98 | 98.98 |
| | SSARIN | 98.59 | 99.81 | 99.05 | 99.08 | 99.30 |
| 180 | LWFE | 92.95 | 98.77 | 97.78 | 98.92 | 98.78 |
| | SSARIN | 98.59 | 99.81 | 99.05 | 99.08 | 99.30 |
| 270 | LWFE | 96.22 | 99.46 | 98.40 | 98.97 | 99.14 |
| | SSARIN | 98.59 | 99.81 | 99.05 | 99.08 | 99.30 |
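In Table 12, SSARIN reports the same OA at all four angles. One generic way to obtain exact invariance to 90° rotations is to average an orientation-sensitive feature over the four rotated copies of a patch. The sketch below illustrates only that principle; `directional_feature` is a hypothetical stand-in for a learned feature encoder, and the code is not claimed to reproduce the LSFE module.

```python
def rot90(patch):
    """Rotate a square 2-D list by 90 degrees counter-clockwise."""
    n = len(patch)
    return [[patch[j][n - 1 - i] for j in range(n)] for i in range(n)]


def directional_feature(patch):
    """Toy orientation-sensitive feature: position-weighted sum (hypothetical)."""
    n = len(patch)
    return sum((i * n + j) * patch[i][j] for i in range(n) for j in range(n))


def pooled_feature(patch):
    """Average the feature over the 0, 90, 180 and 270 degree copies."""
    values = []
    current = patch
    for _ in range(4):
        values.append(directional_feature(current))
        current = rot90(current)
    return sum(values) / 4


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotated = rot90(patch)

raw_changes = directional_feature(patch) != directional_feature(rotated)
pooled_same = pooled_feature(patch) == pooled_feature(rotated)
```

Rotating the input only permutes the four terms being averaged, so the pooled value is unchanged, whereas the raw feature differs between the two orientations.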
Table 13. OA (%) with different rotation angles for the different methods on the IP dataset.
| Rotation (°) | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 85.73 | 72.29 | 83.11 | 78.73 | 98.15 | 98.48 | 94.56 | 79.54 | 98.13 | 97.53 | 98.59 |
| 90 | 85.73 | 66.79 | 66.46 | 78.73 | 97.05 | 91.19 | 94.56 | 64.30 | 86.88 | 79.50 | 98.59 |
| 180 | 85.73 | 66.62 | 65.12 | 78.73 | 95.14 | 87.17 | 94.56 | 69.76 | 84.73 | 78.76 | 98.59 |
| 270 | 85.73 | 65.84 | 67.05 | 78.73 | 96.97 | 91.30 | 94.56 | 63.99 | 85.81 | 80.08 | 98.59 |
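Table 14. Accuracy in each class, OA (%), AA (%), and κ at 0 degrees on the IP dataset.
| Class No. | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 45.65 | 19.57 | 26.09 | 32.61 | 95.65 | 82.61 | 93.48 | 0.00 | 89.13 | 78.26 | 91.30 |
| 2 | 82.00 | 60.57 | 76.54 | 72.76 | 96.22 | 98.11 | 92.37 | 72.97 | 99.58 | 96.85 | 98.25 |
| 3 | 73.01 | 54.58 | 70.36 | 61.81 | 99.76 | 98.19 | 95.18 | 68.92 | 98.55 | 99.28 | 99.64 |
| 4 | 72.15 | 41.35 | 52.32 | 48.10 | 99.16 | 98.73 | 91.98 | 56.54 | 91.56 | 97.89 | 99.58 |
| 5 | 90.68 | 81.37 | 92.75 | 87.58 | 98.55 | 97.93 | 95.86 | 87.58 | 99.17 | 98.76 | 98.96 |
| 6 | 95.58 | 97.81 | 94.75 | 97.81 | 95.34 | 99.73 | 99.45 | 87.67 | 98.36 | 97.53 | 97.53 |
| 7 | 17.86 | 7.14 | 28.57 | 21.43 | 89.29 | 57.14 | 64.29 | 0.00 | 100.00 | 50.00 | 96.43 |
| 8 | 98.74 | 94.56 | 100.00 | 100.00 | 100.00 | 100.00 | 99.16 | 100.00 | 100.00 | 100.00 | 100.00 |
| 9 | 45.00 | 30.00 | 25.00 | 15.00 | 100.00 | 100.00 | 75.00 | 0.00 | 100.00 | 45.00 | 100.00 |
| 10 | 85.18 | 59.88 | 75.31 | 67.70 | 96.30 | 98.05 | 87.96 | 73.15 | 95.06 | 97.74 | 98.46 |
| 11 | 85.34 | 77.88 | 88.96 | 78.86 | 99.51 | 99.80 | 96.09 | 99.69 | 98.94 | 98.13 | 98.66 |
| 12 | 88.70 | 43.68 | 67.12 | 78.25 | 94.10 | 98.65 | 86.17 | 54.30 | 91.74 | 98.15 | 94.44 |
| 13 | 99.51 | 100.00 | 84.39 | 96.10 | 100.00 | 100.00 | 99.51 | 100.00 | 100.00 | 99.51 | 99.51 |
| 14 | 95.18 | 93.75 | 96.68 | 94.94 | 100.00 | 99.37 | 98.74 | 93.91 | 99.68 | 98.58 | 99.84 |
| 15 | 65.54 | 55.18 | 78.76 | 85.81 | 100.00 | 89.90 | 93.52 | 96.64 | 99.48 | 93.52 | 100.00 |
| 16 | 83.87 | 64.52 | 68.82 | 87.10 | 96.77 | 100.00 | 92.47 | 88.17 | 95.70 | 83.87 | 95.70 |
| AA | 76.56 | 61.36 | 70.37 | 68.68 | 97.54 | 94.89 | 91.33 | 63.28 | 97.31 | 89.57 | 98.02 |
| OA | 85.73 | 72.29 | 83.11 | 78.73 | 98.15 | 98.48 | 94.56 | 79.54 | 98.13 | 97.53 | 98.59 |
| Kappa | 83.71 | 68.11 | 80.61 | 75.65 | 97.88 | 98.26 | 93.78 | 76.45 | 97.86 | 97.18 | 98.39 |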
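The three summary rows (AA, OA, Kappa) follow the standard definitions: OA is the fraction of correctly classified samples, AA is the mean of the per-class accuracies, and κ corrects OA for chance agreement. A minimal sketch of these definitions from a confusion matrix is given below; the function name and the toy 3-class matrix are illustrative assumptions, and κ is scaled by 100 because the tables appear to report it on that scale.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa), each scaled by 100, from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total

    # Per-class accuracy, averaged over classes that have at least one sample.
    per_class = [
        confusion[i][i] / sum(confusion[i])
        for i in range(n_classes)
        if sum(confusion[i]) > 0
    ]
    aa = sum(per_class) / len(per_class)

    # Expected agreement by chance, from row (true) and column (predicted) totals.
    col_sums = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    pe = sum(sum(confusion[k]) * col_sums[k] for k in range(n_classes)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    return 100 * oa, 100 * aa, 100 * kappa


# Toy confusion matrix (not taken from the paper).
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
oa, aa, kappa = classification_metrics(example)
```

Because AA weights every class equally, classes with very few samples (for example Oats in Table 6) influence AA far more than they influence OA.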
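Table 15. Accuracy in each class, OA (%), AA (%), and κ at 90 degrees on the IP dataset.
| Class No. | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 45.65 | 2.17 | 0.00 | 32.61 | 80.43 | 78.26 | 93.48 | 0.00 | 15.22 | 71.74 | 91.30 |
| 2 | 82.00 | 55.05 | 57.56 | 72.76 | 95.66 | 92.79 | 92.37 | 49.23 | 91.81 | 64.85 | 98.25 |
| 3 | 73.01 | 42.41 | 28.43 | 61.81 | 96.87 | 86.99 | 95.18 | 43.49 | 50.24 | 52.65 | 99.64 |
| 4 | 72.15 | 37.97 | 30.38 | 48.10 | 97.47 | 79.32 | 91.98 | 24.47 | 74.26 | 79.75 | 99.58 |
| 5 | 90.68 | 70.19 | 66.87 | 87.58 | 92.96 | 87.16 | 95.86 | 82.40 | 88.82 | 77.43 | 98.96 |
| 6 | 95.58 | 96.85 | 81.23 | 97.81 | 96.30 | 99.73 | 99.45 | 85.62 | 97.81 | 96.85 | 97.53 |
| 7 | 17.86 | 0.00 | 0.00 | 21.43 | 42.86 | 14.29 | 64.29 | 0.00 | 35.71 | 100.00 | 96.43 |
| 8 | 98.74 | 93.31 | 99.79 | 100.00 | 100.00 | 100.00 | 99.16 | 100.00 | 100.00 | 100.00 | 100.00 |
| 9 | 45.00 | 0.00 | 0.00 | 15.00 | 75.00 | 10.00 | 75.00 | 0.00 | 30.00 | 0.00 | 100.00 |
| 10 | 85.18 | 50.93 | 59.16 | 67.70 | 96.30 | 98.05 | 87.96 | 73.15 | 95.06 | 97.74 | 98.46 |
| 11 | 85.34 | 75.52 | 81.38 | 78.86 | 96.19 | 84.36 | 87.96 | 60.70 | 76.54 | 96.73 | 98.66 |
| 12 | 88.70 | 26.48 | 40.64 | 78.25 | 96.29 | 83.14 | 86.17 | 28.33 | 75.04 | 55.14 | 94.44 |
| 13 | 99.51 | 84.39 | 0.00 | 96.10 | 100.00 | 100.00 | 99.51 | 0.00 | 99.51 | 62.93 | 99.51 |
| 14 | 95.18 | 93.99 | 94.70 | 94.94 | 99.84 | 98.42 | 98.74 | 90.67 | 98.50 | 95.81 | 99.84 |
| 15 | 65.54 | 54.66 | 54.15 | 85.81 | 94.56 | 80.57 | 93.52 | 9.84 | 96.89 | 66.32 | 100.00 |
| 16 | 83.87 | 48.39 | 72.04 | 87.10 | 98.92 | 100.00 | 92.47 | 91.40 | 93.55 | 72.04 | 95.70 |
| AA | 76.56 | 52.02 | 47.90 | 68.68 | 91.37 | 80.48 | 91.33 | 46.57 | 79.96 | 69.12 | 98.02 |
| OA | 85.73 | 66.79 | 66.46 | 78.73 | 97.05 | 91.19 | 94.56 | 64.30 | 86.88 | 79.50 | 98.59 |
| Kappa | 83.71 | 61.72 | 61.21 | 75.65 | 96.64 | 89.94 | 93.78 | 58.48 | 84.96 | 76.43 | 98.39 |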
Table 16. OA (%) with different rotation angles for the different methods on the SA dataset.
| Rotation (°) | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 91.61 | 89.72 | 88.73 | 88.83 | 99.67 | 99.65 | 97.13 | 96.27 | 99.36 | 98.46 | 99.81 |
| 90 | 91.61 | 87.02 | 79.42 | 88.83 | 99.19 | 99.05 | 97.13 | 89.57 | 94.77 | 89.79 | 99.81 |
| 180 | 91.61 | 86.79 | 77.41 | 88.83 | 99.09 | 97.85 | 97.13 | 90.93 | 93.12 | 87.68 | 99.81 |
| 270 | 91.61 | 86.88 | 78.68 | 88.83 | 99.45 | 99.11 | 97.13 | 87.37 | 93.32 | 88.06 | 99.81 |
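Table 17. Accuracy in each class, OA (%), AA (%), and κ at 180 degrees on the SA dataset.
| Class No. | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 99.50 | 98.36 | 90.84 | 99.30 | 100.00 | 100.00 | 99.70 | 99.90 | 100.00 | 98.71 | 100.00 |
| 2 | 99.06 | 96.46 | 92.27 | 99.97 | 100.00 | 99.68 | 99.81 | 99.62 | 100.00 | 99.95 | 100.00 |
| 3 | 99.24 | 95.55 | 95.14 | 98.28 | 100.00 | 100.00 | 99.04 | 99.44 | 100.00 | 99.34 | 100.00 |
| 4 | 97.70 | 95.55 | 80.42 | 99.35 | 100.00 | 93.54 | 99.28 | 99.71 | 96.77 | 18.44 | 99.86 |
| 5 | 98.84 | 90.03 | 90.72 | 94.47 | 96.27 | 96.45 | 98.81 | 83.53 | 93.99 | 88.87 | 99.14 |
| 6 | 99.75 | 97.47 | 81.66 | 99.85 | 100.00 | 99.95 | 100.00 | 100.00 | 100.00 | 99.90 | 99.97 |
| 7 | 99.55 | 96.54 | 94.86 | 99.61 | 100.00 | 99.80 | 100.00 | 95.64 | 99.86 | 85.00 | 100.00 |
| 8 | 82.71 | 81.45 | 68.00 | 65.68 | 99.48 | 96.72 | 94.45 | 84.77 | 87.57 | 87.98 | 99.42 |
| 9 | 99.79 | 99.45 | 97.02 | 99.82 | 100.00 | 100.00 | 99.11 | 99.85 | 99.50 | 99.92 | 100.00 |
| 10 | 92.13 | 79.56 | 54.85 | 94.94 | 99.60 | 97.56 | 97.62 | 92.28 | 92.71 | 79.99 | 99.91 |
| 11 | 95.51 | 75.09 | 86.24 | 91.29 | 100.00 | 91.48 | 95.69 | 68.26 | 72.65 | 76.59 | 99.81 |
| 12 | 99.69 | 99.58 | 86.20 | 99.95 | 99.64 | 99.58 | 99.64 | 98.75 | 98.18 | 51.69 | 100.00 |
| 13 | 97.60 | 90.61 | 1.75 | 99.56 | 92.69 | 99.78 | 99.02 | 97.27 | 86.35 | 96.94 | 99.24 |
| 14 | 92.80 | 88.32 | 62.80 | 95.33 | 99.44 | 94.86 | 97.76 | 99.16 | 98.69 | 91.87 | 100.00 |
| 15 | 73.78 | 60.83 | 76.49 | 78.10 | 96.73 | 95.28 | 92.43 | 74.26 | 81.18 | 81.59 | 100.00 |
| 16 | 93.85 | 88.05 | 79.25 | 98.01 | 99.89 | 99.61 | 95.30 | 98.12 | 99.61 | 99.94 | 100.00 |
| AA | 95.09 | 89.56 | 74.91 | 94.59 | 98.98 | 97.77 | 97.98 | 93.16 | 94.19 | 84.79 | 99.83 |
| OA | 91.61 | 86.79 | 77.41 | 88.83 | 99.09 | 97.85 | 97.13 | 90.93 | 93.12 | 87.68 | 99.81 |
| Kappa | 90.66 | 85.27 | 74.80 | 87.61 | 98.99 | 97.61 | 96.80 | 89.90 | 92.34 | 86.28 | 99.79 |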
Table 18. OA (%) with different rotation angles for the different methods on the PU dataset.
| Rotation (°) | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 89.64 | 93.00 | 85.28 | 88.83 | 99.38 | 98.97 | 98.05 | 95.38 | 98.83 | 98.00 | 99.05 |
| 90 | 89.64 | 88.53 | 74.93 | 88.83 | 98.73 | 97.51 | 98.05 | 85.04 | 95.83 | 90.06 | 99.05 |
| 180 | 89.64 | 85.69 | 74.53 | 88.83 | 97.79 | 95.55 | 98.05 | 87.50 | 93.70 | 87.99 | 99.05 |
| 270 | 89.64 | 89.67 | 74.09 | 88.83 | 98.38 | 97.35 | 98.05 | 86.02 | 94.12 | 92.40 | 99.05 |
Table 19. Accuracy in each class, OA (%), AA (%), and κ at 270 degrees on the PU dataset.
| Class No. | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 90.68 | 87.86 | 86.93 | 86.31 | 98.40 | 99.16 | 98.46 | 85.46 | 90.27 | 94.92 | 100.00 |
| 2 | 95.66 | 97.61 | 90.67 | 95.79 | 99.74 | 99.92 | 99.40 | 94.21 | 99.70 | 99.27 | 99.80 |
| 3 | 64.46 | 74.27 | 25.82 | 73.03 | 99.09 | 82.71 | 88.76 | 72.75 | 92.09 | 64.89 | 97.38 |
| 4 | 85.70 | 79.18 | 71.41 | 88.90 | 93.05 | 92.89 | 96.83 | 86.59 | 95.04 | 83.09 | 96.67 |
| 5 | 99.70 | 99.63 | 13.46 | 99.48 | 99.85 | 100.00 | 99.78 | 99.70 | 98.66 | 97.84 | 99.93 |
| 6 | 76.85 | 84.21 | 65.48 | 80.97 | 100.00 | 99.03 | 98.67 | 94.97 | 99.68 | 97.65 | 100.00 |
| 7 | 84.89 | 79.17 | 43.53 | 72.18 | 100.00 | 98.95 | 97.82 | 62.71 | 79.62 | 91.88 | 99.62 |
| 8 | 87.75 | 76.56 | 57.77 | 77.24 | 93.54 | 89.38 | 95.06 | 74.45 | 68.25 | 69.42 | 95.82 |
| 9 | 99.89 | 94.51 | 11.72 | 99.26 | 97.71 | 97.57 | 99.89 | 72.23 | 95.99 | 85.22 | 94.40 |
| AA | 87.29 | 85.89 | 51.87 | 85.91 | 97.39 | 95.49 | 97.14 | 79.56 | 91.04 | 97.13 | 98.18 |
| OA | 89.64 | 89.67 | 74.09 | 88.83 | 98.38 | 97.35 | 98.05 | 86.02 | 94.12 | 92.40 | 99.05 |
| Kappa | 86.16 | 86.19 | 65.36 | 85.15 | 97.85 | 96.49 | 97.42 | 81.62 | 92.22 | 89.90 | 98.74 |
Table 20. OA (%) with different rotation angles for the different methods on the PC dataset.
| Rotation (°) | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 97.44 | 97.34 | 97.22 | 97.34 | 98.43 | 98.63 | 98.35 | 98.25 | 98.35 | 98.61 | 99.08 |
| 90 | 97.44 | 96.46 | 95.43 | 97.34 | 98.30 | 98.58 | 98.35 | 97.17 | 97.17 | 97.75 | 99.08 |
| 180 | 97.44 | 96.93 | 96.00 | 97.34 | 98.34 | 98.30 | 98.35 | 97.33 | 96.97 | 97.61 | 99.08 |
| 270 | 97.44 | 96.39 | 96.29 | 97.34 | 98.40 | 98.51 | 98.35 | 97.33 | 97.15 | 97.34 | 99.08 |
Table 21. Accuracy in each class, OA (%), AA (%), and κ at 90 degrees on the PC dataset.
| Class No. | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 99.96 | 99.76 | 99.98 | 99.99 | 99.99 | 99.99 | 99.90 | 99.83 | 99.96 | 99.98 | 100.00 |
| 2 | 94.81 | 96.43 | 96.46 | 91.73 | 98.67 | 92.25 | 94.67 | 96.33 | 96.25 | 90.00 | 96.37 |
| 3 | 81.03 | 69.19 | 82.88 | 90.97 | 83.69 | 90.74 | 90.55 | 77.12 | 69.19 | 75.76 | 91.26 |
| 4 | 75.61 | 73.74 | 53.07 | 57.28 | 97.58 | 75.90 | 78.44 | 81.82 | 72.03 | 92.74 | 95.38 |
| 5 | 89.87 | 81.12 | 83.25 | 98.34 | 91.83 | 98.12 | 96.69 | 93.77 | 93.41 | 92.27 | 97.28 |
| 6 | 93.46 | 92.52 | 92.54 | 98.71 | 99.91 | 99.42 | 99.28 | 92.63 | 94.87 | 99.37 | 99.76 |
| 7 | 92.18 | 91.03 | 82.85 | 92.19 | 84.66 | 96.12 | 92.19 | 88.71 | 88.45 | 88.34 | 97.86 |
| 8 | 99.35 | 98.73 | 99.26 | 98.58 | 99.94 | 99.84 | 99.83 | 99.83 | 99.14 | 99.89 | 99.59 |
| 9 | 99.72 | 95.28 | 97.42 | 99.93 | 99.51 | 94.48 | 93.89 | 86.57 | 97.07 | 93.99 | 94.24 |
| AA | 91.78 | 88.64 | 87.52 | 91.00 | 95.09 | 94.10 | 93.94 | 90.74 | 90.04 | 92.48 | 96.86 |
| OA | 97.44 | 96.39 | 96.29 | 97.34 | 98.40 | 98.51 | 98.35 | 97.33 | 97.15 | 97.34 | 99.08 |
| Kappa | 96.38 | 94.88 | 94.72 | 96.24 | 97.73 | 97.89 | 97.66 | 96.22 | 95.97 | 96.79 | 98.69 |
Table 22. OA (%) with different rotation angles for the different methods on the Houston dataset.
| Rotation (°) | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 91.96 | 95.22 | 90.45 | 91.90 | 99.23 | 98.81 | 97.33 | 96.75 | 98.53 | 97.78 | 99.30 |
| 90 | 91.96 | 93.21 | 77.12 | 91.90 | 97.91 | 98.35 | 97.33 | 86.97 | 91.96 | 89.13 | 99.30 |
| 180 | 91.96 | 83.74 | 83.58 | 91.90 | 98.47 | 98.17 | 97.33 | 93.97 | 96.05 | 94.58 | 99.30 |
| 270 | 91.96 | 93.61 | 76.23 | 91.90 | 97.99 | 97.87 | 97.33 | 87.42 | 90.50 | 88.85 | 99.30 |
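Table 23. Accuracy in each class, OA (%), AA (%), and κ at 180 degrees on the Houston dataset.
| Class No. | 1D CNN | 2D CNN | 3D CNN | NRNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 97.04 | 100.00 | 94.16 | 90.73 | 98.96 | 97.92 | 99.04 | 96.40 | 98.32 | 92.25 | 100.00 |
| 2 | 99.36 | 98.25 | 90.99 | 97.77 | 99.28 | 99.36 | 98.41 | 97.69 | 98.64 | 98.48 | 98.88 |
| 3 | 98.85 | 97.85 | 96.56 | 100.00 | 100.00 | 99.86 | 100.00 | 99.86 | 100.00 | 99.86 | 100.00 |
| 4 | 98.79 | 95.82 | 95.66 | 98.71 | 99.68 | 99.76 | 97.51 | 93.49 | 99.28 | 95.50 | 99.84 |
| 5 | 98.63 | 99.68 | 97.58 | 99.19 | 100.00 | 100.00 | 99.76 | 99.76 | 100.00 | 99.60 | 100.00 |
| 6 | 98.77 | 92.31 | 82.15 | 99.69 | 98.15 | 99.69 | 95.38 | 93.54 | 97.23 | 96.62 | 98.15 |
| 7 | 86.51 | 87.54 | 74.53 | 87.38 | 99.53 | 96.29 | 97.87 | 90.14 | 87.38 | 96.29 | 98.82 |
| 8 | 85.29 | 87.14 | 64.79 | 92.36 | 97.11 | 98.31 | 97.51 | 90.35 | 89.95 | 92.52 | 100.00 |
| 9 | 77.32 | 83.15 | 88.66 | 76.36 | 94.65 | 92.01 | 92.01 | 88.50 | 92.33 | 92.65 | 96.25 |
| 10 | 92.99 | 97.64 | 72.78 | 95.27 | 99.35 | 99.35 | 96.09 | 97.23 | 99.51 | 93.56 | 100.00 |
| 11 | 88.58 | 93.12 | 71.17 | 85.91 | 97.09 | 97.98 | 96.19 | 92.15 | 96.68 | 88.91 | 99.51 |
| 12 | 95.30 | 95.38 | 72.69 | 89.53 | 98.86 | 98.62 | 98.38 | 89.05 | 94.08 | 87.75 | 98.95 |
| 13 | 59.91 | 81.45 | 79.74 | 72.92 | 94.03 | 96.59 | 92.54 | 88.06 | 92.54 | 94.88 | 99.58 |
| 14 | 98.13 | 96.96 | 81.78 | 100.00 | 100.00 | 100.00 | 97.43 | 98.36 | 100.00 | 99.77 | 100.00 |
| 15 | 99.85 | 96.62 | 98.03 | 98.94 | 99.55 | 99.70 | 99.70 | 98.79 | 99.85 | 99.85 | 100.00 |
| AA | 91.69 | 93.52 | 84.08 | 92.32 | 98.42 | 98.36 | 97.19 | 94.22 | 96.39 | 95.23 | 99.33 |
| OA | 91.96 | 83.74 | 83.58 | 91.90 | 98.47 | 98.17 | 97.33 | 93.97 | 96.05 | 94.58 | 99.30 |
| Kappa | 91.30 | 93.23 | 82.24 | 91.25 | 98.35 | 98.02 | 97.11 | 93.48 | 95.73 | 94.15 | 99.24 |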
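Table 24. The training and testing time (s) of the different methods.
| Dataset | Phase | 1D CNN | 2D CNN | 3D CNN | RNN | SSRN | HybridSN | RIAN | SF | SSFTT | GAHT | SSARIN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IP | Training | 16.53 | 194.61 | 15.11 | 26.62 | 70.08 | 47.03 | 51.31 | 105.70 | 55.90 | 125.17 | 594.51 |
| | Testing | 0.19 | 0.69 | 0.22 | 0.21 | 0.67 | 0.29 | 0.40 | 1.42 | 0.41 | 0.82 | 6.59 |
| SA | Training | 18.01 | 162.33 | 17.20 | 28.63 | 81.99 | 44.50 | 49.99 | 110.16 | 59.50 | 129.16 | 674.84 |
| | Testing | 0.68 | 2.09 | 1.12 | 1.08 | 3.52 | 1.76 | 2.06 | 7.96 | 1.97 | 5.29 | 27.61 |
| PU | Training | 16.47 | 60.69 | 15.14 | 18.51 | 56.08 | 34.95 | 42.03 | 81.31 | 56.07 | 104.51 | 563.91 |
| | Testing | 0.46 | 1.05 | 0.62 | 0.80 | 1.91 | 1.41 | 1.61 | 6.19 | 1.58 | 4.08 | 24.617 |
| PC | Training | 14.62 | 29.63 | 11.13 | 15.76 | 45.93 | 33.23 | 35.54 | 85.75 | 41.44 | 92.54 | 510.75 |
| | Testing | 0.79 | 3.36 | 2.07 | 2.72 | 5.54 | 3.88 | 5.53 | 5.01 | 5.30 | 11.72 | 66.78 |
| Houston | Training | 20.33 | 87.89 | 26.70 | 34.65 | 95.62 | 37.62 | 76.89 | 156.66 | 83.28 | 165.53 | 976.77 |
| | Testing | 0.24 | 0.63 | 0.26 | 0.30 | 0.75 | 0.41 | 0.72 | 1.94 | 0.54 | 1.18 | 7.88 |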
Table 25. The number of parameters of the different methods.
| Method | Parameters |
|---|---|
| 1D CNN | 74,196 |
| 2D CNN | 109,613,786 |
| 3D CNN | 115,564 |
| NRNN | 235,024 |
| SSRN | 129,068 |
| HybridSN | 108,912 |
| RIAN | 89,260 |
| SF | 99,640 |
| SSFTT | 950,280 |
| GAHT | 1,228,940 |
| SSARIN | 41,694,672 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Shi, Y.; Fu, B.; Wang, N.; Cheng, Y.; Fang, J.; Liu, X.; Zhang, G. Spectral-Spatial Attention Rotation-Invariant Classification Network for Airborne Hyperspectral Images. Drones 2023, 7, 240. https://doi.org/10.3390/drones7040240