SSANet-BS: Spectral–Spatial Cross-Dimensional Attention Network for Hyperspectral Band Selection

Cui, Chuanyu; Sun, Xudong; Fu, Baijia; Shang, Xiaodi

doi:10.3390/rs16152848

¹ College of Computer Science and Technology, Qingdao University, Qingdao 266071, China

² School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(15), 2848; https://doi.org/10.3390/rs16152848

Submission received: 27 May 2024 / Revised: 22 July 2024 / Accepted: 26 July 2024 / Published: 3 August 2024

(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation (Second Edition))

Abstract

Band selection (BS) aims to reduce redundancy in hyperspectral imagery (HSI). Existing BS approaches typically model HSI in only a single dimension, either spectral or spatial, without exploring the interactions between dimensions. To this end, we propose an unsupervised BS method based on a spectral–spatial cross-dimensional attention network, named SSANet-BS. The network comprises three stages: a band attention module (BAM) that employs an attention mechanism to adaptively identify and select highly significant bands; two parallel spectral–spatial attention modules (SSAMs), which fuse complex spectral–spatial structural information across dimensions in HSI; and a multi-scale reconstruction network that learns spectral–spatial nonlinear dependencies in the SSAM-fusion image at various scales and guides the BAM weights to automatically converge to the target bands via backpropagation. The three-stage structure of SSANet-BS enables the BAM weights to fully represent the saliency of the bands, so that valuable bands are obtained automatically. Experimental results on four real hyperspectral datasets demonstrate the effectiveness of SSANet-BS.

Keywords:

hyperspectral imagery; band selection; spectral–spatial cross-dimensional attention; multi-scale reconstruction network

1. Introduction

Hyperspectral imagery (HSI) records numerous contiguous and narrow spectral bands, and has been extensively utilized in diverse fields such as military and industry [1]. Nonetheless, the high redundancy of bands in HSI poses great challenges in terms of data transmission, storage, and computation, and can also lead to the Hughes phenomenon in classification, thereby reducing classification accuracy [2,3]. Consequently, dimensionality reduction is essential for HSI.

Dimensionality reduction methods for HSI can be divided into two categories: feature extraction (FE) and band selection (BS) [4]. FE projects the original HSI into a lower-dimensional space, which loses the physical information of the HSI because the feature space is altered. Conversely, BS selects a representative band subset from the HSI that preserves the physical significance and has higher interpretability, which makes it preferable for practical applications [5,6,7].

According to different task scenarios, BS methods can be primarily categorized into target detection-oriented methods and classification-oriented methods. The former typically consider the spectral differences between targets and backgrounds when selecting a subset of bands [8,9], whereas the latter select bands that contain a large amount of information and exhibit strong discrimination capability [10,11]. Most of these methods are supervised and depend on prior information, such as ground truth labels, which limits their practical application. In contrast, unsupervised methods do not rely on prior information, but select representative bands by identifying the intrinsic properties of the HSI data. They are more versatile and can be applied to downstream tasks in all scenarios, including classification [12]. Hence, this paper focuses on unsupervised band selection methods.

Early BS approaches employed hand-crafted band evaluation metrics or heuristic strategies to obtain target bands [1]. However, a manually designed BS process cannot account for complex real-world factors in a comprehensive manner, resulting in unsatisfactory performance. The advent of machine learning has offered novel insights into the field of BS. The maximum-variance principal component analysis (MVPCA) [13] and Boltzmann entropy-based band selection (BE) [14] employ specific metrics derived from machine learning models to evaluate the quality of bands. Techniques such as enhanced fast density-peak-based clustering (E-FDPC) [15] and graph regularized spatial–spectral subspace clustering (GRSC) [16] use clustering to partition bands into multiple clusters, from which the most representative band in each cluster is selected. The sparse representation-based band selection (SpaBS) [17] and spectral–spatial hypergraph-regularized self-representation (HyGSR) [18] operate under the assumption that the original HSI can be represented by a linear combination of a limited number of bands, identifying the optimal band combination through iterative optimization of the sparse representation model. These machine learning-based BS methods have attained considerable performance and can effectively reduce band redundancy.

However, these BS methods usually rely on strong assumptions to model the internal interactions within HSI [19,20]. In reality, the interaction between bands and pixels is complex [21]. Due to various physical factors, the reflectance of a band at a given pixel is influenced by its surrounding pixels and bands. Thus, predefined strong assumptions cannot cover all situations and hence do not yield the optimal solution [22,23,24,25].

Neural networks possess remarkable fitting capabilities and can reveal the intricate interdependent relationships in HSI [26,27,28]. The attention mechanism is effective in distinguishing important features [29,30]. Networks with an attention mechanism can automatically learn the potential interrelations and distinguish the most representative bands of HSI [31,32]. Therefore, various attention modules are widely employed in the field of band selection. On this basis, BS-Nets [33] is the first band selection framework that combines a band attention mechanism with an autoencoder. This model uses the attention mechanism to search for important bands and applies band-wise attention weighting to the original HSI. After the model is optimized through the autoencoder, significant bands are selected according to the band attention weights.

Subsequently, various models have employed different attention mechanisms to model the spectral or spatial dimensions of HSI, and enhance the model performance. Attention-based autoencoder (AAE) [34] generates the attention mask for each pixel. Then, the band correlations are calculated based on the attention mask and the final band subset is obtained by clustering. Non-local band attention network (NBAN) [20] employs a global-local attention mechanism, which fully considers the nonlinear long-range dependencies of HSI in the band dimension. This approach significantly enhances the effectiveness and robustness of the attention mechanism, facilitating the automatic selection of bands. Dual-attention reconstruction network (DARecNet-BS) [35] incorporates two independent self-attention mechanisms, one in the spectral dimension and the other in the spatial dimension. These mechanisms enhance the results of band selection by exploring the dependencies of HSI in various dimensions.

The aforementioned methods have utilized the nonlinear interaction information within HSI, yielding promising results. However, these methods model the spectral and spatial dimensions independently, overlooking the potential performance improvement from considering the connections between them. Most of them are two-stage models, i.e., an attention stage followed by a reconstruction network, which brings new challenges. Methods such as DARecNet-BS and the triplet-attention and multi-scale reconstruction network (TAttMSRecNet) [36] introduce spatial attention modules connected in parallel with the band attention module to mine image spatial information, as shown in Figure 1a. Because the band attention module and the spatial attention modules are integrated to function in unison, the band attention weights cannot represent the saliency of the bands independently. Hence, these methods cannot select bands based on the converged band attention weights and must rely on calculating the entropy of the reconstructed image for band selection, a process that constrains the potential of the attention mechanism to automatically identify significant bands.

To this end, we propose a deep neural network, SSANet-BS, based on spectral–spatial cross-dimensional attention. SSANet-BS is a three-stage model, as shown in Figure 1b and Figure 2. This network regards BS as a reconstruction task for HSI to achieve unsupervised BS that is applicable to scenarios such as classification. First, a band attention module (BAM) is designed to model the spectral dimension of HSI, extracting salient band features and outputting band attention weights. Subsequently, two spectral–spatial attention modules (SSAMs) are constructed in the band-width (b-w) and band-height (b-h) directions, using the BAM-weighted image as input, to explore the complex spectral–spatial interactions within HSI and generate SSAM weights along with an SSAM-fusion image. Finally, a multi-scale reconstruction network is used to reconstruct this fused image. During optimization, the band attention weights obtained by the BAM gradually converge to the bands with rich information and high saliency. Compared with the two-stage methods above, DARecNet-BS and TAttMSRecNet, the three-stage structure of SSANet-BS makes the SSAMs compatible with the BAM and takes full advantage of the automatic convergence of the attention mechanism to the important bands during backpropagation to achieve automatic band selection. The main contributions of this paper are as follows:

This paper proposes a deep neural network based on spectral–spatial cross-dimensional attention for hyperspectral BS, named SSANet-BS. This network employs complementary multi-dimensional attention mechanisms to automatically discover salient bands, and improves the performance of BS by exploring the complex spectral–spatial interactions in HSI.
SSANet-BS, with its three-stage structural design, addresses the issue of existing BS methods that introduce spatial modules, which compromise the independence of the band attention weights. The experimental results demonstrate that SSANet-BS is effective and stable. This offers a novel solution for the field of hyperspectral BS.

2. The Proposed Method

This section introduces the proposed method, SSANet-BS, outlines its design concept and overall structure, and presents the implementation details of every module and step.

2.1. Overview of SSANet-BS

SSANet-BS treats BS as a task of band-weighted reconstruction for HSI. To enhance performance, it fully models the nonlinear interactions between pixels and bands in HSI [37] throughout the reconstruction process. SSANet-BS comprises three stages, and the overall structure is shown in Figure 2.

In the first stage, SSANet-BS repeatedly takes an image patch $X \in \mathbb{R}^{M \times N \times L}$ from the original HSI as a single input, where M, N, and L denote the width, height, and number of bands, respectively. Feeding patches in this way ensures that SSANet-BS reads the original HSI thoroughly. Afterwards, X is fed into the band attention module (BAM) to obtain band attention weights, which are then used to scale the corresponding bands. The output of the BAM is a band-attention-weighted image with enhanced salient bands.

The second stage is designed to extract the spectral–spatial information of the HSI. The BAM-weighted image is input into two spectral–spatial attention modules (SSAMs) to fully explore the complex spectral–spatial cross-dimensional interactions. This generates spectral–spatial attention weights, which are then used to construct the SSAM-fusion image.

The third stage is the reconstruction of the attention-weighted HSI for model optimization. A multi-scale reconstruction network based on 3D convolution and transposed convolution is employed to reconstruct the aforementioned SSAM-fusion image. The loss function is defined as the residual between the reconstructed image and the original image, facilitating the optimization of SSANet-BS.

It should be noted that existing two-stage approaches employ a band attention module and other modules in the same stage. Consequently, the band attention weights are unable to represent the salience of bands independently. In contrast, the BAM of SSANet-BS is employed independently in the first stage. Therefore, the weight vector generated by the BAM represents the salience, or reconstruction capability, of each band in relation to the original HSI. When SSANet-BS reaches convergence, the band attention weights are sorted in descending order. The higher a band ranks, the higher its priority. The details of each module in SSANet-BS are described below.

2.2. The Band Attention Module

The BAM takes X as input and generates a band attention weight vector through the neural network $f_{b}$ within this module:

$$w_{b} = f_{b}(X) \tag{1}$$

The i-th element $w_{b}^{i} \in [0, 1]$ of the vector $w_{b} \in \mathbb{R}^{L}$ represents the salience of the i-th band $b_{i}$ in X. A higher value of $w_{b}^{i}$ indicates that $b_{i}$ contributes more to the reconstruction of X, making it more salient. The structure of $f_{b}$ is detailed in Figure 3 and Table 1. Compared to using a fully connected network to extract band information from a single pixel, employing a convolutional neural network with spatial inductive bias [33] can effectively make use of spatial information and boost modeling capabilities. Consequently, the first layer of the network uses multiple 2D convolution kernels to extract band information, while the second layer employs max-pooling to reduce the feature dimension of the convolutional output. Finally, after passing through a fully connected network with a sigmoid activation function and batch normalization, the weight vector $w_{b}$ is obtained.

Subsequently, X is weighted band by band to generate the output image $X_{b}$ of the BAM:

$$X_{b} = f_{linear}(w_{b}) \otimes X \tag{2}$$

Here, $X_{b} \in \mathbb{R}^{M \times N \times L}$, $\otimes$ represents band-wise multiplication, and $f_{linear}$ denotes a linear transformation. $L_{1}$ regularization is imposed on the loss function of SSANet-BS, which introduces a sparsity constraint on $w_{b}$ in order to reduce the redundancy of the final band subset. As a result, some elements of $w_{b}$ may be 0 or close to 0. If $X_{b}$ were obtained directly as $w_{b} \otimes X$, some original band information would inevitably be lost, making it difficult for the subsequent SSAMs to fully model the spectral–spatial cross-dimensional interactions in HSI. Therefore, in this paper, the linear transformation $f_{linear}$ maps each element of $w_{b}$ from the range $[0, 1]$ to $[0.5, 1]$ without changing the relative order of band saliency. As the input of the subsequent module, $X_{b}$ enhances the features of salient bands in X, improving the rationality of BS.
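The band-wise weighting in Equation (2) can be illustrated with a minimal sketch. The paper does not state the exact form of $f_{linear}$; the affine map `0.5 + 0.5 * w` used below is an assumption (the simplest order-preserving linear map from [0, 1] to [0.5, 1]), and the function names `rescale_weights` and `weight_bands` are hypothetical.

```python
def rescale_weights(w_b, low=0.5, high=1.0):
    """Affine map from [0, 1] to [low, high] (assumed form of f_linear).

    The map is monotonic, so the ranking of band saliency is unchanged.
    """
    return [low + (high - low) * w for w in w_b]


def weight_bands(patch, w_b):
    """Band-wise multiplication: patch[m][n][band] is scaled by the rescaled weight."""
    scaled = rescale_weights(w_b)
    return [[[value * scaled[band] for band, value in enumerate(pixel)]
             for pixel in row]
            for row in patch]


# Toy 1 x 2 patch with L = 3 bands; the second weight was driven to 0 by the L1 penalty.
patch = [[[2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]]
w_b = [1.0, 0.0, 0.5]
x_b = weight_bands(patch, w_b)
print(x_b)
```

With this rescaling, a band whose attention weight is 0 still reaches the SSAMs at half amplitude instead of being erased.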

2.3. The Spectral–Spatial Attention Module

If only a single dimension, either spectral or spatial, is considered in HSI reconstruction, the interdependent relationship between these dimensions is ignored. In reality, proper modeling of the complex nonlinear interactions in HSI can effectively improve the performance of the model [36]. Based on this, SSANet-BS not only uses the BAM to learn and model the interactions in the spectral dimension but also introduces two spectral–spatial attention modules (SSAMs) for the band-width (b-w) and band-height (b-h) directions. This approach fuses the spectral–spatial information to explore the complex spectral–spatial cross-dimensional dependencies in HSI in depth.

$\mathrm{SSAM}_{b-w}$ and $\mathrm{SSAM}_{b-h}$ are implemented in the same way and differ only in direction. Taking $\mathrm{SSAM}_{b-w}$ in the b-w direction as an example, the neural network $f_{b-w}$ takes $X_{b}$ as input and generates the spectral–spatial attention weight matrix $W_{b-w}$ in the b-w direction:

$$W_{b-w} = f_{b-w}(X_{b}) \tag{3}$$

In Equation (3), the elements of $W_{b-w} \in \mathbb{R}^{M \times L}$ are non-negative. The detailed structure of $f_{b-w}$ is shown in Table 1. $\mathrm{SSAM}_{b-w}$ first performs max-pooling and average-pooling along the height direction of $X_{b}$ to reduce its dimensionality, obtaining feature maps of salient and global information in the b-w direction. These two feature maps are then stacked and passed through a convolutional layer, a batch normalization layer, and a ReLU nonlinear activation function to obtain $W_{b-w}$, from which the SSAM-weighted image of the b-w direction, $X_{b-w} \in \mathbb{R}^{M \times N \times L}$, is computed:

$$X_{b-w} = W_{b-w} \odot X_{b} \tag{4}$$

Here, $\odot$ represents position-wise multiplication. Specifically, let $X_{b-w}^{k} \in \mathbb{R}^{M \times L}$ and $X_{b}^{k} \in \mathbb{R}^{M \times L}$ be the k-th slices of $X_{b-w}$ and $X_{b}$ along the height direction, respectively, with $1 \leq k \leq N$. Then, $X_{b-w}^{k}$ is obtained by the element-wise multiplication of $X_{b}^{k}$ and $W_{b-w}$. Similarly, the module $\mathrm{SSAM}_{b-h}$ outputs the SSAM-weighted image $X_{b-h}$ of the b-h direction. Then, $X_{b-w}$ and $X_{b-h}$ are fused to generate the SSAM-fusion image $X_{b-w-h}$:

$$X_{b-w-h} = \mathrm{Avg}(X_{b-w}, X_{b-h}) \tag{5}$$

where $\mathrm{Avg}$ is the average operation. $X_{b-w-h}$ provides spectral–spatial cross-dimensional interaction information for adjusting the BAM weight vector $w_{b}$ and for the subsequent image reconstruction process, enabling SSANet-BS to fully utilize the spectral–spatial correlation information to select more reasonable bands and improve performance.
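The data flow of Equations (3) to (5) can be sketched in plain Python as follows. This is only an illustration of the pooling, broadcasting, and averaging steps: the learned convolution and batch normalization of $f_{b-w}$ and $f_{b-h}$ are replaced by a fixed, hypothetical mixing of the max-pooled and average-pooled maps (the `mix` argument), and the function names are not from the paper.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def ssam_weights(x_b, pool_axis, mix=(0.5, 0.5)):
    """Attention weights after pooling x_b[m][n][band] along one spatial axis.

    pool_axis=1 pools over height and returns an M x L matrix (b-w direction);
    pool_axis=0 pools over width and returns an N x L matrix (b-h direction).
    `mix` is a stand-in for the learned conv + batch norm; ReLU keeps weights >= 0.
    """
    M, N, L = len(x_b), len(x_b[0]), len(x_b[0][0])
    kept, pooled = (N, M) if pool_axis == 0 else (M, N)
    weights = []
    for i in range(kept):
        row = []
        for band in range(L):
            vals = [x_b[j][i][band] if pool_axis == 0 else x_b[i][j][band]
                    for j in range(pooled)]
            row.append(relu(mix[0] * max(vals) + mix[1] * sum(vals) / pooled))
        weights.append(row)
    return weights


def ssam_fuse(x_b):
    """Average of the b-w and b-h weighted images, as in Equation (5)."""
    M, N, L = len(x_b), len(x_b[0]), len(x_b[0][0])
    w_bw = ssam_weights(x_b, pool_axis=1)  # shared by every height slice
    w_bh = ssam_weights(x_b, pool_axis=0)  # shared by every width slice
    return [[[0.5 * (w_bw[m][band] + w_bh[n][band]) * x_b[m][n][band]
              for band in range(L)]
             for n in range(N)]
            for m in range(M)]


# Toy 2 x 2 patch with a single band.
x_b = [[[1.0], [3.0]], [[2.0], [2.0]]]
print(ssam_fuse(x_b))
```

Each weight matrix is broadcast along the axis that was pooled away, which is why every height slice of $X_{b}$ is multiplied by the same $W_{b-w}$.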

2.4. The Multi-Scale Reconstruction Network

3D convolutional networks can exploit spectral–spatial information and have been used extensively for reconstructing HSI [36,38]. To model the interactions in HSI at varying scales and improve reconstruction quality, this paper proposes a multi-scale reconstruction network $f_{rec}^{ms}$, inspired by MSRN [38], that incorporates 3D convolutions and transposed convolutions with diverse kernel scales. The SSAM-fusion image $X_{b-w-h}$ is reconstructed by $f_{rec}^{ms}$ as $\hat{X} \in \mathbb{R}^{M \times N \times L}$:

$$\hat{X} = f_{rec}^{ms}(X_{b-w-h}) \tag{6}$$

The detailed implementation of $f_{rec}^{ms}$ is given in Table 1. To ensure that bands with adjacent spectral positions are not assigned similar attention weights and to reduce the redundancy of the band subset, the loss function of SSANet-BS is designed as:

$$J(\theta) = \frac{1}{P} \|\hat{X} - X\|^{2} + \lambda \|w_{b}\|_{1} \tag{7}$$

Here, $\theta$ represents all trainable parameters of SSANet-BS, $P = M \times N$ denotes the total number of pixels in X, $\|\cdot\|_{1}$ is the $L_{1}$ norm that imposes the sparsity constraint, and the coefficient $\lambda$ controls the degree of sparsity of $w_{b}$. The three-stage design of SSANet-BS enables the band attention weight $w_{b}$ of the BAM to represent the band saliency independently, thus facilitating band selection. Specifically, once SSANet-BS has converged, the average $\bar{w}_{b}$ of the vectors $w_{b}$ over all input patches X is treated as the final salience score of each band. The larger the i-th element of $\bar{w}_{b}$, the more important the i-th band $b_{i}$. After sorting the elements of $\bar{w}_{b}$ in descending order, the bands corresponding to the top n values are selected as the final band subset.
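As a rough illustration of Equation (7) and the final ranking step, the sketch below evaluates the loss for one patch and selects the top-n bands from the averaged weight vector. It assumes plain nested lists indexed as `x[m][n][band]`; the helper names `ssanet_loss` and `select_bands` are hypothetical, and band indices are 0-based.

```python
def ssanet_loss(x_hat, x, w_b, lam):
    """J = (1/P) * ||x_hat - x||^2 + lam * ||w_b||_1, with P = M * N pixels."""
    M, N = len(x), len(x[0])
    squared_error = sum(
        (x_hat[m][n][band] - x[m][n][band]) ** 2
        for m in range(M) for n in range(N) for band in range(len(x[m][n]))
    )
    return squared_error / (M * N) + lam * sum(abs(w) for w in w_b)


def select_bands(weight_vectors, n):
    """Average w_b over all patches, then return the indices of the n largest entries."""
    L = len(weight_vectors[0])
    mean_w = [sum(w[band] for w in weight_vectors) / len(weight_vectors)
              for band in range(L)]
    ranking = sorted(range(L), key=lambda band: mean_w[band], reverse=True)
    return ranking[:n]


# Two patches, three bands: the averaged scores are [0.8, 0.2, 0.55].
print(select_bands([[0.9, 0.1, 0.5], [0.7, 0.3, 0.6]], n=2))  # -> [0, 2]
```

Because selection only needs the converged weights, no further pass over the reconstructed image is required at this step.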

3. Experiments

This section presents a comparative analysis of the proposed SSANet-BS model, two state-of-the-art feature extraction methods, and eight state-of-the-art BS methods on four publicly available datasets. The classification results are used to verify the effectiveness of each method. The experimental data, parameter settings, comprehensive analysis and discussions are detailed in the following sections.

3.1. Experimental Setup

The comparison methods are locally linear embedding (LLE) [39], isometric mapping (Isomap) [40], maximum-variance principal component analysis (MVPCA) [13], enhanced fast density-peak clustering (E-FDPC) [15], adaptive subspace partitioning strategy (ASPS) [41], scalable one-pass self-representation learning (SOPSRL) [42], graph regularized spatial–spectral clustering (GRSC) [16], BSNet-Conv [33], DARecNet-BS [35], and spatial and spectral structure preserved self-representation ($S^{4}P$) [43]. LLE and Isomap require significant computational resources when processing large-scale HSI. Therefore, sampled versions are used to ensure that they run successfully on the four datasets. Further, to facilitate comparisons between the two feature extraction methods (LLE and Isomap) and the BS methods, the number of dimensions after feature extraction is set equal to the number of selected bands. The four hyperspectral datasets, shown in Figure 4, are as follows:

Indian Pines (IP220): IP220 was captured by the AVIRIS sensor in 1992 over an Indian pine forest landscape located in northwestern Indiana. It contains 220 bands, with a size of 145 × 145 pixels and 16 labeled classes of ground objects.
Washington DC Mall (DC191): DC191 is an airborne HSI acquired by the HYDICE sensor. It contains 191 bands, with a size of 280 × 307 pixels and 6 classes.
Pavia University (PU103): PU103 was taken in 2002 by the ROSIS sensor over the campus of Pavia University in Italy. Its size is 610 × 340 × 103, and it has 9 classes.
QUH-Qingyun (QY176) [44]: The image was captured on 18 May 2021 in Qingdao, China, using a Gaiasky mini2-VN imaging spectrometer mounted on a UAV platform. It comprises 176 spectral bands. After cropping, it is 600 × 200 pixels in size and contains 5 labeled classes of ground objects.

The experiment uses a support vector machine (SVM) as the classifier, with 10%, 1%, 5%, and 10% of the samples selected for training from IP220, DC191, PU103, and QY176, respectively. The classification metrics, namely producer’s accuracy (PA), average producer’s accuracy (APA), average user’s accuracy (AUA), overall accuracy (OA), and kappa coefficient (kappa), are used to assess the effectiveness of each method. To reduce the uncertainty caused by random sample selection, the OA of each band subset is the average of five independent tests. The experiment divides the HSI into multiple non-overlapping patches $X \in \mathbb{R}^{7 \times 7 \times L}$ as input for SSANet-BS and uses SGD as the optimizer. SSANet-BS is implemented using the PyTorch framework based on CUDA 10.7. All experiments are run on an Intel Xeon E5-2699 v4 CPU and an Nvidia Tesla P40 GPU.
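The division into non-overlapping 7 × 7 patches can be sketched as below. How border pixels that do not fill a complete patch are handled is not stated in the paper; dropping them is an assumption made here for simplicity, and `patch_origins` is a hypothetical helper name.

```python
def patch_origins(width, height, size=7):
    """Top-left (row, column) corners of non-overlapping size x size patches.

    Assumption: rows and columns left over at the border are dropped.
    """
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


# A 145 x 145 image such as IP220 gives 20 x 20 = 400 full patches under this assumption.
origins = patch_origins(145, 145)
print(len(origins), origins[-1])  # -> 400 (133, 133)
```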

3.2. Parameter Setting

The hyperparameter of SSANet-BS, $\lambda$, is the coefficient that controls the regularization. Its candidate values are {0.0001, 0.001, 0.01, 0.1}. The optimal $\lambda$ is determined based on the average OA (AOA) as the number of selected bands $n_{BS}$ varies from 5 to 30 with a step of 5. Table 2 shows the AOA values of SSANet-BS under different $\lambda$. The optimal values on the IP220, DC191, PU103 and QY176 datasets are 0.01, 0.0001, 0.001 and 0.0001, respectively.
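The selection rule described above (average the OA over the tested band counts, then keep the $\lambda$ with the highest AOA) can be written in a few lines. The OA numbers in this sketch are placeholders chosen only to exercise the code; they are not values from Table 2, and the function names are hypothetical.

```python
def average_oa(oa_by_n_bs):
    """AOA: mean OA over the tested numbers of selected bands."""
    return sum(oa_by_n_bs.values()) / len(oa_by_n_bs)


def best_lambda(oa_table):
    """Return the regularization coefficient whose AOA is highest."""
    return max(oa_table, key=lambda lam: average_oa(oa_table[lam]))


# Placeholder OA values (percent) for n_BS = 5, 10, 15; NOT results from the paper.
oa_table = {
    0.0001: {5: 60.0, 10: 70.0, 15: 74.0},
    0.01: {5: 62.0, 10: 72.0, 15: 76.0},
}
print(best_lambda(oa_table))  # -> 0.01
```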

3.3. Result Analysis

To validate the effectiveness of the proposed method, Figure 5 shows the OA values of five runs for each BS method at different $n_{BS}$. For the IP220 dataset, SSANet-BS achieves the best results for most values of $n_{BS}$. It is closely followed by DARecNet-BS, GRSC, and ASPS, while E-FDPC, LLE and Isomap perform poorly. The advantage of SSANet-BS becomes more pronounced when fewer bands are selected. In terms of stability, the OA values of SSANet-BS, GRSC, and DARecNet-BS vary only slightly across different $n_{BS}$, whereas the OA of ASPS drops when $n_{BS}$ is 20, indicating instability. Meanwhile, Figure 5b reveals that SSANet-BS has a more significant advantage for most $n_{BS}$ on the DC191 dataset. As $n_{BS}$ increases to 20, the gap between SSANet-BS and the other comparison methods gradually narrows, although SSANet-BS remains an outstanding performer. For the PU103 dataset, SSANet-BS is less effective than $S^{4}P$ when $n_{BS}$ is under 15 but still performs well, and it outperforms other methods at all other $n_{BS}$. As with the other datasets, SSANet-BS demonstrates an advantage over the other methods on the QY176 dataset when fewer bands are selected, such as 5 and 10. As the number of bands increases, the performance of SSANet-BS gradually approaches that of the other methods, with the exception of DARecNet-BS and MVPCA.

As shown in Figure 5, methods such as SSANet-BS, DARecNet-BS and $S^{4}P$ outperform the full band set for the majority of $n_{BS}$ values on the IP220 dataset. This indicates that these BS methods effectively reduce data redundancy and thereby obtain good performance. On the DC191, PU103, and QY176 datasets, the full band set surpasses all BS methods. However, as the number of bands increases, this gap gradually narrows. It is important to emphasize that the objective of BS is to improve data transmission and processing speed, conserve computational resources, and enhance model usability while maintaining task accuracy as much as possible. For instance, on the DC191 dataset with 191 bands, when the number of selected bands is 15, SSANet-BS achieves a reduction of approximately 92% in data volume with a 1.32% loss in OA. Moreover, in this experiment, the running times of the SVM with 15 bands and with full bands are 0.43 s and 2.94 s, respectively, a difference of considerable importance in practical applications with large-scale datasets. Therefore, BS methods accept a small loss of accuracy in exchange for a significant reduction in the data volume of HSI, thereby increasing processing efficiency.
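The figures quoted for DC191 can be checked with a short calculation. The sketch assumes that data volume scales linearly with the number of bands; the speed-up ratio is derived here from the two reported SVM running times and is not a number stated in the paper.

```python
total_bands = 191       # DC191
selected_bands = 15     # n_BS used in the example

# Fraction of the data volume removed, assuming volume is proportional to band count.
reduction = 1 - selected_bands / total_bands

# Ratio of the reported SVM running times (full bands vs. 15 bands), in seconds.
speedup = 2.94 / 0.43

print(f"data reduction: {reduction:.1%}")   # -> data reduction: 92.1%
print(f"SVM speed-up:   {speedup:.1f}x")    # -> SVM speed-up:   6.8x
```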

Further, Figure 6, Figure 7, Figure 8 and Figure 9 illustrate the classification maps of each method on the four datasets at $n_{BS} = 15$. It can be observed that there are discrepancies between the false color image (a) and the ground truth (b) in Figure 6, Figure 7, Figure 8 and Figure 9. These differences are more pronounced in the areas highlighted by the yellow box in Figure 9. One of the reasons for these discrepancies is the interference from shadows, reflections, and other disturbances. Such challenging regions are therefore well suited to validating and distinguishing the effectiveness of different band selection methods. The classification maps obtained with the bands selected by SSANet-BS are more closely aligned with the ground truth than those of other methods. The prediction accuracy of SSANet-BS is higher in adjacent regions belonging to the same class. This phenomenon is more pronounced in the regions marked by yellow boxes in Figure 6, Figure 7, Figure 8 and Figure 9. For instance, on the IP220 dataset, the bands selected by SSANet-BS exhibit a lower misclassification rate in the yellow-box region, in contrast to MVPCA, E-FDPC and other methods, which exhibit higher rates. Similarly, on the QY176 dataset, SSANet-BS is the most closely aligned with the ground truth in the yellow box, whereas methods such as DARecNet-BS and MVPCA are less effective. This indicates that the joint spectral–spatial information of HSI has been fully utilized.

Furthermore, Table 3, Table 4, Table 5 and Table 6 present the PA, APA, AUA, OA and kappa of each method at $n_{BS} = 15$ on the IP220, DC191, PU103, and QY176 datasets, respectively. For the IP220 dataset, SSANet-BS achieves the optimal APA, AUA, OA and kappa, as well as the optimal PA in 11 classes. In the classes where SSANet-BS does not achieve the optimal outcome, the PA gap between SSANet-BS and the optimal method is less than 3%, except for classes 7 and 16. The performance of SSANet-BS on DC191 and QY176 is comparable to that on IP220: its APA, AUA, OA and kappa are all optimal. This indicates that the band subset selected by SSANet-BS is of high quality and that the classification performance is stable. Overall, SSANet-BS achieves the optimal values of APA, AUA, OA, and kappa on all datasets except PU103, where it is outperformed by $S^{4}P$. This indicates that SSANet-BS is a stable method and that the selected bands can effectively represent the original HSI.
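For reference, the metrics reported in these tables can all be derived from a confusion matrix. The sketch below uses the standard definitions (PA as per-class recall, user's accuracy as per-class precision, and Cohen's kappa) and assumes rows index the reference class and columns the predicted class; it is not the authors' evaluation code, and it assumes every class has at least one reference and one predicted sample.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of class-i reference samples predicted as class j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    pa = [confusion[i][i] / row_sums[i] for i in range(k)]  # producer's accuracy
    ua = [confusion[i][i] / col_sums[i] for i in range(k)]  # user's accuracy
    oa = sum(confusion[i][i] for i in range(k)) / total     # overall accuracy

    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected)

    return {"PA": pa, "APA": sum(pa) / k, "AUA": sum(ua) / k,
            "OA": oa, "kappa": kappa}


# Toy two-class example: OA = 17/20 and chance agreement = 0.5.
print(classification_metrics([[8, 2], [1, 9]]))
```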

To further verify the stability of SSANet-BS, Figure 10 presents the AOA values of each BS method across six band subset sizes ranging from 5 to 30 with a step size of 5. The AOA values of the optimal and suboptimal methods are bolded in red and black, respectively. Figure 10 shows a significant discrepancy in the performance of the various methods on the IP220 and PU103 datasets, whereas the differences on the DC191 and QY176 datasets are relatively minor. Moreover, most methods demonstrate superior performance on DC191 and QY176. This phenomenon can be attributed to a variety of factors, including sensor characteristics, the attributes of the ground objects within the scenes, atmospheric conditions, and the impact of lighting, among others. Consequently, IP220 and PU103 present greater challenges for different BS methods. Figure 10 shows that the AOA values of SSANet-BS exceed those of all other comparison methods on the IP220, DC191 and QY176 datasets, leading the suboptimal methods by 3.08%, 2.05% and 0.44%, respectively. On the PU103 dataset, SSANet-BS is suboptimal, with a difference of only 1.42% from $S^{4}P$ but a 2.05% improvement over the third-best method, BSNet-Conv. These outcomes indicate that SSANet-BS produces good and stable performance on various datasets by modeling complex spectral–spatial cross-dimensional interactions in the reconstruction process.

4. Discussion

This section discusses the quality of the selected band subset and the runtime of each method, verifies the effectiveness of two SSAM modules through ablation experiments, and concludes with the advantages and limitations of SSANet-BS.

4.1. Band Quality

Hyperspectral band selection methods aim to select a subset of bands that are both informative and low-redundancy, while also providing a comprehensive representation of the original HSI. Consequently, the quantity of information and the degree of redundancy are pivotal metrics for evaluating the quality of the band subset selected by the BS method under examination. On the one hand, bands with greater information content exhibit higher Shannon entropy values. On the other hand, the content of adjacent bands in HSI is similar and tends to be redundant [44], which means that the distribution of bands can reflect the redundancy of the band subsets.
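The information content of a single band can be estimated as the Shannon entropy of its gray-level histogram, $H = -\sum_{g} p_{g} \log_{2} p_{g}$. The sketch below is one common way to compute it; the quantization into 256 levels is an assumption, since the paper does not specify how the histogram is built, and `band_entropy` is a hypothetical helper name.

```python
from collections import Counter
from math import log2


def band_entropy(band, levels=256):
    """Shannon entropy (in bits) of one band given as a 2D list of pixel values."""
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant band carries no information
    # Quantize to integer gray levels 0 .. levels - 1 (assumed preprocessing).
    scale = (levels - 1) / (hi - lo)
    counts = Counter(int((v - lo) * scale) for v in values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


print(band_entropy([[0, 0], [1, 1]]))  # two equally likely levels -> 1.0
print(band_entropy([[0, 1], [2, 3]]))  # four equally likely levels -> 2.0
```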

It can be observed in Figure 11 that bands with high entropy show the features of ground objects more clearly. Conversely, bands with low entropy, such as the one in Figure 11c, are noisy bands, which can have a detrimental impact on subsequent classification tasks. In order to assess the quality of the bands selected by each method, Figure 12 further plots the distribution of the selected bands (top of each subplot) and the entropy values of all bands (bottom of each subplot) for the IP220 dataset. The subplots of Figure 12 indicate that the bands selected by MVPCA are more concentrated than those of the other methods. Although the bands selected by MVPCA are concentrated in the region of higher entropy, the classification performance is unsatisfactory. In contrast, methods such as E-FDPC, ASPS, SOPSRL, BSNet-Conv and $S^{4}P$ select bands that exhibit greater dispersion but inevitably fall within the low-entropy range. The sparse constraints imposed on SSANet-BS result in a uniform distribution of bands across the four datasets. The selected bands are spaced further apart, with lower redundancy and superior quality. This demonstrates the effectiveness of SSANet-BS.

4.2. Computation Time

This section focuses on the computation time of SSANet-BS. Deep learning-based methods can be accelerated by GPU, so SSANet-BS, DARecNet-BS, and BSNet-Conv run on GPU, while the other methods run on CPU. Table 7 shows the computation time of the different methods for selecting 30 bands on the IP220 dataset. Deep learning-based methods take more time than the other methods. Among the three deep learning methods, DARecNet-BS requires a significantly longer processing time (1092.36 s) than BSNet-Conv (116.55 s) and SSANet-BS (295.46 s). The reason is that the band attention weights of DARecNet-BS cannot fully represent band saliency. Consequently, DARecNet-BS can only select bands by calculating the entropy of the reconstructed image, which introduces additional computational cost. This disadvantage becomes more pronounced as the image size increases. Conversely, the three-stage structure of SSANet-BS allows bands to be selected directly from the converged band attention weights, as in BSNet-Conv, thereby reducing the computational cost. This is a distinct advantage of the three-stage structure of SSANet-BS.
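The difference in cost between the two selection routes can be sketched as follows. This is a hedged illustration, not the published implementation of either method: `select_by_weights` and `select_by_entropy` are hypothetical names, the weight vector is assumed to have one entry per band, and the entropy is assumed to be estimated from a 256-bin histogram of each reconstructed band.

```python
import numpy as np

def select_by_weights(w_b, n_bs):
    """Pick the n_bs bands with the largest attention weights.

    Cost depends only on the number of bands L, not on image size.
    """
    top = np.argsort(w_b)[::-1][:n_bs]
    return np.sort(top)

def select_by_entropy(recon, n_bs, n_bins=256):
    """Pick the n_bs bands of a reconstructed H x W x L cube with highest entropy.

    Every pixel of every band is visited, so cost grows with image size.
    """
    n_bands = recon.shape[-1]
    ent = np.empty(n_bands)
    for b in range(n_bands):
        counts, _ = np.histogram(recon[..., b], bins=n_bins)
        p = counts[counts > 0] / counts.sum()
        ent[b] = -np.sum(p * np.log2(p))
    return np.sort(np.argsort(ent)[::-1][:n_bs])

# Toy inputs (illustrative values only).
w_b = np.array([0.1, 0.9, 0.3, 0.8, 0.05])
by_weights = select_by_weights(w_b, 2)

rng = np.random.default_rng(0)
recon = rng.random((16, 16, 3))
recon[..., 0] = 1.0  # constant band, zero entropy
by_entropy = select_by_entropy(recon, 2)
```

The weight-based route sorts a length-L vector once, whereas the entropy route scans all H x W x L values, which is consistent with the observation that the entropy step becomes more expensive as the image grows.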

4.3. Ablation Study for SSAMs

In this section, three variants of the SSANet-BS model are constructed to verify the effectiveness of the SSAMs. They are obtained by removing $\mathrm{SSAM}_{b-w}$ in the band-width (b-w) direction, $\mathrm{SSAM}_{b-h}$ in the band-height (b-h) direction, and both modules from SSANet-BS, respectively. The three variants, designated $\text{no-SSAM}_{b-w}$, $\text{no-SSAM}_{b-h}$, and no-SSAM, are tested on the IP220 dataset. The OA and AOA values of SSANet-BS and its three variants for $n_{BS}$ ranging from 5 to 30 are shown in Figure 13.

As can be seen in Figure 13a, SSANet-BS outperforms the three variants at all $n_{BS}$. Meanwhile, both variants lacking the SSAM in one direction, namely $\text{no-SSAM}_{b-w}$ and $\text{no-SSAM}_{b-h}$, are superior to the variant without any SSAM. In addition, the AOA values shown in Figure 13b indicate that, compared with the complete SSANet-BS, the variants lacking $\mathrm{SSAM}_{b-w}$, $\mathrm{SSAM}_{b-h}$, or both modules exhibit reductions in AOA of 2.89%, 4.94% and 14.59%, respectively. Therefore, both $\mathrm{SSAM}_{b-w}$ and $\mathrm{SSAM}_{b-h}$ developed in this paper effectively improve the model’s performance. The ablation study indicates that SSANet-BS successfully uses the SSAMs to capture the spectral–spatial information of HSI during the band selection process. The three-stage structure of SSANet-BS, comprising the BAM, the SSAMs and the reconstruction network, has been demonstrated to be effective.

4.4. Effectiveness of the Three-Stage Structure

To validate the necessity and effectiveness of the three-stage structure, a two-stage variant of SSANet-BS, named SSANet-BS-2S, is constructed. In SSANet-BS-2S, the BAM sits in the same stage as the two SSAMs and runs in parallel with them, in contrast to the progressive arrangement in the three-stage SSANet-BS.

Under the optimal parameters, Figure 14 shows that SSANet-BS-2S performs markedly worse than SSANet-BS, with an AOA deficit of approximately 21%. The discrepancy arises because, in the two-stage SSANet-BS-2S, the BAM and the two SSAMs operate cooperatively and jointly model the HSI. Information about the significance of bands is therefore distributed across the hidden features $w_{b}$, $W_{b-w}$, and $W_{b-h}$. Consequently, $w_{b}$ cannot independently and comprehensively represent the significance of bands. In contrast, in the three-stage SSANet-BS, the BAM and the SSAMs are arranged progressively. The spectral–spatial information learned by the SSAMs guides the adjustment of $w_{b}$ in the BAM via backpropagation, enabling $w_{b}$ to converge automatically to the bands with high significance during training.
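The structural argument can be illustrated with a toy forward pass. The sketch below is only a conceptual stand-in under strong assumptions: `pool_attention` replaces a real SSAM with a mean-pool followed by a sigmoid, the two-stage variant is assumed to sum its parallel branches, and none of the function names come from the paper. It shows that when the band weights are applied first, a band with zero weight vanishes from everything downstream, whereas in a parallel arrangement the same band still reaches the output through the other branches.

```python
import numpy as np

def pool_attention(x, axis):
    """Toy stand-in for an SSAM: pool over one spatial axis, then sigmoid."""
    pooled = x.mean(axis=axis, keepdims=True)
    return 1.0 / (1.0 + np.exp(-pooled))

def three_stage(x, w_b):
    """Progressive order: band weights first, attention on the weighted cube."""
    z = x * w_b  # w_b broadcasts over the last (band) axis
    return z * pool_attention(z, axis=0) + z * pool_attention(z, axis=1)

def two_stage(x, w_b):
    """Parallel order (assumed): all three branches see the raw cube."""
    return x * w_b + x * pool_attention(x, axis=0) + x * pool_attention(x, axis=1)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 6))  # H x W x L toy cube
w_b = np.ones(6)
w_b[2] = 0.0               # pretend band 2 was judged insignificant

out3 = three_stage(x, w_b)
out2 = two_stage(x, w_b)
band2_energy_three = float(np.abs(out3[..., 2]).sum())
band2_energy_two = float(np.abs(out2[..., 2]).sum())
```

In the progressive sketch, band 2 contributes nothing to the output, so the weight alone decides its fate. In the parallel sketch, band 2 still contributes through the attention branches, which mirrors the claim that the band weights no longer carry the full significance information.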

To further validate the effectiveness of the three-stage structure of SSANet-BS, we developed a variant based on SSANet-BS-2S, termed SSANet-BS-2SE. To address the aforementioned issue caused by introducing spatial or spectral–spatial modules, DARecNet-BS selects the bands with the highest entropy in the reconstructed image. SSANet-BS-2SE implements band selection in the same manner. Figure 14 shows that SSANet-BS-2SE performs notably better than SSANet-BS-2S but still falls short of SSANet-BS. This suggests that, compared with a manually designed criterion (entropy), automatically discovering salient bands with the attention mechanism can enhance model performance, and that the two-stage structure of SSANet-BS-2SE cannot fully capitalize on the advantages of the attention mechanism.

In conclusion, the three-stage structure of SSANet-BS ensures that the band attention weights can independently and comprehensively represent the significance of bands. This allows the attention mechanism to automatically evaluate and select salient bands and achieve superior results. Therefore, the three-stage structure is both effective and necessary.

4.5. Comments on Existing BS Methods and SSANet-BS

The attention mechanism can be used to learn the complex spectral–spatial interactions within HSI and enable the automated identification of significant bands. Current research on deep learning-based BS methods predominantly focuses on how to more effectively utilize attention mechanisms to enhance model performance. BS-Net [33] is the first BS method to automatically select bands using an attention mechanism. Then, NBAN [20] employs a non-local attention mechanism to capture long-range contextual information in the spectral dimension. Next, DARecNet-BS [35] introduces an independent spatial attention module and TAttMSRecNet [36] further exploits spectral–spatial information to improve model performance. By contrast, the proposed method SSANet-BS makes the SSAMs compatible with the BAM through the ingenious design of the three-stage structure, and achieves automatic band selection using the attention mechanism based on the full use of spectral–spatial information. The experimental results demonstrate that SSANet-BS is an effective and stable method.

BSNet-Conv has a straightforward structure that enables fast processing in real-world scenarios. Nonetheless, its performance is generally mediocre. DARecNet-BS incorporates an independent spatial attention module, which offers new insights for the HSI BS domain. However, its band selection process relies on entropy, which results in slower processing. SSANet-BS achieves promising results by learning the spectral–spatial information of HSIs. However, according to statistics from the PyTorch framework, for the IP220 dataset (comprising 220 bands), the parameter count of SSANet-BS is about 43% higher than that of BSNet-Conv. The larger number of parameters requires more GPU memory, which makes SSANet-BS less suitable for lower-specification computing platforms and may limit its applicability in certain contexts. Given today’s GPU hardware, however, the parameter volume of SSANet-BS is not a significant bottleneck in practice, and this disadvantage is diminishing as hardware continues to advance.
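For readers who want to reproduce this kind of parameter statistic, the count of a PyTorch model is commonly obtained with `sum(p.numel() for p in model.parameters())`. The sketch below shows the same arithmetic in plain Python for the two layer types listed in Table 1. It is illustrative only: bias terms are assumed to be present, the helper names are hypothetical, and the convolution channel counts (1 in, 8 out) are placeholders, since the article does not list them.

```python
def linear_params(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def conv_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters of a 2D or 3D convolution with the given kernel shape."""
    k = 1
    for d in kernel:
        k *= d
    return k * c_in * c_out + (c_out if bias else 0)

# The two fully connected layers of the BAM in Table 1 (FC1: L -> 32, FC2: 32 -> L)
# for IP220, where L = 220 bands.
L = 220
bam_fc = linear_params(L, 32) + linear_params(32, L)

# A 3 x 3 x 3 Conv3D with placeholder channel counts (assumption, not from the paper).
example_conv3d = conv_params((3, 3, 3), c_in=1, c_out=8)
```

Under these assumptions the two BAM layers alone hold 14,332 parameters at L = 220, and the count grows linearly with the number of bands.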

Further, deep learning-based methods, including SSANet-BS, have the following potential issues. In contrast to domains such as CV and NLP, where models are applied at inference time [45,46], existing attention-based BS methods perform band selection during training. An existing BS model can therefore learn only from the target HSI, which greatly limits the potential capability of the neural network. In addition, because of the training process, deep learning-based BS methods take tens or even hundreds of times longer than machine learning-based methods. It is therefore worth investigating how to train the model on multiple HSIs and perform BS on the target HSI at inference time. Inference is much faster than training, and if training can be done on multiple HSIs, it may be possible to obtain a BS method with higher performance and a runtime comparable to that of machine learning-based methods. In future work, we will study these issues and improve SSANet-BS.

5. Conclusions

This paper presents SSANet-BS, a network designed for BS. SSANet-BS is a three-stage BS method that addresses the inability of existing two-stage BS methods to search automatically for salient bands with the attention mechanism while learning spatial information. It treats BS as a weighted reconstruction task on the HSI and leverages the BAM and SSAMs to model the complex spectral–spatial cross-dimensional nonlinear interactions in HSI during reconstruction. Further, a multi-scale reconstruction network with convolution kernels of various scales reconstructs the HSI to optimize the model. Experimental results on four publicly available datasets demonstrate that SSANet-BS outperforms existing BS methods and exhibits satisfactory stability. In the future, SSANet-BS is expected to be applied to tasks including HSI classification, segmentation, and target detection, providing strong support for the HSI processing field.

Author Contributions

Conceptualization, X.S. (Xiaodi Shang); methodology, C.C. and X.S. (Xiaodi Shang); software, C.C. and X.S. (Xiaodi Shang); validation, C.C.; formal analysis, C.C.; investigation, C.C.; resources, C.C.; data curation, B.F.; writing—original draft preparation, C.C.; writing—review and editing, X.S. (Xiaodi Shang) and X.S. (Xudong Sun); visualization, C.C. and X.S. (Xudong Sun); supervision, X.S. (Xiaodi Shang) and X.S. (Xudong Sun); project administration X.S. (Xiaodi Shang); funding acquisition, X.S. (Xiaodi Shang). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Qingdao Natural Science Foundation Grant 23-2-1-64-zyyd-jch, in part by China Postdoctoral Science Foundation Grant 2023M731843, in part by the Postdoctoral Applied Research Foundation of Qingdao under Grant QDBSH20230101012, in part by the National Natural Science Foundation of China under Grant 42301380, and in part by the Science and Technology Support Plan for Youth Innovation of Colleges and Universities of Shandong Province of China under Grant 2023KJ232. (Corresponding author: Xiaodi Shang).

Data Availability Statement

The datasets used in this article can be accessed at the following links. IP220 (Indian Pines) and PU103 (Pavia University): https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 24 May 2024). DC191 (Washington DC Mall): https://www.researchgate.net/figure/Washington-DC-Mall-HSI-a-Red-band-63-Green-band-50-and-Blue-band-27-sample-image_fig5_342993074 (accessed on 24 May 2024). QY176 (QUH-Qingyun): https://github.com/RsAI-lab/QUH-classification-dataset (accessed on 24 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sun, W.; Du, Q. Hyperspectral Band Selection: A Review. IEEE Geosci. Remote Sens. Mag. 2019, 7, 118–139.
2. Sun, X.; Lin, P.; Shang, X.; Pang, H.; Fu, X. MOBS-TD: Multiobjective Band Selection with Ideal Solution Optimization Strategy for Hyperspectral Target Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10032–10050.
3. Li, Q.; Wang, Q.; Li, X. An Efficient Clustering Method for Hyperspectral Optimal Band Selection via Shared Nearest Neighbor. Remote Sens. 2019, 11, 350.
4. Vaddi, R.; Manoharan, P. CNN Based Hyperspectral Image Classification Using Unsupervised Band Selection and Structure-Preserving Spatial Features. Infrared Phys. Technol. 2020, 110, 103457.
5. Deep, K.; Thakur, M. Hyperspectral Band Selection Using a Decomposition Based Multiobjective Wrapper Approach. Infrared Phys. Technol. 2024, 136, 105053.
6. Fu, B.; Sun, X.; Cui, C.; Zhang, J.; Shang, X. Structure-Preserved and Weakly Redundant Band Selection for Hyperspectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 1–15, Early access.
7. Li, S.; Wang, Z.; Fang, L.; Li, Q. An Efficient Subspace Partition Method Using Curve Fitting for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
8. Gao, H.; Zhang, Y.; Chen, Z.; Xu, S.; Hong, D.; Zhang, B. A Multidepth and Multibranch Network for Hyperspectral Target Detection Based on Band Selection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–18.
9. Song, M.; Liu, S.; Xu, D.; Yu, H. Multiobjective Optimization-Based Hyperspectral Band Selection for Target Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–22.
10. Ou, X.; Wu, M.; Tu, B.; Zhang, G.; Li, W. Multi-Objective Unsupervised Band Selection Method for Hyperspectral Images Classification. IEEE Trans. Image Process. 2023, 32, 1952–1965.
11. Fu, H.; Zhang, A.; Sun, G.; Ren, R.; Jia, X.; Pan, Z.; Ma, H. A Novel Band Selection and Spatial Noise Reduction Method for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
12. Ji, L.; Zhu, L.; Wang, L.; Xi, Y.; Yu, K.; Geng, X. FastVGBS: A Fast Version of the Volume-Gradient-Based Band Selection Method for Hyperspectral Imagery. IEEE Geosci. Remote Sens. Lett. 2021, 18, 514–517.
13. Chang, C.; Du, Q.; Sun, T.; Althouse, M. A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 1999, 37, 2631–2641.
14. Gao, P.; Wang, J.; Zhang, H.; Li, Z. Boltzmann Entropy-Based Unsupervised Band Selection for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 462–466.
15. Jia, S.; Tang, G.; Zhu, J.; Li, Q. A Novel Ranking-Based Clustering Approach for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2016, 54, 88–102.
16. Wang, J.; Tang, C.; Zheng, X.; Liu, X.; Zhang, W.; Zhu, E. Graph Regularized Spatial-Spectral Subspace Clustering for Hyperspectral Band Selection. Neural Netw. 2022, 153, 292–302.
17. Li, S.; Qi, H. Sparse Representation Based Band Selection for Hyperspectral Images. In Proceedings of the 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 2693–2696.
18. Shang, X.; Cui, C.; Sun, X. Spectral-Spatial Hypergraph-Regularized Self-Representation for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5504405.
19. Liu, Y.; Li, X.; Xu, Z.; Hua, Z. BSFormer: Transformer-Based Reconstruction Network for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5507305.
20. Li, T.; Cai, Y.; Cai, Z.; Liu, X.; Hu, Q. Nonlocal Band Attention Network for Hyperspectral Image Band Selection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3462–3474.
21. Wang, M.; Liu, W.; Chen, M.; Huang, X.; Han, W. A Band Selection Approach Based on a Modified Gray Wolf Optimizer and Weight Updating of Bands for Hyperspectral Image. Appl. Soft Comput. 2021, 112, 107805.
22. Yao, Q.; Zhou, Y.; Tang, C.; Xiang, W.; Zheng, G. End-to-End Hyperspectral Image Change Detection Based on Band Selection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14.
23. Feng, J.; Bai, G.; Li, D.; Zhang, X.; Shang, R.; Jiao, L. MR-Selection: A Meta-Reinforcement Learning Approach for Zero-Shot Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–20.
24. Sun, W.; He, K.; Yang, G.; Peng, J.; Ren, K.; Li, J. A Cross-Scene Self-Representative Network for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5509212.
25. Amoako, P.Y.O.; Cao, G.; Yang, D.; Amoah, L.; Wang, Y.; Yu, Q. A Metareinforcement-Learning-Based Hyperspectral Image Classification with a Small Sample Set. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3091–3107.
26. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
27. Zhang, H.; Sun, X.; Zhu, Y.; Xu, F.; Fu, X. A Global-Local Spectral Weight Network Based on Attention for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
28. Yang, J.; Zhou, J.; Wang, J.; Tian, H.; Liew, A. LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5515815.
29. Liu, Z.; Lin, Y.Z.; Cao, Y.Z.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 9992–10002.
30. Gao, L.; Chen, L.; Liu, P.; Jiang, Y.; Xie, W.; Li, Y. A Transformer-Based Network for Hyperspectral Object Tracking. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–11.
31. Zhang, H.; Gao, H.; Sun, H.; Sun, X.; Zhang, B. A Spatial-Spectrum Fully Attention Network for Band Selection of Hyperspectral Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
32. Li, S.; Wang, M.; Cheng, C.; Gao, X.; Ye, Z.; Liu, W. Spectral-Spatial-Sensorial Attention Network with Controllable Factors for Hyperspectral Image Classification. Remote Sens. 2024, 16, 1253.
33. Cai, Y.; Liu, X.; Cai, Z. BS-nets: An End-to-End Framework for Band Selection of Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1969–1984.
34. Dou, Z.; Gao, K.; Zhang, X.; Wang, H.; Han, L. Band Selection of Hyperspectral Images Using Attention-Based Autoencoders. IEEE Geosci. Remote Sens. Lett. 2021, 18, 147–151.
35. Roy, S.K.; Das, S.; Song, T.; Chanda, B. DARecNet-BS: Unsupervised Dual-Attention Reconstruction Network for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2021, 18, 2152–2156.
36. Nandi, U.; Roy, S.; Hong, D.; Wu, X.; Chanussot, J. TAttMSRecNet: Triplet-Attention and Multiscale Reconstruction Network for Band Selection in Hyperspectral Images. Expert Syst. Appl. 2023, 212, 118797.
37. He, K.; Sun, W.; Yang, G.; Meng, X.; Ren, K.; Peng, J.; Du, Q. A Dual Global–Local Attention Network for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
38. Li, J.; Fang, F.; Mei, K.; Zhang, G. Multi-scale Residual Network for Image Super-Resolution. In Proceedings of the Computer Vision ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 527–542.
39. Roweis, S.; Saul, L. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 2000, 290, 2323–2326.
40. Tenenbaum, J.B.; Silva, V.D.; Langford, J.C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290, 2319–2323.
41. Wang, Q.; Li, Q.; Li, X. Hyperspectral Band Selection via Adaptive Subspace Partition Strategy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4940–4950.
42. Wei, X.; Zhu, W.; Liao, B.; Cai, L. Scalable One-Pass Self-Representation Learning for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4360–4374.
43. Tang, C.; Wang, J.; Zheng, X.; Liu, X.; Xie, W.; Li, X.; Zhu, X. Spatial and Spectral Structure Preserved Self-Representation for Unsupervised Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13.
44. Fu, H.; Sun, G.; Zhang, L.; Zhang, A.; Ren, J.; Jia, X.; Li, F. Three-Dimensional Singular Spectrum Analysis for Precise Land Cover Classification From UAV-borne Hyperspectral Benchmark Datasets. ISPRS J. Photogramm. Remote Sens. 2023, 203, 115–134.
45. Wu, Y.; Liu, J.; Gong, M.; Gong, P.; Fan, X.; Qin, A.K.; Miao, Q.; Ma, W. Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding. IEEE Trans. Multimed. 2024, 26, 1626–1638.
46. Zhang, J.; Wang, Q.; Wang, Q.; Zheng, Z. Multimodal Fusion Framework Based on Statistical Attention and Contrastive Attention for Sign Language Recognition. IEEE Trans. Mobile Comput. 2024, 23, 1431–1443.

Figure 1. Schematic diagrams of various model structures: (a) two-stage model; (b) three-stage model.

Figure 2. Overall structure of SSANet-BS.

Figure 3. Schematic diagram of the neural network in the BAM.

Figure 4. The dataset used in the experiment. The land cover types and the number of samples for each dataset are indicated, respectively. (a) IP220. (b) DC191. (c) PU103. (d) QY176.

Figure 5. The OA values of SSANet-BS and comparison methods on four HSI datasets. (a) IP220. (b) DC191. (c) PU103. (d) QY176.

Figure 6. Classification maps with 15 bands on the IP220 dataset. (a) False-color image. (b) Ground truth. (c) Full bands. (d) LLE. (e) Isomap. (f) MVPCA. (g) E-FDPC. (h) ASPS. (i) SOPSRL. (j) GRSC. (k) BSNet-Conv. (l) DARecNet-BS. (m) $S^{4} P$. (n) SSANet-BS.

Figure 7. Classification maps with 15 bands on the DC191 dataset. (a) False-color image. (b) Ground truth. (c) Full bands. (d) LLE. (e) Isomap. (f) MVPCA. (g) E-FDPC. (h) ASPS. (i) SOPSRL. (j) GRSC. (k) BSNet-Conv. (l) DARecNet-BS. (m) $S^{4} P$. (n) SSANet-BS.

Figure 8. Classification maps with 15 bands on the PU103 dataset. (a) False-color image. (b) Ground truth. (c) Full bands. (d) LLE. (e) Isomap. (f) MVPCA. (g) E-FDPC. (h) ASPS. (i) SOPSRL. (j) GRSC. (k) BSNet-Conv. (l) DARecNet-BS. (m) $S^{4} P$. (n) SSANet-BS.

Figure 9. Classification maps with 15 bands on the QY176 dataset. (a) False-color image. (b) Ground truth. (c) Full bands. (d) LLE. (e) Isomap. (f) MVPCA. (g) E-FDPC. (h) ASPS. (i) SOPSRL. (j) GRSC. (k) BSNet-Conv. (l) DARecNet-BS. (m) $S^{4} P$. (n) SSANet-BS.

Figure 10. The AOA values of SSANet-BS and comparison methods on four HSI datasets. (a) IP220. (b) DC191. (c) PU103. (d) QY176. The optimal and suboptimal results are bolded in red and black.

Figure 11. Bands and entropy values of the IP220 dataset. (a) Band20 (7.16). (b) Band30 (7.25). (c) Band152 (4.83). (d) Band210 (6.61).

Figure 12. The distribution of the 20 bands selected by each BS method (top) and the entropy of each band (bottom) for the different datasets. (a) IP220. (b) DC191. (c) PU103. (d) QY176.

Figure 13. The results of the ablation study for SSAMs on the IP220 dataset. (a) OA values. (b) AOA values. The optimal and suboptimal results are bolded in red and black.

Figure 14. The OA and AOA values of SSANet-BS, SSANet-BS-2S and SSANet-BS-2SE on the IP220 dataset. (a) OA values. (b) AOA values. The optimal and suboptimal results are bolded in red and black.

Table 1. Detailed structure of each module in SSANet-BS.

Module	Layer
$f_{b}$	Conv2D kernel(3,3)
	MaxPool2D kernel(4,4)
	FC1 in = L out = 32
	Sigmoid
	FC2 in = 32 out = L
	BatchNorm
	Sigmoid
$f_{b-w}$ and $f_{b-h}$	MaxPool3D kernel(L,3,3)	AvgPool3D kernel(L,3,3)
	Concatenate
	Conv2D kernel(3,3)
	BatchNorm2D
	ReLU
$f_{rec}^{ms}$	Conv3D kernel(3,3,3)	Conv3D kernel(5,5,5)
	Concatenate
	MaxPool3D kernel(3,3,3)
	Conv3D kernel(3,3,3)
	TransposedConv3D kernel(3,3,3)
	TransposedConv3D kernel(3,3,3)

Table 2. The AOA of SSANet-BS under different $λ$ on four datasets. Optimal results are highlighted in bold.

$λ$	IP220	DC191	PU103	QY176
0.0001	74.23%	92.63%	76.09%	95.50%
0.001	73.50%	92.28%	76.84%	95.25%
0.01	75.28%	92.29%	75.90%	95.33%
0.1	74.03%	91.70%	76.05%	95.34%

Table 3. Classification results of SSANet-BS and comparative methods with 15 bands on the IP220 dataset. Values in the table are in per cent. Optimal results of BS methods are highlighted in bold.

Label	Full Bands	LLE	Isomap	MVPCA	E-FDPC	ASPS	SOPSRL	GRSC	BSNet-Conv	DARecNet-BS	$S^{4} P$	SSANet-BS
1	46.91	42.95	42.31	34.54	21.88	58.52	47.93	60.93	51.69	74.06	77.73	75.23
2	60.89	39.81	48.80	45.83	56.04	65.82	66.17	66.36	66.00	70.76	72.17	75.54
3	55.5	24.58	46.63	48.34	43.13	47.37	56.59	51.48	55.31	63.19	61.38	64.81
4	30.61	14.22	24.55	31.33	21.48	37.20	36.72	29.25	34.44	36.95	39.34	46.26
5	76.2	54.63	72.57	72.69	50.31	72.18	76.22	83.39	72.09	80.30	83.02	82.94
6	90.61	83.15	84.98	85.74	77.83	90.51	86.66	90.51	86.21	91.17	88.41	92.89
7	55.87	24.18	46.23	30.75	48.15	64.90	64.96	69.46	66.79	75.61	91.21	81.35
8	95.07	89.77	93.95	88.37	87.47	94.82	94.48	97.00	94.21	96.75	94.36	97.61
9	24.61	0.00	21.43	21.94	18.49	43.90	59.66	65.22	51.63	59.74	62.41	69.42
10	60.47	32.42	52.86	52.42	38.58	56.46	58.96	61.00	59.07	65.74	59.61	65.08
11	72.55	52.78	65.72	70.06	53.18	74.97	71.26	72.40	70.83	75.95	75.87	81.39
12	38.08	23.48	34.46	29.37	40.51	44.62	51.96	59.88	50.29	60.34	53.88	61.53
13	83.63	63.56	73.66	52.67	85.01	79.46	82.81	83.46	81.89	86.68	85.36	88.05
14	93.3	92.94	90.53	93.94	84.04	91.57	92.49	94.06	90.86	93.38	93.82	94.27
15	47.07	36.97	34.53	45.71	33.53	51.91	49.64	57.21	47.25	59.93	58.02	60.66
16	79.10	49.33	48.99	95.77	47.96	61.11	71.77	94.46	71.55	81.85	73.45	85.71
APA	63.15	45.29	55.13	56.21	50.47	64.70	66.76	71.00	65.63	73.27	73.12	76.42
AUA	74.03	49.00	62.11	61.06	52.48	71.29	71.56	76.73	70.60	78.94	78.15	81.11
OA	66.97	51.55	61.07	62.06	55.96	68.58	69.36	70.68	68.41	74.36	73.24	77.20
kappa	62.46	44.32	55.55	56.81	49.54	64.18	64.99	66.59	63.93	70.70	69.44	73.96

Table 4. Classification results of SSANet-BS and comparative methods with 15 bands on the DC191 dataset. Values in the table are in per cent. Optimal results of BS methods are highlighted in bold.

Label	Full Bands	LLE	Isomap	MVPCA	E-FDPC	ASPS	SOPSRL	GRSC	BSNet-Conv	DARecNet-BS	$S^{4} P$	SSANet-BS
1	92.93	91.19	88.94	88.03	90.48	94.02	93.57	91.21	93.63	91.77	93.29	94.26
2	87.3	73.24	75.84	76.93	75.36	77.22	79.33	79.86	79.89	79.10	77.88	80.06
3	97.1	91.25	87.12	90.05	79.70	95.55	95.56	94.64	95.40	89.39	93.48	96.59
4	97.63	96.23	96.69	96.91	95.72	97.52	97.58	97.53	97.53	97.47	97.55	97.56
5	98.41	99.41	98.07	98.36	96.35	98.30	98.29	98.36	98.30	98.19	98.52	98.24
6	97.36	97.38	95.80	98.20	95.23	98.44	98.42	98.00	98.07	97.98	98.41	97.83
APA	95.12	91.45	90.41	91.41	88.80	93.50	93.79	93.26	93.80	92.31	93.18	94.09
AUA	95.35	91.97	91.47	91.98	90.62	94.13	94.38	93.72	94.42	93.37	93.91	94.67
OA	94.33	89.66	89.12	89.96	87.76	92.15	92.64	92.04	92.71	91.28	91.96	93.02
kappa	93.01	87.31	86.64	87.64	85.02	90.36	90.94	90.20	91.04	89.29	90.12	91.42

Table 5. Classification results of SSANet-BS and comparative methods with 15 bands on the PU103 dataset. Values in the table are in per cent. Optimal results of BS methods are highlighted in bold.

Label	Full Bands	LLE	Isomap	MVPCA	E-FDPC	ASPS	SOPSRL	GRSC	BSNet-Conv	DARecNet-BS	$S^{4} P$	SSANet-BS
1	96.73	96.60	94.44	97.29	96.17	96.41	96.06	96.34	96.21	95.94	96.38	96.61
2	94.69	84.61	86.47	91.31	90.08	89.50	91.54	91.34	91.85	90.19	93.63	93.83
3	70.19	34.03	38.85	44.06	54.74	56.42	52.65	55.86	58.10	52.20	63.43	61.36
4	75.29	48.69	60.16	47.86	62.99	67.64	64.61	63.14	64.35	63.67	72.86	63.99
5	94.01	98.81	98.54	94.87	97.26	96.52	97.20	98.40	97.33	97.29	97.40	97.53
6	64.92	38.68	36.10	56.13	40.14	42.41	49.14	43.96	49.45	41.23	56.18	56.01
7	51.29	35.37	43.50	35.72	45.95	44.81	44.41	43.65	44.79	44.48	44.67	44.72
8	79.12	69.45	71.72	65.24	79.57	78.51	77.55	78.55	77.30	77.12	75.29	77.06
9	99.98	99.47	99.96	99.89	99.80	99.82	99.80	100.00	99.86	99.70	99.84	99.90
APA	80.69	67.30	69.97	70.26	74.07	74.67	74.77	74.58	75.47	73.53	77.74	76.77
AUA	88.09	74.65	77.06	79.15	82.61	82.64	83.01	82.99	83.82	81.86	85.42	85.23
OA	83.33	65.38	66.27	70.98	70.78	72.84	74.29	72.56	74.77	70.86	79.20	77.67
kappa	78.51	56.90	58.20	63.75	63.75	65.88	67.70	65.79	68.37	63.80	73.47	71.77

Table 6. Classification results of SSANet-BS and comparative methods with 15 bands on the QY176 dataset. Values in the table are in per cent. Optimal results of BS methods are highlighted in bold.

Label	Full Bands	LLE	Isomap	MVPCA	E-FDPC	ASPS	SOPSRL	GRSC	BSNet-Conv	DARecNet-BS	$S^{4} P$	SSANet-BS
1	95.24	81.37	86.17	63.22	92.70	93.12	92.21	92.42	92.43	89.61	93.13	92.91
2	97.66	88.96	92.10	60.92	96.03	95.45	95.13	95.52	95.27	92.20	95.34	95.87
3	99.4	99.11	98.66	98.76	99.57	99.60	99.56	99.44	99.57	98.20	99.70	99.47
4	98.74	96.41	97.15	93.61	97.33	96.89	97.61	97.69	97.79	96.60	96.85	97.55
5	96.44	83.17	82.80	68.12	93.76	93.53	93.10	93.78	93.38	87.73	93.14	94.23
APA	97.49	89.80	91.37	76.92	95.87	95.71	95.52	95.77	95.68	92.86	95.63	96.00
AUA	81.23	75.33	76.15	63.71	79.97	79.73	79.63	79.87	79.81	77.63	79.67	80.00
OA	97.44	89.06	90.52	75.31	95.60	95.36	95.22	95.53	95.41	92.33	95.23	95.77
kappa	96.69	85.89	87.75	68.06	94.31	94.00	93.82	94.22	94.07	90.09	93.83	94.53

Table 7. The runtime (s) of selecting 30 bands by different BS methods on the IP220 dataset.

Method	LLE	Isomap	MVPCA	E-FDPC	ASPS	SOPSRL	GRSC	BSNet-Conv	DARecNet-BS	$S^{4} P$	SSANet-BS
Runtime (s)	7.53	13.76	2.29	0.14	0.51	0.37	1.64	116.55	1092.36	0.411	295.46

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
