An Unsupervised Band Selection Method via Contrastive Learning for Hyperspectral Images

Abstract: Band selection (BS) is an efficacious approach to reduce hyperspectral information redundancy while preserving the physical meaning of hyperspectral images (HSIs). Recently, deep learning-based BS methods have received widespread interest due to their ability to model the nonlinear relationship between bands, with existing methods typically relying on generative algorithms. However, the process of generating images with pixel-level detail required by generative algorithm-based BS methods is computationally expensive. To alleviate this issue, we propose a contrastive learning-based unsupervised BS architecture, termed ContrastBS, in this article. With the help of contrastive learning, the proposed architecture avoids the costly generation step in pixel space by learning to distinguish data at the abstract semantic level of the feature space. Specifically, ContrastBS combines an attention mechanism with contrastive learning to extract the importance of each band. Furthermore, we design a novel loss function for the contrastive learning-based BS network, which constrains the symmetric loss while ensuring attention to the most valuable bands. Experimental results indicate that ContrastBS achieves excellent classification performance and a competitive time cost relative to the compared methods.


Introduction
Hyperspectral images (HSIs) comprise hundreds of contiguous bands at high resolution, which provide a wealth of information on regions of interest. However, the large number of bands contained in an HSI also causes the Hughes phenomenon and information redundancy. Thus, dimensionality reduction emerges as a critical step in hyperspectral data processing. Band selection (BS) [1][2][3][4][5] is an efficacious technique for diminishing the dimensionality of HSIs. This kind of method achieves dimensionality reduction by selecting the most valuable subset of bands from an HSI [6][7][8][9].
Existing BS methods are generally classified into supervised and unsupervised methods. Because labeled samples are difficult to obtain in hyperspectral processing, unsupervised methods, which do not require prior knowledge, are more popular than supervised methods, which do.
Unsupervised BS can be further classified into four classes: group-wise approaches, point-wise approaches, ranking-based approaches, and machine learning-based approaches. Group-wise approaches are generally based on evolutionary algorithms. Typical approaches are immune clone selection-based BS [10,11] and genetic algorithm-based BS [12]. In point-wise approaches, the selected subset of bands is obtained by appending or eliminating bands one at a time. Examples of point-wise approaches are representativeness and redundancy-based BS (RRBS) [13] and orthogonal-projection-based BS (OPBS) [14]. Ranking-based approaches, such as similarity-based ranking structural similarity (SR-SSIM) [8], linearly constrained minimum variance (LCMV) [15], exemplar component analysis (ECA) [16], and maximum-variance principal component analysis (MVPCA) [17], employ certain criteria to rank all bands and subsequently select the desired number of top-ranked bands. Machine learning-based approaches achieve BS with the help of machine learning strategies, such as manifold learning [18] and clustering [19].
Recently, with the rapid development of unsupervised deep representation learning, autoencoder-based BS methods [20][21][22] have received more and more attention in the hyperspectral field due to their capacity to model the nonlinear relationship between bands. This kind of method learns the salience of each band in an HSI by optimizing the loss at the pixel level in the reconstruction network. However, generating images with the high level of detail required by this kind of method is computationally expensive and may not be efficient for representation learning [23,24].
Generally, unsupervised representation learning algorithms in the computer vision field can be classified into two kinds: generative and discriminative [23,24]. Generative algorithms are typified by autoencoders. Among discriminative algorithms, contrastive learning yields state-of-the-art performance [24,25]. Compared with generative algorithms, contrastive learning algorithms avoid the costly process of reconstructing input samples in pixel space by learning to distinguish data at the abstract semantic level of the feature space. In addition, since contrastive learning avoids the need to attend to the tedious details of the instances, it is able to mine more general patterns of data distribution and has a stronger generalization ability than generative algorithms. Hence, it is natural to think of taking advantage of contrastive learning to alleviate the shortcomings of existing representation learning-based BS methods. However, as far as we know, there has been no report on the application of contrastive learning to the hyperspectral BS field until now.
To address the problems of existing unsupervised representation learning-based BS methods by leveraging contrastive learning, we propose a contrastive learning-based BS architecture termed ContrastBS in this article. Specifically, to extract the importance of each band, we design a band attention-based encoder for the contrastive learning framework. By introducing the attention mechanism into contrastive learning, ContrastBS can achieve unsupervised BS for HSIs. On this basis, we design a loss function for the contrastive learning-based BS network, which constrains the symmetric loss while ensuring attention to the most valuable bands. The significant contributions of this article are highlighted as follows:
(1) We introduce contrastive learning into the hyperspectral BS field to overcome the limitations of existing unsupervised representation learning-based BS methods. Compared with these existing methods, the proposed architecture, ContrastBS, eliminates the need for the computationally expensive pixel-level reconstruction step by taking advantage of contrastive learning, resulting in a more efficient BS process.
(2) We propose a band attention-based encoder for the contrastive learning framework to extract the importance of each band and thus design a novel band importance metric capable of considering the abstract semantic information of HSIs.
(3) We propose a loss function for the contrastive learning-based BS network. The proposed loss function is able to constrain the symmetric loss while ensuring attention to the most valuable bands, which is explored for the first time in the BS field.
The subsequent sections of this article are organized as follows. Section 2 introduces the background knowledge of the algorithms involved. Section 3 details the ContrastBS architecture. Section 4 provides the experimental results on three hyperspectral datasets. Then, Section 5 discusses these results. Section 6 concludes this article.

Contrastive Learning
Contrastive learning enables models to learn meaningful representations by emphasizing the differences between instances. The central idea of this algorithm involves the attraction of positive sample pairs and the repulsion of negative sample pairs [26]. In recent years, this unsupervised representation learning algorithm has received increasing attention [25]. However, as far as we know, there has been no documented account of the utilization of contrastive learning within the field of hyperspectral BS until now.
In practice, contrastive learning algorithms often take advantage of extensive negative samples [23,27,28]. Wu et al. [27] introduced a memory bank to preserve these samples. However, such a negative sample storage approach relies on large amounts of memory and computational resources. Chen et al. [23] directly utilized the negative samples coexisting in the current batch. Nevertheless, such an approach needs a large batch size to perform well.
In this context, He et al. [28] proposed to maintain a negative sample queue in a Siamese network. Furthermore, to improve the consistency of the queue, this method turns a branch of the Siamese network into a momentum encoder. Inspired by the momentum contrast (MoCo) algorithm [28] developed by He et al., Grill et al. [24] also adopted a momentum encoder in a branch of the Siamese network in the bootstrap your own latent (BYOL) method. However, unlike MoCo, BYOL utilizes one view to predict the output of another view, making the network independent of negative samples. Subsequently, Chen et al. [25] proposed a simple Siamese (SimSiam) network for contrastive learning. Unlike the previous contrastive learning algorithms, SimSiam can acquire meaningful representations even without large-batch training, negative samples, or momentum encoders [25]. Therefore, we choose SimSiam as the contrastive learning infrastructure of our method.

Attention Mechanism
The attention mechanism, originally conceived for machine translation [29], has led to significant advances in areas such as natural language processing [30], speech [31], and computer vision [32][33][34][35]. The rapid development of the attention mechanism stems from its capacity to enhance the interpretability of neural networks.
Mathematically, an attention module with $x$ as input can be represented as follows:

$$a = g(x; \Theta), \tag{1}$$

where $a$ stands for the attention map, $g(\cdot)$ signifies the attention module, and $\Theta$ represents the learnable parameters.
Generally, attention mechanisms can be categorized into spatial, channel, and joint attention mechanisms. Spatial attention directs the focus of the model to spatial regions of interest by learning the weights of different spatial locations. Channel attention focuses more on noteworthy channels by assigning different weights to channels. Joint attention synergistically combines spatial and channel attention mechanisms.
The fundamental concept underlying unsupervised BS is the identification of the most valuable spectral bands, a notion that can be distilled as discerning the most notable channels. Dou et al. [34] utilized an attention module to produce an attention mask and subsequently reconstructed the original HSI via a fully connected autoencoder. Cai et al. [20] combined a band attention module with a convolutional autoencoder to implement an end-to-end unsupervised BS method. Inspired by these works, we extract the importance of each band with the help of channel attention in the proposed contrastive learning-based BS architecture.

Proposed Method
In this section, we detail the proposed ContrastBS architecture. ContrastBS introduces a band attention-based encoder into the contrastive learning framework to achieve unsupervised BS for HSIs. Furthermore, to make the contrastive learning algorithm serve the HSI BS task more effectively, we improve the data augmentation strategy and the loss function of the original contrastive learning framework.

Band Attention-Based Contrastive Learning Network
We let $X \in \mathbb{R}^{W \times H \times B}$ denote an HSI, where $W$ and $H$ represent the width and height of the HSI, respectively, and $B$ represents the total number of bands. The original hyperspectral data $X$ are divided into patches $X_p \in \mathbb{R}^{a \times a \times B}$ by a sliding window of size $a \times a$ with step size $s$.
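The patch extraction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' preprocessing code: the function name `extract_patches` is hypothetical, and the sketch assumes that windows which would extend beyond the image border are simply skipped (the text does not say how borders are handled).

```python
import numpy as np

def extract_patches(hsi, a, s):
    """Slide an a x a window with step s over a (W, H, B) cube.

    Returns an array of shape (n, a, a, B). Border handling is an
    assumption: windows that do not fit entirely are dropped.
    """
    W, H, _ = hsi.shape
    patches = [
        hsi[i:i + a, j:j + a, :]
        for i in range(0, W - a + 1, s)
        for j in range(0, H - a + 1, s)
    ]
    return np.stack(patches)

# Toy cube standing in for an HSI (values are arbitrary).
cube = np.arange(6 * 5 * 4, dtype=np.float32).reshape(6, 5, 4)
patches = extract_patches(cube, a=3, s=1)
print(patches.shape)  # (12, 3, 3, 4)
```

Under this no-padding assumption, the number of patches is ((W - a) // s + 1) * ((H - a) // s + 1), and every patch keeps all B bands, so the spectral axis is untouched.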
The overview of the constructed ContrastBS network is depicted in Figure 1. As illustrated in Figure 1, the proposed ContrastBS first performs two random augmentation operations on an HSI patch $X_p$ separately to obtain two randomly augmented views, denoted as $X_{p1}$ and $X_{p2}$. These two randomly augmented views from the same HSI patch are regarded as a positive sample pair. Notably, instead of directly adopting the typical data augmentation strategy commonly used in the computer vision field, we modify the augmentation strategy based on the characteristics of HSIs. Specifically, the color distortion included in the typical augmentation strategy is acceptable for normal images but corrupts the spectral information when applied to HSIs. Since the spectral information is valuable for hyperspectral BS, the augmentation strategy of the proposed ContrastBS discards the color distortion. Consequently, the augmentation strategy of ContrastBS includes random cropping followed by resizing back to the original size, random Gaussian blur, and random horizontal flipping.
Subsequently, to extract the importance of each band, we construct an attention encoder denoted as $f$. In the constructed attention encoder $f$, we use a band attention module to focus more on the valuable bands. Specifically, as displayed in Figure 1, the band attention module comprises a global max pooling layer, a one-dimensional (1D) convolutional layer, and a Sigmoid layer. The band attention module takes the augmented views as input to obtain the band weight vectors $w_1$ and $w_2$, which can be denoted as

$$w_1 = F(X_{p1}), \quad w_2 = F(X_{p2}), \tag{2}$$

where $F(\cdot)$ signifies the band attention module.
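The augmentation strategy described above (random crop and resize, random Gaussian blur, random horizontal flip, no color distortion) can be illustrated with a small NumPy sketch. Several details are assumptions made only for this example: the minimum crop scale of 0.5, nearest-neighbour resizing, edge padding for the blur, and a flip probability of 0.5 are not stated in the text, and the function names are hypothetical. Only the blur probability (20%), the 3 × 3 kernel, and the standard deviation range of 1 to 2 follow the hyper-parameter settings reported in the experiments.

```python
import numpy as np

def random_crop_resize(x, rng, min_scale=0.5):
    """Crop a random square window and resize it back (nearest neighbour)."""
    a = x.shape[0]
    c = int(rng.integers(max(1, int(a * min_scale)), a + 1))  # crop side
    i = int(rng.integers(0, a - c + 1))
    j = int(rng.integers(0, a - c + 1))
    crop = x[i:i + c, j:j + c, :]
    idx = np.arange(a) * c // a          # source index for each output pixel
    return crop[idx][:, idx]

def gaussian_blur3(x, sigma):
    """Blur every band spatially with a 3 x 3 Gaussian kernel."""
    t = np.array([-1.0, 0.0, 1.0])
    k1 = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    k1 /= k1.sum()                       # separable kernel, sums to one
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    rows, cols = x.shape[:2]
    out = np.zeros(x.shape, dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += k1[di] * k1[dj] * padded[di:di + rows, dj:dj + cols, :]
    return out

def augment(x, rng, blur_prob=0.2, flip_prob=0.5):
    """Spatial-only augmentation: the spectral axis is never mixed."""
    x = random_crop_resize(x, rng)
    if rng.random() < blur_prob:
        x = gaussian_blur3(x, sigma=rng.uniform(1.0, 2.0))
    if rng.random() < flip_prob:
        x = x[:, ::-1, :]                # horizontal flip
    return x

rng = np.random.default_rng(0)
patch = rng.random((10, 10, 7))          # toy a x a x B patch
view1, view2 = augment(patch, rng), augment(patch, rng)
```

All three operations act on the two spatial axes only, which is the property the text emphasizes: each output spectrum is either copied from an input pixel or is a spatial average of neighbouring spectra, so no per-band color distortion is introduced.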
In the next step, each augmented view is band-wise multiplied with the corresponding band weight vector (denoted $w_1$ and $w_2$ for the two views), i.e.,

$$\tilde{X}_{p1} = X_{p1} \otimes w_1, \quad \tilde{X}_{p2} = X_{p2} \otimes w_2, \tag{3}$$

where $\otimes$ stands for band-wise multiplication. Furthermore, in attention encoder $f$, we utilize a two-dimensional (2D) convolutional module to extract spatial information of the hyperspectral patches. As depicted in Figure 1, the 2D convolutional module is composed of 2D convolutional, batch normalization (BN), and exponential linear unit (ELU) layers. Additionally, we use a global average pooling layer and a multi-layer perceptron (MLP) to build a projector. To be more specific, the MLP is composed of fully connected (FC), BN, and rectified linear unit (ReLU) layers. The attention encoder shares parameters between the two views.
In the next step, as shown in Figure 1, we use a predictor $h$, constructed as an MLP, to transform the output of one view and match it to that of the other.
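As a rough illustration of the band attention step (global max pooling, 1D convolution, Sigmoid) and the band-wise multiplication, the forward pass can be written in NumPy as follows. This is a sketch under stated assumptions rather than the authors' network: it assumes a single-channel 1D convolution with zero padding so that the output has one weight per band, it uses a random kernel where a trained network would use learned weights, and the names `band_attention` and `reweight` are hypothetical.

```python
import numpy as np

def band_attention(x, kernel, bias=0.0):
    """Forward pass of a band attention module on one (a, a, B) view.

    kernel: (3,) weights of the 1D convolution along the band axis.
    Returns a (B,) vector of band weights in (0, 1).
    """
    pooled = x.max(axis=(0, 1))                    # global max pooling per band
    padded = np.pad(pooled, 1, mode="constant")    # zero padding keeps length B
    conv = np.array([padded[t:t + 3] @ kernel
                     for t in range(pooled.size)]) + bias
    return 1.0 / (1.0 + np.exp(-conv))             # Sigmoid

def reweight(x, w):
    """Band-wise multiplication: scale band t of the view by w[t]."""
    return x * w                                   # (B,) broadcasts over (a, a, B)

rng = np.random.default_rng(0)
view = rng.random((10, 10, 6))                     # toy augmented view
kernel = rng.standard_normal(3)                    # stands in for learned weights
w = band_attention(view, kernel)
weighted_view = reweight(view, w)
print(w.shape, weighted_view.shape)  # (6,) (10, 10, 6)
```

Because the Sigmoid keeps each weight between 0 and 1, the reweighting can only attenuate bands, which is what makes the learned weights usable as a band importance score later on.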

Loss Function of a Contrastive Learning-Based BS Network
Defining the output vectors of the two asymmetric pipelines as $P_1 \triangleq h(f(X_{p1}))$ and $Z_2 \triangleq f(X_{p2})$, respectively, ContrastBS minimizes the negative cosine similarity of these two vectors as follows:

$$D(P_1, Z_2) = -\frac{P_1}{\|P_1\|_2} \cdot \frac{Z_2}{\|Z_2\|_2}, \tag{4}$$

where $\|\cdot\|_2$ represents the $\ell_2$ norm. Subsequently, we symmetrize the loss of Equation (4) by feeding $X_{p1}$ into the bottom pipeline and $X_{p2}$ into the upper pipeline. The bottom pipeline takes $X_{p1}$ as input and outputs $Z_1 \triangleq f(X_{p1})$, and the upper pipeline takes $X_{p2}$ as input and outputs $P_2 \triangleq h(f(X_{p2}))$. On this basis, the symmetric loss of each patch in the ContrastBS network is defined as

$$L_{sy} = \frac{1}{2} D(P_1, Z_2) + \frac{1}{2} D(P_2, Z_1). \tag{5}$$

The minimum possible value of $L_{sy}$ is −1.
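Furthermore, as shown in Figure 1, an essential element to make ContrastBS work is the stop-gradient operation, which can be achieved by modifying Equation (4) as follows:

$$D(P_1, \mathrm{stopgrad}(Z_2)), \tag{6}$$

where $\mathrm{stopgrad}(Z_2)$ indicates that $Z_2$ is regarded as a constant. In a similar manner, the form of Equation (5) is modified to

$$L_{sy} = \frac{1}{2} D(P_1, \mathrm{stopgrad}(Z_2)) + \frac{1}{2} D(P_2, \mathrm{stopgrad}(Z_1)). \tag{7}$$

That is, the attention encoder on $X_{p2}$ does not obtain the gradients from $Z_2$ in the first term, while it does in the second term from $P_2$ (and the same is true for $X_{p1}$). It is worth noting that the total symmetric loss should be the average of the symmetric losses of all image patches. Meanwhile, to focus on the most valuable bands, ContrastBS also imposes a sparsity constraint on the band weight vectors as follows:

$$L_{sp} = \frac{1}{n} \sum_{i=1}^{n} \left( \|w_1^{(i)}\|_1 + \|w_2^{(i)}\|_1 \right), \tag{8}$$

where $w_1^{(i)}$ and $w_2^{(i)}$ denote the band weight vectors of the two views of the $i$th sample, $n$ is the number of samples, and $\|\cdot\|_1$ represents the $\ell_1$ norm. By combining Equations (7) and (8), we propose a loss function for the contrastive learning-based BS task, formulated as

$$L = \frac{1}{n} \sum_{i=1}^{n} L_{sy}^{(i)} + \eta L_{sp}, \tag{9}$$

where $L_{sy}^{(i)}$ is the symmetric loss of the $i$th sample and $\eta$ is the penalty parameter. Stochastic gradient descent (SGD) is utilized to optimize Equation (9).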
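The forward computation of this loss can be sketched in NumPy as below. The sketch makes two assumptions that should be read as such: the sparsity term is taken to be the batch mean of the ℓ1 norms of both views' band weight vectors (the exact normalization is not recoverable from the text), and the stop-gradient is only noted in a comment, since NumPy has no automatic differentiation (in an autograd framework the targets would be detached). The function names are hypothetical.

```python
import numpy as np

def neg_cosine(p, z):
    """Negative cosine similarity per sample; p and z have shape (n, d)."""
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return -(p * z).sum(axis=-1)

def contrastbs_loss(p1, p2, z1, z2, w1, w2, eta=1e-2):
    """Symmetric loss plus eta times an l1 sparsity term on band weights.

    z1 and z2 play the role of stop-gradient targets: with autograd they
    would be treated as constants when differentiating this expression.
    """
    l_sy = (0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)).mean()
    # Assumed normalization: mean over samples of the summed l1 norms.
    l_sp = (np.abs(w1).sum(axis=-1) + np.abs(w2).sum(axis=-1)).mean()
    return l_sy + eta * l_sp, l_sy, l_sp

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
p1, p2 = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
w1, w2 = rng.random((4, 5)), rng.random((4, 5))   # toy band weights, B = 5
total, l_sy, l_sp = contrastbs_loss(p1, p2, z1, z2, w1, w2)
```

When each prediction is perfectly aligned with its target (for example `p1 = z2` and `p2 = z1`), the symmetric term reaches its minimum of −1, so the sparsity term is the only part that can still be reduced; the penalty parameter controls how strongly it pushes band weights towards zero.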

Band Selection Based on Contrastive Learning
After training, ContrastBS determines the most valuable bands according to the average of the learned sparse band weight vectors in the two pipelines for all samples. For the $t$th band, the average weight can be calculated as

$$\bar{w}_t = \frac{1}{2n} \sum_{i=1}^{n} \left( w_{1,t}^{(i)} + w_{2,t}^{(i)} \right), \tag{10}$$

where $w_{1,t}^{(i)}$ and $w_{2,t}^{(i)}$ are the weights assigned to the $t$th band in the two pipelines for the $i$th sample, and $n$ is the number of samples. A larger average weight of a band indicates that this band contributes more to the contrastive learning network to identify positive sample pairs and contains more valuable information. Therefore, ContrastBS selects the $k$ bands with the largest average weights to form the most valuable band subset. Algorithm 1 offers the detailed procedures of ContrastBS.
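The selection rule described above amounts to averaging the band weights over both pipelines and all samples and keeping the top k. A minimal NumPy sketch follows; the function name `select_bands` and the toy weight values are illustrative only.

```python
import numpy as np

def select_bands(w1, w2, k):
    """Pick the k bands with the largest average weight.

    w1, w2: (n, B) band weights from the two pipelines for n samples.
    Returns the selected band indices (ascending) and the (B,) averages.
    """
    avg = 0.5 * (w1 + w2).mean(axis=0)      # average over pipelines and samples
    top = np.argsort(avg)[::-1][:k]         # indices of the k largest averages
    return np.sort(top), avg

# Toy weights for n = 2 samples and B = 4 bands.
w1 = np.array([[0.1, 0.9, 0.5, 0.2],
               [0.3, 0.7, 0.6, 0.1]])
w2 = np.array([[0.2, 0.8, 0.4, 0.1],
               [0.2, 0.8, 0.7, 0.2]])
selected, avg = select_bands(w1, w2, k=2)
print(selected)  # [1 2]
```

Returning the indices in ascending order keeps the selected subset in spectral order, which is convenient when slicing the original cube, for instance `X[:, :, selected]`.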

Algorithm 1 ContrastBS Algorithm
Input: Raw HSI $X \in \mathbb{R}^{W \times H \times B}$, ContrastBS hyper-parameters, and the number of selected bands $k$.
Step 1: Preprocess HSI and produce training samples $X_p$.
Step 2: Train the contrastive learning network.
while the model has not converged and the maximum number of iterations has not been reached do
1: Sample a batch of $X_p$.
2: Random data augmentation: $X_{p1}, X_{p2} = \mathrm{aug}(X_p), \mathrm{aug}(X_p)$.
3: Process two augmented views with the attention encoder: $Z_1, Z_2 = f(X_{p1}), f(X_{p2})$.
4: Transform the output of one view with the predictor and match it to the other: $P_1, P_2 = h(Z_1), h(Z_2)$.
5: Optimize Equation (9) using SGD.
end while
Step 3: Compute average band weights based on Equation (10).
Step 4: Select $k$ bands with the largest weights.
Output: $k$ selected bands.
(1) BS-Net-Conv [20] is a generative algorithm-based method. This method leverages the autoencoder to mine band representativeness and selects the desired number of bands with high representation. Since the BS-Net-Conv method can take advantage of deep learning to model the nonlinear interdependences between bands, this method can achieve good classification performance.
(2) DARecNet-BS [22] is a deep learning-based BS method implemented with the help of the generative algorithm. DARecNet-BS uses a dual attention mechanism to recalibrate the feature maps and then uses the reconstruction network to restore the original HSIs, followed by selecting the bands with the highest entropy in the reconstructed output.
(3) MR [18] relies on advanced machine learning techniques, encompassing clustering, manifold learning, and clone selection, to perform its BS tasks.
(4) OPBS [14] selects, one by one, the bands that maximize the volume of the hypersphere formed by selected bands until the size of the selected band subset reaches the desired size.
(5) MVPCA [17] is frequently employed as a benchmark for the evaluation of BS methodologies due to its effectiveness and simplicity. MVPCA selects the bands with high discrimination ability, and the discrimination power of a band is gauged by the ratio of the variance of this band to the sum of variances across all bands.
(6) ECA [16] ranks the band priorities by assuming that exemplars have high local density and lie relatively far from points of higher density.
(7) LCMVBCM [15] and LCMVBCC [15] select the desired bands by ranking the representative ability of each band to the entire image cube. BCC and BCM are used as specific evaluation criteria for LCMV, and the corresponding BS techniques are termed LCMVBCC and LCMVBCM, respectively.
(8) SR-SSIM [8] is a state-of-the-art similarity ranking-based BS method that uses the structural similarity metric to gauge the similarity between bands.
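To make the ranking-based idea concrete, the variance criterion described in item (5) can be sketched as follows. This is a simplified reading of that one-sentence description (variance of a band divided by the sum of variances over all bands), not the reference implementation of MVPCA [17]; the function name and the toy cube are illustrative.

```python
import numpy as np

def variance_ratio_ranking(hsi, k):
    """Rank bands by their share of the total variance and keep the top k.

    hsi: (W, H, B) cube. Returns the top-k band indices (best first)
    and the (B,) vector of variance ratios.
    """
    bands = hsi.reshape(-1, hsi.shape[-1])   # one row per pixel
    var = bands.var(axis=0)
    ratio = var / var.sum()
    return np.argsort(ratio)[::-1][:k], ratio

# Deterministic toy cube: the same spatial pattern scaled per band,
# so the band variances are proportional to 1, 25, and 4.
base = np.linspace(0.0, 1.0, 64).reshape(8, 8, 1)
cube = base * np.array([1.0, 5.0, 2.0])
top, ratio = variance_ratio_ranking(cube, k=2)
print(top)  # [1 2]
```

A criterion of this kind scores each band independently, which is why ranking-based methods are fast but can pick several highly correlated bands.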

Datasets
The three HSIs used in the experiments are Indian Pines (IP), Salinas (SA), and Pavia University (PU).
(1) The IP dataset comprises 16 distinct land cover categories and 220 bands, each containing 145 × 145 pixels. Our experiments remove the water vapor absorption and low signal-to-noise ratio bands, including 1-3, 103-112, 148-165, and 217-220. The remaining 185 bands are used. The grayscale image of band 170 and the ground truth of the IP dataset are given in Figure 2.
(2) The SA dataset consists of 224 bands, 16 different kinds of land cover, and 512 × 217 pixels. The grayscale image of band 100 and the ground truth of the SA dataset are given in Figure 3.
(3) The PU dataset includes nine classes of land cover, 103 bands, and 610 × 340 pixels. Figure 4 presents the grayscale image of band 50 and the ground truth of the PU dataset.
Table 1 presents the details of the three hyperspectral datasets.
The support vector machine (SVM) classifier is utilized to evaluate the classification effectiveness of band subsets selected by different BS approaches. This classifier employs the Gaussian radial basis function [36] as its kernel function. The parameters of the SVM are determined through cross-validation and grid search. Additionally, the one-against-all scheme is utilized for multi-class classification.
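The band removal stated for the IP dataset in item (1) can be checked with a few lines of plain Python. The sketch assumes that the listed ranges are 1-based and inclusive, which is the reading under which the stated count of 185 remaining bands is reproduced.

```python
# Removed band ranges for Indian Pines, read as 1-based and inclusive.
removed_ranges = [(1, 3), (103, 112), (148, 165), (217, 220)]

removed = set()
for lo, hi in removed_ranges:
    removed.update(range(lo, hi + 1))

kept = [b for b in range(1, 221) if b not in removed]
print(len(removed), len(kept))  # 35 185
```

The four ranges contain 3, 10, 18, and 4 bands, so 35 of the 220 bands are dropped and 185 remain.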
Furthermore, we make use of three evaluation metrics, i.e., overall accuracy (OA), average accuracy (AA), and kappa coefficient (Kappa), to quantitatively evaluate classification performance [20]. The error matrix of the classification results is denoted by $K \in \mathbb{R}^{m \times m}$, where the value of position $(i, j)$ in $K$ denotes the count of $i$th class samples classified as the $j$th class, and $m$ represents the number of classes of land cover. Mathematically, AA, OA, and Kappa can be expressed as

$$\mathrm{AA} = \mathrm{mean}(\mathrm{diag}(K) \,./\, \mathrm{sum}(K, 2)), \tag{11}$$

$$\mathrm{OA} = \mathrm{sum}(\mathrm{diag}(K)) / \mathrm{sum}(K), \tag{12}$$

$$\mathrm{Kappa} = \frac{\mathrm{OA} - p_e}{1 - p_e}, \quad p_e = \frac{\mathrm{sum}(K, 1) \, \mathrm{sum}(K, 2)}{\mathrm{sum}(K)^2}, \tag{13}$$

where $\mathrm{mean}(\cdot)$ stands for computing the mean over all classes, $\mathrm{diag}(K)$ represents the vector of diagonal elements of $K$, $./$ denotes the elementwise division operation, $\mathrm{sum}(\cdot, 2)$ represents summing over the elements of each row, $\mathrm{sum}(\cdot)$ represents computing the sum of all elements, $\mathrm{sum}(\cdot, 1)$ denotes summing over the elements of each column, and $p_e$ is the agreement expected by chance. Higher values of AA, OA, and Kappa indicate better classification performance.
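The three metrics can be computed from an error matrix with a short NumPy function. The sketch follows the definitions above, with Kappa written in its standard chance-corrected form; note that the row sums `sum(K, 2)` correspond to `axis=1` in NumPy and the column sums `sum(K, 1)` to `axis=0`. The function name and the toy matrix are illustrative.

```python
import numpy as np

def classification_metrics(K):
    """Return (OA, AA, Kappa) for an m x m error matrix K.

    K[i, j] counts samples of class i that were classified as class j.
    """
    K = np.asarray(K, dtype=np.float64)
    total = K.sum()
    oa = np.trace(K) / total
    aa = np.mean(np.diag(K) / K.sum(axis=1))        # mean per-class accuracy
    pe = (K.sum(axis=0) @ K.sum(axis=1)) / total ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Toy two-class error matrix.
K = [[8, 2],
     [1, 9]]
oa, aa, kappa = classification_metrics(K)
print(round(oa, 4), round(aa, 4), round(kappa, 4))  # 0.85 0.85 0.7
```

In this toy case 17 of 20 samples are correct (OA = 0.85), the per-class accuracies are 0.8 and 0.9 (AA = 0.85), and the chance agreement is 0.5, which gives Kappa = (0.85 − 0.5) / (1 − 0.5) = 0.7.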
We randomly choose 10% of the samples from each category as the training set in each independent experiment, while the remaining samples are utilized for testing [37,38]. Tables 2-4 list the counts of training samples and testing samples for each category on the three datasets. We use the average of the accuracies obtained from five independent classification experiments as the final result.
The proposed ContrastBS uses the following hyper-parameter settings in this article. The momentum and weight decay of the SGD solver are $9 \times 10^{-1}$ and $1 \times 10^{-4}$, respectively. The patch size $a$ of the HSI patch $X_p \in \mathbb{R}^{a \times a \times B}$ is 10. The step size $s$ of the window movement is set to one. The initial learning rate is equal to $6.25 \times 10^{-3}$, and a cosine decay schedule is employed to regulate the learning rate. The random Gaussian blur in the data augmentation strategy is configured to be applied with a probability of 20%, using a randomly selected 3 × 3 Gaussian kernel with a standard deviation ranging from 1 to 2. The kernel size of the 1D convolutional layer in the band attention module is three. The batch size is 32. The penalty parameter $\eta$ is set to $1 \times 10^{-2}$.
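For readers unfamiliar with cosine decay, one common form of the schedule is sketched below with the initial learning rate stated above. The exact variant used in the experiments is not specified, so the sketch assumes a decay from the initial rate to zero over the whole run, applied per step and without warm-up; the function name and the total of 100 steps are illustrative.

```python
import math

def cosine_lr(step, total_steps, base_lr=6.25e-3):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

# Learning rate at the start, middle, and end of a 100-step run.
schedule = [cosine_lr(s, 100) for s in (0, 50, 100)]
```

Under these assumptions the rate starts at the initial value, passes through half of it at the midpoint, and changes slowly near both ends of training, which is the usual motivation for choosing a cosine schedule over a linear one.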

Classification Performance Comparison with Other BS Methods
To validate the usefulness of ContrastBS, we conduct a comparative assessment of its classification performance against nine existing BS methods across three hyperspectral datasets. Figures 5-7 depict the OA curves of different BS methods employing the SVM classifier on the three datasets. Tables 5-7 summarize the OAs, AAs, and Kappa values obtained by ContrastBS and the compared BS methods based on the SVM classifier on the three datasets. In Tables 5-7, the best results are denoted by bold formatting.

IP Dataset
Figure 5 provides the OA curves of different BS techniques on the IP dataset when 10 to 50 bands are selected. As illustrated in Figure 5, the classification accuracy of ContrastBS significantly exceeds that of the competitors when more than twelve bands are used on the IP dataset. For example, ContrastBS can achieve relatively high classification accuracy when selecting twelve bands, while the other BS methods cannot obtain this accuracy even when the count of selected bands is expanded to fifty.
Table 5 summarizes the OAs, AAs, and Kappa values obtained by ContrastBS and other BS techniques when the count of selected bands is fifteen on the IP dataset. As presented in Table 5, ContrastBS yields the highest classification accuracy on the IP dataset. Furthermore, compared to the most competitive comparison technique (i.e., BS-Net-Conv), ContrastBS achieves a 2.03% and 1.74% increase in OA and AA, respectively. This experimental result demonstrates that the contrastive learning-based ContrastBS method, which can better utilize the abstract semantic information of the HSI, is able to select a more valuable subset of bands compared to the generative-based BS-Net-Conv method.

PU Dataset
Figure 6 presents the OA curves of the BS methods used for comparison and the proposed ContrastBS on PU. As given in Figure 6, our ContrastBS always surpasses the nine comparison methods in terms of classification performance when selecting different numbers of bands. Notably, ContrastBS requires only ten bands to achieve very high accuracy, while the advanced SR-SSIM and OPBS need more than thirty bands to obtain such accuracy. This experimental result demonstrates that ContrastBS can provide excellent classification accuracy even when the size of the selected band subset is limited.
Table 6 summarizes the OAs, AAs, and Kappa values obtained by ContrastBS and nine comparison methods when the count of selected bands is ten on PU. As summarized in Table 6, ContrastBS demonstrates a notable enhancement in terms of Kappa when compared to competitors. Additionally, ContrastBS is able to improve the OA by at least 3.09% and the AA by at least 2.98% compared to the comparison methods on the PU dataset, indicating that the proposed BS method can focus on the most valuable band subset with the help of contrastive learning and the attention mechanism.

SA Dataset
As shown in Figure 7, our ContrastBS consistently achieves the highest OA under different sizes of band subsets on the SA dataset. It is worth noting that ContrastBS outperforms the generative algorithm-based BS methods (i.e., BS-Net-Conv and DARecNet-BS), which demonstrates the importance of considering the abstract semantic information of HSIs in the BS task. Furthermore, ContrastBS has obvious superiority over the comparison methods when using twenty bands. In addition, as shown in Figure 7, the classification accuracy of MR decreases as the number of selected bands grows beyond twelve on the SA dataset, which can be attributed to the Hughes phenomenon.
Table 7 presents the OAs, AAs, and Kappa values obtained by ContrastBS and nine comparison methods when the count of selected bands is fifteen on SA. As presented in Table 7, for the SA dataset, although the best competitor (i.e., SR-SSIM) can obtain relatively high OA, AA, and Kappa, ContrastBS still attains the best classification performance due to its ability to account for both the abstract semantic information of the HSI and the nonlinear relationship between bands.

Analysis of Computational Time
Table 8 summarizes the computational times required by ContrastBS and nine comparison methods when selecting fifteen bands on the IP dataset. As summarized in Table 8, the deep learning-based BS methods (i.e., BS-Net-Conv, DARecNet-BS, and ContrastBS) require some time for network training. However, once the network is trained, both BS-Net-Conv and ContrastBS exhibit a remarkably efficient inference time of just 0.0004 s, which is much less than the running times required by the BS methods without deep learning (i.e., MVPCA, LCMVBCC, LCMVBCM, ECA, OPBS, MR, and SR-SSIM). Since DARecNet-BS computes entropy for each band during inference, it requires a longer inference time than BS-Net-Conv and ContrastBS. Furthermore, as listed in Table 8, when comparing the three deep learning-based BS methods, our ContrastBS costs much less time on network training than BS-Net-Conv and DARecNet-BS.
[...] augmentation strategy. The experimental results verify that the improved augmentation strategy can help the contrastive learning algorithm better serve the HSI BS task.

Ablation Study of the Loss Function
To verify the effectiveness of the designed loss function, we perform ablation studies on the symmetric loss constraint (i.e., Equation (7)) and the sparsity constraint (i.e., Equation (8)) of the loss function, respectively. Table 9 lists the classification results achieved by the contrastive learning-based BS framework for selecting fifteen bands on the IP dataset when either a single constraint or both constraints (i.e., ContrastBS) are used for the loss function.
As shown in Table 9, the band subset selected when the loss function uses both the symmetric loss constraint and the sparsity constraint achieves better classification performance than the band subset selected when the loss function uses only a single constraint. Experimental results verify that the designed loss function containing both the symmetric loss constraint and the sparsity constraint can help the contrastive learning-based BS framework select the most valuable band subset.

Discussion
According to the experiments in Section 4, we summarize some important results and discuss some interesting phenomena.
(1) Superiority and robustness of ContrastBS. From the experiments in Section 4.2, it is observed that our ContrastBS offers better overall classification performance on the three datasets compared to the competitors, demonstrating that the BS method implemented by means of contrastive learning and the attention mechanism is capable of selecting the most valuable band subset. It is worth noting that BS-Net-Conv, which is based on the generative algorithm, is able to achieve better classification performance on the IP dataset compared to the other comparison BS methods, while it performs worse than SR-SSIM and OPBS on the PU and SA datasets, as presented in Figures 5-7. By contrast, the proposed ContrastBS attains better performance than the comparison methods on all three datasets, indicating that ContrastBS is robust to the datasets. The experimental results verify that the contrastive learning-based BS method has a stronger generalization ability compared to the generative-based BS method.
(2) Computational efficiency of ContrastBS. In terms of computational efficiency, the experiments in Sections 4.2 and 4.3 demonstrate that ContrastBS is able to select the required band subset within a reasonable time. Furthermore, ContrastBS implemented with the help of contrastive learning can avoid the computationally costly generation step in generative-based BS methods, resulting in a more efficient unsupervised representation learning-based BS process. The results of comparing the training time of ContrastBS with that of BS-Net-Conv and that of DARecNet-BS in Table 8 confirm the above statement.

Conclusions
In this paper, we propose a contrastive learning-based unsupervised BS architecture, termed ContrastBS, which can mine the abstract semantic information of HSIs. In ContrastBS, we introduce the attention mechanism into the contrastive learning framework to extract the importance of each band. Moreover, we improve the traditional data augmentation strategy originally designed for normal images in SimSiam to make contrastive learning better serve HSIs. In addition, we design a loss function specifically for the contrastive learning-based BS network, which can constrain the symmetric loss while ensuring attention to the most valuable bands. Experimental results indicate that ContrastBS achieves excellent performance relative to the compared BS methods. In the future, we will explore other effective unsupervised representation learning techniques for the HSI BS task, aiming to enhance efficiency and effectiveness.

Figure 1. Overview of the proposed ContrastBS network. Two augmented views of one HSI patch are handled by the same attention encoder, which comprises the band attention module, the convolutional module, and the projector. Subsequently, one side uses a predictor, and the other side uses a stop-gradient operation. The network maximizes the similarity between the two sides.

Figure 5. Overall accuracy curves of different BS techniques on the IP dataset.

Figure 6. Overall accuracy curves of different BS techniques on the PU dataset.

Figure 7. Overall accuracy curves of different BS techniques on the SA dataset.

Table 1. Information on three hyperspectral datasets.

Table 2. Number of training samples ($n_{\text{train}}$) and number of testing samples ($n_{\text{test}}$) for each category within IP.

Table 3. Number of training samples ($n_{\text{train}}$) and number of testing samples ($n_{\text{test}}$) for each land cover type within PU.

Table 4. Number of training samples ($n_{\text{train}}$) and number of testing samples ($n_{\text{test}}$) for each category within SA.

Table 5. Classification performance of BS techniques on the IP dataset.

Table 6. Classification performance of BS techniques on the PU dataset.

Table 7. Classification performance of BS techniques on the SA dataset.

Table 8. Comparison of computational times of BS techniques.

Table 9. Ablation studies on the symmetric loss constraint and sparsity constraint of the loss function.