Article

BiMambaHSI: Bidirectional Spectral–Spatial State Space Model for Hyperspectral Image Classification

1 School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
2 College of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(22), 3676; https://doi.org/10.3390/rs17223676
Submission received: 6 October 2025 / Revised: 2 November 2025 / Accepted: 5 November 2025 / Published: 8 November 2025

Highlights

What are the main findings?
  • The proposed BiMambaHSI introduces a bidirectional spectral–spatial state space framework that integrates a joint spectral–spatial gated Mamba (JGM) encoder and a spatial–spectral Mamba block (SSMB), explicitly capturing long-range dependencies while maintaining linear complexity.
  • Extensive experiments across multiple public benchmarks show that BiMambaHSI consistently achieves state-of-the-art classification accuracy and robustness, surpassing representative CNN- and Transformer-based baselines.
What is the implication of the main finding?
  • A bidirectional state–space Mamba block was designed specifically for hyperspectral image classification, providing a more efficient alternative to conventional Mamba and Transformer structures by effectively modeling long-range dependencies with high accuracy.
  • BiMambaHSI simultaneously models spectral continuity and spatial structure, exhibiting stronger discriminative ability for land-cover classes that are spectrally similar and spatially adjacent.

Abstract

Hyperspectral image (HSI) classification requires models that can simultaneously capture spatial structures and spectral continuity. Although state space models (SSMs), particularly Mamba, have shown strong capability in long-sequence modeling, their application to HSI remains limited due to insufficient spectral relation modeling and the constraints of unidirectional processing. To address these challenges, we propose BiMambaHSI, a novel bidirectional spectral–spatial framework. First, we propose a joint spectral–spatial gated Mamba (JGM) encoder that applies forward–backward state modeling with input-dependent gating, explicitly capturing bidirectional long-range spectral–spatial dependencies and overcoming the limitations of conventional unidirectional Mamba. Second, we introduce the spatial–spectral Mamba block (SSMB), which employs parallel bidirectional branches to extract spatial and spectral features separately and integrates them through a lightweight adaptive fusion mechanism. This design enhances spectral continuity, spatial discrimination, and cross-dimensional interactions while preserving the linear complexity of pure SSMs. Extensive experiments on five public benchmark datasets (Pavia University, Houston, Indian Pines, WHU-Hi-HanChuan, and WHU-Hi-LongKou) demonstrate that BiMambaHSI consistently achieves state-of-the-art performance, improving classification accuracy and robustness compared with existing CNN- and Transformer-based methods.

1. Introduction

With the continuous advancement of Earth observation technologies, the capabilities of remote sensing imagery acquisition have been steadily improving in terms of spatial resolution, spectral resolution, and observation coverage [1,2]. Traditional optical remote sensing typically relies on only three broad RGB bands, which reflect limited spectral differences and struggle to support precise identification and differentiation of complex land features [3]. In contrast, hyperspectral imaging simultaneously captures data across dozens to hundreds of contiguous narrow bands, offering broader spectral coverage and enabling comprehensive recording of land features' reflectance characteristics across different wavelengths [4,5]. Against this backdrop, hyperspectral image (HSI) classification has emerged as a core research direction in the field. Its objective is to assign a category label to each pixel in an image, enabling precise differentiation of land features. This technique demonstrates significant application value in resource surveys, ecological and environmental monitoring, urban mapping, and other domains [6].
Driven by deep learning, HSI classification has made remarkable progress [7,8]. Early research predominantly relied on CNN-based models, which extract discriminative spatial textures and local spectral patterns [9,10,11]. Despite strong performance in small-sample scenarios, CNNs are fundamentally constrained by the limited receptive field of convolution, making it difficult to model long-range spectral dependencies, a critical property of hyperspectral data.
To address this limitation, Transformer-based approaches introduced global self-attention to jointly explore spectral and spatial relationships [12,13,14]. These models improve classification in complex scenes. However, the quadratic growth of attention computation with sequence length leads to heavy memory usage and training cost, especially when operating on pixel-level hyperspectral sequences. Hence, while Transformers provide superior modeling capability, their computational overhead becomes a practical bottleneck.
Recently, State Space Models (SSMs) have emerged as a promising alternative that balances modeling power and efficiency [15,16]. Mamba [17] further enhances SSMs by introducing a selective mechanism, achieving long-range dependency modeling with linear complexity and hardware-efficient implementation [18,19]. Motivated by these advantages, several works explored Mamba for hyperspectral interpretation. For example, MambaHSI [20] introduced spatial and spectral branches with adaptive fusion, while HG-Mamba [21] and IGroupSS-Mamba [22] used grouped state updates to enhance spectral representation. These studies demonstrate that SSMs are competitive alternatives to Transformers for HSI.
However, when applied to hyperspectral image classification tasks [23], Mamba-based models primarily encounter two issues: (1) they generally adopt unidirectional state propagation, making it difficult to simultaneously capture forward and backward spectral–spatial relationships, and (2) fusion between spectral and spatial cues is usually implicit or non-adaptive, which may lead to incomplete feature interaction. As a result, spectral continuity and spatial structure are not fully exploited. Regarding the first issue, Li et al. [20] proposed MambaHSI, the first Mamba-based model for hyperspectral image classification, which introduces spatial and spectral Mamba modules with an adaptive fusion mechanism to jointly capture long-range dependencies and spectral–spatial representations. Zhang et al. [24] proposed WSC-Net, a dual-branch hyperspectral image classification network that integrates a Swin Transformer with a wavelet transform module and introduces a cross-domain attention fusion mechanism to jointly capture global contextual information and fine-grained spectral details. Nevertheless, existing Mamba-based hyperspectral classification methods (such as MambaHSI) still employ unidirectional sequence modeling and lack the ability to explicitly model the sequential relationships between spectral and spatial tokens.
To overcome these limitations while preserving the efficiency advantage of SSMs, we propose BiMambaHSI, a bidirectional spectral–spatial state–space framework for hyperspectral image classification. Unlike existing Mamba-based HSI models that rely on unidirectional propagation or implicit fusion, BiMambaHSI explicitly models spectral–spatial interactions in both forward and backward directions under a purely linear SSM design. Specifically, we introduce a joint spectral–spatial bidirectional gated Mamba (JGM) encoder, which serializes hyperspectral cubes into a unified sequence and performs global bidirectional state updates. The two directional representations are integrated through a selective gating mechanism that adaptively emphasizes informative spectral–spatial dependencies. Furthermore, we design a spatial–spectral Mamba block (SSMB) that decomposes modeling into spatial and spectral paths and fuses them with learnable weighting, enabling stronger representation of spectral continuity and spatial structure.
In summary, the main contributions of this paper are as follows:
  • We propose a bidirectional state–space framework for hyperspectral image classification (BiMambaHSI), which extends standard Mamba-based HSI architectures with explicit bidirectional spectral–spatial state modeling while maintaining pure SSM linear complexity. By incorporating forward–backward propagation and adaptive fusion, the framework enhances spectral–spatial feature representation without increasing computational cost.
  • To overcome the limitation of Mamba's unidirectional processing in modeling global spectral–spatial relationships, we develop a joint spectral–spatial bidirectional gated Mamba (JGM) encoder. Through forward and backward state updates coupled with a selective gating mechanism, it effectively captures spectral–spatial coupling relationships across the entire hyperspectral image.
  • To address the insufficient characterization of spectral continuity and spatial structure in Mamba, we design a bidirectional spatial–spectral Mamba block (SSMB) that operates along both the spectral and spatial dimensions. By implementing bidirectional state updates and adaptive fusion across the two pathways, this module enhances the depiction of spectral continuity and spatial structure, improving classification discriminative power.
The remainder of this paper is organized as follows: Section 2 introduces the experimental datasets and the BiMambaHSI framework. Section 3 reports the experimental setup and results, including evaluation metrics, implementation details, ablation studies, comparative experiments, and efficiency analysis. Section 4 discusses the underlying mechanisms, parameter sensitivity, and limitations. Finally, Section 5 concludes this work and provides suggestions for future research directions.

2. Materials and Methods

2.1. Experimental Datasets

To comprehensively evaluate the performance of the proposed BiMambaHSI model, we conducted extensive experiments using five publicly available datasets that cover diverse characteristics: Pavia University [25], Houston [26], Indian Pines [27], WHU-Hi-HanChuan [28,29], and WHU-Hi-LongKou [28,29]. These datasets exhibit significant differences in spatial resolution, spectral characteristics, scene complexity, and category distribution. We provided an overview of their ground-truth annotations. The original hyperspectral image, the corresponding land-cover categories, and their spatial distributions are illustrated in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5. Furthermore, the number of samples for each class is summarized in Table 1, offering a clear view of the data composition across different categories.

2.1.1. Houston Dataset

The Houston dataset was collected by the ITRES CASI-1500 sensor over the University of Houston campus and its surrounding area, USA, and was released for the 2013 IEEE GRSS Data Fusion Contest [26]. It consists of 349 × 1905 pixels with 144 spectral bands covering 380–1050 nm. Fifteen classes are annotated, including buildings, roads, trees, and grass. This dataset is challenging due to its large area, high intra-class variability, and shadow effects.

2.1.2. Indian Pines Dataset

The Indian Pines dataset, captured by AVIRIS over northwestern Indiana, contains 145 × 145 pixels and 200 spectral bands (after water-band removal). It covers agricultural fields with 16 land-cover classes, mostly crops, soil, and forest. Due to high spectral similarity among vegetation categories, it is often used to test spectral discrimination ability.

2.1.3. WHU-Hi-HanChuan Dataset

The WHU-Hi-HanChuan dataset was acquired by a UAV-borne hyperspectral imaging system over a suburban agricultural area in Hanchuan, China. It consists of 480 × 300 pixels with 270 spectral bands after preprocessing. Sixteen categories are defined, covering cropland, meadows, forest and shrub land, built-up areas, water bodies, and bare land. This dataset features fine-grained agricultural and residential scenes with smooth spatial transitions, making it suitable for testing spatial consistency.

2.1.4. WHU-Hi-LongKou Dataset

The WHU-Hi-LongKou dataset was acquired by a UAV-borne hyperspectral imaging system over Longkou, China. It has 550 × 400 pixels and 270 spectral bands, covering a predominantly agricultural area. Nine land-cover types are annotated, including corn, cotton, sesame, soybean, rice, water, and roads and houses. It serves as a high-resolution benchmark dominated by large homogeneous regions with subtle spectral differences among crop types.

2.2. Preliminaries

In recent years, Structured State Space Models (SSMs) have attracted widespread attention in deep learning. Models of this type, such as S4 and Mamba, inherit the principles of classical state-space systems and have demonstrated great potential in sequence modeling. The core idea is to use a hidden state variable $h(t) \in \mathbb{R}^{N}$ to model a one-dimensional input sequence $x(t) \in \mathbb{R}$, mapping it to an output $y(t) \in \mathbb{R}$, thereby establishing the dynamic relationship between input and output. This process can be described by the following linear ordinary differential equation (ODE):
$$h'(t) = A h(t) + B x(t), \qquad y(t) = C h(t),$$
where $A \in \mathbb{R}^{N \times N}$ is the state matrix, and $B \in \mathbb{R}^{N \times 1}$ and $C \in \mathbb{R}^{1 \times N}$ are the projection parameters. To handle discrete data such as images and natural language, the continuous form of the state space system needs to be discretized. To this end, S4 and Mamba introduce a time-scale parameter $\Delta$ and apply a fixed discretization rule to transform the continuous parameters $A$ and $B$ into their discrete counterparts $\bar{A}$ and $\bar{B}$. The most commonly used method is the Zero-Order Hold (ZOH), defined as follows:
$$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right) \cdot \Delta B.$$
After discretization, SSM-based models can be computed in the recurrence form, which updates the hidden state step by step as follows:
$$h_t = \bar{A} h_{t-1} + \bar{B} x_t, \qquad y_t = C h_t.$$
Alternatively, it can be expressed in the convolutional form, representing the entire sequence as a convolution with the system kernel as the following:
$$K = \left(C\bar{B},\; C\bar{A}\bar{B},\; \ldots,\; C\bar{A}^{L-1}\bar{B}\right), \qquad y = x * K,$$
where L denotes the sequence length, which determines the expansion scale of the convolution kernel K . This transition from continuous to discrete representation enables SSMs to retain theoretical modeling capacity while achieving linear-time complexity for long sequence modeling, laying the foundation for the wide adoption of S4 and Mamba in deep learning.
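To make the equivalence between the recurrent and convolutional forms concrete, the following minimal NumPy sketch (our own illustration with toy parameters, not part of the original method) discretizes a small SSM with ZOH and checks that both forms produce identical outputs.

```python
import numpy as np
from scipy.linalg import expm

N, L = 4, 16                                    # state size, sequence length
rng = np.random.default_rng(0)
A = -np.diag(rng.uniform(0.5, 1.5, size=N))     # stable continuous state matrix
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
delta = 0.1                                     # time-scale parameter Delta

# Zero-Order Hold discretization
A_bar = expm(delta * A)
B_bar = np.linalg.inv(delta * A) @ (A_bar - np.eye(N)) @ (delta * B)

x = rng.normal(size=L)

# Recurrent form: h_t = A_bar h_{t-1} + B_bar x_t,  y_t = C h_t
h = np.zeros((N, 1))
y_rec = []
for t in range(L):
    h = A_bar @ h + B_bar * x[t]
    y_rec.append((C @ h).item())

# Convolutional form: y = x * K with K_k = C A_bar^k B_bar
K = np.array([(C @ np.linalg.matrix_power(A_bar, k) @ B_bar).item() for k in range(L)])
y_conv = [float(np.dot(K[:t + 1][::-1], x[:t + 1])) for t in range(L)]

print(np.allclose(y_rec, y_conv))               # True: both forms agree
```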

2.3. BiMambaHSI Overview

Our proposed framework, termed BiMambaHSI, is illustrated in Figure 6. The entire architecture is composed of three main components: (i) an Embedding Layer that produces multi-scale spectral–spatial embeddings from hyperspectral patches, (ii) a set of joint spectral–spatial bidirectional gated Mamba encoders, which form the core of our framework and incorporate the newly designed spatial–spectral Mamba block (SSMB) to explicitly enhance bidirectional spectral–spatial modeling, and (iii) a Multi-scale Classification Head that integrates the encoder's multi-scale outputs for final prediction. By introducing the JGM together with the SSMB, BiMambaHSI not only captures long-range dependencies with linear complexity but also alleviates the limitations of insufficient spectral relation modeling and unidirectional sequence processing.
(1) Embedding Layer: Given an input hyperspectral image $I \in \mathbb{R}^{H \times W \times C}$, we first apply principal component analysis (PCA) to reduce spectral redundancy. Pixel-centered patches of size $N \times N \times C$ are then extracted and passed through multi-scale 3D convolutional branches. Each branch corresponds to one scale $s$ and consists of a convolution layer, batch normalization, and SiLU activation. The embedding feature at scale $s$ is formulated as follows:
$$E_s = \mathrm{SiLU}\!\left(\mathrm{BN}_{3D}\!\left(\mathrm{Conv3D}_s(I_{\mathrm{patch}})\right)\right), \qquad s = 1, 2, \ldots, S,$$
where $S$ denotes the number of scales. The outputs $\{E_1, E_2, \ldots, E_S\}$ are then fed into the gated BiMamba encoders. In our BiMambaHSI framework, we specifically adopt $S = 2$ scales, corresponding to convolution kernels of spatial size $1 \times 1$ and $3 \times 3$, in order to capture both fine-grained spectral cues and broader spatial–spectral dependencies.
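As a concrete illustration, the snippet below sketches one plausible PyTorch realization of this multi-scale embedding layer. The channel width, the way the 3D convolution collapses the PCA components, and the module names are our assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultiScaleEmbedding(nn.Module):
    def __init__(self, n_components=15, dim=64, scales=(1, 3)):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in scales:
            self.branches.append(nn.Sequential(
                # treat the PCA components as the depth axis of a 3D convolution
                nn.Conv3d(1, dim, kernel_size=(n_components, k, k),
                          padding=(0, k // 2, k // 2)),
                nn.BatchNorm3d(dim),
                nn.SiLU(),
            ))

    def forward(self, patch):                  # patch: (B, 1, C_pca, N, N)
        # each branch yields (B, dim, 1, N, N); squeeze the depth axis
        return [branch(patch).squeeze(2) for branch in self.branches]

patches = torch.randn(2, 1, 15, 11, 11)        # two 11x11 patches with 15 PCA bands
feats = MultiScaleEmbedding()(patches)
print([f.shape for f in feats])                # two scales of (2, 64, 11, 11)
```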
(2) Joint Spectral–Spatial Bidirectional Gated Mamba Encoder: Each embedding $E_s$ is processed by a gated BiMamba encoder, which constitutes the core innovation of our framework. The encoder employs the proposed SSMB as its basic unit as follows:
$$H_s = \mathrm{Encoder}(E_s),$$
where $H_s$ denotes the bidirectionally enhanced representation at scale $s$. This design allows the encoder to overcome the limitations of traditional Mamba-based models by explicitly exploiting both spectral relationships and spatial–spectral dependencies in a bidirectional manner.
(3) Multi-scale Classification Head: The outputs from all encoders are aggregated for classification. Specifically, the features $\{H_1, H_2, \ldots, H_S\}$ are summed, followed by average pooling and reshaping, to form the final representation:
$$Z = \mathrm{Reshape}\!\left(\mathrm{AvgPooling}\!\left(\sum_{s=1}^{S} H_s\right)\right).$$
Finally, a linear classifier maps Z to the predicted class labels as follows:
$$\hat{y} = \mathrm{Classifier}(Z).$$
This multiscale fusion design allows BiMambaHSI to integrate complementary spectral and spatial cues across different receptive fields, thereby improving classification accuracy on complex hyperspectral scenes.
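A minimal sketch of this head, assuming the per-scale encoder outputs share a common channel width (all shapes and names are illustrative):

```python
import torch
import torch.nn as nn

class MultiScaleHead(nn.Module):
    def __init__(self, dim=64, num_classes=9):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, features):               # list of (B, D, N, N) tensors
        fused = torch.stack(features, dim=0).sum(dim=0)   # sum over scales
        z = self.pool(fused).flatten(1)                   # average pool -> (B, D)
        return self.classifier(z)                         # class logits

logits = MultiScaleHead()([torch.randn(2, 64, 11, 11) for _ in range(2)])
print(logits.shape)                            # torch.Size([2, 9])
```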

2.4. Joint Spectral–Spatial Bidirectional Gated Mamba (JGM) Encoder

The JGM encoder extends the unidirectional sequence modeling framework of Mamba by introducing forward and backward state propagation, enabling more explicit modeling of spectral–spatial interactions. Unlike existing approaches (e.g., HyperSIGMA [12]), our encoder does not separate spectral and spatial features at the very beginning. Instead, it unifies hyperspectral patches into a sequence of tokens, ensuring that the correlations across both spectral and spatial dimensions can be jointly captured from the start. The underlying module employed in this encoder is the spatial–spectral Mamba block (SSMB), which extends Mamba for hyperspectral data; its detailed design is presented in Section 2.5. This strategy allows the encoder to explicitly model cross-dimensional relationships rather than treating spectra and space in isolation. Furthermore, to overcome the limitation of unidirectional processing, we introduce a bidirectional mechanism, where each sequence is modeled in both forward and backward directions. Given the embedding feature $E_s \in \mathbb{R}^{L \times L \times D}$, we first apply normalization and projection to obtain a latent representation $Z_s$ as follows:
$$Z_s = \mathrm{Proj}\!\left(\mathrm{Norm}(E_s)\right).$$
In parallel, we do not create a separate gating branch. Instead, the projected tensor $Z_s \in \mathbb{R}^{L \times L \times 2D}$ is split along the channel dimension into a content part and a gating part as follows:
$$Z_s = \left[Z_s^{(c)},\, Z_s^{(g)}\right], \qquad Z_s^{(c)}, Z_s^{(g)} \in \mathbb{R}^{L \times L \times D},$$
where the input-dependent gating weight is computed from the gating part via a SiLU nonlinearity followed by a sigmoid:
$$G_s = \sigma\!\left(\mathrm{SiLU}\!\left(Z_s^{(g)}\right)\right).$$
The content part $Z_s^{(c)}$ is fed into the SSMB, while $G_s$ is later used to adaptively fuse the forward and backward representations. With this split representation, the content features $Z_s^{(c)}$ are processed by the SSMB in both forward and backward directions, yielding two complementary representations $H_s^{f}$ and $H_s^{b}$. To alleviate the limitation of unidirectional modeling, the final output of the encoder is obtained by adaptively fusing these two representations under the control of the gating weight $G_s$ as follows:
$$H_s = G_s \odot H_s^{f} + (1 - G_s) \odot H_s^{b},$$
where ⊙ denotes element-wise multiplication. This formulation allows the encoder to explicitly integrate spectral–spatial dependencies while simultaneously mitigating the shortcomings of single-directional processing. As a result, the proposed JGM encoder is able to capture richer discriminative representations tailored for hyperspectral image classification.
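The following PyTorch sketch shows how the split, gating, and bidirectional fusion of the JGM encoder could be wired together. It is an interpretation of the equations above rather than the authors' code: `ssmb` is a stand-in for the spatial–spectral Mamba block of the next subsection, and the way the backward pass reverses the token grid is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JGMEncoder(nn.Module):
    def __init__(self, dim, ssmb: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, 2 * dim)     # content half + gating half
        self.ssmb = ssmb                        # placeholder for the SSMB

    def forward(self, e):                       # e: (B, L, L, D) token grid
        z = self.proj(self.norm(e))             # (B, L, L, 2D)
        z_c, z_g = z.chunk(2, dim=-1)           # content / gating split
        g = torch.sigmoid(F.silu(z_g))          # input-dependent gate G_s
        h_f = self.ssmb(z_c)                    # forward-direction pass
        h_b = torch.flip(self.ssmb(torch.flip(z_c, dims=(1, 2))), dims=(1, 2))
        return g * h_f + (1.0 - g) * h_b        # gated bidirectional fusion

out = JGMEncoder(64, nn.Identity())(torch.randn(2, 11, 11, 64))  # shape check
```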

2.5. Spatial–Spectral Mamba Block (SSMB)

Building on the joint spectral–spatial tokenization achieved by the JGM encoder, the spatial–spectral Mamba block (SSMB) is designed to separately mine spatial and spectral cues while retaining bidirectional long-range modeling. As shown in Figure 7, the SSMB splits the processing into two complementary branches: a spatial branch that specializes in spatial structures across patches and a spectral branch that focuses on spectral continuity across channels; both branches employ bidirectional Mamba to capture dependencies in two directions. Let $U = Z_s^{(c)} \in \mathbb{R}^{L \times L \times D}$ denote the input to the SSMB. For the spatial branch, the input $U$ is first flattened into a sequence:
$$H_F^{\mathrm{spa}} = \mathrm{Flatten}(U),$$
where $H_F^{\mathrm{spa}} \in \mathbb{R}^{L^2 \times D}$. To capture bidirectional dependencies, we apply Mamba along both the forward and reversed sequences, followed by group normalization and SiLU activation:
$$R_f^{\mathrm{spa}} = \mathrm{SiLU}\!\left(\mathrm{GN}\!\left(\mathrm{Mamba}(H_F^{\mathrm{spa}})\right)\right), \qquad R_b^{\mathrm{spa}} = \mathrm{Reverse}\!\left(\mathrm{SiLU}\!\left(\mathrm{GN}\!\left(\mathrm{Mamba}\!\left(\mathrm{Reverse}(H_F^{\mathrm{spa}})\right)\right)\right)\right).$$
One branch operates in the forward order, while the other processes the reversed sequence to capture backward dependencies. Their outputs are fused by a learnable gating weight $g^{\mathrm{spa}}$, yielding
$$R^{\mathrm{spa}} = g^{\mathrm{spa}} \cdot R_f^{\mathrm{spa}} + (1 - g^{\mathrm{spa}}) \cdot R_b^{\mathrm{spa}},$$
where $g^{\mathrm{spa}}$ is a trainable parameter that adaptively balances the contributions of forward and backward spatial dependencies. The forward propagation captures spatial structure along the scanning order, while the backward propagation compensates for the dependencies missed from the opposite direction. Compared with fixed summation, the gated design enables the model to adaptively adjust the weights of the forward/backward information flows, thereby achieving greater robustness across datasets. Finally, the aggregated sequence is reshaped back into a spatial layout and added to the original input through a residual connection:
$$O^{\mathrm{spa}} = \mathrm{Reshape}(R^{\mathrm{spa}}) + U.$$
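A hedged sketch of the spatial branch follows; `seq_model` is a placeholder for a Mamba layer (any module mapping a (B, L², D) sequence to the same shape can be substituted), and the group count in GroupNorm is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialBranch(nn.Module):
    def __init__(self, dim, seq_model: nn.Module):
        super().__init__()
        self.seq_model = seq_model              # e.g. a Mamba layer on (B, L*L, D)
        self.gn = nn.GroupNorm(4, dim)          # assumes dim is divisible by 4
        self.gate = nn.Parameter(torch.tensor(0.5))   # learnable g_spa

    def _run(self, seq):                        # seq: (B, L*L, D)
        out = self.seq_model(seq)
        out = self.gn(out.transpose(1, 2)).transpose(1, 2)
        return F.silu(out)

    def forward(self, u):                       # u: (B, L, L, D)
        b, l, _, d = u.shape
        seq = u.reshape(b, l * l, d)            # flatten the spatial grid
        fwd = self._run(seq)
        bwd = torch.flip(self._run(torch.flip(seq, dims=(1,))), dims=(1,))
        fused = self.gate * fwd + (1.0 - self.gate) * bwd   # gated fusion
        return fused.reshape(b, l, l, d) + u    # residual connection
```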
To enhance spectral information modeling, the spectral channels are first divided into $G$ groups (each of size $M = D/G$). Let $U \in \mathbb{R}^{L \times L \times D}$. We split the channels while keeping the spatial layout:
$$H_G^{\mathrm{spe}} = \mathrm{SplitSpectralGroup}(U) \in \mathbb{R}^{L \times L \times G \times M}.$$
The spatial tokens are then flattened, producing a sequence per group along the intra-group spectral index:
$$H_F^{\mathrm{spe}} = \mathrm{Flatten}(H_G^{\mathrm{spe}}) \in \mathbb{R}^{L^2 \times G \times M}.$$
Bidirectional Mamba is applied along the intra-group spectral dimension, followed by GN and SiLU; the backward pass reverses the G-axis as the following:
$$R_f^{\mathrm{spe}} = \mathrm{SiLU}\!\left(\mathrm{GN}\!\left(\mathrm{Mamba}(H_F^{\mathrm{spe}})\right)\right), \qquad R_b^{\mathrm{spe}} = \mathrm{Reverse}\!\left(\mathrm{SiLU}\!\left(\mathrm{GN}\!\left(\mathrm{Mamba}\!\left(\mathrm{Reverse}(H_F^{\mathrm{spe}})\right)\right)\right)\right).$$
The two directions are aggregated by summation. Unlike spatial structures, spectral bands are sampled in a strictly ordered continuous wavelength domain, and forward–backward propagation represents two complementary scanning directions along the same band axis. Since spectral continuity is globally smooth and less variable than spatial textures, equal-weight aggregation provides a stable and parameter-free fusion strategy. This allows the model to integrate long-range spectral correlations without introducing extra learnable coefficients, preserving pure SSM linear complexity as the following:
$$R^{\mathrm{spe}} = R_f^{\mathrm{spe}} + R_b^{\mathrm{spe}}.$$
Finally, the sequence is reshaped back, spectral groups are merged, and a residual connection is added,
$$O^{\mathrm{spe}} = \mathrm{Reshape}(R^{\mathrm{spe}}) + U.$$
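The scanning axis of the spectral branch is only partially specified above (the channels are split into G groups and the backward pass reverses the G-axis), so the sketch below shows one plausible reading that treats the G groups at each spatial position as a short sequence of M-dimensional tokens; all names are illustrative and `seq_model` again stands in for Mamba.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralBranch(nn.Module):
    def __init__(self, dim, groups, seq_model: nn.Module):
        super().__init__()
        assert dim % groups == 0
        self.g, self.m = groups, dim // groups
        self.seq_model = seq_model              # placeholder sequence model on (B*, G, M)
        self.gn = nn.GroupNorm(1, self.m)

    def _run(self, seq):                        # seq: (B*L*L, G, M)
        out = self.seq_model(seq)
        out = self.gn(out.transpose(1, 2)).transpose(1, 2)
        return F.silu(out)

    def forward(self, u):                       # u: (B, L, L, D)
        b, l, _, d = u.shape
        seq = u.reshape(b * l * l, self.g, self.m)   # split channels into G groups of size M
        fwd = self._run(seq)
        bwd = torch.flip(self._run(torch.flip(seq, dims=(1,))), dims=(1,))  # reversed G-axis
        fused = (fwd + bwd).reshape(b, l, l, d)      # equal-weight aggregation
        return fused + u                             # residual connection
```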
This branch explicitly captures bidirectional dependencies among principal component channels within each spectral group and complements the spatial branch. Finally, the outputs from the spatial and spectral branches are integrated by a lightweight fusion module. Given the input embedding U , two fusion weights are first computed through a shared projection followed by a SiLU activation as the following:
$$\left[W^{\mathrm{spa}}, W^{\mathrm{spe}}\right] = \mathrm{SiLU}\!\left(\mathrm{Proj}(U)\right).$$
The spatial and spectral branch outputs are then combined adaptively using the weights $W^{\mathrm{spa}}, W^{\mathrm{spe}} \in \mathbb{R}^{L \times L \times D}$ as follows:
$$H^{\mathrm{fus}} = W^{\mathrm{spa}} \cdot O^{\mathrm{spa}} + W^{\mathrm{spe}} \cdot O^{\mathrm{spe}}.$$
A normalization layer is subsequently applied to stabilize training as the following:
$$O = \mathrm{LayerNorm}(H^{\mathrm{fus}}).$$
Through this design, the model learns to emphasize spatial or spectral representations depending on the input, enabling adaptive feature integration.
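A compact sketch of this adaptive fusion step, written as a minimal reading of the two equations above (illustrative names, not the released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 2 * dim)     # shared projection -> two weight maps
        self.norm = nn.LayerNorm(dim)

    def forward(self, u, o_spa, o_spe):         # all tensors: (B, L, L, D)
        w_spa, w_spe = F.silu(self.proj(u)).chunk(2, dim=-1)
        return self.norm(w_spa * o_spa + w_spe * o_spe)   # weighted sum + LayerNorm
```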

3. Results

3.1. Evaluation Indicators

We evaluated the proposed method using three widely adopted metrics in hyperspectral image classification: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient ($\kappa$). These metrics complement each other by characterizing classification performance from different perspectives. Let the total number of test samples be $N$, the number of land-cover classes be $C$, and the confusion matrix be denoted as $M \in \mathbb{R}^{C \times C}$, where each entry $M_{ij}$ represents the number of samples belonging to class $i$ that are classified as class $j$; a small worked example follows the list below.
  • Overall Accuracy (OA): This metric measures the ratio of correctly classified pixels to the total number of test pixels, and it is referred to as the overall accuracy (OA) and defined as the following:
    $$\mathrm{OA} = \frac{\sum_{i=1}^{C} M_{ii}}{N}.$$
  • Average Accuracy (AA): AA computes the average classification accuracy across all classes, assigning equal importance to each class regardless of its sample size. This ensures a more balanced evaluation of classification performance over all categories. The definition of AA is given as the following:
    $$\mathrm{AA} = \frac{1}{C}\sum_{i=1}^{C}\frac{M_{ii}}{\sum_{j=1}^{C} M_{ij}}.$$
  • Kappa Coefficient ( κ ): The Kappa coefficient quantifies the level of agreement between the predicted classification results and the ground truth, while accounting for the agreement that may occur by chance. It is formulated as follows:
    $$\kappa = \frac{N\sum_{i=1}^{C} M_{ii} - \sum_{i=1}^{C}\left(\sum_{j=1}^{C} M_{ij} \cdot \sum_{j=1}^{C} M_{ji}\right)}{N^{2} - \sum_{i=1}^{C}\left(\sum_{j=1}^{C} M_{ij} \cdot \sum_{j=1}^{C} M_{ji}\right)}.$$
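As a small worked example of these formulas (our own toy numbers), the snippet below computes OA, AA, and κ from a 3 × 3 confusion matrix whose rows are true classes and columns are predicted classes.

```python
import numpy as np

def oa_aa_kappa(M):
    M = np.asarray(M, dtype=float)
    N = M.sum()
    oa = np.trace(M) / N                                   # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))               # mean per-class accuracy
    chance = np.sum(M.sum(axis=1) * M.sum(axis=0)) / N**2  # expected chance agreement
    kappa = (oa - chance) / (1.0 - chance)                 # kappa formula above, divided by N^2
    return oa, aa, kappa

M = np.array([[50, 2, 0],
              [3, 45, 2],
              [1, 1, 48]])
print(oa_aa_kappa(M))   # approximately (0.941, 0.941, 0.911)
```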

3.2. Experimental Setup

To ensure fair comparisons and reproducibility, all experiments were implemented within a unified framework using a single NVIDIA RTX 4090 GPU and a 14th Gen Intel Core CPU under PyTorch 2.1. Following the common practice in previous studies [9,11], we applied PCA to each hyperspectral cube for spectral dimensionality reduction. Specifically, the number of retained components was set to 15 for Pavia University, 17 for Houston, 19 for Indian Pines, and 14 for both WHU-Hi-Hanchuan and WHU-Hi-LongKou. For a fair comparison with Hypersigma, which requires pretrained weights, we followed its setting and uniformly fixed the number of PCA components to 33.
During training, all models were optimized using the AdamW optimizer for 200 epochs, with an initial learning rate of $1 \times 10^{-3}$ and a weight decay of 0.005. The learning rate was adjusted with a cosine-annealing scheduler, and the batch size was 128. These training hyperparameters were determined through preliminary experiments to ensure stable and efficient convergence across all datasets.
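For reference, a minimal PyTorch sketch of this optimization setup; the model is a placeholder and the loop omits data loading and loss computation.

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 9)     # placeholder for BiMambaHSI
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... iterate over mini-batches of size 128, compute the loss, call backward() ...
    optimizer.step()         # no-op here without gradients; shown for structure
    scheduler.step()         # cosine-annealed learning rate, stepped once per epoch
```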
The training sample proportions for each dataset are shown in Table 1. The Pavia University training set contains 1% of the samples from each category, the Houston and WHU-Hi-HanChuan training sets each contain 5% of the samples per category, the Indian Pines training set contains 10% of the samples per category, and the WHU-Hi-LongKou training set contains 0.1% of the samples per category. These challenging small-sample settings were designed to rigorously test the models' generalization capabilities. The remaining samples in each dataset were used for testing.
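A hedged sketch of the per-class (stratified) sampling implied by these proportions; `ratio` is the per-dataset training fraction (e.g., 0.01 for Pavia University), and keeping at least one training sample per class is our assumption.

```python
import numpy as np

def stratified_split(labels, ratio, seed=0):
    """Split labeled pixel indices into train/test sets with a fixed per-class ratio."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_train = max(1, int(round(ratio * len(idx))))   # at least one sample per class
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```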

3.3. Ablation Analysis

To analyze the contributions of individual modules, as shown in Table 2, we conducted ablations focusing on the gated fusion in JGM and the dual-branch structure in SSMB.
  • JGM fusion mechanism. Replacing gated fusion with simple summation yielded only slight improvement over the base model. With learnable gating, the accuracy increased more noticeably on all datasets (e.g., from 95.36% to 95.68% on PaviaU and from 96.97% to 97.08% on Houston), indicating that adaptive weighting provides more effective spectral—spatial integration.
  • SSMB dual-branch structure. Spatial-only and spectral-only variants resulted in reduced accuracy, showing that both modalities contribute complementary information. For example, on Houston, spatial-only and spectral-only variants achieved 97.13% and 97.30%, while the dual-branch version reached 97.47%. Similar patterns appeared on other datasets.
With both components enabled, BiMambaHSI obtained the highest accuracy across datasets. These results suggested that performance gains came from explicit bidirectional spectral—spatial modeling with adaptive fusion, rather than increased model size.

3.4. Classification Performance

To comprehensively evaluate the effectiveness of the proposed method, we conducted experiments on five widely used hyperspectral datasets. We selected several representative hyperspectral classification models as comparison methods, including GSCNet [14], HyperSIGMA [12], Lite-HCNNet [11], MSDAN [9], SimpoolFormer [30], SpectralFormer [13], SSFTT [31], MambaHSI [20], and HGMamba [21], which can demonstrate performance differences from various perspectives. Detailed quantitative results are summarized in Table 3, Table 4, Table 5, Table 6 and Table 7, while qualitative visualizations of classification maps with distinct locations clearly marked by white boxes are provided in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12. These results collectively validate the robustness and advantages of our approach across different spatial resolutions, spectral characteristics, and scene complexities.

3.4.1. Results on Pavia University Dataset

The Pavia University dataset contains a variety of urban surfaces with high intra-class spectral similarity. Our BiMambaHSI achieves the highest OA of 97.90%, outperforming both CNN- and Transformer-based baselines. The improvement is especially significant for categories such as asphalt, self-blocking bricks, and shadows, where subtle material differences are easily confused by other models. The bidirectional encoding and adaptive fusion mechanism allow BiMambaHSI to capture fine-grained local cues while maintaining spectral continuity, effectively reducing misclassification among urban structures. In addition to these quantitative advantages, the classification maps on Pavia University appear overall smoother and more coherent, with relatively clearer boundaries between similar categories such as asphalt and bitumen. Compared with some baselines, misclassifications in shadow regions and narrow man-made structures are somewhat alleviated, and noise points tend to be reduced.

3.4.2. Results on Houston Dataset

The Houston dataset represents a highly heterogeneous large-scale urban area, which poses challenges due to complex man-made structures and spectrally confusing classes (e.g., roads vs. rooftops). BiMambaHSI achieves an OA of 97.92%, showing strong resilience to such confusions. Notably, our method maintains balanced performance across vegetation, water, and urban materials, while several baselines exhibit bias towards dominant land-cover classes. The robustness of BiMambaHSI in this scenario highlights the advantage of bidirectional spectral–spatial modeling when dealing with fine-grained heterogeneity and inter-class similarity. In the Houston classification maps, BiMambaHSI produces relatively more uniform results in roads, rooftops, and residential areas, with fewer fragmented predictions. In complex urban blocks, the model tends to maintain structural consistency, while vegetation and water regions are also classified with relatively stable accuracy.

3.4.3. Results on Indian Pines Dataset

Indian Pines is characterized by low spatial resolution, severe class imbalance, and a large proportion of spectrally similar vegetation types. On this challenging benchmark, BiMambaHSI achieves an OA of 99.02% and κ of 98.88, with notable improvements in minority classes such as oats, alfalfa, and grass-pasture-mowed. Unlike previous models that struggle with small-sample categories, our bidirectional framework effectively reinforces spectral continuity and spatial consistency, allowing reliable recognition of rare crops. This demonstrates BiMambaHSI’s potential in imbalanced agricultural monitoring scenarios, where both major and minor classes are critical. In the Indian Pines dataset, the visualization results show that BiMambaHSI has a better tendency to identify certain minority classes, such as oats and alfalfa, with fewer omissions compared to other methods. Field boundaries also appear relatively clearer, suggesting that bidirectional modeling helps distinguish between adjacent crop categories to some extent.

3.4.4. Results on WHU-Hi-HanChuan Dataset

The WHU-Hi-HanChuan dataset has high spatial resolution and covers diverse landscapes, from cropland to urban surfaces. BiMambaHSI achieves an OA of 97.92%, showing clear advantages in categories with fine spatial granularity, such as roads, gardens, and small built-up structures. The results suggest that the spatial—spectral Mamba block is particularly beneficial in preserving fine structural detail without sacrificing spectral discrimination. In contrast, the baseline models tend to either over-smooth small objects or misinterpret them as larger homogeneous regions. This underlines the effectiveness of our adaptive fusion strategy in balancing local detail and global representation. On the WHU-Hi-HanChuan dataset, the classification maps indicate that BiMambaHSI preserves fine-grained spatial structures, such as roads and small buildings, relatively better. Compared with several baselines, vegetation and farmland regions show more stable predictions, with reduced noise in some local areas.

3.4.5. Results on WHU-Hi-LongKou Dataset

In contrast to HanChuan, the WHU-Hi-LongKou dataset is dominated by large homogeneous regions such as farmland and water. BiMambaHSI achieves an OA of 99.54%, consistently outperforming other methods across both major and minor land-cover categories. The model shows particular strength in discriminating between different crop types (e.g., narrow-leaf vs. broad-leaf soybean), which exhibit highly similar spectra. These results indicate that the bidirectional mechanism not only excels in complex urban settings but also fully exploits subtle spectral variations in uniform agricultural landscapes, making BiMambaHSI versatile across scene complexities. For the WHU-Hi-LongKou dataset, BiMambaHSI demonstrates higher consistency in large homogeneous regions such as farmland and water bodies. The distinction between different crop types, such as narrow-leaf and broad-leaf soybeans, also appears relatively clearer, and the classification maps contain fewer block-like artifacts.
In addition to CNN- and Transformer-based baselines, we also compared BiMambaHSI with two representative state–space hyperspectral classifiers: MambaHSI and HG-Mamba. These baselines directly validate the effectiveness of bidirectional state modeling. As shown in Table 3, Table 4, Table 5, Table 6 and Table 7 and Figure 9, Figure 10, Figure 11 and Figure 12, although both models leverage SSM-based long-range sequence modeling, their unidirectional formulation makes it difficult to capture forward–backward spectral—spatial relationships. For example, on the Pavia University dataset, MambaHSI and HG-Mamba achieve OA values of 95.80% and 95.03%, while BiMambaHSI reaches 97.90% under the same settings.
On more challenging datasets such as Indian Pines, BiMambaHSI further improved classification of minority classes and spectrally similar materials, achieving 99.02% OA and 98.88% Kappa. The visualization results showed that predictions from MambaHSI and HG-Mamba may contain locally inconsistent patches, whereas BiMambaHSI produces more continuous and coherent spatial structures.
These observations suggested that performance gains came not from simply adopting SSMs, but from explicitly introducing bidirectional state modeling and adaptive fusion. Therefore, the bidirectional design plays a key role in enhancing spectral—spatial representation under a pure SSM framework.

3.5. Model Complexity and Efficiency Analysis

To validate the efficiency of the proposed bidirectional SSM architecture, we compared BiMambaHSI with representative CNN-based, Transformer-based, and Mamba-based baselines. Table 8 reports the parameter count, FPS, and per-epoch training time on the five datasets. BiMambaHSI requires 0.42–0.43 M parameters, comparable to lightweight baselines such as SSFTT and SpectralFormer, indicating that the performance improvements were not obtained at the cost of larger model capacity. In terms of computation, BiMambaHSI consistently achieved higher FPS and shorter training time than the Transformer-based models. Since Transformers rely on self-attention with $O(N^2)$ complexity while SSM propagation retains $O(N)$ complexity, the observed speed differences align with the expected theoretical behavior. These results demonstrate that the proposed bidirectional formulation maintains the computational advantages of pure SSMs while improving classification performance across datasets.

4. Discussion

4.1. Mechanism Analysis and Comparison with Existing Methods

The effectiveness of BiMambaHSI primarily comes from two structural designs: the Joint Spectral—Spatial Bidirectional Gated Mamba (JGM) and the Spatial—Spectral Mamba Block (SSMB). JGM introduces forward–backward state propagation with a learnable fusion scheme, allowing the model to aggregate contextual cues from both directions rather than relying on a single unidirectional sequence. SSMB further refines representations through parallel spectral and spatial branches, enabling spectral continuity and spatial coherence to be modeled in a complementary manner. Together, these mechanisms enhance spectral—spatial discriminability while preserving the linear complexity of state–space modeling.
Compared with traditional hyperspectral classification models, BiMambaHSI avoids the limitations of CNN-based architectures, which capture local spatial features effectively but struggle with long-range spectral dependency. Unlike Transformer-based methods, which model global relationships but incur quadratic attention cost, BiMambaHSI maintains linear complexity and lightweight computation. Relative to existing Mamba-based methods such as HG-Mamba and MambaHSI, which remain unidirectional or emphasize only a single pathway, our design introduces bidirectional propagation and adaptive fusion without increasing parameter scale. This provides a favorable trade-off between accuracy and efficiency. On average, BiMambaHSI achieves higher OA with fewer parameters and FLOPs, indicating that the improvement arises from architectural design rather than model size.
Experimental results across five benchmark datasets show consistent performance in both heterogeneous urban scenes and homogeneous agricultural areas. In complex settings such as Pavia University and Houston, the model reduces confusion among spectrally similar categories, while in agricultural datasets (Indian Pines, WHU-Hi-Hanchuan, Longkou) it preserves fine-grained discrimination across subtle spectral variations. The minor accuracy differences mainly reflect scene complexity instead of model instability, suggesting that the bidirectional formulation provides stable generalization under varied spectral resolution and limited supervision.

4.2. Parameter Sensitivity Analysis

To assess the robustness and adaptability of BiMambaHSI, we further investigated the influence of three key factors: patch size, the spectral group number ($G$) in the SSMB module, and the training ratio. The experimental results are summarized in Figure 13. Patch size determines the extent of local spatial context. As shown in Figure 13a, accuracy increases with patch size until performance saturates or slightly declines beyond a moderate scale. Small patches lack contextual semantics, while overly large patches introduce background noise and boundary smoothing. Most datasets achieved optimal performance with $9 \times 9$ or $11 \times 11$ patches, reflecting a balance between fine detail and context. $G$ controls the spectral partition granularity. Figure 13b shows that $G = 16$ consistently yields the highest accuracy across all datasets. Small $G$ values merge a wide spectral range into one group, weakening local wavelength discrimination, while very large $G$ values fragment continuous spectra, reducing global coherence. The stability of $G = 16$ across datasets supports the effectiveness of the proposed spectral grouping strategy. We further examined performance at 20%, 40%, 60%, 80%, and 100% of the labeled training samples, as shown in Figure 13c. BiMambaHSI maintains an upward trend as supervision increases and achieves over 96% OA on PaviaU and Houston even with only 20% of the training samples, demonstrating good data efficiency. The small difference between 80% and 100% indicates early saturation of representation capacity.

4.3. Model Interpretation and Mechanism Analysis

To provide deeper insight into the design rationale and internal mechanisms of the proposed BiMambaHSI, we presented a concise interpretative analysis of its core modules—JGM and SSMB—and their collaborative effects.
(1) Forward–Backward Dependency Modeling. The JGM encoder was designed to enhance global spectral—spatial representation by incorporating both forward and backward state propagation. The forward propagation sequentially modeled dependencies along the scanning direction, while the backward propagation captured complementary contextual relations from the opposite direction. Their fusion enabled the model to perceive each pixel’s surrounding context in both preceding and succeeding orders, effectively mitigating direction-dependent bias in feature representation. Since both paths shared parameters within the Mamba formulation, this bidirectional extension introduced no additional theoretical complexity beyond the linear SSM framework.
(2) Spectral—Spatial Complementarity. While JGM emphasizes global spectral—spatial dependency modeling, the SSMB focused on local structural refinement. The spectral branch captured wavelength continuity and material-specific variations, whereas the spatial branch strengthened neighborhood coherence and boundary integrity. By combining both through adaptive fusion, the model achieved balanced representation learning—enhancing local discrimination without sacrificing global consistency. This design reflected the physical characteristics of hyperspectral imagery, where spatial adjacency and spectral smoothness jointly determined land-cover semantics.

4.4. Limitations and Future Work

BiMambaHSI still has certain limitations. First, although bidirectional modeling alleviates the shortcomings of unidirectionality, the fixed spectral grouping strategy may constrain its adaptability under varying spectral resolutions or sensor conditions. Second, this study relies on PCA for dimensionality reduction, but the optimal balance between information preservation and computational feasibility has not yet been explored further.
Future research could investigate adaptive spectral grouping strategies and multi-level bidirectional modeling structures to further enhance the model’s generalization ability. Beyond architectural refinements, it would also be valuable to examine alternative dimensionality reduction techniques or learned embeddings that strike a better balance between efficiency and information retention. In addition, extending BiMambaHSI to more challenging scenarios such as semi-supervised learning, cross-domain adaptation, and real-time applications may further improve its robustness, scalability, and practical utility in real-world hyperspectral tasks.

5. Conclusions

In this paper, we presented BiMambaHSI, a bidirectional state–space framework for hyperspectral image classification. BiMambaHSI introduces explicit bidirectional spectral—spatial state modeling and adaptive feature fusion while retaining pure SSM linear complexity. The framework integrates two complementary components: a joint spectral—spatial bidirectional gated Mamba (JGM) encoder that serializes hyperspectral cubes and models forward–backward dependencies, and a spatial—spectral mamba block (SSMB) that refines local spatial structures and spectral continuity through dual paths with learnable fusion. Experiments on five datasets demonstrate that BiMambaHSI achieves competitive performance across diverse spectral—spatial conditions. The results indicate that explicit bidirectional propagation and adaptive dual-path fusion provide a complementary modeling capability to existing methods, while preserving lightweight complexity characteristic of state–space models. These findings suggest that bidirectional spectral—spatial modeling is a promising direction for efficient and discriminative hyperspectral image classification.

Author Contributions

Conceptualization, J.M.; Methodology, J.M.; Investigation, H.M.; Writing—original draft, J.M.; Writing—review & editing, J.M.; Supervision, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Development Fund of Macau Project, grant number: 0096/2023/RIA2.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, B.; Zhang, M.; Li, W.; Song, X.; Gao, Y.; Zhang, Y.; Wang, J. Intermediate domain prototype contrastive adaptation for spartina alterniflora segmentation using multitemporal remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5401314. [Google Scholar] [CrossRef]
  2. Zhao, B.; Zhang, M.; Wang, J.; Song, X.; Gui, Y.; Zhang, Y.; Li, W. Multiple attention network for spartina alterniflora segmentation using multitemporal remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5402915. [Google Scholar] [CrossRef]
  3. Sun, Z.; Liu, H.; Chen, N.; Yang, H.; Li, J.; Liu, C.; Pei, X. Spectral Channel Mixing Transformer with Spectral-Center Attention for Hyperspectral Image Classification. Remote Sens. 2025, 17, 3100. [Google Scholar] [CrossRef]
  4. Yang, Z.; Li, H.; Wei, F.; Ma, J.; Zhang, T. WSC-Net: A Wavelet-Enhanced Swin Transformer with Cross-Domain Attention for Hyperspectral Image Classification. Remote Sens. 2025, 17, 3216. [Google Scholar] [CrossRef]
  5. Mei, Y.; Fan, J.; Fan, X.; Li, Q. CSTC: Visual Transformer Network with Multimodal Dual Fusion for Hyperspectral and LiDAR Image Classification. Remote Sens. 2025, 17, 3158. [Google Scholar] [CrossRef]
  6. Fu, D.; Zeng, Y.; Zhao, J. DFAST: A Differential-Frequency Attention-Based Band Selection Transformer for Hyperspectral Image Classification. Remote Sens. 2025, 17, 2488. [Google Scholar] [CrossRef]
  7. Han, R.; Cheng, S.; Li, S.; Liu, T. Prompt-Gated Transformer with Spatial–Spectral Enhancement for Hyperspectral Image Classification. Remote Sens. 2025, 17, 2705. [Google Scholar] [CrossRef]
  8. Zhao, B.; Li, Z.; Jiang, X.; Zhang, M.; Li, W.; Zhang, Y.; Song, X. Contrastive Adaptive Segmentation Method for Spartina Alterniflora Based on Intermediate Domain Prototypes. In Proceedings of the 2024 7th International Conference on Image and Graphics Processing, Beijing, China, 19–21 January 2024; pp. 85–91. [Google Scholar]
  9. Wang, X.; Fan, Y. Multiscale densely connected attention network for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1617–1628. [Google Scholar] [CrossRef]
  10. Cui, Y.; Xia, J.; Wang, Z.; Gao, S.; Wang, L. Lightweight spectral–spatial attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5510114. [Google Scholar] [CrossRef]
  11. Ma, X.; Kang, X.; Qin, H.; Wang, W.; Ren, G.; Wang, J.; Liu, B. A Lightweight Hybrid Convolutional Neural Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5513714. [Google Scholar] [CrossRef]
  12. Wang, D.; Hu, M.; Jin, Y.; Miao, Y.; Yang, J.; Xu, Y.; Qin, X.; Ma, J.; Sun, L.; Li, C.; et al. Hypersigma: Hyperspectral intelligence comprehension foundation model. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 6427–6444. [Google Scholar] [CrossRef]
  13. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5518615. [Google Scholar] [CrossRef]
  14. Zhao, Z.; Xu, X.; Li, S.; Plaza, A. Hyperspectral image classification using groupwise separable convolutional vision transformer network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5511817. [Google Scholar] [CrossRef]
  15. Gu, A. Modeling Sequences with Structured State Spaces; Stanford University: Stanford, CA, USA, 2023. [Google Scholar]
  16. Gu, A.; Goel, K.; Ré, C. Efficiently modeling long sequences with structured state spaces. arXiv 2021, arXiv:2111.00396. [Google Scholar]
  17. Gu, A.; Johnson, I.; Goel, K.; Saab, K.; Dao, T.; Rudra, A.; Ré, C. Combining recurrent, convolutional, and continuous-time models with linear state space layers. Adv. Neural Inf. Process. Syst. 2021, 34, 572–585. [Google Scholar]
  18. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. Vmamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063. [Google Scholar]
  19. Dao, T.; Gu, A. Transformers are ssms: Generalized models and efficient algorithms through structured state space duality. arXiv 2024, arXiv:2405.21060. [Google Scholar] [CrossRef]
  20. Li, Y.; Luo, Y.; Zhang, L.; Wang, Z.; Du, B. MambaHSI: Spatial–Spectral Mamba for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5524216. [Google Scholar] [CrossRef]
  21. Cui, H.; Hayama, T. Hgmamba: Enhancing 3d human pose estimation with a hypergcn-mamba network. arXiv 2025, arXiv:2504.06638. [Google Scholar]
  22. He, Y.; Tu, B.; Jiang, P.; Liu, B.; Li, J.; Plaza, A. IGroupSS-Mamba: Interval group spatial-spectral mamba for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5538817. [Google Scholar] [CrossRef]
  23. Zhao, B.; Zhang, M.; Li, W.; Gao, Y.; Wang, J. Domain Information Mining and State-Guided Adaptation Network for Multispectral Image Segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 19849–19863. [Google Scholar] [CrossRef]
  24. Zhang, J.; Sun, M.; Chang, S. Spatial and Spectral Structure-Aware Mamba Network for Hyperspectral Image Classification. Remote Sens. 2025, 17, 2489. [Google Scholar] [CrossRef]
  25. Derosa, G.; Sahebkar, A.; Maffioli, P. The role of various peroxisome proliferator-activated receptors and their ligands in clinical practice. J. Cell. Physiol. 2018, 233, 153–161. [Google Scholar] [CrossRef]
  26. Debes, C.; Merentitis, A.; Heremans, R.; Hahn, J.; Frangiadakis, N.; Van Kasteren, T.; Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S.; et al. Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2405–2418. [Google Scholar] [CrossRef]
  27. Baumgardner, M.F.; Biehl, L.L.; Landgrebe, D.A. 220 band aviris hyperspectral image data set: June 12, 1992 indian pine test site 3. Purdue Univ. Res. Repos. 2015, 10, R7RX991C. [Google Scholar]
  28. Zhong, Y.; Hu, X.; Luo, C.; Wang, X.; Zhao, J.; Zhang, L. WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with CRF. Remote Sens. Environ. 2020, 250, 112012. [Google Scholar] [CrossRef]
  29. Zhong, Y.; Wang, X.; Xu, Y.; Wang, S.; Jia, T.; Hu, X.; Zhao, J.; Wei, L.; Zhang, L. Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications. IEEE Geosci. Remote Sens. Mag. 2018, 6, 46–62. [Google Scholar] [CrossRef]
  30. Roy, S.K.; Jamali, A.; Chanussot, J.; Ghamisi, P.; Ghaderpour, E.; Shahabi, H. SimPoolFormer: A two-stream vision transformer for hyperspectral image classification. Remote Sens. Appl. Soc. Environ. 2025, 37, 101478. [Google Scholar] [CrossRef]
  31. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–Spatial Feature Tokenization Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522214. [Google Scholar] [CrossRef]
Figure 1. The original image and labeled ground truth of Pavia University dataset.
Figure 1. The original image and labeled ground truth of Pavia University dataset.
Remotesensing 17 03676 g001
Figure 2. The original image and labeled ground truth of Houston dataset.
Figure 2. The original image and labeled ground truth of Houston dataset.
Remotesensing 17 03676 g002
Figure 3. The original image and labeled ground truth of Indian Pines dataset.
Figure 3. The original image and labeled ground truth of Indian Pines dataset.
Remotesensing 17 03676 g003
Figure 4. The original image and labeled ground truth of WHU-Hi-HanChuan dataset.
Figure 4. The original image and labeled ground truth of WHU-Hi-HanChuan dataset.
Remotesensing 17 03676 g004
Figure 5. The original image and labeled ground truth of WHU-Hi-LongKou dataset.
Figure 5. The original image and labeled ground truth of WHU-Hi-LongKou dataset.
Remotesensing 17 03676 g005
Figure 6. Overview of the BiMambaHSI framework.
Figure 6. Overview of the BiMambaHSI framework.
Remotesensing 17 03676 g006
Figure 7. Overview of the spatial—spectral Mamba block (SSMB).
Figure 7. Overview of the spatial—spectral Mamba block (SSMB).
Remotesensing 17 03676 g007
Figure 8. The classification maps of all methods in the Pavia University dataset: (a) image, (b) ground Truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Figure 8. The classification maps of all methods in the Pavia University dataset: (a) image, (b) ground Truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Remotesensing 17 03676 g008
Figure 9. The classification maps of all methods in the Houston dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Figure 9. The classification maps of all methods in the Houston dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Remotesensing 17 03676 g009
Figure 10. The classification maps of all methods in the Indian Pines dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Figure 10. The classification maps of all methods in the Indian Pines dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Remotesensing 17 03676 g010
Figure 11. The classification maps of all methods in the WHU-Hi-HanChuan dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Figure 11. The classification maps of all methods in the WHU-Hi-HanChuan dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Remotesensing 17 03676 g011
Figure 12. The classification maps of all methods in the WHU-Hi-LongKou dataset: (a) image, (b) ground truth, (c) GSCNet, (d) HyperSIGMA, (e) Lite-HCNNet, (f) MSDAN, (g) SimpoolFormer, (h) SpectralFormer, (i) SSFTT, (j) MambaHSI, (k) HGMamba, and (l) BiMambaHSI.
Figure 13. Parameter sensitivity analysis of BiMambaHSI across five datasets: (a) patch size, (b) spectral group number (G), and (c) training sample ratio.
Table 1. Number of training and testing samples for the five datasets.

Pavia University
No. | Name | Train | Test
1 | Asphalt | 66 | 6565
2 | Meadows | 186 | 18,463
3 | Gravel | 21 | 2078
4 | Trees | 31 | 3033
5 | Painted metal sheets | 13 | 1332
6 | Bare Soil | 50 | 4979
7 | Bitumen | 13 | 1317
8 | Self-Blocking Bricks | 37 | 3645
9 | Shadows | 9 | 938

Houston
No. | Name | Train | Test
1 | Grass healthy | 63 | 1188
2 | Grass stressed | 63 | 1191
3 | Grass synthetic | 35 | 662
4 | Tree | 62 | 1182
5 | Soil | 62 | 1180
6 | Water | 16 | 309
7 | Residential | 63 | 1205
8 | Commercial | 62 | 1182
9 | Road | 63 | 1189
10 | Highway | 61 | 1166
11 | Railway | 62 | 1173
12 | Parking Lot 1 | 62 | 1171
13 | Parking Lot 2 | 23 | 446
14 | Tennis Court | 21 | 407
15 | Running Track | 33 | 627

Indian Pines
No. | Name | Train | Test
1 | Alfalfa | 5 | 41
2 | Corn-notill | 143 | 1285
3 | Corn-mintill | 83 | 747
4 | Corn | 24 | 213
5 | Grass-pasture | 48 | 435
6 | Grass–trees | 73 | 657
7 | Grass–pasture–mowed | 3 | 25
8 | Hay-windrowed | 48 | 430
9 | Oats | 2 | 18
10 | Soybean-notill | 97 | 875
11 | Soybean-mintill | 246 | 2209
12 | Soybean-clean | 59 | 534
13 | Wheat | 21 | 184
14 | Woods | 127 | 1138
15 | Buildings–Grass–Trees–Drives | 39 | 347
16 | Stone–Steel–Towers | 10 | 83

WHU-Hi-HanChuan
No. | Name | Train | Test
1 | Paddy field | 2237 | 42,498
2 | Irrigated field | 1138 | 21,615
3 | Dry cropland | 514 | 9773
4 | Garden plot | 268 | 5085
5 | Arbor forest | 60 | 1140
6 | Shrub land | 227 | 4306
7 | Natural meadow | 295 | 5608
8 | Artificial meadow | 899 | 17,079
9 | Industrial land | 473 | 8996
10 | Urban residential | 526 | 9990
11 | Rural residential | 846 | 16,065
12 | Traffic land | 184 | 3495
13 | River | 456 | 8660
14 | Lake | 928 | 17,632
15 | Pond | 57 | 1079
16 | Bare land | 3770 | 71,631

WHU-Hi-LongKou
No. | Name | Train | Test
1 | Corn | 34 | 34,477
2 | Cotton | 8 | 8366
3 | Sesame | 3 | 3028
4 | Broad-leaf soybean | 63 | 63,149
5 | Narrow-leaf soybean | 4 | 4147
6 | Rice | 11 | 11,843
7 | Water | 67 | 66,989
8 | Roads and houses | 7 | 7117
9 | Mixed weed | 5 | 5224
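Table 1 lists per-class training and testing counts obtained by randomly splitting the labeled pixels of each scene. For readers reproducing such a split, the snippet below is a minimal sketch of stratified per-class random sampling; the function name `split_per_class`, the ratio-based rule, and the seed handling are illustrative assumptions and may differ from the exact protocol used in the paper.

```python
# Minimal sketch of a stratified per-class train/test split over labeled pixels,
# as is typical for tables like Table 1. The ratio-based rule and seed handling
# are assumptions for illustration, not the paper's exact protocol.
import numpy as np


def split_per_class(labels: np.ndarray, ratio: float = 0.01, seed: int = 0):
    """labels: (H, W) array with 0 = unlabeled, 1..C = class ids.
    Returns dicts mapping class id -> flat pixel indices for train and test."""
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    train, test = {}, {}
    for c in np.unique(flat):
        if c == 0:                                   # skip unlabeled pixels
            continue
        idx = np.flatnonzero(flat == c)
        rng.shuffle(idx)
        n_train = max(1, int(round(ratio * idx.size)))
        train[int(c)], test[int(c)] = idx[:n_train], idx[n_train:]
    return train, test


if __name__ == "__main__":
    toy = np.random.default_rng(0).integers(0, 4, size=(50, 50))  # toy label map
    tr, te = split_per_class(toy, ratio=0.05)
    print({c: (len(tr[c]), len(te[c])) for c in tr})
```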
Table 2. The ablation performance on different datasets.

Dataset | Base | JGM (Sum) | JGM (Gate) | SSMB (Spatial) | SSMB (Spectral) | SSMB (Spatial + Spectral) | BiMambaHSI
Pavia University | 95.29 ± 0.71 | 95.36 ± 0.23 | 95.68 ± 0.68 | 95.87 ± 0.36 | 95.76 ± 0.24 | 96.14 ± 0.32 | 97.90 ± 0.11
Houston | 96.75 ± 0.96 | 96.97 ± 0.57 | 97.08 ± 0.93 | 97.13 ± 0.26 | 97.30 ± 0.41 | 97.47 ± 1.04 | 97.92 ± 0.17
Indian Pines | 97.83 ± 0.21 | 97.89 ± 0.62 | 98.01 ± 1.37 | 98.19 ± 0.86 | 98.17 ± 0.56 | 98.56 ± 1.10 | 99.02 ± 0.11
WHU-Hi-HanChuan | 95.01 ± 0.32 | 95.37 ± 0.54 | 95.99 ± 0.21 | 96.02 ± 0.47 | 95.99 ± 0.70 | 96.52 ± 0.63 | 97.92 ± 0.27
WHU-Hi-LongKou | 96.86 ± 0.45 | 96.96 ± 0.83 | 97.24 ± 1.08 | 97.76 ± 1.06 | 97.94 ± 0.79 | 98.27 ± 1.61 | 99.54 ± 0.03
Table 3. The classification performance of different methods for the Pavia University dataset.

No. | GSCNet [14] | HyperSIGMA [12] | Lite-HCNNet [11] | MSDAN [9] | SimpoolFormer [30] | SpectralFormer [13] | SSFTT [31] | MambaHSI [20] | HGMamba [21] | Ours
1 | 97.65 ± 0.65 | 97.08 ± 1.08 | 98.26 ± 0.77 | 95.01 ± 1.56 | 91.41 ± 2.53 | 94.25 ± 1.17 | 96.00 ± 1.25 | 96.37 ± 1.08 | 94.71 ± 0.81 | 99.07 ± 0.44
2 | 99.73 ± 0.24 | 99.39 ± 0.32 | 99.79 ± 0.13 | 99.38 ± 0.31 | 98.66 ± 0.46 | 99.27 ± 0.25 | 98.88 ± 0.68 | 99.43 ± 0.73 | 99.79 ± 0.54 | 99.85 ± 0.09
3 | 90.11 ± 9.80 | 75.44 ± 5.63 | 88.79 ± 1.99 | 87.07 ± 2.25 | 77.26 ± 5.61 | 79.00 ± 3.36 | 81.75 ± 3.56 | 85.23 ± 2.09 | 76.80 ± 2.03 | 83.92 ± 1.73
4 | 95.26 ± 1.64 | 89.49 ± 2.19 | 93.12 ± 1.64 | 90.85 ± 2.03 | 82.02 ± 3.62 | 87.02 ± 2.90 | 93.12 ± 2.25 | 91.30 ± 1.67 | 86.88 ± 1.99 | 93.72 ± 0.96
5 | 97.78 ± 2.16 | 99.70 ± 0.23 | 99.35 ± 0.27 | 98.32 ± 1.29 | 99.25 ± 1.02 | 99.43 ± 0.72 | 99.98 ± 0.04 | 97.90 ± 0.33 | 99.17 ± 0.17 | 99.85 ± 0.26
6 | 91.70 ± 6.52 | 93.08 ± 4.78 | 98.58 ± 0.81 | 98.15 ± 1.75 | 87.99 ± 1.19 | 97.25 ± 0.51 | 92.90 ± 2.83 | 94.98 ± 1.54 | 94.32 ± 1.06 | 99.76 ± 0.17
7 | 84.59 ± 3.19 | 75.28 ± 13.38 | 97.71 ± 2.00 | 98.40 ± 0.82 | 88.75 ± 3.68 | 83.75 ± 4.66 | 89.84 ± 4.23 | 96.13 ± 2.69 | 88.08 ± 3.18 | 95.78 ± 1.50
8 | 95.24 ± 3.75 | 77.88 ± 6.17 | 91.88 ± 1.56 | 89.75 ± 4.61 | 78.17 ± 6.88 | 86.61 ± 4.17 | 88.55 ± 3.84 | 86.36 ± 2.56 | 90.04 ± 1.84 | 96.32 ± 1.61
9 | 97.78 ± 1.20 | 93.32 ± 4.15 | 99.89 ± 0.11 | 89.01 ± 5.22 | 89.37 ± 7.86 | 97.29 ± 1.64 | 99.34 ± 0.83 | 95.84 ± 3.96 | 97.23 ± 4.11 | 92.27 ± 4.23
OA (%) | 96.7 ± 0.79 | 93.66 ± 0.78 | 97.64 ± 0.29 | 96.22 ± 0.61 | 91.78 ± 0.47 | 94.77 ± 0.30 | 95.35 ± 0.55 | 95.8 ± 0.20 | 95.03 ± 0.17 | 97.90 ± 0.11
AA (%) | 94.42 ± 1.30 | 88.85 ± 1.51 | 96.38 ± 0.45 | 93.9 ± 1.29 | 88.1 ± 0.57 | 91.54 ± 0.60 | 93.37 ± 0.52 | 93.73 ± 0.22 | 91.89 ± 0.20 | 95.62 ± 0.41
Kappa | 95.61 ± 1.07 | 91.54 ± 1.04 | 96.86 ± 0.38 | 94.99 ± 0.82 | 89.01 ± 0.62 | 93.05 ± 0.40 | 93.82 ± 0.73 | 94.41 ± 0.46 | 93.36 ± 0.62 | 97.22 ± 0.15
Table 4. The classification performance of different methods for the Houston dataset.

No. | GSCNet [14] | HyperSIGMA [12] | Lite-HCNNet [11] | MSDAN [9] | SimpoolFormer [30] | SpectralFormer [13] | SSFTT [31] | MambaHSI [20] | HGMamba [21] | Ours
1 | 89.56 ± 6.50 | 95.32 ± 1.91 | 91.65 ± 2.33 | 96.94 ± 0.75 | 95.77 ± 1.49 | 97.84 ± 0.37 | 98.10 ± 1.28 | 87.54 ± 1.84 | 95.29 ± 1.79 | 98.62 ± 1.01
2 | 94.98 ± 4.73 | 96.47 ± 1.30 | 98.55 ± 0.34 | 98.66 ± 0.80 | 98.54 ± 1.18 | 98.20 ± 0.33 | 99.13 ± 0.15 | 99.92 ± 0.31 | 99.24 ± 0.29 | 99.06 ± 0.20
3 | 99.76 ± 0.46 | 99.79 ± 0.13 | 99.86 ± 0.32 | 99.94 ± 0.08 | 100.0 ± 0.00 | 100.0 ± 0.00 | 99.94 ± 0.13 | 99.70 ± 0.61 | 99.70 ± 0.37 | 100.00 ± 0.00
4 | 90.59 ± 7.37 | 97.59 ± 1.36 | 98.83 ± 1.10 | 98.65 ± 0.62 | 98.21 ± 1.86 | 99.85 ± 0.20 | 99.81 ± 0.17 | 99.32 ± 1.03 | 97.12 ± 0.76 | 98.97 ± 1.36
5 | 99.36 ± 0.7 | 99.87 ± 0.21 | 98.65 ± 0.94 | 99.97 ± 0.08 | 99.85 ± 0.11 | 99.98 ± 0.04 | 99.83 ± 0.22 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.0 ± 0.00
6 | 88.48 ± 9.43 | 89.53 ± 5.04 | 84.66 ± 3.70 | 99.35 ± 0.91 | 94.04 ± 5.12 | 90.74 ± 0.88 | 96.31 ± 3.09 | 98.06 ± 2.19 | 100.00 ± 0.00 | 97.80 ± 2.72
7 | 91.55 ± 1.74 | 92.42 ± 1.80 | 91.75 ± 1.60 | 95.42 ± 1.61 | 91.60 ± 3.73 | 92.51 ± 3.17 | 94.84 ± 1.94 | 93.03 ± 1.09 | 93.28 ± 1.41 | 93.14 ± 0.71
8 | 84.35 ± 2.78 | 86.5 ± 1.76 | 73.69 ± 5.50 | 96.19 ± 0.98 | 92.78 ± 2.17 | 94.35 ± 1.21 | 93.00 ± 1.19 | 97.97 ± 1.22 | 95.94 ± 1.05 | 95.84 ± 0.81
9 | 87.67 ± 4.29 | 79.74 ± 1.75 | 81.07 ± 5.66 | 94.84 ± 1.72 | 90.60 ± 4.00 | 96.81 ± 1.88 | 95.11 ± 0.69 | 96.38 ± 2.43 | 96.97 ± 2.90 | 97.34 ± 0.50
10 | 92.13 ± 4.79 | 91.1 ± 1.38 | 94.01 ± 1.54 | 97.22 ± 1.04 | 98.08 ± 0.94 | 99.33 ± 0.52 | 95.75 ± 1.09 | 99.40 ± 1.17 | 95.54 ± 1.75 | 99.21 ± 0.76
11 | 88.53 ± 2.53 | 97.67 ± 1.22 | 89.30 ± 4.25 | 98.88 ± 0.92 | 94.10 ± 6.38 | 98.13 ± 0.87 | 97.87 ± 1.17 | 91.30 ± 3.02 | 88.32 ± 2.54 | 98.71 ± 0.70
12 | 89.5 ± 4.07 | 94.44 ± 3.34 | 83.75 ± 5.71 | 96.09 ± 0.82 | 95.30 ± 2.94 | 98.23 ± 0.72 | 98.36 ± 0.90 | 97.10 ± 0.09 | 97.27 ± 0.19 | 98.91 ± 0.48
13 | 81.03 ± 10.11 | 87.00 ± 4.73 | 86.29 ± 5.87 | 94.8 ± 4.31 | 52.42 ± 13.64 | 73.32 ± 4.35 | 93.41 ± 2.92 | 95.74 ± 2.77 | 91.26 ± 3.04 | 89.28 ± 2.32
14 | 97.79 ± 2.78 | 100.0 ± 0.00 | 98.54 ± 1.07 | 99.75 ± 0.17 | 99.80 ± 0.44 | 99.51 ± 0.46 | 99.31 ± 0.27 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.70 ± 0.20
15 | 98.95 ± 1.15 | 99.97 ± 0.07 | 99.54 ± 0.47 | 100.0 ± 0.00 | 99.23 ± 0.61 | 99.84 ± 0.28 | 99.42 ± 0.63 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.81 ± 0.26
OA (%) | 91.44 ± 0.24 | 93.64 ± 0.22 | 90.99 ± 0.67 | 97.56 ± 0.22 | 94.59 ± 0.51 | 96.88 ± 0.49 | 97.33 ± 0.39 | 96.65 ± 0.27 | 96.32 ± 0.53 | 97.92 ± 0.17
AA (%) | 91.62 ± 0.72 | 93.83 ± 0.24 | 91.34 ± 0.74 | 97.78 ± 0.31 | 93.35 ± 0.79 | 95.91 ± 0.49 | 97.35 ± 0.56 | 97.03 ± 0.21 | 96.66 ± 0.49 | 97.76 ± 0.30
Kappa | 90.74 ± 0.26 | 93.13 ± 0.24 | 90.26 ± 0.72 | 97.36 ± 0.24 | 94.15 ± 0.55 | 96.63 ± 0.53 | 97.11 ± 0.42 | 96.38 ± 0.50 | 96.02 ± 0.37 | 97.75 ± 0.19
Table 5. The classification performance of different methods for the Indian Pines dataset.

No. | GSCNet [14] | HyperSIGMA [12] | Lite-HCNNet [11] | MSDAN [9] | SimpoolFormer [30] | SpectralFormer [13] | SSFTT [31] | MambaHSI [20] | HGMamba [21] | Ours
1 | 99.46 ± 1.01 | 95.61 ± 4.36 | 99.46 ± 1.21 | 98.38 ± 2.42 | 90.81 ± 10.22 | 98.38 ± 1.48 | 89.52 ± 7.64 | 90.91 ± 2.37 | 97.73 ± 2.40 | 98.92 ± 1.48
2 | 94.98 ± 0.78 | 90.46 ± 1.58 | 96.49 ± 1.27 | 97.52 ± 0.76 | 92.81 ± 0.69 | 93.21 ± 1.19 | 97.48 ± 0.38 | 95.65 ± 0.91 | 95.50 ± 0.79 | 97.88 ± 0.87
3 | 97.59 ± 2.08 | 95.82 ± 1.86 | 99.49 ± 0.38 | 98.55 ± 0.57 | 95.24 ± 1.25 | 97.74 ± 0.74 | 97.09 ± 0.91 | 98.99 ± 0.26 | 97.97 ± 0.73 | 98.28 ± 0.46
4 | 98.32 ± 1.26 | 93.71 ± 1.87 | 98.42 ± 3.53 | 99.68 ± 0.47 | 86.84 ± 3.63 | 91.89 ± 1.60 | 97.16 ± 2.04 | 93.78 ± 2.15 | 95.56 ± 2.09 | 100.00 ± 0.00
5 | 96.79 ± 1.84 | 93.93 ± 1.18 | 98.91 ± 0.64 | 97.82 ± 1.09 | 96.79 ± 1.03 | 94.71 ± 2.20 | 95.23 ± 0.74 | 100.00 ± 0.00 | 100.00 ± 0.00 | 99.58 ± 0.51
6 | 98.77 ± 0.99 | 99.03 ± 0.32 | 99.35 ± 0.59 | 98.94 ± 0.52 | 99.21 ± 0.58 | 99.83 ± 0.17 | 97.05 ± 0.78 | 98.70 ± 0.36 | 97.26 ± 0.52 | 99.32 ± 0.40
7 | 100.00 ± 0.00 | 80.0 ± 18.55 | 94.54 ± 3.80 | 99.93 ± 0.15 | 81.82 ± 14.37 | 98.18 ± 2.49 | 88.46 ± 6.08 | 92.59 ± 3.10 | 70.37 ± 3.47 | 100.00 ± 0.00
8 | 99.27 ± 1.36 | 100.00 ± 0.00 | 99.69 ± 0.47 | 99.69 ± 0.7 | 99.63 ± 0.54 | 100.00 ± 0.00 | 99.91 ± 0.20 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00
9 | 78.75 ± 19.06 | 45.56 ± 9.13 | 87.5 ± 17.68 | 87.50 ± 4.42 | 86.25 ± 6.85 | 97.5 ± 3.42 | 90.00 ± 7.24 | 73.68 ± 4.92 | 36.84 ± 8.78 | 100.00 ± 0.00
10 | 96.38 ± 1.34 | 91.98 ± 2.42 | 98.15 ± 1.91 | 98.79 ± 1.2 | 93.22 ± 1.11 | 95.55 ± 1.16 | 93.89 ± 1.43 | 97.62 ± 0.58 | 98.70 ± 0.31 | 98.79 ± 0.68
11 | 98.61 ± 0.38 | 97.31 ± 0.92 | 99.09 ± 0.27 | 98.83 ± 0.66 | 98.56 ± 0.48 | 98.68 ± 0.17 | 97.77 ± 0.46 | 98.28 ± 0.40 | 97.51 ± 0.23 | 99.85 ± 0.12
12 | 97.31 ± 1.54 | 89.17 ± 1.94 | 97.26 ± 1.31 | 95.92 ± 1.43 | 92.55 ± 3.12 | 95.11 ± 1.44 | 96.01 ± 2.25 | 92.90 ± 2.79 | 90.59 ± 2.47 | 95.96 ± 1.68
13 | 98.78 ± 2.73 | 99.14 ± 0.3 | 99.51 ± 0.67 | 100.00 ± 0.0 | 99.63 ± 0.33 | 100.0 ± 0.00 | 98.09 ± 1.57 | 97.44 ± 1.09 | 96.92 ± 1.38 | 100.00 ± 0.00
14 | 99.76 ± 0.20 | 99.86 ± 0.12 | 99.92 ± 0.13 | 99.84 ± 0.11 | 99.23 ± 0.36 | 99.21 ± 0.34 | 99.26 ± 0.27 | 99.00 ± 0.03 | 98.50 ± 0.07 | 99.98 ± 0.04
15 | 97.86 ± 0.96 | 97.46 ± 1.25 | 99.16 ± 0.85 | 99.48 ± 0.59 | 96.83 ± 1.92 | 97.35 ± 1.69 | 90.48 ± 2.07 | 93.19 ± 2.84 | 94.01 ± 2.57 | 99.68 ± 0.23
16 | 98.92 ± 1.13 | 79.29 ± 10.47 | 98.11 ± 1.54 | 96.49 ± 3.39 | 97.30 ± 1.91 | 94.86 ± 4.21 | 80.23 ± 2.10 | 64.77 ± 2.90 | 85.23 ± 4.03 | 91.62 ± 3.50
OA (%) | 97.77 ± 0.19 | 95.3 ± 0.79 | 98.66 ± 0.44 | 98.62 ± 0.24 | 96.30 ± 0.41 | 97.14 ± 0.27 | 96.80 ± 0.21 | 97.17 ± 0.27 | 96.84 ± 0.44 | 99.02 ± 0.11
AA (%) | 96.97 ± 1.36 | 90.52 ± 1.98 | 97.82 ± 0.94 | 97.92 ± 0.45 | 94.17 ± 1.59 | 97.01 ± 0.32 | 94.23 ± 1.28 | 92.97 ± 0.74 | 90.79 ± 0.98 | 98.74 ± 0.32
Kappa | 97.46 ± 0.22 | 94.64 ± 0.9 | 98.47 ± 0.51 | 98.42 ± 0.28 | 95.78 ± 0.46 | 96.74 ± 0.31 | 96.35 ± 0.23 | 96.76 ± 0.42 | 96.38 ± 0.37 | 98.88 ± 0.13
Table 6. The classification performance of different methods for the WHU-Hi-HanChuan dataset.

No. | GSCNet [14] | HyperSIGMA [12] | Lite-HCNNet [11] | MSDAN [9] | SimpoolFormer [30] | SpectralFormer [13] | SSFTT [31] | MambaHSI [20] | HGMamba [21] | Ours
1 | 97.13 ± 1.53 | 97.62 ± 0.49 | 97.62 ± 0.48 | 97.93 ± 0.23 | 98.85 ± 1.23 | 98.32 ± 0.52 | 97.21 ± 1.05 | 97.64 ± 0.94 | 97.74 ± 1.03 | 99.22 ± 0.26
2 | 92.02 ± 2.91 | 95.27 ± 4.44 | 93.56 ± 1.16 | 94.96 ± 1.31 | 95.47 ± 3.95 | 94.95 ± 0.95 | 93.21 ± 1.55 | 90.90 ± 2.87 | 92.14 ± 1.33 | 96.97 ± 0.80
3 | 93.91 ± 4.79 | 95.58 ± 2.02 | 94.30 ± 3.36 | 95.78 ± 1.38 | 96.01 ± 3.30 | 97.00 ± 1.09 | 91.96 ± 2.30 | 95.83 ± 1.89 | 96.07 ± 1.02 | 97.91 ± 0.68
4 | 97.24 ± 1.38 | 99.33 ± 0.36 | 97.81 ± 0.43 | 98.79 ± 0.62 | 99.48 ± 1.45 | 98.31 ± 0.49 | 98.12 ± 0.74 | 98.84 ± 0.95 | 98.00 ± 1.22 | 99.24 ± 0.63
5 | 67.58 ± 14.66 | 66.75 ± 4.76 | 85.74 ± 5.67 | 76.0 ± 14.93 | 85.75 ± 9.58 | 90.28 ± 3.53 | 75.19 ± 8.21 | 91.14 ± 2.37 | 68.21 ± 4.98 | 91.74 ± 3.45
6 | 65.97 ± 5.30 | 67.91 ± 2.49 | 63.76 ± 7.02 | 73.56 ± 4.11 | 78.36 ± 10.43 | 68.04 ± 3.94 | 68.24 ± 6.49 | 72.16 ± 2.01 | 70.96 ± 2.09 | 88.33 ± 1.34
7 | 81.67 ± 12.06 | 89.16 ± 0.73 | 86.91 ± 7.36 | 87.75 ± 7.64 | 92.48 ± 3.60 | 92.66 ± 3.35 | 88.51 ± 1.33 | 91.69 ± 1.42 | 92.49 ± 1.08 | 96.56 ± 1.11
8 | 82.09 ± 4.25 | 90.43 ± 1.57 | 88.67 ± 3.02 | 89.93 ± 1.02 | 91.93 ± 4.50 | 89.74 ± 1.69 | 90.38 ± 2.80 | 89.73 ± 3.97 | 86.00 ± 2.01 | 96.16 ± 1.05
9 | 87.47 ± 2.96 | 89.12 ± 1.09 | 88.63 ± 3.58 | 88.86 ± 3.46 | 92.76 ± 3.26 | 90.71 ± 1.60 | 87.00 ± 2.04 | 89.34 ± 3.46 | 90.65 ± 2.70 | 96.51 ± 1.16
10 | 96.74 ± 2.68 | 87.64 ± 2.54 | 97.01 ± 1.77 | 98.12 ± 1.05 | 99.25 ± 1.42 | 98.64 ± 0.57 | 97.7 ± 0.55 | 98.49 ± 0.81 | 98.29 ± 0.49 | 99.15 ± 0.55
11 | 94.36 ± 2.99 | 97.09 ± 0.55 | 96.03 ± 1.50 | 96.17 ± 1.98 | 97.73 ± 1.90 | 95.35 ± 1.52 | 95.79 ± 2.56 | 98.07 ± 0.88 | 95.36 ± 3.07 | 98.92 ± 0.26
12 | 71.34 ± 19.12 | 94.31 ± 2.54 | 76.79 ± 8.41 | 82.59 ± 8.90 | 86.42 ± 6.44 | 89.98 ± 3.74 | 80.29 ± 5.77 | 88.96 ± 3.02 | 80.02 ± 4.91 | 93.19 ± 2.46
13 | 79.78 ± 10.53 | 88.41 ± 2.30 | 75.75 ± 6.65 | 77.44 ± 5.76 | 80.86 ± 6.76 | 78.47 ± 3.59 | 77.84 ± 0.79 | 83.72 ± 2.10 | 79.95 ± 2.38 | 89.74 ± 2.40
14 | 89.42 ± 4.53 | 97.59 ± 0.30 | 90.96 ± 1.50 | 90.80 ± 3.01 | 96.40 ± 2.98 | 94.28 ± 0.84 | 91.39 ± 0.76 | 96.74 ± 0.98 | 93.06 ± 1.23 | 96.65 ± 0.77
15 | 81.91 ± 7.29 | 96.07 ± 1.40 | 72.60 ± 10.81 | 84.06 ± 9.00 | 71.66 ± 22.84 | 68.12 ± 5.94 | 82.48 ± 4.21 | 84.06 ± 3.02 | 80.94 ± 3.10 | 95.54 ± 1.01
16 | 99.49 ± 0.53 | 99.59 ± 0.12 | 99.49 ± 0.15 | 99.53 ± 0.10 | 99.63 ± 0.08 | 99.72 ± 0.13 | 99.37 ± 0.22 | 99.48 ± 0.03 | 99.52 ± 0.04 | 99.88 ± 0.06
OA (%) | 93.00 ± 0.64 | 97.78 ± 0.10 | 94.04 ± 0.24 | 94.78 ± 0.15 | 96.53 ± 0.64 | 95.37 ± 0.12 | 94.15 ± 0.17 | 95.39 ± 0.16 | 94.47 ± 0.19 | 97.92 ± 0.27
AA (%) | 86.14 ± 2.20 | 95.66 ± 0.47 | 87.85 ± 0.83 | 89.52 ± 0.75 | 91.50 ± 0.71 | 90.29 ± 0.55 | 88.42 ± 0.86 | 91.67 ± 0.66 | 88.71 ± 0.65 | 95.98 ± 0.49
Kappa | 91.80 ± 0.75 | 97.40 ± 0.12 | 93.02 ± 0.28 | 93.89 ± 0.18 | 95.76 ± 0.75 | 94.58 ± 0.14 | 93.15 ± 0.21 | 94.60 ± 0.41 | 93.52 ± 0.94 | 97.56 ± 0.32
Table 7. The classification performance of different methods for the WHU-Hi-LongKou dataset.

No. | GSCNet [14] | HyperSIGMA [12] | Lite-HCNNet [11] | MSDAN [9] | SimpoolFormer [30] | SpectralFormer [13] | SSFTT [31] | MambaHSI [20] | HGMamba [21] | Ours
1 | 99.86 ± 0.08 | 99.84 ± 0.04 | 99.97 ± 0.02 | 98.64 ± 0.57 | 99.66 ± 0.26 | 99.67 ± 0.13 | 99.76 ± 0.09 | 99.74 ± 0.10 | 99.42 ± 0.08 | 99.90 ± 0.11
2 | 99.08 ± 0.83 | 93.83 ± 4.44 | 96.68 ± 0.77 | 95.46 ± 1.87 | 83.04 ± 5.52 | 91.31 ± 4.96 | 97.90 ± 1.11 | 93.29 ± 1.43 | 91.67 ± 2.37 | 99.63 ± 0.12
3 | 95.86 ± 2.35 | 84.55 ± 7.00 | 95.66 ± 1.52 | 93.33 ± 5.06 | 91.51 ± 4.97 | 92.58 ± 1.87 | 94.38 ± 2.87 | 91.34 ± 2.09 | 87.27 ± 2.41 | 99.19 ± 0.68
4 | 99.39 ± 0.21 | 99.27 ± 0.36 | 99.40 ± 0.18 | 99.23 ± 0.24 | 97.84 ± 0.64 | 98.68 ± 0.32 | 99.20 ± 0.18 | 98.26 ± 0.21 | 98.67 ± 0.14 | 99.63 ± 0.12
5 | 88.90 ± 5.28 | 66.47 ± 4.76 | 81.28 ± 7.96 | 63.55 ± 8.08 | 72.61 ± 9.66 | 62.06 ± 8.81 | 70.08 ± 4.86 | 68.96 ± 4.07 | 69.37 ± 4.89 | 96.85 ± 0.85
6 | 99.57 ± 0.39 | 99.50 ± 0.38 | 99.33 ± 0.51 | 97.06 ± 2.28 | 98.77 ± 0.86 | 98.03 ± 1.10 | 99.71 ± 0.19 | 99.40 ± 0.04 | 99.06 ± 0.18 | 99.86 ± 0.11
7 | 99.75 ± 0.15 | 99.92 ± 0.05 | 99.74 ± 0.21 | 99.49 ± 0.42 | 99.74 ± 0.18 | 99.92 ± 0.03 | 99.60 ± 0.21 | 99.87 ± 0.06 | 99.67 ± 0.10 | 99.92 ± 0.03
8 | 87.19 ± 1.93 | 84.98 ± 7.18 | 91.06 ± 2.46 | 83.55 ± 7.01 | 78.64 ± 3.05 | 88.49 ± 2.68 | 91.85 ± 2.31 | 78.17 ± 1.36 | 80.06 ± 2.15 | 97.08 ± 0.70
9 | 90.34 ± 2.75 | 87.33 ± 1.61 | 86.60 ± 5.97 | 85.8 ± 3.70 | 77.68 ± 9.01 | 74.46 ± 1.31 | 90.14 ± 1.63 | 85.09 ± 2.38 | 83.17 ± 1.77 | 96.14 ± 0.79
OA (%) | 98.66 ± 0.07 | 97.69 ± 0.30 | 98.45 ± 0.18 | 97.24 ± 0.22 | 96.43 ± 0.37 | 97.02 ± 0.24 | 98.25 ± 0.12 | 97.17 ± 0.09 | 97.05 ± 0.17 | 99.54 ± 0.03
AA (%) | 95.55 ± 0.74 | 90.63 ± 1.67 | 94.41 ± 0.93 | 90.68 ± 0.45 | 88.83 ± 1.09 | 89.22 ± 0.76 | 93.62 ± 0.55 | 90.46 ± 1.01 | 89.82 ± 0.87 | 98.69 ± 0.23
Kappa | 98.24 ± 0.09 | 96.95 ± 0.40 | 97.96 ± 0.23 | 96.36 ± 0.29 | 95.30 ± 0.48 | 96.07 ± 0.32 | 97.70 ± 0.15 | 96.27 ± 0.22 | 96.11 ± 0.28 | 99.40 ± 0.04
Table 8. Complexity and efficiency comparison of different methods for five datasets.

Dataset | Metric | GSCNet [14] | HyperSIGMA [12] | MSDAN [9] | SpectralFormer [13] | SSFTT [31] | MambaHSI [20] | HGMamba [21] | Ours
Pavia University | Params (M) | 0.63 | 182.76 | 3.26 | 0.05 | 0.07 | 0.12 | 0.11 | 0.42
Pavia University | FPS (sample/s) | 5647.49 | 240.46 | 517.49 | 1332.14 | 692.36 | 775.89 | 569.49 | 1125.27
Pavia University | Training time (s) | 0.12 | 1.78 | 0.83 | 0.32 | 0.62 | 0.55 | 0.75 | 0.38
Houston | Params (M) | 1.16 | 182.81 | 3.34 | 0.05 | 0.09 | 0.12 | 0.12 | 0.42
Houston | FPS (sample/s) | 1307.44 | 256.32 | 530.21 | 1811.49 | 1308.43 | 1196.3 | 896.07 | 1645.43
Houston | Training time (s) | 0.79 | 2.93 | 1.42 | 0.41 | 0.57 | 0.63 | 0.84 | 0.46
Indian Pines | Params (M) | 2.33 | 182.89 | 3.34 | 0.06 | 0.09 | 0.12 | 0.12 | 0.42
Indian Pines | FPS (sample/s) | 957.87 | 226.41 | 139.38 | 1947.28 | 1596.33 | 1080.45 | 820.63 | 1306.04
Indian Pines | Training time (s) | 1.04 | 4.52 | 7.35 | 0.53 | 0.64 | 0.95 | 1.25 | 0.78
WHU-Hi-HanChuan | Params (M) | 0.13 | 182.77 | 3.34 | 0.05 | 0.09 | 0.12 | 0.12 | 0.43
WHU-Hi-HanChuan | FPS (sample/s) | 207.99 | 176.12 | 87.58 | 1093.11 | 619.49 | 560.53 | 429.07 | 1163.31
WHU-Hi-HanChuan | Training time (s) | 1.24 | 1.16 | 0.89 | 0.24 | 0.41 | 0.46 | 0.6 | 0.21
WHU-Hi-LongKou | Params (M) | 0.13 | 182.8 | 3.34 | 0.05 | 0.09 | 0.12 | 0.12 | 0.43
WHU-Hi-LongKou | FPS (sample/s) | 374.33 | 190.73 | 69.66 | 767.57 | 527.05 | 479.59 | 368.13 | 862.37
WHU-Hi-LongKou | Training time (s) | 0.54 | 1.35 | 0.55 | 0.27 | 0.49 | 0.43 | 0.55 | 0.37
