CF-Mamba: A Dual-Path Collaborative Method for Hyperspectral Image Classification

Wang, Yapeng; Cao, Guo; Shi, Boshan; Zhang, Youqiang

doi:10.3390/rs18071063


¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

² Jiangsu Key Laboratory of Broadband Wireless Communication and Internet of Things, School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(7), 1063; https://doi.org/10.3390/rs18071063

Submission received: 1 March 2026 / Revised: 28 March 2026 / Accepted: 31 March 2026 / Published: 2 April 2026


Highlights

What are the main findings?

A continuous–discrete collaborative framework, CF-Mamba, is proposed to handle hyperspectral image classification tasks.
Experiments on four benchmark datasets show that the proposed method achieves competitive accuracy while maintaining a lower computational burden than typical Transformer-based models.

What are the implications of the main findings?

The combination of multi-view adaptive routing and interval sampling strategies helps balance spatial–spectral feature extraction with redundancy reduction.
The designed Confluence Gating Unit (CGU) demonstrates that using continuous context and discrete details to cross-modulate each other can effectively alleviate representation discrepancies during fusion.

Abstract

Hyperspectral image (HSI) classification is a core task in remote sensing data interpretation. Although recently introduced state space models (SSMs), such as Mamba, have demonstrated promising performance in hyperspectral analysis due to their linear computational complexity and strong long-sequence modeling capability, existing single-stream scanning mechanisms struggle to effectively balance the intrinsic spectral continuity dependency and the high-dimensional redundancy inherent in HSI data. Moreover, they often suffer from representation discrepancies when fusing features from heterogeneous representation spaces. To address these challenges, we propose a continuous–discrete collaborative framework, termed Confluence Mamba (CF-Mamba). Specifically, the continuous modeling path (AHSE) introduces a multi-view adaptive routing mechanism to accurately capture anisotropic spectral–spatial continuous evolution patterns. Simultaneously, the discrete interaction path (IISE) employs interval sampling and channel shuffling strategies to efficiently decouple high-dimensional redundancy while maintaining fine-grained feature interactions. Furthermore, the confluence gating unit (CGU) leverages a bidirectional cross-modulation mechanism to constrain discrete feature distributions using continuous contextual information, effectively alleviating representation discrepancies during multi-scale feature fusion. Extensive experiments conducted on four benchmark datasets, namely, Indian Pines, Pavia University, Houston, and WHU-Hi-Longkou, demonstrate that CF-Mamba achieves overall accuracies of 97.77%, 99.68%, 99.06%, and 99.59%, respectively. The proposed method consistently outperforms existing CNN-, Transformer-, and Mamba-based approaches in terms of both classification performance and computational efficiency.

Keywords:

hyperspectral image classification; state space models; Mamba; dual-path architecture; feature fusion

1. Introduction

Through the acquisition of ground object images using tens to hundreds of contiguous narrow spectral bands, hyperspectral remote sensing technology provides nearly continuous spectral information for each pixel. This characteristic endows hyperspectral imagery with significant potential for fine-grained land-cover classification [1]. Early studies primarily focused on developing efficient classification algorithms, ranging from classical machine learning methods such as support vector machines (SVM) [2] to a variety of advanced statistical learning models [3], which collectively promoted the continuous advancement of hyperspectral classification techniques. In recent years, with the rapid development of deep learning, researchers have begun to construct more powerful network architectures, such as deep perception networks capable of jointly capturing spatial context and spectral detail features, in order to learn more discriminative feature representations [4]. More recent research efforts have shifted toward state space models (SSMs) and their variants, notably the Mamba architecture [5]. Owing to their linear computational complexity and strong sequence modeling capability, these models offer a promising new paradigm for handling high-dimensional and highly redundant hyperspectral data. Consequently, the field of hyperspectral image classification has evolved from the development of efficient classifiers to the design of advanced deep neural networks. More recently, it has further moved toward the exploration of novel and efficient sequence modeling architectures, playing a vital role in applications such as agriculture, environmental monitoring, and geological exploration.

In the early stages of deep learning-based hyperspectral image classification, Convolutional Neural Networks (CNNs) emerged as the mainstream paradigm owing to their formidable local feature extraction capabilities [6,7,8,9,10,11]. To address high-dimensional redundancy and spatial–spectral fusion, various CNN-based architectures have been developed [12,13,14,15]. These methods typically integrate dimensionality reduction techniques (e.g., Gabor filters or PCA) with 2D/3D convolutional kernels to capture joint features. However, they often suffer from high computational complexity, sensitivity to preprocessing parameters, and limited generalization due to their reliance on manual designs and local receptive fields. To further capture complex spatial–spectral dependencies, researchers have progressively integrated multiscale feature representations and hybrid attention mechanisms into deep learning frameworks, significantly improving the discriminative power of the models [16,17]. Collectively, while various CNN architectures have demonstrated superior spatial–spectral feature extraction capabilities, they generally suffer from limitations such as reliance on local receptive fields, propensity for overfitting on limited training samples, and difficulty in modeling long-range dependencies. Consequently, to transcend the intrinsic limitations of convolution operators and capture profound long-range contextual dependencies, research priorities have progressively shifted toward the Transformer architecture, renowned for its exceptional sequence modeling capabilities.

Following the successful application of Transformers in natural language processing and computer vision, models based on self-attention mechanisms have been increasingly integrated into hyperspectral image classification tasks [18,19,20,21,22,23,24,25]. Unlike CNNs that rely on local convolutional kernels, Transformers leverage self-attention mechanisms to effectively model long-range dependencies between pixels and spectral bands across input sequences, thereby overcoming the inherent limitations of convolutional architectures in capturing long-range contextual information and cross-band correlations. Specifically, Peng et al. [26] utilized cross-attention for spatial–spectral fusion, while Zhao et al. [27] designed a lightweight ViT using group-separable convolutions to reduce overhead. To handle limited labeled samples, Jia et al. [28] introduced a local–global fusion framework with center-mask pre-training. Although these Transformer-based methods exhibit notable advantages in long-range dependency modeling and spatial–spectral interaction, the quadratic computational complexity O(N²) of self-attention and the lack of local inductive biases make it challenging to efficiently capture spatial–spectral details under limited sample conditions.

To bridge these gaps, the Mamba architecture based on the State Space Model (SSM) has emerged as a promising solution. By virtue of its selective scan mechanism, Mamba achieves linear computational complexity O(N) while maintaining a long-range receptive field, thereby providing a new paradigm for the efficient modeling of hyperspectral images [29,30,31,32,33,34,35,36]. However, directly applying vanilla Mamba to hyperspectral 3D data still encounters significant challenges: existing 3D scanning strategies often employ predetermined and rigid paths (e.g., simple row-by-row or band-by-band scanning), which overlook the spatial–spectral anisotropy of land cover distribution and fail to adaptively adjust contextual routing based on texture or spectral saliency. Furthermore, the redundancy inherent in high-dimensional spectral data remains insufficiently decoupled, leading to excessive computational loads or the creation of “information islands” when capturing fine-grained features. Additionally, continuous long-range features and discrete texture features exist in different representation forms, where simple linear superposition frequently induces representation discrepancies.

Previous studies have demonstrated that double-branch architectures (e.g., typical spatial–spectral dual streams) are highly effective in decoupling feature representations [37]. However, conventional spatial–spectral paradigms often treat hyperspectral data merely as two independent computer vision features to be mechanically concatenated, fundamentally neglecting the intrinsic physical duality of Earth observation data. In physical remote sensing, ground materials concurrently exhibit continuous physical evolution (e.g., smooth spectral absorption trajectories and continuous geometric textures) and high-frequency discrete mutations (e.g., sharp material boundaries and narrow-band distinctiveness). To genuinely reflect this physical reality and overcome the limitations of single-stream SSMs, we propose a paradigm shift: a continuous–discrete collaborative framework based on the state space model, termed Confluence Mamba (CF-Mamba). This framework innovatively incorporates an Adaptive Holographic Spectral Encoder (AHSE) to dynamically route continuous sequence contexts, and an Interactive Interval Spectral Encoder (IISE) to reduce spectral redundancy while maintaining information flow via channel shuffling. Finally, a Confluence Gating Unit (CGU) is employed to implement consistency constraints and detail enhancement for cross-representation features. Consequently, the proposed method significantly enhances hyperspectral image classification performance while ensuring computational efficiency.

The primary contributions of this paper are summarized in the following three aspects:

Proposing a continuous–discrete collaborative CF-Mamba architecture with customized embedding strategies. To tackle the challenges of high-dimensional redundancy and representation discrepancy in HSIs, a parallel CF-Mamba framework is constructed. We specifically design Spectral–Spatial Convolutional Representation (SSCR) and Depthwise Separable Embedding (DWE) for the continuous modeling path and discrete interaction path, respectively. This design achieves effective feature decoupling and redundancy compression at the input stage, laying a solid foundation for subsequent efficient modeling.
Designing AHSE and IISE modules for continuous evolution and discrete decoupling modeling. To overcome the limitations of traditional scanning mechanisms, an Adaptive Holographic Spectral Encoder (AHSE) is proposed, which introduces a content-aware adaptive routing mechanism to dynamically weight multi-view scanning features and preserve the continuous evolution of spectral–spatial information. Simultaneously, an Interactive Interval Spectral Encoder (IISE) is developed, employing discretized interval sampling and channel shuffling strategies to break “information islands” in discrete feature extraction while maintaining linear computational complexity.
Introducing a Confluence Gating Unit (CGU) to resolve cross-representation discrepancies. To address the representational differences between the dual-path features, the CGU is designed. Leveraging a bi-directional cross-modulation strategy, this module utilizes continuous contextual information to constrain the distribution consistency of discrete features, while employing discrete details to sharpen the continuous features, achieving deep alignment and complementary enhancement of cross-scale features.

2. Preliminary

This section briefly reviews the mathematical foundations of State Space Models (SSMs) and the enhanced Mamba architecture, establishing a theoretical basis for sequence modeling and feature evolution within the proposed CF-Mamba framework.

2.1. State Space Model (SSM)

Rooted in modern control theory, State Space Models (SSMs) are designed to map a one-dimensional (1D) input sequence $x(t) \in \mathbb{R}$ to an output $y(t) \in \mathbb{R}$ through a latent state variable $h(t) \in \mathbb{R}^{N}$. The continuous-time system dynamics are governed by the following linear ordinary differential equations (ODEs) [38,39]:

$$h'(t) = A h(t) + B x(t) \tag{1}$$

$$y(t) = C h(t) \tag{2}$$

where $A \in \mathbb{R}^{N \times N}$ denotes the state transition matrix, while $B \in \mathbb{R}^{N \times 1}$ and $C \in \mathbb{R}^{1 \times N}$ represent the input and output projection matrices, respectively.

To process discrete sampled data in deep learning, the Zero-Order Hold (ZOH) technique is typically employed, where a timescale parameter $\Delta$ is introduced to discretize the aforementioned continuous-time parameters as follows:

$$\bar{A} = \exp(\Delta A) \tag{3}$$

$$\bar{B} = (\Delta A)^{-1} (\exp(\Delta A) - I) \cdot \Delta B \tag{4}$$
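As a minimal sketch of Equations (3) and (4), the snippet below discretizes a small system in NumPy. It assumes a diagonal state matrix with nonzero entries, so that the matrix exponential and inverse reduce to elementwise operations; this diagonal restriction and the names `zoh_discretize`, `a_diag`, `b`, and `delta` are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def zoh_discretize(a_diag, b, delta):
    """Zero-order-hold discretization for a diagonal state matrix (assumption).

    a_diag: (N,) nonzero diagonal entries of A
    b:      (N,) input projection B
    delta:  positive scalar step size
    Returns (a_bar, b_bar), each of shape (N,).
    """
    da = delta * a_diag
    a_bar = np.exp(da)                        # Eq. (3), elementwise because A is diagonal
    b_bar = (a_bar - 1.0) / da * (delta * b)  # Eq. (4): (dA)^-1 (exp(dA) - I) dB
    return a_bar, b_bar

# Hypothetical three-state system with stable (negative) dynamics.
a_bar, b_bar = zoh_discretize(np.array([-1.0, -2.0, -0.5]), np.ones(3), delta=0.1)
```

For small step sizes the result approaches the first-order approximation $\bar{A} \approx I + \Delta A$ and $\bar{B} \approx \Delta B$, which is a convenient sanity check.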

Consequently, the discretized system can be expressed in a linear recurrence form:

$$h_{t} = \bar{A} h_{t-1} + \bar{B} x_{t} \tag{5}$$

$$y_{t} = C h_{t} \tag{6}$$

This recurrent form endows the model with the memory capability to capture long-range temporal dependencies. Moreover, since the parameters $\bar{A}, \bar{B}$ are time-invariant in traditional SSMs, the aforementioned process can be equivalently transformed into a global convolution operation $y = x * \bar{K}$ with kernel $\bar{K} = (C\bar{B}, C\bar{A}\bar{B}, \dots, C\bar{A}^{L-1}\bar{B})$ for a sequence of length $L$, facilitating efficient parallel training. This “convolution during training, recurrence during inference” characteristic enables SSMs to combine the parallel efficiency of Transformers with the linear complexity of RNNs.
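The equivalence between the recurrent and convolutional views can be checked numerically. The sketch below assumes a diagonal $\bar{A}$ stored as a vector and a zero initial state; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def ssm_recurrent(a_bar, b_bar, c, x):
    """Run Eqs. (5)-(6) step by step with a zero initial state."""
    h = np.zeros_like(a_bar)
    ys = []
    for x_t in x:
        h = a_bar * h + b_bar * x_t   # Eq. (5), diagonal A_bar (assumption)
        ys.append(np.dot(c, h))       # Eq. (6)
    return np.array(ys)

def ssm_kernel(a_bar, b_bar, c, length):
    """Kernel entries K[j] = C A_bar^j B_bar for j = 0, ..., length - 1."""
    powers = a_bar[None, :] ** np.arange(length)[:, None]  # (length, N)
    return powers @ (c * b_bar)

def ssm_convolutional(a_bar, b_bar, c, x):
    """Same output as the recurrence, computed as a causal convolution."""
    k = ssm_kernel(a_bar, b_bar, c, len(x))
    return np.convolve(x, k)[: len(x)]  # keep only the causal part
```

Both functions return the same sequence for time-invariant parameters. Once the parameters depend on the input, as in the selective variant, a single fixed kernel no longer exists and this shortcut does not apply.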

2.2. Selective State Space Model (S6) and Mamba

Although traditional SSMs are computationally efficient, their time-invariant parameters restrict the model’s capability for adaptive, context-aware perception. To address this, the Mamba architecture introduces a Selective Scanning Mechanism (Selective Scan, S6) [40]. Unlike traditional SSMs, S6 defines key parameters $(B, C, \Delta)$ as functions of the input rather than fixed constants [40]:

$$B_{t} = \mathrm{Linear}(x_{t}) \tag{7}$$

$$C_{t} = \mathrm{Linear}(x_{t}) \tag{8}$$

$$\Delta_{t} = \mathrm{Softplus}(\mathrm{Parameter} + \mathrm{Linear}(x_{t})) \tag{9}$$

This input-dependent nature enables the model to dynamically adjust the transmission and forgetting of information flow based on the current input, thereby effectively solving complex sequence tasks such as “selective copying” and “induction heads.” Coupled with a hardware-aware parallel scan algorithm, Mamba achieves modeling performance comparable to Transformers while maintaining $O(L)$ linear complexity. In this paper, we leverage this property to address the challenge of efficient modeling of 3D spatial–spectral dependencies in hyperspectral images through customized serialization strategies.
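A sequential reference version of the selective scan can make Equations (7) to (9) concrete. The sketch below is not the hardware-aware parallel scan used by Mamba: it is a plain Python loop, and the per-channel diagonal state matrix `a`, the weight shapes, and all names are assumptions made for illustration.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))

def selective_scan(x, a, w_b, w_c, w_delta, delta_bias):
    """Sequential selective scan (illustrative, assumed shapes).

    x: (L, D) input sequence; a: (D, N) negative diagonal state entries;
    w_b, w_c: (D, N); w_delta: (D, D); delta_bias: (D,).
    """
    seq_len, d = x.shape
    h = np.zeros_like(a)                                 # latent state, (D, N)
    y = np.empty((seq_len, d))
    for t in range(seq_len):
        b_t = x[t] @ w_b                                 # Eq. (7), (N,)
        c_t = x[t] @ w_c                                 # Eq. (8), (N,)
        delta_t = softplus(delta_bias + x[t] @ w_delta)  # Eq. (9), (D,)
        a_bar = np.exp(delta_t[:, None] * a)             # ZOH, Eq. (3)
        b_bar = (a_bar - 1.0) / a * b_t[None, :]         # ZOH, Eq. (4) for diagonal A
        h = a_bar * h + b_bar * x[t][:, None]            # Eq. (5) with input-dependent terms
        y[t] = h @ c_t                                   # Eq. (6)
    return y
```

Because every step only reads the current input and the previous state, the cost grows linearly with the sequence length, and outputs at earlier positions are unaffected by later inputs.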

3. Proposed Method

3.1. Overall Architecture

Addressing the inherent physical duality of hyperspectral images (HSIs) is a critical challenge in remote sensing. Conventional dual-branch networks typically treat spatial and spectral dimensions as independent modalities for mechanical fusion. In contrast, we propose a continuous–discrete collaborative framework based on the state space model (SSM), termed Confluence Mamba (CF-Mamba), which explicitly deconstructs the physical attributes of HSIs.

Rather than a simple spatial–spectral split, our architecture is driven by the physical reality of ground objects. The specific selection of AHSE, IISE, and CGU is strictly driven by the necessity to deconstruct the intrinsic physical duality of hyperspectral data. The AHSE is designed to track continuous physical evolution (e.g., smooth spectral absorption gradients), whereas the IISE isolates high-frequency discrete mutations to prevent over-smoothing. Because these continuous and discrete representations are physically orthogonal, conventional fusion paradigms (e.g., simple element-wise addition) fail to align their heterogeneous semantic spaces. Therefore, the CGU is uniquely introduced to perform bidirectional cross-modulation, dynamically gating the confluence of continuous envelopes and discrete details to explicitly resolve their cross-representation discrepancies. As illustrated in Figure 1, CF-Mamba primarily consists of three core stages: shallow feature embedding, dual-path deep feature extraction, and feature confluence and classification.

First, to mitigate the “curse of dimensionality” and reduce computational complexity, the raw hyperspectral data $X \in \mathbb{R}^{H \times W \times B}$ undergoes Principal Component Analysis (PCA) for dimensionality reduction, followed by partitioning into overlapping 3D patches. These patches are then fed into two parallel branches for complementary processing:
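The preprocessing step can be sketched as follows. The text does not state the number of retained components, the patch size, or the border handling, so `n_components`, `patch_size`, and reflect padding are assumptions of this example, as are the function names.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project the spectral axis of an (H, W, B) cube onto its leading principal axes."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)  # astype returns a copy
    flat -= flat.mean(axis=0, keepdims=True)       # center each band
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)

def extract_patches(cube, patch_size):
    """One patch per pixel (stride 1, hence overlapping), centered on that pixel."""
    if patch_size % 2 != 1:
        raise ValueError("patch_size must be odd")
    r = patch_size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")  # assumed border handling
    h, w, c = cube.shape
    patches = np.empty((h * w, patch_size, patch_size, c), dtype=cube.dtype)
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + patch_size, j:j + patch_size, :]
    return patches
```

Each patch is labeled by its center pixel, which is the usual setup for pixel-level hyperspectral classification.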

Continuous Modeling Path: This path aims to establish long-range dependencies at the sequence level while preserving the integrity of spectral–spatial information. The input features first pass through the Spectral–Spatial Convolutional Representation (SSCR) module to retain 3D structural information. Subsequently, the features enter the core Adaptive HoloSpectral Encoder (AHSE). Diverging from traditional methods that mechanically sum fixed scanning paths, AHSE introduces an adaptive routing mechanism that dynamically weights features from different 3D scanning directions based on the texture complexity of the input content, thereby reinforcing continuous context perception while suppressing noise.

Discrete Interaction Path: This path focuses on decoupling high-dimensional redundancy and extracting fine-grained discriminative features. After being processed by the Depth-wise separable spectral–spatial Embedding (DWE), the input features enter the Interactive Interval Spectral Encoder (IISE). IISE employs a discretized interval sampling strategy to partition continuous spectral features into non-overlapping subgroups and performs selective scanning (S6) within each group. To break the “information islands” caused by discrete grouping, IISE innovatively introduces a channel shuffle mechanism to facilitate information flow across subgroups while maintaining linear computational complexity.

In the feature fusion stage, addressing the differences in representation forms and spatial response distributions between the dual-path features, we propose the Confluence Gating Unit (CGU). This module aims to achieve cross-representation feature alignment and complementary enhancement through a learnable gating mechanism. CGU utilizes a bi-directional cross-modulation strategy: it first maps the features of one path into gating coefficients via a nonlinear activation function, which are then used for channel-wise weighted filtering of the features from the other path.

This mechanism serves a dual purpose: first, it utilizes the continuous contextual information from AHSE to provide distributional constraints on the discrete features of IISE, filtering out fragmented noise that lacks contextual support; second, it leverages the discrete detailed information from IISE to supplement the continuous features of AHSE, compensating for potential boundary blurring caused by long-range smooth modeling. Through this deep bi-directional interaction, the model generates “Confluence” features that possess both sequence consistency and detail sharpness. Finally, these features are processed by Global Average Pooling (GAP) before being fed into the classifier to output pixel-level land-cover classification results.
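The bidirectional cross-modulation described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the text says that one path is mapped to gating coefficients by a nonlinear activation and applied to the other path, so the sigmoid activation, the projection matrices `w_c2d` and `w_d2c`, the per-position gates, and the additive merge of the two gated streams are hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def confluence_gate(f_cont, f_disc, w_c2d, w_d2c):
    """Cross-gate two (H, W, C) feature maps and pool them to a (C,) vector.

    w_c2d, w_d2c: (C, C) projections that produce the gates (assumed form).
    """
    gate_from_cont = sigmoid(f_cont @ w_c2d)    # continuous context gates the discrete path
    gate_from_disc = sigmoid(f_disc @ w_d2c)    # discrete details gate the continuous path
    disc_filtered = f_disc * gate_from_cont     # suppress details lacking contextual support
    cont_sharpened = f_cont * gate_from_disc    # emphasize context where details respond
    fused = disc_filtered + cont_sharpened      # assumption: additive merge
    return fused.mean(axis=(0, 1))              # Global Average Pooling before the classifier
```

With all-zero projections every gate equals 0.5, so the output reduces to half the pooled sum of both paths; learned projections move the gates away from this neutral point.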

3.2. Adaptive HoloSpectral Encoder (AHSE)

The selection and design of the Adaptive HoloSpectral Encoder (AHSE) are strictly driven by the severe physical limitations of applying standard sequence models to 3D hyperspectral data. Hyperspectral images are fundamentally anisotropic 3D physical entities. While the native Mamba architecture excels at 1D sequences, applying it directly to HSIs via arbitrary flattening inherently destroys either spatial or spectral physical continuity. Furthermore, existing multi-path extensions (e.g., 3DSS-Mamba [41]) merely perform mechanical feature summation. This approach falsely assumes isotropic physical continuity, contradicting the physical reality that different land covers exhibit highly anisotropic continuous evolutions—for instance, vegetation relies heavily on spectral waveform continuity, whereas built structures depend primarily on spatial geometric continuity.

To overcome these physical limitations, the AHSE module is introduced as the core of the continuous modeling path. Analogous to the principle of optical holography, which reconstructs 3D information by recording light waves from different angles, AHSE captures anisotropic structural dependencies through multi-view orthogonal trajectories. As illustrated in Figure 2, it transforms non-causal 3D images into causal continuous sequence streams through three cascaded stages: Multi-View Serialization, Parallel State Space Evolution, and Content-Aware Adaptive Routing. Crucially, rather than mechanically summing features, the final adaptive routing mechanism dynamically gates and weights these perspectives. It adaptively amplifies the most physically relevant continuous perspective based on the intrinsic complexity of the specific ground object, thereby limiting the structural noise introduced by irrelevant scanning dimensions.

3.2.1. Multi-View Spectral–Spatial Serialization

Prior to multi-view serialization, the Spectral–Spatial Convolutional Representation (SSCR) module is designed to satisfy the requirements of the continuous modeling path (AHSE) for holistic structural features and spectral continuity. This module leverages a 3D convolutional layer (3D Conv) to directly encode the input hyperspectral image patches $P(X)$ as follows:

$$E_{SSCR} = \mathrm{Conv3D}(P(X); \Theta_{sscr}) \tag{10}$$

SSCR enables the direct extraction of spatial structures between pixels and continuous correlations between bands, transforming raw data into a highly compressed representation $E_{SSCR}$ that preserves complete spectral evolution information. This provides a robust structural input for the subsequent AHSE to capture long-range sequential dependencies.

To model with SSMs without compromising the integrity of the 3D structure, a continuity-preserving sequence mapping must first be established. Given an input feature tensor $X_{in} \in \mathbb{R}^{H \times W \times C}$, we define $K$ different scanning trajectories $T_{k}(\cdot)$ to flatten the 3D cube into 1D sequences, capturing anisotropic long-range contextual dependencies. In this study, we construct $K = 3$ complementary paths:

Spatial-Priority Path ($T_{spa}$): This path prioritizes traversing spatial pixels $(h, w)$ before switching spectral bands. It is designed to capture the continuity of spatial textures and the geometric structures of object edges.

Spectral-Priority Path ($T_{spe}$): This path prioritizes traversing the spectral dimension $c$ before switching spatial positions. It focuses on extracting the evolution of spectral fingerprints and long-range sequential dependencies across bands.

Joint-Cross Path ($T_{joint}$): Utilizing a 3D spiral traversal strategy, this path aims to establish a deep coupling relationship between the spatial and spectral dimensions.

For the $k$-th path, the generated sequence $S^{(k)}$ can be expressed as:

$$S^{(k)} = \mathrm{Flatten}(X_{in}; T_{k}) \in \mathbb{R}^{L \times C}, \quad L = H \times W \tag{11}$$

where $S^{(k)}$ contains the holistic sequential context from a specific perspective.
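The difference between the spatial-priority and spectral-priority traversals, and the inverse mapping needed to restore the cube afterwards, can be illustrated with a small NumPy example. As a simplification of the $L \times C$ layout in Equation (11), the sketch assumes a cube with one scalar per (h, w, band) position, so that the traversal order is directly visible. The joint spiral path is omitted because its exact traversal is not specified here. Function names are hypothetical.

```python
import numpy as np

def serialize(cube, order):
    """Flatten an (H, W, B) cube into a 1D sequence along one traversal order."""
    if order == "spatial":    # visit every (h, w) of one band, then move to the next band
        return cube.transpose(2, 0, 1).reshape(-1)
    if order == "spectral":   # visit every band of one pixel, then move to the next pixel
        return cube.reshape(-1)
    raise ValueError(f"unknown order: {order}")

def deserialize(seq, shape, order):
    """Inverse of serialize: restore the (H, W, B) layout."""
    h, w, b = shape
    if order == "spatial":
        return seq.reshape(b, h, w).transpose(1, 2, 0)
    if order == "spectral":
        return seq.reshape(h, w, b)
    raise ValueError(f"unknown order: {order}")

cube = np.arange(12).reshape(2, 2, 3)       # value = 6*h + 3*w + band
print(serialize(cube, "spectral").tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(serialize(cube, "spatial").tolist())   # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

Neighbors in the spectral-priority sequence are adjacent bands of one pixel, whereas neighbors in the spatial-priority sequence are adjacent pixels of one band, which is why the two views expose different kinds of continuity to a sequence model.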

3.2.2. Parallel Selective State Space Evolution

Upon obtaining the flattened sequences, AHSE utilizes parallel selective state space models (S6) to model the independent sequence evolution for each path. For the sequence input $x_{t}^{(k)} \in S^{(k)}$ of the $k$-th path, discretized state equations are employed to update the latent state $h_{t}^{(k)}$. The reliance on the Selective State Space Model (S6) for continuous modeling, rather than conventional CNNs, RNNs, or Transformers, is dictated by the physical prerequisites of hyperspectral continuous evolution. Traditional CNNs, constrained by local receptive fields, struggle to capture holistic long-range spectral trajectories. While Transformers excel at global dependencies, their quadratic complexity $O(L^{2})$ typically forces aggressive sequence truncation or patch downsampling, which severs the continuity of the HSI data cube. Moreover, traditional RNNs suffer from memory decay over long sequences and lack parallel efficiency. In contrast, Mamba satisfies both physical and computational demands: its linear complexity $O(L)$ permits the full ingestion of untruncated 3D continuous sequences, preserving the integrity of physical evolution. Furthermore, unlike the static parameters in RNNs, Mamba’s input-dependent selective mechanism dynamically retains critical continuous physical states (e.g., macro-spectral envelopes) while filtering out irrelevant noise, making it well suited to our continuous modeling path.

First, discretization parameters are dynamically generated based on the input $x_{t}^{(k)}$:

$$B_{t}^{(k)} = \mathrm{Linear}_{B}(x_{t}^{(k)}), \quad C_{t}^{(k)} = \mathrm{Linear}_{C}(x_{t}^{(k)}), \quad \Delta_{t}^{(k)} = \mathrm{Softplus}(\mathrm{Linear}_{\Delta}(x_{t}^{(k)})) \tag{12}$$

$$\bar{A}_{t}^{(k)} = \exp(\Delta_{t}^{(k)} A), \quad \bar{B}_{t}^{(k)} = (\Delta_{t}^{(k)} A)^{-1} (\bar{A}_{t}^{(k)} - I) \cdot \Delta_{t}^{(k)} B_{t}^{(k)} \tag{13}$$

Subsequently, state recurrence and output computation are executed:

$$h_{t}^{(k)} = \bar{A}_{t}^{(k)} h_{t-1}^{(k)} + \bar{B}_{t}^{(k)} x_{t}^{(k)} \tag{14}$$

$$y_{t}^{(k)} = C_{t}^{(k)} h_{t}^{(k)} + D x_{t}^{(k)} \tag{15}$$

where $D$ denotes the skip connection parameter, typically maintained as a static value to facilitate direct gradient propagation, and the subscript $t$ on $\bar{A}_{t}^{(k)}$ reflects its dependence on the input-dependent step $\Delta_{t}^{(k)}$. After S6 processing, the output sequences from each path are reshaped back to their original 3D dimensions via an inverse transformation, denoted as $F^{(k)} \in \mathbb{R}^{H \times W \times C}$. This process ensures that each branch independently captures continuous long-range dependency features under its specific perspective.

3.2.3. Content-Aware Adaptive Fusion

Conventional feature fusion often relies on simple element-wise summation. This practice assumes that features from all perspectives hold equal importance for the final classification, thereby ignoring the discrepancies in characterizing data continuity across different scanning topologies (e.g., in spectrally smooth regions, forced spatial scanning may introduce redundant high-frequency structural noise). To address this, AHSE introduces a lightweight adaptive fusion module to achieve dynamic weighted aggregation of multi-path features by learning the sequential feature saliency for each perspective. First, the input features $X_{in}$ are compressed into holistic statistical descriptors $z \in \mathbb{R}^{C}$ using Global Average Pooling (GAP):

$$z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{in}[i, j, :] \tag{16}$$

The GAP here is intended to aggregate the holistic distribution information within the current patch, establishing a macro-statistical basis for subsequent routing decisions. Subsequently, an excitation network, comprising two fully connected (FC) layers and a ReLU activation function, is employed to generate a normalized weight vector $\alpha \in \mathbb{R}^{K}$ for the $K$ paths:

$$\alpha = [\alpha_{1}, \dots, \alpha_{K}] = \mathrm{Softmax}(W_{2} \cdot \delta(W_{1} \cdot z)) \tag{17}$$

where $\alpha_{k}$ denotes the representational contribution of the $k$-th scanning path to the current sample, and $\delta$ refers to the ReLU activation function.

Ultimately, the output of AHSE, denoted as $Y_{out}$, is formed by aggregating the weighted multi-view features and is fused with the input through a residual connection to ensure effective gradient propagation:

$$Y_{out} = \mathrm{Linear}_{proj}\left(\sum_{k=1}^{K} \alpha_{k} \cdot F^{(k)}\right) + X_{in} \tag{18}$$

Through this mechanism, AHSE adaptively reinforces the perspectives that best align with the continuous structures of ground objects based on the inherent distribution characteristics of the input data. Simultaneously, it suppresses perspectives that introduce redundant interference, thereby achieving precise refinement and sequence optimization of high-dimensional spectral–spatial features.
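The routing in Equations (16) to (18) can be sketched compactly in NumPy. The hidden width of the excitation network, the absence of bias terms, and the names `w1`, `w2`, and `w_proj` are assumptions of this example; the weights are written in row-vector convention (`z @ w1`) rather than the column form used in Equation (17).

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def adaptive_fusion(x_in, views, w1, w2, w_proj):
    """Content-aware weighting of K scanned views.

    x_in: (H, W, C) input; views: list of K arrays of shape (H, W, C);
    w1: (C, R); w2: (R, K); w_proj: (C, C).
    Returns the fused output (H, W, C) and the routing weights (K,).
    """
    z = x_in.mean(axis=(0, 1))                        # Eq. (16): global average pooling
    alpha = softmax(np.maximum(z @ w1, 0.0) @ w2)     # Eq. (17): FC -> ReLU -> FC -> Softmax
    mixed = sum(a * f for a, f in zip(alpha, views))  # weighted sum over the K views
    return mixed @ w_proj + x_in, alpha               # Eq. (18): projection plus residual
```

When the second layer outputs identical logits for every path, the weights become uniform and the module falls back to averaging the views, so plain summation-style fusion is recovered as a special case up to scaling.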

3.3. Interactive Interval Spectral Encoder (IISE)

Although AHSE can effectively capture continuous long-range dependencies at the sequence level, capturing fine-grained discrete spectral features typically entails prohibitive computational costs (e.g., dense 3D convolutions). Direct full-band scanning in high-dimensional spectral space not only imposes a heavy computational burden but also leads to significant spectral redundancy due to the high correlation between adjacent bands. Therefore, how to efficiently decouple spectral redundancy and extract discriminative features without substantially increasing the model complexity remains a critical challenge.

To address this, we propose the Interactive Interval Spectral Encoder (IISE). Instead of naive continuous scanning, IISE adopts an “Isolation-Interaction” paradigm. Specifically, IISE leverages an interval feature decoupling strategy to significantly compress sequence length and eliminate spectral redundancy, while employing a parameter-free shuffle interaction mechanism to break the barriers between discrete groups. This design enables the model to achieve fine-grained modeling of discrete spectral–spatial details at a lightweight computational cost, effectively circumventing the “high precision, high complexity” dilemma inherent in conventional methods.

3.3.1. Interval Feature Decoupling

To address the requirements of the discrete interaction path (IISE) for spectral redundancy decoupling and fine-grained feature extraction, we propose the DWE module. This module adopts a strategy that combines depth-wise separable convolution (DWConv) with linear mapping (Linear), which substantially reduces the parameter count while enhancing the response to discriminative features:

$$X_{disc} = \mathrm{DWConv}(X) + \mathrm{Linear}(X) \tag{19}$$

The generated features $X_{disc}$ are subsequently partitioned into $G$ non-overlapping interval groups $G_{i}$. This discretization-based preprocessing effectively circumvents redundant computations in high-dimensional spectral space and seamlessly aligns with the subsequent interval-based discrete group interaction mechanism of IISE.

To break the high redundancy inherent in adjacent bands and alleviate the computational burden of sequence modeling, IISE first introduces a discrete interval grouping strategy as illustrated in Figure 3. Unlike traditional contiguous band clustering, which essentially acts as local smoothing and risks obliterating critical high-frequency physical mutations (e.g., narrow diagnostic absorption valleys of specific minerals), our strategy is physically grounded in sparse discrete sampling. By extracting bands at fixed intervals, it functions as a spectral comb, explicitly breaking the strong physical collinearity among adjacent bands to reduce redundancy, while strictly retaining the representative macro-spectral profile of ground objects.

Given the decoupled feature tensor $X_{disc} \in \mathbb{R}^{H \times W \times C}$ generated by the depth-wise separable embedding, we partition it into $G$ non-overlapping low-dimensional feature subspaces. The feature set $X^{(g)}$ of the $g$-th subspace (group) consists of channels with indices $\{g, g + G, g + 2G, \dots\}$:

$$X^{(g)} = \mathrm{Sample}(X_{disc}, g) \in \mathbb{R}^{H \times W \times \frac{C}{G}}, \quad g \in \{1, \dots, G\} \tag{20}$$

This interval sampling ensures that although each subgroup’s dimension is reduced to $1/G$ of the original, the bands it contains are uniformly distributed across the spectral domain. This enables each discrete subgroup to preserve the complete skeleton of the ground object’s spectral curve (i.e., a Representative Spectral Profile), rather than being restricted to discrete features within a narrow spectral range. Subsequently, each subgroup is independently fed into a unidirectional selective state space model (S6) to capture the intra-group spatial–spectral dependencies within each discrete subspace in parallel. Furthermore, from the perspective of multi-view analysis, each subgroup generated by the interval sampling fundamentally acts as an independent spectral ‘view’ of the ground object. By decoupling the continuous spectrum into multiple interleaved sub-spaces and analyzing the hyperspectral data from these diverse discrete angles, the IISE module effectively captures complementary high-frequency variations, thereby significantly enhancing the overall feature representation capability of the model.
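The difference between interval sampling in Eq. (20) and contiguous band clustering can be made concrete with a small NumPy sketch. It is an illustration only: the function names are hypothetical, indices are 0-based (the text uses 1-based group indices), and the toy tensor simply stores each channel's own index so that the grouping is visible.

```python
import numpy as np


def interval_sample(x_disc: np.ndarray, num_groups: int) -> list:
    """Split the channel axis (last axis) into interleaved groups.

    Group g (0-based) receives channels g, g + G, g + 2G, ...
    """
    channels = x_disc.shape[-1]
    if channels % num_groups != 0:
        raise ValueError("C must be divisible by G")
    return [x_disc[..., g::num_groups] for g in range(num_groups)]


def contiguous_split(x_disc: np.ndarray, num_groups: int) -> list:
    """Baseline for comparison: contiguous band blocks of size C / G."""
    return np.split(x_disc, num_groups, axis=-1)


# Toy tensor of shape (H, W, C) = (1, 1, 8); each channel stores its own index.
x = np.arange(8).reshape(1, 1, 8)
interval_groups = [g.ravel().tolist() for g in interval_sample(x, 4)]
contiguous_groups = [g.ravel().tolist() for g in contiguous_split(x, 4)]

print(interval_groups)    # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(contiguous_groups)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Each interleaved group spans the whole channel range, whereas each contiguous block covers only a narrow neighbourhood of adjacent channels.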

3.3.2. Cross-Group Shuffle Interaction

Feature extraction based on discrete grouping significantly reduces the sequence length and, combined with the linear complexity of S6, substantially diminishes computational overhead. However, merely decoupling the features is insufficient for true discrete modeling. Standard State Space Models (SSMs) are inherently biased toward sequential continuity. If the decoupled subgroups are processed strictly in their spectral order, the SSM will still compulsively attempt to fit a continuous evolutionary sequence. Therefore, IISE innovatively introduces a channel shuffle mechanism. This operation is not merely an “interaction bridge,” but a mathematical prerequisite: it intentionally destroys the physical wavelength sequence constraint. This forces the SSM to abandon continuous modeling and strictly focus on capturing the global non-sequential interactions among independent, discrete high-frequency mutations without introducing additional parameters.

Assuming that the discrete group features processed by the parallel S6 blocks are stacked into a tensor $Y_{stack} \in \mathbb{R}^{G \times \frac{C}{G} \times L}$ (where $L = HW$ denotes the number of spatial pixels), the shuffle interaction process is implemented through tensor dimension rearrangement to enable cross-group information flow:

1. Reshape & Transpose: The channel dimensions are first reorganized by transposing the “group” dimension with the “intra-group channel” dimension. This logically disrupts the original discrete boundaries, causing features from different spectral intervals to be arranged alternately in memory:

$$Y_{inter} = \mathrm{Transpose}(Y_{stack}, \dim_{0}, \dim_{1}) \in \mathbb{R}^{\frac{C}{G} \times G \times L} \tag{21}$$

2. Flatten & Fusion: The transposed tensor is flattened back to the original channel dimension $C$:

$$Y_{shuffled} = \mathrm{Flatten}(Y_{inter}) \in \mathbb{R}^{C \times L} \tag{22}$$

This operation essentially performs a uniform “re-weaving” of the decoupled spectral features, ensuring that the subsequent $1 \times 1$ linear projection layer can simultaneously receive and integrate feature information from all discrete intervals. Finally, the output of IISE, $X_{out}$, is obtained through linear mapping of the interacted features combined with a residual connection:

$$X_{out} = \mathrm{Linear}_{proj}(Y_{shuffled}) + X_{disc} \tag{23}$$
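The transpose-and-flatten shuffle of Eqs. (21) and (22) can be traced on a toy tensor. The sketch below is illustrative only: the S6 blocks are replaced by an identity mapping (an assumption made so that every row can be traced back to its source channel), indices are 0-based, and `channel_shuffle` is a hypothetical helper name.

```python
import numpy as np


def channel_shuffle(y_stack: np.ndarray) -> np.ndarray:
    """Map (G, C/G, L) to (C, L) by swapping the first two axes and flattening."""
    num_groups, per_group, length = y_stack.shape
    y_inter = y_stack.transpose(1, 0, 2)                   # Eq. (21): (C/G, G, L)
    return y_inter.reshape(num_groups * per_group, length)  # Eq. (22): (C, L)


G, C, L = 4, 8, 3
# Row c holds the value c everywhere, so each row is tagged with its channel index.
x = np.repeat(np.arange(C)[:, None], L, axis=1)             # (C, L)
# Interval sampling (0-based) with identity stand-ins for the S6 blocks.
y_stack = np.stack([x[g::G] for g in range(G)], axis=0)      # (G, C/G, L)

without_shuffle = y_stack.reshape(C, L)[:, 0].tolist()
with_shuffle = channel_shuffle(y_stack)[:, 0].tolist()

print(without_shuffle)  # [0, 4, 1, 5, 2, 6, 3, 7]
print(with_shuffle)     # [0, 1, 2, 3, 4, 5, 6, 7]
```

Without the shuffle, a plain flatten leaves each group's channels in adjacent positions. With the shuffle, neighbouring positions come from different groups; in this identity setting the result coincides with the original channel ordering, which matches the "re-weaving" description.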

Through this efficient workflow of “Decoupling—Independent Modeling—Interaction Re-weaving,” IISE successfully resolves the conflict between redundancy and correlation in high-dimensional spectral data processing. It maximizes the integrity of fine-grained discrete details and cross-interval spectral dependencies while maintaining linear computational complexity.

3.4. Confluence Gating Unit (CGU)

In the CF-Mamba framework, the AHSE path captures continuous sequence-level context, while the IISE path extracts discrete decoupled fine-grained features. Despite their informational complementarity, the distinct feature generation mechanisms—continuous evolution versus discrete sampling—often lead to representation discrepancy in the feature space when direct linear superposition (e.g., element-wise addition) is applied. For instance, the continuous modeling path may blur the high-frequency boundaries of ground objects due to sequence smoothing effects, whereas the discrete interaction path might introduce fragmented noise lacking contextual constraints as a result of severed long-range dependencies.

To bridge this representation gap and achieve a deep organic fusion of dual-path features, we designed the Confluence Gating Unit (CGU). Departing from traditional passive aggregation methods, the CGU adopts a “Bi-directional Cross-Rectification” strategy, which leverages the distribution characteristics of one path as a prior to dynamically calibrate the response of the other path.

3.4.1. Cross-Scale Alignment and Gate Generation

Let $X_{cont} \in \mathbb{R}^{H \times W \times C}$ denote the continuous context features output by AHSE, and $X_{disc} \in \mathbb{R}^{H \times W \times C}$ denote the discrete fine-grained features output by IISE. The core objective of the CGU is to construct two parallel gating branches designed to generate a “consistency mask” and a “detail enhancement mask,” respectively.

Initially, lightweight feature transformation functions, denoted as $F_{c}(\cdot)$ for the continuous context and $F_{d}(\cdot)$ for the discrete features, are introduced. These functions, typically composed of a $1 \times 1$ convolutional layer, batch normalization (BN), and an activation function, endow the gating coefficients with non-linear discriminative power. Subsequently, a Sigmoid function $\sigma(\cdot)$ is employed to map the features into the $(0, 1)$ interval, thereby generating the gating maps:

$$G_{cont} = \sigma(F_{c}(X_{cont})), \quad G_{disc} = \sigma(F_{d}(X_{disc})) \tag{24}$$

Here, $G_{cont}$ represents the attention map generated from the continuous context, indicating regions with high sequence-level consistency confidence. Conversely, $G_{disc}$ represents the attention map derived from discrete details, identifying regions that contain significant discriminative texture information.

3.4.2. Bi-Directional Cross-Modulation

Upon obtaining the gating coefficients, the CGU executes a bi-directional cross-modulation operation. Rather than a simple feature mixture, this process utilizes the Hadamard product ($\odot$) to achieve mutual filtering and enhancement between the dual-path features:

Continuous-to-Discrete Guidance (Contextual Regularization): The discrete features $X_{disc}$ are weighted using the continuous gate $G_{cont}$:

$$\tilde{X}_{disc} = X_{disc} \odot G_{cont} \tag{25}$$

The primary role of this step is to use continuous semantics as a “regularizer” that suppresses noise points in the discrete features that are inconsistent with the surrounding sequence evolution trends (e.g., filtering out isolated feature fragments resulting from group decoupling).
Discrete-to-Continuous Feedback (Detail Refinement): The continuous features $X_{cont}$ are weighted using the discrete gate $G_{disc}$:

$$\tilde{X}_{cont} = X_{cont} \odot G_{disc} \tag{26}$$

This step uses discrete high-frequency details as an “enhancer” to strengthen the response of continuous features at object boundaries, thereby compensating for the potential smoothing of local responses caused by long-range sequence modeling.
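The gate generation and the two cross-modulation steps can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: $F_{c}$ and $F_{d}$ are reduced to a per-pixel linear map (equivalent to a 1 × 1 convolution), with batch normalization and the unspecified activation omitted, and all weights are random placeholders. The function names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def pointwise_transform(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Simplified stand-in for F_c / F_d: a 1 x 1 convolution as a per-pixel linear map.

    BN and the activation function are omitted here (assumption, for brevity).
    """
    return x @ weight + bias


def cross_modulate(x_cont, x_disc, w_c, b_c, w_d, b_d):
    g_cont = sigmoid(pointwise_transform(x_cont, w_c, b_c))  # continuous gate
    g_disc = sigmoid(pointwise_transform(x_disc, w_d, b_d))  # discrete gate
    x_disc_mod = x_disc * g_cont   # continuous-to-discrete guidance
    x_cont_mod = x_cont * g_disc   # discrete-to-continuous feedback
    return x_cont_mod, x_disc_mod, g_cont, g_disc


rng = np.random.default_rng(0)
H, W, C = 5, 5, 8
x_cont = rng.normal(size=(H, W, C))
x_disc = rng.normal(size=(H, W, C))
w_c, w_d = rng.normal(size=(C, C)), rng.normal(size=(C, C))
b_c, b_d = np.zeros(C), np.zeros(C)

x_cont_mod, x_disc_mod, g_cont, g_disc = cross_modulate(x_cont, x_disc, w_c, b_c, w_d, b_d)
print(x_cont_mod.shape, x_disc_mod.shape)
```

Because every gate value lies in $(0, 1)$, each modulated feature is an element-wise attenuated copy of the other path's input: the gate computed from one path decides how much of the other path's response is retained at each position and channel.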

3.4.3. Confluence Output and Classification

Features after bi-directional calibration possess both the distribution consistency of continuous context and the fine-grained distinctiveness of discrete features. The final confluence feature $X_{out}$ is obtained by fusing the complementarily modulated features from both paths, while maintaining the flow of original information through residual connections, as illustrated in Figure 4:

$$X_{out} = \mathrm{Linear}_{proj}(\mathrm{Concat}(\tilde{X}_{cont}, \tilde{X}_{disc})) + X_{cont} + X_{disc} \tag{27}$$

Finally, $X_{out}$ is compressed into a feature vector via a global average pooling (GAP) layer and then fed into a fully connected layer to calculate the final probability distribution of land cover categories:

$$y = \mathrm{Softmax}(W_{cls} \cdot \mathrm{GAP}(X_{out}) + b_{cls}) \tag{28}$$

Through the CGU module, CF-Mamba successfully achieves a transition from mechanical “multi-source feature stacking” to organic “continuous–discrete Representation Synergy”. This significantly enhances the model’s classification robustness when dealing with spectral confusion (which requires continuity constraints) and subtle texture differences (which require discrete feature differentiation).
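The confluence output and the classification head can be sketched as below. This is a minimal illustration under assumptions: the concatenation is taken over the gated (modulated) features, as the surrounding description of fusing "modulated features" suggests; the modulated inputs are random stand-ins; all weights are placeholders; and the function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def confluence_output(x_cont, x_disc, x_cont_mod, x_disc_mod, w_proj, b_proj):
    """Project the concatenated modulated features and add both residuals."""
    fused = np.concatenate([x_cont_mod, x_disc_mod], axis=-1)   # (H, W, 2C)
    return fused @ w_proj + b_proj + x_cont + x_disc             # (H, W, C)


def classify(x_out, w_cls, b_cls):
    pooled = x_out.mean(axis=(0, 1))          # global average pooling -> (C,)
    return softmax(w_cls @ pooled + b_cls)    # (num_classes,)


rng = np.random.default_rng(0)
H, W, C, num_classes = 5, 5, 8, 16
x_cont = rng.normal(size=(H, W, C))
x_disc = rng.normal(size=(H, W, C))
# Stand-ins for the gated features: inputs scaled by random gates in [0, 1).
x_cont_mod = x_cont * rng.uniform(size=(H, W, C))
x_disc_mod = x_disc * rng.uniform(size=(H, W, C))
w_proj, b_proj = 0.1 * rng.normal(size=(2 * C, C)), np.zeros(C)
w_cls, b_cls = rng.normal(size=(num_classes, C)), np.zeros(num_classes)

x_out = confluence_output(x_cont, x_disc, x_cont_mod, x_disc_mod, w_proj, b_proj)
probs = classify(x_out, w_cls, b_cls)
print(x_out.shape, probs.shape)
```

The projection maps the 2C concatenated channels back to C so that the two residual terms can be added without any further reshaping.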

Figure 4. Bi-directional Cross-Modulation Mechanism of the Confluence Gating Unit (CGU).

4. Experiment

4.1. Datasets

To evaluate the effectiveness of the proposed method, extensive experimental comparisons were conducted on four public hyperspectral image (HSI) databases: Indian Pines, Pavia University, Houston 2013, and WHU-Hi-Longkou. Table 1 provides the detailed partitioning of the training and testing sets.

Indian Pines Dataset: This dataset was acquired in 1992 by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in northwestern Indiana. The spectrometer covers a wavelength range from 0.4 to 2.5 µm. After removing water absorption channels, the dataset contains 200 spectral bands with a spatial resolution of 20 m per pixel, and the image size is 145 × 145 pixels. The dataset comprises a total of 10,249 ground truth samples across 16 different categories.

Pavia University Dataset: This dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) over the city of Pavia in northern Italy. The imaging wavelength range of the spectrometer is 0.43 to 0.86 µm. After removing 12 noisy bands, the dataset includes 103 spectral bands with a spatial resolution of 1.3 m per pixel and an image size of 610 × 340 pixels. There are 42,776 labeled ground truth pixels, corresponding to 9 different categories.

Houston 2013 Dataset: This dataset was captured by the ITRES CASI-1500 sensor (ITRES Research Limited, Calgary, AB, Canada) over the University of Houston campus and its surrounding areas, and was provided by the 2013 IEEE Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest. The spectrometer’s imaging wavelength range is 0.38 to 1.05 µm. It contains 144 spectral bands with an image size of 349 × 1905 pixels and a spatial resolution of 2.5 m per pixel. The dataset includes a total of 15,029 sample pixels categorized into 15 challenging classes.

WHU-Hi-Longkou Dataset: This dataset was acquired by the Headwall Nano-Hyperspec sensor (Headwall Photonics, Bolton, MA, USA) in Longkou Town, Hubei Province, China. The image size is 550 × 400 pixels, containing 270 spectral bands with a wavelength range of 0.4 to 1 µm and a spatial resolution of approximately 0.463 m. The area contains 9 types of land cover: Corn, Cotton, Sesame, Broad-leaf soybean, Narrow-leaf soybean, Rice, Water, Roads and houses, and Mixed weed, primarily used for precision agricultural classification research.

4.2. Experimental Setup

(1) To quantitatively evaluate the classification performance, three standard metrics are employed: Overall Accuracy (OA), Average Accuracy (AA), and the Kappa coefficient. Specifically, OA is computed as the ratio of correctly classified pixels to the total number of test pixels. AA is the mean of the per-class classification accuracies; it is included to provide a fairer evaluation of the model’s performance on minority classes, given the severe class imbalance typical of hyperspectral data. Finally, the Kappa coefficient measures the agreement between the classification results and the ground truth while correcting for the agreement expected by random chance. To ensure fairness and stability in the comparison between different models, each model is trained five times independently with random initializations. The mean and standard deviation over these five runs are reported as the final evaluation result.
(2) The proposed CF-Mamba and all comparative models are implemented in the PyTorch 2.6.0 deep learning framework and run on a single NVIDIA GeForce RTX 4090D GPU. In the data preprocessing stage, Principal Component Analysis (PCA) is first used to compress the spectral dimension of the raw HSI data to 35 dimensions. The data are then segmented into overlapping patches with a spatial size of 15 × 15 as network inputs. The feature embedding dimension C is set to 64, and the group number G in the IISE module is set to 4. For training, the Adam optimizer is used with an initial learning rate of 0.001 and a batch size of 64. Training lasts for 150 epochs, and the model parameters at the end of the 150th epoch are directly used for final testing.
(3) To evaluate the performance of the proposed CF-Mamba, four categories of mainstream hyperspectral classification algorithms are selected as comparative baselines: SVM [2], representing traditional shallow learning methods; convolutional neural networks based on deep feature extraction, namely 2D-CNN and 3D-CNN [42]; Transformer-based methods utilizing self-attention mechanisms, including HSI-BERT [43], SF [44], CASST [26], and DCTN [45]; and Mamba-based models with selective scanning mechanisms, namely CenterMamba [34], S2Mamba [32], and 3DSS-Mamba [41].
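The three metrics in the setup above (OA, AA, and Kappa) can all be derived from a confusion matrix. The following sketch uses the standard definitions; the toy confusion matrix is invented purely for illustration and is unrelated to the reported results, and the function name is hypothetical.

```python
import numpy as np


def classification_metrics(confusion) -> tuple:
    """Return (OA, AA, Kappa) from a confusion matrix with rows = true classes."""
    conf = np.asarray(confusion, dtype=np.float64)
    total = conf.sum()
    oa = np.trace(conf) / total                     # correctly classified / all test pixels
    per_class = np.diag(conf) / conf.sum(axis=1)    # accuracy of each true class
    aa = per_class.mean()
    # Agreement expected by chance, from the row and column marginals.
    p_e = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2
    kappa = (oa - p_e) / (1.0 - p_e)
    return float(oa), float(aa), float(kappa)


# Toy imbalanced example: 100 pixels of class 0, 10 pixels of class 1.
toy = [[90, 10],
       [5, 5]]
oa, aa, kappa = classification_metrics(toy)
print(round(oa, 4), round(aa, 4), round(kappa, 4))  # 0.8636 0.7 0.3265
```

In this toy case OA is dominated by the majority class, while AA and Kappa expose the weak minority-class performance, which is the reason all three are reported together.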

4.3. Experimental Results

Indian Pines Dataset: As shown in Table 2 and Figure 5, CF-Mamba achieves the best performance with an OA of 97.77%, AA of 96.00%, and Kappa coefficient of 97.46%. Compared to advanced state-space-based methods 3DSS-Mamba, S2-Mamba, and CenterMamba, our method improves OA by 1.82%, 0.51%, and 1.07%, respectively, and improves AA by 4.92%, 1.72%, and 1.27%, respectively. These improvements are particularly significant given the challenges of low spatial resolution and severe mixed pixels in the Indian Pines dataset.
Notably, CF-Mamba demonstrates exceptional discriminative power across multiple agricultural categories with similar spectral features. Specifically, it achieves perfect classification accuracy of 100.00% on Class 4 (Corn), Class 7 (Grass-pasture-mowed), Class 8 (Hay-windrowed), Class 13 (Wheat), and Class 14 (Woods). These results fully validate the effectiveness of capturing continuous spectral evolution trends through the AHSE module, combined with extracting discrete fine-grained features via the IISE module. Although other SSM-based methods also achieve 100% accuracy on certain classes, CF-Mamba attains perfect classification on more classes while maintaining higher overall classification stability and average accuracy, further demonstrating the significant advantages of the continuous–discrete collaborative framework in addressing the challenges of “same object, different spectrum” and “different object, same spectrum.”
Houston 2013 Dataset: The classification results for the Houston 2013 dataset are summarized in Table 3. Despite the “ceiling effect” where most deep learning models exceed 95% accuracy, CF-Mamba further pushes the limit, increasing the OA to 99.06%. Compared to advanced state-space-based methods 3DSS-Mamba, S2-Mamba, and CenterMamba, our method achieves improvements of 2.99%, 0.96%, and 1.47% in OA, respectively. Detailed comparisons reveal that CF-Mamba significantly reduces confusion between different urban land cover categories. For example, in Class 12 (Parking Lot1) and Class 13 (Parking Lot2), which share similar geometric structures, our model achieves near-perfect accuracies of 99.97% and 99.32%, respectively, substantially outperforming the baseline 3DSS-Mamba (98.06% and 96.15%) as well as other SSM-based methods. This robustness in complex urban scenes validates the effectiveness of the Confluence Gating Unit (CGU). By utilizing continuous contextual information to provide consistency regularization for discrete features, the CGU effectively filters out shadow effects and texture fragmentation noise common in high-resolution urban imagery, ensuring that visually similar objects are accurately distinguished based on their intrinsic spatial–spectral consistency.
Pavia University Dataset: As shown in Figure 6 and Table 4, CF-Mamba demonstrated a significant advantage on the ROSIS dataset, achieving an OA of 99.68%, nearly reaching the saturation limit for this benchmark. Compared to advanced state-space-based methods 3DSS-Mamba, S2-Mamba, and CenterMamba, our method achieves improvements of 0.79%, 0.83%, and 2.38% in OA, respectively. The advantages were most pronounced in categories with narrow linear structures and fine-grained textures. For Class 2 (Meadows) and Class 8 (Self-blocking bricks), CF-Mamba achieved perfect or near-perfect accuracies of 100.00% and 99.85%, respectively, substantially outperforming Transformer-based models (e.g., SF, DCTN), which often suffer from boundary blurring due to resolution loss during patch embedding. In contrast, CF-Mamba preserved fine-grained discrete features through the IISE interaction mechanism and enhanced the sharpness of continuous features at object edges via the CGU “detail feedback” strategy, enabling pixel-level precision for small targets and linear objects. Furthermore, our model also achieved 100.00% accuracy on Class 6 (Bare soil) and Class 7 (Bitumen), further demonstrating its robustness in handling diverse material types.
WHU-Hi-Longkou Dataset: The results for the Longkou dataset are presented in Figure 7 and Table 5. Due to its extremely high spatial resolution (0.463 m) and rich texture features, most deep learning models exceed 98% OA on this dataset. CF-Mamba achieved the highest OA of 99.59%, although its margin over the strongest baselines was smaller than on the other datasets. Compared to advanced state-space-based methods 3DSS-Mamba, S2-Mamba, and CenterMamba, our method achieves improvements of 0.04%, 0.20%, and 0.86% in OA, respectively, while its AA (98.54%) is slightly lower than 3DSS-Mamba’s 98.90% but higher than those of S2-Mamba (97.69%) and CenterMamba (95.29%). This can be attributed to the highly concentrated distribution and strong anisotropic strip features of certain crops (e.g., Class 3 Sesame); while the AHSE module captured these structures, the complexity introduced by the dual-path mechanism might have led to slight overfitting on some small-sample fragmented objects. Nevertheless, CF-Mamba maintained near-perfect recognition rates for major classes such as Class 1 (Corn) and Class 4 (Broad-leaf soybean), with accuracies of 99.97% and 99.81%, respectively, indicating its robustness in high-resolution agricultural remote sensing scenarios.

4.4. Feature Visualization

To further qualitatively evaluate the feature representation capability of the proposed CF-Mamba, we utilize the t-distributed stochastic neighbor embedding (t-SNE) algorithm to visualize the high-dimensional features extracted by the network. Specifically, the output features from the final global average pooling layer are projected into a two-dimensional space. The visualization results for the Indian Pines dataset are presented in Figure 8.

As illustrated in Figure 8, the feature embeddings generated by CF-Mamba exhibit excellent discriminative properties. Data points belonging to the same land-cover categories are tightly clustered together, demonstrating high intra-class compactness. Meanwhile, different categories are well separated with distinct boundaries, indicating strong inter-class separability. This visualization intuitively verifies that the proposed dual-path collaborative framework—incorporating the continuous modeling path (AHSE) and the discrete interaction path (IISE)—can effectively decouple spectral–spatial redundancy and extract highly discriminative representations, thereby facilitating accurate hyperspectral image classification.

4.5. Parameter Analysis

To evaluate the robustness of the proposed CF-Mamba framework and validate the empirical settings used in our experiments, we conduct a detailed parameter sensitivity analysis. We specifically investigate the impact of two critical hyperparameters: the spatial patch size and the number of principal component analysis (PCA) dimensions. The experiments are conducted across all four datasets, and the results are illustrated in Figure 9.

Effect of Spatial Patch Size: The spatial patch size determines the receptive field for capturing local spatial context. As shown in Figure 9a, we vary the patch size from 9 to 21. For all four datasets, the Overall Accuracy (OA) initially increases as the patch size grows, peaking at a size of 15 (e.g., reaching 97.77% on Indian Pines and 99.68% on Pavia University). This upward trend indicates that a larger spatial neighborhood provides richer structural and contextual information, which is beneficial for the continuous modeling path (AHSE). However, when the patch size exceeds 15, the performance begins to degrade slightly. This drop is attributed to the inclusion of heterogeneous pixels from different classes (i.e., the smoothing effect) and the introduction of redundant spatial noise, which interferes with the classification of the central pixel. Therefore, a patch size of 13 to 15 provides the optimal balance.
Effect of PCA Dimensions: Hyperspectral images possess high spectral dimensionality with significant band correlation. The PCA dimensions dictate the amount of retained spectral information fed into the network. Figure 9b illustrates the model’s performance when the retained PCA dimensions range from 20 to 50. The OA curves demonstrate an inverted U-shape, achieving optimal performance at 35 dimensions across all benchmark datasets (e.g., 99.06% on Houston 2013 and 99.23% on WHU-Hi-Longkou). When the dimension is set too low (e.g., 20 or 25), critical discriminative spectral signatures are lost, leading to sub-optimal accuracy. Conversely, retaining too many dimensions (e.g., 45 or 50) not only increases the computational burden but also preserves redundant spectral noise, which hinders the interval decoupling process in the IISE module. Hence, setting the PCA dimension to 35 ensures sufficient information retention while effectively mitigating the curse of dimensionality.
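To make the two hyperparameters concrete, the preprocessing they control can be sketched in NumPy: PCA reduces the spectral axis to a fixed number of components, and a square patch is then cut around each labeled pixel. This is a minimal sketch under assumptions: the text does not state the border handling, so reflect padding is assumed here; the cube is random toy data; and the function names are hypothetical.

```python
import numpy as np


def pca_reduce(cube: np.ndarray, n_components: int) -> np.ndarray:
    """Project the spectral axis of an (H, W, B) cube onto its leading principal axes."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(np.float64)
    flat = flat - flat.mean(axis=0)                       # center each band
    _, _, vt = np.linalg.svd(flat, full_matrices=False)   # rows of vt = principal axes
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)


def extract_patch(cube: np.ndarray, row: int, col: int, patch_size: int) -> np.ndarray:
    """Return the patch_size x patch_size neighbourhood centered on (row, col).

    Border pixels are handled by reflect padding (an assumption).
    """
    half = patch_size // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + patch_size, col:col + patch_size, :]


rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 50))           # toy cube with 50 bands
reduced = pca_reduce(cube, n_components=35)    # 35 components, as in the setup
patch = extract_patch(reduced, row=0, col=0, patch_size=15)
print(reduced.shape, patch.shape)              # (20, 20, 35) (15, 15, 35)
```

A larger patch size widens the spatial context but also admits more pixels from neighbouring classes, while the number of PCA components sets how many spectral channels reach the embedding layer.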

Figure 9. Effect of key hyperparameters (spatial patch size and PCA dimensions) on the classification performance of CF-Mamba.

4.6. Ablation Studies

To comprehensively validate the contributions of the proposed components and address the physical rationality behind our architectural design, we conducted extensive ablation studies on the Indian Pines dataset. The variants and their corresponding performances are detailed in Table 6.

4.6.1. Internal Mechanisms of the Single-Path Encoders (AHSE & IISE)

We first investigated the internal sub-components of the continuous modeling path (AHSE) and the discrete interaction path (IISE) to verify their structural and physical necessity.

Effectiveness of Adaptive Routing in AHSE (ID 1 & ID 2): Replacing the content-aware adaptive routing with fixed average weights (ID 2) resulted in a 3.44% drop in OA. From a physical perspective, different ground objects exhibit varying spatial–spectral structural dependencies (e.g., roads possess strong spatial directionality, while vegetation heavily relies on continuous spectral waveforms). The adaptive routing ensures the model dynamically assigns higher weights to the most relevant scanning perspective, rather than mechanically averaging them, thereby better capturing the anisotropic continuous evolution.
Interval Sampling & Contiguous Clustering in IISE (ID 3 & ID 5): To validate the physical basis of our interval sampling strategy, we replaced it with traditional contiguous band clustering (ID 5). This change caused a drastic performance degradation, with OA plummeting by 3.33%. In hyperspectral physics, adjacent spectral bands are highly correlated and redundant. Contiguous clustering easily traps the model in localized, homogeneous information silos, blurring subtle spectral differences. Conversely, our sparse interval sampling strategy successfully preserves the holistic structural skeleton (representative spectral profile) of the ground objects while stripping away adjacent redundancy.
Necessity of Channel Shuffle (ID 3 & ID 4): Removing the cross-group shuffle mechanism led to a decrease in accuracy. This confirms that the parameter-free shuffle operation successfully acts as a bridge, breaking the “information silos” caused by hard discrete decoupling and ensuring global spectral interaction.

4.6.2. Synergy of Dual-Path and Superiority of CGU Fusion

We further evaluated the necessity of the dual-path architecture and compared our Confluence Gating Unit (CGU) against mainstream feature fusion paradigms.

Dual-Path & Single-Path: Compared to the single-stream models (ID 1 and ID 3), all dual-path variants (IDs 6–10) achieved significant performance gains. This proves that continuous sequential context and discrete fine-grained details are highly complementary in characterizing complex hyperspectral scenes.
Superiority of CGU over Alternative Fusions (IDs 6–10): We extensively compared CGU with Element-wise Sum, Concatenation, Dual-Attention (cross-attention mechanism), and Dual-GLU (Gated Linear Unit). While advanced mechanisms like GLU and Attention outperformed naive addition, the proposed CGU achieved the highest performance (OA 97.77%). The structural advantage of CGU lies in its “Bi-directional Cross-Modulation” design. Unlike conventional attention or GLU, which simply re-weight aggregated features, CGU explicitly uses continuous context as a regularizer to denoise fragmented discrete features, while using discrete textures to sharpen continuous boundaries. This bidirectional constraint alleviates the representation discrepancy between heterogeneous feature spaces.

4.7. Comprehensive Analysis of Computational Efficiency

To verify the practical feasibility of CF-Mamba, we conducted a comprehensive efficiency evaluation on the Indian Pines, Pavia University, Houston 2013, and WHU-Hi-Longkou datasets. Table 7 details the inference time, Floating Point Operations (FLOPs), and parameter counts (Params) of eight representative methods on the full test sets.

As shown in Table 7, Transformer-based architectures (such as HSI-BERT and SF) typically incur high computational costs due to the quadratic computational complexity of the global self-attention mechanism. For instance, on the Indian Pines dataset, HSI-BERT requires 11.32 s for inference and consumes 7.112 G FLOPs. In sharp contrast, benefiting from the linear complexity of the state space architecture, our CF-Mamba completes inference in just 1.68 s with only 0.115 G FLOPs. Furthermore, compared to other SSM-based methods (such as CenterMamba and S2-Mamba), CF-Mamba maintains a reasonable balance in inference time and computational cost while demonstrating highly competitive parameter efficiency. This significant reduction in computational overhead confirms the efficiency advantages of our framework when processing long spectral sequences.

Compared to the lightest SSM variants such as 3DSS-Mamba, our proposed CF-Mamba does exhibit an increase in parameters and FLOPs (e.g., 0.106 M parameters on the Houston 2013 dataset versus 0.011 M for 3DSS-Mamba). We acknowledge that CF-Mamba is not strictly the most ‘lightweight’ model among SSM-based variants. However, this is an intended and acceptable trade-off. The additional overhead introduced by the dual-path architecture is necessary to capture richer continuous–discrete complementary features, which significantly enhances the model’s robustness and discriminative power in complex scenarios.

5. Conclusions and Future Developments

To address the challenges in hyperspectral image classification—specifically the difficulty in modeling spectral continuity, inadequate decoupling of high-dimensional redundancy, and conflicts in cross-representation feature fusion—this paper innovatively proposes a continuous–discrete collaborative framework based on the Mamba architecture, named CF-Mamba. Breaking the single-view limitations of traditional sequence modeling, this method achieves a breakthrough in classification performance through three key contributions:

By constructing the Adaptive Holographic Spectral Encoder (AHSE), the model introduces a multi-view dynamic routing mechanism. This addresses the spatial–spectral distribution anisotropy of HSI data, achieving adaptive focusing on key discriminative continuous evolutionary features while maintaining sequence-level long-range dependencies.

The Interactive Interval Spectral Encoder (IISE) extracts discrete fine-grained features under a very low computational load through interval feature decoupling and cross-group channel shuffle, effectively overcoming the contradiction between spectral redundancy and feature fragmentation.

The proposed Confluence Gating Unit (CGU) utilizes a bidirectional cross-modulation strategy to achieve deep alignment and complementary enhancement of continuous context and discrete features, mitigating the representation discrepancy that arises during multi-source feature fusion.

Extensive experimental results demonstrate that CF-Mamba achieves state-of-the-art classification accuracy on the Indian Pines, Houston 2013, Pavia University, and WHU-Hi-Longkou datasets (with OAs of 97.77%, 99.06%, 99.68%, and 99.59%, respectively) and possesses significant computational efficiency advantages compared to Transformer-based methods (such as HSI-BERT).

Despite the superior performance demonstrated by CF-Mamba, future research could further explore the following directions:

Multimodal Extension: Extending the dual-path architecture to multi-source (e.g., LiDAR, SAR) or multi-temporal remote sensing data fusion to explore complementary mechanisms among different data modalities within a continuous evolutionary space.
Lightweight Deployment: Targeting edge computing scenarios such as satellite-borne or UAV platforms, future work could combine model quantization and pruning techniques to further exploit the compression potential of the discrete decoupling path, thereby optimizing the memory footprint and inference speed of the Mamba architecture.
Physical Interpretability Enhancement: Combining frequency domain analysis or attention visualization techniques to deeply investigate the mapping relationship between the evolution of Mamba’s internal hidden states and the physical properties of ground objects (such as spectral absorption peaks), thereby enhancing the trustworthiness of the model.

Author Contributions

Conceptualization, Y.W. and G.C.; methodology, formal analysis, investigation, data curation, and writing—original draft preparation, Y.W.; validation, Y.W., B.S. and Y.Z.; resources, G.C. and Y.Z.; writing—review and editing, G.C., B.S. and Y.Z.; project administration, funding acquisition, G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20231456, and in part by the National Natural Science Foundation of China under Grant 62201282.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Peyghambari, S.; Zhang, Y. Hyperspectral remote sensing in lithological mapping, mineral exploration, and environmental geology: An updated review. J. Appl. Remote Sens. 2021, 15, 031501.
Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
Camps-Valls, G.; Tuia, D.; Bruzzone, L.; Benediktsson, J.A. Advances in hyperspectral image classification: Earth monitoring with statistical learning methods. IEEE Signal Process. Mag. 2014, 31, 45–54.
Gao, Y.; Li, W.; Wang, J.; Zhang, M.; Tao, R. Relationship learning from multisource images via spatial-spectral perception network. IEEE Trans. Image Process. 2024, 33, 3271–3284.
Ni, L.; Xu, H.; Zhou, X. Mineral identification and mapping by synthesis of hyperspectral VNIR/SWIR and multispectral TIR remotely sensed data with different classifiers. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3155–3163.
Lin, L.; Chen, C.; Xu, T. Spatial-spectral hyperspectral image classification based on information measurement and CNN. EURASIP J. Wirel. Commun. Netw. 2020, 2020, 59.
Lee, H.; Kwon, H. Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans. Image Process. 2017, 26, 4843–4855.
Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634.
Yang, X.; Ye, Y.; Li, X.; Lau, R.Y.; Zhang, X.; Huang, X. Hyperspectral image classification with deep learning models. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5408–5423.
Vaddi, R.; Manoharan, P. Hyperspectral image classification using CNN with spectral and spatial features integration. Infrared Phys. Technol. 2020, 107, 103296.
Medus, L.D.; Saban, M.; Francés-Víllora, J.V.; Bataller-Mompeán, M.; Rosado-Muñoz, A. Hyperspectral image classification using CNN: Application to industrial food packaging. Food Control 2021, 125, 107962.
Bhatti, U.A.; Yu, Z.; Chanussot, J.; Zeeshan, Z.; Yuan, L.; Luo, W.; Nawaz, S.A.; Bhatti, M.A.; Ain, Q.U.; Mehmood, A. Local similarity-based spatial–spectral fusion hyperspectral image classification with deep CNN and Gabor filtering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5514215.
Fırat, H.; Asker, M.E.; Bayindir, M.İ.; Hanbay, D. Spatial-spectral classification of hyperspectral remote sensing images using 3D CNN based LeNet-5 architecture. Infrared Phys. Technol. 2022, 127, 104470.
Yu, C.; Han, R.; Song, M.; Liu, C.; Chang, C.-I. A simplified 2D-3D CNN architecture for hyperspectral image classification based on spatial–spectral fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2485–2501.
Ma, X.; Wang, W.; Li, W.; Wang, J.; Ren, G.; Ren, P.; Liu, B. An ultralightweight hybrid CNN based on redundancy removal for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5506212.
Shi, H.; Cao, G.; Zhang, Y.; Ge, Z.; Liu, Y.; Fu, P. H2A2Net: A hybrid convolution and hybrid resolution network with double attention for hyperspectral image classification. Remote Sens. 2022, 14, 4235.
Ge, Z.; Cao, G.; Shi, H.; Zhang, Y.; Li, X.; Fu, P. Compound multiscale weak dense network with hybrid attention for hyperspectral image classification. Remote Sens. 2021, 13, 3305.
He, X.; Chen, Y.; Lin, Z. Spatial-spectral transformer for hyperspectral image classification. Remote Sens. 2021, 13, 498.
Qing, Y.; Liu, W.; Feng, L.; Gao, W. Improved transformer net for hyperspectral image classification. Remote Sens. 2021, 13, 2216.
Mei, S.; Song, C.; Ma, M.; Xu, F. Hyperspectral image classification using group-aware hierarchical transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5539014.
Zhang, X.; Su, Y.; Gao, L.; Bruzzone, L.; Gu, X.; Tian, Q. A lightweight transformer network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5517617.
Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522214.
Zhang, B.; Chen, Y.; Rong, Y.; Xiong, S.; Lu, X. MATNet: A combining multi-attention and transformer network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5506015.
Zhang, J.; Meng, Z.; Zhao, F.; Liu, H.; Chang, Z. Convolution transformer mixer for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6014205.
Arshad, T.; Zhang, J.; Ullah, I. A hybrid convolution transformer for hyperspectral image classification. Eur. J. Remote Sens. 2024, 57, 2330979.
Peng, Y.; Zhang, Y.; Tu, B.; Li, Q.; Li, W. Spatial–spectral transformer with cross-attention for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5537415.
Zhao, Z.; Xu, X.; Li, S.; Plaza, A. Hyperspectral image classification using groupwise separable convolutional vision transformer network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5511817.
Jia, S.; Wang, Y.; Jiang, S.; He, R. A center-masked transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5510416.
Wang, C.; Huang, J.; Lv, M.; Du, H.; Wu, Y.; Qin, R. A local enhanced mamba network for hyperspectral image classification. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104092.
Ahmad, M.; Butt, M.H.F.; Khan, A.M.; Mazzara, M.; Distefano, S.; Usama, M.; Roy, S.K.; Chanussot, J.; Hong, D. Spatial–spectral morphological mamba for hyperspectral image classification. Neurocomputing 2025, 636, 129995.
Sheng, J.; Zhou, J.; Wang, J.; Ye, P.; Fan, J. DualMamba: A lightweight spectral–spatial mamba-convolution network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5501415.
Wang, G.; Zhang, X.; Peng, Z.; Zhang, T.; Jiao, L. S2Mamba: A spatial–spectral state space model for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5511413.
Ahmad, M.; Usama, M.; Mazzara, M.; Distefano, S. WaveMamba: Spatial-spectral wavelet mamba for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2025, 22, 5500505.
Zhang, T.; Xuan, C.; Cheng, F.; Tang, Z.; Gao, X.; Song, Y. CenterMamba: Enhancing semantic representation with center-scan Mamba network for hyperspectral image classification. Expert Syst. Appl. 2025, 287, 127985.
Huang, L.; Chen, Y.; He, X. Spectral-spatial mamba for hyperspectral image classification. Remote Sens. 2024, 16, 2449.
Shi, H.; Cao, G.; Ge, Z.; Zhang, Y.; Fu, P. Double-branch network with pyramidal convolution and iterative attention for hyperspectral image classification. Remote Sens. 2021, 13, 1403.
Zhuang, P.; Zhang, X.; Wang, H.; Zhang, T.; Liu, L.; Li, J. FAHM: Frequency-aware hierarchical mamba for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6299–6313.
Gu, A.; Dao, T.; Ermon, S.; Rudra, A.; Ré, C. Hippo: Recurrent memory with optimal polynomial projections. Adv. Neural Inf. Process. Syst. 2020, 33, 1474–1487.
Gu, A.; Goel, K.; Ré, C. Efficiently modeling long sequences with structured state spaces. In Proceedings of the 10th International Conference on Learning Representations (ICLR), Online, 25–29 April 2022; Available online: https://openreview.net/forum?id=uYLFoz1vlAC (accessed on 15 March 2026).
Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2024, arXiv:2312.00752.
He, Y.; Tu, B.; Liu, B.; Li, J.; Plaza, A. 3DSS-mamba: 3D-spectral–spatial mamba for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5534216.
Feng, F.; Wang, S.; Wang, C.; Zhang, J. Learning deep hierarchical spatial–spectral features for hyperspectral image classification based on residual 3D-2D CNN. Sensors 2019, 19, 5276.
He, J.; Zhao, L.; Yang, H.; Zhang, M.; Li, W. HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers. IEEE Trans. Geosci. Remote Sens. 2019, 58, 165–178.
Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5518615.
Zhou, Y.; Huang, X.; Yang, X.; Peng, J.; Ban, Y. DCTN: Dualbranch convolutional transformer network with efficient interactive self-attention for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5508616.

Figure 1. Overall architecture of the proposed CF-Mamba, a dual-path collaborative framework for HSI classification.

Figure 2. Detailed Structure and Multi-view Scanning Mechanism of the Adaptive HoloSpectral Encoder (AHSE).

Figure 3. Architecture of the Interactive Interval Spectral Encoder (IISE).

Figure 5. Classification maps of different methods on the Indian Pines dataset. (a) Ground Truth; (b) SVM; (c) 2DCNN; (d) 3DCNN; (e) SF; (f) DCTN; (g) CASST; (h) HSI-BERT; (i) CenterMamba; (j) S2Mamba; (k) 3DSS-Mamba; (l) Ours.

Figure 6. Classification maps of different methods on the Pavia University dataset. (a) Ground Truth; (b) SVM; (c) 2DCNN; (d) 3DCNN; (e) SF; (f) DCTN; (g) CASST; (h) HSI-BERT; (i) CenterMamba; (j) S2Mamba; (k) 3DSS-Mamba; (l) Ours.

Figure 7. Classification maps of different methods on the WHU-Hi-LongKou dataset. (a) Ground Truth; (b) SVM; (c) 2DCNN; (d) 3DCNN; (e) SF; (f) DCTN; (g) CASST; (h) HSI-BERT; (i) CenterMamba; (j) S2Mamba; (k) 3DSS-Mamba; (l) Ours.

Figure 8. t-SNE visualization of the features extracted by the proposed CF-Mamba on the IP dataset.

Table 1. Category Names and Numbers of Training and Test Samples per Class on the Houston 2013, Indian Pines, Pavia University, and WHU-Hi-LongKou Datasets.

Houston 2013				Indian Pines			Pavia University			WHU-Hi-LongKou
No.	Category	Train	Test	Category	Train	Test	Category	Train	Test	Category	Train	Test
1	Healthy grass	125	1126	Alfalfa	5	41	Asphalt	332	6299	Corn	345	34,166
2	Stressed grass	125	1129	Corn-notill	143	1285	Meadows	932	17,717	Cotton	84	8290
3	Synthetic grass	70	627	Corn-mintill	83	747	Gravel	105	1994	Sesame	30	3001
4	Trees	124	1120	Corn	24	213	Trees	153	2911	Broad-leaf soybean	632	62,580
5	Soil	124	1118	Grass-pasture	48	435	Painted metal sheets	67	1278	Narrow-leaf soybean	42	4109
6	Water	33	292	Grass-trees	73	657	Bare soil	251	4778	Rice	119	11,735
7	Residential	127	1141	Grass-pasture-mowed	3	25	Bitumen	67	1263	Water	671	66,385
8	Commercial	124	1120	Hay-windrowed	48	430	Self-blocking bricks	184	3498	Roads and houses	71	7053
9	Road	125	1127	Oats	2	18	Shadows	47	900	Mixed weed	52	5177
10	Highway	123	1104	Soybean-notill	97	875
11	Railway	123	1112	Soybean-mintill	245	2210
12	Parking Lot1	123	1110	Soybean-clean	59	534
13	Parking Lot2	47	422	Wheat	20	185
14	Tennis Court	43	385	Woods	126	1139
15	Running Track	66	594	Buildings	39	347
16				Stone	9	84
	Total	1502	13,527	Total	1024	9225	Total	2138	40,638	Total	2046	202,496
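
As a quick consistency check on the split sizes, the per-class counts in Table 1 can be summed and converted into a training fraction per dataset. The snippet below is only an illustrative sketch: the counts are transcribed from the table, and the helper name `split_summary` is hypothetical.

```python
# Per-class (training, test) counts transcribed from Table 1.
SPLITS = {
    "Houston 2013": (
        [125, 125, 70, 124, 124, 33, 127, 124, 125, 123, 123, 123, 47, 43, 66],
        [1126, 1129, 627, 1120, 1118, 292, 1141, 1120, 1127, 1104, 1112, 1110,
         422, 385, 594],
    ),
    "Indian Pines": (
        [5, 143, 83, 24, 48, 73, 3, 48, 2, 97, 245, 59, 20, 126, 39, 9],
        [41, 1285, 747, 213, 435, 657, 25, 430, 18, 875, 2210, 534, 185, 1139,
         347, 84],
    ),
    "Pavia University": (
        [332, 932, 105, 153, 67, 251, 67, 184, 47],
        [6299, 17717, 1994, 2911, 1278, 4778, 1263, 3498, 900],
    ),
    "WHU-Hi-LongKou": (
        [345, 84, 30, 632, 42, 119, 671, 71, 52],
        [34166, 8290, 3001, 62580, 4109, 11735, 66385, 7053, 5177],
    ),
}


def split_summary(train_counts, test_counts):
    """Return (n_train, n_test, training fraction) for one dataset."""
    n_train, n_test = sum(train_counts), sum(test_counts)
    return n_train, n_test, n_train / (n_train + n_test)


for name, (train_counts, test_counts) in SPLITS.items():
    n_train, n_test, fraction = split_summary(train_counts, test_counts)
    print(f"{name}: train={n_train}, test={n_test}, train fraction={fraction:.1%}")
```

Summed this way, the training fractions come out at roughly 10% for Houston 2013 and Indian Pines, 5% for Pavia University, and 1% for WHU-Hi-LongKou, whose per-class test counts add up to 202,496.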

Table 2. Comparative Classification Performance on the Indian Pines Dataset, Including Overall Accuracy (OA), Average Accuracy (AA), Kappa Coefficient (κ), and Per-Class Accuracies. The Highest Values Are Highlighted in Bold.

Class	SVM	2DCNN	3DCNN	SF	DCTN	CASST	HSI-BERT	CenterMamba	S2Mamba	3DSS-Mamba	Ours
1	39.02 ± 10.54	89.47 ± 1.42	100.00 ± 0.00	100.00 ± 0.00	92.86 ± 1.39	100.00 ± 0.00	100.00 ± 0.00	97.61 ± 1.91	64.54 ± 7.59	100.00 ± 0.00	97.56 ± 1.41
2	74.01 ± 11.68	91.59 ± 1.38	79.37 ± 4.73	73.24 ± 5.99	84.58 ± 5.39	100.00 ± 0.00	68.53 ± 4.39	95.35 ± 2.28	94.62 ± 1.98	92.66 ± 1.98	93.61 ± 1.34
3	70.15 ± 2.43	85.84 ± 4.75	96.99 ± 1.67	95.24 ± 1.77	91.16 ± 2.28	92.86 ± 2.54	65.06 ± 7.09	99.47 ± 0.11	98.73 ± 0.90	93.98 ± 2.39	96.92 ± 0.76
4	51.17 ± 17.30	96.84 ± 1.43	82.98 ± 2.43	91.67 ± 3.97	97.18 ± 0.41	91.67 ± 1.34	100.00 ± 0.00	94.03 ± 3.27	93.77 ± 4.02	95.78 ± 1.73	100.0 ± 0.00
5	94.02 ± 2.72	94.82 ± 2.68	97.94 ± 0.29	100.00 ± 0.00	95.17 ± 1.38	95.83 ± 0.85	91.67 ± 1.07	98.42 ± 0.54	99.34 ± 0.06	96.91 ± 0.90	98.62 ± 0.86
6	96.96 ± 1.90	98.63 ± 0.46	99.32 ± 0.22	100.00 ± 0.00	94.98 ± 1.39	100.00 ± 0.00	100.00 ± 0.00	98.80 ± 0.20	98.41 ± 0.26	97.95 ± 1.08	98.93 ± 0.14
7	72.00 ± 6.79	100.00 ± 0.00	80.00 ± 4.77	100.00 ± 0.00	87.50 ± 3.28	100.00 ± 0.00	100.00 ± 0.00	80.76 ± 3.12	100.0 ± 0.00	80.00 ± 5.40	100.0 ± 0.00
8	95.18 ± 1.63	99.48 ± 0.12	98.96 ± 0.33	100.00 ± 0.00	98.60 ± 1.31	100.00 ± 0.00	100.00 ± 0.00	100.0 ± 0.00	100.0 ± 0.00	97.92 ± 1.06	100.0 ± 0.00
9	50.00 ± 7.63	87.50 ± 3.39	75.00 ± 2.23	100.00 ± 0.00	100.00 ± 0.00	100.00 ± 0.00	50.00 ± 9.63	86.22 ± 6.91	100.0 ± 0.00	25.00 ± 11.69	71.11 ± 4.63
10	73.71 ± 8.79	91.00 ± 3.75	90.72 ± 1.54	71.43 ± 5.75	90.07 ± 1.82	95.92 ± 0.49	92.78 ± 1.37	97.20 ± 1.37	96.96 ± 0.78	93.81 ± 1.49	96.57 ± 1.36
11	81.18 ± 1.54	95.01 ± 1.49	87.37 ± 4.48	96.75 ± 1.47	97.69 ± 2.05	98.37 ± 0.18	80.08 ± 2.19	94.64 ± 2.46	99.01 ± 0.43	98.17 ± 0.06	99.04 ± 0.26
12	61.42 ± 6.75	85.65 ± 2.08	91.60 ± 2.47	76.67 ± 5.43	89.89 ± 3.01	80.00 ± 1.40	83.05 ± 2.99	90.10 ± 3.42	92.36 ± 2.29	89.92 ± 4.62	95.13 ± 1.17
13	99.46 ± 0.28	95.12 ± 1.83	100.00 ± 0.00	100.00 ± 0.00	98.36 ± 0.02	100.0 ± 0.00	100.00 ± 0.00	100.0 ± 0.00	98.97 ± 1.11	100.00 ± 0.00	100.0 ± 0.00
14	93.24 ± 1.53	96.05 ± 0.49	98.81 ± 0.02	98.41 ± 0.17	97.37 ± 1.25	98.41 ± 0.41	98.41 ± 0.88	100.0 ± 0.00	99.50 ± 0.21	99.21 ± 0.14	100.0 ± 0.00
15	56.77 ± 12.66	100.00 ± 0.00	98.70 ± 0.57	100.00 ± 0.00	98.28 ± 0.41	100.00 ± 0.00	94.87 ± 1.73	98.30 ± 0.38	93.73 ± 3.28	96.10 ± 1.99	99.71 ± 0.05
16	89.29 ± 4.56	94.59 ± 1.63	94.74 ± 2.09	100.00 ± 0.00	96.43 ± 1.36	100.00 ± 0.00	100.00 ± 0.00	98.82 ± 0.61	88.63 ± 3.65	100.00 ± 0.00	98.80 ± 0.37
OA (%)	79.89 ± 5.52	93.66 ± 1.54	91.56 ± 2.50	90.64 ± 2.59	93.82 ± 2.58	96.88 ± 0.47	85.46 ± 4.49	96.70 ± 0.87	97.26 ± 0.79	95.95 ± 0.77	97.77 ± 0.82
AA (%)	74.89 ± 8.68	93.85 ± 2.96	92.03 ± 3.36	93.96 ± 1.99	94.38 ± 2.00	97.07 ± 0.17	89.03 ± 5.98	94.73 ± 3.58	94.28 ± 3.51	91.08 ± 4.76	96.00 ± 1.57
Kappa	77.03 ± 3.43	92.77 ± 1.61	90.42 ± 1.55	89.28 ± 4.66	92.93 ± 2.09	96.44 ± 1.28	83.59 ± 4.30	96.24 ± 1.27	96.88 ± 1.48	95.37 ± 1.29	97.46 ± 1.19
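
For readers unfamiliar with the three summary metrics reported in these tables, the sketch below computes OA, AA, and Cohen's kappa from a confusion matrix using their standard definitions. The toy matrix and the function name `classification_metrics` are illustrative assumptions, not data or code from the paper, and the function assumes every class has at least one test sample.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix.

    confusion[i][j] counts test samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(row[j] for row in confusion) for j in range(n_classes)]
    correct = sum(confusion[i][i] for i in range(n_classes))

    oa = correct / total  # fraction of all samples classified correctly
    aa = sum(confusion[i][i] / row_sums[i] for i in range(n_classes)) / n_classes
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa


# Toy three-class example with one small, harder class (the last row).
toy = [
    [50, 0, 0],
    [5, 40, 5],
    [0, 2, 8],
]
oa, aa, kappa = classification_metrics(toy)
print(f"OA={oa:.4f}, AA={aa:.4f}, kappa={kappa:.4f}")
```

Because AA weights every class equally, a small class with low accuracy lowers AA more than OA. In Table 2, for instance, the Oats class (18 test samples in Table 1) reaches 71.11% with the proposed method, which helps explain why its AA (96.00%) sits below its OA (97.77%).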

Table 3. Comparative Classification Performance on the Houston 2013 Dataset, Including Overall Accuracy (OA), Average Accuracy (AA), Kappa Coefficient (κ), and Per-Class Accuracies. The Highest Values Are Highlighted in Bold.

Class	SVM	2DCNN	3DCNN	SF	DCTN	CASST	HSI-BERT	CenterMamba	S2Mamba	3DSS-Mamba	Ours
1	87.08 ± 5.97	99.75 ± 0.85	99.33 ± 0.86	99.54 ± 0.01	93.87 ± 2.48	98.94 ± 1.04	99.20 ± 0.37	98.82 ± 0.18	98.40 ± 0.07	99.52 ± 0.20	99.13 ± 0.09
2	95.36 ± 1.47	82.45 ± 4.97	100.00 ± 0.00	90.89 ± 2.64	98.67 ± 0.04	100.00 ± 0.00	88.83 ± 2.86	98.82 ± 0.41	98.57 ± 0.76	90.91 ± 2.73	100.0 ± 0.00
3	97.16 ± 1.78	98.01 ± 0.71	100.00 ± 0.00	96.31 ± 0.36	99.04 ± 0.67	86.67 ± 5.70	95.22 ± 1.89	99.84 ± 0.11	100.00 ± 0.00	99.43 ± 0.38	99.73 ± 0.04
4	91.12 ± 2.65	92.21 ± 2.28	93.33 ± 2.25	91.95 ± 2.78	93.03 ± 2.89	98.40 ± 0.79	92.76 ± 2.94	96.78 ± 0.47	97.46 ± 1.81	93.73 ± 2.03	98.38 ± 0.91
5	90.73 ± 2.93	98.02 ± 0.74	95.30 ± 1.29	99.77 ± 0.04	99.46 ± 0.08	99.46 ± 0.37	99.20 ± 0.37	100.0 ± 0.00	100.00 ± 0.00	98.71 ± 0.01	100.0 ± 0.00
6	78.21 ± 6.49	90.05 ± 3.52	100.00 ± 0.00	91.23 ± 0.63	97.96 ± 1.37	97.96 ± 0.49	92.86 ± 2.38	98.05 ± 0.02	98.06 ± 0.96	95.06 ± 2.90	100.0 ± 0.00
7	82.49 ± 6.31	91.87 ± 2.76	88.16 ± 4.53	97.52 ± 0.26	96.05 ± 0.39	99.47 ± 0.38	90.00 ± 2.97	93.02 ± 2.90	96.60 ± 0.75	90.22 ± 2.38	98.12 ± 0.52
8	74.39 ± 5.47	88.88 ± 4.69	98.66 ± 0.27	94.71 ± 0.19	98.12 ± 0.17	92.51 ± 3.81	91.69 ± 2.93	94.92 ± 2.29	97.29 ± 0.48	96.78 ± 1.71	99.10 ± 0.19
9	87.00 ± 5.51	92.51 ± 1.51	95.33 ± 1.81	97.26 ± 0.97	96.54 ± 1.59	97.87 ± 0.58	98.14 ± 0.81	98.73 ± 0.37	96.46 ± 0.69	96.17 ± 1.86	99.08 ± 0.18
10	83.22 ± 3.98	92.73 ± 1.87	96.60 ± 1.89	94.42 ± 0.75	99.18 ± 0.09	94.02 ± 2.37	86.68 ± 4.58	98.97 ± 0.19	99.65 ± 0.14	95.11 ± 1.28	100.0 ± 0.00
11	80.04 ± 5.84	91.16 ± 2.61	99.32 ± 0.77	99.07 ± 0.78	97.04 ± 0.37	92.43 ± 2.58	100.00 ± 0.00	100.0 ± 0.00	99.91 ± 0.06	97.41 ± 1.39	100.0 ± 0.00
12	62.92 ± 10.72	93.76 ± 1.59	96.62 ± 0.66	86.57 ± 4.82	98.11 ± 0.27	98.38 ± 0.95	90.27 ± 2.48	96.41 ± 1.85	96.84 ± 0.73	98.06 ± 0.40	99.97 ± 0.02
13	88.34 ± 1.91	91.80 ± 2.51	92.86 ± 2.51	95.12 ± 0.84	95.04 ± 2.52	94.29 ± 1.93	95.74 ± 2.72	86.99 ± 6.49	91.03 ± 3.28	96.15 ± 1.59	99.32 ± 0.31
14	83.70 ± 4.79	97.48 ± 0.31	100.00 ± 0.00	100.00 ± 0.00	100.00 ± 0.00	100.00 ± 0.00	99.22 ± 0.01	100.0 ± 0.00	99.50 ± 0.11	99.53 ± 0.60	100.0 ± 0.00
15	94.89 ± 1.86	97.90 ± 1.56	98.73 ± 0.87	98.27 ± 0.91	99.49 ± 0.28	100.00 ± 0.00	100.00 ± 0.00	100.0 ± 0.00	100.0 ± 0.00	98.79 ± 0.08	100.0 ± 0.00
OA (%)	84.65 ± 4.24	92.92 ± 2.13	96.62 ± 1.35	95.42 ± 1.15	97.25 ± 0.79	96.81 ± 1.57	94.23 ± 1.93	97.59 ± 1.07	98.10 ± 0.61	96.07 ± 1.19	99.06 ± 0.09
AA (%)	85.11 ± 5.98	93.24 ± 3.65	96.95 ± 2.28	95.51 ± 2.59	97.44 ± 1.08	96.69 ± 1.65	94.65 ± 2.37	97.42 ± 1.68	97.98 ± 1.18	96.37 ± 1.70	98.65 ± 0.11
Kappa	83.41 ± 3.61	92.34 ± 2.99	96.34 ± 1.83	95.05 ± 1.71	97.03 ± 1.34	96.55 ± 1.26	93.77 ± 1.19	97.39 ± 1.18	97.94 ± 0.86	95.76 ± 1.51	98.60 ± 0.23

Table 4. Comparative Classification Performance on the Pavia University Dataset, Including Overall Accuracy (OA), Average Accuracy (AA), Kappa Coefficient (κ), and Per-Class Accuracies. The Highest Values Are Highlighted in Bold.

Class	SVM	2DCNN	3DCNN	SF	DCTN	CASST	HSI-BERT	CenterMamba	S2Mamba	3DSS-Mamba	Ours
1	93.11 ± 2.47	95.64 ± 1.77	99.49 ± 0.13	96.97 ± 1.20	96.37 ± 1.74	98.37 ± 0.16	97.71 ± 1.54	98.74 ± 0.56	98.69 ± 0.83	98.42 ± 0.67	99.12 ± 0.21
2	97.73 ± 0.64	98.93 ± 0.91	99.82 ± 0.31	97.06 ± 1.10	99.05 ± 0.17	98.98 ± 0.01	98.60 ± 0.90	99.95 ± 0.17	99.90 ± 0.02	99.42 ± 0.21	100.00 ± 0.00
3	66.65 ± 10.77	86.71 ± 3.76	96.86 ± 1.23	85.65 ± 4.71	92.63 ± 2.67	94.54 ± 1.21	90.53 ± 1.78	97.84 ± 0.61	97.64 ± 0.72	97.50 ± 1.67	98.09 ± 0.98
4	94.06 ± 1.55	97.32 ± 2.26	96.80 ± 1.74	97.11 ± 1.01	98.48 ± 0.71	99.17 ± 0.05	98.16 ± 1.11	91.27 ± 3.17	94.81 ± 0.31	99.67 ± 0.04	97.90 ± 0.21
5	97.73 ± 0.34	98.43 ± 0.03	99.85 ± 0.03	97.89 ± 1.07	97.93 ± 0.52	99.50 ± 0.07	99.65 ± 0.02	95.46 ± 2.19	97.57 ± 0.23	100.00 ± 0.00	99.84 ± 0.12
6	76.85 ± 14.23	85.13 ± 4.19	83.37 ± 4.98	97.09 ± 1.67	94.27 ± 2.11	98.89 ± 0.16	98.08 ± 0.14	89.36 ± 3.34	100.00 ± 0.00	99.60 ± 0.21	100.00 ± 0.00
7	74.82 ± 13.95	88.47 ± 1.53	91.43 ± 3.17	80.87 ± 4.41	88.55 ± 4.15	93.38 ± 2.32	85.85 ± 3.12	100.00 ± 0.00	100.00 ± 0.00	96.62 ± 1.31	100.00 ± 0.00
8	88.36 ± 6.96	92.91 ± 4.00	98.97 ± 0.41	94.23 ± 1.98	95.42 ± 2.11	97.39 ± 1.00	96.33 ± 1.98	97.51 ± 0.26	99.19 ± 0.01	96.33 ± 2.91	99.85 ± 0.02
9	99.33 ± 0.11	100.0 ± 0.00	100.0 ± 0.00	100.0 ± 0.00	99.88 ± 0.01	100.0 ± 0.00	99.88 ± 0.03	93.66 ± 3.11	88.00 ± 4.21	100.0 ± 0.00	96.00 ± 2.12
OA (%)	91.29 ± 1.52	95.25 ± 4.93	97.14 ± 0.40	95.84 ± 1.87	97.06 ± 1.21	98.40 ± 0.15	97.44 ± 0.59	97.30 ± 1.70	98.85 ± 0.71	98.89 ± 0.41	99.68 ± 0.21
AA (%)	87.63 ± 5.09	93.73 ± 1.33	96.29 ± 0.69	94.10 ± 2.14	95.84 ± 2.74	97.80 ± 0.64	96.09 ± 2.17	95.98 ± 2.28	97.31 ± 1.21	98.62 ± 0.93	98.98 ± 1.39
Kappa	88.36 ± 1.35	93.68 ± 2.10	96.18 ± 0.41	94.50 ± 2.51	96.10 ± 1.71	97.88 ± 0.31	96.61 ± 1.41	96.40 ± 2.43	98.48 ± 0.32	98.53 ± 0.91	99.35 ± 0.76

Table 5. Comparative Classification Performance on the WHU-Hi-LongKou Dataset, Including Overall Accuracy (OA), Average Accuracy (AA), Kappa Coefficient (κ), and Per-Class Accuracies. The Highest Values Are Highlighted in Bold.

Class	SVM	2DCNN	3DCNN	SF	DCTN	CASST	HSI-BERT	CenterMamba	S2Mamba	3DSS-Mamba	Ours
1	97.01 ± 3.61	99.81 ± 0.12	97.45 ± 1.31	99.28 ± 1.87	99.06 ± 0.81	99.59 ± 0.13	98.73 ± 0.08	99.40 ± 0.21	99.97 ± 0.02	99.60 ± 0.21	99.97 ± 0.01
2	79.64 ± 10.30	99.83 ± 0.76	97.64 ± 2.36	98.68 ± 1.21	99.10 ± 0.17	99.82 ± 0.04	97.15 ± 1.21	98.89 ± 0.37	99.37 ± 0.31	99.70 ± 0.26	99.77 ± 0.18
3	69.73 ± 11.71	91.69 ± 1.61	75.34 ± 10.58	59.26 ± 9.31	83.13 ± 5.35	94.72 ± 1.78	71.95 ± 4.87	93.60 ± 1.65	95.93 ± 1.51	97.19 ± 1.56	96.53 ± 1.09
4	97.41 ± 1.31	93.32 ± 4.39	98.41 ± 2.21	98.79 ± 1.40	98.17 ± 1.10	99.30 ± 0.44	99.22 ± 0.21	99.86 ± 0.07	99.93 ± 0.03	99.42 ± 0.44	99.81 ± 0.01
5	63.92 ± 18.60	51.62 ± 10.30	92.16 ± 5.72	91.68 ± 3.44	99.61 ± 0.13	99.28 ± 0.31	96.58 ± 1.98	93.41 ± 1.87	97.34 ± 2.33	96.93 ± 1.76	99.05 ± 0.41
6	96.30 ± 0.98	98.94 ± 0.10	91.89 ± 6.98	97.34 ± 2.61	99.80 ± 0.04	99.16 ± 0.12	99.04 ± 0.21	97.66 ± 0.91	99.03 ± 0.31	99.96 ± 0.02	99.82 ± 0.06
7	99.93 ± 0.01	99.95 ± 0.01	99.98 ± 0.01	99.97 ± 0.02	99.96 ± 0.01	99.99 ± 0.01	99.96 ± 0.02	99.97 ± 0.02	99.94 ± 0.01	99.99 ± 0.00	99.94 ± 0.01
8	86.60 ± 4.30	82.22 ± 9.51	83.98 ± 5.62	98.32 ± 1.01	97.79 ± 1.41	94.53 ± 1.21	97.88 ± 1.56	92.48 ± 3.21	94.82 ± 2.21	98.88 ± 1.09	96.73 ± 1.41
9	73.61 ± 11.09	96.58 ± 0.89	90.48 ± 2.39	96.01 ± 2.07	98.09 ± 0.91	97.23 ± 0.98	97.50 ± 0.42	82.37 ± 5.61	92.91 ± 3.87	98.42 ± 1.02	95.24 ± 0.61
OA (%)	95.30 ± 1.30	96.00 ± 0.99	96.07 ± 1.91	98.35 ± 0.43	98.83 ± 0.41	99.30 ± 0.14	98.74 ± 0.87	98.73 ± 0.41	99.39 ± 0.21	99.55 ± 0.21	99.59 ± 0.05
AA (%)	84.90 ± 5.29	90.44 ± 4.12	83.55 ± 5.31	93.26 ± 5.31	97.19 ± 1.09	98.18 ± 0.51	95.33 ± 1.90	95.29 ± 2.76	97.69 ± 1.35	98.90 ± 0.51	98.54 ± 0.61
Kappa	93.79 ± 1.42	94.78 ± 1.37	94.80 ± 2.31	97.83 ± 0.00	98.47 ± 0.49	99.08 ± 0.19	98.34 ± 0.20	98.32 ± 0.51	99.20 ± 0.41	99.41 ± 0.23	99.46 ± 0.12

Table 6. Ablation Study of Key Components on the Indian Pines Dataset.

ID	Model Variant	AHSE	IISE	Fusion Strategy	OA (%)	AA (%)	Kappa (%)
1	AHSE-Only	✓	✗	-	93.85	89.12	92.95
2	AHSE (Fixed Weights)	✓	✗	-	90.41	87.74	89.68
3	IISE-Only	✗	✓	-	94.62	90.55	93.81
4	IISE (no Shuffle)	✗	✓	-	94.15	91.32	93.20
5	IISE (Contiguous)	✗	✓	-	91.29	88.50	90.15
6	Dual-Sum	✓	✓	Element-wise Sum	96.15	92.40	95.58
7	Dual-Concat	✓	✓	Concatenation	96.48	92.95	95.92
8	Dual-GLU	✓	✓	Dual-GLU	97.13	93.85	96.62
9	Dual-Attention	✓	✓	Dual-Attention	96.65	93.10	96.15
10	CF-Mamba	✓	✓	CGU	97.77	96.00	97.46
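
Rows 3 to 5 of Table 6 compare interval sampling with contiguous band grouping, and with and without channel shuffling. The sketch below illustrates what these grouping schemes could look like on a list of band indices. It assumes strided slicing for interval sampling and a ShuffleNet-style interleave for the shuffle; the function names are hypothetical and the paper's IISE may differ in detail. The number of bands is assumed to be divisible by the number of groups.

```python
def contiguous_groups(bands, n_groups):
    """Split bands into consecutive blocks of neighbouring wavelengths."""
    size = len(bands) // n_groups
    return [bands[g * size:(g + 1) * size] for g in range(n_groups)]


def interval_groups(bands, n_groups):
    """Assign every n_groups-th band to the same group (strided sampling)."""
    return [bands[g::n_groups] for g in range(n_groups)]


def channel_shuffle(groups):
    """Interleave equally sized groups so each output group mixes all input groups."""
    n_groups = len(groups)
    flat = [item for column in zip(*groups) for item in column]
    size = len(flat) // n_groups
    return [flat[g * size:(g + 1) * size] for g in range(n_groups)]


bands = list(range(12))
print(contiguous_groups(bands, 3))  # neighbouring, highly correlated bands together
print(interval_groups(bands, 3))    # each group spans the whole spectral range

# Features labelled by the group that produced them (a, b, c).
features = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
print(channel_shuffle(features))    # every output group now holds a, b, and c
```

In Table 6, the variant labelled IISE (Contiguous) drops to 91.29% OA and the variant labelled IISE (no Shuffle) to 94.15%, compared with 94.62% for the IISE-only model.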

Table 7. Computational Efficiency Analysis of Different Methods Across Four Datasets.

Dataset	Metric	3DCNN	HSI-BERT	SF	DCTN	CenterMamba	S2Mamba	3DSS-Mamba	CF-Mamba
Indian Pines	Testing time (s)	0.21	11.32	4.51	2.15	1.45	1.95	1.25	1.68
	FLOPs (G)	0.002	7.112	0.224	0.207	0.072	0.142	0.023	0.115
	Params (M)	0.016	0.413	0.196	11.215	0.195	0.158	0.011	0.124
Pavia University	Testing time (s)	0.55	30.85	8.14	10.94	5.15	7.65	4.10	6.82
	FLOPs (G)	0.001	5.448	0.212	0.273	0.049	0.185	0.014	0.131
	Params (M)	0.007	0.309	0.226	11.208	0.163	0.235	0.010	0.192
Houston 2013	Testing time (s)	0.32	8.963	4.41	5.16	3.11	4.15	2.83	3.35
	FLOPs (G)	0.002	6.235	0.171	0.235	0.069	0.182	0.023	0.145
	Params (M)	0.012	0.393	0.213	11.212	0.153	0.145	0.011	0.106
WHU-Hi-LongKou	Testing time (s)	1.97	110.44	29.14	39.17	19.76	28.30	14.68	24.43
	FLOPs (G)	0.001	7.954	0.309	0.399	0.065	0.235	0.020	0.191
	Params (M)	0.005	0.213	0.156	7.734	0.163	0.178	0.007	0.132
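
To put the efficiency comparison with Transformer-based models in concrete terms, the ratios between HSI-BERT and CF-Mamba can be computed directly from Table 7. The snippet below is an illustrative sketch: the values are transcribed from the table, and the helper name `reduction_factors` is hypothetical.

```python
# (testing time in s, FLOPs in G, parameters in M), transcribed from Table 7.
TABLE7 = {
    "Indian Pines": {"HSI-BERT": (11.32, 7.112, 0.413), "CF-Mamba": (1.68, 0.115, 0.124)},
    "Pavia University": {"HSI-BERT": (30.85, 5.448, 0.309), "CF-Mamba": (6.82, 0.131, 0.192)},
    "Houston 2013": {"HSI-BERT": (8.963, 6.235, 0.393), "CF-Mamba": (3.35, 0.145, 0.106)},
    "WHU-Hi-LongKou": {"HSI-BERT": (110.44, 7.954, 0.213), "CF-Mamba": (24.43, 0.191, 0.132)},
}


def reduction_factors(baseline, ours):
    """Return baseline / ours for each metric (values above 1 favour `ours`)."""
    return tuple(b / o for b, o in zip(baseline, ours))


factors = {
    dataset: reduction_factors(rows["HSI-BERT"], rows["CF-Mamba"])
    for dataset, rows in TABLE7.items()
}
for dataset, (time_x, flops_x, params_x) in factors.items():
    print(f"{dataset}: time x{time_x:.2f}, FLOPs x{flops_x:.1f}, params x{params_x:.2f}")
```

On these figures HSI-BERT needs roughly 2.7 to 6.7 times the testing time and about 42 to 62 times the FLOPs of CF-Mamba. Table 7 also shows that the 3D CNN, CenterMamba, and 3DSS-Mamba baselines have lower testing times than CF-Mamba, so the advantage applies to the Transformer-based baselines rather than to every compared method.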

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Y.; Cao, G.; Shi, B.; Zhang, Y. CF-Mamba: A Dual-Path Collaborative Method for Hyperspectral Image Classification. Remote Sens. 2026, 18, 1063. https://doi.org/10.3390/rs18071063

