Article

SSM-Net: Enhancing Compressed Sensing Image Reconstruction with Mamba Architecture and Fast Iterative Shrinking Threshold Algorithm Optimization

Beijing Electronic Science and Technology Institute, Beijing 100070, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(4), 1026; https://doi.org/10.3390/s25041026
Submission received: 5 January 2025 / Revised: 5 February 2025 / Accepted: 7 February 2025 / Published: 9 February 2025
(This article belongs to the Section Sensing and Imaging)

Abstract

Compressed sensing (CS) is a powerful technique that can reduce data size while maintaining high reconstruction quality, which makes it particularly valuable in high-dimensional image applications. However, many existing methods have difficulty balancing reconstruction accuracy, computational efficiency, and fast convergence. To address these challenges, this paper proposes SSM-Net, a novel framework that combines the state-space modeling (SSM) of the Mamba architecture with the fast iterative shrinkage-thresholding algorithm (FISTA). The Mamba-based SSM module effectively captures local and global dependencies with linear computational complexity, significantly reducing computation time compared to Transformer-based methods. In addition, the momentum update inspired by FISTA improves convergence speed during deep iterative reconstruction. SSM-Net features a lightweight sampling module for efficient data compression, an initial reconstruction module for fast approximation, and a deep reconstruction module for iterative refinement. Extensive experiments on various benchmark datasets show that SSM-Net achieves state-of-the-art reconstruction performance while reducing both training and inference time, making it a scalable and practical solution for real-time compressed sensing applications.

1. Introduction

Compressed sensing (CS) has gained prominence in signal processing due to its ability to reconstruct high-dimensional signals from significantly fewer measurements [1]. This technique proves especially effective in scenarios where resources for data acquisition, such as bandwidth and storage, are constrained [2,3,4,5,6,7]. By exploiting the inherent sparsity or compressibility of signals, CS offers a structured approach to signal recovery applicable across various fields, including medical imaging, telecommunications, and industrial automation.
CS works in two main steps. First, during the sampling phase, a high-dimensional signal $\mathbf{x} \in \mathbb{R}^n$ is projected into a lower-dimensional measurement $\mathbf{y} \in \mathbb{R}^m$ (where $m \ll n$) using a sensing matrix $\Phi$:
$$\mathbf{y} = \Phi\mathbf{x} + \eta,$$
where $\eta$ represents noise. In the second step, the goal is to recover $\mathbf{x}$ by solving an optimization problem that balances the accuracy of the data and the sparsity of the signal:
$$\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \Phi\mathbf{x}\|_2^2 + \lambda\|\Psi\mathbf{x}\|_1.$$
Here, $\Psi$ represents the transform basis, and $\lambda$ controls the trade-off between reconstruction accuracy and sparsity.
Researchers have developed numerous algorithms to solve this problem. Methods like Orthogonal Matching Pursuit (OMP) [8] and the Iterative Shrinkage-Thresholding Algorithm (ISTA) [9] address the optimization efficiently in many cases. However, computational limitations and reduced accuracy at low sampling rates remain persistent challenges. Algorithms such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [10] incorporate momentum-based updates to improve convergence speed, significantly enhancing scalability in large-scale settings.
The integration of deep learning with CS has introduced new possibilities. Hybrid frameworks combine traditional optimization methods with neural network-based models to achieve better performance. ISTA-Net [11], for example, adapts ISTA into a trainable neural network, while ADMM-CSNet [12] integrates alternating direction method of multipliers (ADMM) with deep learning for faster and more accurate results. Transformer-based approaches like TransCS [13] utilize attention mechanisms to capture global dependencies effectively. Additionally, architectures based on generative adversarial networks (GANs) [14,15] have been employed to enhance image resolution and recover fine details.
Despite these advancements, computational complexity and trade-offs between speed and accuracy present ongoing difficulties. Transformer-based models, though capable of modeling global features, exhibit quadratic complexity in self-attention mechanisms, limiting their use in real-time applications. Lightweight designs, such as CSNet [16], improve processing speed but often compromise reconstruction quality.
Alternative approaches have emerged to address these shortcomings. The Mamba architecture introduces a selective state-space model that captures local and global dependencies efficiently. Unlike Transformers, which rely on self-attention mechanisms with quadratic complexity $O(n^2)$, Mamba achieves linear complexity $O(n)$ through a more computationally efficient design. This approach reduces processing overhead and enhances scalability, making it well suited for high-dimensional image data.
Inspired by Mamba’s efficient dependency modeling, this work combines its state-space modeling with the momentum-based optimization of FISTA. The resulting framework achieves faster convergence, improved accuracy, and enhanced adaptability across various reconstruction scenarios.
This study makes the following contributions:
  • A novel compressive sensing framework is proposed, integrating Mamba’s state-space modeling and FISTA’s optimization. This design removes the dependence on manually designed components such as hand-crafted sensing matrices.
  • Improvements in computational efficiency and reconstruction accuracy are demonstrated, with reduced training and inference times.
  • Extensive evaluations across multiple datasets highlight the framework’s adaptability and robustness under varying sampling rates and noise levels.
The paper proceeds as follows: Section 2 examines related works in CS and hybrid frameworks. Section 3 describes the proposed framework in detail. Section 4 presents experimental results and comparisons. Section 5 concludes with an outlook on future research.

2. Related Work

2.1. FISTA

FISTA is an advanced algorithm for solving large-scale optimization problems, particularly in compressive sensing applications. It builds on the traditional Iterative Shrinkage-Thresholding Algorithm (ISTA) by adding momentum-based acceleration, while keeping the simplicity of first-order methods.
The algorithm solves optimization problems of the following form:
$$\min_{\mathbf{x}} F(\mathbf{x}) = f(\mathbf{x}) + g(\mathbf{x}),$$
where $f(\mathbf{x})$ is a smooth convex function with a Lipschitz continuous gradient, and $g(\mathbf{x})$ is a convex but possibly non-smooth regularization term. In compressive sensing, this becomes
$$\min_{\mathbf{x}} \frac{1}{2}\|\Phi\mathbf{x} - \mathbf{y}\|_2^2 + \lambda\|\mathbf{x}\|_1,$$
where $\Phi$ is the measurement matrix, $\mathbf{y}$ represents the compressed measurements, and $\lambda$ controls the sparsity of the solution.
The key innovation of FISTA is its momentum-based acceleration. In each iteration, the algorithm keeps track of an auxiliary sequence that adds momentum from previous iterations, leading to the following update rules:
$$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2},$$
$$\mathbf{z}_k = \mathbf{x}_k + \frac{t_k - 1}{t_{k+1}}(\mathbf{x}_k - \mathbf{x}_{k-1}).$$
Next, the algorithm performs a proximal gradient step to adjust the solution:
$$\mathbf{x}_{k+1} = \arg\min_{\mathbf{x}} \; \lambda g(\mathbf{x}) + \frac{L}{2}\left\|\mathbf{x} - \left(\mathbf{z}_k - \frac{1}{L}\nabla f(\mathbf{z}_k)\right)\right\|_2^2,$$
where $L$ is the Lipschitz constant of $\nabla f$. For the common $\ell_1$ regularization used in compressive sensing, this step simplifies to a soft-thresholding operation:
$$[\mathrm{soft}_{\lambda/L}(\mathbf{v})]_i = \mathrm{sign}(v_i)\max\{|v_i| - \lambda/L,\, 0\}.$$
This combination of momentum acceleration and proximal steps allows FISTA to converge faster, achieving a rate of $O(1/k^2)$, which is much better than ISTA’s $O(1/k)$ rate. The faster convergence and the algorithm’s efficiency make FISTA especially useful for large-scale compressive sensing tasks, where quick reconstruction is needed.
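To make the update rules above concrete, the following NumPy sketch implements FISTA for the $\ell_1$ problem; the function names and the spectral-norm estimate of $L$ are illustrative choices, not code from the paper.

```python
import numpy as np

def soft_threshold(v, theta):
    # Element-wise soft-thresholding: sign(v_i) * max(|v_i| - theta, 0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def fista(Phi, y, lam, n_iter=200):
    """Minimize 0.5*||Phi x - y||_2^2 + lam*||x||_1 with FISTA."""
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient: ||Phi||_2^2
    x_prev = np.zeros(Phi.shape[1])
    z, t = x_prev.copy(), 1.0
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ z - y)                  # gradient of the smooth term at z
        x = soft_threshold(z - grad / L, lam / L)     # proximal (shrinkage) step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)   # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev
```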

2.2. DL-Based CS

Deep learning has brought substantial advancements to CS by enabling the modeling of complex relationships between measurements and original signals. Methods leveraging deep learning for CS can generally be divided into two categories. The first category combines traditional optimization techniques with neural networks, creating hybrid models that balance interpretability and flexibility. The second category focuses on fully end-to-end architectures, where neural networks directly learn to reconstruct images without relying on prior optimization frameworks.
In the first category, hybrid methods combine the stability of traditional CS algorithms with the efficiency of deep learning. ISTA-Net [11] unfolds the ISTA into a neural network, replacing handcrafted sparsity constraints with learned nonlinear transforms. ADMM-CSNet [12] extends the alternating direction method of multipliers (ADMM) by introducing learnable parameters to accelerate convergence and enhance reconstruction accuracy. AMP-Net unfolds the AMP algorithm into a network framework, addressing noise and reducing blocking artifacts. NeumNet [17] uses the Neumann series to solve inverse problems efficiently but remains susceptible to block effects. Recent approaches, such as TransCS, incorporate Transformer-based architectures to capture global dependencies across image sub-blocks, while DRCAMP-Net [18] combines AMP with residual convolutional layers to expand the receptive field and improve reconstruction performance.
The second category focuses on purely deep learning-based methods, which leverage convolutional neural networks (CNNs) for end-to-end image reconstruction. ReconNet [19] pioneered the use of CNNs for compressive sensing recovery, demonstrating the feasibility of deep learning in this domain. DR2-Net [20] improves reconstruction with residual learning blocks, while DPA-Net [21] employs a dual-path attention mechanism to separately capture structural and texture details. However, the limited receptive field of CNNs restricts their ability to model global dependencies. To address this, methods like MSCRLNet extend the receptive field using multi-scale and residual learning strategies, effectively combining local feature extraction with global modeling.
In addition to these methods, hybrid deep unfolding networks, such as DPC-DUN [22] and LTwIST [23], combine the interpretability of traditional algorithms with the flexibility of deep learning, dynamically optimizing reconstruction paths and eliminating the need for manual parameter tuning.
While these advancements have significantly enhanced image reconstruction, challenges remain in balancing computational efficiency, reconstruction quality, and the ability to capture both local and global features. Recent methods integrating Transformers with CNNs offer promising solutions, paving the way for frameworks like our proposed SSM-Net (the code can be found in Supplementary Material), which further improves the efficiency and accuracy of compressive sensing reconstruction.

2.3. Mamba

Mamba is a cutting-edge state-space model (SSM) [24] that has demonstrated remarkable efficiency in modeling long-range dependencies in sequential data. Its key innovation lies in replacing the self-attention mechanism, commonly used in Transformers, with a structured state-space representation. This approach reduces the computational complexity of sequence modeling from quadratic to linear, making Mamba particularly well suited for tasks involving large-scale data. The state-space model underlying Mamba can be expressed as
$$\dot{h}(t) = A h(t) + B u(t), \qquad y(t) = C h(t) + D u(t),$$
where $h(t) \in \mathbb{R}^N$ is the hidden state, $u(t) \in \mathbb{R}^N$ is the input, and $y(t) \in \mathbb{R}^N$ is the output. The matrices $A$, $B$, $C$, and $D$ are learnable parameters. To adapt this continuous formulation for deep learning models, it is discretized using zero-order hold (ZOH) as follows:
$$h_{t+1} = e^{A\Delta}h_t + \int_{t}^{t+\Delta} e^{A(t+\Delta-\tau)}B\,u(\tau)\,d\tau,$$
where $\Delta$ is the time step.
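As a minimal sketch of this discretization (assuming the input is held constant over each step and that $A$ is invertible, which gives the integral a closed form; this is not Mamba's optimized kernel):

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, delta):
    """Zero-order-hold discretization of h'(t) = A h(t) + B u(t)."""
    N = A.shape[0]
    A_bar = expm(A * delta)                               # e^{A*delta}
    B_bar = np.linalg.solve(A, (A_bar - np.eye(N)) @ B)   # A^{-1}(e^{A*delta} - I)B
    return A_bar, B_bar

def ssm_step(h, u, A_bar, B_bar, C, D):
    # Discrete update: h_{t+1} = A_bar h_t + B_bar u_t, with output from C and D
    h_next = A_bar @ h + B_bar @ u
    return h_next, C @ h_next + D @ u
```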
On this basis, Mamba-2 [25] introduces several architectural refinements that improve both efficiency and performance. Key innovations include a simplified design with parallel parameter generation, a state-space duality (SSD) framework connecting SSMs and attention mechanisms, and better hardware utilization through efficient matrix multiplication operations. The architecture supports state sizes up to 8 times larger than the original Mamba while maintaining 2-8 times faster computation. Mamba-2 also includes an additional normalization layer for improved stability and supports tensor parallelism for better scaling.
The selective information preservation mechanism present in both versions dynamically considers sequence length and input batch size in state computation. Mamba’s scanning method is a key step in converting 2D visual data into 1D sequences, and different scanning methods have different advantages in capturing spatial relationships and contextual information. Global scanning processes the entire image at once and can capture global patterns but may ignore details. Multi-directional selective scanning scans the image from multiple directions and can fully capture spatial information but has higher computational complexity. Spiral scanning expands from the center of the image outward and is suitable for tasks that require full coverage of the image, such as medical imaging and remote sensing. Radial scanning expands from the edge of the image to the center and is suitable for tasks that need to capture details in the central area. The scanning operation facilitates efficient computation of state selection, enabling Mamba to compress historical information into a compact and efficient state representation.

This architectural efficiency has allowed Mamba to be adapted to a wide range of computer vision tasks: the efficient visual backbones Vision Mamba [26] and VMamba; the computationally efficient graph processing of Graph-Mamba [27]; LocalMamba [28], which focuses on local feature extraction by processing local regions of the image with windowed selective scanning; and EfficientVMamba [29], a lightweight Mamba architecture that reduces computational cost through atrous selective scanning. Mamba-ND [30] is a multi-dimensional Mamba architecture capable of processing data of arbitrary dimensions; it maintains the linear complexity of SSMs by alternating the sequence order, making it suitable for high-dimensional data. MambaMixer [31] is a hybrid architecture that combines the advantages of Mamba and the Transformer, enhancing sequence modeling efficiency through dual token and channel selection mechanisms. Mamba-R [32] introduces register tokens to sharpen the focus of feature maps and reduce artifacts. PlainMamba [33] is a non-hierarchical Mamba architecture that maintains spatial continuity through continuous 2D scanning, performing well in scenarios that require continuous data processing. There are also various Mamba-based image segmentation architectures (U-Mamba [34], Swin-UMamba [35], and Mamba-UNet [36]).

As an emerging deep learning architecture, Mamba has shown great potential in computer vision through selective state-space models and efficient scanning mechanisms. Its flexible design enables it to adapt to task requirements ranging from image classification and video processing to remote sensing and medical image analysis.
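To illustrate the scanning step described above, the sketch below flattens a 2D feature map into 1D token sequences along two common scan orders; diagonal, spiral, or radial orders would reorder the same pixels analogously. The function name and tensor layout are our assumptions.

```python
import torch

def directional_scans(x: torch.Tensor):
    """Flatten a (B, C, H, W) feature map into 1D sequences for SSM processing.
    Row-major and column-major orders are shown; other scan paths reorder
    the same pixels differently before the sequential state-space update."""
    horizontal = x.flatten(2)                    # (B, C, H*W), row by row
    vertical = x.transpose(2, 3).flatten(2)      # (B, C, H*W), column by column
    return horizontal, vertical
```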

3. Methods

In this section, we present the proposed framework for efficient image reconstruction. The overall system pipeline is illustrated in Figure 1. The framework consists of three primary components: the sampling module, the reconstruction module, and the loss function design. The sampling module compresses high-dimensional image data into compact measurements, which are then used by the reconstruction module to recover the original image. The reconstruction module includes both initial reconstruction and deep reconstruction stages, iteratively refining the output to achieve high-fidelity results. Finally, the loss function ensures the reconstruction accuracy by balancing fidelity, perceptual similarity, and structural consistency.

3.1. Sampling Module

The sampling module is designed to efficiently transform high-dimensional image data into compressed measurements while preserving essential structural information. Given an input image $X \in \mathbb{R}^{B \times 1 \times H \times W}$, where $B$ represents the batch size and $H$, $W$ denote the height and width, respectively, we propose a structured block-wise sampling approach that leverages learnable measurement matrices.
The foundation of our sampling mechanism is a learnable measurement matrix $\Phi \in \mathbb{R}^{m \times n}$, where $m = \rho n$ with $\rho$ being the target compression ratio, and $n = \phi_{\text{size}}^2$ denotes the dimensionality of each block. The complete sampling process is detailed in Algorithm 1.
Algorithm 1 Block-wise Adaptive Sampling Process
Require: Input image $X \in \mathbb{R}^{B \times 1 \times H \times W}$
Ensure: Compressed measurements $y$
 1: {Initialize measurement matrix}
 2: $n \leftarrow \phi_{\text{size}}^2$
 3: $m \leftarrow \rho n$
 4: $\Phi_{\text{init}} \sim \mathcal{N}(0, (1/n)^{0.5})$
 5: $Q_{\text{init}} \leftarrow \Phi_{\text{init}}^T$ {Initialize reconstruction matrix}
 6: {Block partitioning}
 7: if $H \bmod \phi_{\text{size}} \neq 0$ or $W \bmod \phi_{\text{size}} \neq 0$ then
 8:   Apply padding operator $\mathcal{B}$
 9: end if
10: $X_{\text{blocks}} \leftarrow \mathcal{P}(X)$
11: {Vectorization}
12: $x \leftarrow \mathcal{V}(X_{\text{blocks}})$
13: {Measurement computation}
14: $y \leftarrow \Phi x^T$
15: return $y$
The sampling process consists of three key transformations. Initially, we define a block partitioning operator $\mathcal{P}: \mathbb{R}^{B \times 1 \times H \times W} \to \mathbb{R}^{N \times \phi_{\text{size}} \times \phi_{\text{size}}}$, where $N$ represents the total number of blocks. This operator decomposes the input image into non-overlapping blocks:
$$X_{\text{blocks}} = \mathcal{P}(X).$$
Subsequently, we employ a vectorization operator $\mathcal{V}: \mathbb{R}^{N \times \phi_{\text{size}} \times \phi_{\text{size}}} \to \mathbb{R}^{N \times \phi_{\text{size}}^2}$ that transforms each block into a vector representation:
$$x = \mathcal{V}(X_{\text{blocks}}).$$
The final transformation computes the compressed measurements through linear projection:
$$y = \Phi x^T.$$
A key innovation in our approach is the simultaneous learning of a reconstruction initialization matrix $Q \in \mathbb{R}^{n \times m}$, initialized as $Q_{\text{init}} = \Phi_{\text{init}}^T$. This dual learning strategy enables joint optimization of the sampling and initial reconstruction processes, leading to the following objective:
$$\min_{\Phi, Q} \left\|X - f_{\text{recon}}(Q \Phi x)\right\|_2^2,$$
where $f_{\text{recon}}(\cdot)$ represents our reconstruction network.
To handle inputs of arbitrary size, we use a padding operator $\mathcal{B}$ to adjust the input to the nearest multiple of $\phi_{\text{size}}$. Through experiments, we found that $\phi_{\text{size}} = 32$ provides the best balance between speed and reconstruction quality. The adaptive sampling matrices and block-based processing help the framework capture important image features while keeping the computation simple. This sampling module provides a strong foundation for reducing dimensionality efficiently while preserving critical image structures, and it prepares the data for the reconstruction process explained in the next section.
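A minimal PyTorch sketch of this sampling module is given below. The class name, the use of `F.unfold` for block partitioning, and the rounding of $m$ are illustrative assumptions; the learnable $\Phi$ and the $Q = \Phi^T$ initialization follow Algorithm 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSampling(nn.Module):
    """Sketch of the learnable block-wise sampling described above."""
    def __init__(self, phi_size: int = 32, rho: float = 0.1):
        super().__init__()
        n = phi_size * phi_size
        m = max(1, int(round(rho * n)))           # measurements per block
        self.phi_size = phi_size
        # Learnable measurement matrix Phi (scaled Gaussian init) and Q = Phi^T
        self.Phi = nn.Parameter(torch.randn(m, n) * (1.0 / n) ** 0.5)
        self.Q = nn.Parameter(self.Phi.detach().t().clone())

    def forward(self, x):                         # x: (B, 1, H, W)
        B, _, H, W = x.shape
        p = self.phi_size
        # Pad H and W up to the nearest multiple of phi_size
        h_pad, w_pad = (-H) % p, (-W) % p
        x = F.pad(x, (0, w_pad, 0, h_pad))
        # Partition into non-overlapping p x p blocks and vectorize them
        blocks = F.unfold(x, kernel_size=p, stride=p)   # (B, p*p, N_blocks)
        vecs = blocks.transpose(1, 2)                   # (B, N_blocks, n)
        y = vecs @ self.Phi.t()                         # measurements (B, N_blocks, m)
        x_init = y @ self.Q.t()                         # initial estimate (B, N_blocks, n)
        return y, x_init
```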

3.2. Reconstruction Module

3.2.1. Initial Reconstruction

The sampling module compresses the image signal into $\mathbf{y}$, which serves as the input to the initial reconstruction stage and yields the preliminary estimate $x_{\text{init}}$. This estimate forms the basis for subsequent refinement and optimization.
The process begins with a linear operation:
$$x_{\text{init}} = Q\mathbf{y},$$
where $Q \in \mathbb{R}^{n \times m}$ is initialized as the transpose of the sampling matrix $\Phi$. Specifically, $Q_{\text{init}} = \Phi^T$, ensuring that the initial estimate aligns naturally with the compressed data.
To maintain stability and support effective training, the entries of $Q$ are drawn from a normal distribution:
$$Q_{\text{init}} \sim \mathcal{N}\left(0, \frac{1}{m}\right),$$
where $m$ denotes the dimensionality of the compressed measurements. This initialization strategy promotes stable gradient flow during backpropagation and enables the model to learn meaningful reconstruction patterns.
When the dimensions of the input image do not align with the block size used in the sampling module, padding is applied to adjust the height $H$ and width $W$. The required padding values are determined as follows:
$$h_{\text{pad}} = \begin{cases} 0 & \text{if } H \bmod \phi_{\text{size}} = 0, \\ \phi_{\text{size}} - (H \bmod \phi_{\text{size}}) & \text{otherwise,} \end{cases}$$
$$w_{\text{pad}} = \begin{cases} 0 & \text{if } W \bmod \phi_{\text{size}} = 0, \\ \phi_{\text{size}} - (W \bmod \phi_{\text{size}}) & \text{otherwise.} \end{cases}$$
This adjustment ensures consistent processing of all image blocks and retains the spatial relationships present in the original image.
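Expressed directly in code, the padding rule is a one-line computation per dimension (a trivial helper; the function name is ours):

```python
def pad_amounts(H: int, W: int, phi_size: int = 32) -> tuple[int, int]:
    # Direct transcription of the two padding equations above
    h_pad = 0 if H % phi_size == 0 else phi_size - (H % phi_size)
    w_pad = 0 if W % phi_size == 0 else phi_size - (W % phi_size)
    return h_pad, w_pad
```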

3.2.2. Deep Reconstruction

The initial reconstruction result x init is used as input to the deep reconstruction module, where it is further improved through iterative optimization and advanced feature modeling. This stage combines momentum-based updates inspired by FISTA with multi-directional selective state-space modeling (SSM). Together, these techniques enable efficient and accurate recovery of high-dimensional image data from compressed measurements. The full process is described in Algorithm 2.
Algorithm 2 Deep Reconstruction Process
Require: Initial reconstruction $x_{\text{init}}$, compressed measurements $y$, sampling matrix $\Phi$
Ensure: Final reconstructed image $X_{\text{recon}}$
 1: Initialize $x^{(0)} = x_{\text{init}}$, $t_1 = 1$
 2: for $l = 1$ to $L$ do
 3:   Compute momentum $t_{l+1} \leftarrow \frac{1 + \sqrt{1 + 4t_l^2}}{2}$
 4:   Update intermediate state $z^{(l)} \leftarrow x^{(l)} + \frac{t_l - 1}{t_{l+1}}(x^{(l)} - x^{(l-1)})$
 5:   Apply gradient correction $z^{(l)} \leftarrow z^{(l)} - \eta\Phi^T(\Phi z^{(l)} - y)$
 6:   Pre-process features $z^{(l)} \leftarrow z^{(l)} + f_{\text{pre}}(z^{(l)})$
 7:   Apply selective state-space modeling:
 8:   for each direction $r \in \{1, 2, 3, 4\}$ do
 9:     Compute state evolution $h_r^{(l)} \leftarrow \exp(\Lambda\Delta_r)h_r^{(l-1)} + B_r z^{(l)}\Delta_r$
10:   end for
11:   Aggregate results $z^{(l)} \leftarrow \sum_{r=1}^{4} h_r^{(l)}$
12:   Post-process features $x^{(l+1)} \leftarrow z^{(l)} + f_{\text{post}}(z^{(l)})$
13: end for
14: Reshape final result $X_{\text{recon}} \leftarrow \mathcal{R}(x^{(L)})$
15: return $X_{\text{recon}}$
In each iteration $l$, the reconstruction follows these steps:
First, momentum-based acceleration is used to speed up convergence. The intermediate state $z^{(l)}$ is updated as follows:
$$t_{l+1} = \frac{1 + \sqrt{1 + 4t_l^2}}{2},$$
$$z^{(l)} = x^{(l)} + \frac{t_l - 1}{t_{l+1}}\left(x^{(l)} - x^{(l-1)}\right),$$
where $t_l$ is a momentum factor initialized as $t_1 = 1$. This step incorporates information from previous iterations to ensure faster convergence and improved reconstruction quality.
Next, to maintain consistency with the compressed measurements $\mathbf{y}$, a gradient descent step is applied to $z^{(l)}$:
$$z^{(l)} = z^{(l)} - \eta\,\Phi^T\left(\Phi z^{(l)} - \mathbf{y}\right),$$
where $\eta > 0$ is a learnable step size parameter. This correction minimizes the data fidelity term, aligning the reconstruction with the measurement constraints.
Before applying state-space modeling, a pre-processing network $f_{\text{pre}}(\cdot)$ is employed to enhance local image features:
$$z^{(l)} = z^{(l)} + f_{\text{pre}}(z^{(l)}),$$
where $f_{\text{pre}}(\cdot)$ is implemented as a series of convolutional layers designed to extract fine-grained image details.
The refined intermediate signal $z^{(l)}$ is then processed using a multi-directional selective state-space model. This step captures both local and global dependencies by scanning the input in four directions: horizontal, vertical, main diagonal, and secondary diagonal. For each direction $r$, the modeling is defined as
$$h_r^{(l)} = \exp(\Lambda\Delta_r)\,h_r^{(l-1)} + B_r z^{(l)}\Delta_r,$$
$$\Delta_r = \phi(W_r\tau_r + b_r).$$
Here, $\Lambda$ represents the state evolution matrix, $B_r$ is the input projection matrix, $\phi(\cdot)$ is a nonlinear activation function that ensures stability, and $\Delta_r$ is the adaptive time step for the $r$-th direction. The outputs from all directions are combined as
$$z^{(l)} = \sum_{r=1}^{4} h_r^{(l)}.$$
After state-space modeling, a post-processing network $f_{\text{post}}(\cdot)$ refines the reconstructed signal:
$$x^{(l+1)} = z^{(l)} + f_{\text{post}}(z^{(l)}),$$
where $f_{\text{post}}(\cdot)$ is a set of learnable convolutional layers designed to remove residual errors and improve the global structure of the signal.
Once $L$ iterations are complete, the final reconstructed signal $x^{(L)}$ is reshaped into its original spatial dimensions:
$$X_{\text{recon}} = \mathcal{R}(x^{(L)}),$$
where $\mathcal{R}(\cdot)$ is the reshaping operator.
This iterative process combines momentum-based optimization, feature extraction, and selective state-space modeling to achieve accurate and high-quality reconstruction. The integration of learnable parameters and adaptive strategies allows the model to handle diverse image structures and varying compression conditions effectively, ensuring both efficiency and accuracy.
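A sketch of a single iteration of this loop is shown below; `f_pre`, `ssm`, and `f_post` stand in for the paper's convolutional and state-space sub-networks, and the block-row tensor layout matches the sampling sketch in Section 3.1. This is an illustrative reading of Algorithm 2, not the authors' code.

```python
import torch

def deep_recon_step(x_cur, x_prev, t, y, Phi, eta, f_pre, ssm, f_post):
    """One iteration of the deep reconstruction loop.
    x_cur, x_prev: (B, N, n) vectorized blocks; y: (B, N, m); Phi: (m, n)."""
    # FISTA-style momentum extrapolation
    t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
    z = x_cur + ((t - 1.0) / t_next) * (x_cur - x_prev)
    # Gradient correction toward the measurements: z - eta * Phi^T (Phi z - y),
    # written row-wise for block vectors
    z = z - eta * (z @ Phi.t() - y) @ Phi
    # Residual pre-processing, multi-directional SSM, residual post-processing
    z = z + f_pre(z)
    z = ssm(z)                     # aggregates the four directional scans internally
    x_next = z + f_post(z)
    return x_next, x_cur, t_next   # new iterate, previous iterate, updated momentum
```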

3.3. Loss Function

To ensure accurate and high-quality reconstruction, we design a loss function that balances pixel-wise fidelity, perceptual quality, and structural consistency.
The MSE loss ensures pixel-wise accuracy by minimizing the $L_2$ norm between the reconstructed image $X_{\text{recon}}$ and the ground truth $X$:
$$\mathcal{L}_{\text{MSE}} = \frac{1}{N}\|X_{\text{recon}} - X\|_2^2,$$
where $N$ is the total number of pixels.
To improve perceptual quality, the SSIM loss measures structural similarity between $X_{\text{recon}}$ and $X$:
$$\mathcal{L}_{\text{SSIM}} = 1 - \mathrm{SSIM}(X_{\text{recon}}, X),$$
where $\mathrm{SSIM}(\cdot,\cdot)$ captures luminance, contrast, and structure.
The edge gradient loss enhances edge preservation by penalizing differences in image gradients:
$$\mathcal{L}_{\text{edge}} = \frac{1}{N}\left(\|\nabla_h X_{\text{recon}} - \nabla_h X\|_1 + \|\nabla_v X_{\text{recon}} - \nabla_v X\|_1\right),$$
where $\nabla_h$ and $\nabla_v$ represent horizontal and vertical gradient operators, respectively.
Finally, the total loss integrates these components, balancing their contributions with weighting factors $\lambda_{\text{SSIM}}$ and $\lambda_{\text{edge}}$:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{MSE}} + \lambda_{\text{SSIM}}\mathcal{L}_{\text{SSIM}} + \lambda_{\text{edge}}\mathcal{L}_{\text{edge}}.$$
This comprehensive loss function ensures faithful reconstruction while preserving structural details and perceptual quality, enabling the model to achieve robust and high-quality results.
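A compact PyTorch sketch of this composite loss follows; the SSIM term uses the third-party `pytorch_msssim` package, and the default weights are placeholders since the paper does not state its $\lambda$ values here.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation

def total_loss(x_recon, x, lam_ssim=0.1, lam_edge=0.1):
    """Composite loss from Section 3.3; inputs are (B, 1, H, W) in [0, 1]."""
    l_mse = F.mse_loss(x_recon, x)
    l_ssim = 1.0 - ssim(x_recon, x, data_range=1.0)
    # Horizontal and vertical gradients via finite differences
    dh_r = x_recon[..., :, 1:] - x_recon[..., :, :-1]
    dh_g = x[..., :, 1:] - x[..., :, :-1]
    dv_r = x_recon[..., 1:, :] - x_recon[..., :-1, :]
    dv_g = x[..., 1:, :] - x[..., :-1, :]
    l_edge = (dh_r - dh_g).abs().mean() + (dv_r - dv_g).abs().mean()
    return l_mse + lam_ssim * l_ssim + lam_edge * l_edge
```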

4. Experimental Results

4.1. Experimental Settings

The training and evaluation of SSM-Net use a variety of datasets, including satellite imagery, urban environments, natural scenes, and high-resolution images. The primary training process relies on the BSD500 dataset [37], which consists of 200 training images, 100 validation images, and 200 testing images. From each training image, 200 patches of size 96 × 96 pixels are extracted, resulting in a total of 100,000 training samples. Data augmentation methods, such as bidirectional flips, rotations, and scaling, are applied to increase image diversity.
To evaluate performance, several benchmark datasets are selected, each addressing specific challenges in image reconstruction:
(1) UCMerced Land Use Dataset: This dataset includes 21 land-use classes, each containing 100 images at a spatial resolution of 0.3 m. The dataset evaluates how well the model processes and distinguishes diverse land cover patterns.
(2) Urban100 [38]: This high-resolution dataset focuses on urban architecture and building facades. It tests the model’s ability to capture detailed structural features in dense urban environments.
(3) BSD100 [39]: A dataset of natural scene images, which features a range of terrain types and vegetation patterns. It provides insight into the model’s generalization capabilities across varied natural landscapes.
(4) Set5 [40]: This benchmark dataset contains high-resolution images designed to evaluate the reconstruction of fine-scale features. It is particularly useful for scenarios where precise detail recovery is critical.
Each dataset presents unique challenges, offering insights into the model’s behavior under different conditions. These datasets also serve as standardized benchmarks, enabling direct comparisons with existing methods such as ISTA-Net+, CSformer [41], AMP-Net, CPP-Net [42], and TransCS. By incorporating diverse structural patterns, textures, and details, the evaluation framework ensures a rigorous assessment of reconstruction performance. The observed results reflect the model’s ability to adapt to various image characteristics, confirming its effectiveness in reconstructing high-quality images under challenging conditions.
All experiments were conducted using PyTorch 1.9.0 on a machine equipped with an Intel® Xeon® 8336 CPU and a GeForce RTX 4090 GPU. To ensure fair comparison, all competing models were trained using the same BSD500 dataset and evaluated under identical conditions across all test datasets.

4.2. Comparisons with State-of-the-Art Methods

Table 1 presents comprehensive quantitative comparisons between SSM-Net and current state-of-the-art methods across four benchmark datasets (UCMerced, Set5, Urban100, and BSD100) at various sampling rates $\tau$. The sampling rate $\tau$ is defined as the ratio between the number of measurements $M$ and the total number of pixels in the image $N$:
$$\tau = \frac{M}{N}.$$
This sampling rate controls how much of the image is used during the reconstruction process, with lower values of τ corresponding to higher levels of compression. The experimental results demonstrate that SSM-Net achieves competitive or superior performance compared to existing approaches.
On the UCMerced dataset, which consists of satellite remote sensing imagery, SSM-Net demonstrates competitive performance, particularly at lower sampling rates. At $\tau = 0.04$, our method achieves a PSNR of 25.28 dB and an SSIM of 0.6959, outperforming both CSformer (25.21 dB/0.6957) and TransCS (25.18 dB/0.6950). The advantage is more pronounced at $\tau = 0.1$, where SSM-Net achieves 29.53 dB PSNR and 0.8449 SSIM, surpassing TransCS (29.41 dB/0.8412) and CSformer (29.47 dB/0.8437). At $\tau = 0.25$, our method achieves the highest PSNR of 34.71 dB among all compared methods, demonstrating its particular effectiveness in medium-rate compression scenarios for remote sensing applications. While performance shows some limitations at very high sampling rates ($\tau = 0.5$), the strong results in the critical low-to-medium range highlight SSM-Net’s practical value for bandwidth-constrained remote sensing scenarios where efficient compression is most needed.
For the Set5 dataset, SSM-Net demonstrates superior performance across the full range of sampling rates. At lower rates ($\tau = 0.01$ and $0.04$), our method achieves 23.37 dB and 29.32 dB PSNR, respectively, outperforming TransCS (22.98 dB and 29.02 dB). This advantage is maintained through medium rates ($\tau = 0.25$ and $0.3$) with PSNRs of 37.61 dB and 38.74 dB, and extends to higher rates ($\tau = 0.4$ and $0.5$), reaching 41.81 dB and 42.72 dB, consistently surpassing both TransCS and CSformer across all compression levels.
In the Urban100 dataset, which contains complex urban structures, SSM-Net shows balanced performance across different sampling rates. While slightly lower than TransCS at $\tau = 0.01$ (19.18 dB vs. 19.53 dB), our method demonstrates competitive results at medium rates ($\tau = 0.25$ and $0.3$) with PSNRs of 30.78 dB and 31.21 dB, and achieves superior reconstruction at higher rates ($\tau = 0.4$ and $0.5$) with values of 33.34 dB and 34.93 dB, respectively, demonstrating particular effectiveness in preserving architectural details.
On the BSD100 dataset, featuring diverse natural landscapes, SSM-Net exhibits strong performance across sampling rates, with notable advantages at medium to high rates. Starting from $\tau = 0.1$ with 27.60 dB PSNR, our method shows progressive improvement through $\tau = 0.25$ and $0.3$ (31.45 dB and 32.79 dB), culminating in strong high-rate performance at $\tau = 0.4$ and $0.5$ (34.96 dB and 36.89 dB). This consistent scaling demonstrates our method’s effectiveness in handling varied natural textures across different compression levels.
Table 2 further reports the WS-PSNR and MSSIM comparisons of different image reconstruction methods on the UCMerced, Set5, Urban100, and BSD100 datasets at various sampling rates ($\tau \in \{0.01, 0.04, 0.1, 0.25, 0.5\}$). The results show that SSM-Net achieves state-of-the-art performance across these datasets and sampling rates.
Figure 2 demonstrates the reconstruction results for satellite sensing images. As shown in the detailed regions (highlighted by red boxes), SSM-Net achieves superior reconstruction quality with PSNR/SSIM values of 33.41/0.8443 and 33.32/0.9192 for different sampling rates, effectively preserving both global structure and fine details in remote sensing imagery.
Figure 3 provides additional visual comparisons for high-resolution images. The results show that SSM-Net better preserves fine textures and sharp edges, achieving PSNR/SSIM values of 31.55/0.8896 and 35.42/0.9753 for different test cases. This visual quality improvement aligns with the quantitative metrics, confirming our method’s effectiveness in maintaining both structural integrity and local details.
These comprehensive results demonstrate that SSM-Net has achieved state-of-the-art performance, consistently matching or exceeding existing methods across different datasets and sampling rates. The balanced performance across various scenarios, particularly in remote sensing applications, validates our approach of combining momentum-based optimization with efficient feature modeling through the Mamba architecture.

4.3. Noise Robustness Analysis

We evaluate our method’s robustness against measurement noise using the BSD100 dataset. Our experiments introduce Gaussian noise with different variances at various sampling rates ($\tau \in \{0.04, 0.10, 0.25\}$). Table 3 presents quantitative comparisons between SSM-Net, AMP-Net, and CSformer under these conditions.
SSM-Net consistently outperforms the competing methods across all noise levels and sampling rates. At low noise ($\sigma = 0.001$), our method achieves higher PSNR values than both AMP-Net and CSformer, with improvements of up to 0.95 dB at $\tau = 0.04$. This advantage remains evident at higher noise levels ($\sigma = 0.004$), where SSM-Net maintains better reconstruction quality with PSNR values of 22.74 dB, 24.31 dB, and 26.06 dB at sampling rates of 0.04, 0.10, and 0.25, respectively.
The visual comparisons in Figure 4 further demonstrate our method’s superior noise handling capabilities. SSM-Net preserves more image details and produces cleaner reconstructions than AMP-Net and CSformer, particularly in challenging cases with both high noise levels and low sampling rates. These results confirm that our approach offers enhanced stability and reconstruction accuracy in noisy conditions.

4.4. Training Convergence Rate Analysis

We analyze the training convergence speed by comparing the PSNR and SSIM curves of different methods during the training process, as shown in Figure 5. The curves are obtained by evaluating performance on the validation set during training iterations.
Our SSM-Net achieves faster convergence than TransCS, reaching stable PSNR and SSIM values within approximately 3000 s, whereas TransCS requires nearly 5000 s to achieve comparable performance levels. While AMP-Net converges slightly faster, reaching stability at around 2000 s, its final reconstruction quality is notably lower than that of SSM-Net. For example, at sampling rate $\tau = 0.1$, SSM-Net achieves a 1.23 dB PSNR improvement over AMP-Net after convergence.
The rapid convergence of SSM-Net can be attributed to two factors: the momentum-based optimization strategy from FISTA and the efficient feature modeling of the Mamba architecture. Specifically, FISTA optimizes the training process by incorporating momentum, which accelerates convergence by utilizing information from previous gradients. This allows the model to make larger, more accurate updates during training, speeding up the process and improving performance.
The Mamba architecture, in contrast to the Transformer-based models, improves convergence by modeling both local and global feature dependencies efficiently. Transformers, while highly effective at capturing global relationships, require extensive computational resources due to their self-attention mechanism, which scales quadratically with the image size. Mamba’s state-space modeling (SSM) addresses this challenge by representing dependencies in a more compact and efficient manner, reducing computational complexity. By utilizing SSM, Mamba effectively captures long-range dependencies without the heavy computational burden typical of Transformers, leading to faster convergence and better performance in terms of both quality and efficiency.
The sudden drop observed in the blue curve (representing SSM-Net) around 2000 s is a notable point. This drop can be explained by the dynamic adjustments the model makes during training. It likely occurs due to the optimization algorithm adapting to the more challenging aspects of the data as the training progresses, where certain features or structures in the image require fine-tuning. This brief decrease is followed by rapid recovery, indicating that the model’s optimization strategy is robust enough to handle such challenges, and it stabilizes quickly. This behavior highlights the balance between exploration and fine-tuning during training, which is an essential aspect of the learning process.
Thus, the combination of FISTA’s momentum-based optimization and Mamba’s state-space modeling contributes significantly to the fast convergence of SSM-Net. The architecture’s efficiency in modeling dependencies and its ability to quickly adapt to complex image features ensure that the network converges faster while maintaining high reconstruction quality.

4.5. Complexity Analysis

We analyze the computational complexity of SSM-Net compared to existing methods. All experiments use a standard input size of 256 × 256 pixels and a sampling rate of $\tau = 0.1$.
As shown in Figure 6, SSM-Net requires 26.6626 GFLOPs (billion floating-point operations), which falls between lightweight models like CSNet (11.49 GFLOPs) and heavier models like ISTA-Net+ (30.931 GFLOPs). The parameter count of SSM-Net is 1.654 M, similar to that of TransCS (1.489 M). This moderate computational cost enables SSM-Net to achieve superior reconstruction quality while maintaining reasonable efficiency. In terms of energy consumption, SSM-Net requires 68.11 W, higher than lightweight models such as CSNet (29.35 W) and CSformer (29.49 W) but still more efficient than ISTA-Net+ (79.01 W). SSM-Net’s memory usage stands at 892.38 MB, above that of CSNet (173.78 MB), CSformer (347.56 MB), TransCS (803.60 MB), and ISTA-Net+ (220.73 MB). These values highlight the trade-off between higher model performance and increased computational cost in terms of energy consumption and memory usage.
Table 4 shows inference time comparisons on an RTX 4090 GPU. SSM-Net consistently processes images in approximately 0.0202 s across different sampling rates. This stable processing time demonstrates better scalability compared to methods like CSformer, which shows increasing inference times from 0.0469 to 0.0486 s at higher sampling rates. While CSNet achieves faster inference (0.0078–0.0099 s), it produces lower-quality reconstructions as shown by the PSNR results.
The efficiency of SSM-Net comes from the linear complexity of Mamba’s state-space model and FISTA’s momentum-based optimization. These components work together to provide fast convergence without excessive computational demands. The results show that SSM-Net balances computational cost and reconstruction quality effectively, offering strong performance while maintaining practical processing speeds.

4.6. Ablation Studies

Figure 7 and Table 5 show the results of the ablation study, which evaluates the impact of removing the FISTA algorithm and the Mamba module on image reconstruction quality and computational efficiency. We compare four configurations: the full SSM-Net, a variant without FISTA, a variant without Mamba, and a variant with Mamba replaced by a Transformer.
Removing the FISTA algorithm results in a significant reduction in reconstruction quality. At $\tau = 0.25$, the PSNR drops from 30.78 dB (SSM-Net) to 27.85 dB (without FISTA), while the runtime improves slightly to 0.0180 s per image. This demonstrates that FISTA is crucial for enhancing both the quality and convergence of the reconstruction.
Similarly, removing the Mamba module significantly degrades performance. At $\tau = 0.25$, the PSNR decreases from 30.78 dB (SSM-Net) to 28.47 dB (without Mamba). The model also loses important global and local feature modeling, which is essential for high-quality reconstruction. The runtime remains competitive at 0.0184 s per image, but this speed comes at the expense of a noticeable loss in PSNR.
In contrast, the complete SSM-Net model achieves the best balance between reconstruction quality and computational efficiency. At $\tau = 0.25$, SSM-Net delivers a PSNR of 30.78 dB with a runtime of 0.0204 s per image. This shows that the Mamba module plays a key role in improving reconstruction quality without introducing significant computational overhead.
Finally, replacing Mamba with a Transformer brings the PSNR closer to that of SSM-Net, reaching 30.12 dB at $\tau = 0.25$, compared to 28.47 dB without Mamba. However, the computational time increases significantly to 0.024 s per image. This indicates that although the Transformer configuration achieves PSNR values near those of SSM-Net, it incurs a substantial increase in runtime due to the quadratic complexity of attention mechanisms, making it less efficient than SSM-Net.
The ablation study demonstrates the critical role of both the Mamba module and FISTA in improving the reconstruction quality. Removing the Mamba module reduces the model’s ability to capture global and local dependencies, as evidenced by the significant drop in PSNR. This shows that Mamba’s capability to model these dependencies is essential for preserving fine image details.
In contrast, FISTA optimization enhances the reconstruction process by accelerating convergence and refining the image quality. When FISTA is removed, the network’s reconstruction is slower, and the model struggles to reach the same level of accuracy. This is reflected in the lower PSNR of “without FISTA” compared to SSM-Net, particularly at higher compression rates, where FISTA’s optimization is most beneficial.
These comparisons clearly demonstrate the benefits of the full SSM-Net model. Removing FISTA or Mamba degrades both the quality and efficiency of the model, while the Transformer-based alternative, although offering PSNR values close to those of SSM-Net, suffers from a substantial increase in runtime. This underscores the importance of Mamba in maintaining high-quality reconstruction with minimal computational cost.

5. Conclusions

In this paper, we propose SSM-Net, a new framework for efficient remote sensing image reconstruction based on SSM and deep unfolding techniques. The framework integrates a sampling module for efficient data compression, an initial reconstruction module for fast signal estimation, and a deep reconstruction module that iteratively refines the results. By leveraging FISTA-inspired momentum updates and selective state-space modeling, SSM-Net achieves a balance between reconstruction accuracy, computational efficiency, and fast training convergence.
Comprehensive experiments on standard benchmark datasets demonstrate that SSM-Net offers competitive performance in terms of PSNR and SSIM while maintaining a lightweight architecture and computational efficiency. Although the training process exploits natural image datasets, the framework exhibits strong generalization potential for remote sensing scenarios. Its modular design ensures adaptability and scalability, making it a practical solution for real-world remote sensing applications where storage and transmission constraints are critical.
Looking ahead, we aim to further optimize the proposed framework by incorporating ideas from the Mamba-2 model, particularly its state-space duality (SSD) methodology. The SSD concept introduces more effective ways of modeling spatial dependencies, which can potentially enhance reconstruction accuracy and robustness. Additionally, we plan to explore domain-specific adaptations of SSM-Net by fine-tuning the framework on large-scale remote sensing datasets, including hyperspectral and SAR images. These advancements will further extend the applicability of SSM-Net, paving the way for its deployment in practical remote sensing systems.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s25041026/s1.

Author Contributions

Conceptualization, X.G. and Y.Y.; methodology, X.G. and B.C.; software, X.G. and X.Y.; validation, X.G., B.C. and X.Y.; formal analysis, X.G. and X.Y.; data curation, X.G.; writing—original draft preparation, X.G. and B.C.; writing—review and editing, B.C. and Y.Y.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by “the Fundamental Research Funds for the Central Universities” (Grant Number: 3282023009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

  1. Donoho, D. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
  2. Hariri, A.; Babaie-Zadeh, M. Compressive detection of sparse signals in additive white Gaussian noise without signal reconstruction. Signal Process. 2017, 131, 376–385. [Google Scholar] [CrossRef]
  3. Usala, J.D.; Maag, A.; Nelis, T.; Gamez, G. Compressed sensing spectral imaging for plasma optical emission spectroscopy. J. Anal. At. Spectrom. 2016, 31, 2198–2206. [Google Scholar] [CrossRef]
  4. Bu, H.; Tao, R.; Bai, X.; Zhao, J. A novel SAR imaging algorithm based on compressed sensing. IEEE Geosci. Remote. Sens. Lett. 2014, 12, 1003–1007. [Google Scholar] [CrossRef]
  5. Goodsell, R.M.; Coutts, S.; Oxford, W.; Hicks, H.; Comont, D.; Freckleton, R.P.; Childs, D.Z. Black-Grass Monitoring Using Hyperspectral Image Data Is Limited by Between-Site Variability. Remote Sens. 2024, 16, 4749. [Google Scholar] [CrossRef]
  6. Chanda, M.; Hossain, A.K.M.A. Application of PlanetScope Imagery for Flood Mapping: A Case Study in South Chickamauga Creek, Chattanooga, Tennessee. Remote Sens. 2024, 16, 4437. [Google Scholar] [CrossRef]
  7. Ren, D.; Qiu, X.; An, Z. A Multi-Source Data-Driven Analysis of Building Functional Classification and Its Relationship with Population Distribution. Remote Sens. 2024, 16, 4492. [Google Scholar] [CrossRef]
  8. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666. [Google Scholar] [CrossRef]
  9. Daubechies, I.; Defrise, M.; De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. J. Issued Courant Inst. Math. Sci. 2004, 57, 1413–1457. [Google Scholar] [CrossRef]
  10. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  11. Zhang, J.; Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1828–1837. [Google Scholar]
  12. Yang, Y.; Sun, J.; Li, H.; Xu, Z. ADMM-CSNet: A deep learning approach for image compressive sensing. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 521–538. [Google Scholar] [CrossRef] [PubMed]
  13. Shen, M.; Gan, H.; Ning, C.; Hua, Y.; Zhang, T. TransCS: A transformer-based hybrid architecture for image compressed sensing. IEEE Trans. Image Process. 2022, 31, 6991–7005. [Google Scholar] [CrossRef] [PubMed]
  14. Kabkab, M.; Samangouei, P.; Chellappa, R. Task-aware compressed sensing with generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  15. Sun, Y.; Chen, J.; Liu, Q.; Liu, G. Learning image compressed sensing with sub-pixel convolutional generative adversarial network. Pattern Recognit. 2020, 98, 107051. [Google Scholar] [CrossRef]
  16. Kumar, A.; Upadhyay, N.; Ghosal, P.; Chowdhury, T.; Das, D.; Mukherjee, A.; Nandi, D. CSNet: A new DeepNet framework for ischemic stroke lesion segmentation. Comput. Methods Programs Biomed. 2020, 193, 105524. [Google Scholar] [CrossRef]
  17. Gilton, D.; Ongie, G.; Willett, R. Neumann networks for linear inverse problems in imaging. IEEE Trans. Comput. Imaging 2019, 6, 328–343. [Google Scholar] [CrossRef]
  18. Guo, Z.; Zhang, J. Lightweight Dilated Residual Convolution AMP Network for Image Compressed Sensing. In Proceedings of the 2023 4th International Conference on Computer Engineering and Application (ICCEA), Hangzhou, China, 7–9 April 2023; pp. 747–752. [Google Scholar]
  19. Kulkarni, K.; Lohit, S.; Turaga, P.; Kerviche, R.; Ashok, A. Reconnet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 449–458. [Google Scholar]
  20. Yao, H.; Dai, F.; Zhang, S.; Zhang, Y.; Tian, Q.; Xu, C. Dr2-net: Deep residual reconstruction network for image compressive sensing. Neurocomputing 2019, 359, 483–493. [Google Scholar] [CrossRef]
  21. Yu, F.; Qian, Y.; Zhang, X.; Gil-Ureta, F.; Jackson, B.; Bennett, E.; Zhang, H. Dpa-net: Structured 3d abstraction from sparse views via differentiable primitive assembly. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2025; pp. 454–471. [Google Scholar]
  22. Song, J.; Chen, B.; Zhang, J. Dynamic path-controllable deep unfolding network for compressive sensing. IEEE Trans. Image Process. 2023, 32, 2202–2214. [Google Scholar] [CrossRef]
  23. Gan, H.; Wang, X.; He, L.; Liu, J. Learned two-step iterative shrinkage thresholding algorithm for deep compressive sensing. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 3943–3956. [Google Scholar] [CrossRef]
  24. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  25. Dao, T.; Gu, A. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv 2024, arXiv:2405.21060. [Google Scholar]
  26. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv 2024, arXiv:2401.09417. [Google Scholar]
  27. Wang, C.; Tsepa, O.; Ma, J.; Wang, B. Graph-mamba: Towards long-range graph sequence modeling with selective state spaces. arXiv 2024, arXiv:2402.00789. [Google Scholar]
  28. Huang, T.; Pei, X.; You, S.; Wang, F.; Qian, C.; Xu, C. Localmamba: Visual state space model with windowed selective scan. arXiv 2024, arXiv:2403.09338. [Google Scholar]
  29. Pei, X.; Huang, T.; Xu, C. Efficientvmamba: Atrous selective scan for light weight visual mamba. arXiv 2024, arXiv:2403.09977. [Google Scholar]
  30. Li, S.; Singh, H.; Grover, A. Mamba-nd: Selective state space modeling for multi-dimensional data. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2025; pp. 75–92. [Google Scholar]
  31. Behrouz, A.; Santacatterina, M.; Zabih, R. Mambamixer: Efficient selective state space models with dual token and channel selection. arXiv 2024, arXiv:2403.19888. [Google Scholar]
  32. Wang, F.; Wang, J.; Ren, S.; Wei, G.; Mei, J.; Shao, W.; Zhou, Y.; Yuille, A.; Xie, C. Mamba-r: Vision mamba also needs registers. arXiv 2024, arXiv:2405.14858. [Google Scholar]
  33. Yang, C.; Chen, Z.; Espinosa, M.; Ericsson, L.; Wang, Z.; Liu, J.; Crowley, E.J. Plainmamba: Improving non-hierarchical mamba in visual recognition. arXiv 2024, arXiv:2403.17695. [Google Scholar]
  34. Ma, J.; Li, F.; Wang, B. U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv 2024, arXiv:2401.04722. [Google Scholar]
  35. Liu, J.; Yang, H.; Zhou, H.Y.; Xi, Y.; Yu, L.; Li, C.; Liang, Y.; Shi, G.; Yu, Y.; Zhang, S.; et al. Swin-umamba: Mamba-based unet with imagenet-based pretraining. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2024; pp. 615–625. [Google Scholar]
  36. Wang, Z.; Zheng, J.Q.; Zhang, Y.; Cui, G.; Li, L. Mamba-unet: Unet-like pure visual mamba for medical image segmentation. arXiv 2024, arXiv:2402.05079. [Google Scholar]
  37. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 898–916. [Google Scholar] [CrossRef]
  38. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  39. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; IEEE: Piscataway, NJ, USA, 2001; Volume 2, pp. 416–423. [Google Scholar]
  40. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 3–7 September 2012. [Google Scholar]
  41. Ye, D.; Ni, Z.; Wang, H.; Zhang, J.; Wang, S.; Kwong, S. CSformer: Bridging convolution and transformer for compressive sensing. IEEE Trans. Image Process. 2023, 32, 2827–2842. [Google Scholar] [CrossRef]
  42. Guo, Z.; Gan, H. CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 25086–25095. [Google Scholar]
Figure 1. System model.
Figure 2. Reconstruction results for satellite sensing images using SSM-Net and other methods. Sampling rates τ are 0.04 for the first row and 0.1 for the second row. Please zoom in for better comparison.
Figure 3. Reconstruction results for high-resolution images using SSM-Net and other methods. Sampling rates τ are 0.04 for the first row and 0.25 for the second row. Please zoom in for better comparison.
Figure 4. Noise robustness comparison. Visual analysis of different CS methods on images from the BSD100 dataset at sampling rates $\tau \in \{0.04, 0.10, 0.25\}$. Gaussian noise with variances $\sigma \in \{0.001, 0.002, 0.004\}$ was introduced. Note the effectiveness in recovering the images.
Figure 5. PSNR and SSIM changes in models trained in different ways as training time increases.
Figure 6. Comparison of GFLOPs, parameters, memory consumption, and energy consumption for a 256 × 256 pixel image with $\tau = 0.1$.
Figure 7. Effectiveness of Mamba and FISTA in SSM-Net reconstruction.
Table 1. PSNR (dB) and SSIM comparisons of different methods on the UCMerced, Set5, Urban100, and BSD100 datasets at multiple sampling rates (τ ∈ {0.01, 0.04, 0.10, 0.25, 0.30, 0.40, 0.50}). Each cell reports PSNR / SSIM. The highest value is marked in red, and the second highest value is marked in blue.

| Dataset | Method | 0.01 | 0.04 | 0.10 | 0.25 | 0.30 | 0.40 | 0.50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UCMerced | ISTA-Net+ (CVPR2018) | 17.82 / 0.4127 | 21.65 / 0.5892 | 25.44 / 0.7165 | 29.78 / 0.8825 | 31.04 / 0.9061 | 33.32 / 0.9373 | 35.36 / 0.9571 |
| UCMerced | CSNet (TIP2019) | 18.89 / 0.4438 | 22.69 / 0.6173 | 26.17 / 0.7789 | 30.76 / 0.9033 | 32.52 / 0.9278 | 34.83 / 0.9462 | 36.47 / 0.9627 |
| UCMerced | AMP-Net (TIP2021) | 19.12 / 0.4567 | 23.74 / 0.6513 | 27.90 / 0.7959 | 32.55 / 0.9174 | 33.44 / 0.9317 | 35.26 / 0.9521 | 37.65 / 0.9702 |
| UCMerced | TransCS (TIP2022) | 21.76 / 0.4836 | 25.18 / 0.6950 | 29.41 / 0.8412 | 34.03 / 0.9303 | 36.12 / 0.9527 | 38.45 / 0.9692 | 40.56 / 0.9785 |
| UCMerced | CSformer (TIP2023) | 21.52 / 0.4793 | 25.21 / 0.6957 | 29.47 / 0.8437 | 34.57 / 0.9387 | 35.97 / 0.9513 | 38.21 / 0.9675 | 40.23 / 0.9771 |
| UCMerced | CPP-Net (CVPR2024) | 21.40 / 0.4783 | 25.14 / 0.6908 | 29.46 / 0.8427 | 34.23 / 0.9328 | 35.55 / 0.9498 | 37.11 / 0.9629 | 39.31 / 0.9758 |
| UCMerced | SSM-Net (Ours) | 21.38 / 0.4705 | 25.28 / 0.6959 | 29.53 / 0.8449 | 34.71 / 0.9398 | 35.89 / 0.9511 | 36.12 / 0.9568 | 37.33 / 0.9613 |
| Set5 | ISTA-Net+ (CVPR2018) | 20.25 / 0.5608 | 23.42 / 0.6287 | 28.47 / 0.8309 | 34.02 / 0.9188 | 35.38 / 0.9397 | 37.44 / 0.9573 | 39.25 / 0.9689 |
| Set5 | CSNet (TIP2019) | 20.15 / 0.5447 | 27.12 / 0.7988 | 31.07 / 0.8925 | 35.89 / 0.9473 | 37.25 / 0.9473 | 38.91 / 0.9611 | 40.74 / 0.9691 |
| Set5 | AMP-Net (TIP2021) | 20.45 / 0.5563 | 27.25 / 0.8065 | 31.43 / 0.8977 | 36.25 / 0.9514 | 37.82 / 0.9583 | 39.55 / 0.9694 | 41.48 / 0.9756 |
| Set5 | TransCS (TIP2022) | 22.98 / 0.6287 | 29.02 / 0.8317 | 32.74 / 0.9235 | 37.26 / 0.9625 | 38.53 / 0.9693 | 41.40 / 0.9773 | 42.42 / 0.9846 |
| Set5 | CSformer (TIP2023) | 21.84 / 0.5892 | 29.27 / 0.8239 | 33.04 / 0.9243 | 37.04 / 0.9583 | 38.44 / 0.9614 | 40.62 / 0.9723 | 42.37 / 0.9793 |
| Set5 | CPP-Net (CVPR2024) | 22.63 / 0.6214 | 29.19 / 0.8259 | 32.64 / 0.9240 | 37.12 / 0.9592 | 38.24 / 0.9598 | 40.90 / 0.9746 | 42.25 / 0.9769 |
| Set5 | SSM-Net (Ours) | 23.37 / 0.6311 | 29.32 / 0.8377 | 33.17 / 0.9250 | 37.61 / 0.9647 | 38.74 / 0.9712 | 41.81 / 0.9796 | 42.72 / 0.9849 |
| Urban100 | ISTA-Net+ (CVPR2018) | 15.23 / 0.4127 | 19.65 / 0.5351 | 23.44 / 0.7165 | 28.78 / 0.8825 | 30.04 / 0.9061 | 32.32 / 0.9373 | 34.36 / 0.9571 |
| Urban100 | CSNet (TIP2019) | 15.89 / 0.4438 | 19.69 / 0.5973 | 23.17 / 0.7789 | 28.76 / 0.9033 | 29.52 / 0.9278 | 32.83 / 0.9462 | 33.47 / 0.9627 |
| Urban100 | AMP-Net (TIP2021) | 16.12 / 0.4567 | 20.74 / 0.6013 | 23.90 / 0.7859 | 29.55 / 0.9174 | 30.44 / 0.9317 | 33.26 / 0.9521 | 34.65 / 0.9702 |
| Urban100 | TransCS (TIP2022) | 19.53 / 0.5104 | 22.30 / 0.6938 | 25.87 / 0.8334 | 30.46 / 0.9215 | 31.47 / 0.9446 | 33.49 / 0.9621 | 34.58 / 0.9711 |
| Urban100 | CSformer (TIP2023) | 18.92 / 0.4893 | 22.57 / 0.6781 | 25.27 / 0.8213 | 30.57 / 0.9287 | 31.07 / 0.9313 | 33.21 / 0.9575 | 34.23 / 0.9721 |
| Urban100 | CPP-Net (CVPR2024) | 18.83 / 0.4856 | 22.19 / 0.6759 | 25.32 / 0.8255 | 30.22 / 0.9269 | 31.17 / 0.9358 | 33.16 / 0.9549 | 34.60 / 0.9699 |
| Urban100 | SSM-Net (Ours) | 19.18 / 0.4992 | 21.79 / 0.6857 | 25.43 / 0.8293 | 30.78 / 0.9398 | 31.21 / 0.9372 | 33.34 / 0.9583 | 34.93 / 0.9725 |
| BSD100 | ISTA-Net+ (CVPR2018) | 17.45 / 0.4234 | 22.21 / 0.5397 | 24.89 / 0.6837 | 28.83 / 0.8379 | 29.92 / 0.8673 | 31.77 / 0.9063 | 33.52 / 0.9357 |
| BSD100 | CSNet (TIP2019) | 18.23 / 0.4567 | 23.77 / 0.6497 | 26.31 / 0.7714 | 30.04 / 0.8997 | 30.69 / 0.9135 | 32.94 / 0.9299 | 34.96 / 0.9478 |
| BSD100 | AMP-Net (TIP2021) | 18.67 / 0.4623 | 24.04 / 0.6537 | 26.16 / 0.7688 | 30.13 / 0.9002 | 30.88 / 0.9142 | 33.24 / 0.9379 | 35.43 / 0.9517 |
| BSD100 | TransCS (TIP2022) | 22.26 / 0.4848 | 24.68 / 0.6633 | 27.76 / 0.7945 | 31.54 / 0.9031 | 32.27 / 0.9215 | 34.52 / 0.9499 | 36.52 / 0.9671 |
| BSD100 | CSformer (TIP2023) | 22.42 / 0.4892 | 24.96 / 0.6709 | 26.54 / 0.7749 | 30.75 / 0.9022 | 31.54 / 0.9189 | 34.21 / 0.9443 | 35.91 / 0.9589 |
| BSD100 | CPP-Net (CVPR2024) | 22.15 / 0.4836 | 24.78 / 0.6684 | 27.38 / 0.7895 | 30.95 / 0.9012 | 31.97 / 0.9207 | 34.35 / 0.9478 | 36.24 / 0.9584 |
| BSD100 | SSM-Net (Ours) | 21.87 / 0.4797 | 25.12 / 0.6690 | 27.60 / 0.7903 | 31.45 / 0.8910 | 32.79 / 0.9237 | 34.96 / 0.9522 | 36.89 / 0.9577 |
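For readers reproducing Table 1, the PSNR and SSIM scores follow their standard definitions. The snippet below is a minimal sketch of such an evaluation using scikit-image; the random arrays merely stand in for ground-truth/reconstruction pairs and are not the authors' evaluation pipeline.

```python
# Minimal sketch: computing PSNR/SSIM as reported in Table 1.
# Assumes scikit-image; `reference` and `reconstruction` are
# placeholder grayscale images in [0, 1], not the authors' code.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
reference = rng.random((256, 256))                    # stand-in ground truth
reconstruction = np.clip(                              # stand-in network output
    reference + 0.05 * rng.standard_normal((256, 256)), 0.0, 1.0)

psnr = peak_signal_noise_ratio(reference, reconstruction, data_range=1.0)
ssim = structural_similarity(reference, reconstruction, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```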
Table 2. WS-PSNR (dB) and MSSIM comparisons of different methods on the UCMerced, Set5, Urban100, and BSD100 datasets at multiple sampling rates (τ ∈ {0.01, 0.04, 0.10, 0.25, 0.50}). Each cell reports WS-PSNR / MSSIM. The highest value is marked in red, and the second highest value is marked in blue.

| Dataset | Method | 0.01 | 0.04 | 0.10 | 0.25 | 0.50 |
| --- | --- | --- | --- | --- | --- | --- |
| UCMerced | ISTA-Net+ (CVPR2018) | 16.03 / 0.4457 | 18.18 / 0.6079 | 21.88 / 0.7351 | 27.69 / 0.8913 | 33.28 / 0.9565 |
| UCMerced | CSNet (TIP2019) | 17.11 / 0.4998 | 20.12 / 0.6354 | 23.98 / 0.7972 | 28.63 / 0.9121 | 34.39 / 0.9611 |
| UCMerced | AMP-Net (TIP2021) | 17.37 / 0.5139 | 20.19 / 0.6698 | 24.32 / 0.8142 | 30.48 / 0.9259 | 35.57 / 0.9694 |
| UCMerced | TransCS (TIP2022) | 19.87 / 0.5321 | 22.21 / 0.7621 | 26.92 / 0.8922 | 31.97 / 0.9487 | 38.41 / 0.9881 |
| UCMerced | CSformer (TIP2023) | 19.62 / 0.5308 | 22.54 / 0.7682 | 27.18 / 0.8957 | 32.43 / 0.9572 | 38.37 / 0.9868 |
| UCMerced | CPP-Net (CVPR2024) | 18.48 / 0.5271 | 22.13 / 0.7607 | 27.01 / 0.8937 | 32.05 / 0.9502 | 38.18 / 0.9849 |
| UCMerced | SSM-Net (Ours) | 18.87 / 0.5274 | 22.67 / 0.7690 | 27.26 / 0.8965 | 32.49 / 0.9590 | 36.31 / 0.9778 |
| Set5 | ISTA-Net+ (CVPR2018) | 16.68 / 0.5791 | 18.85 / 0.6469 | 26.89 / 0.8487 | 31.95 / 0.9271 | 35.18 / 0.9579 |
| Set5 | CSNet (TIP2019) | 17.59 / 0.5629 | 19.92 / 0.8161 | 29.49 / 0.9108 | 33.81 / 0.9556 | 36.24 / 0.9662 |
| Set5 | AMP-Net (TIP2021) | 17.88 / 0.5746 | 23.68 / 0.8239 | 29.86 / 0.9159 | 34.18 / 0.9597 | 36.41 / 0.9688 |
| Set5 | TransCS (TIP2022) | 19.32 / 0.6552 | 24.97 / 0.8784 | 30.18 / 0.9479 | 35.19 / 0.9708 | 37.54 / 0.9745 |
| Set5 | CSformer (TIP2023) | 19.12 / 0.6542 | 25.08 / 0.8776 | 30.47 / 0.9515 | 34.97 / 0.9666 | 37.91 / 0.9793 |
| Set5 | CPP-Net (CVPR2024) | 19.27 / 0.6521 | 24.65 / 0.8749 | 30.07 / 0.9427 | 34.95 / 0.9675 | 37.42 / 0.9762 |
| Set5 | SSM-Net (Ours) | 19.96 / 0.6589 | 25.17 / 0.8798 | 30.64 / 0.9522 | 34.83 / 0.9728 | 37.70 / 0.9788 |
| Urban100 | ISTA-Net+ (CVPR2018) | 13.65 / 0.4309 | 17.08 / 0.5534 | 20.87 / 0.7348 | 26.71 / 0.8909 | 32.29 / 0.9504 |
| Urban100 | CSNet (TIP2019) | 14.33 / 0.4611 | 17.12 / 0.6156 | 21.59 / 0.7972 | 26.68 / 0.9117 | 31.40 / 0.9520 |
| Urban100 | AMP-Net (TIP2021) | 14.57 / 0.4742 | 18.17 / 0.6196 | 22.33 / 0.8429 | 27.48 / 0.9257 | 31.58 / 0.9595 |
| Urban100 | TransCS (TIP2022) | 16.85 / 0.5279 | 19.98 / 0.7618 | 24.29 / 0.8991 | 28.39 / 0.9298 | 32.16 / 0.9704 |
| Urban100 | CSformer (TIP2023) | 15.43 / 0.5197 | 19.72 / 0.7554 | 22.98 / 0.8821 | 28.49 / 0.9370 | 32.51 / 0.9714 |
| Urban100 | CPP-Net (CVPR2024) | 15.24 / 0.5182 | 19.01 / 0.7532 | 23.02 / 0.8832 | 28.05 / 0.9352 | 31.84 / 0.9689 |
| Urban100 | SSM-Net (Ours) | 15.89 / 0.5202 | 19.15 / 0.7544 | 23.11 / 0.8839 | 27.29 / 0.9377 | 31.78 / 0.9629 |
| BSD100 | ISTA-Net+ (CVPR2018) | 15.88 / 0.4417 | 17.64 / 0.5581 | 20.32 / 0.7019 | 24.78 / 0.8462 | 30.45 / 0.9449 |
| BSD100 | CSNet (TIP2019) | 16.66 / 0.4751 | 20.20 / 0.6680 | 22.74 / 0.7897 | 26.81 / 0.9080 | 31.89 / 0.9571 |
| BSD100 | AMP-Net (TIP2021) | 17.10 / 0.4806 | 20.42 / 0.6720 | 22.59 / 0.7871 | 27.06 / 0.9085 | 31.36 / 0.9510 |
| BSD100 | TransCS (TIP2022) | 18.69 / 0.5642 | 21.11 / 0.7416 | 25.19 / 0.8572 | 28.23 / 0.9214 | 34.45 / 0.9699 |
| BSD100 | CSformer (TIP2023) | 18.85 / 0.5666 | 21.39 / 0.7492 | 24.34 / 0.8541 | 27.89 / 0.9199 | 33.84 / 0.9682 |
| BSD100 | CPP-Net (CVPR2024) | 18.58 / 0.5607 | 21.21 / 0.7423 | 24.18 / 0.8478 | 27.84 / 0.9185 | 33.17 / 0.9677 |
| BSD100 | SSM-Net (Ours) | 18.49 / 0.5621 | 21.47 / 0.7462 | 24.56 / 0.8552 | 27.92 / 0.9182 | 32.53 / 0.9583 |
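For context, WS-PSNR extends PSNR to panoramic content by weighting each pixel's squared error by the solid angle it covers on the sphere. A commonly used formulation for an equirectangular image of height H, with j indexing image rows, is (our notation, stated as background rather than taken from the paper):

WS-PSNR = 10 log₁₀ ( MAX² / WMSE ),
WMSE = Σᵢⱼ w(i, j) ( x(i, j) − x̂(i, j) )² / Σᵢⱼ w(i, j),
w(i, j) = cos( ( j − H/2 + 0.5 ) π / H ),

where MAX is the peak pixel value, x is the reference image, and x̂ is the reconstruction.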
Table 3. PSNR (dB) and SSIM comparisons on BSD100 with different noise levels σ and various sampling rates τ. Each cell reports PSNR / SSIM.

| σ | τ | AMP-Net | CSformer | SSM-Net |
| --- | --- | --- | --- | --- |
| 0.001 | 0.04 | 23.28 / 0.5475 | 24.12 / 0.5483 | 24.23 / 0.5509 |
| 0.001 | 0.10 | 25.32 / 0.6558 | 26.21 / 0.6722 | 26.38 / 0.6760 |
| 0.001 | 0.25 | 27.73 / 0.7930 | 28.54 / 0.7932 | 28.75 / 0.7946 |
| 0.002 | 0.04 | 22.59 / 0.4879 | 23.41 / 0.4923 | 23.54 / 0.4957 |
| 0.002 | 0.10 | 24.17 / 0.6052 | 25.31 / 0.6198 | 25.44 / 0.6241 |
| 0.002 | 0.25 | 27.23 / 0.7583 | 27.33 / 0.7512 | 27.46 / 0.7537 |
| 0.004 | 0.04 | 20.56 / 0.4317 | 22.56 / 0.4246 | 22.74 / 0.4262 |
| 0.004 | 0.10 | 22.53 / 0.5289 | 24.18 / 0.5312 | 24.31 / 0.5331 |
| 0.004 | 0.25 | 25.41 / 0.6825 | 25.89 / 0.6943 | 26.06 / 0.6972 |
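One plausible reading of the noisy setting in Table 3, following the measurement model y = Φx + η, is a zero-mean Gaussian perturbation of variance σ on the measurements. The sketch below simulates such measurements; the random Gaussian sensing matrix, the block size, and the choice of noise domain are illustrative assumptions on our part, since SSM-Net learns its own sampling operator.

```python
# Minimal sketch: simulating noisy CS measurements y = Phi @ x + eta.
# The Gaussian sensing matrix, block size, and noise domain are
# illustrative only; the paper learns its sampling module.
import numpy as np

rng = np.random.default_rng(0)
n = 33 * 33              # assumed flattened block size (illustrative)
tau = 0.10               # sampling rate
m = int(tau * n)         # number of measurements

x = rng.random(n)                                  # placeholder signal block
Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # random sensing matrix
sigma2 = 0.002                                     # noise variance, as in Table 3
eta = rng.normal(0.0, np.sqrt(sigma2), size=m)     # variance sigma2 -> std sqrt(sigma2)
y = Phi @ x + eta                                  # noisy measurements
```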
Table 4. Time consumption (in seconds) of different methods under various compression rates τ, measured on an NVIDIA RTX 4090 GPU.

| Method | τ = 0.10 | τ = 0.25 | τ = 0.30 | τ = 0.40 | τ = 0.50 |
| --- | --- | --- | --- | --- | --- |
| ISTA-Net+ | 0.0227 | 0.0232 | 0.0238 | 0.0241 | 0.0247 |
| CSNet | 0.0078 | 0.0084 | 0.0089 | 0.0095 | 0.0099 |
| CSformer | 0.0469 | 0.0471 | 0.0476 | 0.0480 | 0.0486 |
| AMP-Net | 0.0165 | 0.0177 | 0.0181 | 0.0189 | 0.0194 |
| TransCS | 0.0241 | 0.0245 | 0.0247 | 0.0251 | 0.0257 |
| SSM-Net | 0.0201 | 0.0203 | 0.0202 | 0.0203 | 0.0203 |
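GPU timings such as those in Table 4 require care with CUDA's asynchronous execution. The sketch below shows a typical measurement loop in PyTorch with warm-up and synchronization; the stand-in convolution is a placeholder for the networks being timed, not the released SSM-Net code, and a CUDA device is assumed.

```python
# Minimal sketch: wall-clock timing of one forward pass on GPU.
# CUDA events plus synchronization measure kernel time rather than
# asynchronous launch overhead. `model` is a placeholder module.
import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1).cuda().eval()  # stand-in network
x = torch.randn(1, 1, 256, 256, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.no_grad():
    for _ in range(10):            # warm-up iterations (JIT, cache effects)
        model(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):           # timed iterations
        model(x)
    end.record()
    torch.cuda.synchronize()

# elapsed_time is in milliseconds; report average seconds per pass
print(f"avg time: {start.elapsed_time(end) / 100 / 1000:.4f} s")
```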
Table 5. Comparisons of PSNR results (dB) and GPU runtime on an RTX 4090 for different ablation variants with an input image of pixel size 256 × 256. Each cell reports PSNR (dB) / GPU time (s).

| Method | τ = 0.10 | τ = 0.25 | τ = 0.30 |
| --- | --- | --- | --- |
| Without Mamba | 26.58 / 0.0182 | 28.47 / 0.0184 | 29.84 / 0.0185 |
| Without FISTA | 25.40 / 0.0175 | 27.85 / 0.0180 | 29.10 / 0.0182 |
| Mamba replaced with Transformer | 28.74 / 0.0234 | 30.12 / 0.024 | 31.89 / 0.025 |
| SSM-Net | 29.32 / 0.0201 | 30.78 / 0.0204 | 32.45 / 0.0207 |
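For context on the "Without FISTA" row: FISTA accelerates ISTA by adding a momentum (extrapolation) step between proximal updates. The classical iteration below is background for the acceleration SSM-Net borrows; the paper's unfolded network replaces the handcrafted proximal operator with learned modules, so this is not its exact iteration:

x_k = prox_{(λ/L)‖Ψ·‖₁} ( z_k − (1/L) Φᵀ ( Φ z_k − y ) ),
t_{k+1} = ( 1 + √(1 + 4 t_k²) ) / 2,
z_{k+1} = x_k + ( (t_k − 1) / t_{k+1} ) ( x_k − x_{k−1} ),

with t₁ = 1 and z₁ = x₀, where L is a Lipschitz constant of the data-fidelity gradient. The extrapolated point z_{k+1} reuses the direction of the previous step, which is what yields FISTA's faster convergence over plain ISTA.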