Article

MambaUSR: Mamba and Frequency Interaction Network for Underwater Image Super-Resolution

Guangze Shen, Jingxuan Zhang and Zhe Chen
1 Nanjing Hydraulic Research Institute, Nanjing 210029, China
2 College of Information Science and Engineering, Hohai University, Changzhou 213200, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(20), 11263; https://doi.org/10.3390/app152011263
Submission received: 24 September 2025 / Revised: 12 October 2025 / Accepted: 13 October 2025 / Published: 21 October 2025

Abstract

In recent years, underwater image super-resolution (SR) reconstruction has increasingly become a core focus of underwater machine vision. Light scattering and refraction in underwater environments result in images with blurred details, low contrast, color distortion, and multiple visual artifacts. Despite the promising results achieved by deep learning in underwater SR tasks, global and frequency-domain information remains insufficiently exploited. In this study, we introduce a novel underwater SR method based on the Vision State-Space Model, dubbed MambaUSR. At its core, we design the Frequency State-Space Module (FSSM), which integrates two complementary components: the Visual State-Space Module (VSSM) and the Frequency-Assisted Enhancement Module (FAEM). The VSSM models long-range dependencies to enhance global structural consistency and contrast, while the FAEM employs the Fast Fourier Transform combined with channel attention to extract high-frequency details, thereby improving the fidelity and naturalness of reconstructed images. Comprehensive evaluations on benchmark datasets confirm that MambaUSR delivers superior performance in underwater image reconstruction.

1. Introduction

Underwater images are frequently degraded by color distortion, low contrast, and blurred details, primarily due to the complex optical phenomena of light refraction, absorption, and scattering in aquatic environments. These degradations lead to poor visual quality, which can significantly compromise the performance of underwater robots in tasks such as object recognition, route navigation, and environmental observation. To address this issue, underwater image super-resolution (SR) reconstruction aims to generate an underwater high-resolution (HR) image from its degraded underwater low-resolution (LR) version. SR technology achieves better visual results by effectively recovering fine-grained details and high-frequency components.
Recent advancements in deep learning have fostered a wide spectrum of SR methodologies, encompassing convolutional neural networks (CNNs) [1,2], Transformers [3,4], State-Space Models (SSMs) [5,6], and the Fast Fourier Transform (FFT) [7,8], all aimed at improving the performance of underwater image SR. For instance, Islam et al. [9] introduced a convolutional neural network named SRDRM tailored for underwater SR and further extended it into SRDRM-GAN by incorporating a Markovian PatchGAN as the discriminator. To facilitate more effective training of these models, they also released a large-scale dataset, called USR-248. Subsequently, Islam et al. [10] presented Deep SESR, an end-to-end framework that simultaneously conducts enhancement and SR, evaluated on the UFO-120 dataset. AMPCNet [11] established interconnections between residual and dilated modules, enabling effective merging of heterogeneous feature information. In a similar vein, MSIDN [12] was proposed to balance underwater SR accuracy with deployment efficiency in real-world applications. Yang et al. [13] developed a lightweight encoder-decoder architecture, termed LAFFNet, which integrates multiple adaptive feature fusion modules to enhance underwater image quality. Sharma et al. [14] presented Deep WaveNet, a deep CNN designed with a wavelength-guided multi-contextual architecture to address SR tasks effectively. Although CNN-based approaches have achieved notable success in underwater image restoration, their dependence on local receptive fields constrains the modeling of global context, making them less effective in addressing complex illumination variations. In contrast, recent studies have explored Transformer-based architectures, which exhibit strong potential in underwater applications by capturing long-range dependencies through multi-head self-attention. For example, Peng et al. [15] proposed a Transformer framework with a U-shaped design, aimed at strengthening the model's capacity to selectively emphasize severely attenuated color channels and spatially degraded regions. UIR-Net [16] was a concise yet efficient approach tailored to restore clear underwater images from inputs affected by substantial particulate matter and marine light interference. URSCT [17] was developed to perform joint enhancement and SR, demonstrating a superior capacity to model global context in nonhomogeneous underwater media. Despite their strengths, Transformer-based methods are challenged by the quadratic cost of self-attention as the number of patches increases, significantly limiting their scalability for underwater tasks. Figure 1 presents local attribution maps (LAM) [18] for different methods. The red regions (pixels contributing to the reconstruction) of the CNN-based SRDRM [9] are confined to a small local area, whereas the Mamba-based approach draws on a much wider region and achieves a higher diffusion index (DI). This property is crucial for underwater SR, which requires both attention to critical regions and broader global modeling capability.
SSM [19] has recently gained attention as a compelling alternative for capturing long-range dependencies, offering both theoretical rigor and computational efficiency. Gu and Dao [20] introduced Mamba, a foundational architecture that incorporates a selection mechanism into the SSM, enabling adaptive information retention or suppression based on token-wise context along the sequence or scan path. Guan et al. [21] introduced WaterMamba, a novel method built upon the SSM with linear computational complexity. This approach effectively captures pixel-wise information propagation across four spatial directions as well as channel-wise interactions, thereby overcoming limitations associated with pixel and channel dependency modeling. Liu et al. [22] introduced an SSM-based vision backbone network called VMamba that converts image patches into sequential representations along horizontal and vertical axes, enabling bidirectional scanning to capture structured spatial dependencies in both directions. An innovative O-shaped dual-branch network, termed O-Mamba [23], was proposed to strengthen spatial-channel interaction and effectively exploit multi-scale features for improved representation learning. Beyond spatial modeling, frequency-domain information has recently gained attention for its ability to complement spatial features and capture global structures more effectively. Yao et al. [24] shifted their focus to the Fourier domain by integrating frequency-domain information with spatial features for the enhancement of low-light remote sensing images. Wang et al. [25] developed a spatial-frequency mutual network for face SR, leveraging information from both domains to improve reconstruction fidelity. A novel SR approach, the multi-scale FFT-based attention network (MSFFTAN) [26], introduced an FFT-based residual block that integrates image-domain and Fourier-domain branches, allowing concurrent extraction of fine-grained details and global structural information.
The above analysis suggests that existing methods have achieved notable progress in addressing SR problems. However, several key challenges remain to be explored. On the one hand, Mamba-based approaches generally outperform more sophisticated CNN, Transformer, and hybrid models at comparable parameter counts and computational complexity, yet they have scarcely been applied to underwater SR. On the other hand, the effective integration of frequency-domain information with spatial features for underwater image SR remains an open problem. Developing efficient models that can fully leverage this complementary information across domains is, therefore, a critical direction for future research.
To alleviate the above disadvantages, we make what is, to our knowledge, the first attempt to apply Mamba to underwater SR and propose a novel method, dubbed MambaUSR. Specifically, we present a Frequency State-Space Module (FSSM) that is composed mainly of a Vision State-Space Module (VSSM) and a Frequency-Assisted Enhancement Module (FAEM). The VSSM strengthens long-range dependency modeling to improve global contrast in underwater images. In the FAEM, the FFT decomposes features into phase and amplitude components, while channel attention highlights important information, thereby enhancing the high-frequency details of underwater images. Experimental results reveal that our MambaUSR attains performance comparable to prevailing underwater SR methods with fewer computational resources.
In short, our main contributions are threefold:
  • We present the first application of SSM to underwater SR, showcasing the potential of Mamba for efficient and effective global modeling in underwater image processing.
  • We propose a Frequency State-Space Module (FSSM) that exploits long-range dependencies to alleviate detail blurring and low contrast in underwater images.
  • We devise a Frequency-Assisted Enhancement Module (FAEM) that integrates the FFT with channel attention (CA) to efficiently extract high-frequency features, resulting in more natural reconstructed images.
The remainder of this paper is organized as follows. Section 2 reviews the related work on underwater image SR and frequency-domain modeling. Section 3 details the proposed MambaUSR framework and its core components. Section 4 presents the experimental setup, ablation analyses, and quantitative as well as qualitative results. Section 5 concludes the paper.

2. Related Work

2.1. Underwater Image SR

Deep learning has substantially improved underwater image SR by enabling the extraction of rich and complex features from large-scale datasets. SRDRM [9] was a generative model based on deep residual networks for underwater SR, capable of effectively recovering the global contrast and texture of images. Building upon this, a residual-in-residual generative architecture was introduced in Deep SESR [10] to boost perceptual quality, focusing on the preservation of fine structural details and the enhancement of overall visual realism. In PAL [27], the authors incorporated a CNN with channel-wise attention that guided the network to focus on the most salient areas. To address image degradation in nonhomogeneous media, Wang et al. [28] presented DIMN, a dual information modulation network that incorporates spatial-aware attention blocks and multi-scale Transformer blocks to guide the inductive bias, thereby better adapting to spatially varying degradation patterns. Pramanick et al. [29] proposed Lit-Net, a lightweight multi-stage network for multi-resolution image analysis. It employs a multi-resolution attention network and a multi-scale attention network to capture varying receptive fields and obtain rich feature representations. In addition to CNN-based models, Transformer architectures have also been introduced to address underwater restoration challenges. Dharejo et al. [30] employed the Swin Transformer, incorporating wavelet blocks to mitigate information loss caused by irreversible downsampling. Ren et al. [17] integrated the Swin Transformer with a U-Net architecture to enhance global dependency capture capabilities. More recently, Mamba-based models have demonstrated the ability to achieve efficient global modeling with reduced computational complexity. In the realm of underwater image enhancement, Guan et al. [21] put forward the WaterMamba algorithm, which effectively models and leverages multi-scale image features through spatial-channel omnidirectional selective scanning blocks. Chang et al. [31] presented a Mamba-enhanced spectral-attentive wavelet network, which achieves complementary learning between the spatial and frequency domains, thereby yielding superior image restoration results. To our knowledge, however, the potential of Mamba in underwater SR tasks has not yet been fully explored. Unlike WaterMamba, which focuses on underwater image enhancement without explicit high-frequency reconstruction, and spectral-attentive wavelet networks, which rely on Transformer-based spectral attention for spatial-frequency coupling, the proposed MambaUSR integrates vision state-space modeling and frequency-assisted enhancement in a unified, linear-complexity framework specifically tailored for underwater SR. Compared to natural images, underwater images suffer from significant detail loss, reduced contrast, and color distortion, posing greater challenges for deep learning models in achieving high-quality reconstruction.

2.2. Fourier Transform

The FFT serves as a global analysis technique that effectively captures long-range dependencies by representing signals in the frequency domain [32]. With this in mind, numerous computer vision tasks leverage the FFT for frequency-domain modeling. Shao et al. [33] replaced traditional self-attention in Vision Transformers with FFT-based operations, effectively capturing both high- and low-frequency components while achieving linear computational complexity. Wang et al. [25] used the FFT to obtain an image-wide receptive field, thus better capturing global facial structure. Huang et al. [34] put forth a deep exposure correction model operating in the Fourier domain, where amplitude and phase are independently learned via dedicated branches to recover brightness and structure in low-quality images. Zhang et al. [35] utilized the FFT to obtain frequency-domain information and combined it with spatial-domain information, preserving underwater image details and reducing noise. Inspired by these studies, we combine the global context modeling capability of the Mamba framework with the frequency-domain representation of the FFT, enabling complementary spatial-frequency information and improving the quality of underwater image reconstruction.

3. Proposed Method

3.1. Overall Network Architecture

Our proposed MambaUSR involves three stages, as shown in Figure 2. Stage 1 applies a 3 × 3 convolution to extract shallow features. Stage 2 is the core part of our MambaUSR, constituted by T Frequency State-Space Groups (FSSGs) for capturing high-level features. Each FSSG is composed of G Frequency State-Space Modules (FSSMs) followed by a 3 × 3 convolution. Stage 3 assembles shallow and deep features to yield high-quality SR outputs. Given a degraded input image $I_{LR} \in \mathbb{R}^{H \times W \times 3}$, MambaUSR produces an HR image $I_{SR} \in \mathbb{R}^{rH \times rW \times 3}$, where r denotes the scale factor:
$F_0 = \mathrm{Conv}_{3 \times 3}(I_{LR})$
$F_t = H_{\mathrm{FSSG}}^{t}(F_{t-1}) = H_{\mathrm{FSSG}}^{t}\big(H_{\mathrm{FSSG}}^{t-1}(\cdots H_{\mathrm{FSSG}}^{1}(F_0)\cdots)\big), \quad t = 1, \ldots, T$
$I_{SR} = H_{HU}(F_T) + H_{LU}(I_{LR})$
where $H_{HU}(\cdot)$ and $H_{LU}(\cdot)$ correspond to the upsampling operations applied to the deep features and the input LR image, respectively. Specifically, $H_{HU}(\cdot)$ consists of a 3 × 3 convolutional layer followed by a sub-pixel convolutional layer, while $H_{LU}(\cdot)$ consists of a 5 × 5 convolutional layer followed by a sub-pixel convolutional layer.
We optimize MambaUSR using the $L_1$ loss. Given a training dataset $\{I_n^{LR}, I_n^{HR}\}_{n=1}^{N}$, where $I_n^{HR}$ denotes the ground-truth image, the loss is computed as:
$\mathcal{L}(\Theta) = \frac{1}{N} \sum_{n=1}^{N} \left\| I_n^{HR} - \mathrm{MambaUSR}(I_n^{LR}) \right\|_1$
where $\Theta$ refers to the trainable parameters of MambaUSR and $N$ is the number of training samples (e.g., N = 1060 for the USR-248 dataset and N = 1500 for the UFO-120 dataset).
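As a concrete illustration of the three-stage pipeline and the $L_1$ objective above, the following PyTorch sketch traces the data flow end to end. It is a minimal sketch under stated assumptions: the class and argument names (e.g., `MambaUSRSketch`, `fssg_builder`) are placeholders of our own, the FSSG bodies are stubbed out (their internals are described in Section 3.2), and the channel width is an arbitrary choice rather than the paper's setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaUSRSketch(nn.Module):
    """Minimal sketch of the three-stage MambaUSR pipeline (names are illustrative)."""
    def __init__(self, dim=64, num_groups=6, scale=4, fssg_builder=None):
        super().__init__()
        # Stage 1: shallow feature extraction with a 3x3 convolution.
        self.shallow = nn.Conv2d(3, dim, 3, padding=1)
        # Stage 2: T Frequency State-Space Groups (stubbed; see Section 3.2 for the FSSM details).
        build = fssg_builder or (lambda d: nn.Identity())
        self.groups = nn.ModuleList([build(dim) for _ in range(num_groups)])
        self.conv_after = nn.Conv2d(dim, dim, 3, padding=1)
        # Stage 3: H_HU = 3x3 conv + sub-pixel conv on deep features,
        #          H_LU = 5x5 conv + sub-pixel conv on the LR input.
        self.hu = nn.Sequential(nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
                                nn.PixelShuffle(scale))
        self.lu = nn.Sequential(nn.Conv2d(3, 3 * scale ** 2, 5, padding=2),
                                nn.PixelShuffle(scale))

    def forward(self, lr):
        f0 = self.shallow(lr)                 # F_0 = Conv3x3(I_LR)
        f = f0
        for group in self.groups:             # F_t = H_FSSG^t(F_{t-1})
            f = group(f)
        f = self.conv_after(f) + f0           # long skip over the deep-feature stage
        return self.hu(f) + self.lu(lr)       # I_SR = H_HU(F_T) + H_LU(I_LR)

# One L1 training step on an (LR, HR) pair, matching the loss L(Theta).
model = MambaUSRSketch(scale=4)
lr_img, hr_img = torch.rand(1, 3, 80, 80), torch.rand(1, 3, 320, 320)
loss = F.l1_loss(model(lr_img), hr_img)
loss.backward()
```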

3.2. Frequency State-Space Modules

To better mitigate the detail blurring and color bias caused by underwater lighting conditions, we put forward the Frequency State-Space Module (FSSM).
The modular design follows [36], which is inspired by the Transformer architecture [37], as reported in Figure 2a. For the g-th FSSM, the input features $Z_{g-1} \in \mathbb{R}^{H \times W \times C}$ are first normalized using Layer Normalization (LN) and then passed through the VSSM for long-range dependency modeling. In addition, a learnable parameter $\varpi$ is used to dynamically control the information flow:
$\bar{Z}_g = \mathrm{VSSM}(\mathrm{LN}(Z_{g-1})) + \varpi \cdot Z_{g-1}$
where $\mathrm{LN}(\cdot)$ represents the layer normalization operation and $\mathrm{VSSM}(\cdot)$ denotes the Vision State-Space Module.
Furthermore, since the SSM treats the flattened feature map as a one-dimensional token sequence, spatial neighborhood relationships among pixels are easily lost after flattening [35]. To address this problem, another LayerNorm is employed, followed by a 1 × 1 convolution layer to compensate for local features. Moreover, to boost the modeling capability of the VSSM while attaining rich high-frequency cues, we design the Frequency-Assisted Enhancement Module (FAEM), as detailed in Section 3.4. The above process can be formulated as:
$Z_g = \mathrm{FAEM}(\mathrm{Conv}(\mathrm{LN}(\bar{Z}_g))) + \varpi \cdot \bar{Z}_g$
where $\mathrm{FAEM}(\cdot)$ denotes the operation of the FAEM, and $\varpi$ is a learnable parameter that adaptively scales the residual contribution from $\bar{Z}_g$.
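The two gated residual branches above can be summarized in a short PyTorch sketch. This is only a structural illustration: the `vssm` and `faem` submodules are injected from outside (Sections 3.3 and 3.4), and the learnable gate is realized here as a per-channel parameter, since the exact shape of $\varpi$ is not specified in the text.

```python
import torch
import torch.nn as nn

class FSSMSketch(nn.Module):
    """Structural sketch of one Frequency State-Space Module (FSSM)."""
    def __init__(self, dim, vssm, faem):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)                 # LN before the VSSM branch
        self.ln2 = nn.LayerNorm(dim)                 # LN before the convolution + FAEM branch
        self.vssm = vssm                             # Vision State-Space Module (Section 3.3)
        self.faem = faem                             # Frequency-Assisted Enhancement Module (Section 3.4)
        self.local_conv = nn.Conv2d(dim, dim, 1)     # 1x1 conv compensating local features
        self.gate1 = nn.Parameter(torch.ones(dim))   # learnable residual scale (assumed per-channel)
        self.gate2 = nn.Parameter(torch.ones(dim))

    @staticmethod
    def _ln(x, ln):
        # Apply LayerNorm over the channel dimension of a (B, C, H, W) tensor.
        return ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

    def forward(self, z):
        g1 = self.gate1.view(1, -1, 1, 1)
        g2 = self.gate2.view(1, -1, 1, 1)
        z_bar = self.vssm(self._ln(z, self.ln1)) + g1 * z                            # VSSM branch
        z_out = self.faem(self.local_conv(self._ln(z_bar, self.ln2))) + g2 * z_bar   # FAEM branch
        return z_out
```

An FSSG would then stack G such modules followed by a 3 × 3 convolution, matching the grouping described in Section 3.1.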

3.3. Vision State-Space Module

The Vision State-Space Module (VSSM) captures long-range dependencies through state-space equations, as illustrated in Figure 2b. The input features $Z_g \in \mathbb{R}^{H \times W \times C}$ are handled by two parallel paths [22]. In the first path, the feature channels are initially expanded to $\lambda C$ by a linear layer, then processed by a depth-wise convolution and the SiLU activation function, and finally passed through the 2D Selective Scan Module (2D-SSM) and LayerNorm, yielding $Z_g^{1}$. Simultaneously, the second path applies a linear layer and an activation function to the input features, yielding $Z_g^{2}$. The outputs of the two paths are fused via the Hadamard product and then projected back to C channels by a linear layer, producing the final output $\ddot{Z}_g \in \mathbb{R}^{H \times W \times C}$:
$Z_g^{1} = \mathrm{LN}(\text{2D-SSM}(\mathrm{SiLU}(\mathrm{DWConv}(\mathrm{Linear}(Z_g)))))$
$Z_g^{2} = \mathrm{SiLU}(\mathrm{Linear}(Z_g))$
$\ddot{Z}_g = \mathrm{Linear}(Z_g^{1} \odot Z_g^{2})$
where $\mathrm{DWConv}(\cdot)$ is the depth-wise convolution and $\odot$ is the Hadamard product. As Figure 2c shows, the 2D-SSM module first unfolds a 2D feature map into four 1D sequences by scanning along four directional axes. Each sequence is then processed with discrete state-space equations to capture global context. After this sequential modeling, the outputs from all directions are merged and rearranged to recover the original 2D spatial layout. In this way, the 2D-SSM effectively preserves spatial information while enhancing feature extraction.
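The sketch below mirrors the two-path layout described above. It deliberately leaves the 2D-SSM core as an injected placeholder (`ss2d`), since the selective-scan kernel itself is outside the scope of a short example; everything else (channel expansion, depth-wise convolution, SiLU, LayerNorm, Hadamard gating, and the output projection) follows the description, with the expansion factor λ assumed to be 2.

```python
import torch
import torch.nn as nn

class VSSMSketch(nn.Module):
    """Two-path gating structure of the VSSM; `ss2d` stands in for the real 2D-SSM kernel."""
    def __init__(self, dim, expand=2, ss2d=None):
        super().__init__()
        hidden = expand * dim                        # lambda * C channel expansion
        # Path 1: Linear -> depth-wise conv -> SiLU -> 2D-SSM -> LayerNorm
        self.in_proj1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.ss2d = ss2d if ss2d is not None else nn.Identity()   # placeholder scan
        self.ln = nn.LayerNorm(hidden)
        # Path 2: Linear -> SiLU (gating path)
        self.in_proj2 = nn.Linear(dim, hidden)
        self.act = nn.SiLU()
        self.out_proj = nn.Linear(hidden, dim)       # project back to C channels

    def forward(self, x):                            # x: (B, C, H, W)
        t = x.permute(0, 2, 3, 1)                    # channel-last for the linear layers
        p1 = self.in_proj1(t).permute(0, 3, 1, 2)    # (B, hidden, H, W)
        p1 = self.act(self.dwconv(p1))
        p1 = self.ss2d(p1)                           # four-direction selective scan in the real model
        p1 = self.ln(p1.permute(0, 2, 3, 1))         # (B, H, W, hidden)
        p2 = self.act(self.in_proj2(t))              # gating branch
        out = self.out_proj(p1 * p2)                 # Hadamard product, then linear projection
        return out.permute(0, 3, 1, 2)               # back to (B, C, H, W)
```

In the full model the placeholder is replaced by the four-direction selective scan of Figure 2c; swapping it in does not change the surrounding interface.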

3.4. Frequency-Assisted Enhancement Module

The complex optical properties of underwater environments often cause a substantial loss of detail in captured images. We therefore put forth the Frequency-Assisted Enhancement Module (FAEM), which leverages the FFT to extract and enhance high-frequency cues, making the reconstructed image more natural, as depicted in Figure 3b. Owing to the FFT, the amplitude and phase components possess an image-wide receptive field. Remarkably, the phase component carries rich structural information that greatly aids reconstruction. Unlike the frequency block (FRB, shown in Figure 3a) proposed in [25], we adopt channel attention (CA) [38] instead of convolutional operations so that the network focuses on informative channels.
Firstly, the input features $\bar{Z}_g \in \mathbb{R}^{H \times W \times C}$ are projected into the frequency domain using the FFT, i.e., $fre = \mathrm{FFT}(\bar{Z}_g)$. Then, the amplitude and phase components are obtained separately:
$A = \mathrm{abs}(fre), \quad P = \mathrm{angle}(fre)$
Subsequently, we exploit CA to strengthen informative feature representations and mitigate the impact of noise in the frequency domain. Concretely, CA on the amplitude branch further highlights important intensity information; the enhanced amplitude $\bar{A}$ is then combined with the original phase component and mapped back to the image space using the IFFT. Likewise, CA on the phase branch augments the missing structural information of the underwater image; the enhanced phase $\bar{P}$ is combined with the original amplitude component and transformed back via the IFFT. The enhanced amplitude branch can be expressed as:
$\bar{A} = \mathrm{CA}(A) + A$
$real = \bar{A} \times \cos(P), \quad imag = \bar{A} \times \sin(P)$
$fre_A = \mathrm{complex}(real, imag)$
$F_A = \mathrm{abs}(\mathrm{IFFT}(fre_A))$
where $\mathrm{CA}(\cdot)$ is the channel attention operation. Analogously, the enhanced phase branch can be expressed as:
$\bar{P} = \mathrm{CA}(P) + P$
$real = A \times \cos(\bar{P}), \quad imag = A \times \sin(\bar{P})$
$fre_P = \mathrm{complex}(real, imag)$
$F_P = \mathrm{abs}(\mathrm{IFFT}(fre_P))$
Finally, we integrate the two frequency-domain features $F_A$ and $F_P$ with a 1 × 1 convolutional layer, while incorporating a skip connection to boost training efficiency:
$\hat{Z}_g = \mathrm{Conv}_{1 \times 1}(F_A + F_P) + \bar{Z}_g$
where $\mathrm{IFFT}(\cdot)$ denotes the inverse Fourier transform operation and $\hat{Z}_g$ is the output of the FAEM. The implementation of the FAEM is summarized in Algorithm 1.
Unlike the FRB (shown in Figure 3a), which applies a uniform FFT-IFFT residual mapping, the FAEM performs content-adaptive spectral selection and fusion. After transforming features into the frequency domain, two lightweight branches emphasize information-rich mid/high-frequency components and stabilize low-frequency structures, respectively. Channel attention adaptively reweights spectral bands to suppress noise-dominated frequencies caused by underwater scattering. The refined spectrum is then fused back into the spatial stream via the IFFT and residual connections. Compared to the FRB, this selective enhancement more effectively amplifies structural and textural details, yielding clearer reconstructions with fewer artifacts.
Algorithm 1: The implementation of the Frequency-Assisted Enhancement Module (FAEM)
Input: Input feature matrix $Z_g \in \mathbb{R}^{H \times W \times C}$
Output: Resultant matrix $\hat{Z}_g \in \mathbb{R}^{H \times W \times C}$
  • Fourier Projection: Transform the input feature $Z_g$ into the frequency domain using the FFT:
    $fre = \mathrm{FFT}(Z_g)$;
    Obtain the amplitude and phase components:
    $A = \mathrm{abs}(fre), \quad P = \mathrm{angle}(fre)$.
  • Amplitude Enhancement Branch:
    • Apply CA to the amplitude feature to obtain $\bar{A} = \mathrm{CA}(A) + A$.
    • Reconstruct the complex frequency representation:
      $real = \bar{A} \cdot \cos(P), \quad imag = \bar{A} \cdot \sin(P)$.
    • Combine the real and imaginary parts: $fre_A = \mathrm{complex}(real, imag)$.
    • Transform back to the spatial domain: $F_A = \mathrm{abs}(\mathrm{IFFT}(fre_A))$.
  • Phase Enhancement Branch:
    • Apply CA to the phase feature to obtain $\bar{P} = \mathrm{CA}(P) + P$.
    • Reconstruct the complex frequency representation:
      $real = A \cdot \cos(\bar{P}), \quad imag = A \cdot \sin(\bar{P})$.
    • Combine the real and imaginary parts: $fre_P = \mathrm{complex}(real, imag)$.
    • Transform back to the spatial domain: $F_P = \mathrm{abs}(\mathrm{IFFT}(fre_P))$.
  • Fusion and Output: Apply a 1 × 1 convolution to fuse the outputs of the amplitude and phase branches and merge the result with the input feature via a residual connection: $\hat{Z}_g = \mathrm{Conv}_{1 \times 1}(F_A + F_P) + Z_g$.
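Algorithm 1 maps almost directly onto PyTorch's FFT utilities. The sketch below is one possible implementation under stated assumptions: `torch.polar` rebuilds the complex spectrum from magnitude and phase (equivalent to the real/imaginary construction above), the channel attention follows the squeeze-and-excitation design of [38], and the reduction ratio and normalization mode are choices of ours rather than values from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention [38]; the reduction ratio is an assumed value."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(self.pool(x))

class FAEMSketch(nn.Module):
    """Sketch of the Frequency-Assisted Enhancement Module (Algorithm 1)."""
    def __init__(self, dim):
        super().__init__()
        self.ca_amp = ChannelAttention(dim)
        self.ca_pha = ChannelAttention(dim)
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, z):
        fre = torch.fft.fft2(z, norm="ortho")           # project features to the frequency domain
        amp, pha = torch.abs(fre), torch.angle(fre)     # amplitude and phase components
        # Amplitude branch: CA-refined amplitude combined with the original phase.
        amp_e = self.ca_amp(amp) + amp
        f_a = torch.fft.ifft2(torch.polar(amp_e, pha), norm="ortho").abs()
        # Phase branch: CA-refined phase combined with the original amplitude.
        pha_e = self.ca_pha(pha) + pha
        f_p = torch.fft.ifft2(torch.polar(amp, pha_e), norm="ortho").abs()
        # Fuse the two branches and add the residual connection.
        return self.fuse(f_a + f_p) + z
```

For a quick shape check, `FAEMSketch(64)(torch.rand(1, 64, 40, 40))` returns a tensor of the same size as its input.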

4. Experiments

4.1. Dataset and Experimental Setup

Experiments on the benchmark datasets USR-248 [9] and UFO-120 [10] are carried out to evaluate the performance of our MambaUSR. Specifically, USR-248 contains 1060 training pairs and 248 testing pairs. In this dataset, HR images are downscaled via bicubic interpolation at ×2, ×4, and ×8, and further degraded with 20% Gaussian noise to produce the corresponding LR images. The UFO-120 dataset contains 1500 training pairs and 120 testing pairs, with downsampling performed at scaling factors of ×2, ×3, and ×4. To remain consistent with mainstream algorithms, we evaluate the proposed network using the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Underwater Image Quality Measure (UIQM). In this study, the number of FSSGs (T) and the number of FSSMs per group (G) are both set to 6 to obtain a lightweight network.
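For reference, PSNR can be computed directly from the mean squared error, as in the short sketch below; SSIM and UIQM are usually taken from existing implementations and their weighting coefficients are not reproduced here. The function is a generic definition of the metric, not code from the paper.

```python
import numpy as np

def psnr(hr: np.ndarray, sr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between two images of identical shape."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# Example: a uniform +5 intensity offset on an 8-bit image yields roughly 34 dB.
a = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
b = np.clip(a.astype(np.int16) + 5, 0, 255).astype(np.uint8)
print(psnr(a, b))
```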
The optimization is performed using the Adam optimizer with hyperparameters $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\varepsilon = 10^{-8}$. The learning rate is initially set to $1 \times 10^{-3}$ and halved every 200 epochs. Each batch comprises 12 LR patches of size 80 × 80 for the SR task. These hyperparameters are selected empirically to balance convergence stability and computational efficiency. All experiments are executed using PyTorch (https://pytorch.org/) on a system powered by an NVIDIA RTX 4090 GPU. To ensure transparency and reproducibility, all results reported in this study are obtained from a single best-performing model. Results for all comparison methods are sourced from the literature [14].
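The optimizer and schedule described above translate into the following PyTorch setup. The model here is a trivial stand-in (a single convolution plus pixel shuffle) and the dummy data loader only exists to make the snippet self-contained; in practice the real MambaUSR network and the USR-248/UFO-120 loaders would take their place, and the number of epochs shown is purely for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model and dummy data; replace with the real network and dataset loaders.
model = nn.Sequential(nn.Conv2d(3, 3 * 4 ** 2, 3, padding=1), nn.PixelShuffle(4))
train_loader = [(torch.rand(12, 3, 80, 80), torch.rand(12, 3, 320, 320)) for _ in range(4)]

# Adam with beta1 = 0.9, beta2 = 0.999, eps = 1e-8; initial learning rate 1e-3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)
# Halve the learning rate every 200 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

for epoch in range(2):                       # demo only; the real schedule runs far longer
    for lr_patch, hr_patch in train_loader:  # batches of 12 LR patches of size 80 x 80
        optimizer.zero_grad()
        loss = F.l1_loss(model(lr_patch), hr_patch)
        loss.backward()
        optimizer.step()
    scheduler.step()
```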

4.2. Ablation Study

An ablation study is conducted on the UFO-120 dataset with a scale factor of ×4, examining the performance impact of each individual module within the proposed framework. We sequentially remove the FAEM and the VSSM for retraining. Notably, to further highlight the advantages of the FAEM, we also replace it with the FRB (shown in Figure 3a) from [25], denoted MambaUSR w FRB. Table 1 summarizes the ablation results for our MambaUSR.
Effect of FAEM. When the FAEM is replaced by the FRB (MambaUSR w FRB), the PSNR drops to 24.93 dB and the SSIM to 0.6709, indicating that the FAEM is more effective than the plain FRB in enhancing image quality and maintaining structural similarity. The UIQM score of 2.8179 in this variant also suggests a minor decrease in perceptual quality. In addition, when the FAEM is completely removed (MambaUSR w/o FAEM), the PSNR decreases marginally to 25.02 dB and the SSIM falls slightly to 0.6758, further underscoring the FAEM's advantage in underwater SR tasks. This is because the FAEM performs content-adaptive spectral weighting, selectively enhancing useful high-frequency information and thereby improving perceived quality. Moreover, in Figure 4, we visualize the intermediate feature maps of the FRB and the FAEM at the 6th FSSM of the 6th FSSG. Compared to MambaUSR w FRB, MambaUSR equipped with the FAEM emphasizes richer details and edges, particularly in the textures of the nudibranch and coral images, indicating that the FAEM better captures local high-frequency details.
Effect of VSSM. The exclusion of VSSM (MambaUSR w/o VSSM) results in the lowest SSIM score among all model variations (0.6579) and a reduction in PSNR to 24.91, suggesting that VSSM is instrumental in retaining structural similarity and fine details. The UIQM score of 2.8107 further confirms that VSSM can elevate perceptual quality, highlighting its importance as a critical component for underwater SR. VSSM provides long-range dependency modeling that enforces global contour continuity and repeated-pattern alignment. SSIM is highly sensitive to such structural coherence within local windows; when VSSM is removed, edges become locally inconsistent and textures fragment. In contrast, PSNR is an average pixel error metric dominated by low-frequency content, making it less affected by such misalignments and thus exhibiting smaller variations.
Overall, the ablation study demonstrates that both FAEM and VSSM are integral to MambaUSR’s performance, with FAEM offering advantages over FFT in enhancing structural and perceptual image quality, and VSSM being crucial for preserving structural details and boosting perceptual metrics.

4.3. Comparison with Underwater SR

4.3.1. Comparison on USR-248 Dataset

Our method is evaluated against a set of advanced approaches on the USR-248 dataset to demonstrate its effectiveness. Quantitative results are listed in Table 2. A clear trend is that as the scaling factor increases, the PSNR and SSIM values decrease across all methods, indicating a general decline in performance as resolution enhancement becomes more challenging. MambaUSR consistently performs well across all scales, especially in terms of PSNR. At a scaling factor of ×2, MambaUSR attains a PSNR of 29.75 dB, the highest among the compared methods, although its SSIM of 0.52 lags behind the other approaches. Similarly, for the ×4 and ×8 scales, MambaUSR demonstrates strong performance, with PSNR values of 26.14 dB and 23.97 dB, respectively. In terms of computational efficiency, MambaUSR requires fewer FLOPs than many of the other high-performing models, such as ESRGAN and SRDRM, especially as the scaling factor increases. The superior performance of MambaUSR, particularly in maintaining a high PSNR across all scaling factors, suggests that it is well suited to handling complex SR tasks while keeping computational demands low.
Figure 5 showcases representative visual reconstruction results on the USR-248 dataset. All models capture some detail at scale factor ×2, but differences in sharpness and texture retention are noticeable. Models like SRGAN and SRDRM-GAN tend to produce artifacts, especially visible in areas with complex patterns and textures, such as the scales of fish or coral structures. In contrast, MambaUSR appears to retain finer details more effectively, maintaining a closer resemblance to the HR reference images. As magnification increases to ×4 and ×8, the challenge of preserving details becomes more evident. At these higher magnifications, most models display significant blurring and loss of structure, with only EDSRGAN, SRDRM-GAN, and MambaUSR demonstrating moderate resilience. Notably, MambaUSR continues to show comparatively better detail retention and fewer distortions, particularly in intricate textures such as fish patterns and coral spots.

4.3.2. Comparison on the UFO-120 Dataset

To further demonstrate the efficacy of MambaUSR, we conduct comparative experiments on the UFO-120 dataset against several representative and widely used models, as listed in Table 3. In terms of PSNR, MambaUSR demonstrates competitive performance across all scaling factors. At the ×2 scale, it achieves a PSNR of 25.76 dB, outperforming all compared methods except SRGAN (26.11 dB) and URSCT (25.96 dB). For larger scale factors (×3 and ×4), MambaUSR remains consistent, recording the highest PSNR at ×3 (26.15 dB) and ×4 (25.11 dB) among the listed methods. For SSIM, MambaUSR records 0.74, 0.74, and 0.69 at ×2, ×3, and ×4, respectively, which is competitive with, though slightly below, Deep WaveNet. For UIQM, its performance is also notable, especially at ×2 and ×3, where it achieves high scores (2.93 and 2.97), closely trailing Deep WaveNet at ×2 and surpassing all compared methods at ×3. However, at scale factor ×4, its UIQM decreases slightly to 2.84. Overall, the trends suggest that MambaUSR provides a balanced performance across all three metrics, particularly excelling at the intermediate scale factor (×3) with consistent PSNR and SSIM values. While other methods such as SRGAN and Deep WaveNet achieve higher scores in specific metrics or scaling factors, MambaUSR offers a reliable and well-rounded performance overall, making it a robust choice for underwater image SR.
Figure 6 provides some of the visual reconstruction results on the UFO-120 dataset. An evident reduction in detail preservation is observed across models as the scaling factor increases from ×2 to ×4. MambaUSR consistently performs best, retaining finer details and natural textures that closely resemble the high-resolution reference images, while other models show blurring and artifacts, especially in complex textures.

4.3.3. Application in XLD Hydraulic Project

We apply the proposed MambaUSR to the XLD hydraulic project, further verifying its robustness and generalizability. Different underwater scenes are selected to demonstrate MambaUSR’s fidelity in preserving detailed textures. Figure 7 illustrates the method’s effectiveness, depicting scenes at varying heights: the first column shows a height of 98.7 m, the second 95.9 m, the third 87.7 m, and the fourth 83.1 m. To quantitatively assess performance in these real-world scenarios, we computed UIQM and UCIQE for each reconstructed image. MambaUSR achieves an average UIQM of 0.98 and UCIQE of 0.55 across all depths, indicating consistent enhancement in both perceptual quality and color balance. Even under severe degradation at a scaling factor of ×8, the model maintains stable UIQM and UCIQE values, confirming its robustness in preserving fine structural and chromatic information. In summary, the application of MambaUSR to the XLD hydraulic project demonstrates its strong adaptability and reliability in real underwater environments.

4.4. Model Efficiency

To further demonstrate the computational efficiency of our MambaUSR, Table 4 reports results measured on an RTX 3060 GPU on the USR-248 dataset. Compared to mainstream underwater SR methods, MambaUSR attains higher PSNR values while requiring only 6.43 G FLOPs. In terms of throughput, MambaUSR achieves 27.34 fps, comparable to SRResNet (29.26 fps) and Deep WaveNet (28.46 fps), while outperforming most GAN-based methods. These findings demonstrate that MambaUSR successfully balances accuracy with computational cost, delivering excellent reconstruction quality without resorting to the excessive complexity of traditional deep networks or GAN architectures.

5. Conclusions

In this work, we propose MambaUSR, a novel state-space model tailored for underwater SR, which effectively addresses blurred details, color distortion, and low contrast in underwater scenes. The proposed Frequency State-Space Module (FSSM) serves as the core of MambaUSR, where the Visual State-Space Module (VSSM) enables efficient global dependency modeling and the Frequency-Assisted Enhancement Module (FAEM) captures critical high-frequency information through the FFT and channel attention. Experimental results reveal that MambaUSR significantly boosts reconstruction quality on the two benchmark datasets. On USR-248, it achieves the best PSNR performance with an improvement of at least 0.15 dB. On UFO-120, at the ×3 scale, its PSNR and UIQM improve by 0.42 dB and 0.12, respectively. These results confirm the model's strong competitiveness compared with existing methods. In future work, we will explore incorporating depth and polarization priors to better handle complex underwater conditions and develop lightweight Mamba-based variants for real-time applications.

Author Contributions

Conceptualization, Z.C.; Methodology, G.S.; Software, G.S.; Validation, G.S.; Formal analysis, J.Z.; Writing—original draft, J.Z.; Supervision, Z.C.; Funding acquisition, Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R & D Program of China under Grant 2022 YFC3005401, the Jiangsu Province Youth Science and Technology Talent Support Program under Grant JSTJ-2024-082, the National Natural Science Foundation of China under Grant 52309159 and U23B20150, the Scientific Research Fund of Nanjing Hydraulic Research Institute under Grant Y724001, the Technology Talent and Platform Program of Yunnan Province under Grant 202405AK340002 and the Science and Technology Projects of China Huaneng Group under Grant HNKJ20-H46.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, J.; Chen, C.; Tang, J.; Wu, G.S. From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-Resolution. Proc. AAAI Conf. Artif. Intell. 2023, 37, 1666–1674. [Google Scholar] [CrossRef]
  2. Zhou, Z.T.; Li, G.P.; Wang, G.Z. A Hybrid of Transformer and CNN for Efficient Single Image Super-Resolution via Multi-Level Distillation. Displays 2023, 76, 102352. [Google Scholar] [CrossRef]
  3. Liu, Z.; Lin, Y.T.; Cao, Y.; Hu, H.; Wei, Y.X.; Zhang, Z.; Lin, S.; Guo, B.N. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
  4. Lu, Z.S.; Liu, H.; Li, J.C.; Zhang, L.L. Efficient Transformer for Single Image Super-Resolution. arXiv 2021, arXiv:2108.11084. [Google Scholar]
  5. Yang, C.H.; Chen, Z.H.; Espinosa, M.; Ericsson, L.; Wang, Z.Y.; Liu, J.M.; Crowley, E.J. Plainmamba: Improving Non-Hierarchical Mamba in Visual Recognition. arXiv 2024, arXiv:2403.17695. [Google Scholar]
  6. Zhu, L.H.; Liao, B.C.; Zhang, Q.; Wang, X.L.; Liu, W.Y.; Wang, X.G. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. arXiv 2024, arXiv:2401.09417. [Google Scholar]
  7. Cheng, D.; Kou, K.I. FFT Multichannel Interpolation and Application to Image Super-Resolution. Signal Process. 2019, 162, 21–34. [Google Scholar] [CrossRef]
  8. Kong, L.S.; Dong, J.X.; Ge, J.J.; Li, M.Q.; Pan, J.S. Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5886–5895. [Google Scholar]
  9. Islam, M.J.; Enan, S.S.; Luo, P.G.; Sattar, J. Underwater Image Super-Resolution Using Deep Residual Multipliers. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 900–906. [Google Scholar]
  10. Islam, M.J.; Luo, P.G.; Sattar, J. Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception. arXiv 2020, arXiv:2002.01155. [Google Scholar]
  11. Zhang, Y.; Yang, S.X.; Sun, Y.M.; Liu, S.D.; Li, X.G. Attention-Guided Multi-Path Cross-CNN for Underwater Image Super-Resolution. Signal Image Video Process. 2022, 16, 155–163. [Google Scholar] [CrossRef]
  12. Wang, H.; Wu, H.; Hu, Q.; Chi, J.N.; Yu, X.S.; Wu, C.D. Underwater Image Super-Resolution Using Multi-Stage Information Distillation Networks. J. Vis. Commun. Image Represent. 2021, 77, 103136. [Google Scholar] [CrossRef]
  13. Yang, H.H.; Huang, K.C.; Chen, W.T. Laffnet: A Lightweight Adaptive Feature Fusion Network for Underwater Image Enhancement. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 685–692. [Google Scholar]
  14. Sharma, P.; Bisht, I.; Sur, A. Wavelength-Based Attributed Deep Neural Network for Underwater Image Restoration. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–23. [Google Scholar] [CrossRef]
  15. Peng, L.T.; Zhu, C.L.; Bian, L.H. U-Shape Transformer for Underwater Image Enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef]
  16. Mei, X.K.; Ye, X.F.; Zhang, X.F.; Liu, Y.S.; Wang, J.T.; Hou, J.; Wang, X.L. UIR-Net: A Simple and Effective Baseline for Underwater Image Restoration and Enhancement. Remote Sens. 2022, 15, 39. [Google Scholar] [CrossRef]
  17. Ren, T.D.; Xu, H.Y.; Jiang, G.Y.; Yu, M.; Zhang, X.; Wang, B.; Luo, T. Reinforced Swin-ConvS Transformer for Simultaneous Underwater Sensing Scene Image Enhancement and Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4209616. [Google Scholar] [CrossRef]
  18. Gu, J.; Dong, C. Interpreting Super-Resolution Networks with Local Attribution Maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9199–9208. [Google Scholar]
  19. Gu, A.; Johnson, I.; Timalsina, A.; Rudra, A.; Ré, C. How to Train Your Hippo: State Space Models with Generalized Orthogonal Basis Projections. arXiv 2022, arXiv:2206.12037. [Google Scholar]
  20. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  21. Guan, M.S.; Xu, H.Y.; Jiang, G.Y.; Yu, M.; Chen, Y.Y.; Luo, T.; Song, Y. WaterMamba: Visual State Space Model for Underwater Image Enhancement. arXiv 2024, arXiv:2405.08419. [Google Scholar]
  22. Liu, Y.; Tian, Y.J.; Zhao, Y.Z.; Yu, H.T.; Xie, L.X.; Wang, Y.W.; Ye, Q.X.; Jiao, J.B.; Liu, Y.F. Vmamba: Visual State Space Model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063. [Google Scholar]
  23. Dong, C.Y.; Zhao, C.; Cai, W.L.; Yang, B. O-Mamba: O-Shape State-Space Model for Underwater Image Enhancement. arXiv 2024, arXiv:2408.12816. [Google Scholar]
  24. Yao, Z.S.; Fan, G.D.; Fan, J.F.; Gan, M.; Chen, C.L.P. Spatial-Frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4706516. [Google Scholar] [CrossRef]
  25. Wang, C.Y.; Jiang, J.J.; Zhong, Z.W.; Liu, X.M. Spatial-Frequency Mutual Learning for Face Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22356–22366. [Google Scholar]
  26. Wang, Z.; Zhao, Y.W.; Chen, J.C. Multi-Scale Fast Fourier Transform Based Attention Network for Remote-Sensing Image Super-Resolution. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2728–2740. [Google Scholar] [CrossRef]
  27. Chen, X.L.; Wei, S.Q.; Yi, C.; Quan, L.W.; Lu, C.Y. Progressive Attentional Learning for Underwater Image Super-Resolution. In International Conference on Intelligent Robotics and Applications; Springer: Cham, Switzerland, 2020; pp. 233–243. [Google Scholar]
  28. Wang, L.; Li, X.; Li, K.; Mu, Y.; Zhang, M.; Yue, Z.X. Underwater Image Restoration Based on Dual Information Modulation Network. Sci. Rep. 2024, 14, 5416. [Google Scholar] [CrossRef]
  29. Pramanick, A.; Sur, A.; Saradhi, V.V. Harnessing Multi-Resolution and Multi-Scale Attention for Underwater Image Restoration. Vis. Comput. 2025, 41, 8235–8254. [Google Scholar] [CrossRef]
  30. Dharejo, F.A.; Ganapathi, I.I.; Zawish, M.; Alawode, B.; Alathbah, M.; Werghi, N.; Javed, S. SwinWave-SR: Multi-Scale Lightweight Underwater Image Super-Resolution. Inf. Fusion 2024, 103, 102127. [Google Scholar] [CrossRef]
  31. Chang, B.C.; Yuan, G.J.; Li, J.J. Mamba-Enhanced Spectral-Attentive Wavelet Network for Underwater Image Restoration. Eng. Appl. Artif. Intell. 2025, 143, 109999. [Google Scholar] [CrossRef]
  32. Xiao, Y.; Yuan, Q.Q.; Jiang, K.; Chen, Y.Z.; Zhang, Q.; Lin, C.W. Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution. IEEE Trans. Multimed. 2025, 27, 1783–1796. [Google Scholar] [CrossRef]
  33. Shao, M.W.; Qiao, Y.J.; Meng, D.Y.; Zuo, W.M. Uncertainty-Guided Hierarchical Frequency Domain Transformer for Image Restoration. Knowl.-Based Syst. 2023, 263, 110306. [Google Scholar] [CrossRef]
  34. Huang, J.; Liu, Y.J.; Zhao, F.; Yan, K.Y.; Zhang, J.H.; Huang, Y.K.; Zhou, M.; Xiong, Z.W. Deep Fourier-Based Exposure Correction Network with Spatial-Frequency Interaction. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 163–180. [Google Scholar]
  35. Zhang, H.P.; Xu, H.L.; Yu, X.S.; Zhang, X.Y.; Wu, C.D. Leveraging Frequency and Spatial Domain Information for Underwater Image Restoration. J. Phys. Conf. Ser. 2024, 2832, 012001. [Google Scholar] [CrossRef]
  36. Guo, H.; Li, J.M.; Dai, T.; Ouyang, Z.H.; Ren, X.D.; Xia, S.T. Mambair: A Simple Baseline for Image Restoration with State-Space Model. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 222–241. [Google Scholar]
  37. Chen, Z.; Zhang, Y.L.; Gu, J.J.; Kong, L.H.; Yang, X.K.; Yu, F. Dual Aggregation Transformer for Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12312–12321. [Google Scholar]
  38. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  39. Dong, C.; Loy, C.C.; He, K.M.; Tang, X.O. Learning a Deep Convolutional Network for Image Super-Resolution. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 184–199. [Google Scholar]
  40. Kim, J.W.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  41. Lim, B.; Son, S.H.; Kim, H.W.; Nah, S.J.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  42. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  43. Wang, X.T.; Yu, K.; Wu, S.X.; Gu, J.J.; Liu, Y.H.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
Figure 1. Comparison of LAM for (a) SRDRM, (b) URSCT, and (c) our MambaUSR. Our MambaUSR harvests the highest DI, maintaining local focus while modeling global dependencies.
Figure 2. Overall pipeline of our proposed MambaUSR that mainly consists of T FSSGs. (a) Frequency State-Space Module (FSSM), (b) Vision State-Space Module (VSSM), (c) 2D Selective Scan Module (2D-SSM).
Figure 3. Two variants of the FFT. (a) FRB proposed by [25]; (b) FAEM proposed in this paper.
Figure 4. The feature visualization results are compared in MambaUSR with FRB and FAEM, respectively.
Figure 5. Visual comparison of our proposed MambaUSR against popular works on USR-248 dataset. It can be found that MambaUSR produces sharper textures and more natural colors, effectively suppressing artifacts in dense patterns.
Figure 6. Visual comparison between our MambaUSR and popular networks on the UFO-120 dataset. MambaUSR reconstructs sharper edges and more natural textures than competing models.
Figure 7. Application of MambaUSR in the XLD Hydraulic Project under real underwater conditions. The numerical values indicate UIQM/UCIQE scores, where higher values denote better visual quality and color fidelity.
Table 1. Ablation results of individual components on UFO-120 dataset with scale factor ×4.
Model | PSNR (dB) | SSIM | UIQM
MambaUSR w FRB | 24.93 | 0.6709 | 2.8179
MambaUSR w/o FAEM | 25.02 | 0.6758 | 2.8208
MambaUSR w/o VSSM | 24.91 | 0.6579 | 2.8107
MambaUSR | 25.07 | 0.6853 | 2.8297
Table 2. Quantitative results on USR-248 dataset with scale factors ×2, ×4, and ×8. The best performance is highlighted in bold. FLOPs are computed based on an HR image of size 640 × 480.
Scale | Method | FLOPs (G) | Params (M) | PSNR (dB) | SSIM | UIQM
×2 | SRCNN [39] | 21.3 | 0.06 | 26.81 | 0.76 | 2.74
×2 | VDSR [40] | 205.28 | 0.67 | 28.98 | 0.79 | 2.57
×2 | EDSRGAN [41] | 273.34 | 1.38 | 27.12 | 0.77 | 2.67
×2 | SRGAN [42] | 377.76 | 5.95 | 28.05 | 0.78 | 2.74
×2 | SRResNet [42] | 222.37 | 1.59 | 25.98 | 0.72 | –
×2 | ESRGAN [43] | 4274.68 | 16.7 | 26.66 | 0.75 | 2.70
×2 | SRDRM [9] | 203.91 | 0.83 | 28.36 | 0.80 | 2.78
×2 | SRDRM-GAN [9] | 289.38 | 11.31 | 28.55 | 0.81 | 2.77
×2 | PAL [27] | 203.82 | 0.83 | 28.41 | 0.80 | –
×2 | AMPCNet [11] | – | 1.15 | 29.54 | 0.80 | 2.77
×2 | Deep WaveNet [14] | 21.47 | 0.28 | 29.09 | 0.80 | 2.73
×2 | MambaUSR (ours) | 93.8 | 2.73 | 29.75 | 0.52 | 2.75
×4 | SRCNN [39] | 21.3 | 0.06 | 23.38 | 0.67 | 2.38
×4 | VDSR [40] | 205.28 | 0.67 | 25.70 | 0.68 | 2.44
×4 | EDSRGAN [41] | 206.42 | 1.97 | 21.65 | 0.65 | 2.40
×4 | SRGAN [42] | 529.86 | 5.95 | 24.76 | 0.69 | 2.42
×4 | SRResNet [42] | 85.49 | 1.59 | 24.15 | 0.66 | –
×4 | ESRGAN [43] | 1504.09 | 16.7 | 23.79 | 0.66 | 2.38
×4 | SRDRM [9] | 291.73 | 1.90 | 24.64 | 0.68 | 2.46
×4 | SRDRM-GAN [9] | 377.2 | 12.38 | 24.62 | 0.69 | 2.48
×4 | PAL [27] | 303.42 | 1.92 | 24.89 | 0.69 | –
×4 | AMPCNet [11] | – | 1.17 | 25.90 | 0.66 | 2.58
×4 | Deep WaveNet [14] | 5.59 | 0.29 | 25.20 | 0.68 | 2.54
×4 | MambaUSR (ours) | 23.9 | 2.75 | 26.14 | 0.66 | 2.53
×8 | SRCNN [39] | 21.3 | 0.06 | 19.97 | 0.57 | 2.01
×8 | VDSR [40] | 205.28 | 0.67 | 23.58 | 0.63 | 2.17
×8 | EDSRGAN [41] | 189.69 | 2.56 | 19.87 | 0.58 | 2.12
×8 | SRGAN [42] | 567.88 | 5.95 | 20.14 | 0.60 | 2.10
×8 | SRResNet [42] | 51.28 | 1.59 | 19.26 | 0.55 | –
×8 | ESRGAN [43] | 811.44 | 16.7 | 19.75 | 0.58 | 2.05
×8 | SRDRM [9] | 313.68 | 2.97 | 21.20 | 0.60 | 2.18
×8 | SRDRM-GAN [9] | 399.15 | 13.45 | 20.25 | 0.61 | 2.17
×8 | PAL [27] | 325.51 | 2.99 | 22.51 | 0.63 | –
×8 | AMPCNet [11] | – | 1.25 | 23.83 | 0.62 | 2.25
×8 | Deep WaveNet [14] | 1.62 | 0.34 | 23.25 | 0.62 | 2.21
×8 | MambaUSR (ours) | 6.43 | 2.84 | 23.97 | 0.55 | 2.20
Table 3. Quantitative results on UFO-120 dataset with scale factors ×2, ×3, and ×4.
Method | PSNR ×2 (dB) | PSNR ×3 (dB) | PSNR ×4 (dB) | SSIM ×2 | SSIM ×3 | SSIM ×4 | UIQM ×2 | UIQM ×3 | UIQM ×4
SRCNN [39] | 24.75 | 22.22 | 19.05 | 0.72 | 0.65 | 0.56 | 2.39 | 2.24 | 2.02
SRGAN [42] | 26.11 | 23.87 | 21.08 | 0.75 | 0.70 | 0.58 | 2.44 | 2.39 | 2.56
SRDRM [9] | 24.62 | – | 23.15 | 0.72 | – | 0.67 | 2.59 | – | 2.57
SRDRM-GAN [9] | 24.61 | – | 23.26 | 0.72 | – | 0.67 | 2.59 | – | 2.55
Deep WaveNet [14] | 25.71 | 25.23 | 25.08 | 0.77 | 0.76 | 0.74 | 2.99 | 2.96 | 2.97
AMPCNet [11] | 25.24 | 25.73 | 24.70 | 0.71 | 0.70 | 0.70 | 2.93 | 2.85 | 2.88
URSCT [17] | 25.96 | – | 23.59 | 0.80 | – | 0.66 | – | – | –
MambaUSR (ours) | 25.76 | 26.15 | 25.11 | 0.74 | 0.74 | 0.69 | 2.93 | 2.97 | 2.84
Table 4. Computational efficiency of advanced methods on USR-248 dataset with scale factor ×8.
Method | FLOPs (G) | Params (M) | PSNR (dB) | SSIM | Average (s) | Efficiency (fps)
SRCNN | 21.30 | 0.06 | 19.97 | 0.57 | 0.08496 | 11.77
VDSR | 205.28 | 0.67 | 23.58 | 0.63 | 0.08852 | 11.30
EDSRGAN | 189.69 | 2.56 | 19.87 | 0.58 | 0.04632 | 21.59
SRGAN | 567.88 | 5.95 | 20.14 | 0.60 | 0.09454 | 10.58
SRResNet | 51.28 | 1.59 | 19.26 | 0.55 | 0.03418 | 29.26
ESRGAN | 811.44 | 16.7 | 19.75 | 0.58 | 0.09310 | 10.74
SRDRM | 313.68 | 2.97 | 21.20 | 0.60 | 0.08940 | 11.19
SRDRM-GAN | 399.15 | 13.45 | 20.25 | 0.61 | 0.09063 | 11.03
PAL | 325.51 | 2.99 | 22.51 | 0.63 | 0.08068 | 12.39
AMPCNet | – | 1.25 | 23.83 | 0.62 | 0.03815 | 26.21
Deep WaveNet | 1.62 | 0.34 | 23.25 | 0.62 | 0.03514 | 28.46
MambaUSR (ours) | 6.43 | 2.84 | 23.97 | 0.55 | 0.03657 | 27.34
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

