UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba

Zhang, Yuhan; Yu, Xinyang; Cai, Zhanchuan

doi:10.3390/math13132153

Open AccessArticle

UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba

by

Yuhan Zhang

¹

,

Xinyang Yu

² and

Zhanchuan Cai

^1,*

¹

School of Computer Science and Engineering, Macau University of Science and Technology, Macau 999078, China

²

School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China

^*

Author to whom correspondence should be addressed.

Mathematics 2025, 13(13), 2153; https://doi.org/10.3390/math13132153

Submission received: 20 May 2025 / Revised: 20 June 2025 / Accepted: 29 June 2025 / Published: 30 June 2025

Download

Browse Figures

Versions Notes

Abstract

Underwater image enhancement is a challenging task due to the unique optical properties of water, which often lead to color distortion, low contrast, and detail loss. At the present stage, the methods based on the CNN have the problem of insufficient global attention, and the methods based on Transformer generally have the problem of quadratic complexity. To address this challenge, we propose a dual-branch network architecture based on the W-shaped Mamba: UWMambaNet. Our method integrates the color contrast enhancement branch and the detail enhancement branch, and each branch is dedicated to improving specific aspects of underwater images. The color contrast enhancement branch utilizes the RGB and Lab color spaces and uses the Mamba block for advanced feature fusion to enhance color fidelity and contrast. The detail enhancement branch adopts a multi-scale feature extraction strategy to capture fine and contextual details through parallel convolutional paths. The Mamba module is added to the dual branches, and state-space modeling is used to capture the long-range dependencies and spatial relationships in the image data. This enables effective modeling of the complex interactions and light propagation effects inherent in the underwater environment. Experimental results show that our method significantly improves the visual quality of underwater images and is superior to existing technologies in terms of quantitative indicators and visualization effects; compared to the best candidate models on the UIEB and EUVP datasets, UWMambaNet improves UCIQE by 3.7% and 2.4%, respectively.

Keywords:

underwater image enhancement; dual-path Mamba network; global detail enhancement; state-space model

MSC:

94A08

1. Introduction

Underwater image enhancement (UIE) is a key technical challenge in the field of computer vision. Its core objective is to effectively alleviate the degradation effects caused by the absorption, scattering, and backscattering of light during its propagation underwater through algorithmic means, thereby accurately restoring the visual information of underwater scenes. In recent years, with the rapid development of remotely operated vehicles (ROVs) [1], autonomous underwater vehicles (AUVs) [2], and marine exploration systems, high-quality underwater images have become a necessary condition in applications such as marine resource monitoring, underwater target detection, and disaster response. For example, in marine ecological research, clear underwater images are crucial for assessing the health of coral reefs and analyzing the behavior of fish; underwater archaeology relies on high-resolution images to identify details of sunken ships and cultural relics; and in the field of military reconnaissance, improving the ability to detect targets in turbid waters (such as submarines or mines) is particularly important. However, due to the unique optical characteristics of the underwater environment, images usually exhibit problems such as insufficient contrast, color distortion, and loss of details [3], which pose significant difficulties for traditional image enhancement techniques. Therefore, there is an urgent need to develop innovative methods to address this challenge [4,5].

Underwater imaging technology has attracted sustained academic and industrial attention. Current traditional methodologies are divided into two distinct approaches: physics-based degradation modeling and non-physical image enhancement techniques [6]. The former employs mathematical formulations to simulate light propagation phenomena, including wavelength-dependent absorption, multipath scattering, and refractive distortions, demonstrating efficacy in controlled scenarios with homogeneous media. Or, the degradation of underwater images can be corrected through traditional digital image processing methods such as fusion [7]. However, its process performance deteriorates markedly in natural marine environments due to dynamic variations and heterogeneous optical properties. In contrast, data-driven enhancement methods bypass physical modeling by directly manipulating image attributes through contrast stretching, color constancy algorithms, and dehazing operations. While these techniques can improve visual perception under specific conditions, they intrinsically suffer from the inability to distinguish between inherent object features and medium-induced distortions due to insufficient incorporation of hydro-optical priors. These methodologies reveal intrinsic constraints in underwater image enhancement: traditional approaches struggle with parameterizing environmental variability, while data-driven enhancement techniques lack principled physical constraints.

The latest advances in deep learning have driven significant progress in underwater image restoration and other downsteam tasks [8,9,10], effectively addressing challenges [11]. Researchers have developed innovative neural architectures that can learn the mapping between degraded images and clear images through large-scale data training [10]. Representative models such as and UWCNN [12] adopt gated fusion networks with multi-scale feature extraction, in which local detail preservation through spatial decomposition works in synergy with global scene understanding achieved through context information fusion. These architectures have shown particular effects on color restoration, contrast enhancement, and noise reduction. Furthermore, some researchers have applied reinforcement learning to underwater image enhancement and constructed a vision perception-driven framework [13,14], bringing new breakthroughs to underwater image enhancement research. Methods based on diffusion models [15,16] have shown promising applications prospects in the field of underwater image enhancement. In addition, apart from diffusion models in the image domain, diffusion models based on the latent space have also effectively improved the effect of underwater image reconstruction. Among these methods, for example, the frequency domain-based latent diffusion model uses a lightweight parameter estimation network and extracts high-frequency and low-frequency prior information [17]. There are also methods that effectively enhance underwater images with global feature priors to improve the stability of the model during the inference process [18]. These methods effectively solve the common problems of mode collapse and training difficulty in Generative Adversarial Network (GAN) methods. In complex scenarios, such as poor lighting conditions and dense suspended particles, these methods demonstrate obvious advantages.

Although certain progress has been made in the above-mentioned technologies, due to the inherent variability of the underwater environment, excellent neural network algorithms usually need to take into account both the global and local information of the image simultaneously. The methods based on Transformer show better performance when establishing the precise correspondence between degraded images and clear images, mainly due to the strong ability of its self-attention mechanism in capturing long-range dependency relationships. However, the self-attention operation leads to the problem of quadratic computational complexity, and the quadratic growth of this complexity limits the scalability of the Transformer-based methods, especially when dealing with high-resolution underwater images.

To address this challenge, we propose UWMambaNet, a novel architecture that distinctly differs from existing Mamba-based approaches by integrating two key innovations. First, it proposes a W-shaped Mamba module, an efficient state-space sequence model (SSM) with linear time complexity, in place of Transformers, enabling scalable modeling of global dependencies. More importantly, while many existing Mamba models typically focus on singular-pathway designs, our core contribution lies in its novel dual-branch structure. This architecture intentionally decouples the optimization objectives: one branch focuses on enhancing fine-grained structural details, while the other specializes in preserving perceptual color fidelity. This careful separation focuses on and combines two aspects of image quality enhancement that are complementary but often get mixed up. The contributions of this article can be summarized as follows:

This paper proposes a novel dual-branch underwater image ehnahcement framework UWMambaNet: This framework introduces a novel dual-branch architecture specifically designed for comprehensive image enhancement. This framework effectively combines a Color Contrast Enhancement Branch and a Detail Enhancement Branch, allowing for simultaneous improvement of color fidelity and detail preservation. The integration of these branches through a sophisticated fusion strategy ensures that the network can produce high-quality enhanced images with balanced color and detail.
A color enhancement module with regional variability based on the Mamba is proposed: Within the Color Contrast Enhancement Branch, the network leverages both RGB and Lab color spaces to enhance color fidelity and contrast. The use of a Mamba block for feature fusion is particularly innovative, as it captures long-range dependencies and spatial relationships, enabling more effective color enhancement. This module’s ability to process and integrate features from multiple color spaces is a significant advancement in image processing.
This paper proposes a Mamba-based multi-scale detail enhancement branch: By integrating the Mamba block, it can effectively perform sequence modeling of spatial features and enhance the ability of the network to retain and enhance fine details throughout the image. The design of this module ensures that the enhanced image highlights important details while maintaining its natural appearance.

2. Related Works

2.1. Traditional Underwater Enhancement

Traditional methods can be divided into two groups according to whether they rely on underwater optical imaging models: non-physical model-based methods and physical model-based methods [19]. Non-physical model techniques directly improve visual quality through image processing strategies. The main implementation methods are enhancement in the spatial domain and transformation in the frequency domain. Spatial domain enhancement techniques improve images by directly modifying pixel gray values to enhance attributes such as contrast. For example, Ancuti et al. [20] introduced an innovative hierarchical fusion framework, which was the first application of image fusion to underwater image enhancement. This method established a dual-channel workflow for color adjustment and contrast improvement, achieving feature complementarity through weight mapping and pyramid fusion. Although this technique performs well in contrast enhancement and color correction, the computational requirements are high. Later studies optimized the weight calculation process and reduced the number of feature channels to three, minimizing the computational cost while maintaining performance. The Retinex theory based on the principle of visual perception has received great attention in this field [21]. It involves decomposing the image into illumination and reflection components for enhancement respectively. Zhang et al. creatively integrated bilateral filtering, trilateral filtering, and the multi-scale Retinex framework, successfully solving the related problems of uneven underwater illumination [22]. Additionally, frequency domain enhancement methods use algorithms such as Fourier or wavelet transforms to convert images from the spatial domain to the frequency domain. These methods utilize frequency characteristics to achieve image enhancement [23,24]. Common examples include low-pass filtering for noise suppression and high-pass filtering for edge sharpening. These techniques have been proven to be effective in various scenarios.

The method based on physical models restores underwater images by establishing degradation models, and its core lies in the accurate estimation of optical parameters. The self-calibration filter developed by Trucco et al. based on the Jaffe–McGlamery model initiated research in this field [25]. They designed a method to perform self-calibration and reverse calculation to enhance underwater images by calculating the optical parameters of the image. However, its strict requirements for lighting conditions limit its application. In addition, the development of the dark channel prior theory has promoted a series of improvement methods. Galdran et al. attempted to explain the degradation phenomenon of the red channel in underwater images using the DCP theory and proposed the red channel prior (RCP) [26]. However, these methods use incomplete and simplified physical models. To reduce computational complexity and implementation difficulty, they only consider a uniform underwater environment, resulting in poor generalization ability for different underwater conditions. Berman et al. proposed a method that combines the spectral characteristics of different types of water bodies and simplifies the restoration process of a single image by calculating two additional global parameters: the attenuation ratios of the blue-red and blue-green color channels [27]. However, due to their adaptability issues, these physical model methods still have room for improvement in some complex scenarios.

2.2. Learning-Based Underwater Image Enhancement

With the rapid development of deep learning, it has been widely applied in underwater image enhancement. Compared with traditional methods, the deep learning-based underwater image processing technology can better adapt to the complex and changing underwater environment through automatic learning and optimization and overcome limitations such as color distortion and image quality degradation. Some CNN-based underwater image enhancement methods are end-to-end, such as Shallow-UWnet [28]. By taking advantage of the powerful generalization ability of neural networks, they learn from diverse data distributions and features, effectively solving the limitations of traditional methods in adaptability. Wang et al. [29] proposed a method that combines a three-layer downsampling UNet network with the Wasserstein Generative Adversarial Network (WGAN). More recent advances extend beyond CNN architectures: The U-Shape Transformer [30] leverages transformer blocks in a U-shaped encoder-decoder framework to preserve structural integrity; WaterMamba [31] pioneers visual state-space models for contextual feature learning; while PixMamba [32] implements a dual-level state-space architecture to coordinate local and global enhancements. This method adopts an end-to-end encoder-decoder architecture, which can directly reconstruct underwater images and better retain the content information of the original scene. However, in the actual training process, this method is prone to instability and has a high computational cost, which may lead to the problem of an excessive image enhancement effect. But many of these methods only consider restoring images from the perspective of color, lacking global detail restoration and consistent adjustment of color.

Recently, diffusion-based image restoration techniques have demonstrated distinct advantages in tackling the complex degradation issues associated with underwater images. For example, Tang et al. were pioneers in applying diffusion models to enhance underwater images. The UW-DDPM framework introduced by Liu et al. [33] utilized a multistage noise prediction network to achieve joint optimization of detail recovery and color adjustment. Subsequent studies further expanded this domain: Zhang et al.’s DCGF [34] minimized the influence of diffusion variability by guiding the denoising process with color-conditional images; DM_Underwater [35] integrated the Transformer architecture with non-uniform sampling skip strategies for adaptive enhancement. Recently, DiffWater [15] implemented a conditional denoising diffusion probabilistic model, and UEDP [16] leveraged diffusion priors to boost enhancement robustness. However, it remains challenging to achieve desired results consistently, and almost all of these approaches require substantial computational resources.

3. Method

From the perspective of causal inference, image degradation can be explained as being caused by two potential independent mechanisms: structural noise (such as blurring or resolution loss) and perceptual distortion (such as color shift or contrast loss). By decomposing the enhancement task into two independent branches, these factors can be modeled conditionally and given their respective inductive biases, thereby facilitating decoupling and improving generalization ability. This separation enables each branch to develop specialized reasoning and recovery mechanisms, thereby enhancing the interpretability and robustness of the model. The above analysis indicates that by decomposing the causal mechanism of image degradation, it is possible to design enhancement models more effectively. The framework proposed in this paper is shown in Figure 1. This framework is a simple two-branch structure that processes the color and details of the image separately. During processing, a combination of CNN and Mamba is used to capture global and local effectiveness, helping the network to further restore the degraded image. Based on this theoretical framework, the design of the network structure and how it realizes the independent modeling and optimization of the two degradation mechanisms will be described in detail.

3.1. Mamba Block

Given a degraded underwater image

X \in R^{H \times W \times 3}

, we formulate the enhancement task as learning a non-linear mapping

F : R^{H \times W \times 3} \to R^{H \times W \times 3}

using state-space modeling. The pixel sequence is obtained via raster scanning:

x = Flatten (X) \in R^{L \times 3}, L = H \times W .

(1)

The degradation process is modeled as a linear time-invariant (LTI) system:

\{\begin{matrix} \frac{d h (t)}{d t} = A h (t) + B x (t), \\ y (t) = C h (t) + D x (t), \end{matrix}

(2)

where

h (t) \in R^{N}

is the latent state,

A \in R^{N \times N}

governs the state transitions, and

B \in R^{N \times 3}

,

C \in R^{3 \times N}

,

D \in R^{3 \times 3}

are projection matrices.

The image is essentially a discrete signal, and the continuous-time system needs to be approximated by the discrete-time method. In this process, the zero-order hold (ZOH) technique is a common implementation method. The zero-order hold method assumes that within each sampling interval determined by the step size

Δ

, the input signal remains unchanged. To understand its principle in more depth, we can explore how the zero-order hold works in coordination with the step size

Δ

. This conversion process involves sampling the continuous signal at fixed intervals (also called the sampling period) defined by

Δ

. Within each sampling interval, the zero-order hold keeps the signal value constant, thereby generating a stepped approximation of the original continuous signal. This approximation method is particularly important when processing underwater images because it can simplify complex signal processing tasks while retaining key information. Therefore, we can conclude that the discretized form of this system is:

\begin{matrix} h_{k} & = {\bar{A}}_{k} h_{k - 1} + {\bar{B}}_{k} x_{k}, \\ y_{k} & = C_{k} h_{k} + D x_{k}, \end{matrix}

(3)

where the discretized parameters are computed via:

{\bar{A}}_{k} = \exp (Δ A_{k}), {\bar{B}}_{k} = {(A_{k})}^{- 1} (exp (Δ A_{k}) - I) B_{k} .

(4)

To handle color distortion, we employ independent SSMs for each RGB channel:

y_{k}^{c} = {SSM}^{c} (x_{1 : k}^{c}), c \in {R, G, B} .

(5)

This architecture enables globally coherent image inpainting by leveraging its state-space model (SSM) to hierarchically integrate multi-scale visual semantics. At its core, a discretized recursive mechanism dynamically adjusts temporal decay rates through input-dependent parameterization, allowing selective retention of long-range dependencies as image patches are processed sequentially. This adaptive propagation of hidden states inherently captures global structural patterns while suppressing irrelevant local artifacts. In addition, the framework synergises these globally optimized representations with high-frequency local features via residual connections, achieving structural consistency in corrupted regions while preserving photorealistic texture details. Such a unified design ensures that the reconstructed pixels adhere to both the holistic scene context and localized visual continuity, thereby balancing semantic fidelity with perceptual naturalness across diverse inpainting scenarios.

3.2. Detail Enhancement Branch

The detail enhancement branch is dedicated to reconstructing and refining structural information that has been lost or degraded, such as texture, edges, and high-frequency detail. To achieve this, the branch combines local multi-scale convolutional encoding with long-range sequence modeling. Initially, the input image is processed through convolutional layers to extract base features:

F_{base} = ReLU (W_{base} * I + b_{base}) .

(6)

In parallel, to preserve locality, the branch employs two parallel paths with distinct receptive fields. The first path uses 3 × 3 convolutions, while the second path uses 5 × 5 convolutions, allowing the network to capture both fine and contextual details:

F_{detail 1} = ReLU (W_{3 \times 3} * F_{base} + b_{3 \times 3}),

(7)

F_{detail 1} = ReLU (W_{5 \times 5} * F_{base} + b_{5 \times 5}) .

(8)

To expand the receptive field and incorporate global context, we reshape the feature map into a sequence and apply a Mamba block:

\begin{matrix} f_{seq} & = reshape (f_{0}), \end{matrix}

(9)

\begin{matrix} f^{'} & = W_{in} f_{seq}, \end{matrix}

(10)

\begin{matrix} f^{''} & = Mamba (f^{'}), \end{matrix}

(11)

\begin{matrix} f_{global} & = reshape (W_{out} f^{''}) . \end{matrix}

(12)

The Mamba mechanism serves as a causal convolutional kernel with dynamic memory, enabling it to model long-distance relationships in spatial structure. The Mamba block is again utilized to process these multi-scale features, capturing long-range dependencies crucial for detail enhancement. The features are concatenated and passed through the Mamba block:

F_{multi - scale} = [F_{detail 1}, F_{detail 2}],

(13)

F_{enhanced} = Mamba (F_{multi - scale}) .

(14)

The enhanced features are then fused through a 1 × 1 convolution, followed by a 3 × 3 convolution to produce the final detail-enhanced output.

These extract the mid- and low-frequency components, respectively. The three features are fused:

f_{detail} = ϕ ([f_{1}, f_{2}, f_{global}]),

(15)

where

ϕ (\cdot)

denotes a fusion function involving linear projections and non-linearities.

3.3. Color and Contrast Restoration Branch

The Color and Contrast Enhancement Branch is a critical component of our dual-branch network, designed to improve the color fidelity and contrast of input images. To align with human perceptual characteristics, this branch utilizes both RGB and Lab color spaces. The Lab space separates luminance (L) from the chromatic components (a and b), which allows the model to better isolate and manipulate the perceptual color attributes. Using separate encoders, features are extracted from both spaces:

\begin{matrix} f_{rgb} & = σ (W_{rgb 2} * (W_{rgb 1} \times x)), \end{matrix}

(16)

\begin{matrix} f_{lab} & = σ (W_{lab 2} * (W_{lab 1} \times x_{L a b})) . \end{matrix}

(17)

These feature maps are concatenated and compressed:

f_{concat} = W_{1 x 1} * [f_{rgb}, f_{lab}],

(18)

and passed through a Mamba block to model contextual contrast and global color dependencies:

f_{mamba} = Mamba (f_{concat}) .

(19)

The output is decoded into a restored color image:

x_{color} = σ (W_{dec 2} * σ (W_{dec 1} * f_{mamba})) .

(20)

3.4. Output Fusion and Residual Reconstruction

We integrate the outputs of the two branches to create a unified feature representation:

f_{fused} = σ (W_{fuse 2} * σ (W_{fuse 1} * [x_{color}, f_{detail}])) .

(21)

This fused output contains both perceptual and structural corrections. To ensure that the values remain within a valid range, a clamp function is applied. The final enhanced image is then obtained via a residual connection, with the clamp function ensuring that all pixel values are constrained appropriately. The final enhanced image is obtained via a residual connection:

\hat{x} = Clamp (f_{fused} + x, 0, 1) .

(22)

3.5. Loss Function

To supervise training, we use a composite loss that encourages both pixel-wise fidelity and perceptual realism. The total loss is defined as:

L_{total} = \underset{pixel - level matching}{\underset{︸}{λ_{1} L_{mse}}} + \underset{semantic features}{\underset{︸}{λ_{2} L_{vgg}}} + \underset{semantic features}{\underset{︸}{λ_{3} L_{pixel}}} .

(23)

The total loss function

L_{total}

consists of three parts, each corresponding to different optimization goals as follows: The first part measures the pixel-level difference between the generated image and the target image through the Mean Squared Error (MSE), ensuring that the generated image is as close as possible to the real image in details, thereby retaining the structural information and clarity of the original image. The parameter

λ_{1}

controls the weight of this loss term; The second part is based on the high-level features extracted by the pre-trained VGG network, which is used to capture the semantic information of the image, ensuring that the generated image is not only close to the target image at the pixel level but also maintains consistency in high-level visual features (such as texture, color distribution, etc.). This feature is particularly important for cases where color distortion and contrast reduction are caused by the underwater environment. The parameter

λ_{2}

determines the importance of semantic feature matching; The third part is the pixel-level semantic loss (

L_{pixel}

), which further strengthens the optimization at the pixel level. Different from

L_{mse}

, it may combine additional constraints (such as edge information or weight adjustment of specific regions) to better adapt to the characteristics of underwater images. The parameter

λ_{3}

can flexibly control the influence of this loss term on the final result.

The UWMambaNet integrates two complementary mechanisms to address the dual nature of image degradation. On one hand, the detail enhancement branch emphasizes high-frequency recovery through convolution and global dependency modeling via Mamba, enabling the reconstruction of sharpness and edge continuity. On the other hand, the color and contrast restoration branch leverages color space disentanglement and contextual encoding to improve visual aesthetics and color consistency. Together, these branches cooperate through feature fusion and residual reconstruction to yield images that are both structurally accurate and perceptually pleasing.

4. Experimental Analysis and Discussion

In order to evaluate the performance of the proposed model fairly, experiments in this paper are conducted on two public underwater image enhancement datasets. The following subsections will introduce the experimental settings, dataset introduction, evaluation metrics, and analysis of experimental results, respectively.

4.1. Experimental Environment and Setup Details

The UWMambaNet is implemented in PyTorch 1.13 on a system equipped with an NVIDIA V100 GPU boasting 32 GB of RAM and an Intel (R) Xeon (R) W-2255 CPU. We use the Adam Optimizer for training. The initial learning rate is set to 0.01, and the learning rate decays by 30% every 30 epochs. The training uses a batch size of 4 and is carried out for a total of 100 epochs. The input images are uniformly scaled to a resolution of 512 × 512. During the training process, the model checkpoint is saved every 10 epochs.

4.2. Experimental Datasets

In this experiment, we used the UIEB dataset [36] and EUVP dataset [37] for training and testing. These two datasets cover various underwater environments, water quality conditions, and lighting conditions, which helps enhance the generalization performance of the model. For testing purposes, we prepared two different datasets: one is a full-reference dataset, and the other is a no-reference dataset. In the training stage, we randomly selected 800 pairs of real-scene images from the UIEB dataset for training. This dataset contains a total of 890 reference images of real scenes. During the training process, all images were adjusted to a resolution of 640 × 640 pixels.

4.3. Evaluation Metrics

The complex optical characteristics of the underwater environment often lead to differences between traditional general image quality assessment indicators (such as PSNR and SSIM) and human subjective perception. To solve this problem, researchers have specifically developed evaluation indicators such as UIQM and UCIQE that are suitable for underwater conditions. These new indicators establish a multi-dimensional assessment framework by integrating parameters that have significant impacts on underwater vision, such as chroma, sharpness, and contrast. Currently, the no-reference assessment method has become the main means of image quality assessment. Among the commonly used no-reference assessment indicators, they include UCIQE [38], UIQM [39], CCF [40], and FDUM [41]. These indicators calculate various attributes of the image through specific algorithms, such as color, contrast, saturation, and fog density. The UCIQE metric places particular emphasis on evaluating aspects like chroma, saturation, and contrast. In contrast, UIQM is geared towards providing a comprehensive assessment of overall underwater image characteristics. CCF acts as a compound indicator that takes into account elements including contrast, color, and fog density. FDUM gauges the restoration quality of underwater imagery by applying a weighted calculation to parameters such as color richness, contrast, and clarity. Should there be an imbalance between enhancing brightness and introducing color distortion within an image, this could affect the scores given by UCIQE and FDUM, which, in turn, may compromise the accuracy of the ultimate evaluation result.

4.4. Results Analysis

In this section, to assess the effectiveness of the UWMambaNet introduced in this paper, we carried out both quantitative and qualitative experiments. A variety of currently available underwater image enhancement methods were chosen for comparison, such as Fusion [42], Shallow-UWnet [28], UColor [43], UWCNN [12], and NU²Net [44], HCLR-Net [45], Semi-UIR [46] and PUIE-Net [47]. These methods cover widely recognized mainstream techniques for underwater image enhancement, along with algorithms based on advanced neural network architectures.

Our method utilizes a dual-branch approach and a network structure that combines global and local features, generating high UIQM scores for vivid colors and high UCIQE scores for complex details. Some scenarios even exceed the restoration effect of the reference images. When evaluating the datasets, we selected three main problematic underwater image capture scenarios: underwater color cast scenarios, underwater hazy scenarios, and underwater low-light scenarios. The underwater color cast scenarios can be further divided into blue color cast and green color cast. Taking the UIEB dataset [36] and the EUVP dataset [37] as the experimental objects, the results are shown in Table 1, Table 2 and Table 3.

In the UIEB dataset’s visualization result Figure 2, we found that our method was only 0.004 lower than Semi-UIR on the UIQM, and FDUM was only 0.003 lower than Semi-UIR, while leading in other indices. This also indicates the effectiveness of our method based on the dual-branch design. One branch focuses on restoring the true color and contrast of the image. This part can solve underwater degradation problems (such as color cast) in complex scenes. Compared with the general methods, this method can better ensure the effectiveness of color restoration. This way of combining global and local features can pay more attention to the restoration of details on the basis of considering the complete color semantics. In indicators such as UCIQE and CCF, our method is significantly better than other methods, and this phenomenon is also confirmed in the visualization results. In the first row of the visualization results, our proposed method can better restore the illumination information when restoring the details of coral reefs and correct the color cast at the same time, while methods like Fusion have problems such as incomplete color cast restoration or unbalanced illumination restoration. In the fog scene (the second row), our method can restore most of the details and eliminate the influence of fog noise on the overall picture. In the low-light scene in the fourth row, our method can well restore the details of the dark part without introducing other degradations such as overexposure. In addition, in the low-light scenes of the fourth and sixth rows of the visualization results, our method has a better visible range and darker details compared to other methods (for example, the dark part of the reef behind the fish in the fourth row and the seabed environment around the shark in the sixth row). At the same time, it can avoid the problem of overexposure, which also proves that the algorithm we propose can effectively improve the dynamic range of images. In the turbid environment of the fifth row, our method can more thoroughly remove the turbidity and blur caused by scattering compared to other algorithms. The visualization result even exceeds that of the reference picture, which further demonstrates that our algorithm has strong generalization ability and can handle a variety of different degradation scenarios. Furthermore, we conducted tests on recently popular diffusion models and Transformer-based enhancement models. The results are shown in Figure 3 in detail. It can be seen that our method performs excellently in two mainstream scenarios of turbidity and color cast. There are no problems of incorrect color correction or introduction of other noises. The images restored by our method have colors that are closer to the normal visual results, and the dark details are richer. Moreover, the method proposed in this study demonstrates significant performance advantages in multiple metrics. It achieved the highest score (0.640) in the UCIQE metric, fully reflecting its outstanding performance in image enhancement. Meanwhile, our method also obtained the highest score in the key color fidelity metric CCF, indicating a breakthrough improvement in its color restoration ability. In addition, in the UIQM and FDUM metrics, our method lags behind the first place by only small margins (0.024 and 0.061), further verifying that this method has considerable competitiveness compared with the current state-of-the-art algorithm architectures. To prove the effectiveness of our method, we conducted a statistical significance analysis. By performing a paired t-test with the Semi-UIR method (the metrics results of Semi-UIR are also superior to those of other methods compared in this paper), it can be concluded that the p-value < 0.05. This indicates that the method we proposed shows a significant advantage in UCIQE. It also proves the rationality of the improvement of our method in the evaluation metrics.

In the EUVP dataset’s visualization result Figure 4, our method has also achieved outstanding results. In terms of quantitative indicators, our method outperforms other methods in the three indicators of UCIQE, CCF, and FDUM, and is only 0.024 lower than Semi-UIR in UIQM. When further analyzing the visualization results, it can be found that our method can restore the colors in the image more accurately and significantly improve the clarity of image details. In contrast, some existing methods fail to completely eliminate the influence of the underwater environment on the image during processing. The results in the second row indicate that these methods have deficiencies in eliminating underwater blur or color shift. Our method successfully overcomes this problem, not only being able to effectively restore the true colors of objects, but also highlighting the vividness of the main color, making it closer to the visual effect in the natural state. Whether for the color adjustment of objects in complex scenes or the distinction and processing of background and foreground details, our method shows higher accuracy and reliability. For example, in the image in the third row, our method successfully restores the vivid color of the red coral and retains its texture details. These results indicate that our method is not only competitive in quantitative indicators but also can provide higher-quality image enhancement effects in practical applications.

In addition, to further explore the real-time processing performance and efficiency of the model, we further calculated the parameters and GFLOPS of the model proposed in this paper and other models. The results are shown in Table 4. In terms of the key indicators of model efficiency, the method proposed in this paper is significantly superior to mainstream deep learning methods. Regarding the number of parameters (Param), although our method (1.01 M) is slightly higher than the lowest Shallow-UWnet (0.22 M), it is much lower than NU²Net (3.15 M, −68%), UColor (157.24 M), and HCLR-Net (4.87 M, −79%). In terms of computational complexity (GFLOPS), our method (132.13 G) only requires 2.3% of the HCLR-Net method (5651.99 G). At the same time, it is reduced by 68.8% and 70.2% compared with PUIE-Net (423.04 G) and UColor (443.84 G), respectively. Although Shallow-UWnet has the lowest computational amount (43.26 G), its performance is lower than that of our method. The method proposed in this paper achieves a balance between accuracy and efficiency at a relatively low computational cost, and has the best deployment potential while maintaining high performance. For edge devices or low computing power platforms [48], this algorithm can be better applied.

4.5. Ablation Experiments

In this section, we conduct an ablation experiment to evaluate the module performance of the proposed UWMambaNet framework. This framework consists of two core components, namely, the detail enhancement module and the color correction module, both of which play significant roles in the overall performance. By removing each module sequentially, we analyze its impact on the overall ability and effectiveness of the framework, the visualization result is shown in Figure 5. The experimental results show that the method proposed in this paper can significantly improve the visualization quality of underwater images. Specifically, under the complete framework, the processed underwater images exhibit excellent performance in color restoration and detail enhancement, which fully validates the effectiveness of this framework in addressing the problems of low contrast and color distortion in underwater images. However, when the color correction module in the framework is removed, the color restoration ability of the images significantly decreases. This phenomenon indicates that the color correction module plays a key role in the framework, and its main function is to correct the color deviation caused by the absorption and scattering effects of water. For example, after removing the color correction module in the second row of images, a significant color degradation problem occurs, manifested as an obvious blue shift phenomenon. The existence of the color correction module helps rebalance the color distribution in the image, thereby improving the consistency between the processed image and the real scene.

On the other hand, when the detail branch is removed, both the image’s detail performance and color quality are compromised. This occurs because the detail branch not only enhances texture and edge information but also plays a role in fine-tuning colors. In essence, there exists a synergistic relationship between the detail branch and the color branch; their collaboration is essential to achieving optimal image enhancement. For instance, in the first row of haze images, omitting the detail branch during processing results in incomplete removal of haze-like degradation and introduces a subtle red color cast, which detracts from the overall quality. This demonstrates the interdependence of the color and detail branches as described in this paper.

The above visualized results are reflected in the quantitative experimental results Table 5. When the color or detail branches are not used, all the indicators are much lower than the method proposed in this paper, which further confirms the effectiveness of our method. To sum up, the framework proposed in this paper successfully addresses multiple challenges in underwater image visualization through reasonable division of labor and collaboration among modules. The results of the ablation experiments fully demonstrate the importance of each module and its contribution to the overall performance, and also provide a valuable reference direction for future research.

5. Conclusions

Underwater imaging tasks suffer from a significant decline in image quality due to the absorption and scattering effects of light, which affects the effectiveness of subsequent tasks. To address this issue, this paper proposes a dual-path underwater image enhancement network based on the Transformer architecture. This model consists of two modules, each responsible for different aspects of image processing: one module focuses on restoring the true color and contrast of the image, and the other is dedicated to enhancing image details. In this framework, the global detail enhancement module captures the texture information in the image by introducing multi-scale receptive fields. Small-scale receptive fields pay attention to local details such as object edges and microstructures, while large-scale receptive fields understand the overall layout and macroscopic features of the scene. This approach ensures that the model can identify and retain important details within a wide range. The adaptive color correction module utilizes the channel attention mechanism to calculate the weights of the compressed feature maps and evaluate the importance of each channel for color restoration. The system automatically reinforces key features and reduces the influence of irrelevant features, thereby maintaining excellent color restoration ability in complex underwater environments. Experimental results show that this method outperforms existing algorithms on multiple underwater image enhancement datasets, demonstrating superior visual effects.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z. and X.Y. and Z.C.; validation, Y.Z. and X.Y.; investigation, Y.Z. and Z.C.; writing—original draft preparation, Y.Z. and X.Y.; writing—review and editing, Z.C.; supervision, Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Science and Technology Development Fund of Macau under Grant 0159/2024/AMJ, and in part by the Guangzhou Development District International Cooperation Project under Grant 2023GH01.

Data Availability Statement

Data are available in a publicly accessible repository https://irvlab.cs.umn.edu/resources/euvp-dataset and https://li-chongyi.github.io/proj_benchmark.html (accessed on 20 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Bogue, R. Underwater robots: A review of technologies and applications. Ind. Robot. Int. J. 2015, 42, 186–191. [Google Scholar] [CrossRef]
Bovio, E.; Cecchi, D.; Baralli, F. Autonomous underwater vehicles for scientific and naval operations. Annu. Rev. Control 2006, 30, 117–130. [Google Scholar] [CrossRef]
Zhu, D. Underwater image enhancement based on the improved algorithm of dark channel. Mathematics 2023, 11, 1382. [Google Scholar] [CrossRef]
Anwar, S.; Li, C. Diving Deeper into Underwater Image Enhancement: A Survey. arXiv 2019, arXiv:1907.07863. [Google Scholar] [CrossRef]
Wu, S.; Sun, B.; Yang, X.; Han, W.; Tan, J.; Gao, X. Reconstructing the Colors of Underwater Images Based on the Color Mapping Strategy. Mathematics 2024, 12, 1933. [Google Scholar] [CrossRef]
Raveendran, S.; Patil, M.D.; Birajdar, G.K. Underwater image enhancement: A comprehensive review, recent trends, challenges and applications. Artif. Intell. Rev. 2021, 54, 5413–5467. [Google Scholar] [CrossRef]
Yuan, J.; Cai, Z.; Cao, W. TEBCF: Real-World Underwater Image Texture Enhancement Model Based on Blurriness and Color Fusion. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4204315. [Google Scholar] [CrossRef]
Yuan, J.; Cai, Z.; Cao, W. A Novel Underwater Detection Method for Ambiguous Object Finding via Distraction Mining. IEEE Trans. Ind. Inform. 2024, 20, 9215–9224. [Google Scholar] [CrossRef]
Zhou, J.; He, Z.; Zhang, D.; Liu, S.; Fu, X.; Li, X. Spatial Residual for Underwater Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4996–5013. [Google Scholar] [CrossRef]
Zhang, W.; Chen, G.; Zhuang, P.; Zhao, W.; Zhou, L. CATNet: Cascaded attention transformer network for marine species image classification. Expert Syst. Appl. 2024, 256, 124932. [Google Scholar] [CrossRef]
Cheng, Z.; Fan, G.; Zhou, J.; Gan, M.; Chen, C.L.P. FDCE-Net: Underwater Image Enhancement With Embedding Frequency and Dual Color Encoder. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 1728–1744. [Google Scholar] [CrossRef]
Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
Wang, H.; Sun, S.; Chang, L.; Li, H.; Zhang, W.; Frery, A.C.; Ren, P. INSPIRATION: A reinforcement learning-based human visual perception-driven image enhancement paradigm for underwater scenes. Eng. Appl. Artif. Intell. 2024, 133, 108411. [Google Scholar] [CrossRef]
Chao, D.; Li, Z.; Zhu, W.; Li, H.; Zheng, B.; Zhang, Z.; Fu, W. AMSMC-UGAN: Adaptive Multi-Scale Multi-Color Space Underwater Image Enhancement with GAN-Physics Fusion. Mathematics 2024, 12, 1551. [Google Scholar] [CrossRef]
Guan, M.; Xu, H.; Jiang, G.; Yu, M.; Chen, Y.; Luo, T.; Zhang, X. DiffWater: Underwater image enhancement based on conditional denoising diffusion probabilistic model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 2319–2335. [Google Scholar] [CrossRef]
Du, D.; Li, E.; Si, L.; Zhai, W.; Xu, F.; Niu, J.; Sun, F. UIEDP: Boosting underwater image enhancement with diffusion prior. Expert Syst. Appl. 2025, 259, 125271. [Google Scholar] [CrossRef]
Song, J.; Xu, H.; Jiang, G.; Yu, M.; Chen, Y.; Luo, T.; Song, Y. Frequency domain-based latent diffusion model for underwater image enhancement. Pattern Recognit. 2025, 160, 111198. [Google Scholar] [CrossRef]
Qing, Y.; Liu, S.; Wang, H.; Wang, Y. DiffUIE: Learning Latent Global Priors in Diffusion Models for Underwater Image Enhancement. IEEE Trans. Multimed. 2025, 27, 2516–2529. [Google Scholar] [CrossRef]
Li, F.; Li, W.; Zheng, J.; Wang, L.; Xi, Y. Contrastive Feature Disentanglement via Physical Priors for Underwater Image Enhancement. Remote Sens. 2025, 17, 759. [Google Scholar] [CrossRef]
Ancuti, C.; Ancuti, C.O.; Haber, T.; Bekaert, P. Enhancing underwater images and videos by fusion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 81–88. [Google Scholar] [CrossRef]
Zhuang, P.; Wu, J.; Porikli, F.; Li, C. Underwater Image Enhancement With Hyper-Laplacian Reflectance Priors. IEEE Trans. Image Process. 2022, 31, 5442–5455. [Google Scholar] [CrossRef]
Zhang, W.; Zhuang, P.; Sun, H.H.; Li, G.; Kwong, S.; Li, C. Underwater Image Enhancement via Minimal Color Loss and Locally Adaptive Contrast Enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010. [Google Scholar] [CrossRef] [PubMed]
Zhang, T.; Su, H.; Fan, B.; Yang, N.; Zhong, S.; Yin, J. Underwater Image Enhancement Based on Red Channel Correction and Improved Multiscale Fusion. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4205120. [Google Scholar] [CrossRef]
Zhang, W.; Zhou, L.; Zhuang, P.; Li, G.; Pan, X.; Zhao, W.; Li, C. Underwater Image Enhancement via Weighted Wavelet Visual Perception Fusion. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2469–2483. [Google Scholar] [CrossRef]
Trucco, E.; Olmos-Antillon, A.T. Self-tuning underwater image restoration. IEEE J. Ocean. Eng. 2006, 31, 511–519. [Google Scholar] [CrossRef]
Galdran, A.; Pardo, D.; Picón, A.; Alvarez-Gila, A. Automatic red-channel underwater image restoration. J. Vis. Commun. Image Represent. 2015, 26, 132–145. [Google Scholar] [CrossRef]
Berman, D.; Levy, D.; Avidan, S.; Treibitz, T. Underwater Single Image Color Restoration Using Haze-Lines and a New Quantitative Dataset. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2822–2837. [Google Scholar] [CrossRef]
Naik, A.; Swarnakar, A.; Mittal, K. Shallow-uwnet: Compressed model for underwater image enhancement (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 15853–15854. [Google Scholar]
Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594. [Google Scholar] [CrossRef]
Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef]
Guan, M.; Xu, H.; Jiang, G.; Yu, M.; Chen, Y.; Luo, T.; Song, Y. WaterMamba: Visual state space model for underwater image enhancement. arXiv 2024, arXiv:2405.08419. [Google Scholar]
Lin, W.T.; Lin, Y.X.; Chen, J.W.; Hua, K.L. PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement. In Proceedings of the Asian Conference on Computer Vision (ACCV), Hanoi, Vietnam, 8–12 December 2024; pp. 3622–3637. [Google Scholar]
Lu, S.; Guan, F.; Zhang, H.; Lai, H. Underwater image enhancement method based on denoising diffusion probabilistic model. J. Vis. Commun. Image Represent. 2023, 96, 103926. [Google Scholar] [CrossRef]
Zhang, Y.; Yuan, J.; Cai, Z. DCGF: Diffusion-Color-Guided Framework for Underwater Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4201012. [Google Scholar] [CrossRef]
Tang, Y.; Kawasaki, H.; Iwaguchi, T. Underwater image enhancement by transformer-based diffusion model with non-uniform sampling for skip strategy. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 5419–5427. [Google Scholar]
Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
Islam, M.J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
Yang, M.; Sowmya, A. An Underwater Color Image Quality Evaluation Metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef]
Panetta, K.; Gao, C.; Agaian, S. Human-Visual-System-Inspired Underwater Image Quality Measures. IEEE J. Oceanic Eng. 2016, 41, 541–551. [Google Scholar] [CrossRef]
Wang, Y.; Li, N.; Li, Z.; Gu, Z.; Zheng, H.; Zheng, B.; Sun, M. An imaging-inspired no-reference underwater color image quality assessment metric. Comput. Electr. Eng. 2018, 70, 904–913. [Google Scholar] [CrossRef]
Yang, N.; Zhong, Q.; Li, K.; Cong, R.; Zhao, Y.; Kwong, S. A reference-free underwater image quality assessment metric in frequency domain. Signal Process. Image Commun. 2021, 94, 116218. [Google Scholar] [CrossRef]
Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2017, 27, 379–393. [Google Scholar] [CrossRef]
Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000. [Google Scholar] [CrossRef]
Guo, C.; Wu, R.; Jin, X.; Han, L.; Zhang, W.; Chai, Z.; Li, C. Underwater ranker: Learn which is better and how to be better. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 702–709. [Google Scholar]
Zhou, J.; Sun, J.; Li, C.; Jiang, Q.; Zhou, M.; Lam, K.M.; Zhang, W.; Fu, X. HCLR-Net: Hybrid contrastive learning regularization with locally randomized perturbation for underwater image enhancement. Int. J. Comput. Vis. 2024, 132, 4132–4156. [Google Scholar] [CrossRef]
Huang, S.; Wang, K.; Liu, H.; Chen, J.; Li, Y. Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18145–18155. [Google Scholar]
Fu, Z.; Wang, W.; Huang, Y.; Ding, X.; Ma, K.K. Uncertainty Inspired Underwater Image Enhancement. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 465–482. [Google Scholar]
Yuhan, Z.; Qin, J.; Guo, Z.; Jiang, K.; Cai, S. Detection of road surface crack based on PYNQ. In Proceedings of the 2020 IEEE International Conference on Mechatronics and Automation (ICMA), Beijing, China, 13–16 October 2020; pp. 1150–1154. [Google Scholar]

Figure 1. The structural diagram of UWMambaNet: proposed in this paper. This framework processes the image through two branches of color and detail, respectively.

Figure 2. Qualitative analysis of the proposed approach in comparison with leading-edge methods on the UIEB dataset.

Figure 3. Qualitative analysis of the proposed approach against recent diffusion-based or Transformer-based methods using the UIEB dataset.

Figure 4. Qualitativeanalysis of the proposed approach in comparison with leading-edge methods on the EUVP dataset.

Figure 5. The outcomes of the ablation studies on the framework introduced in this paper clearly demonstrate that the approach presented here can significantly enhance the visualization quality of underwater images. In the absence of the color module, the reconstruction of image colors will worsen; meanwhile, without the detail module, both color accuracy and image details are negatively impacted.

Table 1. Numerical evaluation of the proposed approach against cutting-edge methods using the UIEB dataset. Bolding represents the best indicator result.

Method	UCIQE	UIQM	CCF	FDUM
Raw	0.520	1.157	20.514	0.447
Fusion 2018	0.592	1.345	20.053	0.539
NU²Net	0.598	1.259	20.694	0.510
Shallow-UWnet	0.508	1.084	18.955	0.382
UColor	0.569	1.210	18.181	0.479
PUIE-Net	0.585	1.271	22.300	0.524
Semi-UIR	0.617	1.386	27.890	0.631
HCLR-Net	0.613	1.350	26.319	0.610
Ours	0.640	1.382	39.840	0.628

Table 2. Numerical evaluation of the proposed approach against cutting-edge methods using the EUVP dataset. Bolding represents the best indicator result.

Method	UCIQE	UIQM	CCF	FDUM
Raw	0.550	1.303	30.267	0.445
Fusion 2018	0.589	1.405	28.311	0.486
NU²Net	0.606	1.389	29.123	0.523
Shallow-UWnet	0.562	1.296	25.972	0.440
UColor	0.585	1.335	28.970	0.491
PUIE-Net	0.594	1.352	29.438	0.502
Semi-UIR	0.617	1.471	40.001	0.625
HCLR-Net	0.618	1.411	36.929	0.555
Ours	0.632	1.447	45.652	0.564

Table 3. Numerical evaluation of the proposed approach against recent diffusion-based or Transformer-based methods using the UIEB dataset. Bolding represents the best indicator result.

Method	UCIQE	UIQM	CCF	FDUM
Raw	0.520	1.157	20.514	0.447
UIEDP	0.621	1.401	26.782	0.646
DiffWater	0.601	1.424	26.782	0.676
DMWater	0.603	1.361	27.062	0.553
DCGF	0.627	1.509	36.563	0.588
U-Shape	0.569	1.343	23.732	0.495
Semi-UIR	0.617	1.386	27.890	0.631
Ours	0.640	1.382	39.840	0.628

Table 4. Comparison of different UIE models in terms of GFLOPs (G) and parameters (M). The downward arrow indicates that the lower the indicator, the better. Bolding represents the best indicator result.

Method	Param (M) ↓	GFLOPS (G) ↓
NU²Net	3.15 M	46.33 G
Shallow-UWnet	0.22 M	43.26 G
UColor	157.24 M	443.84 G
PUIE-Net	1.40 M	423.04 G
Semi-UIR	1.68 M	550.76 G
HCLR-Net	4.87 M	5651.99 G
Ours	1.01M	132.13 G

Table 5. Ablation study of various modules and loss functions on the UIEB dataset.

Method	Evaluation Metric
Method	UCIQE	UIQM	CCF	FDUM
−w/o detail branch	0.575	1.350	28.179	0.542
−w/o color branch	0.568	1.341	25.600	0.569
Full Model	0.640	1.382	39.840	0.628

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, Y.; Yu, X.; Cai, Z. UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba. Mathematics 2025, 13, 2153. https://doi.org/10.3390/math13132153

AMA Style

Zhang Y, Yu X, Cai Z. UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba. Mathematics. 2025; 13(13):2153. https://doi.org/10.3390/math13132153

Chicago/Turabian Style

Zhang, Yuhan, Xinyang Yu, and Zhanchuan Cai. 2025. "UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba" Mathematics 13, no. 13: 2153. https://doi.org/10.3390/math13132153

APA Style

Zhang, Y., Yu, X., & Cai, Z. (2025). UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba. Mathematics, 13(13), 2153. https://doi.org/10.3390/math13132153

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

UWMambaNet: Dual-Branch Underwater Image Reconstruction Based on W-Shaped Mamba

Abstract

1. Introduction

2. Related Works

2.1. Traditional Underwater Enhancement

2.2. Learning-Based Underwater Image Enhancement

3. Method

3.1. Mamba Block

3.2. Detail Enhancement Branch

3.3. Color and Contrast Restoration Branch

3.4. Output Fusion and Residual Reconstruction

3.5. Loss Function

4. Experimental Analysis and Discussion

4.1. Experimental Environment and Setup Details

4.2. Experimental Datasets

4.3. Evaluation Metrics

4.4. Results Analysis

4.5. Ablation Experiments

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI