DualOadamNet: Dual-Branch Lightweight Network for Underwater Image Processing with Optical-Aware Detail Augmentation

Zhan, Siyang; Yu, Jianyong; Li, Dong

doi:10.3390/ai7030088



by Siyang Zhan ¹, Jianyong Yu ¹,²,* and Dong Li ¹

¹ Sanya Institute of Hunan University of Science and Technology, Sanya 572024, China

² Hunan Provincial Key Laboratory of Intelligent Control and Maintenance for Complex Systems, Hunan University of Science and Technology, Xiangtan 411201, China

* Author to whom correspondence should be addressed.

AI 2026, 7(3), 88; https://doi.org/10.3390/ai7030088

Submission received: 9 January 2026 / Revised: 16 February 2026 / Accepted: 25 February 2026 / Published: 2 March 2026


Abstract

The deep-water environment is complex and variable, and underwater images are easily degraded by light scattering, water absorption, and other factors, resulting in blurred details and color distortion. Existing enhancement methods generally suffer from poor generalization, conflicts between tasks processed in parallel, and an imbalance between real-time performance and enhancement quality. To address these challenges, this study proposes DualOadamNet, a dual-branch lightweight network for underwater image processing with optical-aware detail augmentation. The model is built on global and local feature-extraction branches, and an Optical-Aware Detail Augmentation Module, which imitates the staged way in which human vision restores a scene, is introduced to repair the image naturally. Combined with pixel rearrangement operations, the model extracts features efficiently at a reduced spatial scale. Experimental results on the UIEB, EUVP, and LSUI datasets show that the proposed method achieves average full-color evaluation metrics of 24.63 for PSNR, 0.919 for SSIM, and 3.267 for UIQM. In addition, the real-time enhancement speed for 1080p images is 85.299 FPS.

Keywords:

underwater image enhancement; dual-branch network; optical-aware detail; lightweight model; real-time processing

1. Introduction

The underwater environment is complex and changeable. Due to the light attenuation and the scattering effect of sea water, the acquisition of high-quality underwater visual information faces severe challenges, such as underwater scene turbidity, image scattering degradation, texture ambiguity, and so on. Underwater image processing is one of the core technologies in key fields such as marine resource exploration, underwater machinery navigation, and underwater archaeology. Therefore, underwater image enhancement has become a major technical challenge to promote marine research and exploration [1].

At present, underwater image processing methods fall into two main categories: traditional non-deep-learning methods and deep-learning methods. Traditional methods rely on physical imaging models or image priors and correct the degradation directly through mathematical analysis and physical modeling. However, underwater environments vary widely, and methods tailored to specific physical scenes adapt poorly to other scenes, which limits their generalization ability [2]. Deep-learning methods are data-driven and use network architectures such as CNNs and Transformers to learn the mapping between degraded images and clear images, which can significantly improve the efficiency and quality of underwater image enhancement. These methods have become the mainstream of current research, but they still face major challenges in jointly handling multiple degradations, in real-time processing and transmission, and in lightweight handling of complex image data [3,4].

Although existing methods have made progress, many key problems in underwater image enhancement remain unsolved. First, because of the scattering and blurring effects peculiar to the underwater environment, multiple degradations of underwater images, such as optical color distortion, noise interference, loss of detail texture, and color deviation, are intertwined [5,6]. Most methods either process these degradations in parallel or ignore some of them in pursuit of an extremely lightweight model tuned to one dataset. Parallel processing easily causes task conflicts, and optimization on a single dataset may lead to overfitting and weak generalization, resulting in an unbalanced enhancement effect. Second, image super-resolution reconstruction can obtain high-resolution images from low-resolution ones, but it cannot eliminate the serious color distortion and semantic loss in poor-quality underwater images [1], and it may cause color correction and detail retention to interfere with each other, which harms visual naturalness. Third, traditional denoising modules tend to blur edge details while suppressing noise, and color enhancement modules mostly use Sigmoid or ReLU activation alone, which suffer from vanishing gradients or hard truncation and make smooth color transitions difficult. Finally, the weight distribution of existing fusion strategies is not balanced enough, which can easily lead to excessive enhancement or loss of original information, so that the enhanced image lacks realism. These problems seriously restrict the practical application of underwater image enhancement technology.

To address the aforementioned issues, this study proposes a Dual-Branch Optical-Aware Detail Augmentation Network (DualOadamNet), which implements a multi-dimensional image degradation restoration mechanism through targeted innovations. To resolve the multi-task conflict problem, an Optical-Aware Detail Augmentation Module (OADAM) is designed. A serial pipeline of “optical restoration → multi-scale denoising → image reconstruction → color enhancement” is adopted to simulate the human visual restoration process, where each stage focuses on a single type of degradation to avoid mutual interference.

For color decoupling and detail processing, a dual-branch architecture is constructed. The global branch realizes color balance and optical distortion repair through OADAM, and the local branch focuses on detail preservation. The collaborative integration of the two is achieved by combining them with the Global Contrast Weighting correction mechanism (optimized from the CDR mechanism) [7]. To address the limitations of denoising and color enhancement, a collaborative mechanism of multi-scale denoising and channel-by-channel edge preservation, along with an adaptive color enhancement module based on Softplus, is designed to solve problems of blurred edges and unnatural color transition, respectively. Meanwhile, the contrast constraint fusion strategy and hierarchical depth feature fusion are optimized to ensure balanced enhancement and the integrity of the original features.

The main contributions of this study are summarized as follows:

1. A dual-branch optical sensing network architecture is proposed, with the core being the serial design of the OADAM module, which realizes precise staged repair of multiple types of degradation and ensures independent optimization of color and details through dual-branch decoupling.
2. A series of specialized function modules are designed, including a Multi-scale Denoising Module and Softplus Adaptive Color Enhancement Module, to solve core defects of existing methods in denoising and color correction.
3. The image fusion and feature transfer mechanisms are optimized, and the optimized contrast constraint adaptive fusion strategy is used to improve the integrity and naturalness of image enhancement.
4. The real-time architecture of the model is optimized while maintaining good image optimization quality and quantitative performance, ensuring lightweight real-time processing on 720p and 1080p images.

Comparative experiments demonstrate that the proposed model effectively handles multi-dimensional degradation of underwater images and outperforms existing mainstream methods in color correction, detail preservation, and visual naturalness.

2. Related Work

Mainstream research on underwater image processing focuses on two research directions: image restoration and enhancement, and efficient real-time deployment. Existing methods are mainly divided into two categories: traditional non-deep-learning methods and deep-learning methods. Among them, deep-learning-based schemes have become the mainstream of current research due to their stronger generalization ability [8]. Relevant research can be discussed from three dimensions: feature extraction structure, degradation repair strategy, and lightweight optimization direction.

2.1. Traditional Non-Deep Learning Methods

These methods, also called manual image processing, rely on physical imaging models or prior knowledge and correct specific degradation problems through mathematical modeling. For example, methods based on Gamma Correction theory adjust pixel values directly with mathematical functions to improve the contrast and brightness distribution of underwater images, without relying on underwater optical physical models [9]. Variant algorithms based on histogram equalization (such as L-CLAHE) can improve image contrast [10]. However, all of these methods are limited by fixed mathematical or physical models, which makes it difficult for them to adapt to complex underwater environments with light scattering, noise, and detail loss, and they perform poorly in muddy water or low-light scenes.

In addition, traditional fusion methods optimize visual effects through multi-scale feature superposition. For example, the underwater image decomposition and fusion framework SPDF comprehensively improves the three major components of image HSV [11,12]. However, the weight distribution lacks self-adaptability, which easily leads to over-enhancement or blurred details, making it difficult to balance color correction and detail retention.

2.2. Deep-Learning Methods

With the development of deep learning, convolutional neural networks (CNNs) became the mainstream architecture for underwater image enhancement. For instance, FUnIE-GAN uses a normalization module to enhance stability via generative adversarial network (GAN) learning degradation maps, but the single-branch structure struggles to decouple color correction and detail repair, leading to cross-conflict between these tasks [13]. Shallow-UWnet achieves real-time lightweight performance through a shallow architecture design, but due to its extreme light weight and high efficiency, the feature extraction capability is insufficient, resulting in degraded performance under severe scattering conditions [14].

To address the limitations of single-branch architectures, multi-branch structures have gradually attracted attention. Some models handle brightness and texture features through two branches, but most adopt parallel optimization mechanisms, and the branch function boundaries are unclear. Multi-task conflict still exists, making it difficult to achieve collaborative optimization of global color balance and local detail clarity.

Focusing on optical distortion and detail loss, existing deep-learning methods follow two main optimization ideas:

Physical-model-guided enhancement: integrates underwater light attenuation and scattering models to improve repair accuracy, but complex physical modeling increases computational overhead, limiting real-time deployment.
Data-driven multi-dimensional collaborative optimization: e.g., ASANnet uses fusion channel attention and adaptive normalization modules to optimize color distribution; and DICAM enhances detail extraction through deep Inception structures [15,16]. However, these methods often rely on the parallel processing of multiple degradation dimensions (noise, color, and detail), which can lead to an imbalance in the enhancement effect. Existing detail augmentation modules also rely on complex attention mechanisms or dense connections, improving performance but significantly increasing parameter count and inference delay, which conflicts with real-time deployment requirements.

2.3. Lightweight and Real-Time Optimization

Lightweight and real-time optimization are key for practical underwater image enhancement. Existing research mainly achieves this by simplifying network structures, using efficient convolutions, or optimizing up/down sampling strategies. For example, LiteEnhanceNet uses depthwise separable convolution as the core architecture, reducing computation, but its high memory access frequency limits FPS in high-resolution image processing [17]. DNnet proposes a pixel-rearrangement bottleneck (PBP) structure to improve parallel computing efficiency and realize real-time 4K enhancement. However, it lacks specialized modules for optical distortion and detail loss, limiting performance in low-light or turbid conditions [7]. Ultra-lightweight models like Fa+Net achieve rapid inference via extreme parameter compression, but the network depth is greatly reduced, sacrificing generalization and complex processing ability, as well as noise suppression and detail augmentation [18].

2.4. Feature Fusion and Normalization

Optimizing feature fusion and normalization strategies is important for improving enhancement effects. Most existing methods use simple element-wise addition or concatenation for branch fusion, but the weight distribution lacks self-adaptability, which easily causes loss of original features or over-enhancement. BatchNorm and LayerNorm are widely used for stable training. BatchNorm normalizes each channel with statistics computed across the batch and spatial positions, whereas LayerNorm normalizes each sample with statistics computed across its own features; on their own, neither is sufficient to handle the color imbalance or uneven local illumination in underwater images. Some models combine multiple normalization methods, but they are not linked to dynamic range information, which makes it difficult to suppress over- or under-enhancement accurately.

2.5. Limitations and Proposed Solution

Existing underwater image enhancement methods still face three core limitations:

1. Lack of multi-branch function or unclear task division between branches, making it difficult to decouple color correction and detail repair, resulting in task conflict.
2. Optical distortion and detail augmentation are mostly processed in parallel, lacking a natural progressive repair mechanism.
3. Difficulty in balancing lightweight design and performance: complex architectures improve performance but sacrifice real-time capability, while overly simplified models cannot handle complex degradations.

To address these problems, this paper proposes the Dual-Branch Optical-Aware Detail Augmentation Network (DualOadamNet). Through decoupled feature extraction with two branches of clearly defined functions, the network accurately repairs multi-dimensional degradation with the progressive OADAM module. Combined with a lightweight structural design and an adaptive fusion strategy, it achieves synergistic optimization of performance and real-time inference capability.

3. Methods

The method of this study consists of five parts and aims to alleviate the missing details, color distortion, and noise interference of underwater images by combining complementary dual-branch feature extraction with multi-stage feature enhancement. First, the visual feature preprocessing module downsamples the original image with PixelUnshuffle, which splits the image into local pixel blocks and reduces the amount of calculation while preserving local features. The dual-branch feature extraction module is the core feature extraction module: the local texture feature branch uses $3 \times 3$ convolution to extract the spatial texture and edge features of the image, and the global channel feature branch uses $1 \times 1$ convolution to fuse the channel correlation features within the pixel block, so that spatial details and channel color are extracted in a complementary way. Then, OADAM passes the global channel features through its sublayers in turn, namely the Optical Recovery Layer, Multi-Scale Denoising Layer, Image Detail Restoration Layer, and Color Enhancement Layer, to repair the optical distortion, noise, and color imbalance of the underwater image. In addition, to reduce the impact of excessive enhancement and uneven brightness, this study optimizes and uses the Global–Local Contrast Constraining Module, which calculates the global–local contrast weight of the image and guides the fusion of the dual-branch features. The enhanced global channel features and the local spatial texture features are fused element by element, the original size of the image is restored through PixelShuffle, and finally contrast weighting and residual fusion are followed by FAN adaptive normalization to output the final enhanced image.

3.1. Problem Definition

This study uses the classic underwater dataset UIEB as well as the large-scale underwater datasets LSUI and EUVP. The goal of underwater image enhancement can be defined as follows: given a low-quality underwater image $I \in \mathbb{R}^{3 \times H \times W}$ (a 3-channel RGB image with dimensions $H \times W$), a high-quality enhanced image $I_{t} \in \mathbb{R}^{3 \times H \times W}$ is generated by the network mapping function $f(\cdot)$. The output $I_{t}$ should be close to the real clear underwater image $I_{GT}$ (from the labeled Test.GT file in the dataset) in terms of visual effects (details, color, and contrast) and quantitative metrics (PSNR, SSIM, MSE, and UIQM). The enhancement process is shown in Equation (1):

$$I_{t} = f_{\theta}(I), \qquad \min_{\theta} L\left(f_{\theta}(I), I_{GT}\right) \tag{1}$$

where $\theta$ denotes the learnable parameters of the network and $L$ is the loss function.

The core constraints are as follows. (1) The quantitative metrics of the enhanced image $I_{t}$ should be better than those of existing lightweight models. (2) The model has $\leq 1$ M parameters and a latency of $\leq 30$ ms for a 1080p image, which corresponds to at least 30 fps (a 30 ms frame time gives about 33 fps), so that the model can serve real-time underwater vision tasks. (3) The enhanced image shows no excessive enhancement, color distortion, heavy noise interference, or similar problems, and its visual naturalness should be close to that of the real scene.
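The relationship between the latency bound and the frame-rate bound in constraint (2) can be checked with a few lines of arithmetic. The sketch below assumes frames are processed strictly one after another (no batching or pipelining); the function names are illustrative and do not come from the paper.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frame rate sustained when each frame takes latency_ms milliseconds."""
    return 1000.0 / latency_ms


def latency_ms_from_fps(fps: float) -> float:
    """Per-frame time in milliseconds at a given frame rate."""
    return 1000.0 / fps


# The 30 ms latency budget from constraint (2).
budget_fps = fps_from_latency_ms(30.0)
# The 1080p speed reported in the abstract (85.299 FPS).
reported_latency = latency_ms_from_fps(85.299)

print(f"{budget_fps:.2f} FPS at 30 ms per frame")
print(f"{reported_latency:.2f} ms per frame at 85.299 FPS")
```

Under this assumption, a 30 ms budget is slightly stricter than a 30 fps target, and the reported 1080p speed sits well inside the budget.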

3.2. Overall Model Architecture

The design of DualOadamNet focuses on the two core ideas of “multi-degradation precise repair and lightweight real-time inference”, and adopts a five-stage architecture of “preprocessing–dual-branch extraction–progressive enhancement–contrast constraint–fusion restoration”. The modules cooperate to solve the core degradation problems of underwater images. The model architecture is shown in Figure 1.

1. Feature Preprocessing: the input image is split by PixelUnshuffle to reduce the amount of calculation while preserving the local feature structure, which lays the foundation for decoupling the “global color” and “local detail” features of the two branches;
2. Dual-Branch Feature Extraction: the global color branch extracts the optical features (brightness/color) of the channel dimension through $1 \times 1$ convolution, and the local branch extracts the detail features (edge/texture) of the spatial dimension through $3 \times 3$ convolution, so as to decouple the tasks of the two branches;
3. Optical-Aware Detail Augmentation Module (OADAM): the global color branch features undergo serial enhancement in the order “optical restoration → multi-scale denoising → image restoration → color enhancement”, which resolves one type of degradation per stage and avoids conflicts between image optimization tasks;
4. Dual-Branch Element-Wise Feature Fusion: the enhanced global color channel features and local detail features are fused element by element, and the fused features are restored to the original image size by PixelShuffle;
5. Global–Local Contrast Constraining Module: the global contrast weight is generated by the global contrast learning module GLC to adaptively enhance the fused features; then, the feature adaptive normalization module FAN performs the final local normalization, constrains the range of pixel values, and outputs the final enhanced image.

3.3. Core Module Design

3.3.1. Image Preprocessing

The pixel rearrangement coefficient $m$ is set to adapt to different real-time performance requirements, and the original input image $I \in \mathbb{R}^{3 \times H \times W}$ is processed by the PixelUnshuffle operation, as shown in Equation (2):

$$I_{\text{patch}} = \text{PixelUnshuffle}(I, m) \in \mathbb{R}^{3m^{2} \times H/m \times W/m} \tag{2}$$

where $I$ is the original input image and $I_{\text{patch}}$ is the block tensor feature obtained after pixel unshuffling. The number of channels expands from 3 to $3m^{2}$ and the spatial size shrinks from $H \times W$ to $H/m \times W/m$, so the number of spatial positions that each subsequent layer must process, and with it the calculation amount at a fixed layer width, is reduced to $1/m^{2}$ of the original. This significantly improves the inference speed of the model and ensures real-time performance from the preprocessing stage onward.
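The rearrangement in Equation (2) can be illustrated on nested lists. This is a minimal pure-Python sketch, not the authors' implementation: the channel ordering is an assumption chosen to mirror the convention of PyTorch's `torch.nn.PixelUnshuffle`, since the text does not state one, and the toy image values are made up.

```python
def pixel_unshuffle(img, m):
    """Rearrange a C x H x W nested list into (C*m*m) x (H/m) x (W/m)."""
    c, h, w = len(img), len(img[0]), len(img[0][0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by m")
    out = []
    for ch in range(c):
        for i in range(m):        # row offset inside each m x m block
            for j in range(m):    # column offset inside each m x m block
                out.append([[img[ch][y * m + i][x * m + j]
                             for x in range(w // m)]
                            for y in range(h // m)])
    return out


def pixel_shuffle(feat, m):
    """Inverse rearrangement: (C*m*m) x H x W back to C x (H*m) x (W*m)."""
    c_out = len(feat) // (m * m)
    h, w = len(feat[0]), len(feat[0][0])
    out = [[[0] * (w * m) for _ in range(h * m)] for _ in range(c_out)]
    for k, plane in enumerate(feat):
        ch, i, j = k // (m * m), (k % (m * m)) // m, k % m
        for y in range(h):
            for x in range(w):
                out[ch][y * m + i][x * m + j] = plane[y][x]
    return out


# Toy 3 x 6 x 6 "image" with a unique value per pixel.
image = [[[ch * 100 + y * 6 + x for x in range(6)] for y in range(6)]
         for ch in range(3)]
patches = pixel_unshuffle(image, 3)
print(len(patches), len(patches[0]), len(patches[0][0]))  # 27 2 2
```

With $m = 3$ the three input channels become 27 and each spatial side shrinks by a factor of three; applying `pixel_shuffle` to the result returns the original image, which is why no pixel information is lost by the preprocessing step.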

3.3.2. Channel-Branch Feature Extraction Model

To prevent color correction and detail preservation from interfering with each other, a dual-branch structure is designed to decouple the two tasks, so that global color balance and local detail preservation can be optimized in parallel without mutual interference. The specific design is as follows.

Global channel color branch: this branch extracts the optical features of the channel dimension and repairs color deviation and uneven brightness. Therefore, $1 \times 1$ convolution is used to fuse the channel information. Because a $1 \times 1$ convolution is lightweight and has no spatial receptive field, the global branch is not affected by spatial details such as edges and texture, and the amount of model calculation is further reduced. The Tanh function constrains the feature range to $[-1, 1]$, which helps avoid gradient explosion while extracting the color deviation, as shown in Equations (3) and (4):

$$I_{\text{channel\_global}} = \text{Tanh}\left(\text{Conv}_{1 \times 1}(I_{\text{patch}})\right) \tag{3}$$

$$I_{\text{channel\_global\_feature}} = \text{Conv}_{1 \times 1}\left(\text{ReLU}(I_{\text{channel\_global}})\right) \tag{4}$$

Here, the block feature $I_{\text{patch}}$ undergoes global channel feature extraction through $1 \times 1$ convolution, and the hyperbolic tangent function Tanh captures bidirectional color deviation to yield the channel-dimension global feature $I_{\text{channel\_global}}$. The rectified linear unit (ReLU) activation function is then applied to $I_{\text{channel\_global}}$ to suppress negative noise and retain positive color correction features, so that noise captured by Tanh is not amplified by subsequent enhancement. Finally, a second $1 \times 1$ convolution recovers the effective features filtered by ReLU and transforms the abstract channel-dimension features, through weight adjustment, into the channel-dimension global color correction feature $I_{\text{channel\_global\_feature}}$ (hereinafter referred to as $I_{G}$), which can be used directly for color correction and can be fused with the local detail features after enhancement by the subsequent module.
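The claim that a $1 \times 1$ convolution has no spatial receptive field can be checked directly. The sketch below implements Equations (3) and (4) on nested lists with biases omitted; the weights `w1` and `w2` and the toy feature are hypothetical values chosen only for illustration.

```python
import math


def conv1x1(feat, weight):
    """1x1 convolution: feat is C_in x H x W, weight is C_out x C_in (no bias)."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[[sum(weight[o][c] * feat[c][y][x] for c in range(c_in))
              for x in range(w)] for y in range(h)]
            for o in range(len(weight))]


def map_elementwise(fn, feat):
    """Apply fn to every scalar of a C x H x W nested list."""
    return [[[fn(v) for v in row] for row in plane] for plane in feat]


def global_branch(patch, w1, w2):
    g = map_elementwise(math.tanh, conv1x1(patch, w1))             # Eq. (3)
    return conv1x1(map_elementwise(lambda v: max(v, 0.0), g), w2)  # Eq. (4)


# Hypothetical 2-channel, 2 x 2 feature and weights (illustration only).
w1 = [[0.5, -0.5], [1.0, 1.0]]
w2 = [[1.0, 0.0], [0.5, 0.5]]
patch = [[[0.2, 0.4], [0.6, 0.8]], [[0.1, 0.3], [0.5, 0.7]]]

out = global_branch(patch, w1, w2)

# Perturb a single pixel and list the output positions that change.
perturbed = [[row[:] for row in plane] for plane in patch]
perturbed[0][0][0] += 1.0
out_p = global_branch(perturbed, w1, w2)
changed = [(y, x) for y in range(2) for x in range(2)
           if any(out[o][y][x] != out_p[o][y][x] for o in range(2))]
print(changed)  # [(0, 0)]
```

Only the perturbed position changes in the output, so the branch mixes channels at each position but never mixes neighboring pixels.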

Local channel detail branch: this branch extracts the texture features of the spatial dimension and preserves the image edge and detail structure. Therefore, $3 \times 3$ convolution is used to cover the local pixel associations. The local receptive field of a $3 \times 3$ convolution suits the spatial distribution of underwater image details, that is, the abrupt gray-level and color changes within a local pixel neighborhood; it can also provide initial smoothing of the random noise in an underwater image, and together with the $1 \times 1$ convolution it forms a strict two-way decoupling of color and detail. Replicate padding (edge copying) is used to protect the edges, as shown in Equations (5) and (6):

$$I_{\text{channel\_local}} = \text{Tanh}\left(\text{Conv}_{3 \times 3}(I_{\text{patch}})\right) \tag{5}$$

$$I_{\text{channel\_local\_feature}} = \text{Conv}_{3 \times 3}\left(\text{ReLU}(I_{\text{channel\_local}})\right) \tag{6}$$

Here, the block feature $I_{\text{patch}}$ undergoes local feature extraction through $3 \times 3$ convolution, and the Tanh constraint yields the channel-dimension local feature $I_{\text{channel\_local}}$. The Tanh constraint and ReLU activation are retained because the combination can still filter positive detail features and suppress noise to a certain extent. In the local branch, the ReLU function also adds a nonlinear mapping to the features, so that the model can learn the nonlinear degradation of underwater scenes, that is, the difference in texture blur under different turbidities and the difference in contrast attenuation at different depths. The $3 \times 3$ kernel is widely regarded as a good balance between efficiency and feature expression in convolutional neural networks. A second $3 \times 3$ convolution recovers the effective features, compensates for the possible loss of weak texture details after ReLU filtering, and restores the feature dimension to obtain the channel-dimension local detail feature $I_{\text{channel\_local\_feature}}$ (hereinafter referred to as $I_{L}$), which ensures the subsequent fusion with the globally optimized color features.
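The effect of replicate padding on border pixels can be seen by comparing it with zero padding. This is a small single-channel sketch under assumed values: the averaging kernel and the flat test image are illustrative and are not taken from the paper.

```python
def conv3x3(plane, kernel, padding="replicate"):
    """Single-channel 3x3 cross-correlation that keeps the H x W size."""
    h, w = len(plane), len(plane[0])

    def at(y, x):
        if padding == "replicate":
            # Clamp coordinates so out-of-range reads copy the border pixel.
            return plane[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
        # Zero padding: out-of-range reads contribute nothing.
        return plane[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    return [[sum(kernel[dy + 1][dx + 1] * at(y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


box = [[1 / 9] * 3 for _ in range(3)]   # averaging kernel (illustrative)
flat = [[0.5] * 4 for _ in range(4)]    # uniform 4 x 4 region

rep = conv3x3(flat, box, "replicate")
zero = conv3x3(flat, box, "zeros")
print(round(rep[0][0], 3), round(zero[0][0], 3))  # 0.5 0.222
```

On a uniform region, replicate padding leaves the corner value unchanged, whereas zero padding darkens it, which would otherwise appear as an artificial edge along the image border.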

3.3.3. Optical-Aware Detail Augmentation Module (OADAM)

To address the core problem of task conflict caused by processing multi-dimensional degradation in parallel, this study designs the Optical-Aware Detail Augmentation Module (OADAM) to simulate the process of human visual restoration. The module first solves the basic degradation problem through optical restoration, then optimizes details via multi-scale denoising and image restoration, and finally strengthens the visual effect via color enhancement. Each stage focuses on a single type of degradation to achieve accurate image restoration. The data flow of the module is shown in Figure 2.

Figure 2 shows the processing flow of the OADAM module for the global color correction feature $I_{G}$: the input $I_{G}$ flows through four functional modules in turn along the data stream. The details are as follows:

1. Optical Recovery Layer
The core objective of this layer is to correct the optical distortion caused by the scattering and absorption of underwater light. The module structure is shown in Figure 3.

Figure 3 shows how the optical repair layer obtains the optical recovery feature: high-dimensional mapping, cooperative optimization of the feature distribution by two normalizations, and, after low-dimensional restoration, a residual connection that retains the original features. The details of the process are shown in Formulas (7)–(9).

$$IL_{\text{Relu}} = \text{LeakyRelu}\left(\text{Conv}_{3 \times 3}(I_{G})\right) \tag{7}$$

$$IL_{\text{Norm}} = \text{LayerNorm}\left(\text{BatchNorm}(IL_{\text{Relu}})\right) \tag{8}$$

$$I_{\text{optical}} = \text{Conv}_{3 \times 3}(IL_{\text{Norm}}) + I_{G} \tag{9}$$

Here, BatchNorm and LayerNorm denote batch normalization and layer normalization, LeakyRelu is the leaky rectified linear unit, and the first $3 \times 3$ convolution raises the channel dimension of $I_{G}$. The LeakyRelu activation function passes a small, non-zero gradient for negative inputs, which avoids dead neurons, and outputs the global activation feature tensor $IL_{\text{Relu}}$. BatchNorm performs batch normalization on $IL_{\text{Relu}}$ to reduce the internal covariate shift. LayerNorm then performs layer normalization to balance the feature distribution across channels, which adapts to the color imbalance between underwater image channels, and yields the global normalized feature $IL_{\text{Norm}}$. Finally, a $3 \times 3$ convolution reduces the dimension, and $I_{G}$ is added through a residual connection to output the optical recovery feature $I_{\text{optical}}$.
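The two normalizations in Equation (8) act along different axes, which is easiest to see on a toy batch. The following is a simplified sketch rather than the layer itself: spatial dimensions are dropped, both $3 \times 3$ convolutions are treated as identity, the learnable scale and shift are omitted, and the slope 0.01 and $\epsilon = 10^{-5}$ are assumed defaults that the text does not specify.

```python
import math

EPS = 1e-5  # assumed numerical-stability constant


def leaky_relu(v, slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope, not zeroed."""
    return v if v >= 0 else slope * v


def standardize(values):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


def batch_norm(batch):
    """Normalize each channel (column) with statistics over the batch axis."""
    cols = [standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*cols)]


def layer_norm(batch):
    """Normalize each sample (row) with statistics over its own channels."""
    return [standardize(row) for row in batch]


def optical_recovery(batch):
    act = [[leaky_relu(v) for v in row] for row in batch]   # Eq. (7), conv omitted
    norm = layer_norm(batch_norm(act))                      # Eq. (8)
    return [[n + x for n, x in zip(nr, xr)]                 # Eq. (9), conv omitted
            for nr, xr in zip(norm, batch)]


# Toy batch of 3 samples x 3 channels with a weak first (red-like) channel.
batch = [[0.10, 0.60, 0.70], [-0.20, 0.50, 0.90], [0.05, 0.80, 0.60]]
recovered = optical_recovery(batch)
print([[round(v, 3) for v in row] for row in recovered])
```

Batch normalization removes the per-channel offset shared by the whole batch, layer normalization then rebalances the channels inside each sample, and the residual term adds the untouched input back so that the original signal is preserved.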

2. Multi-Scale Denoising Layer
The core goal of this layer is to remove underwater noise while resolving the conflict between denoising and blurred edge details. The module structure is shown in Figure 4.

Figure 4 shows the process in the Multi-Scale Denoising Layer: a pooling-based denoising strategy implemented through downsampling and bilinear-interpolation restoration, channel-by-channel edge detection to counter blurred edges, and a final residual connection that completes the multi-scale denoising. The details of the process are shown in Formulas (10)–(13).

$$I_{\text{down}} = \text{MaxPool}\left(\text{Conv}_{3 \times 3}(I_{\text{optical}})\right) \tag{10}$$

$$I_{\text{up}} = \text{Upsample}\left(\text{Conv}_{3 \times 3}(I_{\text{down}})\right) \tag{11}$$

$$I_{\text{edge}} = \text{Laplacian\_in\_channels}(I_{\text{optical}}) \tag{12}$$

$$I_{\text{denoise}} = I_{\text{optical}} - I_{\text{up}} + 0.15 \times I_{\text{edge}} \tag{13}$$

The fixed coefficient of 0.15 for the Laplacian edge superposition is determined via grid search (step size = 0.05, search range = [0.05, 0.3]) on the validation set of the UIEB dataset, with PSNR and SSIM as evaluation metrics, and it aims to balance noise suppression and edge preservation. A sensitivity analysis on the coefficients 0.1, 0.15, 0.2, and 0.25 shows that the model achieves the best trade-off when the coefficient is 0.15.

Here, after a $3 \times 3$ convolution extracts local features from the optical recovery feature $I_{\text{optical}}$ output by the previous layer, MaxPool downsampling reduces the feature size, suppresses high-frequency noise, and outputs the coarse-grained denoising feature $I_{\text{down}}$. Then, a $3 \times 3$ convolution performs feature restoration, and bilinear-interpolation upsampling restores the original size to generate the upsampled restoration feature $I_{\text{up}}$. For edge enhancement, a Laplacian edge detection kernel with fixed weights, rather than dynamic parameters, is used to reduce the amount of calculation. Laplacian_in_channels performs channel-by-channel edge detection on $I_{\text{optical}}$ through grouped convolution, which highlights the detailed edge features of the image and outputs the edge enhancement feature $I_{\text{edge}}$. Finally, the channel residual operation suppresses the noise, the edge feature is superimposed with a weight of $0.15$ to balance denoising against edge preservation, and the denoised edge enhancement feature $I_{\text{denoise}}$ is output.
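Equations (12) and (13) can be sketched on nested lists as follows. Several details here are assumptions because the text does not state them: the center-positive 4-neighbor Laplacian kernel (with this sign, adding a positive multiple sharpens edges), replicate padding at the border, and a zero $I_{\text{up}}$ in the demo so that only the edge term is visible.

```python
# Assumed center-positive 4-neighbor Laplacian kernel (not given in the text).
LAPLACIAN = [[0.0, -1.0, 0.0],
             [-1.0, 4.0, -1.0],
             [0.0, -1.0, 0.0]]


def conv3x3_replicate(plane, kernel):
    """Single-channel 3x3 filter with replicate padding (assumed padding mode)."""
    h, w = len(plane), len(plane[0])

    def at(y, x):
        return plane[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(kernel[dy + 1][dx + 1] * at(y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def laplacian_in_channels(feat):
    """Filter every channel independently, as a grouped convolution would."""
    return [conv3x3_replicate(plane, LAPLACIAN) for plane in feat]


def denoise(i_optical, i_up, edge_weight=0.15):
    """Eq. (13): I_denoise = I_optical - I_up + edge_weight * I_edge."""
    i_edge = laplacian_in_channels(i_optical)
    return [[[o - u + edge_weight * e for o, u, e in zip(ro, ru, re)]
             for ro, ru, re in zip(po, pu, pe)]
            for po, pu, pe in zip(i_optical, i_up, i_edge)]


# One-channel toy feature with a single bright pixel; I_up is set to zero here.
feat = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]
zeros = [[[0.0] * 3 for _ in range(3)]]
result = denoise(feat, zeros)
print([[round(v, 2) for v in row] for row in result[0]])
```

In this toy case the bright pixel is raised from 1.0 to 1.6 and its four neighbors are lowered to -0.15, which is the local contrast boost that the fixed 0.15 weight trades off against noise suppression.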

3. Image Detail Restoration Layer
The core objective of this layer is to recover the details left blurred after noise removal. The module structure is shown in Figure 5.

Figure 5 shows the process of the Image Detail Restoration Layer, which uses a lightweight Encoder–Decoder structure with only four convolution layers (and no pooling operation). This keeps the parameter count under control while retaining the detail restoration ability, and restores image details by expanding the channels with activation and then shrinking them back. The details of the process are shown in Formulas (14) and (15).

$$I_{\text{encode}} = \text{ReLU}\left(\text{Conv}_{3 \times 3}\left(\text{ReLU}\left(\text{Conv}_{3 \times 3}(I_{\text{denoise}})\right)\right)\right) \tag{14}$$

$$I_{\text{restore}} = \text{ReLU}\left(\text{Conv}_{3 \times 3}\left(\text{ReLU}\left(\text{Conv}_{3 \times 3}(I_{\text{encode}})\right)\right)\right) \tag{15}$$

In the encoding stage, the number of feature channels is increased step by step through two layers of $3 \times 3$ convolution to extract the deep details of $I_{\text{denoise}}$, and ReLU introduces nonlinearity to output the high-order encoding feature $I_{\text{encode}}$. In the decoding stage, the number of feature channels is reduced step by step through two layers of $3 \times 3$ convolution, so that the high-dimensional encoding is mapped back to the original dimension and the detail structure is restored, and the detail restoration feature $I_{\text{restore}}$ is output. This scheme discards complex attention mechanisms and dense connections and uses a simple Encoder–Decoder structure to realize detail restoration.
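To give a feel for how small such a four-convolution Encoder–Decoder can be, the parameter count of a stack of $3 \times 3$ convolutions can be computed directly. The channel widths below are hypothetical: the text only says that channels are expanded and then reduced step by step, so the doubling schedule and the 27-channel input (3m² with m = 3) are assumptions for illustration.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Weights plus optional biases of one k x k convolution layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def stack_params(widths, k=3):
    """Total parameters of consecutive conv layers with the given channel widths."""
    return sum(conv_params(a, b, k) for a, b in zip(widths, widths[1:]))


m = 3
base = 3 * m * m  # 27 channels, assuming the feature keeps 3*m^2 channels
# Hypothetical expand-then-shrink schedule for the four convolutions.
widths = [base, 2 * base, 4 * base, 2 * base, base]

total = stack_params(widths)
print(total)  # 131463
```

Even with this fairly generous hypothetical schedule, the four layers stay far below the 1 M parameter budget stated in the problem definition, which is consistent with the choice to avoid attention and dense connections.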

1.: Color Enhancement Layer
The core goal of this layer is to correct the color offset and solve the unnatural problem of color transition. The module structure is shown in Figure 6.

Figure 6 shows the process of the Color Enhancement Layer, which consists of high-dimensional expansion, activation with the smooth Softplus function, and low-dimensional restoration. The details of the process are shown in Formula (16).

I_{enhance} = Softplus ({Conv}_{3 \times 3} (ReLU ({Conv}_{3 \times 3} (I_{restore}))))

(16)

Among them, two layers of

3 \times 3

convolution and a ReLU activation function form a feature-refinement stage that extracts the color features from the detail restoration feature

I_{r e s t o r e}

. Finally, the Softplus activation function (Softplus

(x) = ln (1 + e^{x})

) is used to replace the traditional Sigmoid and ReLU. Its output range is

(0, + \infty)

, and it is smooth everywhere, unlike the bounded Sigmoid and the piecewise-linear ReLU. This scheme realizes smooth color enhancement, avoids the color banding caused by hard truncation, and outputs the color enhancement feature

I_{e n h a n c e}

.

Compared with traditional Sigmoid and ReLU, Softplus avoids color distortion via its inherent mathematical properties: Sigmoid’s gradient vanishing in the saturation region causes hard color truncation and discontinuous transition, while ReLU’s direct discard of negative values leads to weak color signal loss and gradient-mutation-induced color discontinuity, whereas Softplus’s smooth, non-saturating

(0, + \infty)

output preserves color brightness gradient continuity and avoids hard clipping. This advantage is indirectly verified by existing quantitative results: on UIEB/EUVP/LSUI datasets, DualOadamNet (m = 3) achieves UIQM values of 3.31/3.23/3.26, outperforming Sigmoid/ReLU-based baselines like FunIE-GAN and DNnet on most datasets. As UIQM quantifies color balance and visual naturalness, its higher scores confirm Softplus-enabled color enhancement reduces distortion and enables more natural color transition than Sigmoid/ReLU.
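The difference between the three activations can be checked numerically. The short Python sketch below (function names are chosen for this illustration) evaluates Softplus, ReLU and Sigmoid on a weak negative response and on a large positive one. It only illustrates the mathematical properties discussed above and is not taken from the authors' implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x: float) -> float:
    """The derivative of Softplus is the logistic Sigmoid."""
    return sigmoid(x)


def relu(x: float) -> float:
    return max(x, 0.0)


def sigmoid(x: float) -> float:
    """Logistic function, evaluated without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for x in (-2.0, 0.0, 8.0):
    print(f"x={x:+.1f}  softplus={softplus(x):.4f}  "
          f"relu={relu(x):.4f}  sigmoid={sigmoid(x):.4f}")
```

For a weak negative input, ReLU returns exactly zero while Softplus keeps a small positive value with a non-zero gradient. For a large positive input, Sigmoid saturates near 1 while Softplus keeps growing almost linearly.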

The final output of the OADAM module is the residual fusion result, as shown in Formula (17). The residual fusion of the original global color feature

I_{G}

and

I_{e n h a n c e}

not only preserves the basic structure of the original color information, but also superimposes the enhanced color details to avoid feature distortion in the enhancement process. The OADAM module outputs the final enhanced feature

I_{O A D A M}

:

I_{OADAM} = I_{G} + I_{enhance}

(17)

3.3.4. Two-Branch Element-Wise Feature Fusion Model

The function of this module is to fuse the double-branch features, restore the original image size, and lay the foundation for the subsequent contrast constraint. The specific process is shown in Formula (18):

I_{pre} = PixelShuffle (I_{OADAM} ⊙ I_{L}, m) + I

(18)

Here, the final enhanced feature of the OADAM module

I_{OADAM}

and the local detail branch feature

I_{L}

are multiplied element by element, so that color enhancement and detail retention guide each other and are matched precisely. Then, the fused features are mapped back to the original spatial dimension through the PixelShuffle operation, and residual fusion is performed with the original image I (where m is the PixelShuffle coefficient that controls the scale of model parameters), which avoids excessive enhancement and loss of the original structure. The output fusion enhancement feature

I_{p r e}

prepares for the subsequent global contrast adjustment.
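Formula (18) can be sketched in plain Python on nested lists. The channel ordering of `pixel_shuffle` below follows the usual PixelShuffle convention, in which input channel c·m² + i·m + j fills the sub-pixel position (i, j) of output channel c. The function names and the list-based tensor layout are assumptions for illustration; a real implementation would use a tensor library.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(x: Tensor3, m: int) -> Tensor3:
    """Rearrange (C*m*m, H, W) features into (C, H*m, W*m)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (m * m) != 0:
        raise ValueError("channel count must be divisible by m*m")
    c_out = c_in // (m * m)
    out = [[[0.0] * (w * m) for _ in range(h * m)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(m):
            for j in range(m):
                src = x[c * m * m + i * m + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * m + i][col * m + j] = src[row][col]
    return out


def fuse_branches(i_oadam: Tensor3, i_l: Tensor3, image: Tensor3, m: int) -> Tensor3:
    """I_pre = PixelShuffle(I_OADAM * I_L, m) + I, element by element."""
    product = [[[a * b for a, b in zip(ra, rb)]
                for ra, rb in zip(ca, cb)]
               for ca, cb in zip(i_oadam, i_l)]
    upsampled = pixel_shuffle(product, m)
    return [[[u + v for u, v in zip(ru, rv)]
             for ru, rv in zip(cu, cv)]
            for cu, cv in zip(upsampled, image)]


# Toy example: four 1x1 feature channels become one 2x2 output channel (m = 2).
features = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
detail = [[[2.0]], [[2.0]], [[2.0]], [[2.0]]]
original = [[[1.0, 1.0], [1.0, 1.0]]]
print(fuse_branches(features, detail, original, m=2))
```

For an RGB image, the two branch features would need 3·m² channels at 1/m of the spatial resolution so that the shuffled result matches the size of I.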

3.3.5. Global–Local Contrast Constraining Module

Aiming at the problems of unbalanced weight and excessive enhancement of the fusion strategy, the Global–Local Contrast Constraining Module (GLC-FAN) is designed to integrate Global Contrast Weighting (GLC) and Feature Adaptive Contrast Normalization (FAN), and execute in the order of “global adjustment → local normalization”, so as to balance the enhancement effect and visual naturalness.

The purpose of global contrast weighting is to generate an adaptive global weight

ω_{glc}

, improve the brightness of low-contrast areas and suppress the overexposure of high-contrast areas. The specific process is shown in Formulas (19) and (20):

Δ d = max (Flatten ({AvgPool}_{8 \times 8} (I))) - min (Flatten ({AvgPool}_{8 \times 8} (I))) + 1 \times 10^{- 4}

(19)

ω_{glc} = 0.5 \times (1 - \frac{Δ d}{4}) + 0.5

(20)

Among them, the

8 \times 8

average pooling

{AvgPool}_{8 \times 8}

is used to extract the regional brightness of the original input I, and the result is flattened to one dimension to facilitate the calculation of the brightness extrema over the grid cells. The brightness difference

Δ d

between the brightest and darkest cells is calculated, with

1 \times 10^{- 4}

added as a small constant for numerical stability. Formula (20) converts the difference between light and shade into a global contrast weight

ω_{g l c}

in

[0.5, 1]

, which is used to weight the enhanced features. The use of

ω_{g l c}

is shown in Formula (21):

I_{glc} = ω_{glc} \cdot I_{pre} + 0.2 \cdot I

(21)

Among them,

I_{p r e}

is the fusion enhancement feature of the previous module. Multiplying by the weight

ω_{g l c}

achieves the brightening of dark areas and suppression of overexposure. Superimposing 0.2 times the original image I retains the original image scene structure and outputs the global contrast weighted feature

I_{g l c}

.
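Formulas (19)–(21) can be traced with a small Python sketch on a single-channel brightness map. The sketch assumes that the 8 × 8 average pooling uses non-overlapping 8 × 8 windows (stride 8) and that the image is a nested list of values in [0, 1]; both are assumptions of this example, as are the function names.

```python
from typing import List

Image = List[List[float]]  # single-channel map indexed as [row][column]


def avg_pool(img: Image, k: int) -> Image:
    """Non-overlapping k x k average pooling (assumed stride = k)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx] for dy in range(k) for dx in range(k)) / (k * k)
             for x in range(0, w - k + 1, k)]
            for y in range(0, h - k + 1, k)]


def glc_weight(image: Image, k: int = 8, eps: float = 1e-4) -> float:
    """Formulas (19) and (20): brightness spread -> global weight."""
    cells = [v for row in avg_pool(image, k) for v in row]
    delta_d = max(cells) - min(cells) + eps
    return 0.5 * (1.0 - delta_d / 4.0) + 0.5


def apply_glc(i_pre: Image, image: Image, omega: float) -> Image:
    """Formula (21): I_glc = omega * I_pre + 0.2 * I."""
    return [[omega * p + 0.2 * o for p, o in zip(rp, ro)]
            for rp, ro in zip(i_pre, image)]


# Toy input: dark left half, bright right half (8 rows x 16 columns).
toy = [[0.0] * 8 + [1.0] * 8 for _ in range(8)]
omega = glc_weight(toy)
print(f"omega_glc = {omega:.7f}")
```

A high-contrast input lowers the weight and a flat input leaves it close to 1, so strongly contrasted scenes are enhanced less aggressively.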

After GLC adjustment, some pixel values may exceed the normal image range, resulting in a dazzling image and color loss. If the pixel value exceeds

[0, 1]

, the local adaptive normalized FAN is used to scale the pixel values, and the original features are fused to retain details. The calculation process is shown in Formulas (22)–(24):

Δ d^{'} = max ({AvgPool}_{6 \times 6} (I_{glc})) - min ({AvgPool}_{6 \times 6} (I_{glc})) + 0.01

(22)

I_{F\_norm} = \frac{I_{glc} - min ({AvgPool}_{6 \times 6} (I_{glc}))}{Δ d^{'}}

(23)

I_{t} = 0.7 \times Clamp (I_{F\_norm}, 0, 1) + 0.3 \times I_{glc}

(24)

Among them,

6 \times 6

average pooling

{AvgPool}_{6 \times 6}

is performed on

I_{g l c}

to extract the local brightness, and the local contrast difference

Δ d^{'}

is calculated with a

0.01

constant to avoid division by 0. Then, each grid is locally normalized, stretching or compressing the brightness range to

[0, 1]

to obtain the locally normalized feature

I_{F_n o r m}

. Finally, the truncation function Clamp is used to limit exceeded pixels. A 7:3 residual fusion strategy is adopted: 70% of the normalized natural image and 30% of the enhanced features with GLC. This ensures the natural look of the image while retaining the enhanced details, outputting the final enhanced image

I_{t}

.
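A minimal Python sketch of Formulas (22)–(24) is given below. It follows the formulas as written, taking the maximum and minimum over all pooled cells, and assumes non-overlapping 6 × 6 windows on a single-channel map. These choices and the function names are assumptions of this example rather than details confirmed by the text.

```python
from typing import List

Image = List[List[float]]  # single-channel map indexed as [row][column]


def avg_pool(img: Image, k: int) -> Image:
    """Non-overlapping k x k average pooling (assumed stride = k)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx] for dy in range(k) for dx in range(k)) / (k * k)
             for x in range(0, w - k + 1, k)]
            for y in range(0, h - k + 1, k)]


def fan_normalize(i_glc: Image, k: int = 6, eps: float = 0.01) -> Image:
    """Formulas (22)-(24): normalize, clamp, then blend 7:3 with I_glc."""
    cells = [v for row in avg_pool(i_glc, k) for v in row]
    lowest = min(cells)
    delta = max(cells) - lowest + eps          # Formula (22)
    result = []
    for row in i_glc:
        out_row = []
        for value in row:
            normed = (value - lowest) / delta  # Formula (23)
            clamped = min(max(normed, 0.0), 1.0)
            out_row.append(0.7 * clamped + 0.3 * value)  # Formula (24)
        result.append(out_row)
    return result


# Toy input: the right half exceeds the valid range after GLC weighting.
toy_glc = [[0.2] * 6 + [1.4] * 6 for _ in range(6)]
fused = fan_normalize(toy_glc)
print(f"left = {fused[0][0]:.4f}, right = {fused[0][11]:.4f}")
```

The 0.3 share of the unclamped GLC feature means the blended value is pulled toward the valid range rather than strictly confined to it.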

3.4. Loss Function Design

This study uses a combined loss function, which consists of three parts.

Color column loss: In the underwater environment, water absorbs different wavelengths of light at different rates, which causes serious color deviation and affects the visual naturalness of the enhanced image. The ColorColumnLoss function forces the model output to match the color distribution of the ground-truth label in each of the three RGB channels, so as to achieve accurate color correction. A simplified version of this loss is used, and its mathematical expression is:

L_{simple\_color} = \sum_{c \in \{R, G, B\}} [MSE (μ_{t, c}, μ_{GT, c}) + MSE (σ_{t, c}, σ_{GT, c})]

(25)

where

μ_{t, c}

and

σ_{t, c}

denote the mean and standard deviation of the c-th channel of the model output

I_{t}

, and

μ_{G T, c}

,

σ_{G T, c}

are those of the ground truth

I_{G T}

.
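Formula (25) compares only per-channel statistics, so it can be sketched with the Python standard library. In the sketch below each channel is a flat list of pixel values, the population standard deviation is used, and the MSE of two scalars reduces to their squared difference. The choice of population rather than sample standard deviation and the function name are assumptions of this example.

```python
from statistics import fmean, pstdev
from typing import Sequence


def simple_color_loss(pred: Sequence[Sequence[float]],
                      target: Sequence[Sequence[float]]) -> float:
    """Sum over channels of squared differences in mean and std.

    pred and target hold one flat list of pixel values per channel
    (R, G, B). Population std is an assumption of this sketch.
    """
    loss = 0.0
    for p, t in zip(pred, target):
        loss += (fmean(p) - fmean(t)) ** 2
        loss += (pstdev(p) - pstdev(t)) ** 2
    return loss


# Toy example: the red channel has the right mean but too much spread.
pred = [[0.0, 1.0], [0.3, 0.5], [0.6, 0.8]]
gt = [[0.5, 0.5], [0.3, 0.5], [0.6, 0.8]]
print(simple_color_loss(pred, gt))
```

Because the loss sees only means and standard deviations, it penalizes global color casts and contrast shifts per channel without constraining individual pixels, which is left to the other two loss terms.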

Scattering loss: The scattering effect of underwater light can easily lead to blurred image details and serious decline in contrast, which is one of the core difficulties of underwater image enhancement. The design goal of the ScatterLoss function is to suppress scattering noise and enhance the definition of image edges and texture details by constraining the gray difference between the model output and the label in the detail area. Its mathematical expression is:

L_{scatter} = L_{1} (\nabla^{2} I_{t}, \nabla^{2} I_{GT})

(26)

where

\nabla^{2}

represents the Laplacian edge detection operator, and

L_{1} (\cdot, \cdot)

denotes the pixel-wise L1 loss for edge feature consistency.
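Formula (26) can be illustrated on a single-channel image with a discrete Laplacian. The sketch below assumes the common 4-neighbour kernel (centre −4, direct neighbours +1) applied without padding. The exact kernel and border handling used by the authors are not given in this section, so both are assumptions, as are the function names.

```python
from typing import List

Image = List[List[float]]  # single-channel map indexed as [row][column]


def laplacian(img: Image) -> Image:
    """4-neighbour Laplacian on interior pixels (no padding, assumed kernel)."""
    h, w = len(img), len(img[0])
    return [[img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
             - 4.0 * img[y][x]
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


def scatter_loss(pred: Image, target: Image) -> float:
    """Mean absolute difference between the two Laplacian responses."""
    lap_p, lap_t = laplacian(pred), laplacian(target)
    diffs = [abs(a - b) for rp, rt in zip(lap_p, lap_t) for a, b in zip(rp, rt)]
    return sum(diffs) / len(diffs)


# A smooth ramp has zero Laplacian; an isolated spike does not.
ramp = [[0.0, 1.0, 2.0] for _ in range(3)]
flat = [[0.5, 0.5, 0.5] for _ in range(3)]
spike = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(scatter_loss(ramp, flat), scatter_loss(spike, flat))
```

Smooth brightness gradients cancel out in the Laplacian, so the loss responds only to differences in edges and fine texture, which is where scattering blur shows up.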

Basic loss: This term improves the robustness of the loss function and helps avoid gradient explosion or overfitting during model training. This paper uses the built-in L1 loss of PyTorch 2.8.0 as the basic regression loss to constrain the overall pixel difference between the model output and the ground-truth label, and its mathematical expression is:

L_{1} = L_{1} (I_{t}, I_{GT})

(27)

In this paper, the simplified color column loss

L_{simple_color}

, scattering loss

L_{scatter}

and basic loss

L_{1}

are fused by weighting, where

λ_{1}

,

λ_{2}

and

λ_{3}

are constants with values in

[0, 1]

. The weight coefficients are determined by grid search (step size = 0.1) on the validation sets of the UIEB, EUVP and LSUI datasets. Taking the average PSNR, SSIM and UIQM as the evaluation criterion, the optimal values are obtained as

λ_{1} = 0.4

,

λ_{2} = 0.3

and

λ_{3} = 0.3

, following the principle of “color correction first, scattering detail enhancement second”. The three coefficients satisfy

λ_{1} + λ_{2} + λ_{3} = 1

, forming the final combined loss

L_{combine}

:

L_{combine} = λ_{1} L_{simple\_color} + λ_{2} L_{scatter} + λ_{3} L_{1}, (λ_{1} + λ_{2} + λ_{3} = 1)

(28)
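The search space implied by a step size of 0.1 and the sum-to-one constraint is small enough to enumerate. The Python sketch below lists the candidate weight triples and evaluates Formula (28) for the reported optimum. Whether the original search enforced the constraint during enumeration or allowed zero weights is not stated, so the enumeration rule and the function names are assumptions.

```python
from typing import List, Tuple

Weights = Tuple[float, float, float]


def weight_grid(steps: int = 10) -> List[Weights]:
    """All (l1, l2, l3) on a 1/steps grid with l1 + l2 + l3 = 1.

    Integer counts are used so that the constraint holds exactly
    before converting to floats.
    """
    return [(a / steps, b / steps, (steps - a - b) / steps)
            for a in range(steps + 1)
            for b in range(steps + 1 - a)]


def combined_loss(l_color: float, l_scatter: float, l_basic: float,
                  weights: Weights = (0.4, 0.3, 0.3)) -> float:
    """Formula (28) with the reported optimum as the default weights."""
    w1, w2, w3 = weights
    return w1 * l_color + w2 * l_scatter + w3 * l_basic


candidates = weight_grid()
print(len(candidates), "candidate triples;",
      "reported optimum included:", (0.4, 0.3, 0.3) in candidates)
```

In an actual search each candidate would require a training run followed by validation scoring, which is why a coarse step of 0.1 keeps the cost manageable.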

4. Experiment

This section introduces the experimental setup details and datasets, and displays the quantitative analysis and ablation experimental results. In addition, the efficiency and real-time performance of this research model are tested.

4.1. Experimental Setup and Dataset

This study adopted the NVIDIA GeForce RTX 4070Ti SUPER graphics card for model training. The training image size was uniformly resized to

256 \times 256

pixels. During the image preprocessing stage, the pixel values of the three RGB channels were normalized to the range of

[0, 1]

, the channel order was retained in RGB format, and no additional color space conversion was performed. The model performance was trained and evaluated based on the standard subsets of three public datasets (UIEB, EUVP, and LSUI): specifically, UIEB followed the official partition with 890 training images and 60 test images, while EUVP and LSUI adhered to the built-in training/test subset division rules of the datasets.

For data augmentation, only 100% horizontal flipping (full flipping) was applied, which was executed in real-time during training; all augmentation strategies were completely disabled in the testing phase to ensure the authenticity of evaluation, and no other augmentation strategies (e.g., random cropping, brightness perturbation) were introduced to avoid altering the degradation distribution characteristics of the original data.

In terms of optimization strategy, the AdamW optimizer was used (weight decay coefficient set to

1 \times 10^{- 4}

, beta parameters adopted the default PyTorch values of

(0.9, 0.999)

, and weight decay was only applied to the weights of convolutional layers (not to biases or the gamma/beta parameters of BN/LayerNorm layers)). The initial learning rate was set to

8 \times 10^{- 5}

, and the learning rate scheduling adopted the cosine annealing strategy with the minimum learning rate reduced to

5 \times 10^{- 6}

(no learning rate warm-up phase was used).
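The learning rate schedule described above can be written in closed form. The Python sketch below uses the standard cosine annealing formula with the stated initial and minimum learning rates. It assumes that one annealing cycle spans the 50 preset epochs and that the rate is updated once per epoch, neither of which is stated explicitly.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 50,
              lr_max: float = 8e-5, lr_min: float = 5e-6) -> float:
    """Cosine annealing from lr_max (epoch 0) to lr_min (epoch total_epochs).

    total_epochs = 50 is an assumption taken from the preset epoch count.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


for epoch in (0, 10, 25, 40, 50):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.3e}")
```

The rate falls slowly at first, fastest around the midpoint, and flattens out as it approaches the minimum.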

The model was preset to train for a total of 50 epochs, with a training batch size of 8 and a test batch size of 1. Additionally, an early stopping mechanism was implemented: if the SSIM (Structural Similarity Index) and PSNR (Peak Signal-to-Noise Ratio) metrics (calculated based on the pixel values of three RGB channels) on the validation set showed no improvement for seven consecutive epochs, early stopping would be triggered to terminate training, and the optimal model weights at the time of early stopping would be saved (judged by the comprehensive improvement of the two metrics) to avoid model overfitting.
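The early stopping rule can be sketched as a small Python helper. The text does not define "comprehensive improvement" precisely, so the sketch assumes that an epoch counts as an improvement when either PSNR or SSIM exceeds its best value so far. The class name and this criterion are hypothetical.

```python
class EarlyStopper:
    """Stop when neither PSNR nor SSIM improves for `patience` epochs.

    The "either metric improves" rule is an assumption of this sketch.
    """

    def __init__(self, patience: int = 7) -> None:
        self.patience = patience
        self.best_psnr = float("-inf")
        self.best_ssim = float("-inf")
        self.best_epoch = -1
        self.stale_epochs = 0

    def step(self, epoch: int, psnr: float, ssim: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if psnr > self.best_psnr or ssim > self.best_ssim:
            self.best_psnr = max(self.best_psnr, psnr)
            self.best_ssim = max(self.best_ssim, ssim)
            self.best_epoch = epoch  # the checkpoint would be saved here
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


stopper = EarlyStopper(patience=7)
history = [(20.0, 0.80)] + [(19.0, 0.70)] * 7  # one good epoch, then stagnation
for epoch, (psnr, ssim) in enumerate(history):
    if stopper.step(epoch, psnr, ssim):
        print(f"stop at epoch {epoch}, best epoch was {stopper.best_epoch}")
        break
```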

To ensure training stability, a gradient norm clipping strategy was adopted with a maximum gradient norm of 1.5 to suppress gradient explosion. Model initialization followed unified rules: convolutional layers used He normal initialization (Kaiming Normal), and Batch Normalization (BN) layers were initialized with

gamma = 1

and

beta = 0

.

The pixel rearrangement coefficient

m \in {1, 2, 3}

was used to adjust the channel expansion and spatial compression ratio during the visual feature preprocessing stage. All models corresponding to different m values adopted unified training parameters, test standards, and training environments. Furthermore, all experiments strictly followed the official standard training subsets of each public dataset for model training, and the standard test subsets for performance verification, ensuring the fairness of performance and efficiency comparison.

All evaluation metrics were calculated based on color images (instead of grayscale images), with specific calculation rules as follows: SSIM was implemented in a single-scale manner (window size of

11 \times 11

, standard deviation of 1.5), PSNR was calculated based on the mean squared error (MSE) of three RGB channels, UIQM (Underwater Image Quality Measure) adopted the original official implementation (no parameter modification), MSE was the average of pixel-level mean squared errors across three channels, and the optimal model weight of each indicator was saved. The datasets used are as follows:

1.: UIEB dataset [19]: UIEB contains 950 real underwater images, 890 of which provide clear visual reference images manually screened, and the remaining 60 are used as challenging test samples. These images cover diverse underwater environments such as oceans, lakes and artificial pools, and show rich target categories such as coral reefs, fish, underwater buildings and aquatic plants.
2.: EUVP dataset [13]: A multi-functional underwater image dataset, which contains about 11,000 composite subset images, converts clear underwater images to underwater degraded versions through physical models, and provides accurate pixel-level-matching GT, with about 1100 real-world collected images. The dataset design is based on underwater optical physical models, and can simulate degradation under different “water types” (clear water, turbid water, and extremely turbid water), shooting distance and lighting conditions.
3.: LSUI dataset [20]: Low-light underwater dataset. LSUI contains 5004 pairs of images (original low-light image + clear reference image), and is the largest of the three datasets. All images are collected under real low-light conditions, with special attention to weak-light environments such as deep-sea, night and muddy waters.

To evaluate the performance of the model, DualOadamNet is compared quantitatively and qualitatively with six current advanced methods:

GC [9]: A mathematical technique that adjusts image brightness and contrast through a nonlinear pixel value transformation. It is a basic module commonly used in current underwater image enhancement models and can mitigate the loss of dark details and partial contrast imbalance in underwater images.
LU2net [21]: A U-shaped network specially designed for real-time underwater image enhancement. The model combines axial depthwise convolution with a channel attention module, which significantly reduces the computational requirements and model parameters and thereby improves the processing speed.
LiteEnhanceNet [17]: A model for single underwater image enhancement. The network uses depthwise separable convolution as the main building block to reduce the computational complexity. One-shot aggregation connections are used to effectively extract the features of the lower and middle layers. In addition, a suitable activation function and squeeze-and-excitation modules are integrated at appropriate positions in the network to reduce the computational complexity.
FunIE-GAN [13]: An underwater image enhancement technology based on visual perception fusion. The method is divided into three stages: color correction, contrast enhancement and multi-task fusion. In color correction, the relationship between statistical properties and the analyzed color channels is combined to construct an adaptive compensation method to achieve color correction. The advanced multi-scale decomposition method is used to enhance the gray information of the l-channel.

Shallow-UWnet [14]: A hybrid architecture that combines a fully convolutional network, dense connections and a residual mechanism, which optimizes the efficiency of feature utilization while avoiding redundant computation. In addition, its lightweight design greatly reduces the number of parameters.
DNnet [7]: A lightweight neural network for real-time enhancement of high-resolution underwater images, which is based on pixel rearrangement, FAN normalization and CDR dynamic coordination to improve the efficiency of image optimization, suppress excessive enhancement and balance the brightness distribution of images, so as to achieve the balance between high-resolution underwater images and real-time enhancement.

4.2. Experimental Evaluation

4.2.1. Comparison of Evaluation Indicators

In order to effectively evaluate the image quality generated by this model, four standard evaluation indicators, namely mean square error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and underwater image quality measurement (UIQM), are used.

1.: MSE: A classic full-reference index of pixel-level difference. It quantifies global distortion as the mean squared difference between corresponding pixel values of the reference image and the enhanced image. It is efficient and differentiable, and is one of the basic optimization objectives of deep learning models.
2.: PSNR: A full-reference quality index derived from MSE that evaluates image fidelity as the ratio of peak signal power to noise power. It is a core objective evaluation standard in image compression and enhancement tasks at any resolution, and is especially suitable for measuring global brightness and color distortion.
3.: SSIM: A full-reference index based on the characteristics of the human visual system (HVS). It quantifies image similarity in terms of brightness, contrast and structure to approximate human subjective perception. It has been widely used in underwater image enhancement in recent years and is well suited to evaluating how well the enhanced image retains structural detail. Its basic form is the most commonly used standard evaluation index in the literature.
4.: UIQM: A no-reference quality index designed for underwater images. Without requiring a clear reference image, it targets the typical problems of low contrast, blue-green color deviation and blurred details by fusing three sub-indexes: colorfulness, sharpness and contrast. Its weight coefficients have been verified on a large number of underwater datasets, so it can effectively distinguish the color balance and detail recognizability of enhanced images, and it has been a core evaluation index in underwater image enhancement in recent years.
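The relation between MSE and PSNR can be made concrete with a few lines of Python. The sketch assumes pixel values normalized to [0, 1], so the peak value is 1, and treats an image as a flat sequence of values across all three channels. The function names are chosen for this illustration.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over all pixel values (all channels flattened)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float],
         max_val: float = 1.0) -> float:
    """PSNR in dB; max_val = 1.0 assumes pixels normalized to [0, 1]."""
    err = mse(pred, target)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)


reference = [0.2, 0.4, 0.6, 0.8]
enhanced = [0.3, 0.5, 0.7, 0.9]  # uniform error of 0.1 per value
print(f"MSE = {mse(enhanced, reference):.4f}, "
      f"PSNR = {psnr(enhanced, reference):.2f} dB")
```

Because PSNR is logarithmic in MSE, every tenfold reduction in MSE adds 10 dB.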

The experimental results of comparative evaluation on the UIEB dataset are shown in Table 1.

The experimental results of comparative evaluation on the EUVP dataset are shown in Table 2:

The experimental results of comparative evaluation on the LSUI dataset are shown in Table 3:

The real-time measurement results on the UIEB classic underwater dataset are shown in the line charts in Figure 7, Figure 8 and Figure 9:

The higher the FPS (frames per second) and the lower the FLOPS (used here for the floating-point operation count of a forward pass rather than a per-second rate), the better the real-time performance of the model and the lower the required computing power. Additionally, Figure 9 presents the single-image generation time (in milliseconds) of each model at different resolutions; a shorter inference time indicates higher processing efficiency and better suitability for latency-sensitive applications. An image enhancement method is generally regarded as real-time when its frame rate exceeds 30 FPS, which corresponds to a maximum single-image processing time of approximately 33 milliseconds. To ensure the auditability and reproducibility of the real-time performance metrics, the key experimental settings are as follows: all FPS and FLOPS tests were conducted with a test batch size of 1 (consistent with practical single-image processing) and FP32 single-precision inference to guarantee numerical stability; all test images were preloaded into the memory of the NVIDIA GeForce RTX 4070Ti SUPER graphics card, which eliminates interference from disk read/write latency. The timing scope is strictly limited to the forward propagation of the model and excludes preprocessing and post-processing steps.
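The timing protocol described above (batch size 1, forward pass only) can be sketched with a generic Python helper. The `forward` argument stands in for one forward pass of a model and is a placeholder of this example. The warm-up and run counts are assumptions, and on a GPU the device would also have to be synchronized before each clock reading.

```python
import time
from typing import Callable


def frame_time_ms(fps: float) -> float:
    """Per-image latency in milliseconds that corresponds to a frame rate."""
    return 1000.0 / fps


def measure_fps(forward: Callable[[], object],
                warmup: int = 5, runs: int = 50) -> float:
    """Time only the forward call, after a few untimed warm-up runs."""
    for _ in range(warmup):
        forward()
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
    return runs / elapsed


def is_real_time(fps: float, threshold: float = 30.0) -> bool:
    """The 30 FPS criterion quoted in the text."""
    return fps > threshold


# Placeholder workload standing in for a model forward pass.
fps = measure_fps(lambda: sum(range(10_000)), warmup=2, runs=20)
print(f"{fps:.1f} FPS, {frame_time_ms(fps):.3f} ms per call")
print(f"30 FPS corresponds to {frame_time_ms(30.0):.1f} ms per image")
```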

4.2.2. Visual Quality Comparison

The underwater visual quality comparison between this model and the other six models is shown in Figure 10 and Figure 11:

4.3. Ablation Experiment

The Ablation Experiment of this study is divided into three aspects. The first is the quantitative analysis and comparison of multi-branch module ablation, which is used to verify the optimization of the multi-branch strategy for image details and color. The second is the quantitative analysis and comparison of OADAM module ablation, which is used to verify the effect of the OADAM natural branch strategy on image optimization. The third is the ablation of the minimal Baseline model, to quantify the performance improvement brought by DualOadamNet’s core innovative designs on the basic lightweight skeleton. All ablation experiments are conducted with training and testing on standard subsets, and the hyperparameters remain fixed throughout the experiments; the parameter scale is m = 1, which is verified by using three datasets of UIEB, EUVP and LSUI.

4.3.1. Channel-Branch Feature Extraction Model Ablation Experiment

The ablation experiment removes the channel-branch feature extraction structure, retains the use of global color features in the OADAM model, and directly carries out multi-granularity global contrast fusion, called GlobalOADAMNet. The quantitative evaluation results are shown in Table 4:

4.3.2. OADAM Model Ablation Experiment

The ablation experiment removes the OADAM module, retains the global color features and local detail features, and directly conducts multi-granularity global contrast fusion, called DualNet. The quantitative evaluation results are shown in Table 5:

4.3.3. Baseline

A minimal Baseline model is constructed to quantify the performance gain of DualOadamNet’s core innovations (dual-branch feature decoupling, OADAM progressive optical-aware enhancement, and GLC-FAN global–local contrast constraint). It only retains the basic lightweight skeleton consistent with DualOadamNet, including PixelUnshuffle/PixelShuffle operation with

m = 1

and single-branch basic convolution feature extraction, and removes all custom-designed specialized modules in this study. Adopting a simplified single-branch pipeline with fixed-weight residual connection (without adaptive feature fusion or progressive degradation repair mechanisms), the Baseline uses completely consistent experimental settings with the above two ablation experiments: the same training hyperparameters, evaluation metrics (MSE, PSNR, SSIM, and UIQM) and verification on the UIEB, EUVP and LSUI datasets. The consistent lightweight skeleton ensures the fairness of comparison, which is used to verify the effective performance improvement brought by the integrated design of DualOadamNet’s core modules. The quantitative evaluation results are shown in Table 6:

4.4. Analysis of Experimental Results

A radar chart of the quantitative experimental results of the proposed model and six current advanced models is shown in Figure 12.

Figure 12 shows the average experimental results of the seven models, including the proposed model, evaluated on five metrics: MSE, PSNR, SSIM, UIQM, and FPS. It should be noted that the MSE values are displayed on a reversed scale for consistency with the other indicators.

From the quantitative comparison results on the three datasets, it can be observed that DualOadamNet consistently outperforms LiteEnhanceNet, FunIE-GAN, Shallow-UWnet, and other mainstream comparison methods under different pixel rearrangement coefficients m, and exhibits a significant performance advantage over GC. These results verify the effectiveness of the dual-branch decoupling strategy and the OADAM module in addressing the multi-dimensional degradation characteristics of underwater images. The gradient experiments on the parameter m further demonstrate that increasing the value of m effectively reduces computational complexity by compressing the spatial dimension, without sacrificing the core enhancement performance. Meanwhile, channel expansion enhances both color correction and detail restoration, providing flexible parameter configurations for real-time deployment on underwater mobile devices.

From the real-time measurement results, it can be seen that the proposed model achieves a higher FPS than most comparison methods, while its FLOPS and parameter scale (Params = 0.307 M) remain well below the upper threshold of lightweight models. The model is capable of efficiently handling lightweight underwater image enhancement tasks at both 720p and 1080p resolutions. Across four resolutions (720p, 1080p, 2K, and 4K), the proposed method maintains a favorable balance between FPS and FLOPS, demonstrating strong potential for future lightweight real-time applications at higher resolutions.

From the visual quality comparison across the three datasets, the proposed model shows clear superiority over GC, LiteEnhanceNet, Shallow-UWnet, and DNnet in terms of visual fidelity and restoration quality, while achieving performance comparable to FunIE-GAN and LU2net. These results indicate that DualOadamNet effectively addresses color distortion, light scattering, underwater blur, and noise, producing enhanced images with clear structures and natural appearance.

Finally, the quantitative results of both the multi-branch module ablation experiments and the OADAM module ablation experiments demonstrate that DualOadamNet achieves the best overall performance. These experiments confirm that branch decoupling fusion and global color-aware optical perceptual detail enhancement are both critical components of the model. Neither component can be removed without degrading performance, as they jointly enable the effective restoration and enhancement of key underwater image features, thereby forming the foundation for the model’s superior performance.

5. Conclusions

This study proposes a lightweight underwater image processing network with dual-branch optical-aware detail enhancement, named DualOadamNet. The proposed model adopts a dual-branch fusion strategy that integrates global color features and local detail features. To address the issues of parallel task conflict and limited generalization ability in underwater image processing, an Optical-Aware Detail Augmentation Module (OADAM) is designed to perform sequential and natural image restoration.

To further improve real-time performance, pixel rearrangement, a simplified encoder–decoder architecture, and a fixed Laplacian kernel are introduced to reduce computational complexity while preserving enhancement quality. Extensive experiments conducted on three benchmark datasets, along with comprehensive ablation studies, demonstrate the superiority of the proposed model in underwater image enhancement tasks.

In future work, we plan to further optimize the image enhancement module and explore more advanced and high-performance feature extraction and processing methods to obtain more accurate and visually natural underwater images. In addition, we intend to extend the proposed framework to real-time 4K image enhancement and underwater video processing applications.

Author Contributions

Conceptualization, J.Y. and S.Z.; methodology, J.Y.; software, S.Z.; validation, S.Z., J.Y. and D.L.; formal analysis, S.Z.; investigation, S.Z.; resources, J.Y.; data curation, S.Z.; writing—original draft preparation, S.Z.; writing—review and editing, J.Y. and D.L.; visualization, S.Z.; supervision, J.Y.; project administration, J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. U25A20426). The APC was funded by the National Natural Science Foundation of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are publicly available. The UIEB, EUVP and LSUI underwater image datasets employed in the research can be accessed from their official public repositories.

Acknowledgments

Special thanks are extended to Jianyong Yu for his invaluable guidance and support throughout the entire research process, and to all members of the Underwater Vision Research Group for their valuable discussions and assistance in research and experimental validation. Additionally, the authors acknowledge the providers of the UIEB dataset (Underwater Image Enhancement Benchmark), EUVP dataset (Enhanced Underwater Vision Dataset), and LSUI dataset (Low-Light Underwater Image Dataset) for making their data publicly available, which has laid a solid foundation for conducting the experimental work in this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Xia, H.; Bao, B.; Liao, F.; Chen, J.; Wang, B.; Li, Z. A patch-based method for underwater image enhancement with denoising diffusion models. IEEE Trans. Cybern. 2024, 55, 269–281.
Shi, X.; Wang, Y.G. CPDM: Content-preserving diffusion model for underwater image enhancement. Sci. Rep. 2024, 14, 31309.
Khan, R.; Kulkarni, A.; Phutke, S.S.; Murala, S. Underwater image enhancement with phase transfer and attention. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN); IEEE: Piscataway, NJ, USA, 2023; pp. 1–8.
Zhou, J.; Gai, Q.; Zhang, D.; Lam, K.-M.; Zhang, W.; Fu, X. IACC: Cross-illumination awareness and color correction for underwater images under mixed natural and artificial lighting. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15.
Zhang, D.; Zhou, J.; Guo, C.; Zhang, W.; Li, C. Synergistic multiscale detail refinement via intrinsic supervision for underwater image enhancement. Proc. AAAI Conf. Artif. Intell. 2024, 38, 7033–7041.
Zhou, J.; Liu, Q.; Jiang, Q.; Ren, W.; Lam, K.-M.; Zhang, W. Underwater camera: Improving visual perception via adaptive dark pixel prior and color correction. Int. J. Comput. Vis. 2023, 133, 8215–8233.
Cao, T.; Yu, Z.; Zheng, B. DNnet: A lightweight network for real-time 4K underwater image enhancement using dynamic range and average normalization. Expert Syst. Appl. 2025, 270, 126561.
Cong, X.; Zhao, Y.; Gui, J.; Hou, J.; Tao, D. A comprehensive survey on underwater image enhancement based on deep learning. arXiv 2024, arXiv:2405.19684.
Lai, Y.L.; Ang, T.F.; Bhatti, U.A.; Ku, C.S.; Han, Q.; Por, L.Y. Color correction methods for underwater image enhancement: A systematic literature review. PLoS ONE 2025, 20, e0317306.
Dhanya, P.R.; Anilkumar, S.; Balakrishnan, A.A.; Supriya, M.H. L-CLAHE intensification filter (L-CIF) algorithm for underwater image enhancement and colour restoration. In Proceedings of the 2019 International Symposium on Ocean Technology (SYMPOL); IEEE: Piscataway, NJ, USA, 2019; pp. 117–128.
Kang, Y.; Jiang, Q.; Li, C.; Ren, W.; Liu, H.; Wang, P. A perception-aware decomposition and fusion framework for underwater image enhancement. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 988–1002.
Hou, G.; Zhao, X.; Pan, Z.; Yang, H.; Tan, L.; Li, J. Benchmarking underwater image enhancement and restoration, and beyond. IEEE Access 2020, 8, 122078–122091.
Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234.
Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed model for underwater image enhancement. Proc. AAAI Conf. Artif. Intell. 2021, 35, 15853–15854.
Park, C.W.; Eom, I.K.; Kim, J.H.; Lee, S.Y.; Park, J.W.; Choi, Y.S.; Kim, H.J.; Lee, J.H.; Oh, S.H.; Kim, Y.J.; et al. Underwater image enhancement using adaptive standardization and normalization networks. In Engineering Applications of Artificial Intelligence; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
Tolie, H.F.; Ren, J.; Elyan, E. DICAM: Deep inception and channel-wise attention modules for underwater image enhancement. Neurocomputing 2024, 584, 127585.
Zhang, S.; Zhao, S.; An, D.; Li, D.; Zhao, R. LiteEnhanceNet: A lightweight network for real-time single underwater image enhancement. Expert Syst. Appl. 2024, 240, 122546.
Jiang, J.; Ye, T.; Bai, J.; Chen, S.; Chai, W.; Jun, S.; Liu, Y.; Chen, E. FA⁺Net: You only need 9K parameters for underwater image enhancement. arXiv 2023, arXiv:2305.08824.
Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389.
Peng, L.; Zhu, C.; Bian, L. U-shape Transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079.
Yang, H.; Xu, J.; Lin, Z.; He, J. LU2Net: A lightweight network for real-time underwater image enhancement. arXiv 2024, arXiv:2406.14973.

Figure 1. DualOadamNet Model Framework Diagram.

Figure 2. Modular Flow Chart of Data Processing for Optical-Aware Detail Augmentation.

Figure 3. Processing of Optical Recovery Layer.

Figure 4. Processing of Multi-Scale Denoising Layer.

Figure 5. Processing of Image Detail Restoration Layer.

Figure 6. Processing of Color Enhancement Layer.

Figure 7. FPS Performance Comparison of Different Image Enhancement Models Under Various Resolutions.

Figure 8. FLOPS Computational Power Comparison of Different Image Enhancement Models Under Various Resolutions.

Figure 9. Single Image Generation Time (ms/image) Comparison of Different Image Enhancement Models Under Various Resolutions.

Figure 10. Underwater image enhancement results of different models on the UIEB and EUVP datasets: (a) ground-truth (GT) reference images, (b) original input images, (c) DualOadamNet (ours), (d) FUnIE-GAN, (e) LiteEnhanceNet, (f) Shallow-UWnet, (g) DNnet, (h) GC, and (i) LU2net.

Figure 11. Deep-sea image enhancement results of different models on the LSUI dataset: (a) ground-truth (GT) reference images, (b) original input images, (c) DualOadamNet (ours), (d) FUnIE-GAN, (e) LiteEnhanceNet, (f) Shallow-UWnet, (g) DNnet, (h) GC, and (i) LU2net.

Figure 12. Radar chart for model quantitative evaluation.

Table 1. Quantitative comparative evaluation of the UIEB dataset.

Methods	MSE (×10⁻³) ↓	PSNR ↑	SSIM ↑	UIQM ↑
GC	2.3990	15.99	0.776	2.68
LU2net	0.4410	22.99	0.889	3.11
LiteEnhanceNet	0.5523	22.32	0.904	3.18
FUnIE-GAN	0.8109	19.84	0.820	3.34
Shallow-UWnet	1.2561	18.31	0.778	2.91
DNnet	0.5460	23.01	0.910	2.97
DualOadamNet (m = 1)	0.4586	23.16	0.905	3.16
DualOadamNet (m = 2)	0.4418	22.26	0.910	3.26
DualOadamNet (m = 3)	0.4071	22.58	0.913	3.31

Table 2. Quantitative comparative evaluation of the EUVP dataset.

Methods	MSE (×10⁻³) ↓	PSNR ↑	SSIM ↑	UIQM ↑
GC	3.309	14.51	0.584	2.619
LU2net	0.1720	26.60	0.904	2.920
LiteEnhanceNet	0.2091	25.56	0.849	2.88
FUnIE-GAN	0.2915	24.41	0.801	2.94
Shallow-UWnet	0.2607	24.54	0.829	2.84
DNnet	0.5910	21.37	0.831	2.81
DualOadamNet (m = 1)	0.2831	24.58	0.847	3.06
DualOadamNet (m = 2)	0.2166	25.35	0.907	3.23
DualOadamNet (m = 3)	0.1915	25.89	0.917	3.23

Table 3. Quantitative comparative evaluation of the LSUI dataset.

Methods	MSE (×10⁻³) ↓	PSNR ↑	SSIM ↑	UIQM ↑
GC	1.691	16.55	0.794	2.954
LU2net	0.2340	25.66	0.868	3.014
LiteEnhanceNet	0.6821	20.46	0.811	2.88
FUnIE-GAN	0.3910	23.57	0.823	3.00
Shallow-UWnet	0.9653	19.36	0.750	2.84
DNnet	0.5280	22.25	0.867	2.89
DualOadamNet (m = 1)	0.3273	24.02	0.887	3.09
DualOadamNet (m = 2)	0.2651	24.58	0.924	3.26
DualOadamNet (m = 3)	0.2505	24.84	0.927	3.26
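
The cross-dataset averages quoted for the method can be checked directly against the table rows. The following is a minimal sketch, assuming an unweighted mean over the three datasets and using the DualOadamNet (m = 3) rows of Tables 1–3; the names `M3_ROWS` and `mean_metrics` are illustrative and do not come from the paper.

```python
# DualOadamNet (m = 3) rows copied from Tables 1-3: (PSNR, SSIM, UIQM).
M3_ROWS = {
    "UIEB": (22.58, 0.913, 3.31),
    "EUVP": (25.89, 0.917, 3.23),
    "LSUI": (24.84, 0.927, 3.26),
}


def mean_metrics(rows):
    """Return the unweighted mean of each metric column across datasets."""
    n = len(rows)
    columns = zip(*rows.values())  # regroup per-dataset tuples into per-metric columns
    return tuple(sum(col) / n for col in columns)


psnr, ssim, uiqm = mean_metrics(M3_ROWS)
print(f"PSNR {psnr:.2f}  SSIM {ssim:.3f}  UIQM {uiqm:.3f}")
```

With the rows as printed, the SSIM and UIQM means come out at 0.919 and 3.267, in line with the abstract. The PSNR mean computed this way is about 24.44 rather than the 24.63 stated there, so the abstract's PSNR figure may rest on a different selection of rows.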

Table 4. Quantitative comparative evaluation of partial branch ablation.

Methods (Dataset)	MSE (×10⁻³) ↓	PSNR ↑	SSIM ↑	UIQM ↑
DualOadamNet (UIEB)	0.4586	23.16	0.905	3.16
GlobalOADAMNet (UIEB)	0.4903	23.08	0.897	3.109
DualOadamNet (EUVP)	0.2831	24.58	0.847	3.06
GlobalOADAMNet (EUVP)	0.3334	23.68	0.842	2.97
DualOadamNet (LSUI)	0.3273	24.02	0.887	3.09
GlobalOADAMNet (LSUI)	0.3698	23.61	0.876	2.99

Table 5. Quantitative comparative evaluation of OADAM module ablation.

Methods (Dataset)	MSE (×10⁻³) ↓	PSNR ↑	SSIM ↑	UIQM ↑
DualOadamNet (UIEB)	0.4586	23.16	0.905	3.16
DualNet (UIEB)	0.4928	22.85	0.901	3.11
DualOadamNet (EUVP)	0.2831	24.58	0.847	3.06
DualNet (EUVP)	0.3086	24.19	0.841	3.01
DualOadamNet (LSUI)	0.3273	24.02	0.887	3.09
DualNet (LSUI)	0.3547	23.54	0.885	3.07

Table 6. Quantitative comparative evaluation of the Baseline module.

Methods (Dataset)	MSE (×10⁻³) ↓	PSNR ↑	SSIM ↑	UIQM ↑
DualOadamNet (UIEB)	0.4586	23.16	0.905	3.16
Baseline (UIEB)	1101.6	18.77	0.834	3.08
DualOadamNet (EUVP)	0.2831	24.58	0.847	3.06
Baseline (EUVP)	0.4768	23.59	0.862	3.02
DualOadamNet (LSUI)	0.3273	24.02	0.887	3.09
Baseline (LSUI)	0.4662	22.82	0.878	3.01
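
The ablation tables are easier to compare when each variant is expressed as a PSNR drop relative to the full model. The sketch below, offered as an illustration only, subtracts the PSNR of each variant in Tables 4–6 from the DualOadamNet row for the same dataset; the dictionary layout and the helper `psnr_gain` are assumptions made for this example.

```python
# PSNR of the full model, copied from the DualOadamNet rows of Tables 4-6.
FULL_PSNR = {"UIEB": 23.16, "EUVP": 24.58, "LSUI": 24.02}

# PSNR of each variant, copied from Tables 4, 5 and 6 respectively.
ABLATED_PSNR = {
    "GlobalOADAMNet": {"UIEB": 23.08, "EUVP": 23.68, "LSUI": 23.61},
    "DualNet": {"UIEB": 22.85, "EUVP": 24.19, "LSUI": 23.54},
    "Baseline": {"UIEB": 18.77, "EUVP": 23.59, "LSUI": 22.82},
}


def psnr_gain(full, ablated):
    """PSNR of the full model minus PSNR of each variant, per dataset (dB)."""
    return {
        name: {ds: round(full[ds] - value, 2) for ds, value in per_ds.items()}
        for name, per_ds in ablated.items()
    }


gains = psnr_gain(FULL_PSNR, ABLATED_PSNR)
for name, per_ds in gains.items():
    print(name, per_ds)
```

On these numbers the full model leads every variant on every dataset, with the largest gap (4.39 dB) against the Baseline on UIEB and the smallest (0.08 dB) against GlobalOADAMNet on UIEB.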

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
