1. Introduction
Underwater vision supports a broad range of marine applications, such as ecological monitoring, offshore infrastructure inspection, underwater archeology, and perception for autonomous underwater vehicles (AUVs). However, underwater images are often severely degraded by wavelength-dependent absorption and scattering in the water medium. In practice, long-wavelength components attenuate rapidly, while shorter wavelengths are more strongly scattered, leading to persistent blue-green color casts, haze-like veiling, reduced contrast, and loss of fine structural details [1]. These degradations, together with non-uniform illumination and particle-induced backscatter [2], can significantly impair both human interpretation and downstream vision tasks [3]. In particular, for underwater target tracking, such visual degradation introduces severe observational uncertainty, leading to trajectory drift [4] and instability under occlusion [5], thereby necessitating high-fidelity enhancement as a prerequisite for reliable robotic perception.
Underwater image enhancement (UIE) has been extensively studied to alleviate these degradations [6]. Early UIE methods mainly depended on hand-crafted priors and fusion-style enhancement, which are computationally efficient but tend to be less robust when water type, imaging depth, and illumination vary [7]. Prior-driven restoration such as red-channel recovery leverages wavelength-dependent attenuation to compensate for severe color loss and improve visibility [8]. Physics-inspired correction further refines the underwater image formation process to estimate medium effects and suppress veiling artifacts [9]. Contrast-oriented enhancement, exemplified by CLAHE, is still widely used because it can boost local contrast and partially relieve shallow-water color imbalance with minimal overhead [10].
As larger real-world benchmarks and data-driven optimization have emerged, UIE research has increasingly shifted toward deep learning, as illustrated by UTIEB, which focuses on turbid-water degradations and offers standardized evaluation data [11]. Supervised CNN-based methods benefit from paired references and architectures that integrate color correction with spatial fusion to enhance contrast and preserve fine details. When paired supervision is scarce, uncertainty-aware and unsupervised formulations have been explored to stabilize learning on unpaired data and handle ambiguous supervision in real scenes [12]. Multi-color-space modeling also proves effective for mitigating global color distortion and luminance imbalance by exploiting complementary representations such as RGB, HSV, and Lab [13]. Transformer-based designs strengthen global context modeling and are frequently combined with simplified physical priors to better address scattering-related visibility loss [14]. In parallel, progress continues through multi-domain learning strategies [15] and lightweight, frequency-aware networks that target real-time deployment on resource-limited underwater platforms [16]. Despite these advances, many models remain computationally intensive, and lightweight designs may struggle to jointly handle globally consistent color shifts and the multi-scale nature of scattering, causing residual color casts or over-smoothed structures, especially under diverse water conditions [17]. Application-oriented analyses stress that UIE should be assessed together with downstream requirements such as feature matching, reinforcing the need to balance enhancement quality and deployment efficiency.
To achieve a favorable trade-off between enhancement quality and computational efficiency, we propose LCS-Net, a lightweight network that explicitly targets global color distortion and spatially varying scattering degradations. First, instead of relying on a heavy stack of dense convolutional correction layers, the LCCM predicts an image-adaptive global color mapping from global color statistics, enabling low-overhead cast compensation and simplifying subsequent feature learning. Second, the backbone is constructed with efficient inverted residual learning enhanced by channel attention, which strengthens channel-wise representations and helps preserve details under non-uniform degradation. Third, at the bottleneck, we design a Selective Multi-Scale Dilated Block (SMSDB) that aggregates complementary context via parallel dilated convolutions and global cues, then adaptively reweights features to better accommodate diverse underwater conditions. Finally, a PixelShuffle-based decoder combined with residual learning reconstructs fine details while maintaining stable optimization.
The main contributions of this paper are summarized as follows:
To facilitate deployment on resource-constrained platforms, we propose LCS-Net, a lightweight global-to-local enhancement framework. At its core is the DF-IRB, which effectively models scattering-induced veiling and contrast attenuation with low computational overhead, achieving a favorable trade-off between quality and efficiency.
We develop a dynamic LCCM to efficiently rectify scene-wide chromatic distortions. By predicting image-adaptive parameters from global color statistics, LCCM provides efficient color cast correction, which stabilizes and simplifies subsequent feature learning.
To address the multi-scale nature of underwater artifacts, we propose an SMSDB. This module aggregates complementary context via parallel dilated convolutions and global cues, employing adaptive reweighting to emphasize the most informative scales across varying water conditions.
The remainder of the paper is organized as follows:
Section 2 reviews related work on underwater enhancement and restoration.
Section 3 presents the proposed LCS-Net in detail.
Section 4 describes experimental settings and provides quantitative comparisons with extensive analysis.
Section 5 concludes the paper and discusses limitations and future directions.
4. Experiments
This section details the experiments for LCS-Net. We first present the implementation details and benchmark settings. Next, we report quantitative and qualitative comparisons against representative prior-based and learning-based UIE methods on the UIEB and EUVP datasets. Finally, we conduct ablation studies and analyze computational complexity to validate the effectiveness of our design choices.
4.1. Experimental Details
Training and inference were performed on an NVIDIA GeForce RTX 4090 GPU with 24 GB VRAM using CUDA 11.8. The network was trained for 200 epochs using the Adam optimizer with a batch size of 16 and an initial learning rate of 2 × 10⁻⁴. A cosine-annealing schedule was adopted to decay the learning rate to 1 × 10⁻⁶. To increase data diversity, synchronized random rotations of 0°, 90°, 180°, and 270° were applied to each input–reference pair. Because these angles are multiples of 90°, the rotations require no resampling and therefore avoid interpolation-induced artifacts.
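The schedule and augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' training code: it assumes the learning rate is updated once per epoch, and the function names `cosine_lr` and `rotate_pair` are hypothetical. Only the learning-rate bounds, the epoch count, and the rotation angles come from the text.

```python
import math
import random

import numpy as np

LR_MAX = 2e-4  # initial learning rate (from the text)
LR_MIN = 1e-6  # final learning rate (from the text)
EPOCHS = 200   # number of training epochs (from the text)


def cosine_lr(epoch, total=EPOCHS, lr_max=LR_MAX, lr_min=LR_MIN):
    """Cosine-annealed learning rate, assuming one update per epoch."""
    progress = min(max(epoch / total, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def rotate_pair(inp, ref, rng):
    """Apply the same random multiple of 90 degrees to both H x W x C images."""
    k = rng.randrange(4)  # number of quarter turns: 0, 1, 2, or 3
    # np.rot90 only permutes pixels, so no interpolation is involved.
    return np.rot90(inp, k, axes=(0, 1)), np.rot90(ref, k, axes=(0, 1)), k


if __name__ == "__main__":
    for epoch in (0, 100, 200):
        print(epoch, f"{cosine_lr(epoch):.3e}")
    x = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # toy "input" image
    y = x + 100                                # toy "reference" image
    rx, ry, k = rotate_pair(x, y, random.Random(0))
    print("quarter turns:", k, "shapes:", rx.shape, ry.shape)
```

Drawing a single `k` per pair is what keeps the input and its reference aligned; sampling the rotation independently for each image would break the pixel-wise correspondence that the full-reference losses rely on.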
To comprehensively assess the enhancement quality and generalization ability of the proposed method, we select a set of representative baselines spanning both conventional and data-driven UIE approaches. Specifically, the conventional group includes IBLA [22], UDCP [23], and RGHS [18], covering typical physics-prior and histogram-based enhancement pipelines. The learning-based group further comprises FUnIE-GAN [35], Water-Net [26], Shallow-UWnet [29], Ucolor [38], UIEC2Net [27], PUIE [28], LiteEnhanceNet [30], and U-Shape [39], representing mainstream CNN, GAN, and Transformer paradigms. For a fair comparison, we follow the parameter settings and inference procedures reported in the original papers or released implementations as closely as possible; apart from necessary adjustments to input resolution to match the dataset protocol, all other key configurations are kept unchanged.
4.2. Datasets
We evaluate LCS-Net on two widely used benchmarks, UIEB [26] and EUVP [35], to validate its performance under paired supervision and diverse real-world conditions. The dataset statistics and splits are summarized in Table 1. Unless otherwise specified, all images are resized to 256 × 256 for both training and evaluation to standardize the input resolution across compared methods.
UIEB is a paired benchmark for underwater image enhancement that covers diverse degradations, including blue or green color casts, contrast attenuation, haze-like veiling caused by scattering, and non-uniform illumination. It consists of real underwater photographs collected from multiple sources such as in situ captures, online resources, and prior studies. UIEB provides human-in-the-loop references: multiple candidate enhanced images are generated for each input, and the visually preferred one is selected as the reference. In our experiments, we used 800 paired samples for training and the remaining 90 paired samples for testing. The 60 challenging images without references are excluded from full-reference evaluation and are used only for qualitative comparison.
EUVP is a large-scale benchmark for underwater visual perception enhancement. It provides both paired and unpaired collections with diverse scene contents, water types, and camera characteristics, comprising over 12,000 paired images and 8000 unpaired images. The paired data include three distinct subsets: Underwater Dark, which focuses on low-light imagery; Underwater ImageNet, which contains ImageNet-derived pairs generated via a CycleGAN-based distortion pipeline; and Underwater Scenes, which consists of diverse in situ underwater environments. In this work, we focus on the Underwater Scenes subset, which contains 2185 paired images. We use the entire subset without manual filtering and randomly split it into 1900 training pairs and 285 testing pairs. This subset is adopted because it is closely aligned with real-world underwater scenes and supports a consistent full-reference evaluation protocol commonly used in prior UIE studies. Compared with UIEB, EUVP exhibits stronger cross-device variability, serving as a complementary testbed for assessing robustness.
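A random split of this kind can be made reproducible by shuffling the pair indices with a fixed seed. The sketch below is illustrative only: the helper `split_pairs` and the seed value are assumptions, since the text does not state how the random split was generated. Only the subset size and the split sizes come from the text.

```python
import random


def split_pairs(n_pairs, n_train, seed=0):
    """Shuffle pair indices once and cut them into disjoint train/test lists."""
    if not 0 <= n_train <= n_pairs:
        raise ValueError("n_train must lie between 0 and n_pairs")
    indices = list(range(n_pairs))
    random.Random(seed).shuffle(indices)  # the seed is a hypothetical choice
    return indices[:n_train], indices[n_train:]


# Underwater Scenes subset of EUVP: 2185 pairs -> 1900 train / 285 test.
train_idx, test_idx = split_pairs(2185, 1900)
print(len(train_idx), len(test_idx))  # prints: 1900 285
```

Splitting on pair indices rather than on individual files keeps each degraded image together with its reference, so no test reference can leak into the training set.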
4.3. Quantitative Evaluation
4.3.1. Evaluation Metrics
We evaluated enhancement quality in terms of fidelity, structural consistency, and perceptual appearance. For paired test sets, we reported MSE, PSNR, and SSIM. We also reported UIQM, a no-reference metric designed for underwater images. For cases where ground-truth targets are unavailable, we primarily relied on UIQM together with qualitative comparisons.
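MSE measures the pixel-wise squared difference between the enhanced image $\hat{I}$ and the reference image $I$, where a lower value indicates better fidelity:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{I}_i - I_i\right)^2 \tag{1}$$
Here, N denotes the total number of pixels.
PSNR is derived from MSE and quantifies reconstruction quality in decibels:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \tag{2}$$
where MAX is the maximum possible pixel value.
SSIM evaluates similarity from luminance, contrast, and structural components, and is more consistent with perceived structural preservation, as shown in Formula (3).
$$\mathrm{SSIM}(\hat{I}, I) = \frac{\left(2\mu_{\hat{I}}\mu_{I} + C_1\right)\left(2\sigma_{\hat{I}I} + C_2\right)}{\left(\mu_{\hat{I}}^2 + \mu_{I}^2 + C_1\right)\left(\sigma_{\hat{I}}^2 + \sigma_{I}^2 + C_2\right)} \tag{3}$$
Here, $\mu$ and $\sigma^2$ denote local means and variances, $\sigma_{\hat{I}I}$ is the local covariance, and $C_1$ and $C_2$ are small stabilizing constants.
UIQM combines colorfulness, sharpness, and contrast components for underwater images and is defined as:
$$\mathrm{UIQM} = c_1 \cdot \mathrm{UICM} + c_2 \cdot \mathrm{UISM} + c_3 \cdot \mathrm{UIConM} \tag{4}$$
In this formulation, UICM denotes the underwater colorfulness measure, UISM denotes the sharpness measure, and UIConM denotes the contrast measure. Following Panetta et al. [42], the empirically determined weights are set to $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$, respectively.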
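For concreteness, the full-reference fidelity metrics and the final UIQM combination step can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the evaluation code used in the paper: the function names are hypothetical, 8-bit images (MAX = 255) are assumed by default, and the UICM, UISM, and UIConM components are taken as precomputed inputs because their definitions are not given here. The default weights are the values published by Panetta et al.

```python
import numpy as np


def mse(enhanced, reference):
    """Mean squared error over all pixels and channels."""
    e = np.asarray(enhanced, dtype=np.float64)
    r = np.asarray(reference, dtype=np.float64)
    return float(np.mean((e - r) ** 2))


def psnr(enhanced, reference, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val=255 assumes 8-bit images."""
    err = mse(enhanced, reference)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / err))


def uiqm_from_components(uicm, uism, uiconm, weights=(0.0282, 0.2953, 3.5753)):
    """Weighted sum of the three UIQM components (weights from Panetta et al.)."""
    c1, c2, c3 = weights
    return c1 * uicm + c2 * uism + c3 * uiconm


if __name__ == "__main__":
    ref = np.zeros((4, 4, 3), dtype=np.uint8)
    out = np.full((4, 4, 3), 10, dtype=np.uint8)
    print(mse(out, ref), round(psnr(out, ref), 2))
```

Casting to `float64` before subtracting matters for `uint8` inputs, where a direct difference would wrap around instead of going negative.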
4.3.2. Results
Table 2 and Table 3 report quantitative comparisons on the UIEB and EUVP benchmarks using MSE, PSNR, SSIM, and UIQM. MSE, PSNR, and SSIM are full-reference metrics that evaluate reconstruction error and structural consistency against reference images, while UIQM reflects perceptual quality in terms of color, sharpness, and contrast. On UIEB, LCS-Net achieves the best performance in MSE, PSNR, and SSIM among all compared methods, indicating accurate reconstruction and strong structural preservation under paired supervision. On EUVP, LCS-Net attains the best PSNR and SSIM, and ranks second in both MSE and UIQM, demonstrating robust generalization across more diverse scenes and imaging conditions. Overall, although LCS-Net does not always obtain the top UIQM, its perceptual quality remains highly competitive, suggesting that it enhances visibility without relying on overly aggressive saturation or contrast boosting that may introduce unnatural tones or artifacts. Representative visual results in Figure 4 and Figure 5 are consistent with these quantitative trends, and the next section provides a dedicated qualitative discussion.
To assess deployment feasibility and runtime efficiency, we benchmark representative underwater image enhancement models across varying complexity levels, ranging from lightweight GAN- and CNN-based methods to computationally intensive Transformer architectures.
Table 4 presents the floating-point operations (FLOPs), parameters, and inference speeds (FPS) measured under a unified setting on an NVIDIA RTX 4090 GPU with 256 × 256 inputs. The results demonstrate that LCS-Net achieves a favorable balance between computational cost and throughput, offering significant efficiency advantages over complex baselines while maintaining competitive enhancement quality. Although specific lightweight models yield higher frame rates, they exhibit lower quantitative performance on the UIEB and EUVP datasets. Consequently, LCS-Net provides a superior trade-off between restoration fidelity and deployment efficiency.
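A throughput figure of this kind is typically obtained by timing repeated forward passes after a warm-up phase. The sketch below shows only the timing pattern. It is not the measurement protocol used for Table 4, which is not described in the text: `measure_fps`, the warm-up and run counts, and the CPU stand-in workload `dummy_model` are all assumptions.

```python
import time


def measure_fps(run_once, n_warmup=10, n_runs=100):
    """Return calls per second of run_once, measured after a warm-up phase."""
    for _ in range(n_warmup):  # warm-up iterations are excluded from timing
        run_once()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed if elapsed > 0 else float("inf")


def dummy_model():
    """Stand-in workload; a real benchmark would run the network forward pass."""
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{measure_fps(dummy_model):.1f} FPS")
```

On a GPU, kernels are launched asynchronously, so a real measurement would also need to synchronize the device before reading the clock at both ends of the timed loop. Otherwise the timer captures launch overhead rather than execution time.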
4.4. Qualitative Evaluation
As shown in Figure 6, Figure 7 and Figure 8, we further evaluate the visual fidelity and robustness of the proposed method under three representative and challenging underwater scenarios, including (i) heavy backscatter with low contrast and large homogeneous water-body regions (Figure 6), (ii) strong blue color cast with salient structural targets (Figure 7), and (iii) extremely degraded low-light conditions with non-uniform illumination (Figure 8). All competing methods are presented in a consistent order (Raw, (a) IBLA, (b) RGHS, (c) FUnIE-GAN, (d) Water-Net, (e) PUIE, (f) Shallow-UWnet, (g) Ucolor, (h) UIEC2Net, (i) LiteEnhanceNet, (j) U-Shape, and (k) Ours). PSNR and SSIM values are reported below each result to facilitate the interpretation of quantitative trends alongside visual observations.
In Figure 6, the degradation is mainly dominated by veiling effects caused by backscatter, leading to compressed contrast and large nearly uniform water-body regions. For such cases, an effective enhancement method should improve visibility while maintaining stable tone and luminance distribution, avoiding over-aggressive global stretching that may introduce over-whitening, tone shifts, or blocky artifacts. The results indicate that different methods make different trade-offs among dehazing strength, global tone mapping, and artifact suppression. Notably, UIEC2Net achieves higher PSNR and SSIM, reaching 22.03 and 0.90, which indicates stronger pixel-level agreement with the reference for this example. Meanwhile, our method yields slightly lower PSNR and SSIM, with values of 21.59 and 0.86, but produces a more uniform luminance distribution and a more consistent water-body tone. It also better avoids over-whitening and local block-like distortions, thereby offering a more balanced perceptual quality.
In Figure 7, the scene is characterized by a dominant blue cast together with clear structural targets, which jointly evaluates color correction and structure preservation. This scenario requires the model to recover a reasonable color distribution while retaining object boundaries and textures. Compared with competing approaches, our method more effectively corrects the blue cast and enhances object–background separation, while preserving continuous textures and natural edge transitions. Consequently, it achieves the highest PSNR, reaching 36.37, together with a high SSIM of 0.96. In addition, U-Shape attains a higher SSIM of 0.98, while producing a relatively flatter contrast in this example, which suggests that a higher SSIM does not necessarily correspond to stronger perceptual sharpness or more desirable local contrast under certain conditions.
In Figure 8, extremely low illumination and non-uniform lighting pose a severe challenge, where enhancement is constrained by the trade-off between visibility improvement and distortion suppression. Increasing brightness may amplify noise and induce color instability, while aggressive local enhancement can lead to saturation or loss of structural details. Different methods therefore exhibit distinct behaviors in balancing visibility improvement and artifact control. In this extreme case, U-Shape achieves the best quantitative results, reaching PSNR/SSIM of 25.69/0.95, and also produces the most favorable visual quality. By contrast, our method attains a comparable PSNR of 25.11 with a lower SSIM of 0.82, while still maintaining a stable enhancement outcome with effective artifact suppression and structure preservation compared with most competing methods. This example indicates that extremely low-light conditions remain challenging and leave room for further improvement.
Overall, the qualitative comparisons in Figure 6, Figure 7 and Figure 8 highlight that underwater image enhancement methods typically need to balance enhancement strength, color fidelity, detail preservation, and artifact suppression. Across these challenging cases, the proposed method produces more coherent color correction and contrast restoration, remains stable under strong backscatter and extreme low-light conditions, and achieves the most pronounced quantitative gains in the structural-target scenario. These results further support the effectiveness of our approach and its potential for practical deployment.
4.5. Ablation Study
To further demonstrate the effectiveness of each component in the proposed LCS-Net, we conducted comprehensive ablation experiments covering the network components, the depthwise kernel size, and the loss functions.
4.5.1. Ablation on Network Components
To validate the effectiveness of our design, we conducted a progressive ablation study on the LCCM and SMSDB modules. Integrating the LCCM into the baseline significantly boosts image fidelity and perceptual quality with negligible computational overhead, as the FLOPs remain virtually unchanged. The addition of the SMSDB further enhances structural consistency by capturing multi-scale context, albeit with a modest trade-off in inference speed. Ultimately, the full LCS-Net achieves the best overall performance by combining these complementary advantages, delivering robust restoration quality while maintaining high efficiency suitable for real-time deployment.
As shown in Figure 9, the baseline still exhibits residual veiling haze and noticeable color artifacts. Adding LCCM markedly reduces the dominant global color cast and yields a more consistent overall tone. In contrast, adding SMSDB mainly enhances structural sharpness and local contrast, which is particularly beneficial in regions affected by scattering. Combining both modules produces clearer and more natural results, consistent with the quantitative improvements reported in Table 5. These observations further support the intended roles of LCCM and SMSDB in stabilizing global color statistics and recovering local structural details.
4.5.2. Ablation on Depthwise Kernel Size
To study the effect of the effective receptive field on modeling scattering-induced underwater degradations, we performed an ablation study on the kernel size of the depthwise spatial aggregation convolution in the proposed DF-IRB. Specifically, we tested kernel sizes of 3 × 3, 5 × 5, and 7 × 7 while keeping the rest of LCS-Net unchanged, including the DF-IRB configuration, the number of blocks, and all training settings. The results are reported in Table 6. This study assesses whether enlarging the depthwise kernel improves restoration quality and provides evidence supporting our final kernel-size choice.
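The cost side of this ablation follows from simple counting: a depthwise convolution has one k × k filter per channel, so its weights and multiply-accumulate operations grow with k². The sketch below works this out under stated assumptions. The channel width of 64 and the 256 × 256 feature-map size are hypothetical placeholders, not the DF-IRB configuration, and stride 1 with "same" padding and no bias is assumed.

```python
def depthwise_conv_cost(channels, kernel, height, width):
    """Weights and multiply-accumulates of a stride-1, same-padded depthwise conv (no bias)."""
    params = channels * kernel * kernel  # one k x k filter per channel
    macs = params * height * width       # each filter is applied at every position
    return params, macs


CHANNELS = 64  # hypothetical channel width, not taken from the paper
H = W = 256    # hypothetical feature-map size

for k in (3, 5, 7):
    params, macs = depthwise_conv_cost(CHANNELS, k, H, W)
    print(f"{k}x{k}: {params} weights, {macs / 1e6:.1f} M MACs")
```

Whatever the actual width and resolution, the ratios are fixed: relative to 3 × 3, a 5 × 5 depthwise kernel costs 25/9 ≈ 2.8 times as much and a 7 × 7 kernel 49/9 ≈ 5.4 times as much, which is the overhead any quality gain from a larger receptive field has to justify.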
4.5.3. Ablation on Loss Functions
To quantify the relative contributions of individual loss terms, we adopt a leave-one-out ablation strategy. Using the full objective as the baseline, we remove one loss component at a time while keeping the network architecture, training protocol, and the remaining loss terms unchanged, and then evaluate the resulting model using MSE, PSNR, SSIM, and UIQM. In general, a larger performance degradation after removing a term indicates that this component plays a more critical role in driving the final enhancement quality. As shown in Table 7, removing the SSIM loss term causes the most noticeable drop in SSIM and UIQM, highlighting the importance of structural consistency for preserving edge geometry, local contrast, and overall visual coherence. Removing the MSE loss term leads to the largest loss in PSNR, suggesting that pixel-wise reconstruction remains the primary driver for fidelity and stable convergence. By contrast, removing the perceptual loss term has a relatively limited impact on the quantitative scores, but it more often results in weaker textural details and reduced visual naturalness, indicating that perceptual supervision mainly complements high-frequency detail recovery and helps alleviate over-smoothing. Overall, jointly optimizing the MSE, SSIM, and perceptual loss terms provides a better balance among pixel fidelity, structural preservation, and texture realism, yielding more stable and overall superior enhancement results.
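The leave-one-out protocol amounts to enumerating the full objective plus one variant per dropped term. The sketch below illustrates that enumeration under stated assumptions: the term names, the unit weights in `total_loss`, and the example loss values are placeholders, since the actual loss weights are not given in this section.

```python
LOSS_TERMS = ("mse", "ssim", "perceptual")


def leave_one_out(terms):
    """Return the full configuration followed by each variant with one term removed."""
    configs = [("full", tuple(terms))]
    for dropped in terms:
        kept = tuple(t for t in terms if t != dropped)
        configs.append((f"w/o {dropped}", kept))
    return configs


def total_loss(values, active, weights=None):
    """Weighted sum of the active loss values; unit weights are a placeholder."""
    weights = weights or {name: 1.0 for name in values}
    return sum(weights[name] * values[name] for name in active)


if __name__ == "__main__":
    example = {"mse": 0.010, "ssim": 0.120, "perceptual": 0.300}  # made-up values
    for label, active in leave_one_out(LOSS_TERMS):
        print(f"{label:>16}: {total_loss(example, active):.3f}")
```

Each variant is trained from scratch with everything else held fixed, so any metric change relative to the full configuration can be attributed to the removed term.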
5. Conclusions
This paper presents LCS-Net, an efficient framework for single underwater image enhancement that addresses the coupled degradations of wavelength-dependent color shift and scattering-induced contrast attenuation through a compact global–local restoration paradigm. The proposed LCCM predicts image-adaptive correction parameters from global color statistics to stabilize the input color distribution and suppress dominant casts, while the SMSDB aggregates multi-receptive-field context with selective reweighting to better model scattering haze and depth-related degradations. In addition, SE-equipped inverted residual blocks strengthen channel-wise representation and detail recovery with low computational overhead. Experiments on UIEB and EUVP demonstrate that LCS-Net achieves consistently competitive performance in PSNR, SSIM, and UIQM, delivering stable color and structure restoration while maintaining low parameter and computation budgets, which supports its suitability for deployment on resource-constrained platforms. Current limitations mainly arise in extremely low-light scenarios, where visibility enhancement may amplify sensor noise, partly due to the absence of an explicit denoising module. Future work will explore joint enhancement–denoising, unsupervised noise modeling, and semi-supervised learning with large-scale unlabeled underwater data to further improve robustness and generalization in open-water conditions.