1. Introduction
Underwater exploration is an important focus of research on Earth, whose surface is approximately two-thirds water. It makes significant contributions to many fields, including the study of archeological remains, coral reefs, marine biology, aquatic ecosystems and unique underwater landscapes [1,2,3]. However, light entering the underwater environment is absorbed and scattered as it interacts with water molecules and suspended particles. Long-wavelength light, such as red, suffers the greatest losses, whereas green and blue light have shorter wavelengths and penetrate deeper into the water [4]. The absorption and scattering of light cause many problems in underwater images, such as color loss, low contrast, and a hazy, blurry appearance [5]. This distorts the true colors and appearance of objects and the environment, producing perceptual defects (bluish and greenish casts), and it is a key limitation for many research areas. Therefore, in image-based underwater applications, images are first preprocessed with an image enhancement algorithm. However, owing to the complexity of the underwater environment, many advanced methods apply multiple sequential or concurrent enhancement steps (color correction, contrast enhancement, noise reduction, haze removal and detail enhancement) together with image fusion strategies to eliminate these distortions [5].
According to existing research, underwater image enhancement methods are divided into two main categories: physical models and non-physical models. In addition, deep learning models have recently been studied as a subfield of non-physical models. Physical models are based on the principle of reversing the effects of light absorption and scattering in underwater images. They focus on compensating for color loss and removing haze by estimating how strongly each pixel is attenuated underwater and how much light is scattered. For example, Image Blurriness and Light Absorption (IBLA) [6] estimates scene depth from the relationship between image blurriness and light absorption; Wavelength Compensation and Dehazing (WCD) [7] estimates the transmission map and background light and compensates for wavelength-dependent attenuation and scattering. Underwater Light Attenuation Prior (ULAP) [8] corrects the underwater image according to the physical model by performing fast depth estimation. Underwater Dark Channel Prior (UDCP) [9] and Underwater Total Variation (UTV) [10] adapt the dark channel assumption and the background light estimate to the underwater environment. In summary, physically based methods depend on the highly accurate estimation of many parameters of the underwater environment, such as absorption/scattering, the depth map and the background light. Consequently, even minor inaccuracies in these estimations can cause the model to produce inconsistent or degraded results, particularly when applied to images from diverse underwater environments [11].
Non-physical enhancement models, instead of estimating parameters of the underwater environment, focus on image enhancement through statistical adjustments, contrast enhancement, color correction and image fusion. For example, Relative Global Histogram Stretching (RGHS) [12], based on histogram equalization, performs contrast enhancement and color compensation by stretching the histograms of the blue and green channels, the dominant underwater colors, according to the global distribution. Hue-Preserving (HP) [13] aims to preserve the original color tone by applying histogram stretching in the HSI/HSV color spaces. Color Balance and Fusion (CBF) [14] creates two copies of the input image, increases the contrast of one, and applies white-balance-based color correction to the other; the two images are then fused with a Laplacian pyramid to generate the final image. However, it may produce reddish artifacts. The Hybrid Fusion Method (HFM) [15] aims to comprehensively correct underwater images by blending the outputs of multiple enhancement strategies: color and white balancing, visibility enhancement, contrast enhancement, and perceptual image fusion. The method proposed in [11] combines an improved Retinex output with adaptive color correction through NSST-based multi-scale fusion, preserving some advantages of the physical model while offering the flexibility of fusion. In addition, Bayesian Retinex [16] minimizes color and brightness aberrations with a statistical decomposition; Modified Color Correction + Adaptive LUT [17] limits artifacts with edge-preserving filters and LUT-based contrast enhancement; and Minimal Color Loss + Locally Adaptive Contrast [18] offers a balance that minimizes color loss while adaptively increasing local contrast. In short, although methods that are not based on physical models stand out for their speed and ease of application, combining several of them can introduce side effects such as excessive color saturation or contrast, additional noise, or poor generalization. Nevertheless, acceptable results can be achieved with appropriate method selection and fine-tuning [11].
Deep-learning-based methods, with their strong problem-solving capabilities in image processing, produce high-quality results by learning the mapping between degraded images and reference images; however, they require large datasets to strengthen their generalization capability. The Fast Underwater Image Enhancement Generative Adversarial Network (FUnIE-GAN) [19] is a conditional convolutional GAN that aims to remove chromatic aberrations and low contrast from degraded underwater images in real time. However, as a supervised model, it is trained on a large dataset of synthetic underwater images, which leads to generalization problems. To address degradation diversity, Fu et al. [19] introduced SCNet to learn “water type desensitized” representations using novel normalization schemes; however, this supervised approach still relies on paired training data, and its effectiveness can be limited for images suffering from extreme color casts or heavy backscatter. Although Li et al. [20] proposed Ucolor, a hybrid model using multiple color spaces, its visual results often exhibit significant color distortions and low contrast, leading to images that can appear unnatural and unrealistic. Target-Oriented Perceptual Adversarial Learning (TOPAL) [21] is a GAN-based model that focuses on improving object detection performance while enhancing underwater images. This requires multiple loss calculations, their simultaneous optimization, and application-specific training, limiting the model's flexibility. Unsupervised Single Underwater Image Restoration (USUIR) [22] is an unsupervised network with an encoder–decoder structure. It performs unsupervised restoration with a differentiable gradient layer and cyclic consistency based on the “homology” assumption between degraded and clean underwater images; however, if the gradient model or the homology assumption is invalid, training becomes unstable and there is a risk of model collapse. In general, supervised deep-learning-based methods require a large amount of reference data, a need often met with distorted or synthetic data. The difficulty of obtaining reference data in the underwater environment significantly limits the reliability and generalization ability of such methods.
In this study, we address the negative effects of non-physically based methods, which can lead to excessive contrast or color saturation, additional noise, and poor generalization. While a dedicated processing unit strengthens the image's structural details, we employ the unsupervised MUFusion network [26] to intelligently combine these features. This allows us to harness the powerful feature-learning capacity of deep learning, thereby achieving effective enhancement without the data dependency limitations of supervised models [23]. The main contributions of the proposed method are as follows:
- I. A two-stage Multi-Scale Detail Enhancement Unit (MSDE) is proposed to expose structural details in underwater images in a natural and distinct way.
  - a. First stage: a detail layer, created from detail maps obtained at different scales, is added to a copy of the input image; both the edge sharpness and the structural details of the image are enhanced.
  - b. Second stage: the noise and artefacts that may arise from the first stage are decomposed into subspaces with the Latent Low-Rank Representation (LatLRR) [24] method, and the unwanted effects are successfully attenuated.
- II. Adaptive gamma correction [25] is applied to another copy of the image; thus, a sensitive and dynamic brightness enhancement matched to the brightness level of the scene is provided.
- III. The two enhanced copies are combined using the MUFusion [26] network, an unsupervised learning method adapted to underwater images. Thus, a balanced final image covering both global contrast and local details is obtained. In addition, owing to the ability of MUFusion to dynamically learn information about the scene, the generalization ability of the proposed method across different underwater environments is increased.
The rest of this paper is organized as follows: In Section 2, the proposed hybrid underwater image enhancement method is presented in detail and the main steps of the method are explained. In Section 3, the experimental setup, datasets, comparison metrics, quantitative and qualitative results, and statistical tests are comprehensively discussed. Finally, in Section 4, the general conclusions are summarized and directions for future work are presented.
2. Proposed Hybrid Framework for Underwater Image Enhancement
In this section, the proposed hybrid underwater image enhancement method is explained in detail. The method progressively eliminates distortions such as the loss of natural tones, chromatic aberrations, low contrast and loss of detail caused by the irregular absorption and scattering of light at different wavelengths in the underwater environment. First, the adaptive color correction algorithm [11], which is sensitive to the color distribution of the image, is used, because traditional white-balance and gray-world color correction approaches are insufficient to eliminate underwater spectral distortions and often lead to problems such as red artefacts or overcorrection [27]. The image is then converted to the YCbCr [28] color space and the subsequent steps are performed only on the Y channel, from which two copies are created. One copy is processed with the adaptive gamma correction proposed in [25], while the other has its structural detail and edge clarity enhanced by the detail enhancement module. Finally, these two processed Y channels are balanced through fusion by the MUFusion network and integrated with the CbCr components to obtain the final RGB image with optimized color accuracy and richness of detail, as schematically shown in Figure 1.
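As an illustration of the brightness branch, the sketch below applies a simple brightness-adaptive gamma curve to the normalized Y channel. The rule used here, choosing the exponent so that the mean intensity maps to 0.5, is a common heuristic and stands in for the specific adaptive gamma correction of [25], whose exact formulation may differ.

```python
import numpy as np

def adaptive_gamma(y: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Brightness-adaptive gamma on a normalized Y channel in [0, 1].

    Heuristic: pick gamma so the mean intensity maps to 0.5, i.e.
    mean**gamma = 0.5. Dark scenes get gamma < 1 (brightening),
    bright scenes get gamma > 1 (darkening). A generic stand-in for
    the adaptive gamma correction of [25].
    """
    y = np.clip(y, 0.0, 1.0)
    mean = float(np.clip(y.mean(), eps, 1.0 - eps))
    gamma = np.log(0.5) / np.log(mean)  # mean == 0.5 -> gamma == 1
    return np.power(y, gamma)
```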
Proposed Multi-Scale Detail Enhancement Unit (MSDE)
The weak structural details, low contrast and blurred edges in underwater images make the detail enhancement phase a critical step in the image enhancement process. In this study, the proposed detail enhancement unit consists of two consecutive sub-steps: multi-scale detail enhancement and Latent Low-Rank Representation (LatLRR)-based structural decomposition. Using this module, the structural integrity is preserved, details are inherently emphasized, and artifacts are effectively attenuated.
Multi-Scale detail enhancement: In order to enhance the structural details, the Y channel is smoothed with Gaussian filters with different standard deviation values; then, for each scale, detail layers are created by taking the difference between the original image and its smoothed version. These detail layers contain edge and structure information of the scene at different scales. Small sigma values reveal thinner edges, while large sigma values expose wider structures. The obtained multi-scale layers are combined with certain weights to form a single detail component. This detail component is added to the input Y channel with a certain gain factor in the last stage to obtain the strengthened Y channel. Hence, this ensures that both thin edges and global details become more distinct at the same time. The mathematical modeling of the stages of the method is as follows:
The input image, represented as a normalized grayscale (Y-channel) component, is defined as follows:

$$I \in [0,1]^{H \times W}$$

To extract structural information at multiple spatial resolutions, the image is convolved with Gaussian kernels of varying standard deviations:

$$B_i = G_{\sigma_i} * I, \quad i = 1, \dots, N$$

Here, $G_{\sigma_i}$ denotes a 2D Gaussian filter with standard deviation $\sigma_i$, and $*$ represents the convolution operation. For each scale, a detail layer is computed by subtracting the smoothed image from the original input:

$$D_i = I - B_i$$

These multi-scale detail maps are aggregated using scale-specific weights $w_i$ to form a unified detail component:

$$D = \sum_{i=1}^{N} w_i D_i$$

Finally, the enhanced image is obtained by adding the scaled detail component to the original image:

$$I_{\text{enh}} = I + k\,D$$

where $k$ is a gain factor controlling the strength of the detail enhancement.
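The first stage can be sketched in a few lines of Python; the scales, weights and gain below (σ = 1, 2, 4 with uniform weights and k = 0.8) are illustrative choices, not the tuned values of the proposed method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def msde_first_stage(y, sigmas=(1.0, 2.0, 4.0), weights=None, gain=0.8):
    """Multi-scale detail enhancement of a normalized Y channel.

    y       : 2D float array in [0, 1]
    sigmas  : Gaussian standard deviations (small -> fine edges,
              large -> wider structures); illustrative values
    weights : per-scale weights w_i (defaults to uniform)
    gain    : gain factor k on the aggregated detail component
    """
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)

    # D_i = I - G_{sigma_i} * I, aggregated as D = sum_i w_i * D_i
    detail = np.zeros_like(y)
    for sigma, w in zip(sigmas, weights):
        detail += w * (y - gaussian_filter(y, sigma))

    # I_enh = I + k * D, clipped back to the valid range
    return np.clip(y + gain * detail, 0.0, 1.0)
```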
Latent Low-Rank Representation (LatLRR): Although detail enhancement highlights structural information in the image, it may also amplify spurious effects such as noise and over-emphasis. Therefore, in the proposed unit, LatLRR-based structural decomposition is used to reduce the negative effects that would arise from using the detail contribution in its raw form, and to extract only the meaningful structural components and pass them to the fusion stage. LatLRR represents the basic information components of the image in a denser and more separable way by decomposing the input image into three subspaces: principal features, salient features and sparse noise [24]. In our method, only the principal features subspace is included in the fusion process; thus, spurious patterns or noise that may arise during detail enhancement are largely removed from the image. This selective decomposition is a critical step that enables the detail enrichment and the deep-learning-based fusion stage to produce more consistent results.
The LatLRR method decomposes an image matrix into three components, modeled as follows:

$$I = IZ + LI + E$$

Here, $I$ represents the input image, $IZ$ represents the principal features subspace, $LI$ represents the salient features, and $E$ represents sparse noise. This decomposition is obtained within the framework of the following optimization problem:

$$\min_{Z, L, E} \; \|Z\|_* + \|L\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad I = IZ + LI + E$$

In this formulation, $\|\cdot\|_*$ represents the nuclear norm, which is the sum of a matrix's singular values and serves as a convex relaxation of the rank function; its purpose is to encourage the primary components $Z$ and $L$ to be low-rank. The $\|\cdot\|_1$ denotes the $\ell_1$-norm, which is applied to the error term $E$ to model it as a sparse matrix, effectively isolating noise and gross errors. The parameter $\lambda > 0$ is a balancing coefficient that controls the trade-off between the low-rank structure and the sparse error [24].
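Solvers for this problem (e.g., the inexact augmented Lagrange multiplier scheme used in [24]) alternate between the proximal operators of the two norms. A minimal sketch of those two building blocks is given below, together with the extraction of the principal part from a hypothetical solver `solve_latlrr` (the solver interface is an assumption for illustration, not part of the cited method's code).

```python
import numpy as np

def svt(x: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: proximal operator of the nuclear norm.

    Shrinks each singular value by tau, encouraging a low-rank result.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Element-wise shrinkage: proximal operator of the l1 norm.

    Drives small entries to zero, encouraging a sparse error term E.
    """
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Hypothetical solver interface: an ALM loop built from the two
# operators above would return (Z, L, E) with I ≈ I @ Z + L @ I + E.
# Only the principal part I @ Z is forwarded to the fusion stage.
# Z, L, E = solve_latlrr(I, lam=0.4)   # solve_latlrr is hypothetical
# principal = I @ Z
```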
Unsupervised Deep-Learning-Based Fusion (MUFusion): MUFusion represents one of the most critical stages of our proposed method. It is an unsupervised, deep-learning-based image fusion model that optimally combines the two differently enhanced Y channels. In typical unsupervised fusion models, training focuses only on the source images, without carrying intermediate outputs over to subsequent epochs. However, intermediate outputs may contain useful clues for image fusion, such as the pixel intensity distribution, gradient information and structural similarity. Owing to its memory-based architecture, MUFusion introduces a memory unit that allows the intermediate outputs obtained in previous learning steps to influence the current output. In this way, not only the current inputs but also the intermediate outputs contribute to learning, and the network exhibits a more consistent and generalizable fusion performance [26]. The learning process of MUFusion is defined by a dual-component loss function that takes both a content loss and a memory loss into account:

$$\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{memory}}$$

In the above formula, $\mathcal{L}_{\text{content}}$ measures the degree to which the fused image preserves important content details from the source images, and $\mathcal{L}_{\text{memory}}$ ensures the consistency and continuity of the learning process with the intermediate outputs produced in previous epochs. The parameters $\alpha$ and $\beta$ control the weighting between these two loss terms and can be determined adaptively according to the image type [26]. In the proposed method, MUFusion optimally synthesizes both global contrast and local details by fusing the structural detail component that has been denoised with the LatLRR method and the Y channel to which adaptive gamma correction has been applied. The resulting Y channel is then recombined with the Cb and Cr components obtained in the adaptive color correction step in order to preserve color accuracy; the image formed in the YCbCr space is converted to the RGB color space, yielding the final enhanced underwater image. This last step ensures that the enhancement operations affect only the structural information and do not distort the color components, so that natural, balanced and visually rich results are obtained. Because no reference image is required, this approach increases the generalization ability of the method and provides consistent, high-quality enhancement under different illumination, chromatic aberration and blur conditions.
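Putting the pieces together, the overall flow can be summarized as below. Here `adaptive_color_correction`, `latlrr_principal` and `mufusion` are placeholders for the cited components ([11], [24], [26]) rather than real library calls; `adaptive_gamma` and `msde_first_stage` are the earlier sketches, and the RGB↔YCbCr conversions use OpenCV (note that OpenCV orders the chroma channels as Cr, Cb).

```python
import cv2
import numpy as np

def enhance_underwater(bgr: np.ndarray) -> np.ndarray:
    """End-to-end sketch of the proposed hybrid pipeline.

    adaptive_color_correction, latlrr_principal and mufusion are
    placeholders standing in for the components of [11], [24], [26].
    """
    # 1) Adaptive color correction in RGB space [11]
    corrected = adaptive_color_correction(bgr)          # placeholder

    # 2) Convert to YCbCr and split; only Y is processed further
    ycrcb = cv2.cvtColor(corrected, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)
    y = y.astype(np.float32) / 255.0

    # 3) Branch A: brightness via adaptive gamma (earlier sketch)
    y_gamma = adaptive_gamma(y)

    # 4) Branch B: MSDE detail enhancement + LatLRR denoising
    y_detail = msde_first_stage(y)
    y_detail = latlrr_principal(y_detail)               # placeholder

    # 5) Unsupervised fusion of the two processed Y channels [26]
    y_fused = mufusion(y_gamma, y_detail)               # placeholder

    # 6) Recombine with the untouched chroma and return to BGR
    y_out = np.clip(y_fused * 255.0, 0, 255).astype(np.uint8)
    return cv2.cvtColor(cv2.merge([y_out, cr, cb]), cv2.COLOR_YCrCb2BGR)
```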
4. Conclusions
The method proposed in this study offers a hybrid underwater image enhancement approach focused on a single scene, which requires neither large datasets nor reference images. The proposed detail enhancement unit, one of the basic building blocks of the method, combines multi-scale detail highlighting with structural decomposition, enabling both the emphasis of real details and the suppression of noise and artifacts. This unit produces simplified content that preserves structural information and directly contributes to the final fusion quality. In addition, by combining image-processing techniques with the powerful representation capacity of unsupervised deep learning, the method minimizes fundamental distortions such as chromatic aberration, contrast loss and structural blurring caused by the intense absorption and scattering of light in the underwater environment. Experiments conducted on images from different underwater environments (hazy, bluish, greenish and yellowish scenes) quantitatively demonstrate that the proposed method outperforms current state-of-the-art methods on metrics such as UIQM, UICM, IL-NIQE, IE and AG. In the qualitative evaluation, it is observed that the method visually eliminates color distortions, clarifies details and produces results better suited to the human visual system. The proposed method also achieves successful results in tests performed with the Scale-Invariant Feature Transform (SIFT), producing results comparable to those of outstanding approaches such as Lin et al.'s method and HFM. The statistical significance of the proposed method's performance was rigorously evaluated using the Wilcoxon signed-rank test on all reference-free quality metrics; this analysis substantiates that the observed superiority is not a random occurrence, thereby statistically validating the effectiveness of our approach. As a result, the proposed hybrid method makes significant progress in the field of underwater image enhancement and contributes to obtaining higher-quality, more detailed underwater images for various applications.
While the proposed framework has demonstrated strong performance in enhancing underwater images, it is important to acknowledge its inherent limitations, which also pave the way for future research. The primary limitation is the trade-off between detail enhancement and potential noise amplification. Our method’s design prioritizes the recovery of fine textures and structural details; a consequence of this is that in some images, latent noise can become more prominent compared to methods that employ heavier smoothing. A second practical limitation is the computational cost. As the primary focus of this study was to validate the model’s effectiveness, the current implementation has not been optimized for runtime speed, which may limit its use in real-time applications.
Future work will focus on two key areas: (1) integrating more sophisticated noise-aware modules that can better distinguish between fine textures and unwanted noise, and (2) optimizing the framework for computational efficiency to facilitate its deployment on resource-constrained platforms. This approach can both reduce processing time and facilitate the integration of the method into embedded systems and real-time underwater imaging applications.