This section provides a comprehensive evaluation of the proposed methodology. It begins with a description of the evaluation metrics, continues with the implementation details, and then presents an overview of the datasets used in the experiments. The methodology's performance is first evaluated in terms of classification accuracy, including the initial training results, and subsequently in terms of its effectiveness in improving underwater-specific metrics. Experimental results on both the test set and an in-the-wild dataset are then reported, along with a discussion of the associated computational costs.
4.1. Evaluation Metrics
For evaluating the effectiveness of the enhancement methods, two key factors influenced the selection of metrics. The first was the necessity of working without ground truth, as no reference images for the enhanced outputs were available. The second was the importance of choosing metrics specific to underwater environments, where typical image processing challenges, such as color distortion and low visibility, are particularly pronounced. A table with all the metric acronyms and their definitions is included in Table 2 for clearer understanding and reference. It is important to note that all metric scores are computed on the original high-resolution images, whereas resizing is used solely for selecting the most suitable enhancement method.
The first metric employed was UCIQE [25], a specialized quality metric designed to assess the visual quality of underwater images. Whereas traditional image quality metrics are often based on subjective perception or predefined ground-truth images, UCIQE accounts for the unique characteristics of underwater scenes, such as the color distortion caused by light absorption in water and the overall visibility of objects, which makes it more suitable for underwater evaluation. UCIQE analyzes three primary factors, namely chroma variation, luminance contrast, and saturation, and compares the enhanced image with the typical characteristics of a natural, undistorted image, providing a quality score that reflects how visually pleasant and clear the image appears after enhancement. The metric ranges from 0 to 1, where a higher score indicates better enhancement. The formula for calculating UCIQE is given by
$$\mathrm{UCIQE} = c_1 \, \sigma_c + c_2 \, \mathrm{con}_l + c_3 \, \mu_s,$$

where $\sigma_c$ is the standard deviation of chroma and measures the intensity of color variation in the image, $\mathrm{con}_l$ is the luminance contrast, which quantifies the contrast in brightness across the image, $\mu_s$ is the mean saturation, which indicates the overall color saturation of the image, and $c_1$, $c_2$, $c_3$ are constants that serve as weights for chroma, luminance contrast, and saturation.
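To make the computation concrete, a minimal Python sketch is given below. It assumes OpenCV's 8-bit CIELab conversion, a percentile-based definition of luminance contrast, and an LCh-style saturation term; these are plausible implementation choices rather than the exact code used in this work.

```python
import cv2
import numpy as np

def uciqe(image_bgr: np.ndarray,
          c1: float = 0.4680, c2: float = 0.2745, c3: float = 0.2576) -> float:
    """UCIQE = c1 * sigma_c + c2 * con_l + c3 * mu_s, with components roughly in [0, 1]."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
    L = lab[..., 0] / 255.0                    # luminance, scaled to [0, 1]
    a = (lab[..., 1] - 128.0) / 128.0          # green-red opponent channel
    b = (lab[..., 2] - 128.0) / 128.0          # blue-yellow opponent channel

    chroma = np.sqrt(a ** 2 + b ** 2)
    sigma_c = chroma.std()                     # standard deviation of chroma

    # Luminance contrast: spread between the brightest and darkest 1% of pixels.
    con_l = np.percentile(L, 99) - np.percentile(L, 1)

    # Mean saturation, using the LCh-style definition chroma / sqrt(chroma^2 + L^2).
    mu_s = np.mean(chroma / np.sqrt(chroma ** 2 + L ** 2 + 1e-12))

    return c1 * sigma_c + c2 * con_l + c3 * mu_s
```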
In addition to the UCIQE metric, UIQM [26] was also utilized to evaluate the enhancement of underwater images. UIQM is a composite metric designed to assess the quality of underwater images by combining quality indicators for colorfulness, sharpness, and contrast. It incorporates multiple individual measures, each focusing on a specific aspect of image quality, which are then weighted and combined to produce a final score. UIQM integrates the following three components: UICM (Underwater Image Colorfulness Measure), UISM (Underwater Image Sharpness Measure), and UIConM (Underwater Image Contrast Measure), combined using a weighted sum. The formula for UIQM is as follows:
$$\mathrm{UIQM} = c_1 \cdot \mathrm{UICM} + c_2 \cdot \mathrm{UISM} + c_3 \cdot \mathrm{UIConM},$$

where $c_1$, $c_2$, and $c_3$ are the empirical weighting coefficients, and UICM, UISM, and UIConM represent the individual underwater image quality measures for colorfulness, sharpness, and contrast, respectively.
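Since each component measure involves its own multi-step computation, the brief sketch below shows only the weighted-sum fusion step, using the published coefficients that are also listed in Section 4.2; the component values are assumed to come from an existing implementation.

```python
def uiqm(uicm: float, uism: float, uiconm: float,
         c1: float = 0.0282, c2: float = 0.2953, c3: float = 3.5753) -> float:
    """Weighted sum of the colorfulness (UICM), sharpness (UISM),
    and contrast (UIConM) measures."""
    return c1 * uicm + c2 * uism + c3 * uiconm
```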
The third metric used was UIF [27], which is designed to quantify the fidelity of an underwater image by measuring the uniformity of the chroma distribution after applying robust normalization. It focuses on color information by calculating the deviation of the chroma distribution, making it a crucial metric for assessing color balance and consistency in underwater images. Unlike traditional metrics that focus only on global image characteristics, UIF accounts for local variations in chroma, ensuring that the color fidelity of the image is properly evaluated. UIF is calculated using the following formula:
$$\mathrm{UIF} = 1 - \sum_{i=1}^{K} w_i \, \frac{\lvert h_i - u_i \rvert}{h_i + u_i + \epsilon},$$

where $h_i$ is the normalized chroma histogram for the $i$-th bin, $u_i$ is the reference uniform histogram, $w_i$ is the weight for each histogram bin, $K$ is the number of bins in the chroma histogram, and $\epsilon$ is a small constant to prevent division by zero. For a specific value of $K$, the deviation of the metric remains stable, though $K$ can be adjusted depending on the specific requirements.
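A sketch consistent with the definitions above follows. The exact binning, weighting, and normalization used in the original UIF implementation may differ, so the equal per-bin weights and the simple chroma histogram here are assumptions, not the reference code.

```python
import cv2
import numpy as np

def uif(image_bgr: np.ndarray, K: int = 10, eps: float = 1e-8) -> float:
    """Chroma-uniformity score following the formulation above."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
    a, b = lab[..., 1] - 128.0, lab[..., 2] - 128.0
    chroma = np.sqrt(a ** 2 + b ** 2)

    # Normalized chroma histogram h_i over K bins.
    h, _ = np.histogram(chroma, bins=K, range=(0.0, chroma.max() + eps))
    h = h / max(h.sum(), 1)

    u = np.full(K, 1.0 / K)   # reference uniform histogram u_i
    w = np.full(K, 1.0 / K)   # equal per-bin weights w_i (an assumption)

    # Weighted deviation of the chroma distribution from uniformity.
    deviation = np.sum(w * np.abs(h - u) / (h + u + eps))
    return 1.0 - float(deviation)
```

With $u_i = 1/K$ and equal weights, each summand is bounded, so the score remains in a stable range for a fixed $K$, consistent with the remark above.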
A composite metric was then created that integrates all three key image quality measures of UCIQE, UIQM, and UIF to leverage the strengths of each and provide a comprehensive evaluation of image enhancement techniques in underwater environments. The UCIQE metric is particularly sensitive to color and luminance quality, making it useful for assessing color fidelity and contrast, which are crucial for underwater images. The UIQM focuses on the image’s structural integrity and its perceptual similarity to human visual preferences, while UIF emphasizes the uniformity and distribution of chroma across the image. By combining these three metrics, the aim is to capture not only the technical aspects of image enhancement, such as color distribution and contrast, but also the perceptual aspects, ensuring that the resulting composite metric provides a balanced and holistic view of image quality.
The fusion of these metrics was achieved by applying a weighted combination of the z-scores derived from each individual metric. This approach normalizes the metrics and integrates them into a single composite score that can be used to rank the quality of enhanced images. The statistical analysis of the combined results (e.g., mean and standard deviation) allows the overall performance of the enhancement methods to be assessed. By aggregating the strengths of UCIQE, UIQM, and UIF, the composite metric serves as a more robust and reliable measure of image quality, particularly in the complex and diverse conditions found in underwater environments. It thereby eliminates the need for subjective judgment or reliance on a single metric, providing a more consistent and comprehensive evaluation framework. The formula for calculating the z-scores is given by
$$z = \frac{X - \mu}{\sigma},$$

where $X$ is the value being transformed, $\mu$ is the mean of the distribution, and $\sigma$ is the standard deviation of the distribution. The composite z-score is then given by

$$Z_{\mathrm{composite}} = \frac{z_{\mathrm{UCIQE}} + z_{\mathrm{UIQM}} + z_{\mathrm{UIF}}}{3}.$$
This approach effectively combines the information from all three metrics, giving equal weight to each in the final composite score. It is a simple yet effective way to leverage the strengths of different image quality metrics, under the assumption that each metric contributes equally to the final evaluation. If different weights are desired for the individual metrics, the formula can be modified to use a weighted average instead of a simple average.
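A compact sketch of this fusion step is shown below; the array names are illustrative, and the default equal weights can be replaced by a custom weighting as just described.

```python
import numpy as np

def composite_scores(uciqe_vals, uiqm_vals, uif_vals, weights=(1/3, 1/3, 1/3)):
    """Fuse per-image metric values into a composite z-score over a dataset."""
    def zscore(x):
        x = np.asarray(x, dtype=np.float64)
        return (x - x.mean()) / (x.std() + 1e-12)   # z = (X - mu) / sigma

    z = np.stack([zscore(uciqe_vals), zscore(uiqm_vals), zscore(uif_vals)])
    return np.average(z, axis=0, weights=weights)   # equal weights by default

# Example: rank candidate enhancements of one image set by composite score.
# best_index = composite_scores(uciqe_list, uiqm_list, uif_list).argmax()
```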
For the evaluation of the model, the F1 macro score was utilized alongside accuracy throughout the training process. The F1 macro score provides a balanced measure of precision and recall across all classes, regardless of their distribution. Unlike accuracy, which may be biased towards more frequent classes, the F1 macro considers both false positives and false negatives, making it particularly valuable for imbalanced datasets. The metric computes the F1 score for each class individually and then averages the results, ensuring that the model's ability to correctly classify all classes is taken into account. This provides a more comprehensive view of overall performance in cases where accuracy alone may not fully reflect the model's effectiveness across different classes, and it helps ensure that the model is robust in handling varying data distributions, which is crucial for tasks involving complex or imbalanced data. Its formula is given by
$$F1_{\mathrm{macro}} = \frac{1}{C} \sum_{cl=1}^{C} \frac{2 \, TP_{cl}}{2 \, TP_{cl} + FP_{cl} + FN_{cl}},$$

where $TP_{cl}$ are the true positives for class $cl$, $FP_{cl}$ are the false positives for class $cl$, $FN_{cl}$ are the false negatives for class $cl$, and $C$ represents the number of classes in the classification problem.

All of the enhancement methods are visualized in Figure 3. The figure includes images that were not used during training, as well as in-the-wild samples that are completely unseen by the model.
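The per-class computation in the F1 macro formula can be made concrete with a short sketch, which is equivalent to scikit-learn's f1_score(y_true, y_pred, average="macro"):

```python
import numpy as np

def f1_macro(y_true, y_pred, num_classes: int) -> float:
    """Per-class F1 averaged with equal weight, matching the formula above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for cl in range(num_classes):
        tp = np.sum((y_pred == cl) & (y_true == cl))   # true positives for class cl
        fp = np.sum((y_pred == cl) & (y_true != cl))   # false positives for class cl
        fn = np.sum((y_pred != cl) & (y_true == cl))   # false negatives for class cl
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))
```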
4.2. Implementation Details
The training of the Swin Transformer model was performed on a high-performance desktop system with 62 GB of RAM and an Intel Core i9-13900F processor. Computational acceleration was provided by an NVIDIA GeForce RTX 3070 Ti GPU with 8 GB of GDDR6X VRAM. The other experiments, including image enhancement techniques and model evaluation, were conducted on a secondary desktop system featuring 32 GB of RAM, an Intel Core i5-10600K processor, and an NVIDIA GeForce GTX 1060 GPU.
For the UCIQE metric, the coefficients were set to $c_1 = 0.4680$, $c_2 = 0.2745$, and $c_3 = 0.2576$. Similarly, for the UIQM metric, the empirical weighting coefficients were $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$. These UIQM coefficients are not arbitrarily selected in this work but are taken from the original UIQM formulation proposed in the literature, where they were empirically determined to balance the relative contributions of colorfulness, sharpness, and contrast in underwater image quality assessment. Standard fixed coefficients are adopted to ensure consistency and fair comparison with existing methods rather than re-optimizing them. All other implementation parameters are likewise adopted from the corresponding original works or standard settings reported in the literature to ensure fair and reproducible evaluation. For the UIF metric, the number of bins $K$ for the chroma histogram is set to 10, offering a balanced resolution for chroma normalization, although this value can be adjusted depending on the requirements of the application. For the gamma correction, the value of $\gamma$ was set to 0.7. For the CLAHE enhancement, a rectangular grid with a tile size of (4, 4) and a clip limit of 2 was used. For the Shades-of-Gray white balance method, the Minkowski p-norm was set to $p = 6$, a value commonly used to compute the channel normalization and achieve a more accurate white balance correction. It is important to note that all images were resized to 224 × 224 pixels to reduce computational cost.
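Under these settings, the three classical operations with explicitly stated parameters can be sketched as follows; applying CLAHE to the CIELab luminance channel is an assumption here, as the color space used in the original pipeline is not stated.

```python
import cv2
import numpy as np

def gamma_correction(img_bgr: np.ndarray, gamma: float = 0.7) -> np.ndarray:
    """Pointwise gamma correction via a 256-entry lookup table."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
    return cv2.LUT(img_bgr, lut)

def clahe_enhance(img_bgr: np.ndarray, clip_limit: float = 2.0,
                  tile=(4, 4)) -> np.ndarray:
    """CLAHE with clip limit 2 on a (4, 4) tile grid (applied to luminance)."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    lab[..., 0] = clahe.apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def shades_of_gray(img_bgr: np.ndarray, p: int = 6) -> np.ndarray:
    """Shades-of-Gray white balance with Minkowski p-norm, p = 6."""
    img = img_bgr.astype(np.float64) + 1e-8
    norm = np.power(np.power(img, p).mean(axis=(0, 1)), 1.0 / p)  # per-channel p-norm
    gain = norm.mean() / norm          # scale channels toward a gray illuminant
    return np.clip(img * gain, 0, 255).astype(np.uint8)
```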
The Swin Transformer base variant was utilized, which employs a patch size of 4 and a window size of 7. The model was pretrained on the ImageNet dataset and subsequently adapted for the image enhancement classification task. It was trained for 50 epochs using a batch size of 16, and the dataset was divided into training (70%), validation (15%), and test (15%) subsets. The training and validation sets were used for model optimization, while the test set was reserved for final evaluation. To ensure reproducibility across different runs, a fixed random seed was applied to both data shuffling and model initialization. The model training was conducted using the Adam optimizer with a learning rate of , and the loss function used was Cross-Entropy Loss.
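A minimal training-setup sketch under these settings is shown below, using the timm implementation of Swin-Base; the seed value is illustrative, and the learning rate is left at the optimizer default, where the value reported in the configuration should be substituted.

```python
import torch
import timm

torch.manual_seed(42)   # fixed seed for reproducibility; the actual value is illustrative

# Swin-Base: patch size 4, window size 7, 224x224 input, ImageNet-pretrained,
# with the head adapted to the 3-class enhancement-selection task.
model = timm.create_model("swin_base_patch4_window7_224",
                          pretrained=True, num_classes=3)

optimizer = torch.optim.Adam(model.parameters())  # substitute the reported learning rate
criterion = torch.nn.CrossEntropyLoss()

def train_one_epoch(loader, device: str = "cuda") -> None:
    """One pass over the training loader (batch size 16, images at 224x224)."""
    model.to(device).train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```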
4.3. Dataset
The training dataset consisted of six sub-datasets, each collected at different depths and under varying environmental conditions in order to capture a wide range of underwater scenes. In total, the dataset contained 4051 images. From these, 321 images were reserved as an in-the-wild test dataset, while the remaining 3730 images were used for the training, validation, and testing processes. The data were split following a 70–15–15 ratio, with 70% used for training, 15% for validation, and 15% for testing.
The Mermaid Underwater Dataset [28] (dataset 1), which is publicly available, was the first subset and was used as an in-the-wild test set. It contains 321 high-resolution images (3840 × 2880 pixels) captured in 2022 at a depth of approximately 20 m at the La Sirène site in Saint-Raphaël, France, using a GoPro Hero 3 Silver Edition camera. The dataset covers an area of roughly 150 m² and provides sub-millimeter ground sampling resolution. It includes a variety of underwater environments, such as a mermaid statue, sandy plains, and rocky habitats. The images were acquired by divers using a single camera under natural lighting conditions and are provided without any preprocessing.
Dataset 2 consists of 2171 images of a Rubik's Cube placed at a depth of 28 m in Chrousso, Greece. The cube was firmly embedded in the seabed to ensure stability and reduce movement caused by underwater currents. During image acquisition, the diver moved around the cube to capture it from different viewpoints.
Dataset 3 consists of 540 images captured at a depth of 20 m in Akti Kalogrias, Greece. Dataset 4 includes 342 images captured at a depth of 40 m in Porto Valitsa, Greece. Dataset 5 contains 77 images captured at a depth of 50 m in Avlaki, Greece. The second, third, fourth, and fifth datasets were captured using a GoPro Hero 9 camera equipped with an underwater housing and have Full HD resolution. Dataset 6 consists of 600 images captured at a depth of 62 m in Porto Valitsa, Greece, with Full HD resolution, acquired using a Vaquita Paralenz camera. This dataset includes images captured under both natural illumination and artificial lighting from a diving torch, which is useful for visibility in deep-water environments; this combination enhances the dataset's diversity by incorporating conditions beyond natural lighting alone, thereby better representing realistic underwater operational environments. The composition of the dataset is illustrated in Figure 4.
The GoPro HERO9 Black features a 1/2.3-inch sensor with approximate dimensions of 6.17 × 4.55 mm and a sensor area of 28.07 mm². It is equipped with a fixed lens of approximately 3 mm focal length and an aperture of f/2.5 under the standard configuration. The Paralenz Vaquita camera also employs a 1/2.3-inch sensor; however, its aperture and focal length are not publicly specified by the manufacturer. Both cameras were used with their default lenses, and no additional optical filters (e.g., anti-reflection or color filters) were used during image acquisition.
Since the proposed approach is camera-agnostic, it does not incorporate optical calibration or explicitly model distortions, but instead addresses them implicitly through learned image representations. It is important to note that the datasets were extracted from in situ underwater videos under real-world conditions, where the use of color calibration charts is neither feasible nor beneficial to the study’s purpose. Therefore, the evaluation focuses on relative performance under realistic conditions rather than absolute color calibration.
4.4. Experimental Results
First, each enhancement method was evaluated for every image in the dataset using the UCIQE, UIQM, and UIF metrics, together with the composite z-score calculated across the entire dataset. Based on these metrics, the optimal enhancement for each individual image was identified and used as the ground truth to train and guide the image classification network. For the dataset used, out of the seven enhancement methods applied, only three consistently produced the best results; their relative distribution is illustrated in Figure 5.
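This labeling step can be sketched as follows; the names enhancers and metric_fns are hypothetical stand-ins for the seven enhancement callables and the three metric functions, and normalizing over all image-method pairs is one plausible reading of "across the entire dataset".

```python
import numpy as np

def build_ground_truth(images, enhancers, metric_fns):
    """Score every enhancement on every image; the argmax of the composite
    z-score becomes that image's ground-truth class for the classifier."""
    names = list(enhancers)
    # scores[i, m, k]: metric k of enhancement m applied to image i
    scores = np.array([[[fn(enhancers[m](im)) for fn in metric_fns]
                        for m in names] for im in images])

    # z-normalize each metric over the whole dataset, then average (composite Z)
    mu = scores.mean(axis=(0, 1), keepdims=True)
    sd = scores.std(axis=(0, 1), keepdims=True) + 1e-12
    composite = ((scores - mu) / sd).mean(axis=2)

    best = composite.argmax(axis=1)          # optimal enhancement per image
    return [names[b] for b in best]
```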
The fact that the top enhancement methods for the dataset are not limited to a single method, but rather span three distinct ones, highlights the importance of automatically determining the optimal enhancement for each case. This variability is particularly notable, as even within the same dataset originating from the same location, the most effective enhancement method can differ. Such differences likely arise from varying image characteristics, such as lighting conditions, noise, or contrast, which affect how each enhancement technique performs. This emphasizes the necessity for a dynamic model that adapts to the specific characteristics of each image and selects the most suitable enhancement method to ensure optimal results.
The results from the training demonstrate the model's progressive improvement across both the training and validation datasets. Initially, the model's performance was modest, with training accuracy gradually increasing and validation accuracy fluctuating in the first few epochs. By Epoch 3, noticeable improvement in both training and validation accuracy was observed, signaling that the model was beginning to generalize well. Throughout the training process, the loss consistently decreased and the accuracy steadily increased, indicating that the model was effectively learning from the data and adapting to the underlying patterns.
While the model continued to improve across epochs, Epoch 22 was selected for further analysis due to its balanced performance and the highest validation accuracy. At Epoch 22, the training accuracy reached 95.86%, the validation accuracy was 88.37%, and the test accuracy was 88.04%. The F1 macro score also peaked at 0.869913, demonstrating a strong balance of precision and recall across all classes. This epoch reflects the model's ability to generalize well and make reliable predictions on unseen data, and the overall trend of increasing performance underscores the model's effectiveness across both the training and evaluation datasets.
The results presented in Table 3 compare the model performance metrics on the test set, specifically Accuracy and F1 Macro. The baseline models, random and major, serve as reference points for understanding the performance of the proposed approach. The Baseline random model achieved an accuracy of 33.24% and an F1 Macro of 29.27%, corresponding to random predictions across the three classes. The Baseline major model, which always selects the most frequent class, performed better in accuracy, at 50.18%, but reached an F1 Macro of only 22.28%. In contrast, the proposed Swin model, leveraging a more advanced methodology, achieved significantly higher performance, with an accuracy of 88.04% and an F1 Macro of 87.88%. Compared to the Baseline random and Baseline major models, the proposed model improves the F1 Macro by 58.61 and 65.60 percentage points, respectively. The confusion matrix is provided in Figure 6.
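The two reference baselines can be reproduced with scikit-learn's DummyClassifier, which ignores the input features and predicts from the label distribution alone; the feature arrays here are placeholders.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

def baseline_scores(X_train, y_train, X_test, y_test):
    """Accuracy and macro F1 for the 'random' and 'major' reference baselines."""
    results = {}
    for name, strategy in [("random", "uniform"), ("major", "most_frequent")]:
        clf = DummyClassifier(strategy=strategy, random_state=0)
        clf.fit(X_train, y_train)            # features are ignored by design
        pred = clf.predict(X_test)
        results[name] = (accuracy_score(y_test, pred),
                         f1_score(y_test, pred, average="macro"))
    return results
```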
Beyond classification accuracy, a further assessment was conducted by computing the quality metrics for the predicted classes and comparing them to the metrics that would result from always selecting a single enhancement method. These evaluations were carried out on the testing set. The values presented in Table 4 are averages computed over the testing set for each enhancement method, with metrics including UCIQE, UIQM, UIF, and the composite score. These values reflect the overall effectiveness of each enhancement technique in improving image quality. Ideally, the best result would be obtained if the method selected for each image perfectly matched the optimal enhancement, yielding the highest possible values across all metrics. The "Optimal" row, which corresponds to the enhancement with the highest composite score for every image, represents this ideal outcome.
While the goal is to maximize these values, the methods demonstrate varying levels of effectiveness, and the proposed Swin model's predictions align closely with the best-performing methods on average. The proposed model shows notable improvements over the individual enhancement methods, demonstrating that its predictions are very close to the optimal enhancement technique. Moreover, it outperforms several other enhancement methods in terms of UCIQE, UIQM, and UIF values, indicating that it is highly effective in selecting the best enhancement for a given input image. The small gap between the proposed model and the optimal method reflects the model's ability to consistently identify high-quality enhancement techniques.
In addition to the improvements demonstrated by the proposed model, one of its most significant contributions is its ability to automatically identify the best enhancement method, a task that would otherwise require prior knowledge of the image characteristics. The key challenge in image enhancement is that, in practice, the optimal method is not always apparent or consistent across different images; users typically lack the expertise or means to determine which technique would work best for a given image without trial and error. The proposed model effectively addresses this problem by selecting the most appropriate enhancement method based on the specific features of each input image, ensuring the best possible outcome. This capability is crucial, as it removes the need for manual selection, streamlining the process and delivering high image quality without requiring user intervention or prior knowledge. A visualization of all the enhancement methods, including both the predicted and ground-truth results, is presented in Figure 3.
In addition to the controlled test-set evaluations, an in-the-wild dataset [28] was used to assess the methodology's performance under more realistic, unconstrained conditions. In-the-wild evaluation is crucial for image enhancement models because it simulates real-world scenarios in which images come from diverse sources and vary significantly in lighting, noise, image quality, and content. Unlike controlled datasets, in-the-wild images often contain unpredictable challenges, such as varying exposure, complex backgrounds, and inconsistent color distributions. Testing the model on such data makes it possible to evaluate its robustness and generalizability across diverse image types and conditions.
The second part of Table 3 presents the performance metrics for the in-the-wild dataset. The Baseline random model achieves an F1 macro of 26.18%, while the Baseline major model reaches a higher F1 macro of 39.55%. The proposed Swin model surpasses both with an F1 macro of 52.14%, a relative improvement of 31.87% over the stronger baseline. The higher accuracy of the Baseline major model is attributed to the dataset's inherent class imbalance, with a larger number of images belonging to the majority category. However, the significant improvement achieved by the proposed Swin model suggests its ability not only to handle such class imbalance but also to boost overall performance. The model's performance across both majority and minority classes shows its effectiveness in recognizing patterns and making predictions even in the presence of imbalanced data. This also reflects the model's potential to generalize well across varying datasets and adapt to different class distributions, which is critical for robust performance in real-world applications. The ability to balance accuracy between classes further reinforces the model's effectiveness in complex scenarios, where real-world data is often skewed or uncertain.
Subsequently, the metrics for each enhancement method were calculated for the in-the-wild dataset, which is crucial for assessing how well these methods generalize to real-world, unseen data. These metrics highlight the effectiveness of the various techniques in improving image quality across different dimensions. The proposed Swin model demonstrates superior performance with a composite score of 0.6491, surpassing all of the traditional methods; notably, it outperforms the RGHS method, which achieves a composite score of 0.6465. The results are shown in Table 5.
It is equally important to recognize that the optimal enhancement method for the in-the-wild dataset differs from the best method identified on the testing set. This distinction underscores the value and necessity of the proposed model, as it actively identifies the most effective method for each dataset. Without such a model, users would have to manually test multiple enhancement methods for each new dataset, a process that is not only time-consuming but also prone to error, especially when the best method varies even within the same dataset. The proposed model automates this selection process, delivering significant time savings while ensuring high-quality enhancement. This ability to dynamically choose the best method, adjusting to the specific characteristics of each dataset, demonstrates a level of flexibility and efficiency that would be nearly impossible to achieve through manual trial and error. The model's adaptability to real-world conditions makes it a valuable tool for smart, data-driven enhancement.
The Swin Transformer model used in this study contains 86,746,299 parameters, reflecting its high capacity to learn complex features from underwater images. At 15.47 GFLOPs per inference, the model is relatively light in computational complexity compared to other large-scale architectures, indicating efficient performance at a manageable cost for practical deployment. The computational costs of the various image enhancement methods were evaluated using the Mermaid dataset [28], which consists of 321 high-resolution images resized to 512 × 512 pixels. The results indicate that the proposed Swin model achieved a processing speed of 26.75 fps for selecting the optimal enhancement method, demonstrating that it introduces only a minimal delay before the enhancement is applied and that the decision-making step integrates efficiently into the overall processing pipeline. The average RAM usage of the Swin model was 1375.44 MB, a higher memory requirement than the simpler methods, which had minimal memory usage. The proposed Swin model showed 49.74% CPU usage and 47.79% GPU usage, whereas traditional methods such as RGB Stretching and RGHS exhibited near-zero CPU and GPU usage, indicating minimal computational demand. Notably, the CPU percentage reported here refers to the usage of a single CPU core, providing a more granular view of the computational demands. The results are shown in Table 6.
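A sketch of how the throughput and memory figures can be measured is given below; model and loader stand for the trained classifier and a preprocessed-image loader, and psutil's resident-set size is assumed as the RAM measure. The exact profiling procedure used in this work may differ.

```python
import time
import psutil
import torch

@torch.no_grad()
def profile(model, loader, device: str = "cuda"):
    """Measure end-to-end decision throughput (fps) and resident memory (MB)."""
    model.to(device).eval()
    proc = psutil.Process()
    n, start = 0, time.perf_counter()
    for images, _ in loader:               # batches of preprocessed images
        model(images.to(device))
        n += images.size(0)
    if device.startswith("cuda"):
        torch.cuda.synchronize()           # wait for queued GPU work before stopping the clock
    fps = n / (time.perf_counter() - start)
    ram_mb = proc.memory_info().rss / 2 ** 20   # resident set size of this process
    return fps, ram_mb
```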