Article

Dual-CycleGANs with Dynamic Guidance for Robust Underwater Image Restoration

by Yu-Yang Lin 1, Wan-Jen Huang 1 and Chia-Hung Yeh 2,3,*
1 Institute of Communications Engineering, National Sun Yat-Sen University, Kaohsiung 80404, Taiwan
2 Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
3 Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80404, Taiwan
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(2), 231; https://doi.org/10.3390/jmse13020231
Submission received: 29 December 2024 / Revised: 21 January 2025 / Accepted: 21 January 2025 / Published: 25 January 2025
(This article belongs to the Special Issue Application of Deep Learning in Underwater Image Processing)

Abstract: The field of underwater image processing has gained significant attention recently, offering great potential for enhanced exploration of underwater environments, including applications such as underwater terrain scanning and autonomous underwater vehicles. However, underwater images frequently face challenges such as light attenuation, color distortion, and noise introduced by artificial light sources. These degradations not only affect image quality but also hinder the effectiveness of related application tasks. To address these issues, this paper presents a novel deep network model for single underwater image restoration. Our model does not rely on paired training images and incorporates two cycle-consistent generative adversarial network (CycleGAN) structures, forming a dual-CycleGAN architecture. This enables the simultaneous conversion of an underwater image to its in-air (atmospheric) counterpart while learning a light field image that guides the underwater image towards its in-air version. Experimental results indicate that the proposed method provides superior (or at least comparable) restoration performance, in terms of both quantitative measures and visual quality, compared to existing state-of-the-art techniques. Moreover, our model significantly reduces computational complexity, offering faster processing and lower memory usage while maintaining strong restoration capability, which makes it well suited to real-world applications.

1. Introduction

In recent years, the exploration of underwater environments has become increasingly important because of the growing exhaustion of natural resources and the advancement of the global economy. Many applications in ocean engineering and research now rely heavily on underwater imagery captured by autonomous underwater vehicles (AUVs). A primary function of AUVs is to capture underwater imagery, which is essential for exploring, understanding, and interacting with the marine environment. Considerable research on underwater image processing has been devoted to the scientific exploration of deep-sea environments [1,2]. However, underwater imaging faces more challenges than atmospheric imaging. Underwater images frequently experience degradation caused by attenuation, color distortion, and noise from artificial lighting. Specifically, scattering and absorption caused by particles in the water, such as microscopic phytoplankton or non-algae particles, attenuate direct transmission and produce ambient scattered light. The diminished direct transmission lowers scene intensity and causes color distortion, while scattered ambient light further alters the scene’s appearance. These degradations complicate the restoration of underwater image quality, seriously affecting related tasks in underwater exploration such as target detection, pattern recognition, and scene understanding.
Image restoration is fundamentally an ill-posed problem that aims to recover high-quality images from degraded inputs. Quality degradation may occur during capture (e.g., noise and lens blur), during post-processing (e.g., compression), or under non-ideal conditions (e.g., haze and fog). Various image restoration techniques relying on prior knowledge, assumptions, and learning strategies have been proposed in the literature. With the rapid advancement of deep learning algorithms, more and more restoration technologies based on deep learning have emerged [3,4]. The success of deep learning-based methods often depends on sufficient and effective training datasets. The scarcity of paired training samples of underwater images and their corresponding ground truth (clean) versions therefore poses a significant challenge to training deep models for single underwater image restoration. Although several underwater image datasets have been synthesized through physics-based models, publicly accessible collections remain scarce. Furthermore, most underwater image synthesis methods do not aim to reproduce atmospheric scenes, resulting in incomplete enhancement and poor approximation of actual underwater conditions.
In recent years, underwater image restoration methods based on single images have attracted attention due to their effectiveness and flexibility. Restoration of underwater images generally falls into two primary categories: conventional techniques and deep learning-based approaches. Chiang et al. improved the quality of individual underwater images by applying image dehazing techniques that address attenuation discrepancies along the propagation path [5]. Li et al. employed dehazing of the blue-green channels and correction of the red channel for restoring single underwater images [6]. Ancuti et al. proposed a fusion-based haze removal framework for single underwater images, which integrates two images produced by applying color compensation and white balancing to the input image [7]. Among deep learning-based approaches, Li et al. developed a convolutional neural network (CNN) specifically designed for underwater image enhancement that relies on underwater scene priors to generate synthetic training data [8]. Dudhane et al. introduced a deep learning network trained on this synthetic dataset [9]. Another notable contribution is [10], which presented a large-scale benchmark for underwater image enhancement; its reference images were generated by 12 selected enhancement methods, with the optimal result for each underwater image determined by a voting process. The simultaneous enhancement and super-resolution method called Deep SESR is a generative model based on a residual network, designed to enhance image quality and improve spatial resolution during restoration [11]. Naik et al. proposed Shallow-UWnet, a shallow neural network model that maintains performance with fewer parameters [12]. Zhou et al. introduced a method for underwater image restoration that estimates depth maps and employs backscatter reduction [13]. Pramanick et al. proposed Lit-Net, a lightweight multi-level network focusing on multi-resolution and multi-scale image analysis for recovering underwater images [14].
Training end-to-end CNN models on paired training data is challenging due to the difficulty of acquiring a dataset consisting of pairs of underwater images and their corresponding ground truth (clean) images. To address this issue, generative adversarial network (GAN) architectures [15], including the cycle-consistent adversarial network CycleGAN [16], have been employed for the restoration of underwater images. The WaterGAN framework [17] uses a GAN to generate realistic underwater images from in-air (atmospheric) images and depth information, facilitating color correction of monocular underwater images. Fabbri et al. proposed a model based on a conditional generative adversarial network for real-time enhancement of underwater images [18]. Guo et al. introduced a multi-scale dense GAN designed to enhance underwater images [19]. Cong et al. introduced PUGAN, a GAN model guided by physical models, for enhancing underwater image processing [20].
Existing methods for underwater image restoration often face two major challenges: inadequate color correction and insufficient detail reconstruction. To tackle these challenges, this paper proposes a dual-CycleGAN model specifically designed for single underwater image restoration, which restores an underwater image with dynamically learned guidance. In more detail, our framework utilizes one CycleGAN to learn a light field guidance image, generated from the target image, to improve color accuracy. The second CycleGAN focuses on training a generator specifically for underwater image restoration. Both CycleGANs are trained concurrently, with the guidance image dynamically steering the output of the restoration generator, leading to more effective and efficient restoration of underwater images. The two CycleGANs work together: one extracts useful color features from the underwater images, while the other handles restoration and reconstruction. This division of labor not only improves the final image quality and consistency but also ensures stability throughout the training process.
The paper is structured as follows. In Section 2, we present a background review of light field generation techniques, which are utilized in the design of our guidance image. Section 3 describes the proposed deep learning network for single underwater image restoration and outlines the problem that this study aims to solve. Section 4 presents the experimental results, and Section 5 concludes with final remarks.

2. Related Work

2.1. Light Field Map

The light characteristics of underwater scenes are distinct from those of aerial images because particle scattering is highly random, making them difficult to simulate accurately with traditional physics-based methods. A light field preservation approach is therefore introduced to capture and integrate the diverse underwater light field information into the target image. The background light field map is generated by a multi-scale filtering process, achieved by applying Gaussian blur at different levels of intensity [21]. This multi-scale approach addresses the limitations of using a single level of filtering, offering a more precise depiction of the background light. The method is inspired by the multi-scale Retinex technique [22]. The preserved features in underwater light field images emphasize the natural stylistic elements of various underwater scenes while omitting the detailed and structured information found in the original underwater images. In our proposed algorithm, we use the light field map as a learning guide image to iteratively guide the output of a generator and thereby enhance the color rendition of the restored underwater image, as sketched below. Figure 1 illustrates the light field maps derived from several underwater images.
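To make the multi-scale filtering idea concrete, the following sketch shows one way such a background light field map could be computed with Gaussian blurs at several scales, in the spirit of multi-scale Retinex [22]. The scale values and the equal-weight fusion are illustrative assumptions rather than the exact settings of the light field module in [21].

```python
import cv2
import numpy as np

def light_field_map(img_bgr, sigmas=(15, 60, 120)):
    """Estimate a background light field map by multi-scale Gaussian filtering.

    Each blur suppresses scene detail and keeps only the smoothly varying
    background illumination and color; averaging several scales avoids the
    bias of a single filter level. The sigma values are assumptions.
    """
    img = img_bgr.astype(np.float32)
    blurred = [cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=s) for s in sigmas]
    lf = np.mean(np.stack(blurred, axis=0), axis=0)
    return np.clip(lf, 0, 255).astype(np.uint8)

# Example usage (hypothetical file name):
# lf = light_field_map(cv2.imread("underwater.png"))
```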

2.2. CycleGAN

CycleGAN, proposed by Zhu et al. [16] as an extension of the GAN framework [15], enables unsupervised image style transfer by exploiting the cycle consistency principle. It has since been adapted and extended to various other image generation tasks, yielding impressive results in each case [23]. CycleGAN learns a generator that produces images in one domain given images in another domain; many possible mappings can be inferred without paired supervision, and the approach has demonstrated outstanding performance in image processing tasks. Deep learning has been shown to perform well in a variety of underwater tasks, but large datasets are difficult to obtain in underwater environments, which makes GAN-based methods particularly suitable for underwater image restoration. Although generative adversarial networks achieve excellent results, they also have unavoidable problems [24,25,26]. One challenge lies in the difficulty of training GANs and the complexity of objectively assessing the generated output. Another issue is the potential for mode collapse, which can occur if the training penalties are not properly adjusted. Our proposed method uses light field maps as guide images to stabilize model training and achieve better restoration performance.

3. Proposed Framework

The overall structure of the proposed dual-CycleGAN, which includes two CycleGANs, is shown in Figure 2. The upper section, referred to as the Light Field CycleGAN and abbreviated as “LFCycleGAN”, is designed for guidance learning: it produces the light field image of the input underwater image and thereby guides the input toward its in-air version. The bottom part of Figure 2, referred to as the Restoration CycleGAN and abbreviated as “RCycleGAN”, is the main CycleGAN that produces the final restored image. The two models are jointly trained until the network converges, yielding the final generator $G_R$ (Figure 3).

3.1. Dual-CycleGAN

The primary objective of this paper is to design a deep neural network that transforms a single underwater image $x \in X$ into its corresponding in-air image $y \in Y$, where $X$ and $Y$ denote the collections of underwater images (the source domain) and in-air images (the target domain), respectively. The generator is based on an autoencoding architecture: the image is first passed through an encoder, then fed into an 8-layer U-Net to extract features, and finally processed by a decoder. The encoder consists of 8 convolutional layers, each with a 4 × 4 kernel, a stride of 2, and padding of 1, producing feature maps of 64, 128, 256, and 512 channels, respectively. The decoder consists of 8 transposed convolutional (deconvolution) layers, each with a 4 × 4 kernel, a stride of 2, and padding and output padding of 1, producing feature maps of 512, 256, 128, and 3 channels, respectively. The output is an image with the same shape ($H \times W \times 3$) as the input.
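To make the architecture above concrete, the following PyTorch sketch shows an abbreviated encoder-decoder generator. It uses four down/up levels (matching the listed channel widths 64, 128, 256, and 512), replaces the separate 8-layer U-Net module with U-Net-style skip connections, and assumes instance normalization with LeakyReLU/ReLU activations; it is an illustrative approximation rather than the authors' exact network.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """4x4 stride-2 convolution that halves the spatial resolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(c_out),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class UpBlock(nn.Module):
    """4x4 stride-2 transposed convolution that doubles the spatial resolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Generator(nn.Module):
    """Abbreviated encoder-decoder generator: 3 x H x W in, 3 x H x W out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.ModuleList([
            DownBlock(3, 64), DownBlock(64, 128),
            DownBlock(128, 256), DownBlock(256, 512),
        ])
        self.decoder = nn.ModuleList([
            UpBlock(512, 256), UpBlock(512, 128), UpBlock(256, 64),
        ])
        # Final layer maps the concatenated features back to a 3-channel image.
        self.out = nn.Sequential(
            nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        skips = []
        for down in self.encoder:
            x = down(x)
            skips.append(x)
        x = self.decoder[0](x)                                # 512 -> 256 channels
        x = self.decoder[1](torch.cat([x, skips[2]], dim=1))  # 256+256 -> 128
        x = self.decoder[2](torch.cat([x, skips[1]], dim=1))  # 128+128 -> 64
        return self.out(torch.cat([x, skips[0]], dim=1))      # 64+64 -> 3
```

For a 256 × 256 × 3 input, the sketch returns an output of the same shape.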
In the RCycleGAN architecture, the process begins by feeding a degraded underwater image $x$ into the restoration branch. The goal of the RCycleGAN is to transform this degraded underwater image into its in-air version. This is done through the first generator $G_{R1}$, which takes the underwater image $x$ as input and produces an intermediate in-air version $G_{R1}(x)$. However, since the aim is to recover the original degraded input, the next step converts the in-air version back into a degraded underwater image. This is accomplished using the second generator $G_{R2}$, which takes the in-air image $G_{R1}(x)$ as input and outputs the degraded underwater version $G_{R2}(G_{R1}(x))$. The process in RCycleGAN essentially cycles the image between these two transformations: from degraded underwater to in-air and back to degraded underwater.
In contrast, the LFCycleGAN introduces an additional layer of complexity by incorporating a guidance mechanism for improved restoration. In this architecture, an additional generator $G_{LF1}$ is used to generate a light field image that serves as a guidance map. This light field image captures the illumination and other environmental characteristics of the scene, which are then used to guide the RCycleGAN process. Specifically, the guidance light field image produced by $G_{LF1}$ is applied to the input underwater image, assisting the RCycleGAN’s first generator $G_{R1}$ in transforming the degraded underwater image into a more accurate in-air version. This guided approach improves the overall restoration performance by providing additional information to inform the generation of the in-air image. Since underwater images often exhibit color distortion compared to corresponding in-air images, we believe that the inherent light field image of the input underwater image should be consistent with that of its restored image. Our main idea is to use the light field map obtained by $G_{LF1}$ in LFCycleGAN from the input underwater image to guide the restoration process in RCycleGAN toward the in-air version. As shown in Figure 2, for the generated in-air image $y = G_{R1}(x)$ of the input underwater image $x$ in the proposed RCycleGAN, we directly obtain its “baseline” light field version $y_{LF}$. To obtain an “enhanced” light field version of the input underwater image, possible noise, blurring effects, or color distortions within the image must be excluded while its inherent color representation is preserved. To achieve this, we propose a generator, denoted by $G_{LF1}$ in our LFCycleGAN (circled by the red dotted region in Figure 2), for transforming an underwater image into the corresponding enhanced light field image. To train $G_{LF1}$, the LFCycleGAN mainly consists of the forward generator $G_{LF1}$ and the backward generator $G_{LF2}$.
According to the Retinex theory, an image S can be decomposed into the light field information L and the reflectance R as their product
$S = L \cdot R$, (1)
where $L$ is the light field information and $R$ is the reflectance. This equation explains how light field information influences the appearance of the image. In the LFCycleGAN, the input image $X$ is processed through the light field module to obtain the light field information $y_{LF}$ according to Equation (1). The LFCycleGAN takes the original image $X$ and generates the enhanced light field $LF_{enhanced}$. This process can be expressed as a functional relationship:
$LF_{enhanced} = G_{LF1}(y_{LF})$. (2)
We hypothesize that if the enhanced light field information $LF_{enhanced}$ accurately reflects the light conditions of the real scene, then the smaller the comparison measure $C(y, LF_{enhanced})$ between the restored image $y$ and the enhanced light field is, the closer the light distribution of $y$ is to that of the real scene. Therefore, the RCycleGAN can leverage this comparison mechanism to improve the restoration quality of the image. This can be further expressed as:
$Q(y) = g\big(C(y, LF_{enhanced})\big)$, (3)
where $Q(y)$ is the quality assessment function of the image, and $g$ is a function that describes the impact of the comparison result $C(y, LF_{enhanced})$ on the restoration quality. The ultimate goal is to maximize $Q(y)$, i.e., to improve the restoration quality of the image. This can be achieved by minimizing the comparison function $C(y, LF_{enhanced})$:
$\max_{G_{R1}} Q(y) \quad \text{subject to} \quad \min_{G_{R1}} C(y, LF_{enhanced})$. (4)
This mathematical analysis describes how the RCycleGAN uses the light field information generated by the LFCycleGAN as a comparison baseline during image restoration. By minimizing the difference between the restored image and this light field information, the quality of the restored image is enhanced.

3.2. Loss Design in RCycleGAN and LFCycleGAN

To achieve the goal of underwater image restoration in RCycleGAN and to mimic light field features from LFCycleGAN, we formulate seven loss functions in the proposed dual-CycleGAN: adversarial loss ($\mathcal{L}_{GAN}$), cycle consistency loss ($\mathcal{L}_{cyc}$), identity loss ($\mathcal{L}_{id}$), perceptual loss ($\mathcal{L}_{per}$), patch-based contrast quality index (PCQI) [27] loss ($\mathcal{L}_{pcqi}$), color balance loss ($\mathcal{L}_{cb}$), and intermediate output (Inter) loss ($\mathcal{L}_{inter}$).
The adversarial loss $\mathcal{L}_{GAN}$ is used to measure the similarity between the generated data and the real data distribution. Different from the adversarial loss of the original CycleGAN, in our RCycleGAN and LFCycleGAN we optimize not only the mappings for converting underwater images into in-air images ($G_{R1}$) and in-air images into underwater images ($G_{R2}$), but also the mappings for converting underwater light fields into in-air light fields ($G_{LF1}$) and in-air light fields into underwater light fields ($G_{LF2}$). We also incorporate four discriminators: $D_R$, comprising $D_{R1}$ and $D_{R2}$, and $D_{LF}$, comprising $D_{LF1}$ and $D_{LF2}$, to distinguish the translated samples from real samples. The proposed adversarial loss $\mathcal{L}_{GAN}$ is therefore expressed as:
$\mathcal{L}_{GAN}(G_{R1}, G_{LF1}, D_{R1}, D_{LF1}, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_{R1}(y) + \log D_{LF1}(y)\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\log(1 - D_{R1}(G_{R1}(x))) + \log(1 - D_{LF1}(G_{LF1}(x)))\big],$ (5)
where $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$ denote the data distributions of the samples $\{x_i\}$ and $\{y_i\}$, respectively. The expectation $\mathbb{E}_{x \sim p_{data}(x)}$ represents the average value of a function over the distribution $p_{data}(x)$, i.e., the expected value of an operation across all samples $x$ drawn from the real data distribution; similarly, $\mathbb{E}_{y \sim p_{data}(y)}$ represents the expected value over $p_{data}(y)$. $G_{R1}$ and $G_{LF1}$ generate images that aim to resemble domain $Y$, starting from domain $X$ (i.e., $X \rightarrow Y$). $D_{R1}$ and $D_{LF1}$ are the discriminators responsible for distinguishing the RGB and light field characteristics of the generated images from the real ones. Similarly, we define an adversarial loss for the reverse mapping ($Y \rightarrow X$), using generators $G_{R2}$, $G_{LF2}$ and discriminators $D_{R2}$, $D_{LF2}$.
The cycle consistency loss $\mathcal{L}_{cyc}$ ensures that an image converted to the other domain and back retains the complete information and characteristics of the original image. Different from the cycle consistency loss of the original CycleGAN, we introduce separate terms for underwater images and underwater light fields in the proposed cycle consistency loss $\mathcal{L}_{cyc}$, defined as:
$\mathcal{L}_{cyc}(G_{R1}, G_{R2}, G_{LF1}, G_{LF2}) = \mathbb{E}_{x \sim p_{data}(x)}\big[\|G_{R2}(G_{R1}(x)) - x\|_1 + \|G_{LF2}(G_{LF1}(x)) - x\|_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\|G_{R1}(G_{R2}(y)) - y\|_1 + \|G_{LF1}(G_{LF2}(y)) - y\|_1\big].$ (6)
We utilize the L1 norm (i.e., Manhattan distance) as part of the cycle consistency loss to ensure that the mapping from one domain to another is reversible. Compared to the L2 norm (i.e., Euclidean distance), the L1 norm does not impose excessive penalties on small errors, allowing the generator to capture the features of real images more stably during the training process. Additionally, the sparsity characteristics of the L1 norm help produce images that are more interpretable and visually appealing. By focusing on the similarity between the generated and original images, our model effectively preserves the semantic information of the input images, thereby achieving higher-quality image translation.
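As an illustration of how this L1-based cycle term could be implemented, the sketch below assumes the four generators are PyTorch modules; the per-pixel mean reduction of nn.L1Loss (rather than a raw sum over pixels) and the absence of weighting factors are simplifications.

```python
import torch.nn as nn

l1 = nn.L1Loss()  # mean absolute error; the reduction choice is a simplification

def cycle_consistency_loss(G_R1, G_R2, G_LF1, G_LF2, x, y):
    """L1 cycle loss over both the restoration and light field branches.

    G_R1: underwater -> in-air, G_R2: in-air -> underwater;
    G_LF1 / G_LF2: the corresponding forward/backward light field mappings.
    """
    forward_cycle = l1(G_R2(G_R1(x)), x) + l1(G_LF2(G_LF1(x)), x)
    backward_cycle = l1(G_R1(G_R2(y)), y) + l1(G_LF1(G_LF2(y)), y)
    return forward_cycle + backward_cycle
```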
The identity loss $\mathcal{L}_{id}$ ensures that when the input image already belongs to the target domain, the generator does not alter it and outputs the same image as the input. This avoids unnecessary changes and maintains consistency in image content. Different from the identity loss of the original CycleGAN, we introduce separate terms for the RGB and light field components in the proposed identity loss $\mathcal{L}_{id}$, defined as:
$\mathcal{L}_{id}(G_{R1}, G_{R2}, G_{LF1}, G_{LF2}) = \mathbb{E}_{y \sim p_{data}(y)}\big[\|G_{R1}(y) - y\|_1 + \|G_{LF1}(y) - y\|_1\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\|G_{R2}(x) - x\|_1 + \|G_{LF2}(x) - x\|_1\big].$ (7)
The perceptual loss focuses on the perceived quality of the image, aligning it better with how a human observer judges image quality. To address the blurring issue in images generated by GANs, we incorporate a perceptual loss $\mathcal{L}_{per}$ based on the VGG16 network, defined as:
$\mathcal{L}_{per}(G_R, G_{LF}) = \mathbb{E}_{x \sim p_{data}(x)}\Big[\sum_i \lambda_i \|\phi_i(x) - \phi_i(G_R(x))\|_1 + \sum_i \lambda_i \|\phi_i(x) - \phi_i(G_{LF}(x))\|_1\Big] + \mathbb{E}_{y \sim p_{data}(y)}\Big[\sum_i \lambda_i \|\phi_i(y) - \phi_i(G_R(y))\|_1 + \sum_i \lambda_i \|\phi_i(y) - \phi_i(G_{LF}(y))\|_1\Big],$ (8)
where $\phi_i$ and $\lambda_i$ represent the feature extraction function of the $i$-th layer of VGG16 and the weight assigned to it, respectively.
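A possible PyTorch realization of this VGG16-based perceptual term is sketched below. The specific layers (relu1_2, relu2_2, relu3_3, relu4_3), the equal weights, and the omission of ImageNet input normalization are assumptions; the paper only specifies that VGG16 features $\phi_i$ with weights $\lambda_i$ are compared under the L1 norm.

```python
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class VGGPerceptualLoss(nn.Module):
    """Weighted L1 distance between VGG16 feature maps of two images."""

    def __init__(self, layer_ids=(3, 8, 15, 22), weights=(1.0, 1.0, 1.0, 1.0)):
        super().__init__()
        features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in features.parameters():
            p.requires_grad = False   # VGG16 acts as a fixed feature extractor
        self.features = features
        self.layer_ids = set(layer_ids)
        self.weights = list(weights)
        self.l1 = nn.L1Loss()

    def forward(self, pred, target):
        loss, w = 0.0, iter(self.weights)
        f_p, f_t = pred, target
        for idx, layer in enumerate(self.features):
            f_p, f_t = layer(f_p), layer(f_t)
            if idx in self.layer_ids:
                loss = loss + next(w) * self.l1(f_p, f_t)
        return loss
```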
To refine the color quality of the restored image $G_R(x)$ from the input $x$ in accordance with human visual perception, we incorporate the PCQI loss function, defined as:
$\mathcal{L}_{pcqi}(G_{R2}(G_{R1}(x)), x) = e^{-1 \times PCQI(G_{R2}(G_{R1}(x)),\, x)},$ (9)
where the PCQI function is defined in [27].
In addition, the color balance loss function $\mathcal{L}_{cb}$ is used to adjust and optimize the color distribution in images to achieve a more balanced or desired color representation; it is defined as:
$\mathcal{L}_{cb}(y) = \sum_{C \in \{R, G, B\}} \Big| \frac{1}{H \times W} \sum_{i,j} \big(y_C(i, j) - M(y)\big) \Big|,$ (10)
where $y$ is the output map of $G_{R1}$, $y_C$ denotes the color component for $C \in \{R, G, B\}$, $H$ and $W$ are the height and width of $y$, $M(y)$ is the pixel mean across the three color channels of $y$, and $i$ and $j$ are the row and column coordinates of the image $y$. Finally, we use the intermediate output loss $\mathcal{L}_{inter}$ to assess the color similarity between the light field image produced by $G_{LF1}(x)$ and the image generated by $G_{R1}(x)$, as defined by:
$\mathcal{L}_{inter}(x) = \sum_{i,j} \big| G_{R1}(x)(i, j) - G_{LF1}(x)(i, j) \big|.$ (11)
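The color balance and intermediate output terms above translate into a few lines of tensor arithmetic, as sketched below. The batch handling, the sum/mean reductions, and the interpretation of the absolute values are assumptions; images are expected as (B, 3, H, W) tensors.

```python
import torch

def color_balance_loss(y):
    """Penalize deviation of each channel mean from the global pixel mean M(y).

    y: restored image tensor of shape (B, 3, H, W).
    """
    channel_means = y.mean(dim=(2, 3))                      # per-channel means
    global_mean = channel_means.mean(dim=1, keepdim=True)   # M(y)
    return (channel_means - global_mean).abs().sum(dim=1).mean()

def inter_output_loss(g_r1_x, g_lf1_x):
    """Pixel-wise absolute difference between the restored image G_R1(x)
    and the learned light field guidance G_LF1(x)."""
    return (g_r1_x - g_lf1_x).abs().sum(dim=(1, 2, 3)).mean()
```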
The loss functions of RCycleGAN, $\mathcal{L}_{LOSS}^{RCycleGAN}$, and LFCycleGAN, $\mathcal{L}_{LOSS}^{LFCycleGAN}$, are defined as:
$\mathcal{L}_{LOSS}^{RCycleGAN} = \mathcal{L}_{GAN}^{RCycleGAN} + \mathcal{L}_{cyc}^{RCycleGAN} + \mathcal{L}_{id}^{RCycleGAN} + \mathcal{L}_{per}^{RCycleGAN} + \mathcal{L}_{pcqi}^{RCycleGAN} + \mathcal{L}_{cb}^{RCycleGAN} + \mathcal{L}_{inter},$ (12)
$\mathcal{L}_{LOSS}^{LFCycleGAN} = \mathcal{L}_{GAN}^{LFCycleGAN} + \mathcal{L}_{cyc}^{LFCycleGAN} + \mathcal{L}_{id}^{LFCycleGAN} + \mathcal{L}_{per}^{LFCycleGAN}.$ (13)
For the overall process of jointly training our RCycleGAN and LFCycleGAN, the complete loss function employed to train the proposed dual-CycleGAN model is defined as:
$\mathcal{L}_{Loss}^{Total} = \mathcal{L}_{LOSS}^{RCycleGAN} + \mathcal{L}_{LOSS}^{LFCycleGAN}.$ (14)
During training, the perceptual loss is iteratively calculated as $\mathcal{L}_{per}(y, y_{LF})$, as depicted in Figure 2; minimizing this perceptual loss can thus be viewed as the linkage between the two CycleGANs. Throughout the overall network training process, the iteratively updated light field image $y_{LF}$ serves as the guidance that steers the restoration of the input underwater image.
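To illustrate how the two CycleGANs are coupled during joint training, the sketch below outlines one generator update. It assumes the four generators G_R1, G_R2, G_LF1, and G_LF2 have already been instantiated and reuses the helper sketches above (cycle_consistency_loss, VGGPerceptualLoss, color_balance_loss, inter_output_loss); the differentiable light field stand-in, the placeholder adversarial/identity/PCQI functions, the unit loss weights, and the Adam beta values are all assumptions, so this should be read as an illustration of the data flow rather than the authors' training code.

```python
import itertools
import torch
import torchvision.transforms as T

def light_field(img):
    """Differentiable stand-in for the multi-scale light field module
    (three Gaussian blurs averaged; kernel size and sigmas are assumptions)."""
    blurs = [T.GaussianBlur(kernel_size=61, sigma=s)(img) for s in (5.0, 15.0, 30.0)]
    return torch.stack(blurs, dim=0).mean(dim=0)

perceptual = VGGPerceptualLoss()  # sketched earlier

opt_G = torch.optim.Adam(
    itertools.chain(G_R1.parameters(), G_R2.parameters(),
                    G_LF1.parameters(), G_LF2.parameters()),
    lr=2e-5, betas=(0.5, 0.999))  # learning rate from the paper; betas assumed

def joint_generator_step(x, y_air):
    """One joint generator update; discriminator updates are omitted."""
    y = G_R1(x)            # restored in-air estimate of the underwater input
    y_lf = light_field(y)  # "baseline" light field of the restoration
    lf_enh = G_LF1(x)      # learned ("enhanced") light field guidance

    # Shared cycle term spans both the restoration and light field branches.
    shared_cyc = cycle_consistency_loss(G_R1, G_R2, G_LF1, G_LF2, x, y_air)

    # RCycleGAN terms; gan_loss_R, id_loss_R, pcqi_loss are hypothetical helpers.
    loss_r = (gan_loss_R(x, y_air) + shared_cyc + id_loss_R(x, y_air)
              + perceptual(y, y_lf) + pcqi_loss(G_R2(y), x)
              + color_balance_loss(y) + inter_output_loss(y, lf_enh))

    # LFCycleGAN terms; gan_loss_LF, id_loss_LF, per_loss_LF are hypothetical.
    loss_lf = gan_loss_LF(x, y_air) + id_loss_LF(x, y_air) + per_loss_LF(x, y_air)

    opt_G.zero_grad()
    (loss_r + loss_lf).backward()
    opt_G.step()
```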

4. Experimental Results

To train the dual-CycleGAN for underwater image restoration with unpaired images, four datasets from different domains are used. Three of them are widely recognized underwater image enhancement datasets: UFO-120 [11], EUVP [28], and UIEB [10]. The DIV2K (DIVerse 2K) dataset [29] is used as the in-air domain dataset. The proposed method was implemented in Python using PyTorch 2.0.1. Model optimization was performed with the Adam optimizer [30], with the initial learning rate set to 0.00002. The training input patch size was 256 × 256, and the model was trained for 400 epochs.
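For completeness, the skeleton below wires the reported training settings (Adam, initial learning rate 0.00002, 256 × 256 patches, 400 epochs) around the joint update sketched in Section 3. The dataset class, directory names, batch size, and worker count are placeholders and are not taken from the paper.

```python
from torch.utils.data import DataLoader

EPOCHS, PATCH_SIZE, LEARNING_RATE = 400, 256, 2e-5   # settings reported above

# UnpairedUnderwaterDataset is a hypothetical dataset class that yields
# (underwater_patch, in_air_patch) pairs sampled independently from the
# underwater datasets and DIV2K; directory names are placeholders.
train_loader = DataLoader(
    UnpairedUnderwaterDataset(underwater_dir="data/underwater",
                              air_dir="data/DIV2K",
                              patch_size=PATCH_SIZE),
    batch_size=1, shuffle=True, num_workers=4)

for epoch in range(EPOCHS):
    for x_uw, y_air in train_loader:
        joint_generator_step(x_uw, y_air)   # generator update sketched earlier
        # ...updates for the discriminators D_R1, D_R2, D_LF1, D_LF2 go here...
```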

4.1. Quantitative Comparisons

To assess the performance of our method quantitatively, three well-known metrics are used: PSNR (peak signal-to-noise ratio), SSIM (structural similarity index measure), and UIQM (underwater image quality measure) [31]. UIQM leverages multiple factors influencing underwater image quality, integrating three components: underwater image color measurement (UICM), underwater image sharpness measurement (UISM), and underwater image contrast measurement (UIConM); a higher UIQM value typically indicates better image quality. We compare our method against five state-of-the-art deep learning-based underwater image enhancement techniques: Deep SESR [11], Shallow-UWnet [12], UGAN [18], WaterNet [17], and PUGAN [20]. Among these, the first two are end-to-end architectures that require paired training data, while the last three are GAN-based and do not. As shown in Table 1, our method exhibits better or comparable quantitative performance. Our method achieves a PSNR only 1.89 dB lower than Deep SESR on the UFO-120 dataset, while requiring just 12.4% of Deep SESR's GFLOPs; moreover, Deep SESR requires paired data for training. Compared to PUGAN, our method achieves higher PSNR and SSIM values and comparable UIQM, while incurring only about 25% of PUGAN's computational cost, a significant improvement in computational efficiency.
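For reproducibility of the full-reference scores in Table 1, PSNR and SSIM can be computed with scikit-image as sketched below; this is a generic evaluation snippet rather than the authors' evaluation code, and UIQM (a no-reference metric) requires a separate implementation of [31].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored, reference):
    """PSNR and SSIM for a restored image against its reference.

    Both images are expected as uint8 RGB arrays of identical shape.
    """
    psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
    ssim = structural_similarity(reference, restored,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```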
Table 2 provides a detailed comparison of the GFLOPs (giga floating point operations) and the number of network parameters required by the evaluated methods. Our model stands out by demonstrating a significant reduction in both GFLOPs and network parameters when compared to the other methods. This results in a more computationally efficient model that retains or even surpasses the performance of state-of-the-art methods in terms of underwater image restoration quality. By achieving a lighter model complexity without compromising performance, our approach not only addresses computational resource limitations but also demonstrates scalability, making it highly suitable for practical applications with constrained resources. This efficiency, alongside its superior restoration capability, is one of the most notable strengths of our proposed method, as it ensures faster processing times and lower memory usage, thus making it a highly practical solution for real-world deployment.

4.2. Qualitative Comparisons

As illustrated in Figure 4, our proposed method significantly outperforms state-of-the-art approaches, particularly in preserving image details and enhancing color representation. Moreover, benefiting from the proposed guidance learning, our method recovers more faithful color representation than the other methods.

4.3. Ablation Study

To assess the contribution of each component in the proposed model, we performed ablation studies as follows. We removed the guidance learning component and the associated loss functions ($\mathcal{L}_{inter}$, $\mathcal{L}_{pcqi}$, and $\mathcal{L}_{cb}$) from the proposed framework to evaluate the effectiveness of the learned light field guidance. The results of the ablation study are presented in Table 3. The findings show that removing LFCycleGAN from the proposed method results in a decline in image quality, including blur, loss of structure, and color distortion. As shown in Table 3, the complete method achieves the best performance in both quantitative and qualitative restoration. All components, including the loss functions and the guidance learning component, are essential for optimal underwater image restoration; removing any of them leads to a significant degradation in image quality.

4.4. Discussion

Compared with the existing literature, the proposed method outperforms GAN-based models, such as WaterNet and PUGAN, in quantitative metrics like PSNR, SSIM, and UIQM, while maintaining lower computational complexity. Importantly, the model’s reliance on unpaired training data overcomes a major challenge in applying underwater restoration models to real-world scenarios, highlighting its methodological flexibility and applicability. Compared to traditional physics-based approaches or shallow networks like Shallow-UWnet, the dual-CycleGAN leverages light field guidance to enhance color authenticity and detail restoration. This improvement likely stems from the light field module’s ability to capture additional information related to light propagation, effectively mitigating the scattering and absorption effects prevalent in underwater environments. Additionally, the CycleGAN’s adversarial learning framework resolves the data pairing challenges inherent in underwater scenarios, offering greater stability compared to conventional GAN models.
However, several limitations of this study warrant further investigation. First, while the light field guidance module significantly improves restoration, its reliability in highly turbid or dynamic water bodies requires further validation. Second, the model’s generalizability across diverse underwater scenarios may benefit from additional real-world training data. Furthermore, although the framework’s computational demands are relatively low, its real-time application in resource-constrained environments, such as underwater drones, remains an area for optimization. Future research directions include designing more efficient and generalizable light field generation methods to handle diverse underwater conditions. Incorporating advanced deep learning techniques, such as self-attention mechanisms or variational autoencoders, could further enhance restoration detail and stability. Additionally, developing multi-frame image restoration techniques for dynamic underwater scenarios presents another promising avenue.

5. Conclusions

This paper proposes a dual-CycleGAN architecture, extending and refining concepts established in previous works on underwater image enhancement and restoration. The model employs a dynamic guided learning approach, utilizing one CycleGAN to generate light field information as a reference for the second CycleGAN, which performs the actual restoration. This dual-CycleGAN structure addresses persistent challenges in underwater image processing, such as color attenuation, contrast loss, and noise, which are often exacerbated by the unique optical properties of water. Whereas most existing methods rely on paired datasets or heuristic algorithms, our approach circumvents the need for paired training data, enhancing flexibility and applicability across diverse underwater environments. By transforming underwater images into their in-air counterparts, our model achieves substantial improvements in color correction, detail preservation, and texture recovery, as validated by comprehensive experimental results. The proposed model not only outperforms existing state-of-the-art methods in both quantitative metrics and qualitative visual assessments but also demonstrates comparable or superior performance in recovering crucial image features. Moreover, its lower computational complexity facilitates faster processing, making it highly scalable and suitable for real-time applications such as AUVs and remote sensing systems. This computational efficiency positions our method as a promising solution for large-scale deployment in practical scenarios where computational resources are limited.

Author Contributions

Conceptualization, C.-H.Y.; Methodology, C.-H.Y.; Software, Y.-Y.L.; Validation, Y.-Y.L.; Formal analysis, C.-H.Y.; Investigation, Y.-Y.L.; Data curation, Y.-Y.L.; Writing—original draft, Y.-Y.L.; Writing—review & editing, C.-H.Y. and W.-J.H.; Visualization, Y.-Y.L.; Supervision, C.-H.Y. and W.-J.H.; Project administration, C.-H.Y.; Funding acquisition, W.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council, grant numbers MOST 110-2221-E-003-005-MY3, MOST 111-2221-E-003-007-MY3, and NSTC 113-2221-E-003-018-MY3.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in [10,11,28,29].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Peng, L.; Zhu, C.; Bian, L. U-Shape transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef]
  2. Liu, X.; Chen, Z.; Xu, Z.; Zheng, Z.; Ma, F.; Wang, Y. Enhancement of underwater images through parallel fusion of transformer and CNN. J. Mar. Sci. Eng. 2024, 12, 1467. [Google Scholar] [CrossRef]
  3. Yeh, C.-H.; Lin, C.-H.; Lin, M.-H.; Kang, L.-W.; Huang, C.-H.; Chen, M.-J. Deep learning-based compressed image artifacts reduction based on multi-scale image fusion. Inf. Fusion 2021, 67, 195–207. [Google Scholar] [CrossRef]
  4. Yeh, C.-H.; Lai, Y.-W.; Lin, Y.-Y.; Chen, M.-J.; Wang, C.-C. Underwater image enhancement based on light field guided rendering network. J. Mar. Sci. Eng. 2024, 12, 1217. [Google Scholar] [CrossRef]
  5. Chiang, Y.-W.; Chen, Y.-C. Underwater image enhancement by wavelength compensation and dehazing. IEEE Trans. Image Process. 2012, 21, 1756–1769. [Google Scholar] [CrossRef] [PubMed]
  6. Li, C.; Guo, J.; Pang, Y.; Chen, S.; Wang, J. Single underwater image restoration by blue-green channels dehazing and red channel correction. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 20–25 March 2016; pp. 1731–1735. [Google Scholar]
  7. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2018, 27, 379–393. [Google Scholar] [CrossRef] [PubMed]
  8. Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
  9. Dudhane, A.; Hambarde, P.; Patil, P.W.; Murala, S. Deep underwater image restoration and beyond. IEEE Signal Process. Lett. 2020, 27, 675–679. [Google Scholar] [CrossRef]
  10. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  11. Islam, M.J.; Luo, P.; Sattar, J. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv 2020, arXiv:2002.01155. [Google Scholar]
  12. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-uwnet: Compressed model for underwater image enhancement (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021. [Google Scholar]
  13. Zhou, J.; Liu, Q.; Jiang, Q.; Ren, W.; Lam, K.-M.; Zhang, W. Underwater camera: Improving visual perception via adaptive dark pixel prior and color correction. Int. J. Comput. Vis. 2023, 1–19. [Google Scholar] [CrossRef]
  14. Pramanick, A.; Sur, A.; Saradhi, V.V. Harnessing multi-resolution and multi-scale attention for underwater image restoration. arXiv 2024, arXiv:2408.09912. [Google Scholar]
  15. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y. Generative adversarial nets. In Proceedings of the Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  16. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar]
  17. Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robot. Autom. Lett. 2018, 3, 387–394. [Google Scholar] [CrossRef]
  18. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial network. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar]
  19. Guo, Y.; Li, H.; Zhuang, P. Underwater image enhancement using a multiscale dense generative adversarial network. IEEE J. Ocean. Eng. 2020, 45, 862–870. [Google Scholar] [CrossRef]
  20. Cong, R.; Yang, W.; Zhang, W.; Li, C.; Guo, C.-L.; Huang, Q.; Kwong, S. PUGAN: Physical model-guided underwater image enhancement using GAN with dual-discriminators. IEEE Trans. Image Process. 2023, 32, 4472–4485. [Google Scholar] [CrossRef] [PubMed]
  21. Ye, T.; Chen, S.; Liu, Y.; Ye, Y.; Chen, E.; Li, Y. Underwater light field retention: Neural rendering for underwater imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 488–497. [Google Scholar]
  22. Rahman, Z.; Jobson, D.J.; Woodell, G.A. Multi-scale retinex for color image enhancement. In Proceedings of the IEEE International Conference on Image Processing, Lausanne, Switzerland, 16–19 September 1996; Volume 3, pp. 1003–1006. [Google Scholar]
  23. Chang, B.; Zhang, Q.; Pan, S.; Meng, L. Generating handwritten chinese characters using cyclegan. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 199–207. [Google Scholar]
  24. Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. arXiv 2017, arXiv:1701.04862. [Google Scholar]
  25. Srivastava, A.; Valkov, L.; Russell, C.; Gutmann, M.-U.; Sutton, C. Veegan: Reducing mode collapse in gans using implicit variational learning. In Advances in Neural Information Processing Systems 30; Long Beach Convention & Entertainment Center: Long Beach, CA, USA, 2017. [Google Scholar]
  26. Li, W.; Fan, L.; Wang, Z.; Ma, C.; Cui, X. Tackling mode collapse in multi-generator gans with orthogonal vectors. Pattern Recognit. 2021, 110, 107646. [Google Scholar] [CrossRef]
  27. Wang, S.; Ma, K.; Yeganeh, H.; Wang, Z.; Li, W. A patch-structure representation method for quality assessment of contrast changed images. IEEE Signal Process. Lett. 2015, 22, 2387–2390. [Google Scholar] [CrossRef]
  28. Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  29. Agustsson, E.; Timofte, R. NTIRE 2017 Challenge on single image super-resolution: Dataset and study. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  30. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
  31. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Ocean. Eng. 2015, 41, 541–551. [Google Scholar] [CrossRef]
Figure 1. Illustration of the output from the light field module.
Figure 2. The proposed unsupervised adversarial learning framework consisting of dual-CycleGAN with unpaired training images.
Figure 3. The architecture of the generator in the proposed deep single underwater image restoration network.
Figure 4. Qualitative evaluation results on the UFO-120 dataset.
Table 1. Quantitative performance assessments on UFO-120, EUVP, and UIEB datasets.

Method         | UFO-120: PSNR / SSIM / UIQM | EUVP: PSNR / SSIM / UIQM | UIEB: PSNR / SSIM / UIQM
Deep SESR      | 27.15 / 0.84 / 3.13 | 25.25 / 0.75 / 2.98 | 19.26 / 0.73 / 2.97
Shallow-UWnet  | 25.20 / 0.73 / 2.85 | 27.39 / 0.83 / 2.98 | 18.99 / 0.67 / 2.77
UGAN           | 23.45 / 0.80 / 3.04 | 23.67 / 0.67 / 2.70 | 20.68 / 0.84 / 3.17
WaterNet       | 22.46 / 0.79 / 2.83 | 20.14 / 0.68 / 2.55 | 19.11 / 0.80 / 3.04
PUGAN          | 23.70 / 0.82 / 2.85 | 24.05 / 0.74 / 2.94 | 21.67 / 0.78 / 3.28
Dual-CycleGAN  | 25.23 / 0.84 / 3.06 | 27.39 / 0.91 / 2.97 | 22.12 / 0.85 / 3.26
Table 2. Complexity evaluations for different methods.

Method         | FLOPs    | Parameters
Deep SESR      | 146.10 G | 2.46 M
Shallow-UWnet  | 21.63 G  | 0.22 M
UGAN           | 38.97 G  | 57.17 M
WaterNet       | 193.70 G | 24.81 M
PUGAN          | 72.05 G  | 95.66 M
Dual-CycleGAN  | 18.15 G  | 54.41 M
Table 3. Quantitative results of ablation studies on the UIEB dataset.

Method                                          | PSNR  | SSIM | UIQM
Complete Dual-CycleGAN                          | 22.12 | 0.85 | 3.26
(w/o) $\mathcal{L}_{inter}$                     | 20.81 | 0.82 | 2.99
(w/o) $\mathcal{L}_{pcqi}$                      | 20.75 | 0.84 | 2.96
(w/o) $\mathcal{L}_{cb}$                        | 20.8  | 0.84 | 2.96
(w/o) $\mathcal{L}_{pcqi}$ & $\mathcal{L}_{cb}$ | 20.88 | 0.84 | 2.93
