Unpaired Underwater Image Enhancement Based on CycleGAN

Abstract: Underwater image enhancement recovers degraded underwater images to produce corresponding clear images. Image enhancement methods based on deep learning usually train the model on paired data, but such paired data, i.e., degraded images and their corresponding clear images, are difficult to capture simultaneously in the underwater environment. In addition, retaining detailed information in the enhanced image is another critical problem. To address these issues, we propose a novel unpaired underwater image enhancement method based on a cycle generative adversarial network (UW-CycleGAN) to recover degraded underwater images. The proposed UW-CycleGAN model includes three main modules: (1) A content loss regularizer is added to the generator of CycleGAN, which constrains the detailed information in a degraded image to remain in the corresponding generated clear image; (2) A blur-promoting adversarial loss regularizer is introduced into the discriminator to reduce the blur and noise in the generated clear images; (3) A DenseNet block is added to the generator to retain more information from each feature map in the training stage. Finally, experimental results on two unpaired underwater image datasets show satisfactory performance compared to state-of-the-art image enhancement methods, which demonstrates the effectiveness of the proposed model.


Introduction
With the rapid development of marine resource utilization, underwater robots are needed to replace humans in complex underwater environments. An underwater robot relies mainly on its vision to accomplish tasks such as object recognition, localization, 3D reconstruction, and route guidance. Because water absorbs and scatters light, underwater images usually suffer from color distortion and low contrast. Enhancing underwater images has therefore become an urgent problem for practical underwater applications [1].
Over recent decades, underwater image enhancement has attracted an increasing amount of attention. Wang et al. [2] divided underwater image enhancement methods into three main categories: spatial-domain methods, transform-domain methods, and the popular deep learning-based methods.
The spatial-domain methods usually adjust the grayscale range of an image to enhance its contrast and reduce the color distortion [3]. Traditional methods include gray world [4], white balance [4], automatic white balance [5], histogram equalization, adaptive histogram equalization [6], contrast limited adaptive histogram equalization [7], and its variations. Although these methods have had success in enhancing degraded images, they still have significant limitations for severely degraded underwater images, where they introduce red artifacts and noise.
The transform-domain methods transfer an underwater image to the frequency domain and then enhance the image contrast by amplifying the high-frequency information and suppressing the low-frequency information. Classic transform-domain methods include the low-pass filter [8], high-pass filter [9], homomorphic filter [10], and wavelet transform [11][12][13]. Although these methods decrease the noise and enhance the contrast of an underwater image, their color correction performance is poor.
The above two categories of methods enhance each underwater image independently, without any learning procedure. Deep learning-based methods instead exploit an end-to-end automatic training mechanism, which learns the intrinsic underwater features from a set of underwater images. The authors of [14] replaced the handcrafted features with nonparametric deep features for the image representation. Other researchers [15,16] introduced the convolutional neural network (CNN) into underwater image enhancement applications. A residual CNN was further proposed in [17]. Furthermore, the authors of [18] provided a deep pixel-to-pixel network by designing an encoding-decoding framework. Domain adversarial learning was utilized to enhance underwater images in [19]. The quality of visual underwater scenes was improved using Generative Adversarial Networks (GAN) in [20,21], and a fusion adversarial network was then proposed in [22]. Finally, Hu et al. [23] introduced the natural image quality evaluation to a supervised generative adversarial network.
These methods improved the visual effect and quality of underwater images, but they require a large amount of paired data, i.e., each degraded image must have a corresponding clear image. Paired data are difficult to obtain in an underwater environment, which hampers underwater image enhancement. Therefore, researchers usually use synthetic data to construct paired data. Figure 1 shows some samples of unpaired underwater images. To remove the need for paired data in deep learning-based underwater image enhancement, we propose a novel underwater cycle generative adversarial network (UW-CycleGAN) for image enhancement, which needs only a set of unpaired degraded and clear underwater images to train the model. A brief illustration of UW-CycleGAN is shown in Figure 2.
The main contributions of this paper are briefly summarized as follows:
• We introduce a content loss regularizer into the generator in CycleGAN, which keeps more detailed information in the corresponding generated clear image. This strategy is different from CartoonGAN [24];
• We add a blur-promoting adversarial loss regularizer into the discriminator in CycleGAN, which reduces the effects of blur and noise and enhances the image clarity;
• We exploit the improved DenseNet block in the generator to strengthen the forward transfer of feature maps, so that every feature map can be utilized;
• We test our proposed UW-CycleGAN on different types of underwater images and obtain a satisfactory performance. We develop an end-to-end underwater image enhancement system.
The structure of this paper is organized as follows: The necessary knowledge about underwater image enhancement is reviewed in Section 2. An improved underwater CycleGAN model for unpaired data, so-called UW-CycleGAN, is proposed in Section 3. The experimental results on two underwater image datasets are illustrated in Section 4. Finally, we conclude this paper in Section 5.

Underwater Image Enhancement
As mentioned above, capturing paired data in the underwater environment is difficult. To study the intrinsic relationship between the degraded image and the corresponding clear image, some researchers designed a simplified physical model according to the refraction, scattering, and attenuation properties of light [25]:
$$I_\lambda(x) = J_\lambda(x)\, t_\lambda(x) + B_\lambda \left(1 - t_\lambda(x)\right), \tag{1}$$
where $I_\lambda(x)$ denotes the degraded image captured by underwater cameras, $J_\lambda(x)$ is the corresponding restored clear image, $t_\lambda(x)$ is the medium transmission map, $B_\lambda$ represents the homogeneous background light, and $\lambda$ denotes the light wavelength. To recover $J_\lambda(x)$, the key problem of the traditional physical models is to estimate $t_\lambda(x)$ and $B_\lambda$, since only the image $I_\lambda(x)$ is known. Although this physical model does not need paired data, it requires assumptions and prior knowledge to estimate $t_\lambda(x)$ and $B_\lambda$, which severely limits its practical applications.
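The role of the transmission map and the background light can be illustrated with a toy single-channel example. The sketch below assumes the commonly used form of the simplified model, $I = J\,t + B\,(1 - t)$, and inverts it when $t$ and $B$ are known; the function names and the floor `t_min` are hypothetical and are not part of the original method.

```python
def degrade(J, t, B):
    """Forward model for one color channel: I = J * t + B * (1 - t)."""
    return [[j * tx + B * (1.0 - tx) for j, tx in zip(j_row, t_row)]
            for j_row, t_row in zip(J, t)]


def restore(I, t, B, t_min=0.1):
    """Invert the model: J = (I - B) / max(t, t_min) + B.

    t_min is a hypothetical floor that avoids dividing by a near-zero
    transmission value.
    """
    return [[(i - B) / max(tx, t_min) + B for i, tx in zip(i_row, t_row)]
            for i_row, t_row in zip(I, t)]


# Toy 2x2 channel with known transmission map and background light.
J = [[0.8, 0.6], [0.4, 0.2]]
t = [[0.5, 0.5], [0.25, 0.25]]
B = 0.1

I = degrade(J, t, B)
J_hat = restore(I, t, B)
```

In practice neither `t` nor `B` is observed, which is exactly why the physical model depends on priors and why a learned, unpaired approach is attractive.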
In recent years, many researchers have applied CNNs to process underwater images and achieved good results in underwater image enhancement applications. However, CNN-based methods need paired data to train their network models, so researchers have to use synthetic data instead. Fortunately, CycleGAN can utilize unpaired data for image style conversion, which offers a new direction for underwater image enhancement.

Underwater CycleGAN (UW-CycleGAN)
Deep learning-based image enhancement methods usually need paired underwater images to train network models. To solve this problem, we propose a CycleGAN-based underwater image enhancement method (UW-CycleGAN), which can utilize unpaired data to train its model.
Suppose we have an unpaired degraded image set $X$ and a clear image set $Y$. One complete procedure of UW-CycleGAN is shown in Figure 2:
(1) The mapping function $G$ generates the clear image $G(x)$ from $x \in X$.
(2) Another mapping function $F$ reconstructs the degraded image $x$ by $G(x) \to F(G(x))$.
(3) Discriminator $D_Y$ judges whether the generated image $G(x)$ and the clear image $y$ derive from the same distribution.
In addition, $y \to F(y) \to G(F(y))$ and $D_X$ form the analogous inverse process, which ensures the invertibility of the model. We display some samples of $x$, $G(x)$, and $F(G(x))$ in Figure 3.

Loss Function
Zhu et al. [26] proposed a CycleGAN framework to achieve unpaired image-to-image translation, which consisted of adversarial loss and cycle consistency loss.
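The adversarial loss restricts the generated images $G(x)$ and $F(y)$ to derive from the same distributions as the clear image $y$ and the degraded image $x$, respectively:
$$\begin{aligned} \mathcal{L}_{GAN}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim P_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim P_{data}(x)}[\log(1 - D_Y(G(x)))], \\ \mathcal{L}_{GAN}(F, D_X, Y, X) &= \mathbb{E}_{x \sim P_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim P_{data}(y)}[\log(1 - D_X(F(y)))], \end{aligned} \tag{2}$$
where $P_{data}(x)$ and $P_{data}(y)$ represent the distributions of underwater degraded images and clear images, respectively.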
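As a small numerical illustration, the sketch below evaluates the empirical adversarial loss from discriminator outputs, assuming the standard CycleGAN form of Zhu et al. [26], i.e., the mean of $\log D_Y(y)$ plus the mean of $\log(1 - D_Y(G(x)))$. The function name and the guard `eps` are hypothetical.

```python
import math


def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Empirical L_GAN = mean(log D(y)) + mean(log(1 - D(G(x)))).

    d_real: discriminator outputs in (0, 1) on real clear images y.
    d_fake: discriminator outputs in (0, 1) on generated images G(x).
    eps is a hypothetical numerical guard against log(0).
    """
    real_term = sum(math.log(d + eps) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d + eps) for d in d_fake) / len(d_fake)
    return real_term + fake_term


# A confident, correct discriminator pushes the loss toward 0, its maximum.
strong_d = adversarial_loss(d_real=[0.99, 0.98], d_fake=[0.01, 0.02])

# An undecided discriminator (0.5 everywhere) gives 2 * log(0.5).
undecided_d = adversarial_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator maximizes this quantity while the generator minimizes it, so the generator succeeds when the discriminator can no longer separate $G(x)$ from $y$.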
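The cycle consistency loss ensures that the reconstructed images are similar to the input images:
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim P_{data}(x)}\left[\| F(G(x)) - x \|_1\right] + \mathbb{E}_{y \sim P_{data}(y)}\left[\| G(F(y)) - y \|_1\right], \tag{3}$$
where $\| \cdot \|_1$ denotes the $\ell_1$-norm.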
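The cycle consistency term can be sketched on flattened toy images as follows. The two "generators" are hypothetical stand-ins (a brightening and a darkening map with clipping), and the $\ell_1$ distance is averaged over pixels, which is a scaling assumption of this sketch.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(xs, ys, G, F):
    """Empirical L_cyc = mean ||F(G(x)) - x||_1 + mean ||G(F(y)) - y||_1."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def G(img):
    """Hypothetical stand-in generator X -> Y: brighten and clip to [0, 1]."""
    return [min(1.0, p + 0.2) for p in img]


def F(img):
    """Hypothetical stand-in generator Y -> X: darken and clip to [0, 1]."""
    return [max(0.0, p - 0.2) for p in img]


xs = [[0.1, 0.3, 0.5]]  # degraded samples
ys = [[0.6, 0.7, 0.1]]  # clear samples, unpaired with xs

loss = cycle_consistency_loss(xs, ys, G, F)
```

Here the forward cycle is reconstructed almost exactly, while the pixel 0.1 in `ys` is clipped to 0 by `F` and cannot be recovered by `G`, so only the backward cycle contributes to the loss.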

Content Loss
The cycle consistency loss (3) only minimizes the difference between the input image $x$ (or $y$) and its reconstructed image $F(G(x))$ (or $G(F(y))$), and ignores whether the generated image $G(x)$ (or $F(y)$) is visually similar to $x$ (or $y$). Therefore, we add a content loss regularizer measured by the $\ell_1$-norm:
$$\mathcal{L}_{content}(G, F) = \mathbb{E}_{x \sim P_{data}(x)}\left[\| G(x) - x \|_1\right] + \mathbb{E}_{y \sim P_{data}(y)}\left[\| F(y) - y \|_1\right]. \tag{4}$$
However, because of the element-wise subtraction, the above function makes the generated image $G(x)$ (or $F(y)$) too similar to the input image $x$ (or $y$), retaining almost all the information of the input image. We want to keep the detailed information of the input image $x$ (or $y$) unchanged while calibrating the image color to enhance the visual quality of the generated image $G(x)$ (or $F(y)$).
To achieve this, a pretrained VGG19 network is used to extract the conv4_4 layer feature maps of the input and generated images. We again employ the $\ell_1$-norm to measure the content loss, since the $\ell_1$-norm is more robust to noise and outliers and can therefore recover the underwater image details well. Thus, the new content loss regularizer is rewritten as
$$\mathcal{L}_{content}(G, F) = \mathbb{E}_{x \sim P_{data}(x)}\left[\| VGG(G(x)) - VGG(x) \|_1\right] + \mathbb{E}_{y \sim P_{data}(y)}\left[\| VGG(F(y)) - VGG(y) \|_1\right], \tag{5}$$
where $VGG(\cdot)$ denotes the VGG19 feature map in this paper.
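The reason for comparing features rather than pixels can be seen in a toy example. In the sketch below, `toy_features` is a hypothetical stand-in for the VGG19 conv4_4 features (it simply takes differences between neighboring pixels, so it responds to structure but not to a constant color offset); no pretrained network is loaded.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def toy_features(img):
    """Hypothetical stand-in for VGG(.): differences of neighboring pixels."""
    return [b - a for a, b in zip(img, img[1:])]


def content_loss(x, generated, features):
    """Feature-space content loss for one image pair."""
    return l1(features(generated), features(x))


x = [0.2, 0.4, 0.3, 0.7]            # degraded 1-D "image"
g_shift = [p + 0.1 for p in x]      # same details, corrected color level
g_flat = [0.4, 0.4, 0.4, 0.4]       # details destroyed

pixel_shift = l1(g_shift, x)                          # pixel loss punishes the color change
feat_shift = content_loss(x, g_shift, toy_features)   # feature loss does not
feat_flat = content_loss(x, g_flat, toy_features)     # but it punishes lost detail
```

A pixel-wise loss penalizes the desired color correction, whereas the feature-space loss leaves the generator free to change color as long as the structure of the input is preserved.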

Blur-Promoting Adversarial Loss
Although the content of the image generated by $G$ is consistent with its corresponding input image, a large amount of noise and blur is generated at the same time, which affects the visual quality. We therefore need the discriminator to be sensitive to blur. To this end, a blur dataset $Z$ is constructed by adding Gaussian blur to the clear image dataset $Y$. The discriminator $D_Y$ should judge $z \in Z$ as a fake image and $y$ as a real image, so that the images generated by generator $G$ become clearer. With this idea, the adversarial loss can be rewritten as follows:
$$\mathcal{L}_{blur}(G, D_Y, X, Y, Z) = \mathbb{E}_{y \sim P_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim P_{data}(x)}[\log(1 - D_Y(G(x)))] + \mathbb{E}_{z \sim P_{data}(z)}[\log(1 - D_Y(z))], \tag{6}$$
where $P_{data}(z)$ represents the distribution of underwater clear images with Gaussian blur.
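The construction of the blur set and the extra discriminator term can be sketched as follows. The kernel radius and standard deviation are hypothetical, since the text does not state the blur parameters, and the loss assumes a third term, the mean of $\log(1 - D_Y(z))$, added to the standard adversarial loss.

```python
import math


def gaussian_kernel(radius=2, sigma=1.0):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve a 1-D signal with the kernel, replicating the border pixels."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [sum(w * values[min(max(i + k - radius, 0), last)]
                for k, w in enumerate(kernel))
            for i in range(len(values))]


def gaussian_blur(img, radius=2, sigma=1.0):
    """Separable Gaussian blur of a 2-D grayscale image: rows, then columns."""
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def blur_adversarial_loss(d_real, d_fake, d_blur, eps=1e-12):
    """mean log D(y) + mean log(1 - D(G(x))) + mean log(1 - D(z))."""
    def mean_log(values):
        return sum(math.log(v + eps) for v in values) / len(values)
    return (mean_log(d_real)
            + mean_log([1.0 - d for d in d_fake])
            + mean_log([1.0 - d for d in d_blur]))


# A clear image y with a sharp vertical edge, and its blurred counterpart z.
y = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
z = gaussian_blur(y)

# The discriminator objective is higher when blurred images are rejected.
rejects_blur = blur_adversarial_loss([0.9], [0.1], d_blur=[0.1])
accepts_blur = blur_adversarial_loss([0.9], [0.1], d_blur=[0.9])
```

Because the discriminator is rewarded for rejecting blurred versions of clear images, the generator can only fool it by producing sharp outputs.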

Full Loss Function
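Finally, we construct the full loss function of UW-CycleGAN as follows,
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{blur}(G, D_Y, X, Y, Z) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \mathcal{L}_{cyc}(G, F) + \mathcal{L}_{content}(G, F), \tag{7}$$
where the generators $G$, $F$ and the discriminators $D_X$, $D_Y$ can be updated by
$$G^*, F^* = \arg \min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y). \tag{8}$$
It should be noted that we weight all loss terms equally and do not introduce additional hyperparameters to tune the experimental results.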

Network Architectures
As illustrated in Figure 2, our UW-CycleGAN network architecture consists of two main generators and two discriminators.
Generators $G$ and $F$ have the same network structure with different parameters and directly adopt an encoder-decoder structure. First, one flat convolution stage (kernel size 7 × 7, stride 1) and two down-convolution stages (kernel size 3 × 3, stride 2) spatially compress and encode the input images. Then, three DenseNet blocks are used to transfer the feature maps and preserve their high-level features. The detailed structure of the DenseNet block is shown in Figure 4b [27]. The T layer, with 128 convolution kernels of size 1 × 1 and stride 1, reduces the size of the feature maps from 64 × 64 × 256 to 64 × 64 × 128. The L1 layer consists of 64 convolution kernels of size 1 × 1 with stride 1 followed by 16 convolution kernels of size 3 × 3 with stride 1, so the size of its output feature maps is 64 × 64 × 16. We then concatenate the output of the T layer and the output of the L1 layer to obtain feature maps of size 64 × 64 × 144 as the input for the L2 layer. Similarly, the input and the output of the L2 layer are concatenated as the input for the L3 layer. After several similar operations, we obtain an output of size 64 × 64 × 256 at the L8 layer. Finally, the generated clear images are reconstructed by two up-convolution stages (kernel size 3 × 3, stride 1/2) and one final convolution (kernel size 7 × 7, stride 1).
Discriminators $D_X$ and $D_Y$ also share the same network structure with different parameters. The discriminator is a Markov discriminator that comprises five fully convolutional layers and outputs a "0-1" indicator matrix of size 70 × 70; the mean value of all elements in this matrix is then taken as the final real/fake output.
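The sizes quoted above can be checked with a short calculation. The sketch below assumes padding values of 3 and 1 for the 7 × 7 and 3 × 3 convolutions (the text does not state them) and a growth of 16 channels per dense layer, as implied by the step from 128 to 144 channels; all function names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(size=256):
    """Spatial size after the 7x7/stride-1 stage and two 3x3/stride-2 stages.

    The padding values (3 and 1) are assumptions; the text does not give them.
    """
    sizes = [size]
    for kernel, stride, padding in [(7, 1, 3), (3, 2, 1), (3, 2, 1)]:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def dense_block_channels(transition=128, growth=16, layers=8):
    """Channel count after the T layer and after each concatenation L1..L8."""
    channels = [transition]
    for _ in range(layers):
        channels.append(channels[-1] + growth)
    return channels


def patch_score(indicator_matrix):
    """Mean of the discriminator's patch-wise outputs (the real/fake score)."""
    values = [v for row in indicator_matrix for v in row]
    return sum(values) / len(values)


sizes = encoder_sizes()                 # spatial size after each encoder stage
channels = dense_block_channels()       # channels entering L1, L2, ..., and after L8
score = patch_score([[1, 0], [1, 1]])   # toy 2x2 indicator matrix
```

With these assumptions a 256 × 256 input is encoded to 64 × 64, the input of the L2 layer has 144 channels, and eight dense layers bring the block back to 256 channels, which matches the figures in the text.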

Experiment and Evaluation
In this section, UW-CycleGAN is tested on two real-world unpaired underwater image datasets and compared with several classic image enhancement methods to evaluate its performance. Finally, ablation experiments verify the importance of each component in our UW-CycleGAN model.

Datasets and Metrics
URPC2019 (http://www.cnurpc.org/a/js/2019/0805/125.html, accessed on 17 December 2021) contains over 4000 underwater images [28], and we used a subset in this paper. We chose 670 underwater images as the training set, of which 335 degraded images formed training set X and the remaining 335 clear images formed training set Y. There was no paired relationship between training sets X and Y. The Gaussian blur set Z was formed by performing a Gaussian blur operation on training set Y. The testing set consisted of 70 degraded images. We set the color image size of both the training set and the testing set to 256 × 256 × 3.
EUVP (http://irvlab.cs.umn.edu/resources/euvp-dataset, accessed on 17 December 2021) contains over 6446 underwater human images. We chose 405 degraded images as training set X and 405 clear images as training set Y. The testing set consisted of 200 degraded images. The color image size was also set to 256 × 256 × 3.
To fairly assess these image enhancement methods from different aspects, we selected three standard metrics: average gradient (AG) [29], information entropy (IE) [30], and underwater image quality measure (UIQM) [31]. Lower values of IE reflect better performance, while AG and UIQM are the opposite. The entire network was implemented in the PyTorch framework and run on a workstation with 8 Nvidia Tesla P100 GPUs.
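For reference, AG and IE can be computed on a grayscale image as sketched below. The sketch uses common textbook definitions (the mean of $\sqrt{(\Delta_x^2 + \Delta_y^2)/2}$ over forward differences for AG, and the Shannon entropy of the gray-level histogram for IE); the exact variants used in [29,30] may differ, and UIQM is omitted because it combines several sub-measures.

```python
import math
from collections import Counter


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = img[i][j + 1] - img[i][j]
            dy = img[i + 1][j] - img[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))


def information_entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram."""
    counts = Counter(p for row in img for p in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy 4x4 images: a flat image and a 0/255 checkerboard.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 * ((i + j) % 2) for j in range(4)] for i in range(4)]

ag_flat, ie_flat = average_gradient(flat), information_entropy(flat)
ag_checker, ie_checker = average_gradient(checker), information_entropy(checker)
```

The flat image has no gradient and no histogram spread, while the checkerboard has the largest possible gradient for 8-bit data and exactly one bit of entropy.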

Experimental Assessment
If only the clear image set Y was used to train the discriminator, the generated image usually had an obvious blur. To solve this problem, we exploited the Gaussian blur set Z to train the discriminator, and the generator could output clear images well.
We compared the proposed model with three traditional underwater image enhancement methods (Deunderwater, HL, and UCM) and three deep learning-based methods (FUnIE-GAN-UP, CartoonGAN [24], and CycleGAN [26]). Figure 5 displays the enhancement results of five underwater scene images on the URPC dataset. Deunderwater recovered the image color to a certain extent, but the contrast and details of the generated image were not good. Although HL restored the detailed image information well, the generated image suffered from poor contrast and color distortion. UCM and FUnIE-GAN-UP recovered the image color and detail well, but the contrast was relatively poor. CartoonGAN and CycleGAN improved the image color and contrast considerably, but blur remained in the image detail. Our UW-CycleGAN obtained good performance in image contrast, color, and detail. Figure 6 shows the visual enhancement results of underwater human images on the EUVP dataset. Deunderwater, HL, and UCM clearly had color distortion problems. FUnIE-GAN-UP, CartoonGAN, and CycleGAN performed reasonably well, and their enhanced images were comparable to those of our UW-CycleGAN, but UW-CycleGAN was still the best in terms of image detail and clarity. For the traditional methods, the visual results in Figures 5 and 6 are clearly unsatisfactory. Deep learning-based methods work relatively steadily in both value and visualization performance. Our UW-CycleGAN obtained the best experimental results measured by all objective evaluation metrics.

Ablation Experiments
We designed a set of ablation experiments to further analyze the importance of each module in our UW-CycleGAN method, and the experimental results are shown in Figure 7 and Table 3. Each ablation experiment is introduced as follows:
(i) w/o $\mathcal{L}_{content}$: We removed the content loss from UW-CycleGAN, which led to serious detail loss and image blur in the generated images.
(ii) w/o $\mathcal{L}_{blur}$: Without the blur-promoting adversarial loss, the content of the generated images remained intact, but the images were slightly blurred.
(iii) $G_{ResNet}$: The DenseNet block in UW-CycleGAN was replaced by a ResNet block. Although the subjective difference between $G_{ResNet}$ and our UW-CycleGAN is not obvious in Figure 7, the objective evaluation results in Table 3 verify the advantages of the DenseNet block.
These experiments verify the contribution of each component of our UW-CycleGAN.

Conclusions
Underwater vehicle vision has important research value in underwater applications. We proposed an end-to-end underwater image enhancement method for unpaired data (UW-CycleGAN). Specifically, we first added a content loss regularizer, computed with a pretrained VGG19 network, to the generator of the traditional CycleGAN. Then, a blur-promoting adversarial loss regularizer was adopted in the discriminator. Finally, we replaced the commonly used ResNet block in CycleGAN with the DenseNet block in the coding layer. Compared with several image enhancement methods, our proposed method effectively restored degraded underwater images with a blue-green background and blur into clear images. We also performed ablation experiments to verify the importance of each module in UW-CycleGAN.