Overwater Image Dehazing via Cycle-Consistent Generative Adversarial Network

In contrast to images taken in land scenes, images taken over water are more prone to degradation caused by haze. However, existing image dehazing methods are mainly developed for land scenes and perform poorly when applied to overwater images. To address this problem, we collect the first overwater image dehazing dataset and propose an OverWater Image Dehazing GAN (OWI-DehazeGAN). Due to the difficulty of collecting paired hazy and clean images, the dataset is composed of unpaired hazy and clean images taken over water. The proposed OWI-DehazeGAN learns the underlying style mapping between hazy and clean images in an encoder-decoder framework, which is supervised by a forward-backward translation consistency loss for self-supervision and a perceptual loss for content preservation. In addition to qualitative evaluation, we design an image quality assessment network to rank the dehazed images. Experimental results on both real and synthetic test data demonstrate that the proposed method performs favorably against several state-of-the-art land dehazing methods.


Introduction
Images of overwater scenes play an important role in human image galleries. However, these images are prone to degradation due to the thick mist that often appears over lakes, rivers, and seas. Although numerous image dehazing methods have been developed [1-5], our experiments show that these methods perform far from satisfactorily, since they were originally designed for land scene images, whose data distribution differs significantly.
Hazy images are usually modeled as $I(x) = J(x)t(x) + A(1 - t(x))$, where $I(x)$ and $J(x)$ are the observed hazy image and the haze-free scene, respectively [6,7]. The symbol $x$ denotes a pixel index, and $A$ is the global atmospheric light. $t(\cdot)$ denotes the transmission map, which describes the portion of light that is not scattered and reaches the camera sensor. When the haze is homogeneous, $t(\cdot)$ can be defined as $t(x) = e^{-\beta d(x)}$, where $\beta$ is the scattering coefficient and $d(x)$ is the distance between objects and the camera.

Existing methods fall into two categories according to the type of features they use: methods based on hand-crafted features [1, 2, 9-12] and methods based on CNN features [3-5, 8, 13-15]. The former generally focus on estimating the global atmospheric light and the transmission map, and hence their performance is susceptible to estimation errors in $A$ or $t(\cdot)$. To alleviate these limitations, the latter, which are based on CNNs [16] or Generative Adversarial Networks (GANs) [17], aim to directly estimate clean images in a data-driven scheme. Although promising dehazing results have been achieved, existing CNN- or GAN-based methods do not perform well on overwater images, as shown in Figure 1b and 1c. Another issue is that existing image dehazing datasets [18-20] are dominated by land scenes.
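As a minimal sketch of the atmospheric scattering model introduced in this section, the following NumPy code synthesizes a hazy image from a clean image and a depth map, and inverts the model when $A$ and $t$ are known. The function names, the default values of `beta` and `airlight`, and the lower bound `t_min` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def transmission(depth, beta):
    """Homogeneous-haze transmission t(x) = exp(-beta * d(x))."""
    return np.exp(-beta * np.asarray(depth, dtype=np.float64))

def add_haze(clean, depth, beta=1.0, airlight=0.9):
    """Apply I(x) = J(x) t(x) + A (1 - t(x)) to an H x W x 3 image in [0, 1]."""
    t = transmission(depth, beta)[..., np.newaxis]  # broadcast over colour channels
    return clean * t + airlight * (1.0 - t)

def recover_scene(hazy, depth, beta=1.0, airlight=0.9, t_min=0.1):
    """Invert the model for known A and t: J = (I - A) / t + A.

    t is clamped from below (assumed t_min) to avoid amplifying noise.
    """
    t = np.maximum(transmission(depth, beta), t_min)[..., np.newaxis]
    return (hazy - airlight) / t + airlight

# Toy example: random "clean" image and a depth ramp from 0 to 2.
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 4, 3))
depth = np.linspace(0.0, 2.0, 16).reshape(4, 4)
hazy = add_haze(clean, depth)
restored = recover_scene(hazy, depth)
```

Prior-based methods have to estimate `airlight` and the transmission from the hazy image alone, which is why errors in either estimate propagate directly into the recovered scene.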
In this paper, we address both of the above-mentioned issues. First, we construct a new dataset, named OverwaterHaze, specifically for dehazing overwater images. Since collecting paired hazy and clean images is difficult and expensive, the OverwaterHaze dataset is composed of unpaired hazy and clean overwater images. Second, we propose an OverWater Image Dehazing GAN (OWI-DehazeGAN), inspired by CycleGAN [21], to directly recover clean images. Although the unpaired setting challenges most existing methods, we demonstrate that satisfactory dehazing performance can be achieved by the proposed method.
Our contributions are summarized as follows:
- We create the first overwater image dehazing dataset, and we hope this dataset will facilitate research in this field.
- We propose OWI-DehazeGAN to dehaze overwater images, which is based on CycleGAN but outperforms it. The proposed network is able to utilize unpaired training data and preserve image details simultaneously.
- We propose an image quality assessment network to rank the generated dehazed images, which facilitates the comparison of different algorithms.

Methods Based on Hand-crafted Features
Many efforts have been devoted to image dehazing in the past decades based on hand-crafted features [1, 2, 9-12]. Tan et al. [9] propose a contrast maximizing approach using Markov random fields (MRF), based on the observation that clean images have higher contrast than hazy ones. In [10], Tarel et al. propose a fast dehazing method that combines atmospheric veil inference, image restoration, and smoothing tone mapping. Later, He et al. [11] estimate the transmission map by utilizing the dark channel prior (DCP). Meng et al. [12] explore the inherent boundary constraint on the transmission function. In order to recover depth information, Zhu et al. [1] propose a color attenuation prior (CAP) by creating a linear model on local priors. Different from previous methods that use various patch-based priors, Berman et al. [2] present an image dehazing algorithm based on a non-local prior, which assumes that a haze-free image can be well approximated by a few distinct colors.
While the aforementioned methods have achieved promising results, they perform far from satisfactorily when applied to overwater images. MRF [9] tends to produce over-saturated images. The enhanced images of FVR [10] often contain distorted colors and severe halos. DCP [11] does not work well in sky regions, where the scene objects are similar to the atmospheric light.

Methods Based on CNN Features
Deep convolutional neural networks have shown promising success in various computer vision tasks. Cai et al. [8] propose an end-to-end DehazeNet with nonlinear regression layers to estimate the medium transmission. Instead of first estimating the transmission map or the atmospheric light, AOD-Net [15] predicts haze-free images directly using a light-weight CNN. Proximal Dehaze-Net [4] combines the advantages of traditional prior-based dehazing methods with deep learning by incorporating haze-related prior learning.
Since Goodfellow et al. [17] proposed the GAN in 2014, many effective variants have been tailored to different computer vision tasks [21-23]. Motivated by the success of GANs in those areas, several GAN-based methods have been proposed for image dehazing. In [5], a Densely Connected Pyramid Dehazing Network (DCPDN) is proposed to jointly learn the transmission map, the atmospheric light, and the dehazing result. Yang et al. [24] propose to relax the paired training constraint by introducing a disentanglement and reconstruction mechanism. Li et al. [13] design a solution based on a cGAN network [22] to directly estimate the clean image. Ren et al. [3] adopt an ensemble strategy to take advantage of information in white-balanced, contrast-enhanced, and gamma-corrected images. Overall, most of these methods are trained on paired data, which is unsuitable for the proposed overwater image dehazing task, where only unpaired training data is available.

Image Dehazing Dataset
Image dehazing benefits from continuing efforts to build large-scale datasets. Several datasets [18-20] have been introduced for image dehazing.
MSCNN [14] and AOD-Net [15] utilize the indoor NYU2 Depth Database [25] and the Middlebury Stereo database [26] to synthesize hazy images using the known depth information. O-HAZE [19] is an outdoor scene dataset composed of pairs of real hazy images and the corresponding clean images. I-HAZE [19] is a dataset that contains 35 pairs of hazy and corresponding ground-truth indoor images. Li et al. [20] have launched a large-scale benchmark made up of synthetic and real-world hazy images, called Realistic Single Image Dehazing (RESIDE). However, most datasets are synthetic and not tailored to overwater image dehazing. Different from the above datasets, we collect a dataset of real data specifically for dehazing overwater images.

Generator
We adopt the same structure for the two generators G and F. Both generators are divided into three parts: encoding, transformation, and decoding. The architecture of the generator is shown in Figure 3a.
Encoding. The encoding module extracts image features with three convolution layers, which serve as down-sampling layers to decrease the resolution of the original input. Each convolution layer is followed by an instance normalization and a Leaky ReLU. Since image dehazing can be treated as a domain adaptation problem, instance normalization is more suitable than batch normalization.
Transformation. The transformation module translates information from one domain to another via nine ResNet blocks [27]. The ResNet block in our network contains two 3×3 convolution layers with the same number of filters. Because the dehazed result needs to retain the characteristics of the original image, the ResNet block, whose shortcut passes the input through unchanged, is well suited to this transformation.
Decoding. The decoding module includes up-sampling operations and nonlinear mappings. There are several choices for upsampling, such as deconvolution [28], sub-pixel convolution [29], and resize convolution [30]. In order to reduce the checkerboard artifacts [30] caused by deconvolution or sub-pixel convolution, we use resize convolution for decoding. Inspired by the success of U-Net [31], we introduce two symmetric skip connections to deliver information between the encoding and decoding modules. Finally, images are recovered through a convolution and a tanh activation.
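The building blocks named in the three generator modules can be sketched in plain NumPy as follows. This is only an illustration under several assumptions that the text does not fix: a single C×H×W feature map, random weights, a conv-norm-activation ordering inside the ResNet block, a 3×3 kernel in the resize convolution, and nearest-neighbour upsampling by a factor of 2. The Leaky ReLU slope of 0.2 is the value reported in the experimental settings.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """Leaky ReLU: identity for positive inputs, slope * x otherwise."""
    return np.where(x > 0, x, slope * x)

def instance_norm(x, eps=1e-5):
    """Normalize each channel of one C x H x W feature map over its spatial extent."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv3x3(x, weight):
    """Stride-1 3x3 convolution with zero padding.

    x: C_in x H x W, weight: C_out x C_in x 3 x 3 (cross-correlation, as in CNNs).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out

def resnet_block(x, w1, w2):
    """Two 3x3 convolutions with the same number of filters plus an identity shortcut."""
    y = leaky_relu(instance_norm(conv3x3(x, w1)))
    y = instance_norm(conv3x3(y, w2))
    return x + y

def resize_conv(x, weight, scale=2):
    """Resize convolution: nearest-neighbour upsampling followed by a 3x3 convolution."""
    up = x.repeat(scale, axis=1).repeat(scale, axis=2)
    return conv3x3(up, weight)

# Toy forward pass through one ResNet block and one resize convolution.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))
w_a = rng.standard_normal((8, 8, 3, 3)) * 0.1
w_b = rng.standard_normal((8, 8, 3, 3)) * 0.1
w_up = rng.standard_normal((4, 8, 3, 3)) * 0.1
transformed = resnet_block(features, w_a, w_b)   # shape stays 8 x 16 x 16
upsampled = resize_conv(transformed, w_up)       # shape becomes 4 x 32 x 32
```

Because the upsampling step copies each value to a uniform block before the convolution, every output pixel is covered by the kernel equally often, which is the usual explanation for why resize convolution avoids the uneven overlap behind checkerboard artifacts.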

Discriminator
We use two discriminators $D_x$ and $D_y$ to distinguish real images from generated ones in the hazy and clean domains, respectively. The discriminator is implemented in a fully convolutional fashion, as shown in Figure 3b. We use four convolution blocks in the discriminator. The first block consists of a convolution layer and a Leaky ReLU, the last block contains only a convolution layer, and the remaining blocks are composed of a convolution layer, an instance normalization, and a Leaky ReLU.

Loss Function
We utilize three kinds of losses so that the proposed network can be trained with unpaired data and preserve image details simultaneously.

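Adversarial loss. As in CycleGAN, we use the adversarial loss and the cycle consistency loss for unpaired training data. Let $x \in X$ and $y \in Y$ denote a hazy image and an unpaired clean image, respectively. For the generator $G$ and discriminator $D_y$, the adversarial loss is formulated as:

$$\mathcal{L}_{GAN}(G, D_y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_y(G(x)))] \tag{1}$$

Correspondingly, the constraint on generator $F$ and its discriminator $D_x$ is

$$\mathcal{L}_{GAN}(F, D_x, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_x(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_x(F(y)))] \tag{2}$$

However, the above losses are prone to unstable training and low-quality outputs. To make the training more robust and obtain high-quality images, we use a least squares loss [23] instead of the negative log likelihood objective [17]. Therefore, Eq. (1) and (2) are modified as:

$$\mathcal{L}_{LSGAN}(G, D_y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[(D_y(y) - 1)^2] + \mathbb{E}_{x \sim p_{data}(x)}[D_y(G(x))^2] \tag{3}$$

$$\mathcal{L}_{LSGAN}(F, D_x, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[(D_x(x) - 1)^2] + \mathbb{E}_{y \sim p_{data}(y)}[D_x(F(y))^2] \tag{4}$$

The final adversarial loss is denoted as:

$$\mathcal{L}_{adv}(G, F, D_x, D_y) = \mathcal{L}_{LSGAN}(G, D_y, X, Y) + \mathcal{L}_{LSGAN}(F, D_x, Y, X) \tag{5}$$

Cycle consistency loss. CycleGAN introduces a cycle consistency loss to address the problem that an adversarial loss alone cannot ensure the matching between the output distribution and the target distribution. For each image $x$, $F$ should bring $G(x)$ back to the original image, i.e., $F(G(x)) \approx x$. Similarly, $G$ should bring $F(y)$ back to the original image $y$. $F(G(x))$ is the cyclic image of the input $x$, and $G(F(y))$ is the cyclic image of the original image $y$. To train the generators $G$ and $F$ at the same time, the consistency loss includes two constraints:

$$x \rightarrow G(x) \rightarrow F(G(x)) \approx x, \qquad y \rightarrow F(y) \rightarrow G(F(y)) \approx y \tag{6}$$

The cycle consistency loss is defined as the L1-norm between the input and the cyclic image for unpaired image dehazing:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1] \tag{7}$$

Perceptual loss. We introduce a perceptual loss to constrain the reconstruction of image details. Instead of measuring the per-pixel difference between images, the perceptual loss measures the difference between feature maps, which capture various aspects of content and perceptual quality. The perceptual loss is defined as:

$$\mathcal{L}_{per}(G, F) = \| \theta(x) - \theta(F(G(x))) \|_2^2 + \| \theta(y) - \theta(G(F(y))) \|_2^2 \tag{8}$$

Here, $\theta$ represents the feature maps generated from the relu4_2 layer of a pretrained VGG-16 [32] network.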
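Objective function. We define the loss as a weighted sum of the previous losses:

$$\mathcal{L}(G, F, D_x, D_y) = \mathcal{L}_{adv}(G, F, D_x, D_y) + \lambda \mathcal{L}_{cyc}(G, F) + \mu \mathcal{L}_{per}(G, F) \tag{9}$$

where the coefficients $\lambda$ and $\mu$ represent the weights of the cycle consistency loss and the perceptual loss, respectively. We found that giving too much weight to the perceptual loss may destabilize training, so the weight of the perceptual loss should be much smaller than the weight of the cycle consistency loss. During training, we minimize the objective with respect to the generators $G$, $F$ and maximize it with respect to the discriminators $D_x$, $D_y$.
The final objective function is:

$$G^*, F^* = \arg\min_{G, F} \max_{D_x, D_y} \mathcal{L}(G, F, D_x, D_y) \tag{10}$$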
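The losses described in this section can be sketched numerically as below. This is a hedged illustration, not the authors' training code: discriminator outputs and feature maps are passed in as plain arrays, the per-pixel averaging of the L1 and feature distances is an assumed normalization, and the least-squares terms are split into a discriminator part and a generator part, as is common when each network is trained by minimizing its own least-squares term. The default weights `lam=10` and `mu=1e-4` are the values reported in the experimental settings.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator term: outputs on real images -> 1, on generated images -> 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_generator_loss(d_fake):
    """Least-squares generator term: push discriminator outputs on generated images toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

def cycle_consistency_loss(x, x_cyc, y, y_cyc):
    """L1 distance between each input and its cyclic image (pixel average is an assumption)."""
    return float(np.mean(np.abs(x_cyc - x)) + np.mean(np.abs(y_cyc - y)))

def perceptual_loss(feat_x, feat_x_cyc, feat_y, feat_y_cyc):
    """Squared distance between precomputed feature maps of inputs and cyclic images.

    The features are assumed to come from a fixed network (e.g. a VGG-16 layer);
    no network is evaluated here.
    """
    return float(np.mean((feat_x - feat_x_cyc) ** 2) + np.mean((feat_y - feat_y_cyc) ** 2))

def generator_objective(adv, cyc, per, lam=10.0, mu=1e-4):
    """Weighted sum: adversarial + lam * cycle consistency + mu * perceptual."""
    return adv + lam * cyc + mu * per

# Toy arrays standing in for images, cyclic images and discriminator outputs.
rng = np.random.default_rng(0)
x, y = rng.uniform(size=(3, 8, 8)), rng.uniform(size=(3, 8, 8))
x_cyc, y_cyc = x + 0.1, y - 0.1
d_fake = np.full((1, 4, 4), 0.5)

adv = lsgan_generator_loss(d_fake)
cyc = cycle_consistency_loss(x, x_cyc, y, y_cyc)
per = perceptual_loss(x, x_cyc, y, y_cyc)  # images reused as stand-in "features"
total = generator_objective(adv, cyc, per)
```

With these weights the cycle consistency term dominates the sum, which matches the observation that the perceptual weight should be much smaller than the cycle consistency weight.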

Dehazed Image Quality Assessment
In order to verify the effectiveness of the proposed OWI-DehazeGAN, we design a dehazed image quality assessment model based on natural image statistics and the VGG network. Natural images are captured directly from natural scenes, so they exhibit certain statistical regularities. By computing statistics on these properties, the natural scene statistics (NSS [33]) of images can be obtained. NSS has been widely used in image quality assessment, especially no-reference image quality assessment. In this paper, the NSS we use are the mean subtracted contrast normalized (MSCN [34]) coefficients, which are used to normalize a hazy image. After MSCN normalization, the pixel intensities of haze-free images follow a Gaussian distribution, while those of hazy images do not. The deviation of the distribution from an ideal bell curve is therefore a measure of the amount of distortion in the image. To calculate the MSCN coefficients, the image intensity $I(i, j)$ at pixel $(i, j)$ is transformed to the normalized luminance $\hat{I}(i, j)$, which is defined as:

$$\hat{I}(i, j) = \frac{I(i, j) - \mu(i, j)}{\sigma(i, j) + C}$$

where $\mu(i, j)$ and $\sigma(i, j)$ represent the local mean field and the local variance field, obtained by filtering the image with a Gaussian window of a specific size.
The local mean field $\mu$ is the Gaussian blur of the input image. The local variance field $\sigma$ is the Gaussian blur of the square of the difference between the original image and $\mu$. A constant $C$ is added to prevent the denominator from being zero. When a dehazed image is normalized by the MSCN coefficients, only the uniform appearance and the edge information are retained. Human eyes are very sensitive to edge information, so the normalized image is consistent with human vision.

The proposed IQA model for dehazed images consists of luminance normalization, feature extraction, and regression of the evaluation score. The dehazed images are first normalized by the MSCN coefficients, which provide a good normalization of image luminance and do not depend strongly on the intensity of texture. Then a VGG-16 model is used to extract features, from which an image quality score between 0 and 9 is predicted through two fully connected layers with 512 units and 1 unit, respectively. The architecture of the IQA model is shown in Figure 4. The loss function of the IQA model is the mean absolute error (MAE), defined as:

$$\mathcal{L}_{MAE} = \frac{1}{N} \sum_{i=1}^{N} | y_i - y_i^* |$$

where $N$ represents the number of images in the training set, and $y_i$ and $y_i^*$ denote the target and the output, respectively. The optimization goal of the IQA model in the training phase is to minimize this loss. The mapping between dehazed images and the corresponding Mean Opinion Scores (MOS [35]) is learned by minimizing the loss between the predicted score $y_i^*$ and the corresponding ground truth $y_i$.
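The MSCN normalization and the MAE loss described in this section can be sketched in NumPy as follows. The window size (7), the Gaussian standard deviation (7/6), and the constant `C = 1` are assumed defaults borrowed from common MSCN implementations, since the text only mentions "a specific size". Taking the square root of the blurred squared difference, so that the denominator is a local standard deviation, is also an assumption based on the usual MSCN definition [34].

```python
import numpy as np

def gaussian_kernel(size=7, kernel_sigma=7.0 / 6.0):
    """1-D Gaussian window normalized to sum to 1 (size should be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * kernel_sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, kernel):
    """Separable Gaussian blur of a 2-D image with edge-replicating padding."""
    pad = len(kernel) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def mscn(img, C=1.0, size=7, kernel_sigma=7.0 / 6.0):
    """MSCN coefficients (I - mu) / (sigma + C) of a 2-D luminance image."""
    img = np.asarray(img, dtype=np.float64)
    k = gaussian_kernel(size, kernel_sigma)
    mu = gaussian_blur(img, k)                            # local mean field
    sigma = np.sqrt(gaussian_blur((img - mu) ** 2, k))    # local deviation field (sqrt assumed)
    return (img - mu) / (sigma + C)

def mae_loss(targets, predictions):
    """Mean absolute error between target scores y_i and predicted scores y_i*."""
    targets = np.asarray(targets, dtype=np.float64)
    predictions = np.asarray(predictions, dtype=np.float64)
    return float(np.mean(np.abs(targets - predictions)))

# Toy usage on a random luminance image with values in [0, 255].
rng = np.random.default_rng(0)
luminance = rng.uniform(0.0, 255.0, size=(32, 32))
coefficients = mscn(luminance)
```

A perfectly flat image yields all-zero coefficients, which illustrates that the normalization keeps only local structure such as edges.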

Dataset
We collect a real unpaired image dataset called OverwaterHaze for image dehazing in overwater scenes. The training set consists of 4531 unpaired images: 2090 hazy images and 2441 clean images. All images are crawled from Google.
These training images are resized to 640×480. Figure 5 illustrates some examples from our dataset. There are three differences between the proposed dataset and existing datasets: (1) The OverwaterHaze dataset is a large-scale natural dataset with hazy images and unpaired haze-free images, whereas most previous datasets are composed of synthetic data; (2) The OverwaterHaze dataset is tailored to the task of overwater image dehazing, rather than focusing on general indoor or outdoor scenes; (3) The proposed dataset is much more challenging because sky and water surface regions make up a large part of each image.

Experimental Settings
The input images of the generators and discriminators are set to 256 × 256 during training. We use the Adam optimizer with a learning rate of 2e−4. The batch size is 1. The weights of the cycle consistency loss λ and the perceptual loss µ are 10 and 0.0001, respectively. The negative slope a of the Leaky ReLU is 0.2. The generators G, F and the discriminators $D_x$, $D_y$ are updated at a ratio of 1:1.

Qualitative Results on Real Images
Figure 6 shows an example of the dehazing results of the proposed algorithm against state-of-the-art methods. DCP [11] tends to overestimate the thickness of the haze and produces dark results (Figure 6b). The dehazed images by FVR [10] and BCCR [12] have significant color distortions and miss most details, as shown in Figure 6c~6d. The best performer among the methods based on hand-crafted priors is CAP [1], which generally reconstructs the details of haze-free images. The deep learning based approaches, such as DehazeNet [8], MSCNN [14], and dehaze-cGAN [13], achieve comparable results. However, these results indicate that existing methods cannot handle overwater hazy images well. For example, the dehazed results by MSCNN and DehazeNet (Figure 6f~6g) share a similar problem: they tend to magnify the color cast and leave some remaining haze.
The illumination appears dark in the results of Proximal Dehaze-Net [4] and AOD-Net [15], as shown in Figure 6h~6j. As Figure 6k shows, CycleGAN [21] generates pseudo-colors to a certain degree, which makes its result quite different from the original colors. Its result also contains extensive checkerboard artifacts in the sky regions. In contrast, the dehazed result of our method, shown in Figure 6l, is visually pleasing under the mist condition.

Qualitative and Quantitative Results on Synthetic Images
We further conduct experiments on synthetic hazy images. Although the proposed method is trained on real unpaired data, it can be applied to synthetic images as well. Figure 7 shows some dehazed images generated by various methods. Figure 7a shows the ground truth as a reference. As shown in Figure 7b~7d, the results of DCP [11], FVR [10], and BCCR [12] have some distortions in colors or details. The dehazed results produced by CAP [1] (Figure 7e), DehazeNet [8] (Figure 7f), MSCNN [14] (Figure 7g), AOD-Net [15] (Figure 7h), dehaze-cGAN [13] (Figure 7i), and Proximal Dehaze-Net [4] (Figure 7j) are closer to the ground truth (Figure 7a) than the results of the prior-based methods. However, some haze still remains, as shown in Figure 7e~7h. The result generated by CycleGAN [21] in Figure 7k exhibits a serious color cast and loses some color information. The dehazed result generated by our approach in Figure 7l, by contrast, is visually close to the ground-truth image.
An advantage of testing on synthetic data is that the results can be evaluated objectively via SSIM, PSNR, and CIEDE2000. A higher SSIM score indicates that the generated results are more consistent with human perception. PSNR measures the pixel-level fidelity of the dehazed image with respect to the ground truth (higher is better), and a smaller CIEDE2000 score indicates better color preservation. In Figure 7, the SSIM and PSNR values also indicate that our method surpasses the other methods. Table 1 shows that our method achieves higher PSNR and SSIM. Remarkably, the SSIM and PSNR of our model are significantly better than those of CycleGAN.
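Of the three metrics used in this section, PSNR is simple enough to sketch directly. The function below is an illustrative implementation assuming images scaled to [0, 1]; the name `psnr` and the `max_value` parameter are choices made for this example.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and a dehazed image."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy example: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
ground_truth = np.zeros((8, 8, 3))
dehazed = np.full((8, 8, 3), 0.1)
score = psnr(ground_truth, dehazed)
```

SSIM and CIEDE2000 are more involved. If scikit-image is available, `skimage.metrics.structural_similarity` and `skimage.color.deltaE_ciede2000` are one possible way to compute them, although the text does not state which implementation was used.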

Dehazed Images Ranking
The proposed IQA model for dehazed images is pre-trained on TID2013 [35] and then fine-tuned on the IVC Dehazing Dataset [18]. TID2013 includes different types of image distortion, while the IVC Dehazing Dataset is designed to evaluate the quality of dehazed images. The predicted scores are used to rank the dehazed images, as shown in Figure 8. The score and the rank are presented below each image, where '1' denotes the best visual quality and '10' the worst. Figure 8 shows that the quality of the overwater dehazed images generated by OWI-DehazeGAN is better than that of the other methods. For a comprehensive comparison, we also report the dehazed image quality measured by four typical image quality assessment methods in Table 2. The best results are shown in red font. Table 2 shows that the proposed method achieves the best performance in terms of almost all metrics.
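Turning predicted quality scores into the ranks described in this section amounts to a sort. The sketch below assumes that a higher predicted score means better quality, as with mean opinion scores; the method names and score values are made up purely for illustration and are not results from the paper.

```python
def rank_by_score(scores):
    """Map each method name to its rank, where rank 1 has the highest predicted score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

# Hypothetical scores in the 0-9 range of the IQA model (illustrative values only).
example_scores = {"method_a": 4.2, "method_b": 6.8, "method_c": 5.1}
ranking = rank_by_score(example_scores)
```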

Effect of Resize Convolution
In the decoding process of the generator, we use resize convolution to increase the resolution of the feature maps, rather than deconvolution or sub-pixel convolution. To better understand how resize convolution contributes to the proposed method, we train three end-to-end networks with different upsampling modes: (i) deconvolution, (ii) sub-pixel convolution, and (iii) resize convolution.
Figure 9 shows the results of the three upsampling modes in our network. In comparison, the result of resize convolution is the best in terms of human perception and retains more detailed information. In Figure 9c, we observe plenty of checkerboard artifacts caused by deconvolution. Although sub-pixel convolution (Figure 9d) can alleviate the checkerboard artifacts to some extent, its result is rough and unsatisfying. Compared with the first two approaches, resize convolution recovers most scene details and maintains the original colors. As Table 3 shows, resize convolution obtains higher PSNR and SSIM scores and a lower CIEDE2000 score than deconvolution and sub-pixel convolution, which indicates that resize convolution generates images of higher perceptual quality.

Effect of Perceptual Loss
To show the effectiveness of our loss function, we additionally train an overwater image dehazing network without the perceptual loss. The results of a comparative experiment with and without the perceptual loss are shown in Figure 10. The generation direction for the images in the first row is $x \rightarrow G(x) \rightarrow F(G(x))$, and the second row shows the opposite direction, $y \rightarrow F(y) \rightarrow G(F(y))$. We can observe from Figure 10c, 10d, 10g, and 10h that, when the perceptual loss is not used, the estimated haze-free images and cyclic images lack fine details and the sky regions do not match the input hazy image, which leads to halo artifacts in the dehazed results. Comparing Figure 10b and 10d, we also find that the perceptual loss is favorable for the reconstruction of sky regions, which is essential for overwater image dehazing.
As Table 4 shows, our network with the perceptual loss obtains higher PSNR and SSIM scores and a lower CIEDE2000 score. Higher SSIM and PSNR scores suggest that the proposed method with the perceptual loss is more consistent with human perception. A lower CIEDE2000 score means a smaller color difference between the dehazed image and the ground truth. The above experiments show that the proposed loss is effective for the overwater image dehazing task.

Conclusion
In this paper, we formulate an overwater image dehazing task, create the first overwater image dehazing dataset, and propose OWI-DehazeGAN to dehaze overwater images. Compared to previous CNN-based methods, which require paired training data, our OWI-DehazeGAN can be trained on unpaired images. Our method directly predicts clean images from hazy inputs, bypassing the estimation of transmission maps and global atmospheric light. We utilize the perceptual loss and resize convolution to preserve detailed textures and alleviate checkerboard artifacts. Extensive experiments demonstrate that our method produces results superior to those of most state-of-the-art dehazing methods.

Fig. 1. Overwater image dehazing example. The proposed method generates clearer images than state-of-the-art methods.

Fig. 2. The main architecture of the proposed OWI-DehazeGAN network. G and F denote generators, where G : X → Y generates clean images from hazy images and F : Y → X vice versa. $D_x$ and $D_y$ denote discriminators. Adversarial loss, cycle consistency loss, and perceptual loss are employed to train the network.

Figure 2 shows the main architecture of the proposed OWI-DehazeGAN. Unlike traditional GANs, OWI-DehazeGAN consists of two generators (G and F) and two discriminators ($D_x$ and $D_y$) in order to be trainable with unpaired training data. Specifically, generator G predicts clean images Y from hazy images X, and F vice versa. $D_x$ and $D_y$ distinguish hazy images and clean images, respectively. Below we provide more details about each component.

Fig. 3. Architecture of our generator and discriminator. The generator consists of three parts: encoding, transformation, and decoding.

Fig. 6. Real hazy images and corresponding dehazing results from several state-of-the-art methods (best viewed in color).

Fig. 8. Comparison via the proposed IQA model. Ranking scores and the ranking are shown below each image.

Fig. 9. Effectiveness of the proposed network with resize convolution. (a) and (b) are input hazy images. (b)~(e) are the zoom-in views. (c)~(e) are the dehazing results of deconvolution, sub-pixel convolution, and resize convolution, respectively.

Fig. 10. Comparison of dehazing with and without perceptual loss. (a)~(e) show the generation direction X → Y → X. (f)~(j) show the generation direction Y → X → Y. '(w/o Lper)' denotes the network without perceptual loss, and '(w/ Lper)' denotes the network with perceptual loss.

Table 1. Average PSNR, SSIM, and CIEDE2000 values of the dehazed results on the new synthetic dataset. The best, second-best, and third-best results are shown in red, blue, and green, respectively.

Table 2. Comparison of dehazed image quality using four image quality assessment methods. The top three results are in red, blue, and green font, respectively.

Table 3. Average scores in terms of PSNR, SSIM, and CIEDE2000 for three upsampling convolutions on the synthetic test set of the OverwaterHaze dataset.

Table 4. Effect of perceptual loss in terms of SSIM, PSNR, and CIEDE2000.