Photo-Realistic Image Dehazing and Verifying Networks via Complementary Adversarial Learning

Physical model-based dehazing methods cannot, in general, avoid environmental variables and undesired artifacts such as uncorrected illuminance, halo, and saturation, because it is difficult to accurately estimate the illuminance, light transmission, and airlight. Furthermore, the haze model estimation process has very high computational complexity. To solve this problem by directly estimating the radiance of hazy images, we present a novel dehazing and verifying network (DVNet). In the dehazing procedure, we enhance the clean images using a correction network (CNet), whose output serves as the ground truth for training the haze network. Hazy images are then restored by the haze network (HNet). Furthermore, a verifying method verifies the error of both CNet and HNet using self-supervised learning. Finally, the proposed complementary adversarial learning method produces more natural results. Note that the proposed discriminator and generators (HNet and CNet) can be trained on an unpaired dataset. Experimental results show that the proposed DVNet generates better dehazed results than state-of-the-art dehazing methods in most cases under various hazy conditions.


Introduction
In outdoor environments, acquired images lose important information such as contrast and salient edges because the particles attenuate the visible light. This degradation is referred to as hazy degradation, which distorts both spatial and color features and decreases visibility of the outdoor object. If the hazy degradation is not restored, we cannot expect a good performance of main image processing or image analysis methods such as object detection, image matching, and imaging systems [1][2][3][4], to name a few. Therefore, the common goal of dehazing algorithms is to enhance the edge and contrast while suppressing intensity or color saturation. To the best of the authors' knowledge, Middleton and Edgar were the first to employ a physical haze model for the dehazing problem [5].
To generate the haze-free image using the physical model, the atmospheric light and the corresponding transmission should be estimated. However, an accurate estimation of the atmospheric light and transmission map generally requires additional information, such as a pair of polarized images, multiple images under different weather conditions, distance maps, or user interactions [6][7][8][9]. For that reason, many state-of-the-art approaches try to find a better method to estimate the atmospheric light and the transmission map based on reasonable assumptions [10][11][12][13]. He et al. proposed a dark channel prior (DCP)-based haze removal method [14]. They assumed that each local patch of a clear image contains at least one dark pixel. The DCP method works well in most regions that satisfy the DCP assumption, but fails in white object regions. Berman et al. estimated the transmission map using the haze-line prior, which assumes that pixel coordinates in the color space tend to move closer to the atmospheric light in a hazy image [15]. To find the lower bound of a haze-line, they used 500 representative colors. While Berman's approach enhances color contrast, representative colors cannot be found in an image severely degraded by haze or fog. Shin et al. optimized the transmission estimation process using both radiance and reflectance components [16].
Recently, convolutional neural networks (CNNs) have been applied not only to image classification but also to a variety of low-level image processing applications [17][18][19][20]. CNN-based dehazing methods were also proposed in the literature to overcome the limitation of transmission map estimation using a single image. Cai et al. estimated the transmission to restore a hazy image using DehazeNet [21]. Cai's method is an end-to-end supervised learning approach using synthetic haze and clean patches. To overcome the limitation of haze feature estimation, Ren et al. presented a multi-scale CNN [22]. They also proposed a learning method using pairs of simulated haze images and true transmission maps [23].
To increase the training accuracy, Li et al. combined two CNN modules of the transmission and atmospheric light estimation via all-in-one dehazing network (AODNet) [24]. Zhang et al. proposed a densely connected pyramid dehazing network (DCPDN) optimized by a conditional adversarial learning method [25,26]. The depth information can be incorporated into the transmission estimation process using a supervised learning method. However, it is hard to reflect other quantities such as attenuation, atmospheric light, and illuminance at once because it is difficult to collect the data including the depth, attenuation, airlight, and ideal illuminance maps.
For example, Figure 1a shows a real-haze image provided by [27]. The haze in Figure 1a differs from simulated haze and is degraded by multiple factors including color attenuation, an unbalanced light source, and scattered light. Therefore, CNN-based estimation cannot adaptively remove this real haze, as shown in Figure 1b. Note that the proposed method can restore the most natural-looking image by removing real haze based on direct estimation of the radiance map.
To overcome this dependency, a radiance estimation method can be applied to the dehazing process. Ren et al. estimated the haze-free radiance using a multi-scale convolutional neural network and a simulated haze dataset [22]. The multi-scale convolutional neural network can stably remove the simulated haze. Chen et al. estimated a physical haze model-based radiance image using dilated convolution [18] and adaptive normalization [28]. It can approximate the DCP or non-local dehazing operators with low computational complexity. This radiance estimation method can estimate the dehazed result without additional estimation steps, but it may generate amplified noise and dehazing artifacts. As fusion-based approaches, Ren et al. removed the haze using derived inputs and a gated fusion network [29], and Shin et al. proposed triple convolutional networks including dehazing, enhancement, and concatenating subnetworks to enhance the contrast without dehazing artifacts [30]. However, the separate subnetworks increase computational complexity. To solve this problem, this paper presents a new dehazing and verifying network (DVNet). The proposed DVNet does not need the subnetworks in the prediction procedure. Instead, the correction subnetwork is used only in the training process, and the dehazing error in the output is evaluated using complementary adversarial learning. Unlike transmission estimation-based methods, the proposed DVNet removes real haze without noise, halo, or other undesired artifacts, and with low computational complexity. Since the proposed method can use more enhanced ground truth images, our DVNet can be effectively trained using absolute-mean error and perceptual loss functions. Furthermore, our verifying network simultaneously estimates and reduces the error of the resulting images via self-supervised learning and a least-squares adversarial network.
Experimental results show that the proposed DVNet outperforms existing state-of-the-art approaches in terms of both robustness to various haze environments and computational efficiency. This paper is organized as follows: Section 2 summarizes related works, and Section 3 describes the proposed DVNet and the corresponding training method. After summarizing experimental results in Section 4, we conclude the paper with some discussions in Section 5.

Related Works
A clear image is degraded by the physical haze model as [5]

$$x^c(p) = J^c(p)\,t(p) + A^c\,(1 - t(p)), \tag{1}$$

where $J$ represents a haze-free, clean image, $x$ the hazy, degraded version, $p$ the two-dimensional pixel coordinate, $t$ the light transmission map, and $A$ the spatially-invariant atmospheric light. The superscript $c$ in $x$, $J$, and $A$ represents a color channel, and the transmission $t(p)$ is independent of the color channel. To solve this equation, physical haze model-based methods estimate the major components such as $t$ and $A$ based on proper assumptions. Recently, several deep learning techniques have made this formula solvable without estimating $t$ or $A$. Therefore, this section introduces various deep learning-based dehazing approaches.
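The haze model above can be illustrated with a minimal NumPy sketch. The function name `synthesize_haze` and the toy values are hypothetical and chosen only for illustration; the sketch assumes images scaled to [0, 1] and a single transmission value per pixel shared by all color channels, as stated in the text.

```python
import numpy as np

def synthesize_haze(clean, transmission, airlight):
    """Apply x^c(p) = J^c(p) t(p) + A^c (1 - t(p)) to an H x W x 3 image in [0, 1]."""
    t = np.asarray(transmission, dtype=np.float64)[..., np.newaxis]  # share t(p) across channels
    A = np.asarray(airlight, dtype=np.float64)                       # one value per color channel
    return np.asarray(clean, dtype=np.float64) * t + A * (1.0 - t)

# Hypothetical toy data, not taken from the paper's datasets.
J = np.full((2, 2, 3), 0.2)
t = np.array([[1.0, 0.5], [0.25, 0.0]])
A = [0.9, 0.9, 0.9]
x = synthesize_haze(J, t, A)
```

With full transmission the pixel keeps its clean value, and with zero transmission it collapses to the atmospheric light, which is why dense haze hides scene content.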

Physical Haze Model-Based Dehazing
He et al. applied the dark channel prior (DCP) to estimate the transmission as [14]

$$t_{\mathrm{DCP}}(p) = 1 - \min_{q \in N(p)} \left( \min_{c} \frac{x^c(q)}{A^c} \right), \tag{2}$$

where $q$ is the 2D pixel coordinate in a local patch region around $p$, denoted as $N(p)$, in which the transmission is assumed to be constant. Berman et al. estimated the non-local (NL) transmission map using the geometric haze feature given in (3) [15]. To solve for the feature in (3), Berman et al. used 500 representative colors and approximated the denominator using the k-nearest neighbor (k-NN) algorithm [31]. To minimize dehazing artifacts such as noise and halo in the estimated transmission, either the soft matting or the weighted least squares [32,33] algorithm can be used as a regularization function. Shin et al. estimated the transmission by minimizing the radiance-reflectance combined cost given in (4) [16].
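As a rough illustration of the DCP transmission estimate, the following sketch takes the minimum over color channels and then over a square patch. It is a simplified version under stated assumptions: the atmospheric light is assumed to be known, the patch size and edge padding are illustrative choices, and the function name `dcp_transmission` is hypothetical.

```python
import numpy as np

def dcp_transmission(hazy, airlight, patch=3):
    """t_DCP(p) = 1 - min_{q in N(p)} min_c x^c(q) / A^c for an H x W x 3 image."""
    normalized = np.asarray(hazy, dtype=np.float64) / np.asarray(airlight, dtype=np.float64)
    channel_min = normalized.min(axis=2)            # minimum over color channels
    r = patch // 2
    padded = np.pad(channel_min, r, mode="edge")    # assumed border handling
    h, w = channel_min.shape
    dark = np.empty_like(channel_min)
    for i in range(h):
        for j in range(w):
            dark[i, j] = padded[i:i + patch, j:j + patch].min()  # minimum over N(p)
    return 1.0 - dark
```

A region whose color equals the atmospheric light yields a transmission of zero, which matches the known failure of the DCP on white objects noted in the Introduction.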

Radiance-Based Dehazing
Given $N$ pairs of haze-free patches and their hazy versions, CNN-based dehazing methods commonly train the network by minimizing the loss function in (5), where $J_i^P$ and $x_i^P$ represent the $i$-th training patches of the haze-free and hazy images, respectively. $\Theta$ is a set of network parameters including weights and biases, and $F(\cdot)$ is the output of the network given an input hazy image patch and the set of parameters [28,34].
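The patch-based training objective can be sketched as follows. The squared-error distance is an assumption made for illustration, since the exact norm is not recoverable from the text, and `toy_network` is a hypothetical stand-in for $F$ rather than a CNN.

```python
import numpy as np

def patch_loss(network, params, hazy_patches, clean_patches):
    """Average over N pairs of the squared error between F(x_i; params) and J_i."""
    total = 0.0
    for x_i, j_i in zip(hazy_patches, clean_patches):
        residual = network(x_i, params) - j_i
        total += float(np.sum(residual ** 2))   # assumed squared-error distance
    return total / len(hazy_patches)

def toy_network(x, params):
    """Hypothetical stand-in for F: a per-pixel gain and bias, not a CNN."""
    gain, bias = params
    return gain * x + bias

rng = np.random.default_rng(0)
hazy = [rng.random((8, 8)) for _ in range(4)]
clean = [2.0 * x - 0.5 for x in hazy]           # synthetic targets for the toy model
loss_good = patch_loss(toy_network, (2.0, -0.5), hazy, clean)
loss_bad = patch_loss(toy_network, (1.0, 0.0), hazy, clean)
```

Training searches for the parameters that drive this value down; in the toy case the matching gain and bias give a loss of zero.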

Adversarial Learning
To reduce the divergence between the generated and real images, the adversarial loss can be defined as in (6) [26,35,36,37], where $G_J$ is the haze-free generator, $D$ is a discriminator that classifies its input as real or fake, and $L\{\cdot\}$ denotes the sigmoid cross-entropy operator. This adversarial learning can generate a haze-free image that is closer to the clean image.

Proposed Method
To remove haze, we present a new dehazing and verifying network using dilated convolution layers and a generative adversarial network. Deep learning-based dehazing methods require a series of procedures including generation of the dataset, configuration of a deep learning model, and training of the model. In this section, Section 3.1 describes the data generation method, and Sections 3.2 and 3.3 give the network architectures and learning functions of the correction and haze networks. Section 3.4 presents the proposed training approaches including the verifying network and complementary adversarial learning.

Data Generation
To generate pairs of hazy and clean images, we first generate the initial dehazed image from the input hazy image using the physical haze model given in (1). Let $I(p)$ be the input hazy image and $\hat{t}(p)$ the transmission estimated using either (3) or (4). The initial clean image is computed as

$$I^D(p) = \frac{I(p) - A}{\hat{t}(p)} + A. \tag{7}$$

Since (7) gives a one-step, closed-form estimation, the training pairs of hazy and haze-free images can be easily created. In this paper, we used the results of the non-local dehazing (NL) and radiance-reflectance optimization (RRO) methods given in (3) and (4) to generate the initial dehazed images. In addition, haze-simulated images such as the NYU-depth data [23] can also be used to generate $I^D$ and $I$ in pairs based on the physical haze model. Overall, the generated data $I^D$ is used as the input data of the correction network, as shown in Figure 2. In the dehazing procedure, the input hazy images are restored by the haze network, which is learned from the corrected images. The verifying network imitates the natural images using self-supervised learning, and the discriminator classifies the natural and generated images as real or fake to reduce the statistical divergence.
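The closed-form inversion of the haze model can be sketched in a few lines. The lower bound `t_min` on the transmission and the final clipping to [0, 1] are assumptions added here to keep the division stable; they are not stated in the text, and the function name `initial_dehaze` is hypothetical.

```python
import numpy as np

def initial_dehaze(hazy, transmission, airlight, t_min=0.1):
    """Invert the haze model: I_D(p) = (I(p) - A) / t(p) + A, per color channel."""
    t = np.maximum(np.asarray(transmission, dtype=np.float64), t_min)  # assumed floor on t
    t = t[..., np.newaxis]
    A = np.asarray(airlight, dtype=np.float64)
    return np.clip((np.asarray(hazy, dtype=np.float64) - A) / t + A, 0.0, 1.0)

# Round trip on hypothetical data: haze a flat image, then invert with the true t and A.
A = np.array([0.9, 0.9, 0.9])
J = np.full((2, 2, 3), 0.2)
t = np.array([[1.0, 0.5], [0.5, 0.25]])
I = J * t[..., None] + A * (1.0 - t[..., None])
I_D = initial_dehaze(I, t, A)
```

When the true transmission and atmospheric light are known, the inversion recovers the clean image exactly; with estimated values, the residual errors are what the correction network is meant to reduce.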

Correction-Network (CNet)
We propose a correction network (CNet) to enhance the initial dehazed images by correcting both color and intensity values. To restore the missing information, we concatenate features of the haze network (HNet) using dilated convolution and adaptive normalization [18,28] as

$$\tilde{f}_i^k = g\left(A_k\left(b_i^k + \sum_j \tilde{f}_j^{k-1} *_{r_k} h_{i,j}^k\right)\right), \tag{8}$$

where $\tilde{f}_i^k$ and $b_i^k$, respectively, represent the $i$-th feature map and bias in the $k$-th layer, and $h_{i,j}^k$ is the kernel used to obtain the $i$-th feature map from the feature maps extracted in the $(k-1)$-th layer. The operator "$*_{r_k}$" represents the dilated convolution using the rate of the $k$-th layer, $r_k$. The dilated convolution can quickly perform filtering in a wide receptive field without changing the scale. $g$ is the leaky rectified linear unit (LReLU) [38] function defined in (9), and $A_k(\cdot)$ represents the adaptive normalization (AN) function in the $k$-th layer as

$$A_k(x) = \alpha_k x + \beta_k \mathrm{BN}(x), \tag{10}$$

where $\mathrm{BN}(\cdot)$ denotes the batch normalization function [39], and $\alpha_k$ and $\beta_k$ are trainable parameters that control the relative portion of the batch normalization function. The adaptive normalization approach given in (10) can provide enhanced restoration results [28].
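A single-channel sketch of one such layer is given below, combining dilated filtering, adaptive normalization, and the LReLU. It is illustrative only: the LReLU slope of 0.2, the zero padding, and the use of per-map statistics as a stand-in for batch normalization are assumptions, and all function names are hypothetical. As in common CNN frameworks, the filtering is implemented as a correlation without kernel flipping.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Zero-padded, same-size dilated filtering of an H x W map with a k x k kernel (k odd)."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    padded = np.pad(x, pad, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for u in range(k):
        for v in range(k):
            # Kernel taps are spaced `rate` pixels apart.
            out += kernel[u, v] * padded[u * rate:u * rate + h, v * rate:v * rate + w]
    return out

def normalize(x, eps=1e-5):
    """Stand-in for BN(.): zero mean and unit variance over one feature map."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def adaptive_norm(x, alpha, beta):
    """A_k(x) = alpha_k * x + beta_k * BN(x)."""
    return alpha * x + beta * normalize(x)

def leaky_relu(x, slope=0.2):
    """LReLU; the slope 0.2 is an assumed value."""
    return np.maximum(slope * x, x)

def layer(x, kernel, bias, rate, alpha, beta):
    """One single-channel layer: g(A_k(b + x *_r h))."""
    return leaky_relu(adaptive_norm(dilated_conv2d(x, kernel, rate) + bias, alpha, beta))

x = np.arange(25, dtype=np.float64).reshape(5, 5) - 12.0
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
y = layer(x, identity, bias=0.0, rate=2, alpha=1.0, beta=0.0)
```

With a 3 × 3 kernel and rate 2, each output pixel sees a 5 × 5 neighborhood while using only nine weights, which is how the dilated layers widen the receptive field without downsampling.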
The feature map of the CNet is concatenated with that of the HNet as in (11), where concat is a feature concatenation operator [40] and $f$ is the feature map of the HNet, which will be described in Section 3.3. This connection plays an important role in coordinating the learning direction. For example, if the CNet is incorrectly learned without the upward connections, the HNet is also learned with different images, and such erroneous cycles are repeated. To correctly propagate the learning direction, we concatenate the feature maps of the HNet to the upward feature maps of the CNet. The top of Figure 2 shows the CNet and the proposed upward connection scheme. In addition, the parameters of the CNet can be optimized by self-supervised learning using the perceptual loss [41], which is defined on the VGG16 network [42] pretrained on ImageNet data [43]. The perceptual loss in the CNet is referred to as the correction loss, which is defined in (12), where $N$ represents the batch size, $I^C$ the output of the CNet, and $F$ returns the feature maps of the VGG16 network model. We used the relu1-2, relu2-2, relu3-3, and relu4-3 features of VGG16. $\lambda$ is a parameter that regularizes the $\ell_1$-norm of the gradient. This self-supervised CNet can correct color, intensity, and saturation in the real-haze dataset [27], as shown in Figure 3.

[Figure 3. Panels: haze images, initial dehazed images, corrected images (CNet).]

Haze-Network (HNet)
The HNet plays an important role in enhancing the degraded images. In addition, an efficient design of the HNet can significantly reduce the processing time. For that reason, the HNet uses dilated convolution and adaptive normalization [18,28] as

$$f_i^k = g\left(A_k\left(b_i^k + \sum_j f_j^{k-1} *_{r_k} h_{i,j}^k\right)\right), \tag{13}$$

where $f_i^k$ is a feature map of the HNet in the $k$-th layer, and $b$, $h$, and $A_k(\cdot)$, respectively, represent the bias, kernel, and adaptive normalization operator. Since the HNet is learned using the results of the CNet, its result can also be corrected in an adaptive manner. The HNet can be optimized by minimizing the haze loss in (14), where $I_i^H$ is the output of the HNet.

Verifying Network
To make the outputs of the dehazing networks (HNet, CNet) look more natural, we verify the errors, such as noise and halo artifacts, using self-supervised learning with clean data [44]. The verifying loss of the self-supervised learning is defined in (15), where $I_i^N$, $\tilde{I}_i^V$, and $I_i^V$, respectively, represent the clean image and the results of the CNet and HNet. Note that the self-supervised terms are designed by considering the errors, which means that the pixels and features in the output images of both CNet and HNet are close to the real natural images when the input images are ideally clean [30]. If the input images are clean, the ideal haze model should generate the same natural images, as in the bottom left of Figure 2. Therefore, this self-supervised loss should be separately applied to optimize the networks, as in Algorithms 1 and 2. In this context, the self-supervised learning based on the loss in (15) using a clean image can minimize the dehazing artifacts, as shown in Figure 4d. Furthermore, to reduce the statistical divergence between the generated and real images, the proposed DVNet can be optimized based on the least-squares adversarial costs in (16) and (17) [36], where $D$ is a convolutional neural network-based discriminator, as shown in the bottom right of Figure 2, which returns a probability value of the input image $I^*$ using a binary softmax algorithm. $G$ denotes the generative networks including HNet and CNet. The input data of the discriminator is the ideally natural data $I^N$, and the random noise is replaced by the real-haze image $I^{in}$, the initial dehazed image $I^D$, and the natural image $I^N$ to engage our HNet and CNet.
In this adversarial learning method, the proposed network is trained to reduce the probability divergence between the clean image $I^N$ and the results of the proposed network ($I^H$, $I^C$, $I^V$) using unpaired images. The optimal parameters for implementing the adversarial cost are described in Appendix A.
As a result, the resulting images ($I^H$, $I^C$, $I^V$) are improved so that their visibility is similar to that of the clean images ($I^N$). Figure 4e shows the performance of the proposed DVNet. More specifically, the resulting images in Figure 4 show that our DVNet can better enhance the hazy images [45] in terms of both details and contrast without undesired dehazing artifacts.
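The least-squares adversarial costs can be sketched as below. This is a minimal sketch assuming the standard minimization form of least-squares adversarial learning [36] with the common 0-1 labels (a = 0 for generated, b = 1 for real, c = 1 as the generator target); the label values actually used are discussed in Appendix A, and the discriminator scores here are hypothetical numbers rather than network outputs.

```python
import numpy as np

def discriminator_cost(d_real, d_fake, a=0.0, b=1.0):
    """0.5 E[(D(real) - b)^2] + 0.5 E[(D(fake) - a)^2], minimized over D."""
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)

def generator_cost(d_fake, c=1.0):
    """0.5 E[(D(fake) - c)^2], minimized over the generators."""
    return 0.5 * np.mean((d_fake - c) ** 2)

# Hypothetical discriminator scores for natural images and for the three generated outputs.
d_natural = np.array([0.9, 0.8, 1.0])
d_fake = np.concatenate([np.array([0.2, 0.1]),   # scores of HNet outputs
                         np.array([0.3, 0.0]),   # scores of CNet outputs
                         np.array([0.1, 0.2])])  # scores of verified outputs
v_d = discriminator_cost(d_natural, d_fake)
v_g = generator_cost(d_fake)
```

Because all three generated outputs are scored against the same natural images, the fake term pools three sources while the real term has one, which is the imbalance that Appendix A accounts for.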

Implementation
For the implementation, we split our method into training and testing procedures. The training procedure consists of eight steps: (i) feature extraction using the HNet, (ii) feature concatenation using the CNet and generation of the corrected clean image, (iii) error verification using the same network architecture and natural images [44], (iv) differentiation of the real and fake images using the discriminator, (v) minimizing (14) + (12), (vi) minimizing (15), (vii) maximizing and minimizing the adversarial costs V(D) and V(G), and (viii) repeating the above seven steps until the optimal CNN weights are obtained. The test procedure is simpler than the training procedure and applies the optimal HNet to remove haze. Table 1 shows the pseudo-code of the training and testing procedures of the proposed method. In Tables 2 and 3, the parameters of the proposed DVNet and discriminator are given for the implementation. To optimize the cost functions, we used the adaptive moment estimation (ADAM) optimization algorithm proposed by [46]. The learning rates of the DNet and VNet were, respectively, set to $1 \times 10^{-4}$ and $4 \times 10^{-4}$. We used 500 real-haze images from the dataset provided by [27], which are fed to the DVNet together with high-quality images from the NTIRE 2017 dataset [44]. Initial clean images were created using the NL, RRO, and NYU-depth data [15,16,23] with five hundred training images. We trained the proposed DVNet 10,000 times. Table 1 also shows conventions for the important variables and parameters for the implementation.

Experimental Results
For the experiment, we selected three benchmark datasets of size 512 × 512: I-Haze, O-Haze, and 100 real hazy images [27,47,48,49]. For the comparative experiment, we tested existing dehazing methods including the haze-line prior-based non-local dehazing method (NL), the densely connected pyramid dehazing network (DCPDN), radiance-reflectance optimization-based dehazing (RRO), and the region-based haze image enhancement method using a triple convolutional network (TCN) [15,16,25,30]. Both NL and RRO were implemented in Matlab 2016b and tested on an i7 CPU equipped with 64 GB of RAM. On the other hand, DCPDN, TCN, and the proposed method were implemented in Python 3.6 and TensorFlow and tested using an NVIDIA RTX 2080 Ti graphics processing unit (GPU). This section includes the similarity evaluation in Section 4.1, the visual quality evaluation in Section 4.2, and the ablation study in Section 4.3.
For the quantitative evaluation, we measured the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and CIE color difference formula 2000 (CIED) [50,51], as shown in Figures 5 and 6 and Table 4, where the best and second-best scores are, respectively, shown in blue and cyan text. The proposed DVNet is trained using non-local dehazing results, radiance-reflectance optimization-based restoration results, or NYU-depth dataset-based haze-clean pairs. Both DVNet-RRO and DVNet-NL outperform state-of-the-art approaches in terms of both SSIM and CIED on the I-Haze dataset, which has ideal illumination because each image was acquired in an indoor environment. However, the performance of DVNet-NYU was slightly lower than that of TCN-RRO in terms of PSNR and SSIM because a simulated dataset cannot reflect various environmental factors such as airlight and illuminance. As a result, the DVNet-NYU can generate intensity saturation, as shown in Figure 5h.
Since the adaptive normalization used in the TCN and our DVNet stretches the intensity, both DVNet and TCN can change the background color. Therefore, the PSNR of the DVNet-RRO is similar to that of TCN. Note that the DVNet not only removes the haze but also changes the illumination, so the resulting image has a different illuminance from the ground truth image. For that reason, the DVNets and TCN produce a lower similarity on the O-Haze dataset than the NL and RRO approaches.
However, the DVNet-RRO performs better than other CNN-based methods such as DCPDN and TCN in terms of SSIM.
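For reference, the PSNR used in the similarity evaluation can be computed as in the following sketch. It assumes images scaled to [0, 1] (peak value 1.0); the function name and the choice of peak are illustrative and not taken from the paper's evaluation code.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Because PSNR depends only on pixel-wise differences, a global illumination change lowers it even when the haze is removed well, which is consistent with the O-Haze discussion above.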

Visual Quality Assessment
To verify the performance of the DVNets in real haze conditions, we used the 100-FADE test set provided by [27]. For the objective evaluation, we selected no-reference measures including the contrast-to-noise ratio (CNR), natural image quality evaluation (NIQE), entropy, which evaluates the amount of information in a single image such as the intensity distribution, and intensity saturation [27,52,53]. A high-quality image has high CNR and entropy values, whereas it should have low NIQE and saturation values for stable enhancement. The average scores of the proposed DVNet-NL are higher than those of state-of-the-art approaches in terms of the CNR and saturation, as shown in Table 5. The DVNet-NYU achieved the best scores in terms of CNR, entropy, and NIQE. However, due to highly saturated pixels, the color of the resultant image of DVNet-NYU can be distorted, as shown in Figure 7h. Note that the DVNet-NL scores well in terms of NIQE, with a very small difference from the first-ranked NL. The DVNet-RRO also has a similar NIQE score to RRO. However, the saturation scores of the DVNets are lower than those of NL and RRO because our DVNets verify the errors of the NL, RRO, and NYU-depth dataset. In summary, the proposed DVNet can successfully remove various types of haze in various environments [27], as shown in Figures 7 and 8.
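Among the no-reference measures, the entropy of the intensity distribution is straightforward to compute. The sketch below assumes a grayscale image scaled to [0, 1] and a 256-bin histogram; both choices, and the function name `image_entropy`, are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def image_entropy(gray, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of a grayscale image in [0, 1]."""
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()      # drop empty bins so log2 is defined
    return float(-np.sum(p * np.log2(p)))
```

A flat image has zero entropy, while an image whose intensities spread over many levels scores higher, so a dehazed result with restored contrast is expected to increase this value.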

Additional Study
To demonstrate the effect of the proposed contributions, we conducted additional studies using the I-Haze and O-Haze datasets. We used the DVNet-NL version for the ablation study. In Table 6, HNet and CNet represent the baseline of the proposed dehazing network, DVNet the optimized version of the proposed method with natural images and self-supervised learning, and GAN the optimized version of the proposed method using the proposed adversarial learning method. Note that the combined HNet and CNet model without the VNet returns images that are only similar to those of the physical model-based dehazing method, and therefore also imitates its errors such as noise and saturation. Our DNet (HNet + CNet) can reduce the intensity distortion caused by the initial dehazed image $I^D$. The SSIM values of the DVNet increased at the cost of a slight PSNR reduction. This means that our verifying process can prevent noise and halo at the cost of slightly reduced dehazing performance. However, since the proposed adversarial network complements the dehazing performance, its PSNR values exceed those of the vanilla DVNet. In addition, Table 7 shows the processing time of the proposed DVNet for various image sizes. In the evaluation procedure, the proposed DVNets use only a single network (HNet). Therefore, the DVNets reduce the computational time by a factor of 5 to 10 compared with the TCN and DCPDN, which have several subnetworks.

Conclusions
To estimate a high-quality, clean radiance image without dehazing artifacts, we proposed a novel dehazing network followed by a verifying network, which generates radiance images to verify the dehazing errors. To estimate an ideally clean image pair, we concatenate feature maps using adaptive normalization and upward connections from the HNet to the CNet. In addition, unpaired natural images and the discriminator help minimize the noise and dehazing artifacts without performance degradation. The DVNet can adaptively remove haze without additional estimation processes. Therefore, the proposed approach can efficiently remove various types of haze with low computational complexity. More specifically, three experiments were conducted to verify the performance of the DVNet and the effect of the individual contributions. As a result, the DVNet can provide high-quality dehazing results under various types of haze environments. However, the DVNet may depend on the underlying training data. In future work, we plan to combine the DVNet with a data augmentation method and extend it to video dehazing.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Optimal Parameters
In the proposed method, the least-squares adversarial cost functions $V(D)$ and $V(G)$ are defined as in (A1) and (A2) [36], where $D$ returns the probability values via the discriminator using the softmax algorithm, and $G$ represents the proposed generator model including HNet and CNet. To find the optimal point of the discriminator, $V(D)$ in (A1) is differentiated: the optimal point $D^*$ is obtained when the partial derivative of $V(D)$ with respect to $D$ is equal to zero. Defining the real and fake distributions, respectively, as $P_1 = P_D$ and $P_2 = 3P_G$, the optimal point simplifies to

$$D^*(x) = \frac{bP_1 + aP_2}{P_1 + P_2}. \tag{A7}$$

Using $D^*$, (A2) is expressed as

$$V(G) = \int_x \frac{\left((b - c)(p_1(x) + p_2(x)) - (b - a)\,p_2(x)\right)^2}{p_1(x) + p_2(x)}\,dx. \tag{A10}$$
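The closed form in (A7) can be checked numerically on a small discrete example. The sketch assumes that $V(D)$ has the standard least-squares form of [36], written here as a minimization over $D$, and uses hypothetical five-bin distributions with illustrative labels $a = 0$ and $b = 1$.

```python
import numpy as np

def optimal_discriminator(p1, p2, a, b):
    """Closed form D*(x) = (b P1 + a P2) / (P1 + P2)."""
    return (b * p1 + a * p2) / (p1 + p2)

def discriminator_cost(d, p1, p2, a, b):
    """Discrete version of 0.5 * sum_x [P1 (D - b)^2 + P2 (D - a)^2] (assumed form of V(D))."""
    return 0.5 * float(np.sum(p1 * (d - b) ** 2 + p2 * (d - a) ** 2))

# Hypothetical discrete distributions over five bins.
p_d = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p_g = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
p1, p2 = p_d, 3.0 * p_g          # P1 = P_D, P2 = 3 P_G as in the text
a, b = 0.0, 1.0                  # illustrative labels only

d_star = optimal_discriminator(p1, p2, a, b)
best = discriminator_cost(d_star, p1, p2, a, b)

rng = np.random.default_rng(0)
perturbed = [discriminator_cost(d_star + 0.05 * rng.standard_normal(5), p1, p2, a, b)
             for _ in range(100)]
```

Every perturbed discriminator has a cost at least as large as `best`, which is consistent with $D^*$ being the stationary point of a cost that is quadratic in $D$.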
If we set the conditions $b - c = 1$, $b - a = \frac{4}{3}$, and $P_1 \approx \frac{1}{3}P_2$, then $V(G)$ will converge. Therefore, (A10) is rewritten as

$$V(G) = \int_x \frac{\left(\frac{4}{3}P_2(x) - (P_1(x) + P_2(x))\right)^2}{P_1(x) + P_2(x)}\,dx = \chi^2_P\left(P_1 + P_2 \,\Big\|\, \tfrac{4}{3}P_2\right), \tag{A12}$$

where $\chi^2_P$ represents the Pearson $\chi^2$ divergence [36]. It means that when the above conditions are satisfied, minimizing the $\chi^2$ divergence minimizes the distance between $P_1 + P_2$ and $\frac{4}{3}P_2$. At the minimum, $P_1 + P_2 = \frac{4}{3}P_2$, that is, $P_1 = \frac{1}{3}P_2$; since $P_1 = P_D$ and $P_2 = 3P_G$, this gives $P_D = P_G$. Therefore, the optimal parameters can be defined as $a = -4/3$, $b = 0$, and $c = -1$, which satisfy $b - c = 1$ and $b - a = \frac{4}{3}$. However, since the maximum value of $D$ is equal to 1, the proposed parameters are applied as