Learning a Convolutional Autoencoder for Nighttime Image Dehazing

Abstract: Haze removal for images captured in nighttime foggy scenes currently relies on traditional, prior-based methods, but these methods frequently fail on nighttime hazy images. In addition, the light sources at night are complex and their brightness is inconsistent, which complicates the estimation of the transmission map in night scenes. Based on this analysis, we propose an autoencoder method to address the over- and underestimation of transmission produced by the traditional, prior-based methods. For nighttime hazy images, we first remove the color effect of the hazy image with an edge-preserving maximum reflectance prior (MRP) method. The color-corrected hazy image is then fed into an autoencoder network with skip connections to obtain the transmission map. Moreover, instead of using the local-maximum method, we estimate the ambient illumination through guided image filtering. To demonstrate the effectiveness of our approach, we conducted extensive comparison experiments between our method and state-of-the-art methods. The results show that our method effectively suppresses halo artifacts and reduces the glow effect. In the experiments, our method achieves an average Peak Signal-to-Noise Ratio (PSNR) of 21.0968 and an average Structural Similarity (SSIM) of 0.6802.


Introduction
Because suspended particles in the air absorb and scatter atmospheric light, the presence of fog and haze degrades the quality of images collected by image capture devices. Degraded images make it difficult for computer vision applications to extract reliable information from the blurred scene. Therefore, the existence of haze and fog seriously hinders progress in object recognition, image segmentation [1], and autonomous vehicles. In recent years, an increasing number of studies have indicated the importance of image dehazing.
At present, image dehazing methods based on prior theory [2][3][4][5][6][7][8] work well for daytime scenes. They extract features using the dark channel prior, the color attenuation prior, or other priors. These methods are effective in daytime scenes, but they cannot handle night scenes as well. The main reason is that the biggest difference between nighttime and daytime images is the ambient illumination. Moreover, these priors are based on the statistics of daytime images, which makes them unsuitable for nighttime situations.
With the development of deep learning, Cai et al. first proposed a network using a convolutional neural network (CNN) to estimate the transmission map, named DehazeNet [9]. Inspired by this, our method proceeds as follows. First, we compute the color map of the hazy image according to the maximum reflectance prior and remove it. Second, a transmission map of the color-corrected hazy image is obtained through an autoencoder network with skip connections. Considering that it is difficult to train a neural network to estimate the nighttime ambient illumination effectively, we use guided filtering to obtain the ambient illumination. Finally, we obtain the dehazing result by introducing the transmission map and the ambient illumination map into the night scene model. Experimental results show that our proposed method not only weakens the glow effect, but also reduces the halo artifacts surrounding the light sources. The contributions of this research can be summarized as follows:

1.
We propose a novel method for estimating the transmission map of the hazy image in the night scene, in which we have developed an autoencoder method to solve the problem of overestimation or underestimation of transmission in the traditional methods.

2.
The ambient illumination mainly comes from the low-frequency components of an image. We propose to use a guided filtering method to obtain the ambient illumination. This method is more accurate than the local pixel maximum method.

3.
In order to make the synthesized image close to the real situation at night, we propose a new method of synthesizing the night haze training set.
The rest of this paper is organized as follows. In Section 2, related works of image dehazing are briefly reviewed. Our proposed method is presented in Section 3. The experimental results and analysis are shown in Section 4. Finally, the conclusions are given in Section 5.

Related Works
Most existing methods [2,3,5,9,12,14,23,24] were proposed to handle daytime haze removal. However, when applying them directly to nighttime scenes, the results are unsatisfactory because hazy nighttime images differ substantially from daytime images.
Among the nighttime haze removal works, Pei and Lee [19] removed haze based on color transfer preprocessing, the dark channel prior, and bilateral filtering. Though this method improves the visualization of hazy images, it suffers from color distortion. Li et al. [15] added an atmospheric point spread function to the atmospheric scattering model to simulate the halo scattering propagation in the atmosphere. Based on this new model, they removed the glow effect from the input image before dehazing. After that, they obtained the atmospheric illumination according to MRP and guided image filtering. Then, a spatially varying atmospheric light was used to calculate the transmission map. Since the method involves several additional post-processing steps, its results can still contain glow artifacts. Based on the statistics of outdoor images, Zhang et al. [18] proposed the maximum reflectance prior to estimate the varying ambient illuminations. After obtaining the illumination intensities, the dark channel prior is applied to estimate the transmission maps. This method is effective in regions where the maximum reflectance prior holds; in regions where the prior fails, the color of the dehazed images is distorted.
Recently, deep neural networks have been widely used for daytime image dehazing. Cai et al. proposed DehazeNet [9], which relies on the physical scattering model. To learn the mapping between a hazy image and its medium transmission map, this network performs feature extraction, multi-scale mapping, local extremum, and nonlinear regression. Based on a re-formulated atmospheric scattering model, AOD-Net [25] was designed as an end-to-end network that can directly generate dehazed images. Qu et al. [22] proposed EPDN, which obtains haze-free images without relying on an atmospheric scattering model. Deng et al. [23] proposed an end-to-end network based on the atmospheric scattering model, in which an attention mechanism is used to integrate different dehazing results. RCNN [26] was proposed to extract haze-relevant features; a random forest regression model and guided filtering are then used to estimate the transmission map.
Because nighttime scenes typically contain multiple light sources, it is difficult to learn the ambient illumination directly with CNNs. Inspired by DehazeNet, we combine CNNs and traditional methods to estimate the dehazed images.

Our Method
In this section, we introduce our proposed nighttime haze removal method. To predict the clean images, we first remove the color effects from the hazy images, and then estimate the transmission maps and the ambient illuminations. The details of our method are explained in the following subsections. Figure 2 illustrates the flowchart of our network. The proposed method contains three parts: color correction, transmission map estimation, and ambient illumination estimation. After that, we recover the clean image according to the nighttime haze model.

Nighttime Haze Model
For daytime haze scenes, the most widely used image haze model is the physical atmospheric scattering model [13]:

I(x) = J(x)T(x) + A(1 − T(x)), (1)

where I(x) is the hazy image captured by the camera, J(x) denotes the haze-free image that needs to be restored, and A describes the global atmospheric light. The transmission map T(x) = e^(−β·d(x)) indicates the portion of light reaching the camera, where β is the scattering coefficient of the atmosphere and d(x) denotes the scene depth. Nighttime haze scenes usually contain multiple artificial light sources, such as street lights, car lights, and neon lights. These artificial lights make the atmospheric light vary from the consistent value of daytime to spatially inconsistent values at night. We therefore introduce a spatially varying ambient illumination map A(x) for night scenes, so that (1) is modified as:

I(x) = J(x)T(x) + A(x)(1 − T(x)). (2)

Our goal is to obtain the clean image without haze, so we rewrite (2) as:

J(x) = (I(x) − A(x)) / T(x) + A(x). (3)

This formula shows that the key steps for nighttime dehazing are estimating the ambient illumination map A(x) and the transmission map T(x).
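As a concrete illustration, the nighttime model and its inversion described above can be written as a few lines of NumPy (a minimal sketch, not the paper's implementation; the clamp `t_min` is our own numerical safeguard):

```python
import numpy as np

def synthesize_hazy(J, A, T):
    """Compose a hazy image with the nighttime model:
    I(x) = J(x) * T(x) + A(x) * (1 - T(x))."""
    return J * T + A * (1.0 - T)

def recover_clean(I, A, T, t_min=0.1):
    """Invert the model to recover the scene radiance:
    J(x) = (I(x) - A(x)) / T(x) + A(x).
    T is clamped below by t_min to avoid division blow-up."""
    T = np.maximum(T, t_min)
    return (I - A) / T + A

# Round trip: recovering from a synthesized image returns the original scene.
rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, (4, 4, 3))   # clean image
A = rng.uniform(0.2, 0.8, (4, 4, 3))   # spatially varying ambient illumination
T = rng.uniform(0.3, 1.0, (4, 4, 1))   # transmission map, broadcast over channels
I = synthesize_hazy(J, A, T)
J_hat = recover_clean(I, A, T)
```

The round trip is exact whenever T stays above the clamp, which is why accurate estimates of A(x) and T(x) translate directly into recovery quality.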

Color Correction
In our experiment, we first remove the color effect of the hazy images. As discussed in [18], for nighttime hazy image patches, the maximum intensities at each color channel give a rough approximation of the varicolored ambient illumination. Therefore, we need to remove the color effect of the hazy image. The principles of color correction are as follows.
In view of the Retinex theory [27], a haze-free image can be decomposed into the ambient illumination A(x) and the reflectance from the surface of objects R(x):

J^c(x) = R^c(x)A^c(x), c ∈ {r, g, b}. (4)

During the night, the artificial light sources not only have different colors but also inconsistent brightness. Following the analysis of [18], the nighttime ambient illumination is composed of a brightness L(x) and a color map η(x):

A^c(x) = L(x)η^c(x). (5)

Thus, we can rewrite (2) as:

I^c(x) = L(x)η^c(x)R^c(x)T(x) + L(x)η^c(x)(1 − T(x)). (6)

The maximum reflectance prior assumes that the ambient illumination, the light intensity, and the color map are constant within a local patch, and that the transmission in the same patch is also constant. Under these assumptions, applying the maximum operator to (6) gives:

M^c_Ωi = max_{x∈Ωi} I^c(x), (7)

where M^c_Ωi represents the maximum pixel value in patch Ωi on channel c. (7) can also be rewritten as:

M^c_Ωi = L_Ωi η^c_Ωi (T_Ωi max_{x∈Ωi} R^c(x) + (1 − T_Ωi)). (8)

Since max_{x∈Ωi} R^c(x) ≈ 1 is assumed in the maximum reflectance prior, we have:

M^c_Ωi ≈ L_Ωi η^c_Ωi. (9)

Through (9), we can estimate the color map of the ambient illumination by:

η^c_Ωi = M^c_Ωi / Σ_{c'∈{r,g,b}} M^{c'}_Ωi. (10)

Here, η^c_Ωi is a rough ambient illumination color map. We refine it by minimizing a data-fidelity term plus a second term that denotes a smoothness penalty (11); in practice, this refinement is implemented with guided image filtering. After refining, we remove the color effect from the hazy image by dividing it by the refined color map:

Ĩ^c_j = I^c_j / (3η^c_j), (12)

where Ĩ^c_j denotes the image after removing the color influence (3η^c = 1 for a neutral, gray illumination). Figure 3 shows examples of color maps: the second row shows the color maps obtained by (10), and the last row shows the hazy images after removing the color effect.
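The patch-wise color map estimation and removal above can be sketched in NumPy as follows (a simplified illustration with non-overlapping patches and no guided-filter refinement; function names and the patch size are our own):

```python
import numpy as np

def estimate_color_map(I, patch=15):
    """Rough MRP color map: for each patch, take the per-channel maximum
    intensity M^c, then normalize across channels so the maps sum to 1."""
    H, W, _ = I.shape
    eta = np.empty_like(I)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            m = I[y:y+patch, x:x+patch].max(axis=(0, 1))  # M^c over the patch
            eta[y:y+patch, x:x+patch] = m / m.sum()       # eta^c = M^c / sum_c M^c
    return eta

def remove_color(I, eta, eps=1e-6):
    """Divide out the illumination color; 3*eta == 1 for a neutral (gray)
    light, so a color-free image is left essentially unchanged."""
    return np.clip(I / (3.0 * eta + eps), 0.0, 1.0)

# Demo: a gray scene under a reddish light; dividing out the color restores gray.
I = np.ones((16, 16, 3)) * np.array([0.6, 0.3, 0.1]) * 0.9
eta = estimate_color_map(I, patch=8)
corrected = remove_color(I, eta)
```

In the paper the rough map is additionally smoothed with guided filtering before the division, which avoids blocky patch boundaries.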

Transmission Estimation
After correcting the color of hazy images, we estimate the medium transmission maps with an autoencoder network. Encoder-decoder network structures are widely used in image denoising and produce good results. Motivated by this, we utilize an autoencoder network to deal with the over/underestimation of the transmission map that occurs in state-of-the-art nighttime dehazing methods. In a network, the receptive field represents the size of the region of the original image that a neuron at a given position perceives. Large receptive fields capture more global features; in contrast, small receptive fields generate local features. In order to reduce image blur and obtain more contextual information, we introduce skip connections into the proposed network. In addition, we use small kernels to produce more local information.
The input of this network is a nighttime hazy image after color correction. It is first fed into a 1 × 1 convolutional layer with 3 channels, and then enters the encoder-decoder network. Figure 4 illustrates the structure of the transmission computing network. The encoding part includes two (Conv + ReLU) blocks and four (MaxPool + Conv + ReLU + Conv + ReLU) blocks, while the decoding part is composed of four (UpSample + Conv + ReLU + Conv + ReLU) blocks. The specific configuration of our autoencoder network is shown in Tables 1 and 2. Figure 5 shows four exemplar results of our transmission computing network. Traditional methods are based on empirical assumptions, but these priors may not hold at night, so we estimate the transmission in a data-driven way and obtain the model through training. Our training goal is to produce a transmission map that matches the ground truth. For this, we calculate the loss function at every iteration and optimize it with the Adam [28] optimizer. After training for multiple epochs, we obtain a model that can be used on the test data.
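The encoder-decoder topology with skip connections can be sketched in PyTorch roughly as follows. This is a shortened illustration: the layer widths, the number of scales, and the sigmoid output are our assumptions, not the exact configuration of Tables 1 and 2.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # (Conv + ReLU + Conv + ReLU), used in both encoder and decoder blocks
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TransmissionAutoencoder(nn.Module):
    """Encoder-decoder with skip connections; widths are illustrative."""
    def __init__(self):
        super().__init__()
        self.inp = nn.Conv2d(3, 3, 1)             # initial 1x1 conv, 3 channels
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.enc3 = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.dec2 = conv_block(128 + 64, 64)      # concat skip from enc2
        self.dec1 = conv_block(64 + 32, 32)       # concat skip from enc1
        self.out = nn.Conv2d(32, 1, 1)            # single-channel transmission

    def forward(self, x):
        e1 = self.enc1(self.inp(x))
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return torch.sigmoid(self.out(d1))        # transmission in (0, 1)

net = TransmissionAutoencoder()
t = net(torch.rand(1, 3, 64, 64))                 # color-corrected hazy input
```

The skip connections concatenate encoder features into the decoder at matching resolutions, which is what lets small-kernel local detail survive the pooling path.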
The loss function of this part is the mean squared error:

L = (1/N) Σ_x (T(x) − T_gt(x))², (13)

where T represents the transmission map produced by the autoencoder network, T_gt represents the ground truth transmission map, and N is the number of pixels.
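One optimization step of this objective with Adam can be sketched as follows (the tensors here are random stand-ins for the network output and ground truth, not real data):

```python
import torch

# Stand-ins for the predicted and ground-truth transmission maps.
pred = torch.rand(4, 1, 32, 32, requires_grad=True)
t_gt = torch.rand(4, 1, 32, 32)

opt = torch.optim.Adam([pred], lr=0.01)     # Adam optimizer, as in the paper
loss = torch.mean((pred - t_gt) ** 2)       # L = (1/N) * sum (T - T_gt)^2
opt.zero_grad()
loss.backward()
opt.step()                                  # one parameter update
```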

Ambient Illumination Estimation
After obtaining the transmission map, we estimate the ambient illumination. In nighttime hazy scenes, the existing methods for estimating the ambient illumination are mainly based on the maximum pixel value in each local patch. This works well in the daytime because the pixel values of the sky region are the largest; there T(x) tends to 0, and the atmospheric light A is approximately equal to I(x). However, the ambient illumination of a night scene gradually decreases away from the centers of the light sources, so the local-maximum method is not suitable for night scenes. Since the estimation of the transmission already occupies much of the computation time, we introduce a fast yet efficient method for estimating the ambient illumination. According to the Retinex theory, the ambient illumination mainly comes from the low-frequency part of an image. Low-pass guided filtering is widely applied in both daytime and nighttime haze removal; it not only smooths an image but also preserves its edges well. Thus, we estimate the ambient illumination through low-pass guided filtering. There are two choices of guide image: the input image itself, or an external (non-input) image.
Many researchers use the channel difference map [29] (a non-input image) as the guide for filtering. However, the middle of each light source obtained in this way is black, as shown in Figure 6b, because there is little difference between the maximum and minimum channel values in bright regions. Since the centers of the light sources in the transmission map are black, the ambient illumination must retain the light-source centers to compensate for this. Based on the above analysis, we employ the input image itself as the guide image. Figure 6 shows ambient illumination maps obtained with different guide images. The guided filtering algorithm assumes that the ambient illumination A and the guide image I satisfy a linear relationship in a two-dimensional window ω_k:

A_i = a_k I_i + b_k, ∀i ∈ ω_k, (14)

where a_k and b_k are the linear coefficients of window ω_k and i is the pixel index. The coefficients a_k and b_k are found by minimizing the difference between the input and the output of the fitted function, which gives the loss function:

E(a_k, b_k) = Σ_{i∈ω_k} ((a_k I_i + b_k − I_i)² + ε a_k²), (15)

where ω_k denotes the filter window and ε is a regularizer that prevents the obtained a_k from being too large. Solving (15) with the least squares method yields:

a_k = ((1/|ω|) Σ_{i∈ω_k} I_i I_i − μ_k μ_k) / (σ²_k + ε), b_k = μ_k − a_k μ_k, (16)

where μ_k is the average pixel value of image I in window ω_k, σ²_k denotes the variance of I in window ω_k, and |ω| represents the number of pixels in the window. The first I_i in (16) represents the input image and the second represents the guide image; in our work, the guide image is the same as the input image. After obtaining the coefficients a_k and b_k, we substitute them into (14) to acquire the ambient illumination A(x).
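A minimal NumPy sketch of the self-guided case follows. With input equal to guide, the least-squares coefficients reduce to a_k = σ²_k/(σ²_k + ε) and b_k = (1 − a_k)μ_k; the coefficients are then averaged over all windows covering each pixel, as in the standard guided filter. The box-filter helper and parameter defaults are our own choices.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window via 2-D cumulative sums (edge-padded)."""
    pad = np.pad(x, r, mode='edge')
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))          # leading zeros for clean differences
    k = 2 * r + 1
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def self_guided_filter(I, r=8, eps=0.01):
    """Guided filter with the input as its own guide:
    a_k = var_k / (var_k + eps), b_k = (1 - a_k) * mean_k,
    then a and b are averaged over windows and applied pixel-wise."""
    mu = box_mean(I, r)
    var = box_mean(I * I, r) - mu * mu
    a = var / (var + eps)
    b = (1.0 - a) * mu
    return box_mean(a, r) * I + box_mean(b, r)

# Low-frequency estimate: a constant image passes through unchanged.
A = self_guided_filter(np.full((32, 32), 0.5))
```

In flat regions the variance is small, so a ≈ 0 and the output tends to the local mean (smoothing); near strong edges the variance dominates ε, so a ≈ 1 and the edge is preserved, which is exactly the behavior needed around light sources.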
In the last step of our dehazing method, we put the transmission map, hazy image, and ambient illumination into the dehazing model (3) to calculate the haze-free image.

Data Synthesis
In order to train the autoencoder network, we create a new dataset, named NYUv2-Night, based on the NYUv2 dataset [30]. The specific process is given in Algorithm 1. Using the proposed synthesis algorithm, the 1420 images in the NYUv2 dataset are expanded six-fold to form our training set. Our synthesis method is based on the method in [18]; the main differences are that we add a darkening operation and randomly set the position of the light source. In the algorithm, the clean image c denotes a daytime image without haze, and the depth map d contains the depth information of the scene; both c and d come from NYUv2. B ∼ U{0.1, 0.2} denotes that B is set to 0.1 or 0.2, which darkens the image to different degrees. p0 and p1 represent the position of the light source; it is set randomly to reflect the inconsistent lighting positions at night. α ∼ U{0.4, 0.6, 0.8} denotes that α is set to 0.4, 0.6, or 0.8. As introduced in [18], since the exponent in the equation A(x) = e^(−α·dp) is very small, Zhang et al. use a first-order Taylor expansion instead, giving A(x) = 1 − α·dp(x). Here, c_night is used as the reflectance.
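The synthesis steps described above can be sketched as follows. Algorithm 1 itself is not reproduced in the text, so this is a hedged reconstruction: in particular, the exact definition of dp(x) is not given, and we assume here that it is the normalized distance from each pixel to the randomly placed light source.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_night_hazy(clean, depth, beta=1.0):
    """Sketch of the nighttime data synthesis: darken the clean image by B,
    place a light source at a random position (p0, p1), form the ambient
    illumination with the Taylor-expanded A(x) = 1 - alpha * dp(x), and
    compose the hazy image. dp(x) is ASSUMED to be the normalized distance
    to the light source; Algorithm 1 may define it differently."""
    H, W, _ = clean.shape
    B = rng.choice([0.1, 0.2])                     # darkening strength
    alpha = rng.choice([0.4, 0.6, 0.8])
    c_night = clean * (1.0 - B)                    # darkened image (reflectance)
    p0, p1 = rng.integers(0, H), rng.integers(0, W)  # random light position
    yy, xx = np.mgrid[0:H, 0:W]
    dp = np.hypot(yy - p0, xx - p1) / np.hypot(H, W)
    A = (1.0 - alpha * dp)[..., None]              # ambient illumination map
    T = np.exp(-beta * depth)[..., None]           # transmission from depth
    # Nighttime model with c_night as the reflectance: I = R*A*T + A*(1 - T)
    return np.clip(c_night * A * T + A * (1.0 - T), 0.0, 1.0)

hazy = synthesize_night_hazy(rng.uniform(0, 1, (32, 32, 3)),
                             rng.uniform(0, 1, (32, 32)))
```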

Experimental Details
After synthesizing the training set, we use the color-corrected hazy images as the input of the autoencoder network. When removing the color effect of hazy images, we set the kernel size of the guided filter to 16 × 16 and the smoothing factor to 0.01.
The initial learning rate of our network is set to 0.01 and is halved after every 10 epochs. We train the autoencoder network for 50 epochs with the Adam optimizer. The transmission estimation network is implemented in the PyTorch framework. Training is performed under Ubuntu 18.04 with torch 1.2.0, CUDA 9.2, and Python 2.7, on an NVIDIA GeForce GTX 1650 GPU; training takes about 8 hours.
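The step-decay schedule is equivalent to the small helper below (in PyTorch this would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`):

```python
def learning_rate(epoch, base_lr=0.01):
    """Initial rate 0.01, halved after every 10 epochs."""
    return base_lr * (0.5 ** (epoch // 10))
```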
When computing the ambient illumination, we set the convolution kernel size of the guide filter to 64 × 64 and set the smoothing factor to 0.01.
We implement synthetic data, color correction, and ambient illumination estimation through MATLAB.

Comparison of Real Images
In order to demonstrate the effectiveness of our method, we compare it with the nighttime haze removal methods [15,18]. Deep learning methods are extensively applied to daytime dehazing but rarely to nighttime dehazing; to make our results more convincing, we also compare with a CNN-based method [22]. Figure 7 shows comparisons of our method with the state-of-the-art methods on real images.
In the dehazing result of the first image, the method of Li et al. presents more details than that of other methods, whereas it looks unnatural and contains noise in the sky region. The methods of [15,22] are too dark to preserve local details, while our dehazed image can preserve more details and looks more natural than Li et al.'s result. On the second row, the results of [15,18]

Comparison of Synthetic Images
Different from the real nighttime hazy images, the synthesized hazy images have corresponding ground truth images. To evaluate the quality of the haze-free images, we employ the widely used evaluation metrics PSNR and SSIM.
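For reference, PSNR is straightforward to compute directly (a NumPy sketch; SSIM is window-based and in practice one would use an existing implementation such as `skimage.metrics.structural_similarity` rather than re-deriving it here):

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak Signal-to-Noise Ratio: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
ref = np.full((8, 8), 0.5)
val = psnr(ref, ref + 0.1)
```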
To further illustrate the effectiveness of our experiment, we conduct comparative experiments on the outdoor dataset O-Haze [31]. Since this dataset contains hazy images and ground truth images, we only darken the images and add light sources when converting the dataset to a nighttime dataset. Figure 8 shows the dehazing results of the synthesized outdoor dataset.
As shown in Figure 8, our dehazed images look somewhat pale because the color effect has been removed from the hazy images. Thanks to this processing, our results have a better color appearance than the others. Moreover, our results have higher PSNR and SSIM than those of the other methods. Table 3 shows the average PSNR and SSIM values of the four methods.

Summary
In this paper, we have proposed a novel method for nighttime image dehazing. We first estimate the color map of the hazy image and then remove it according to MRP and guided image filtering. After that, for more accurate estimation of the transmission map, we propose an autoencoder network with skip connections. Subsequently, we propose a self-guided-filtering-based method to obtain the ambient illumination, which extracts the low-frequency components of the image as the estimate while preserving the image structures. Finally, we put the ambient illumination map, the transmission map, and the hazy image into the nighttime haze removal model to restore a haze-free image. In addition, we also propose a new method for generating a nighttime hazy training set. Our proposed method works well in keeping the edges of the image and suppressing the halo effect. However, the color of the image changes slightly after dehazing, mainly due to the color correction process. Besides, our method shares the limitations of other methods that use atmospheric scattering models; for example, the estimation accuracy of the ambient illumination and transmission maps has a great influence on the quality of the haze-free images. To address these problems, our next step will focus on how to use CNNs to estimate the ambient illumination of nighttime hazy images, and we will rely on Generative Adversarial Networks (GANs) to weaken the dependence on the atmospheric scattering model.