A Sensor Image Dehazing Algorithm Based on Feature Learning

To solve the problems of color distortion and structure blurring in images acquired by sensors during bad weather, an image dehazing algorithm based on feature learning is put forward to improve the quality of sensor images. First, we extracted the multiscale structure features of the haze images by sparse coding and the various haze-related color features simultaneously. Then, the generative adversarial network (GAN) was used for sample training to explore the mapping relationship between different features and the scene transmission. Finally, the final haze-free image was obtained according to the degradation model. Experimental results show that the method has obvious advantages in its detail recovery and color retention. In addition, it effectively improves the quality of sensor images.


Introduction
Visual information accounts for more than 80% of the total amount of information that is available to humans. As an extension of the visual system, image sensors [1] play an increasingly important role in industry, agriculture and daily life. Different sensors are widely used in military, astronomy, medical, radio, television, fax and communication applications due to their high integration, low power consumption, impact resistance and long life. However, in bad weather conditions such as foggy weather, external scene information is seriously polluted, the color of the perceived scene is seriously drifted, the contrast and saturation performance are rapidly degraded, and the features of the target are difficult to recognize. Therefore, how to use digital processing technology to avoid the above problems and improve the sensor's ability to perceive the real world has become an urgent problem that must be solved.
Image dehazing aims to recover real scene information covered by fog in sensor images through relevant technical means and methods, to obtain complete details, structural information and natural, haze-free images. To improve the quality of images obtained by the sensors, many dehazing methods have been proposed.
Currently, image dehazing methods are mainly divided into the methods based on image enhancement and the methods based on image restoration. The former include: histogram equalization [2], homomorphic filtering [3], wavelet transform [4] and Retinex methods based on color invariance [5]. The image enhancement-based dehazing method improves the contrast of the restored image and highlights local features such as the details of the haze image. However, it does not consider the fundamental cause of the degradation of the haze image. therefore, the restored image cannot effectively reflect the real information of the scene, leading to the drawbacks of incomplete dehazing and severe color distortion.
On the other hand, methods based on image restoration have been widely used in recent years: He [6] proposed the dark channel to estimate transmission, and this method has been widely adopted. However, color distortion occurs in restored results for large white areas such as the sky. According to the degeneration model and atmospheric scattering model, Tan [7] proposed a dehazing method for maximizing local contrast based on the assumption that local area atmospheric light is smooth. However, the restored images show halo effects and supersaturation.
Many other improved methods have been proposed: Zhu [8] proposed the use of color attenuation and estimated the transmission through the linear model, which improved the operation efficiency of the method. However, his method relied too much on image color information, resulting in an unsatisfactory effect when testing the entire haze image. Berman [9] estimated transmission by calculating the color distance point-by-point. This method effectively avoided block effects, but it is prone to color saturation. Tang [10] adopted random forest to combine dark primary colors with multiple color features to improve the accuracy of transmission. However, this method did not account for the textural features associated with haze, resulting in inaccurate estimation of the transmission.
It is apparent from the above method that most current methods are only based on color features or structure features. To better utilize features such as texture, structure, and color for improving the dehazing effect, we propose an image dehazing algorithm based on feature learning. Using sparse coding and GAN as tools, our method aims to explore the mapping relationship between transmission and color and structure features of the images to achieve accurate estimation of transmission. The main contributions of the paper are as follows: Unlike most other methods, we consider the effects of color and structure features while estimating the transmission map. Through the training process of a weighted sparse dictionary, the multiscale texture and structure features of the images are extracted. At the same time, the multiscale color features of the image are considered in the estimation of the transmission.
We investigated the mapping relationship between features and the transmission through generative adversarial networks. In addition, the model of our networks was jointly constructed with a multilayer neural network and a convolutional neural network to further ensure the accuracy of estimation.
The remaining sections are arranged as follows: in Section 2, we introduce the generative process of haze images and apply it to the generation of training samples. In Section 3, we explain how to extract the color and structure features. Moreover, we build the GAN model to obtain the mapping relationship between the transmission and the features in this section. In Section 4, we further prove the effectiveness of our method in detail recovery and color preservation through experiments. In Section 5, we summarize the paper and provide future focus.

The Degradation Model and the Test Database
In this section, we introduce the foggy degradation model and the process of building a test database based on the degradation model.

Degradation Model
The degradation model was proposed by McCartney [11] in 1978, and has been widely used in computer vision and image processing. The model describes the objective mechanism of haze image degradation. The scattering effects that cause image degradation are divided into two categories: One is the direct attenuation, wherein reflected light on the surface of the target is attenuated by suspended particles in a turbid atmospheric medium. The other is the indirect attenuation, wherein atmospheric light is scattered through the suspended particles. The degradation model of the images acquired by the CCD sensors is shown in Figure 1. In Figure 1, I(x) represents haze image acquired by the CCD sensor, J(x) represents the reflected image of the scene, which denotes the haze-free image to be restored, and A denotes the global atmospheric light intensity. In our paper, we assume that the atmospheric light value is constant. Furthermore, t(x) represents the transmission, reflecting the attenuation degree of the scene, which can be expressed as: where β denotes the atmospheric scattering rate and d(x) denotes the depth of scene, indicating the distance from the target to the CCD sensors. He [6] established a foggy image degradation model as follows: where J(x)t(x) is the interference term due to direct attenuation, and A(1 − t(x)) is the interference term due to indirect attenuation. The accurate estimation of transmission t(x) is the key to obtain the haze-free image J(x). The method in our paper focuses on obtaining accurate transmission through feature extraction and generative adversarial networks. Since we assume that A is a constant, which is consistent with the literature [6], we can obtain the haze-free image J(x) according to Equation (3) when t(x) is obtained: To reduce the introduction of noise, we set 0.1 as the minimum value of the transmission, which is consistent with the parameter setting in He [6] and is widely used by most methods now.

Training Sample Generation of Our Networks
Since there is no publicly available dataset containing haze-free images and its corresponding haze images and transmission for the training of our networks, we selected an image from the NYU databases as the dataset. The NYU dataset [12] consists of video sequences from a variety of indoor scenes, recorded by RGB and depth cameras from Microsoft Kinect, which have high accuracy. It is a standard training dataset with haze-free images of different scenes and real depth maps. It has been used in research regarding image segmentation widely, detection and recognition. The NYU dataset is appropriate for use in network training in this study.
According to the degradation model, we randomly selected 500 haze-free images and their true depth maps from the NYU dataset to generate the corresponding haze images and transmission to be used as training samples. The formation of haze images is shown in Figure 2. In Figure 1, I(x) represents haze image acquired by the CCD sensor, J(x) represents the reflected image of the scene, which denotes the haze-free image to be restored, and A denotes the global atmospheric light intensity. In our paper, we assume that the atmospheric light value is constant. Furthermore, t(x) represents the transmission, reflecting the attenuation degree of the scene, which can be expressed as: where β denotes the atmospheric scattering rate and d(x) denotes the depth of scene, indicating the distance from the target to the CCD sensors. He [6] established a foggy image degradation model as follows: where J(x)t(x) is the interference term due to direct attenuation, and A(1 − t(x)) is the interference term due to indirect attenuation. The accurate estimation of transmission t(x) is the key to obtain the haze-free image J(x). The method in our paper focuses on obtaining accurate transmission through feature extraction and generative adversarial networks. Since we assume that A is a constant, which is consistent with the literature [6], we can obtain the haze-free image J(x) according to Equation (3) when t(x) is obtained: To reduce the introduction of noise, we set 0.1 as the minimum value of the transmission, which is consistent with the parameter setting in He [6] and is widely used by most methods now.

Training Sample Generation of Our Networks
Since there is no publicly available dataset containing haze-free images and its corresponding haze images and transmission for the training of our networks, we selected an image from the NYU databases as the dataset. The NYU dataset [12] consists of video sequences from a variety of indoor scenes, recorded by RGB and depth cameras from Microsoft Kinect, which have high accuracy. It is a standard training dataset with haze-free images of different scenes and real depth maps. It has been used in research regarding image segmentation widely, detection and recognition. The NYU dataset is appropriate for use in network training in this study.
According to the degradation model, we randomly selected 500 haze-free images and their true depth maps from the NYU dataset to generate the corresponding haze images and transmission to be used as training samples. The formation of haze images is shown in Figure 2. First, we selected the true depth map and the haze-free image from the NYU database. Meanwhile, an atmospheric scattering coefficient β is generated randomly, which ranges from 0.5 to 1.5. The selection of β is based on the experience of the most advanced algorithms, which can help us get accurate transmission. What's more, the large range of β also enriches the sample database, we can obtain haze images with different density. According to Equation (1), we can obtain the transmission t(x), which has been regarded as ground truth T in our paper. And then, the haze image I(x) can be obtained according to Equation (2).
According to this method, we obtain four groups of training samples: the haze-free image data set J set = {J1, J2, …, Jn} and its scene depth map dataset d set = {d1, d2, …, dn}, and the generated haze image dataset I set = {I1, I2, …, In} and the ground truth dataset T set = {T1, T2, …, Tn}. Therefore, J set , T set , I set and d set are the training samples for the networks.

Framework on Transmission Restoration
In Section 3, we explain the framework of our algorithm in detail. Based on sparse coding and feature learning, we present an efficient method for estimating the scene transmission of a haze image. Figure 3 shows the flow of our method, which consists of three parts: estimating the color feature maps of haze images through different priors and assumptions, extracting the structure features of haze images through sparse coding, and estimating the scene transmission through generative adversarial networks based on the extracted color and structure features.  First, we selected the true depth map and the haze-free image from the NYU database. Meanwhile, an atmospheric scattering coefficient β is generated randomly, which ranges from 0.5 to 1.5. The selection of β is based on the experience of the most advanced algorithms, which can help us get accurate transmission. What's more, the large range of β also enriches the sample database, we can obtain haze images with different density. According to Equation (1), we can obtain the transmission t(x), which has been regarded as ground truth T in our paper. And then, the haze image I(x) can be obtained according to Equation (2).
According to this method, we obtain four groups of training samples: the haze-free image data set J set = {J 1 , J 2 , . . . , J n } and its scene depth map dataset d set = {d 1 , d 2 , . . . , d n }, and the generated haze image dataset I set = {I 1 , I 2 , . . . , I n } and the ground truth dataset T set = {T 1 , T 2 , . . . , T n }. Therefore, J set , T set , I set and d set are the training samples for the networks.

Framework on Transmission Restoration
In Section 3, we explain the framework of our algorithm in detail. Based on sparse coding and feature learning, we present an efficient method for estimating the scene transmission of a haze image. Figure 3 shows the flow of our method, which consists of three parts: estimating the color feature maps of haze images through different priors and assumptions, extracting the structure features of haze images through sparse coding, and estimating the scene transmission through generative adversarial networks based on the extracted color and structure features. First, we selected the true depth map and the haze-free image from the NYU database. Meanwhile, an atmospheric scattering coefficient β is generated randomly, which ranges from 0.5 to 1.5. The selection of β is based on the experience of the most advanced algorithms, which can help us get accurate transmission. What's more, the large range of β also enriches the sample database, we can obtain haze images with different density. According to Equation (1), we can obtain the transmission t(x), which has been regarded as ground truth T in our paper. And then, the haze image I(x) can be obtained according to Equation (2).
According to this method, we obtain four groups of training samples: the haze-free image data set J set = {J1, J2, …, Jn} and its scene depth map dataset d set = {d1, d2, …, dn}, and the generated haze image dataset I set = {I1, I2, …, In} and the ground truth dataset T set = {T1, T2, …, Tn}. Therefore, J set , T set , I set and d set are the training samples for the networks.

Framework on Transmission Restoration
In Section 3, we explain the framework of our algorithm in detail. Based on sparse coding and feature learning, we present an efficient method for estimating the scene transmission of a haze image. Figure 3 shows the flow of our method, which consists of three parts: estimating the color feature maps of haze images through different priors and assumptions, extracting the structure features of haze images through sparse coding, and estimating the scene transmission through generative adversarial networks based on the extracted color and structure features.

Multiscale Color Feature Extraction
As is widely known, there are great differences in color between haze images and haze-free images. Therefore, color features are often used to estimate the scene transmission of a haze image in image dehazing. A variety of color-based priors are fully adopted in a single image removal, such as the color-lines prior proposed by Fattal [13], the dark channel prior (DCP) proposed by He [6], and haze-lines prior proposed by Berman [9]. The different kinds of priors can be viewed as a series of color features related to haze density. In fact, although these features can achieve image dehazing effectively, different features have unique limitations. For example, images restored through DCP show color distortion in the sky region. To improve the accuracy of the scene transmission, a variety of color features have been extracted from the haze images to further estimate the transmission in our method: multiscale dark primary color features, haze-lines color features, and RGB channel color features.

Multiscale Dark Primary Color Features
In image dehazing, dark primary color is an important color feature. In most non-sky local areas, some pixels always have very low values in at least one color channel. In this paper, the dark primary color features are defined as: where Ω(x) denotes the local patches centered on x, of which the size is r × r. I c (x) represents the components of haze image I(x) under the color channel C. P r (x) denotes the DCP color feature map, which denotes the dark primary color features of local patches of the size of r × r and is used to estimate the scene transmission t(x). Since r is directly related to the estimation of the transmission t(x), we adopt multiscale dark primary color features to estimate the transmission t(x). Different scales of dark primary color features can reflect more information of the scene transmission t(x). Different scales of the DCP color feature maps P r (x) are shown in Figure 4. It is apparent that the DCP color feature maps approximately reflect the distribution of haze in the image. The brighter the area in the image, the higher the haze density is. At the same time, r has an important influence on the shape of DCP color feature maps.

Multiscale Color Feature Extraction
As is widely known, there are great differences in color between haze images and haze-free images. Therefore, color features are often used to estimate the scene transmission of a haze image in image dehazing. A variety of color-based priors are fully adopted in a single image removal, such as the color-lines prior proposed by Fattal [13], the dark channel prior (DCP) proposed by He [6], and haze-lines prior proposed by Berman [9]. The different kinds of priors can be viewed as a series of color features related to haze density. In fact, although these features can achieve image dehazing effectively, different features have unique limitations. For example, images restored through DCP show color distortion in the sky region. To improve the accuracy of the scene transmission, a variety of color features have been extracted from the haze images to further estimate the transmission in our method: multiscale dark primary color features, haze-lines color features, and RGB channel color features.

Multiscale Dark Primary Color Features
In image dehazing, dark primary color is an important color feature. In most non-sky local areas, some pixels always have very low values in at least one color channel. In this paper, the dark primary color features are defined as: where Ω(x) denotes the local patches centered on x, of which the size is r × r. I c (x) represents the components of haze image I(x) under the color channel C. Pr(x) denotes the DCP color feature map, which denotes the dark primary color features of local patches of the size of r × r and is used to estimate the scene transmission t(x). Since r is directly related to the estimation of the transmission t(x), we adopt multiscale dark primary color features to estimate the transmission t(x). Different scales of dark primary color features can reflect more information of the scene transmission t(x). Different scales of the DCP color feature maps Pr(x) are shown in Figure 4. It is apparent that the DCP color feature maps approximately reflect the distribution of haze in the image. The brighter the area in the image, the higher the haze density is. At the same time, r has an important influence on the shape of DCP color feature maps.

Haze-Lines Color Features
Berman [9] clusters the RGB values of individual images in the Berkeley Segmentation Database, which contains a variety of haze-free images. It is found that a clear natural image can be represented by hundreds of different RGB values at most, and different clusters can be formed in RGB space. For a given cluster, the pixels belonging to the cluster are nonlocal and distributed in different locations of the entire image. Under the influence of haze, the distance from pixels located in different regions of the image to the camera is different. The pixels that belong to the same color cluster finally correspond to different RGB values instead of clustering into one cluster. There is a line in RGB space

Haze-Lines Color Features
Berman [9] clusters the RGB values of individual images in the Berkeley Segmentation Database, which contains a variety of haze-free images. It is found that a clear natural image can be represented by hundreds of different RGB values at most, and different clusters can be formed in RGB space. For a given cluster, the pixels belonging to the cluster are nonlocal and distributed in different locations of the entire image. Under the influence of haze, the distance from pixels located in different regions of the image to the camera is different. The pixels that belong to the same color cluster finally correspond to different RGB values instead of clustering into one cluster. There is a line in RGB space called the haze-line. We can estimate the transmission according to the color features in the image and the position of the pixels on its haze-line.
We define the haze-line feature map C(x) and I A (x) based on the non-local distribution of pixels in RGB space and mapping it into a spherical coordinate system: where r(x) represents the distance from the pixel to atmospheric light A. Berman assumed that θ(x) and ϕ(x) are constant in the same haze-line. The transmission is only related to the distance of color. Based on the above assumption, we can summarize using the following expression: where r max = max x∈L {r(x)}, L represents haze-lines, and C(x) represents the relationship between the color features of the image and the transmission t(x).

RGB Channel Color Features
Since most of the color features are based on the RGB color model, the three components (R channel component, G channel component and B channel component) of the haze image in RGB space are of great value to the estimation of transmission t(x) [14]. We define the RGB feature maps K c (x) as: where K c (x) represents three components of the haze image in RGB color space. As special feature maps, RGB feature maps have been important reference for the estimation of transmission t(x).

Multiscale Structure Feature Extraction
Since the traditional dehazing algorithm simply depends on the color feature to estimate transmission t(x), it has some limitations. The texture and structural features of the object are important information for estimating the transmission t(x), especially for objects with regular shapes or light colors. As a result, the texture and structure of haze images play a key role in the estimation. In this paper, sparse coding is used to extract texture and structural features of haze images, to further estimate the transmission t(x). Sparse coding [15] is an unsupervised learning method, which is used to find a set of super complete base vectors to represent sample data more efficiently. It can realize the automatic selection of features and remove the features without information through learning. In other words, it resets the corresponding weights of these features to 0. Because it can extract the main features of objects effectively, it is widely used in object detection and recognition.
The sparse coding of natural images and the process of dictionary training are shown in Figure 5. V represents the visual dictionary, x n represents a local patch of the image, and u n represents the sparse coefficients that correspond to x n . It is apparent that x n is represented as linear combinations of different atoms of dictionaries V. The process of sparse coding is expressed as follows: where λ u n 1 denotes a regularization constraint for u n , λ denotes a regularization parameter and u n 1 denotes a coefficient penalty term. The sparser the coefficient vector, the smaller the value of the target equation. As such, in the optimization process, the coefficient vector will evolve in a more sparse direction. The essence of sparse coding is that a target vector x i can be linearly fitted by a small number of base vectors. In this paper, the feature map of the image is divided into three steps. where 1 n u λ denotes a regularization constraint for un, λ denotes a regularization parameter and 1 n u denotes a coefficient penalty term. The sparser the coefficient vector, the smaller the value of the target equation. As such, in the optimization process, the coefficient vector will evolve in a more sparse direction. The essence of sparse coding is that a target vector xi can be linearly fitted by a small number of base vectors. In this paper, the feature map of the image is divided into three steps. •

Dictionary Training
In this process, we test a large number of training samples to obtain a set of redundant basis vectors through unsupervised learning, which is also called redundant dictionary V. The dictionary V reflects some essential elements in the training sample, such as the edges and corners of the image. Experiments show that the dictionary learning process simulates the processing of information in the human visual cortex. The calculation of dictionary V can be expressed as the least square problem with quadratic constraints, that is: • Linear fitting An arbitrarily local patch xn which converts to the form of vector can be fitted by a linear combination of several entries in the dictionary V. According to different constraint conditions, we obtain a different fitting coefficient un, such that the image features can be expressed with the coefficient vector un. Since V is known, we perform sparse coding on image X to obtain the corresponding coefficient matrix U, the upper type is rewritten to: We adopt a K-SVD dictionary learning algorithm to train the dictionary V and the sparse coefficient matrix U, and the objective functions are as follows: where vj denotes the vectorized atom that belongs to dictionary V, uj denotes the coefficient vector of the local patch, and Ek is an error matrix. Then, we update the dictionary column by column and the corresponding nonzero sparse coefficient is updated, by which the dictionary V and the sparse coefficient matrix U are obtained.
• Feature weighting • Dictionary Training In this process, we test a large number of training samples to obtain a set of redundant basis vectors through unsupervised learning, which is also called redundant dictionary V. The dictionary V reflects some essential elements in the training sample, such as the edges and corners of the image. Experiments show that the dictionary learning process simulates the processing of information in the human visual cortex. The calculation of dictionary V can be expressed as the least square problem with quadratic constraints, that is: • Linear fitting An arbitrarily local patch x n which converts to the form of vector can be fitted by a linear combination of several entries in the dictionary V. According to different constraint conditions, we obtain a different fitting coefficient u n , such that the image features can be expressed with the coefficient vector u n . Since V is known, we perform sparse coding on image X to obtain the corresponding coefficient matrix U, the upper type is rewritten to: We adopt a K-SVD dictionary learning algorithm to train the dictionary V and the sparse coefficient matrix U, and the objective functions are as follows: where v j denotes the vectorized atom that belongs to dictionary V, u j denotes the coefficient vector of the local patch, and E k is an error matrix. Then, we update the dictionary column by column and the corresponding nonzero sparse coefficient is updated, by which the dictionary V and the sparse coefficient matrix U are obtained.
• Feature weighting We use weighted sparse coding to extract texture and structure features of images. By weighting the sparse coefficient matrix U, the texture and detail of the images are further highlighted, which increases the weight of texture and structural features and makes texture and structure more prominent in the estimated feature map. As such, the reconstructed feature map S l (x) is obtained, where emphl represents the size of local image patch that has been trained. Additionally, the relationship between texture features and the scene transmission graph is more intuitively represented. The weighted sparse coefficients are as follows: where Z denotes the normalized coefficient, which makes u n range from 0 to 1. And u n denotes a weighted sparse coefficient matrix. The diagram of coefficient weighting is shown in Figure 6: Sensors 2018, 18, x FOR PEER REVIEW 8 of 16 We use weighted sparse coding to extract texture and structure features of images. By weighting the sparse coefficient matrix U, the texture and detail of the images are further highlighted, which increases the weight of texture and structural features and makes texture and structure more prominent in the estimated feature map. As such, the reconstructed feature map Sl(x) is obtained, where l represents the size of local image patch that has been trained. Additionally, the relationship between texture features and the scene transmission graph is more intuitively represented. The weighted sparse coefficients are as follows: where Z denotes the normalized coefficient, which makes un′ range from 0 to 1. And un′ denotes a weighted sparse coefficient matrix. The diagram of coefficient weighting is shown in Figure 6: where xn denotes the local image patch, un = (0, 0, …, 0, a, 0, …, 0, b, 0, …, 0, c, 0) and un′ = (0, 0, …, 0, a′, 0, …, 0, b′, 0, …, 0, c′, 0). xn′ denotes the weighted local image patch. Once the dictionary V and the coefficient matrix U' are known, we can obtain the weighted structure feature map Sl(x) by sparse coding. To estimate the transmission accurately, we train the image blocks of different sizes through the dictionary, and construct the multiscale structure feature map Sl(x) (l = 3, 8, 13), which is the same as DCP color feature maps Pr(x).
In addition, to analyze the effect of the value of λ on the structure feature maps, experiments were carried out with λ = 0, 0.1, 0.3 and 0.5 respectively. As we can see in the Figure 7. The structure feature map with λ = 0.1 has no detail loss and has appropriate brightness. So we selects λ = 0.1 to estimate the structure feature maps.  where x n denotes the local image patch, u n = (0, 0, . . . , 0, a, 0, . . . , 0, b, 0, . . . , 0, c, 0) and u n = (0, 0, . . . , 0, a , 0, . . . , 0, b , 0, . . . , 0, c , 0). x n denotes the weighted local image patch. Once the dictionary V and the coefficient matrix U' are known, we can obtain the weighted structure feature map S l (x) by sparse coding. To estimate the transmission accurately, we train the image blocks of different sizes through the dictionary, and construct the multiscale structure feature map S l (x) (l = 3, 8, 13), which is the same as DCP color feature maps P r (x).
In addition, to analyze the effect of the value of λ on the structure feature maps, experiments were carried out with λ = 0, 0.1, 0.3 and 0.5 respectively. As we can see in the Figure 7. The structure feature map with λ = 0.1 has no detail loss and has appropriate brightness. So we selects λ = 0.1 to estimate the structure feature maps. We use weighted sparse coding to extract texture and structure features of images. By weighting the sparse coefficient matrix U, the texture and detail of the images are further highlighted, which increases the weight of texture and structural features and makes texture and structure more prominent in the estimated feature map. As such, the reconstructed feature map Sl(x) is obtained, where l represents the size of local image patch that has been trained. Additionally, the relationship between texture features and the scene transmission graph is more intuitively represented. The weighted sparse coefficients are as follows: where Z denotes the normalized coefficient, which makes un′ range from 0 to 1. And un′ denotes a weighted sparse coefficient matrix. The diagram of coefficient weighting is shown in Figure 6: where xn denotes the local image patch, un = (0, 0, …, 0, a, 0, …, 0, b, 0, …, 0, c, 0) and un′ = (0, 0, …, 0, a′, 0, …, 0, b′, 0, …, 0, c′, 0). xn′ denotes the weighted local image patch. Once the dictionary V and the coefficient matrix U' are known, we can obtain the weighted structure feature map Sl(x) by sparse coding. To estimate the transmission accurately, we train the image blocks of different sizes through the dictionary, and construct the multiscale structure feature map Sl(x) (l = 3, 8, 13), which is the same as DCP color feature maps Pr(x).
In addition, to analyze the effect of the value of λ on the structure feature maps, experiments were carried out with λ = 0, 0.1, 0.3 and 0.5 respectively. As we can see in the Figure 7. The structure feature map with λ = 0.1 has no detail loss and has appropriate brightness. So we selects λ = 0.1 to estimate the structure feature maps.

Estimating the Scene Transmission through GAN
According to the extracted color and structure features, we construct a mapping network between different types of features and the transmission t(x), which can achieve effective estimation of scene transmission. Based on the GAN, we obtain the transmission t(x) by mapping the input features to the multilayer network. The network framework is divided into two parts: generative networks G(*) and adversarial networks D(*). Through the generative networks G(*), we construct a mapping network between the color and structure features of haze images and transmission t(x). We thus obtain the generative transmission t g (x). Meanwhile, the adversarial networks D(*) discriminate and classify the input generative transmission t g (x), to form a whole Generative adversarial network.

Generative Networks
Since the color and structure features are very important to the estimation of the transmission t(x), we assume that there is a nonlinear mapping relationship between the various features and the transmission t(x), which can be expressed as: where x represents a particular position in the image and F k denotes the k th feature map. In our generative networks, k = 10, and F 1 , F 2 , . . . , F 10 represent the S 3 (x), S 8 (x), S 13 (x), K r (x), kg(x), K b (x), P 3 (x), P 7 (x), P 15 (x), C(x) defined in Sections 3.1 and 3.2. f 1 (*), f 2 (*), . . . , f n (*) denotes a series of unknown nonlinear functions. t g (x) represents the generative transmission t g (x). Then, we establish a multilayer generative networks model. Next, the sample sets I set and T set are trained for further determination of f 1 (*), f 2 (*), . . . , f n (*). Finally, we estimate the t g (x) of input haze images by using a trained multilayer generative network. The generative network model and its training process for the estimation of scene transmission is shown in Figure 8. The network consists of an input layer, a hidden layer and an output layer. The number of neurons in the input layer is equal to the number of feature maps. There are twice as many neurons in the hidden layer as in the input layer, and there is only one neuron in the output layer, which represents the generative transmission t g (x).

Estimating the Scene Transmission through GAN
According to the extracted color and structure features, we construct a mapping network between different types of features and the transmission t(x), which can achieve effective estimation of scene transmission. Based on the GAN, we obtain the transmission t(x) by mapping the input features to the multilayer network. The network framework is divided into two parts: generative networks G(*) and adversarial networks D(*). Through the generative networks G(*), we construct a mapping network between the color and structure features of haze images and transmission t(x). We thus obtain the generative transmission tg(x). Meanwhile, the adversarial networks D(*) discriminate and classify the input generative transmission tg(x), to form a whole Generative adversarial network.

Generative Networks
Since the color and structure features are very important to the estimation of the transmission t(x), we assume that there is a nonlinear mapping relationship between the various features and the transmission t(x), which can be expressed as: where x represents a particular position in the image and Fk denotes the k th feature map. In our generative networks, k = 10, and F1, F2, …, F10 represent the S3(x), S8(x), S13(

x), Kr(x), kg(x), Kb(x), P3(x), P7(x), P15(x), C(x) defined in Section 3.1 and Section 3.2. f1(*), f2(*), …, fn(*) denotes a series of unknown nonlinear functions. tg(x) represents the generative transmission tg(x).
Then, we establish a multilayer generative networks model. Next, the sample sets I set and T set are trained for further determination of f1(*), f2(*), …, fn(*). Finally, we estimate the tg(x) of input haze images by using a trained multilayer generative network. The generative network model and its training process for the estimation of scene transmission is shown in Figure 8. The network consists of an input layer, a hidden layer and an output layer. The number of neurons in the input layer is equal to the number of feature maps. There are twice as many neurons in the hidden layer as in the input layer, and there is only one neuron in the output layer, which represents the generative transmission tg(x). The training process of the network is as follows: we extract different features of haze images I1, I2, …, In, from the training sample set I set . Each haze image Ii corresponds to k feature maps. Then, we choose these feature maps as input images, and ground truth Ti in the T set as labels. Finally, the back propagation (BP) algorithm [16] is used for supervised learning of the networks. Thus, we obtain the trained generative network model. Through the trained network model, we can estimate the generative transmission while inputting a haze image: The training process of the network is as follows: we extract different features of haze images I 1 , I 2 , . . . , I n , from the training sample set I set . Each haze image I i corresponds to k feature maps. Then, we choose these feature maps as input images, and ground truth T i in the T set as labels. Finally, the back propagation (BP) algorithm [16] is used for supervised learning of the networks. Thus, we obtain the trained generative network model. Through the trained network model, we can estimate the generative transmission while inputting a haze image: where W (l) 2 p,q represents the weights from the pth neurons in the l layer network to the qth neurons in the next layer network, f (*) means sigmoid functions. As a kind of activation function. It is not only simple to calculate, but also can effectively suppress the error. H j donates the jth output in the hidden layer, and t g (x) and G(I i ) are the output images. In other words, G(I i ) is equal to t g (x).

Adversarial Networks
The adversarial networks mainly classify the input transmission and obtain the correct discriminant probability. Since there is a certain correlation between the transmission and depth information of the scene, we extract the features of the transmission through a convolutional neural network, which helps us obtain the characteristics of the transmission. Then, we regard the characteristic as the criterion of discrimination of the transmission. In this way, we obtain the discriminating probability of two kinds of transmissions. The adversarial networks model is shown in Figure 9.
where , ( ) 2 p q l W represents the weights from the pth neurons in the l layer network to the qth neurons in the next layer network, f(*) means sigmoid functions. As a kind of activation function. It is not only simple to calculate, but also can effectively suppress the error. Hj donates the jth output in the hidden layer, and tg(x) and G(Ii) are the output images. In other words, G(Ii) is equal to tg(x).

Adversarial Networks
The adversarial networks mainly classify the input transmission and obtain the correct discriminant probability. Since there is a certain correlation between the transmission and depth information of the scene, we extract the features of the transmission through a convolutional neural network, which helps us obtain the characteristics of the transmission. Then, we regard the characteristic as the criterion of discrimination of the transmission. In this way, we obtain the discriminating probability of two kinds of transmissions. The adversarial networks model is shown in Figure 9. The BN layer and Relu activation layer are applied to all convolution layers of the adversarial networks. We perceive the input information and extract the haze features of the transmission. The output is 1 if it is a ground truth, whereas it is 0 if it is the generative transmission tg(x).

Loss Function.
To form the overall confrontational training network and improve the robustness of the overall network, G(*) and D(*) should satisfy the following loss function: [lo g (1 ( ( )))] where T represents the ground truth obtained from the training sample, which has been set as the discriminant transmission in the process of discrimination. Meanwhile, T satisfies the Pdata distribution. tg(x) denotes the generative transmission obtained through the generative networks, Figure 9. The adversarial networks.
The BN layer and Relu activation layer are applied to all convolution layers of the adversarial networks. We perceive the input information and extract the haze features of the transmission. The output is 1 if it is a ground truth, whereas it is 0 if it is the generative transmission t g (x).

Loss Function.
To form the overall confrontational training network and improve the robustness of the overall network, G(*) and D(*) should satisfy the following loss function: where T represents the ground truth obtained from the training sample, which has been set as the discriminant transmission in the process of discrimination. Meanwhile, T satisfies the P data distribution. t g (x) denotes the generative transmission obtained through the generative networks, which obeys the P t distribution. I is the input image. Adversarial networks aim to discriminate ground truth T and the generative transmission t g (x). Generative networks aim to minimize (1 − D(G(I)), such that the adversarial networks cannot correctly distinguish the generative transmission. As a result, t g (x) is closer to the real transmission T. Therefore, the final transmission t(x) is estimated through the GAN, and we can obtain the haze-free image according to Equation (3).

Comparison Experiments
To illustrate the effectiveness of our algorithm, we compare the transmission and restored results of different methods, which proves the superiority of our method in the estimation of the transmission. Then, we compare the experimental results with to advanced methods in detail recovery and color preservation, respectively. Finally, we select synthetic haze images as input images to prove the applicability of our method. In addition, we further illustrate the advantages of our method by analyzing the objective index of the experimental results. All of the experimental results were completed in MATLAB 2016a on an Intel Core i7 CPU equipped with 16 GB RAM.

Qualitative Results
The comparison of the transmission is shown in Figure 10. In the first row of Figure 10, it is apparent that the edges of the transmission estimated by using our method are clear and hierarchical. In contrast, the transmissions estimated by using the methods of Cai [17], Chen [18] and Ren [19] show detail loss. For example, the mounds in the bottom-right corners of the images disappeared. As a result, the haze-free images restored by using the three methods have a problem associated with edge blurring in the second row of Figure 10. Although the transmission estimated by using the method of He [6] is full of details, it is too light, so the bottom of the restored image shows color distortion. For the third row of Figure 10, the transmission estimated by using our method has more detail information than the other methods. The other four methods show loss of detail loss. Additionally, our experimental results are clear in detail and natural in color. which obeys the Pt distribution. I is the input image. Adversarial networks aim to discriminate ground truth T and the generative transmission tg(x). Generative networks aim to minimize (1 − D(G(I)), such that the adversarial networks cannot correctly distinguish the generative transmission. As a result, tg(x) is closer to the real transmission T. Therefore, the final transmission t(x) is estimated through the GAN, and we can obtain the haze-free image according to Equation (3).

Comparison Experiments
To illustrate the effectiveness of our algorithm, we compare the transmission and restored results of different methods, which proves the superiority of our method in the estimation of the transmission. Then, we compare the experimental results with to advanced methods in detail recovery and color preservation, respectively. Finally, we select synthetic haze images as input images to prove the applicability of our method. In addition, we further illustrate the advantages of our method by analyzing the objective index of the experimental results. All of the experimental results were completed in MATLAB 2016a on an Intel Core i7 CPU equipped with 16 GB RAM.

Qualitative Results
The comparison of the transmission is shown in Figure 10. In the first row of Figure 10, it is apparent that the edges of the transmission estimated by using our method are clear and hierarchical. In contrast, the transmissions estimated by using the methods of Cai [17], Chen [18] and Ren [19] show detail loss. For example, the mounds in the bottom-right corners of the images disappeared. As a result, the haze-free images restored by using the three methods have a problem associated with edge blurring in the second row of Figure 10. Although the transmission estimated by using the method of He [6] is full of details, it is too light, so the bottom of the restored image shows color distortion. For the third row of Figure 10, the transmission estimated by using our method has more detail information than the other methods. The other four methods show loss of detail loss. Additionally, our experimental results are clear in detail and natural in color. To prove the effectiveness of our method in detail recovery, we selected four challenging defogged natural images for experimental comparison. In the first row of Figure 11, the mountain restored by using the methods of Fattal [13] and Tarel [20] were covered with haze. In contrast, the details of the mountain restored by using the method of Berman [9] disappeared. In the second row of Figure 11, only our method and Berman's method restore the clouds in the sky. However, there is color distortion in the city restored by using the method of Berman [9]. Throughout the entire image, the result restored by our method is sharper at the edges, which is pleasing to the human eye. In the third row of Figure 11, the branches restored by using our method are not only denser, but the trunk is clearly more visible than when using other methods. In addition, the restorations using the To prove the effectiveness of our method in detail recovery, we selected four challenging defogged natural images for experimental comparison. In the first row of Figure 11, the mountain restored by using the methods of Fattal [13] and Tarel [20] were covered with haze. In contrast, the details of the mountain restored by using the method of Berman [9] disappeared. In the second row of Figure 11, only our method and Berman's method restore the clouds in the sky. However, there is color distortion in the city restored by using the method of Berman [9]. Throughout the entire image, the result restored by our method is sharper at the edges, which is pleasing to the human eye. In the third row of Figure 11, the branches restored by using our method are not only denser, but the trunk is clearly more visible than when using other methods. In addition, the restorations using the methods of Berman [9], Fattal [13] and Tarel [20] were covered with residual mist. In the final row of Figure 11, it is obvious that our method is better in haze removal for the end of the road. In conclusion, our method has obvious advantages in detail recovery and structure maintenance compared to other methods.
Sensors 2018, 18, x FOR PEER REVIEW 12 of 16 methods of Berman [9], Fattal [13] and Tarel [20] were covered with residual mist. In the final row of Figure 11, it is obvious that our method is better in haze removal for the end of the road. In conclusion, our method has obvious advantages in detail recovery and structure maintenance compared to other methods. Figure 11. A comparison in detail recovery.
Next, we compared experimental results obtained from color priors, such as those presented in He [6], Zhu [8], Tang [10], and Berman [9]. As displayed in Figure 12, we chose natural images with thick fog, a large sky area and low brightness for experimental comparison. As shown in Figure 12b, the image restored using the method of Zhu [8] is darker and is not completely dehazed. The images restored using the method of Tang [10] have halo effects. In addition, the color of the image is too saturated. Finally, the images restored using the method of Berman [9] show obvious color distortion. In addition, the sun in Figure 12d is enlarged. Besides, it should be noted that the method of He [6] performs better for testing most images, but when it comes to the images with a large sky area, the retored results will have serious color distortion. As opposed to the four methods, the color using our method is closer to that of the input image. For example, the lawn in the last image restored using our method is green. Not only does our method achieve complete dehazing, it effectively avoids halo effects and color distortion.  Next, we compared experimental results obtained from color priors, such as those presented in He [6], Zhu [8], Tang [10], and Berman [9]. As displayed in Figure 12, we chose natural images with thick fog, a large sky area and low brightness for experimental comparison. As shown in Figure 12b, the image restored using the method of Zhu [8] is darker and is not completely dehazed. The images restored using the method of Tang [10] have halo effects. In addition, the color of the image is too saturated. Finally, the images restored using the method of Berman [9] show obvious color distortion. In addition, the sun in Figure 12d is enlarged. Besides, it should be noted that the method of He [6] performs better for testing most images, but when it comes to the images with a large sky area, the retored results will have serious color distortion. As opposed to the four methods, the color using our method is closer to that of the input image. For example, the lawn in the last image restored using our method is green. Not only does our method achieve complete dehazing, it effectively avoids halo effects and color distortion. methods of Berman [9], Fattal [13] and Tarel [20] were covered with residual mist. In the final row of Figure 11, it is obvious that our method is better in haze removal for the end of the road. In conclusion, our method has obvious advantages in detail recovery and structure maintenance compared to other methods. Figure 11. A comparison in detail recovery.
Next, we compared experimental results obtained from color priors, such as those presented in He [6], Zhu [8], Tang [10], and Berman [9]. As displayed in Figure 12, we chose natural images with thick fog, a large sky area and low brightness for experimental comparison. As shown in Figure 12b, the image restored using the method of Zhu [8] is darker and is not completely dehazed. The images restored using the method of Tang [10] have halo effects. In addition, the color of the image is too saturated. Finally, the images restored using the method of Berman [9] show obvious color distortion. In addition, the sun in Figure 12d is enlarged. Besides, it should be noted that the method of He [6] performs better for testing most images, but when it comes to the images with a large sky area, the retored results will have serious color distortion. As opposed to the four methods, the color using our method is closer to that of the input image. For example, the lawn in the last image restored using our method is green. Not only does our method achieve complete dehazing, it effectively avoids halo effects and color distortion.  To more comprehensively evaluate our method, we compare different experimental results by testing synthetic haze images, which were extracted from the Road Image Databases [21] and stereo images [13]. Through the comparison in Figure 13, it is apparent that the images restored using the method of He [6] and our method are the best. The method of Cai [17] causes the color in the restored image to be supersaturated due to underextraction of some features, such as the darker road in Figure 13c. The images recovered using the methods of Chen [18] and the Ren [19] have more residual mist. In addition, the experimental results from the two methods show great differences from the real haze-free images in the Road Image Databases. In contrast, our method also achieved satisfactory results for synthetic images. On the one hand, the color of our experimental results is closer to the real haze-free images than when using the method of He [6]. On the other hand, there is no halo effect in our experimental results, which reflects the advantages of the method in detail recovery and better color preservation. To more comprehensively evaluate our method, we compare different experimental results by testing synthetic haze images, which were extracted from the Road Image Databases [21] and stereo images [13]. Through the comparison in Figure 13, it is apparent that the images restored using the method of He [6] and our method are the best. The method of Cai [17] causes the color in the restored image to be supersaturated due to underextraction of some features, such as the darker road in Figure  13c. The images recovered using the methods of Chen [18] and the Ren [19] have more residual mist. In addition, the experimental results from the two methods show great differences from the real hazefree images in the Road Image Databases. In contrast, our method also achieved satisfactory results for synthetic images. On the one hand, the color of our experimental results is closer to the real hazefree images than when using the method of He [6]. On the other hand, there is no halo effect in our experimental results, which reflects the advantages of the method in detail recovery and better color preservation.

Quantitative Results
To further verify the effectiveness of our method, we used the information entropy [22,23] of the image to evaluate the performance of the method from an objective perspective. Information entropy represents the amount of information contained in the image. The larger the value, the more details the experimental results include. As shown in Table 1, the entropy of images restored using our method are higher than others. It shows that our method outperforms most of other methods in detail recovery. The effectiveness of our method is further verified by the quantitative analysis, which fully demonstrates the advantages of our method of detail recovery. To further verify the effectiveness of our method for synthetic images, we analyze the experimental results in Figure 13 through the information entropy and the peak signal noise ratio (PSNR). PSNR reflects the integrity of image structure. As shown in the Tables 2 and 3, our method and He [6] have advantages over other methods in information entropy. However, when it comes PSNR, He [6] for partial images has low value, which shows that the applicability of this method is poor. In contrast, our method obtains higher values for all results in PSNR, the restored images are clearer and natural, and the effectiveness of image dehazing is improved.

Quantitative Results
To further verify the effectiveness of our method, we used the information entropy [22,23] of the image to evaluate the performance of the method from an objective perspective. Information entropy represents the amount of information contained in the image. The larger the value, the more details the experimental results include. As shown in Table 1, the entropy of images restored using our method are higher than others. It shows that our method outperforms most of other methods in detail recovery. The effectiveness of our method is further verified by the quantitative analysis, which fully demonstrates the advantages of our method of detail recovery. To further verify the effectiveness of our method for synthetic images, we analyze the experimental results in Figure 13 through the information entropy and the peak signal noise ratio (PSNR). PSNR reflects the integrity of image structure. As shown in the Tables 2 and 3, our method and He [6] have advantages over other methods in information entropy. However, when it comes PSNR, He [6] for partial images has low value, which shows that the applicability of this method is poor.
In contrast, our method obtains higher values for all results in PSNR, the restored images are clearer and natural, and the effectiveness of image dehazing is improved. Table 2. The information entropy from images shown in Figure 13.

Haze Image
He Cai Chen Ren Our Table 3. The PSNR from images shown in Figure 13. In addition, the minimum mean square error (MMSE) and structural similarity index (SSIM) are introduced to further illustrate the advantages and disadvantages of each method [24,25], which can help us comprehensively compare the difference among the restored images. As shown in Figure 14, the experimental results using the methods of Cai [17], Chen [18] and Ren [19] in Figure 13 have higher MMSE, indicating that they do not completely achieve dehazing. Meanwhile, although the experimental results using the method of He [6] have lower MMSE, the SSIM are also lower. In contrast, the MMSE of our method is low, indicating that the details of the restored image are clear. At the same time, the SSIM of our method is the highest among all results. In conclusion, the reliability of our method is further verified by quantitative analysis.  Table 2. The information entropy from images shown in Figure 13. In addition, the minimum mean square error (MMSE) and structural similarity index (SSIM) are introduced to further illustrate the advantages and disadvantages of each method [24,25], which can help us comprehensively compare the difference among the restored images. As shown in Figure 14, the experimental results using the methods of Cai [17], Chen [18] and Ren [19] in Figure 13 have higher MMSE, indicating that they do not completely achieve dehazing. Meanwhile, although the experimental results using the method of He [6] have lower MMSE, the SSIM are also lower. In contrast, the MMSE of our method is low, indicating that the details of the restored image are clear. At the same time, the SSIM of our method is the highest among all results. In conclusion, the reliability of our method is further verified by quantitative analysis.

Computational Complexity
To compare and analyze the efficiency of each method, we select images of different sizes to test each method and record its running time. As we can see in Table 4, The running time of our method is longer than that of Ren [19] and is close to that of Cai [17]. However, the experimental results of our method are superior to those of Ren [19] and Cai [17] obviously. Since the Chen [18] uses 500 iterations to estimate the transmission, its computational complexity is higher than others. What's more, He [6] has a high computational complexity when refining the transmission through soft matting. In comparison, although our method is not the fastest, it realizes the effective dehazing without sacrificing the running time.

Computational Complexity
To compare and analyze the efficiency of each method, we select images of different sizes to test each method and record its running time. As we can see in Table 4, The running time of our method is longer than that of Ren [19] and is close to that of Cai [17]. However, the experimental results of our method are superior to those of Ren [19] and Cai [17] obviously. Since the Chen [18] uses 500 iterations to estimate the transmission, its computational complexity is higher than others. What's more, He [6] has a high computational complexity when refining the transmission through soft matting. In comparison, although our method is not the fastest, it realizes the effective dehazing without sacrificing the running time.

Conclusions
In summary, our method has a significantly improves the image quality of sensors in foggy conditions. To obtained the haze-free image, we estimated the transmission according to the features in the image, and constructed a mapping relationship between the features and the transmission map using GAN, which effectively avoided color distortion and retained image details. The experimental results in this paper are better when compared with most of other methods. However, our method is inferior to the sensor image obtained in dense fog. Our future focus will be on the dehazing of images with dense fog.
Author Contributions: All the authors conceived the idea for the paper, K.L. and L.H. implemented the algorithm, performed the experiments and analyzed the results; L.H. and D.B. provided funding acquisition; K.L. wrote the manuscript; and S.M. and S.G. revised the paper.