Underwater Image Enhancement Using Improved CNN-Based Defogging

Due to the refraction, absorption, and scattering of light by suspended particles in water, underwater images are characterized by low contrast, blurred details, and color distortion. In this paper, a fusion algorithm to restore and enhance underwater images is proposed. It consists of a color restoration module, an end-to-end defogging module and a brightness equalization module. In the color restoration module, a color balance algorithm based on the CIE Lab color model is proposed to alleviate the effect of color deviation in underwater images. In the end-to-end defogging module, one end is the input image and the other end is the output image; a CNN is proposed to connect these two ends and to improve the contrast of the underwater images. In the CNN, a sub-network is used to reduce the depth of the main network needed to obtain the same features, and several depthwise separable convolutions are used to reduce the number of parameters required during network training. A basic attention module is introduced to highlight important areas in the image. In order to improve the defogging network's ability to extract global information, cross-layer connections and a pooling pyramid module are added. In the brightness equalization module, a contrast limited adaptive histogram equalization method is used to coordinate the overall brightness. The proposed fusion algorithm for underwater image restoration and enhancement is verified by experiments and by comparison with previous deep learning models and traditional methods. Comparison results show that the proposed method is superior in color correction and detail enhancement.


Introduction
The ocean contains abundant resources. However, it has not been effectively explored and exploited by humans, especially the underwater world. Underwater image processing plays an indispensable role in underwater operations by humans or underwater robots, e.g., underwater archeology, environment monitoring, device maintenance, object recognition, and search and salvage. Affected by suspended particles and the attenuation of light in water, underwater optical images are of low quality, with flaws such as color distortion, low contrast, and even blurred details. Therefore, degraded underwater images require enhancement and restoration to obtain more underwater information. In enhancing underwater images, classic techniques include histogram equalization, wavelet transform and the Retinex algorithm [1][2][3][4]. However, each of these algorithms has shortcomings. For example, loss of image details and excessive enhancement can occur with histogram equalization. The effectiveness of the wavelet transform is limited when processing images obtained in shallow waters. For the Retinex algorithm, the halo effect needs to be solved in the case of large brightness differences. To avoid these shortcomings, two or three of these enhancement techniques are combined, for example in [5][6][7]. Results show that a fusion algorithm can provide a more comprehensive image enhancement effect. However, these methods only partly address the effect of scattering: although the image contrast is improved, color deviation cannot be well corrected.
Wang et al. [23] first used a T-network and an A-network to estimate the blue channel, and then estimated the red and green channels through the relationships between the channels to restore underwater images. In order to learn the differences between the channels of underwater images, Li et al. [24], Sun et al. [25], and Uplavikar et al. [26] directly used data-driven end-to-end networks to learn the mapping relationship between underwater degraded images and clear images. It is known that the performance of data-driven methods largely depends on the quality of the training data. The underwater images used for training are usually obtained by synthesis, without color deviation (e.g., [16][17][18]) or with color deviation (e.g., [20]). However, the underwater environment is complex, and the color deviation of such synthesized images only covers various tones of blue or green. To improve the quality of the synthetic dataset, Li et al. [27] used Generative Adversarial Networks (GANs) to produce underwater degraded images. However, the restored underwater images are still not realistic enough compared with real underwater images.
As discussed above, each enhancement or restoration method has inherent disadvantages in processing underwater images. One way to obtain a high-quality underwater image is a fusion algorithm in which each module can perform well while its disadvantages are compensated by the other modules. In this paper, such a fusion algorithm is proposed. It includes a color balance algorithm based on the CIE Lab color model, an end-to-end CNN defogging algorithm trained on a foggy training set, and a brightness equalization module. In this way, the mapping relationship between the underwater blurred image and the clear image can be obtained without synthesizing an underwater image dataset, and the inaccurate mapping due to the insufficient realism of synthesized images can also be avoided. The color balance algorithm redistributes the channel values in the color channels to achieve color balance. The defogging algorithm uses depthwise separable convolution instead of standard convolution to reduce the number of parameters in the calculation process. Moreover, a sub-network is incorporated into the defogging module to reduce the depth of the main network. In the CNN structure, the basic attention module is used, which mainly includes a Channel Attention block (CA) and a Pixel Attention block (PA). CA is used to trace the difference in the distribution of fog between channels, while PA is used to record the difference in haze weight between pixels. By aggregating different contextual information of images, a pooling pyramid module is added to improve the capability of the network model to capture global information. Moreover, cross-layer connections are used to preserve edge information. The brightness equalization module uses the Contrast Limited Adaptive Histogram Equalization method on the L channel (the luminance channel in the CIE Lab color model) to coordinate the overall brightness.
The main contributions of the study include the construction of a CNN to defog underwater images, as well as the combination of three techniques to achieve a more comprehensive improvement of underwater image quality. To verify the proposed fusion algorithm, degraded underwater images are processed and the results are compared between the proposed method and other algorithms, including both classic and advanced ones such as CNN-based algorithms. The underwater images used for evaluation are derived from the UIEB dataset [28].
The rest of the paper is organized as follows: Section 2 describes the underwater imaging model and the color model used in the study; Section 3 illustrates the methods employed in the proposed fusion algorithm and the training dataset; Section 4 presents the image processing results and the corresponding discussion; and the conclusion is given in the final section.

Underwater Optical Imaging Model
Figure 1 shows an underwater imaging model. In an underwater environment, the light captured by the camera is mainly composed of scene radiation and ambient light reflection. This model can be expressed as

$$I_c(x) = J_c(x)\,t_c(x) + A_c\left(1 - t_c(x)\right), \qquad (1)$$

where x represents a pixel in the original underwater image and c represents one of the three color channels; I_c(x) is the image captured by the camera; J_c(x) represents the clear image, i.e., the restoration image; A_c represents the global ambient light; and t_c(x) represents the transmission map.

According to (1), the restoration image J_c(x) can be expressed as

$$J_c(x) = \frac{I_c(x) - A_c}{t_c(x)} + A_c. \qquad (2)$$

It can be seen from the underwater imaging model that the key to restoring an underwater image is to estimate the transmission map and the corresponding ambient light value. To apply a neural network to this estimation, Li et al. [29] introduced a variable K_c(x) by deforming the atmospheric scattering model; this method can also be applied to underwater imaging models. Model (2) can be modified as

$$J_c(x) = K_c(x)\,I_c(x) - K_c(x) + b, \qquad (3)$$

where

$$K_c(x) = \frac{\frac{1}{t_c(x)}\left(I_c(x) - A_c\right) + \left(A_c - b\right)}{I_c(x) - 1} \qquad (4)$$

and b is the constant deviation value (usually set to 1). As can be seen from the definition of K_c(x), the transmittance and the ambient light value are jointly estimated, which avoids the cumulative error caused by separate training for the transmittance and for the ambient light.
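Once K_c(x) has been estimated by the network, applying (3) is a single element-wise operation. The following minimal sketch illustrates this step; the function name and the clipping to [0, 1] for display are illustrative assumptions, not part of the paper.

```python
import numpy as np

def restore_from_k(I, K, b=1.0):
    """Apply Eq. (3): J(x) = K(x) * I(x) - K(x) + b.

    I and K are arrays of the same shape (e.g., H x W x 3, values in [0, 1]);
    b is the constant deviation value, set to 1 as in the paper.
    """
    J = K * I - K + b
    # Clip to the valid intensity range (an assumption for display purposes).
    return np.clip(J, 0.0, 1.0)
```

Because K_c(x) folds both t_c(x) and A_c into one quantity, a single network output suffices here, instead of two separate estimates whose errors would accumulate.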

CIE Lab Color Model
The CIE Lab color model is a device-independent color system [30]. It uses a digital method to describe human visual perception and makes up for the deficiencies of the RGB and CMYK color modes. The Lab color model is composed of three elements: L, a and b. L is the luminance channel, with a range of [0, 100], which means a variation from pure black to pure white. The a channel covers colors from dark green to gray, to bright pink, with the value range of [−128, 127]. The b channel includes colors from bright blue to gray, to yellow, with the value range of [−128, 127].

Underwater Image Enhancement Methods
Based on the models in Section 2, in this section some measures to enhance the underwater images are proposed. In order to improve the quality of underwater images, a fusion algorithm is constructed which involves a color balance algorithm, a CNN-based defogging algorithm and contrast limited adaptive histogram equalization (CLAHE). The color balance algorithm is used to eliminate the color deviation caused by the attenuation of light in the water, while the CNN defogging algorithm and CLAHE are used to improve the contrast of the image. The process of the fusion algorithm is shown in Figure 2.


Color Balance Algorithm
As can be recognized from the proposed scheme in Figure 2, the first step is achieve the color balance. It is known that in underwater environments, the light att uation varies with its wavelength. As the depth increases, an underwater image chan from greenish to bluish.  In order to reduce the influence of color shift, a color balance method based on C Lab color model is proposed. By moving the channel value to the middle position of channel histogram, color balance is achieved. At the first step, the RGB image is c

Color Balance Algorithm
As can be recognized from the proposed scheme in Figure 2, the first step is to achieve the color balance. It is known that in underwater environments, the light attenuation varies with its wavelength. As the depth increases, an underwater image changes from greenish to bluish.


In order to reduce the influence of color shift, a color balance method based on the CIE Lab color model is proposed. Color balance is achieved by moving the channel values toward the middle position of the channel histogram. At the first step, the RGB image is converted into the Lab image. The second step is to obtain the mean channel values Ma and Mb of the a channel and the b channel, and to compute the corresponding correction values by (5) and (6), respectively. At the third step, the a channel and the b channel are updated by adding their respective correction values. For more intuitive results, the range of channel values is converted from [−128, 127] to [0, 255], with the boundary conditions that a calculated channel value greater than 255 is set to 255 and a value less than 0 is set to 0. Finally, by converting the processed Lab image back to an RGB image, a color-balanced image is obtained.
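The channel-shift step above can be sketched in a few lines. This is a minimal illustration assuming the correction values of (5) and (6) move each chroma mean to the histogram midpoint (128 in the 8-bit [0, 255] representation); the conversion between RGB and Lab is left to an image library, and the function name is illustrative.

```python
import numpy as np

def balance_ab(a, b, mid=128.0):
    """Shift the a and b chroma channels (8-bit representation, [0, 255])
    so that each channel mean moves to the histogram midpoint, then clip
    to enforce the boundary conditions described above."""
    a_bal = np.clip(a + (mid - a.mean()), 0.0, 255.0)
    b_bal = np.clip(b + (mid - b.mean()), 0.0, 255.0)
    return a_bal, b_bal
```

A greenish or bluish cast shows up as a chroma mean displaced from the midpoint, so recentering the mean directly counteracts the wavelength-dependent attenuation described above.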
To verify the effectiveness of the proposed color balance algorithm, it is compared with three conventional color balance algorithms: gray world [31], color shift check correction [32], and automatic white balance [33]. Figure 4 presents the visual comparison of the color balance results of six underwater images. It can be seen that in the case of weak color degradation (images No.1, No.2 and No.3), all methods produce good results. In the case of serious color degradation (images No.4, No.5 and No.6), the proposed method achieves a better color balance effect than the other three methods. For the sixth (bluish) image, over-exposure even occurs with the other three methods.
Figure 4. Color balance results: (a) raw underwater images; (b) processed by gray world [31]; (c) processed by color shift check correction [32]; (d) processed by automatic white balance [33]; and (e) processed by the proposed method.

CNN-Based Defogging Algorithm
It can be seen from Figure 4 that although the color deviation is alleviated by the proposed color balance method, it seems that the processed image is covered with a thick layer of fog. In order to improve the contrast of the image, an end-to-end neural network defogging algorithm based on the underwater optical imaging model is proposed, shown as the second step in Figure 2.

Network Architecture
In the study, the modified underwater imaging model (3) is employed to restore underwater images. In this model, a comprehensive index K(x) is introduced. The motivation is that calculating the ambient light and the transmission map separately accumulates the errors of each individual estimation step, whereas direct estimation of K(x) reduces such errors. For this purpose, an end-to-end CNN-based defogging network is established, as shown in Figure 5. The index K(x) is estimated by UD-Net; afterwards, the restoration image can be obtained by using (3). To increase the speed of the network, depthwise separable convolution is employed instead of standard convolution. Depthwise separable convolution is a form of decomposed convolution [34]: it separates a standard convolution into a depthwise convolution and a pointwise convolution. In this way, the computational burden and the model size can be significantly reduced. Moreover, in the network construction for the estimation of K(x), a sub-network is formed to shorten the depth of the main network [17]. As can be recognized from Figure 5, in the sub-network, three depthwise separable convolutions of different sizes (11×11, 9×9, 7×7), each combined with a basic attention module, are first designed to obtain a larger receptive field and learn better global information. Then, the sub-network performs a 3×3 depthwise separable convolution and applies a sigmoid activation function to the upsampling result. Finally, a 3-channel feature map is obtained and output to the main network.
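The parameter saving from depthwise separable convolution can be verified with a simple count (bias terms omitted); the layer sizes below are illustrative, not taken from Table 1.

```python
def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution learns a k*k*c_in filter per output channel.
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # Depthwise: one k x k filter per input channel;
    # pointwise: a 1 x 1 convolution mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out

# Example: a 3 x 3 layer mapping 64 channels to 128 channels.
standard = standard_conv_params(3, 64, 128)    # 73728 parameters
separable = separable_conv_params(3, 64, 128)  # 8768 parameters, ~8.4x fewer
```

The saving grows with kernel size and channel count, which is why the large 11×11 and 9×9 filters in the sub-network particularly benefit from the separable form.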


As can be seen from the construction of the sub-network, it involves five operations: depthwise separable convolution, the basic attention module, max-pooling, upsampling and linear combination. It is noted that most conventional image defogging networks, such as the multi-scale CNN (MSCNN) [17] and the all-in-one network [29], treat the distribution of fog equally across different channels and pixels.
As a result, the fog density cannot be well estimated, especially in complex scenes. In the study, a feature attention block [35] is introduced to solve this problem. As shown in Figure 6, the feature attention (FA) block consists of Channel Attention (CA) and Pixel Attention (PA), which provide additional flexibility in dealing with different types of information. In the CA part, the channel-wise global spatial information is first transformed into a channel descriptor by global average pooling, changing the shape of the feature map from C×H×W to C×1×1. Then, to obtain the weights of the different channels, the features pass through two convolution layers with ReLU and sigmoid activation functions. Finally, the output is obtained by an element-wise product of the input and the channel weights. The PA part directly feeds its input (i.e., the output of the CA part) into two convolution layers with ReLU and sigmoid activation functions, changing the shape of the feature map from C×H×W to 1×H×W. Finally, element-wise multiplication is performed between the input and the pixel weights. It is noted that, besides the feature attention module, the basic attention block (BAB) structure adopts local residual learning. Local residual learning allows less important information, such as thin haze or low-frequency regions, to be bypassed through multiple local residual connections, so that the main network can focus on effective information. Different from the Convolutional Block Attention Module (CBAM) proposed in [36], the BAB module in this study uses only average pooling instead of the combination of average and max pooling in CBAM, while residual learning is added to the BAB module.
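The CA reweighting described above can be sketched as follows; the dense matrices w1 and w2 stand in for the two 1×1 convolution layers, and all names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W) feature map.
    Global average pooling turns C x H x W into a per-channel descriptor;
    two layers (ReLU, then sigmoid) produce one weight per channel,
    which rescales the input element-wise."""
    desc = feat.mean(axis=(1, 2))            # (C,) channel descriptor
    hidden = np.maximum(w1 @ desc, 0.0)      # first layer + ReLU
    weights = sigmoid(w2 @ hidden)           # second layer + sigmoid
    return feat * weights[:, None, None]     # channel-wise reweighting
```

Pixel attention is analogous, but its two layers output a 1×H×W map, so each pixel rather than each channel receives its own haze weight.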
Electronics 2022, 11, 150 9 of 21
One of the shortcomings of conventional underwater image enhancement methods, such as classic histogram equalization, is the loss of details. In order to make the processed results contain more details, a pyramid pooling block is incorporated into the defogging network, as shown in Figure 5. The pyramid pooling block, located before the last convolution layer, is an enhancing module (EM) that expands the representational ability of the network [37]. Pyramid pooling can learn more context information based on different receptive fields, so the image processing results can be refined. Figure 7 presents the structure of the pyramid pooling block. It is designed to integrate the details of features from multi-scale layers and to obtain global context information by learning on different receptive fields. As can be seen, a four-scale pyramid is built by downsampling the outputs of the convolutional layers with factors of 4×, 8×, 16× and 32×. Each scale layer contains a 1×1 convolution. After upsampling, the feature maps before and after the pyramid pooling are concatenated. Subsequently, a 3×3 convolution is added to align the feature maps.
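A stripped-down version of the four-scale pyramid can be written as below; the per-scale 1×1 convolutions and the final 3×3 alignment convolution are omitted, nearest-neighbour upsampling is assumed, and H and W must be divisible by the largest factor.

```python
import numpy as np

def avg_pool(x, f):
    # x: (C, H, W); average-pool spatially by factor f.
    c, h, w = x.shape
    return x.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))

def upsample_nearest(x, f):
    # Repeat each spatial element f times along both axes.
    return x.repeat(f, axis=1).repeat(f, axis=2)

def pyramid_pooling(feat, factors=(4, 8, 16, 32)):
    """Concatenate the input with context maps pooled at several scales,
    each restored to the input resolution."""
    branches = [feat]
    for f in factors:
        branches.append(upsample_nearest(avg_pool(feat, f), f))
    return np.concatenate(branches, axis=0)  # channel-wise concatenation
```

The coarser branches summarize ever larger neighbourhoods (the 32× branch is essentially a global average), which is how the block injects global context alongside local detail.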


Implementation Details
In this section, we specify the implementation details of the proposed UD-Net. The training environment is: Windows 10; NVIDIA GeForce RTX 3080; Intel® Core™ i5-8500.
The dataset used is the fogging dataset synthesized by Li et al. [38], which is based on the Outdoor Training Set (OTS) depth database, with different settings of the atmospheric light intensity A_c and the transmittance t_c(x). In total, 32,760 images from the dataset are taken as the training set, while 3640 images are taken as the validation set. The dimension of the images in the dataset is 320×320. In the training process, Gaussian random variables are used to initialize the network weights. The ReLU activation function adds non-linearity to the network after each convolution. The loss function is the mean squared error (MSE): the mapping between hazy images and the corresponding clean images is learned by minimizing the loss between the training result and the corresponding ground-truth image. The optimizer is Adaptive Moment Estimation (Adam). The detailed parameter settings are as follows: the learning rate is set to 0.0001; the number of batch samples (batch_size) is set to 8; the number of epochs is set to 10; the weight decay parameter is set to 0.001. The detailed parameter settings of the proposed UD-Net (shown in Figure 5) are summarized in Table 1.
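The reported hyperparameters and loss can be collected in a small configuration sketch; the dictionary keys and function name are illustrative, not taken from the paper.

```python
import numpy as np

# Hyperparameters reported for UD-Net training in this section.
config = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 8,
    "epochs": 10,
    "weight_decay": 1e-3,
    "input_size": (320, 320),
}

def mse_loss(pred, target):
    # Mean squared error between the network output and the ground truth.
    return float(np.mean((pred - target) ** 2))
```

Minimizing this loss over the 32,760 synthetic hazy/clean pairs is what fits the mapping from degraded input to K(x)-restored output.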

Contrast Limited Adaptive Histogram Equalization (CLAHE)
As mentioned above, the restoration model (3) applies to hazy images in air as well as to underwater images. However, in most cases, the underwater background is darker than the background in air, which implies that a further treatment is required for underwater images. In the study, Contrast Limited Adaptive Histogram Equalization (CLAHE) is used to process the luminance channel (the L channel in the CIE Lab color space) of underwater images after defogging. In this way, the processed images have better brightness and contrast. Figure 8 shows the result after CLAHE processing; the contrast limit is set to 1.1 with a tile size of 3×3. As can be seen, after defogging and the subsequent CLAHE process, the quality of the underwater images is significantly improved.
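The contrast-limiting idea can be illustrated with a single-tile (global) version; full CLAHE applies the same histogram clipping and redistribution per local tile with interpolation between tiles, which this sketch omits, so it is a simplification rather than the method itself.

```python
import numpy as np

def clip_limited_equalize(channel, clip_limit=1.1):
    """channel: 8-bit integer array (e.g., the L channel scaled to [0, 255]).
    Clip the histogram at clip_limit times the average bin count,
    redistribute the excess uniformly, then equalize via the CDF."""
    hist, _ = np.histogram(channel, bins=256, range=(0, 256))
    ceiling = clip_limit * channel.size / 256.0
    excess = np.sum(np.maximum(hist - ceiling, 0.0))
    clipped = np.minimum(hist, ceiling) + excess / 256.0
    cdf = np.cumsum(clipped) / channel.size
    lut = np.round(255.0 * cdf).astype(np.uint8)  # lookup table
    return lut[channel]
```

The clip limit caps how steep the equalization mapping can become, which is what prevents the over-enhancement and noise amplification of plain histogram equalization.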


Validation
To validate the proposed method, underwater images are processed and evaluated from both subjective and objective aspects by comparing with several classic and advanced methods used in underwater image processing, including RCP [11], integrated color model (ICM) [39], relative global histogram stretching (RGHS) [40], multi-scale Retinex with color restoration (MSRCR) [41], FUnIE-GAN [42], and UIE-CNN [43]. The images are taken from the UIEB Dataset [28], in which different degradation conditions are considered.

Subjective Visual Evaluation
The subjective visual evaluation consists of three parts: comparison with other algorithms, an ablation experiment, and edge information detection. Figure 9 presents the results of twelve underwater images processed by the six methods mentioned above and by the proposed method. It can be seen from the comparison that the proposed method generally outperforms the other algorithms in terms of color balance, image details and contrast. Although ICM and RCP can improve the contrast of images, they cannot eliminate the color deviation well. RGHS only shows good results in processing bluish images; moreover, its robustness is poor. The MSRCR algorithm can effectively alleviate the color distortion; however, the processed image is too bright and the overall contrast is low. FUnIE-GAN demonstrates the ability to restore underwater images to a certain extent, but the effect is mediocre. UIE-CNN performs well in general, but it does not work well on more severely distorted scenes, as can be seen from the results for images No.1 and No.12.


Subjective Visual Evaluation
The subjective visual evaluation consists of three parts, i.e., the comparison with other algorithms, ablation experiment and edge information detection. Figure 9 presents the results of twelve underwater images processed by six methods mentioned and the proposed method in the paper. It can be seen from the comparison that the proposed method generally outperforms the other algorithms, in terms of color balance, image details and contrast. For ICM and RCP, although they can improve the contrast of images, the color deviation cannot be eliminated well. For RGHS, it only shows good results in processing bluish images. Moreover, the robustness of RGHS is poor. For the MSRCR algorithm, it can effectively alleviate the color distortion. However, the processed image is too bright and the overall contrast is low. FUnIE-GAN has demonstrated the ability to restore underwater images to a certain extent, but the effect is mediocre. UIE-CNN performs well in general, but it does not work well when faced with more severely distorted scenes, as can be recognized from the results w.r.t. the images No.1 and No.12.

Comparison with Other Algorithms
Figure 9. Processed results of images from UIEB dataset. (a) Underwater images from UIEB dataset; (b) processed by ICM [39]; (c) processed by RCP [11]; (d) processed by RGHS [40]; (e) processed by MSRCR [41]; (f) processed by FUnIE-GAN [42]; (g) processed by UIE-CNN [43]; and (h) processed by the proposed method.

Ablation Experiment
In order to verify the effectiveness of the sub-modules in the proposed fusion algorithm, six sets of images were subjected to ablation experiments in which each module is gradually removed from the combined algorithm. The results are shown in Figure 10. It can be seen from the ablation experiment that each algorithm module achieves its corresponding processing effect, which helps to enhance the quality of the final output image.

In order to further verify the dehazing module, it is used to process some foggy images. The proposed method is compared with several widely used defogging methods, including DCP [9], AOD [29], DehazeNet [16], and MSCNN [17]. From the synthetic objective testing set (SOTS) [38], images of different haze concentrations are selected. The qualitative comparison of the visual effect is presented in Figure 11, in which eight images are investigated. As can be recognized, color distortion happens with the DCP method (severe in No.6) due to its underlying prior assumptions; in this case, the depth details in the image might be lost. The AOD method cannot remove the haze well when the fog density is large.
DehazeNet method tends to output low-brightness images. MSCNN recovers images with excessive brightness. By contrast, the proposed defogging network produces better images when dealing with hazy images. This can be further confirmed by using two typical objective evaluation metrics including PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity) [44]. The quantitative evaluation results and comparison are listed in Tables 2 and 3. The PSNR and SSIM values of the proposed method are greater than those of others in almost all cases, which indicates that the proposed CNN based defogging network has better applicability and robustness. It is noted that all the best performance indices are emphasized by bold font.
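For reference, the two metrics can be sketched in a few lines of NumPy. The PSNR follows its standard definition; the SSIM shown here is a simplified single-window variant of the metric in [44] (which uses a Gaussian sliding window), so it illustrates the formula rather than reproducing library implementations:

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak Signal-to-Noise Ratio between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, img, max_val=255.0):
    """Simplified SSIM computed over the whole image as one window."""
    x = ref.astype(np.float64)
    y = img.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))
```

For reporting results, windowed library implementations such as skimage.metrics.structural_similarity should be preferred over this global form.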

Edge Information Detection
In the fields of image classification, target detection, and feature recognition, the acquisition and application of image edge information play an important role. The higher the quality of an image is, the more obvious its edge information is; conversely, low-quality images have less obvious edge information. To further demonstrate the practicability of the proposed method, three representative images, No.2, No.8, and No.12, are selected to study the performance of edge detection. These three images are obtained from different environments with different backgrounds. Canny edge detection is performed, and Figure 12 shows the results of edge detection on the images processed by the different methods. It can be seen that the edge information obtained by the proposed method is more abundant than that of the other methods.
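The gradient stage at the core of edge detection can be sketched as follows. A full Canny detector (e.g., OpenCV's cv2.Canny) additionally applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding, which are omitted here for brevity:

```python
import numpy as np

def sobel_edges(gray, thresh=100.0):
    """Binary edge map from the Sobel gradient magnitude.

    Only the gradient step of a Canny-style detector; smoothing,
    non-maximum suppression and hysteresis are deliberately omitted.
    """
    g = gray.astype(np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T  # vertical-gradient kernel
    h, w = g.shape
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    for i in range(1, h - 1):          # skip the 1-pixel border
        for j in range(1, w - 1):
            win = g[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    mag = np.hypot(gx, gy)             # gradient magnitude
    return (mag >= thresh).astype(np.uint8)
```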

Objective Quality Evaluation
Subjective visual evaluation is easily affected by human feelings; different observers may draw different conclusions on the same group of pictures. Therefore, an objective quality evaluation of the underwater image processing results is also carried out, in which intrinsic characteristic indices of the image are expressed in digital form to objectively measure image quality. In this paper, average gradient, RMS (Root Mean Square) contrast, UCIQE (underwater color image quality evaluation), and information entropy are employed to objectively evaluate the quality of the enhanced underwater images. Compared with conventional metrics such as average gradient and RMS contrast, UCIQE provides a more comprehensive evaluation: the chroma, saturation, and luminance contrast of an image are all taken into account [48]. In general, the higher the UCIQE value is, the better the quality of an image. The UCIQE metric is defined as

UCIQE = C_1 σ_c + C_2 con_l + C_3 μ_s,

where σ_c is the standard deviation of chroma, con_l is the contrast of luminance, and μ_s is the mean value of saturation; C_i stands for the weight coefficients. In this paper, C_1 = 0.4680, C_2 = 0.2745, and C_3 = 0.2576 are chosen by reference to [49].
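A sketch of the UCIQE computation is given below, assuming the image has already been converted to CIE Lab and that a saturation channel is available (e.g., from HSV). Taking con_l as the spread between the top and bottom 1% quantiles of luminance is a common implementation choice, not prescribed by the definition above:

```python
import numpy as np

# Weight coefficients chosen by reference to [49]
C1, C2, C3 = 0.4680, 0.2745, 0.2576

def uciqe(L, a, b, sat):
    """UCIQE from CIE Lab channels and a saturation channel.

    L, a, b: float arrays of the Lab channels (L scaled to [0, 1] here);
    sat: saturation in [0, 1]. The color-space conversions themselves
    are assumed done beforehand (e.g., with an OpenCV cvtColor call).
    """
    chroma = np.sqrt(a ** 2 + b ** 2)
    sigma_c = chroma.std()                           # std. dev. of chroma
    # Luminance contrast: spread between top/bottom 1% quantiles of L
    con_l = np.quantile(L, 0.99) - np.quantile(L, 0.01)
    mu_s = sat.mean()                                # mean saturation
    return C1 * sigma_c + C2 * con_l + C3 * mu_s
```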
Besides UCIQE, another comprehensive metric, information entropy, is selected to evaluate the proposed image enhancement method. Information entropy refers to the average amount of information contained in an image [49]. In general, the larger the information entropy is, the more information the image includes. Information entropy can be described as

H = -Σ_{i=0}^{n} p(i) log_2 p(i),

where n is the maximum gray level of the image, i is the gray level of a pixel, and p(i) represents the probability that the pixel value of the image equals i. For the twelve underwater images that have been evaluated from the subjective visual aspect, the results of the above four objective evaluation metrics for the seven different image processing methods are listed in Tables 4-7. In detail, Table 4 gives the results evaluated by the average gradient metric; Table 5 is with the RMS contrast metric; Table 6 is with UCIQE; and Table 7 is with information entropy.
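The entropy metric follows directly from the gray-level histogram; a minimal sketch for 8-bit images:

```python
import numpy as np

def image_entropy(gray, levels=256):
    """Shannon entropy (in bits) of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()      # p(i): probability of gray level i
    p = p[p > 0]               # empty bins contribute 0 (0*log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```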
It can be clearly seen from Tables 4 and 5 that the proposed method is significantly better than the other methods in terms of average gradient and contrast. In terms of the comprehensive metrics UCIQE and information entropy, shown in Tables 6 and 7, the proposed method outperforms the other methods in almost all cases. Two exceptions for the UCIQE metric are the results obtained by UIE-CNN for image No.1 and by RGHS for image No.4; nevertheless, in these two cases the proposed method gains the second-best evaluation. For the information entropy metric, only one exception happens, for image No.6, where the result obtained by RGHS is the best and the proposed method gains the third-best evaluation. In general, it can be summarized that the proposed underwater image enhancement strategy is better than the other methods in terms of contrast, color coordination, edge information, and information entropy. It performs well in dealing with the color degradation, image blur, and low contrast of underwater images. Table 8 shows the comparison of the time spent in processing 100 underwater images by the proposed algorithm and the other algorithms, including ICM, RCP, RGHS, MSRCR, FUnIE-GAN, and UIE-CNN. It can be seen that most of the time spent by the proposed algorithm is in the color restoration module, which is based on a traditional method.
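For completeness, the two simpler metrics from Tables 4 and 5 can be sketched as well. Average gradient is taken here as the mean root-mean-square of horizontal and vertical finite differences, one of several common formulations; RMS contrast is simply the standard deviation of the intensities:

```python
import numpy as np

def average_gradient(gray):
    """Mean RMS of horizontal/vertical finite differences."""
    g = gray.astype(np.float64)
    dx = g[:-1, 1:] - g[:-1, :-1]   # horizontal differences
    dy = g[1:, :-1] - g[:-1, :-1]   # vertical differences
    return float(np.sqrt((dx ** 2 + dy ** 2) / 2.0).mean())

def rms_contrast(gray):
    """RMS contrast: standard deviation of pixel intensities."""
    return float(gray.astype(np.float64).std())
```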

Conclusions
A hybrid underwater image enhancement and restoration algorithm is proposed that is composed of a color balance algorithm, an end-to-end defogging network algorithm, and a CLAHE algorithm. By studying the color imaging characteristics of underwater images and the histogram distribution characteristics of the image in the CIE Lab color space, a color balance algorithm is proposed to eliminate the color deviation of underwater images. Then, according to the similarity of underwater image imaging characteristics and foggy imaging characteristics, a CNN-based dehazing network is proposed to remove the blur of underwater images and to improve the contrast. Due to the difference between the foggy air environment and the underwater environment, a CLAHE algorithm is used to process the luminance channel in the CIE Lab color space to improve contrast and brightness of underwater images.
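As an illustration of this final stage, the sketch below applies a contrast-limited equalization to a single (luminance) channel. It is a simplified global variant: real CLAHE (e.g., OpenCV's cv2.createCLAHE) clips per-tile histograms and blends the tile mappings bilinearly, but the clip-and-redistribute idea is the same:

```python
import numpy as np

def clipped_hist_equalize(channel, clip_limit=0.01, levels=256):
    """Contrast-limited histogram equalization on one 8-bit channel.

    Global simplification of CLAHE: histogram bins above the clip
    limit are truncated and the excess mass is redistributed evenly
    before building the equalization mapping.
    """
    hist = np.bincount(channel.ravel(), minlength=levels).astype(np.float64)
    hist /= hist.sum()                                   # normalize
    excess = np.clip(hist - clip_limit, 0, None).sum()   # clipped mass
    hist = np.minimum(hist, clip_limit) + excess / levels
    cdf = np.cumsum(hist)                                # mapping basis
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[channel]
```

In the pipeline described above, such a mapping would be applied to the L channel after conversion to CIE Lab, then the result converted back to RGB.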
The proposed underwater image processing strategy is verified by using foggy air images and validated by using representative underwater images. Processed results show that the proposed method can effectively eliminate the color deviation and ambiguity in underwater images and meanwhile improve the image contrast. Comparison is conducted between the proposed method and other classic and advanced methods. In addition to the visual effect evaluation, four commonly used image quality evaluation metrics (average gradient, RMS contrast, UCIQE, and information entropy) are used to evaluate the effectiveness and advantages of the proposed algorithm. The comparison results indicate that the proposed algorithm is superior to other algorithms in terms of performance and robustness.
Notably, Table 8 shows that the proposed algorithm spends more time in processing images than other algorithms, which affects its real-time performance especially when a large number of images are to be processed. How to improve the real-time performance of the fusion algorithm will be studied in future work.