Enhancement of Underwater Images by CNN-Based Color Balance and Dehazing

Abstract: Convolutional neural networks (CNNs) are employed to achieve color balance and dehazing of degraded underwater images. In the color balance module, an underwater generative adversarial network (UGAN) is constructed, which learns the mapping from underwater images with color deviation to clean underwater images. In the clarity improvement module, an all-in-one dehazing model is proposed in which a comprehensive index is introduced and estimated by a deep CNN. The third module enhances underwater images with an adaptive contrast improvement method that fuses global and local histogram information. Using several underwater image datasets, the proposed enhancement method based on the three modules is evaluated, both by subjective visual effects and by quantitative evaluation metrics. To demonstrate the advantages of the proposed method, several commonly used underwater image enhancement algorithms are compared. The comparison results indicate that the proposed method achieves better enhancement for underwater images in different scenes than the other algorithms, since it can significantly diminish the color deviation, blur, and low contrast in degraded underwater images.


Introduction
Underwater information plays an important role in the human exploration and exploitation of the underwater world, for example, underwater archeology [1], underwater localization [2], underwater maintenance [3], underwater target recognition [4], underwater search and salvage [5], underwater environment monitoring [6], etc. Both acoustic and optical technologies are used to obtain underwater information. Comparatively, optical images and videos give us a more intuitive understanding of underwater objectives. However, the particularity of the underwater environment degrades the visibility and quality of underwater images and videos, which suffer from color deviation, blur, and decreased contrast. The attenuation and scattering of light propagating in water partly account for the degradation. Other causes of degradation include the movement of water or underwater creatures [7]; temperature and salinity [8]; and noises such as salt-and-pepper noise, Gaussian noise, and marine snow [9]. To alleviate the degradation of underwater images and videos, one can resort to advanced equipment, for example, a divergent beam underwater Lidar imaging system [10] or a multistate underwater laser line scan system [11]. However, the expense of such equipment hinders its wide use. By contrast, image processing provides an effective way to obtain high-quality images and videos at low cost. Over the last decade, the enhancement and restoration of underwater images has received increasing attention.
This paper aims to comprehensively enhance underwater images in terms of color, clarity, and contrast. To achieve this, color deviation and blur are diminished by an underwater generative adversarial network and a CNN-based all-in-one dehazing model, respectively. To further improve the contrast of underwater images, an adaptive contrast improvement method based on fusing global and local histogram information is adopted.

Fundamentals

Underwater Imaging

The propagation of light in water is affected by underwater environments, which results in the attenuation and scattering of light. Influencing factors include the density of water; the selective refraction and absorption of light by water; underwater suspended particles; the movement of water; and the temperature and salinity of water. Consequently, optical underwater images commonly suffer from color deviation, blur, and low contrast.
Due to the attenuation and scattering of light in underwater imaging, the light intensity received by a camera can be described as the sum of three parts:

I(x) = E_d(x) + E_f(x) + E_b(x), (1)

where I denotes the total light intensity; E_d denotes the direct reflection of light; E_f denotes the forward scattering of light; and E_b denotes the backward scattering of light. For the direct part E_d, no scattering of light is considered but only attenuation:

E_d(x) = J(x) e^{-c d(x)} = J(x) t(x), (2)

where J(x) denotes the light travelling from the illumination source to the object; c is the attenuation coefficient of light; and d(x) is the distance between the sensor (e.g., a camera) and the underwater object. Here t(x) = e^{-c d(x)} is introduced as the transmittance. The forward scattering E_f relates to the disturbed reflection of light from an object, a representative disturbance being underwater suspended particles. For small-angle scattering, E_f can be calculated as a convolution [53]:

E_f(x) = E_d(x) * g(x), (3)

where g(x) is the point spread function (PSF). The backward scattering E_b derives from reflection by underwater suspended particles instead of the object to be imaged; therefore, it can be viewed as noise in the underwater imaging model (1). The model of E_b is

E_b(x) = B_∞ (1 - e^{-c d(x)}), (4)

where B_∞ is the water background. According to Equations (2)-(4), the light intensity received by a camera, expressed as (1), can be rewritten as

I(x) = J(x) t(x) + E_d(x) * g(x) + B_∞ (1 - t(x)). (5)

Usually, the contribution of the forward scattering E_f in model (1) is much less than that of the direct reflection E_d and the backward scattering E_b, especially when the image plane is close to the object to be imaged. Therefore, model (5) can be simplified as

I(x) = J(x) t(x) + B_∞ (1 - t(x)). (6)

Electronics 2022, 11, 2537 5 of 23

Equation (6) is often used as a restoration model in image processing. J(x) represents a restored image, while I(x) represents the real image received by a sensor such as a camera.
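The simplified model (6) can be sketched numerically. The values of J, c, d, and B_∞ below are illustrative toy values, not from the paper; the check simply confirms that inverting (6) recovers the clean image when the transmittance is known.

```python
import numpy as np

# Sketch of the simplified underwater imaging model (Equation (6)):
#   I(x) = J(x) * t(x) + B_inf * (1 - t(x))

def degrade(J, t, B_inf):
    """Forward model: attenuate scene radiance J by the transmittance t
    and add the backscattered background light."""
    return J * t + B_inf * (1.0 - t)

def restore(I, t, B_inf, t0=0.1):
    """Invert Equation (6) for the clean image J; t is clipped at t0
    to avoid division by very small transmittance values."""
    return (I - B_inf * (1.0 - t)) / np.maximum(t, t0)

# toy 2x2 'image' with spatially varying depth d(x)
J = np.array([[0.8, 0.2], [0.5, 0.9]])
c, d = 0.4, np.array([[1.0, 2.0], [3.0, 0.5]])
t = np.exp(-c * d)                    # transmittance t(x) = e^{-c d(x)}
B_inf = 0.6                           # background (veiling) light

I = degrade(J, t, B_inf)
J_hat = restore(I, t, B_inf)
print(np.allclose(J, J_hat))          # exact inverse while t stays above t0
```

In practice I(x) is the only observed quantity, which is why the later sections estimate t(x) and B_∞(x) (or a combined index) before inverting the model.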
As aforementioned, t(x) relates to the attenuation of light in water, while the second term in (6), i.e., B_∞(x)(1 - t(x)), represents the backward scattering of light in water. In an ideal case, when the attenuation and scattering of light in water disappear, obviously I(x) = J(x) holds. However, in real underwater imaging environments, the attenuation and scattering of light are inevitable and have to be considered. As can be inferred from (6), the restored image J(x) depends on t(x) and B_∞(x) after I(x) is obtained by a sensor. Several methods have been proposed to obtain t(x) and B_∞(x), for example, the maximum intensity prior [54,55]; DCP [56,57]; red channel prior [18]; image blurring and light absorption [58]; and underwater light attenuation prior [59].

Convolutional Neural Network (CNN)
As a kind of artificial neural network, the convolutional neural network (CNN) is most commonly employed to analyze visual images. Examples of classical CNNs are LeNet-5, AlexNet, Inception, ResNet, and VGGNet. Based on the shared-weight architecture of its filters, a CNN has low complexity and requires little pre-processing. Moreover, due to its equivariant and invariant characteristics, a CNN provides stability in processing images. A typical CNN architecture consists of an input layer, convolutional layers, pooling layers, a full connection layer, and an output layer, as shown in Figure 1. In the example figure, 3 × 3 and 2 × 2 represent the sizes of the convolutional kernels. When a CNN is used in image processing, images are taken as inputs. Convolution is performed by filters to extract features in the images, and the features are further mapped by pooling. In the full connection layer, all features are collected to form the final output images.
As a vital component, the filters in a CNN can be viewed as shared weights that extract the features in images; for different features, different filters can be selected. Given an input image X = {x_pq} ∈ R^{M×N} and a filter W = {w_uv} ∈ R^{U×V}, and supposing the stride is 1, the standard convolution operation yields Y = {y_ij} with

y_ij = Σ_{u=1}^{U} Σ_{v=1}^{V} w_uv x_{i+u-1, j+v-1}. (7)

Activation functions, such as the ReLU (rectified linear unit), are then employed to increase the nonlinear capability of the CNN; thus, feature maps can be obtained. Afterwards, another important component, pooling, is conducted. Commonly used pooling includes average pooling and max pooling, by which the size of the feature map can be reduced; moreover, over-fitting can be mitigated by pooling. After several convolutional layers and pooling layers, the full connection layer performs high-level reasoning to form the final output image. In this layer, all local features obtained in the convolutional layers are collected and the neurons between adjacent layers are fully connected, as in a regular ANN.
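The stride-1 convolution above can be sketched directly (in the deep-learning convention, i.e., cross-correlation without flipping the kernel); the 4 × 4 input and 3 × 3 mean filter below are toy values for illustration.

```python
import numpy as np

# Minimal sketch of the stride-1 'valid' convolution of Equation (7).

def conv2d(X, W):
    """Slide the filter W over X and sum the elementwise products."""
    M, N = X.shape
    U, V = W.shape
    Y = np.empty((M - U + 1, N - V + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = np.sum(W * X[i:i + U, j:j + V])
    return Y

X = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input image
W = np.ones((3, 3)) / 9.0                      # 3x3 mean filter
Y = conv2d(X, W)
print(Y.shape)   # (2, 2): output size is (M - U + 1) x (N - V + 1)
```

Without padding, the output shrinks by U − 1 rows and V − 1 columns, which is why the dehazing network later chooses its padding per filter size to keep the output size fixed.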

Underwater Image Enhancement by UGAN-Based Color Balance, CNN-Based Dehazing and Adaptive Contrast Enhancement
A sketch that depicts the procedure of the proposed underwater image enhancement is shown in Figure 2. As can be seen, CNNs are employed in the color balance and dehazing, while contrast improvement is conducted afterwards. The enhancement is conducted in three steps, i.e., color balance, dehazing, and contrast improvement. Firstly, CNNs are used to form a higher-level architecture, i.e., a UGAN, to correct the color deviation of degraded underwater images. Secondly, to solve the problems of blur and low definition in underwater images, a CNN-based integrated dehazing model is proposed. Finally, an adaptive contrast enhancement algorithm based on the fusion of global and local contrast information is used to improve the contrast of the underwater images.

Color Balance by Underwater Generative Adversarial Network (UGAN)
A representative application of CNN is to construct generative adversarial networks. A generative adversarial network (GAN) is a framework for obtaining generative models through a contest between a generative network and a discriminative network [60]. The framework operates in the manner of unsupervised learning. In the GAN structure, the mission of the discriminative network is to identify the realness of data, while the mission of the generative network is to fool the discriminative network by creating realistic-looking data. Such a zero-sum game lasts until the generative network wins. The structure of a GAN is depicted in Figure 3. A well-trained discriminative network (D) is connected with a generative network (G) that is updated until the discriminative network cannot distinguish whether the data come from the true training set or are samples generated by the generative network. Such a contest can be modeled as a minimax optimization, i.e.,

min_G max_D V(D, G) = E_{x∼P_data}[log D(x)] + E_{z∼P_z}[log(1 - D(G(z)))], (8)

where P_data represents the distribution over the true data, while P_z is the prior on the random noise z; D(x) denotes the probability that x derives from the true data; and G(z) is the synthetic image.
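The value function in (8) can be illustrated numerically. The arrays below stand in for discriminator outputs on real and generated samples (illustrative values, not a trained network); at the equilibrium where the generator wins, D outputs 0.5 everywhere and V reaches 2 log(1/2).

```python
import numpy as np

# Numerical sketch of the GAN value function:
#   V(D, G) = E_{x~P_data}[log D(x)] + E_{z~P_z}[log(1 - D(G(z)))]

rng = np.random.default_rng(0)
d_real = rng.uniform(0.6, 0.99, size=1000)   # D(x) on true data (toy values)
d_fake = rng.uniform(0.01, 0.4, size=1000)   # D(G(z)) on generated data

V = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# When D cannot tell real from fake, D(.) = 0.5 everywhere and the
# value function equals log(1/2) + log(1/2).
V_equilibrium = 2.0 * np.log(0.5)
print(V > V_equilibrium)   # this discriminator still separates the two
```

As the generator improves, d_fake drifts toward d_real and V decreases toward the equilibrium value, which is the dynamic the UGAN training exploits.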
During the past years, GANs have found increasing applications in the areas of art, science, and even games. The majority of GAN applications concern image processing, and CNNs are preferred in forming the generator and discriminator. In this study, a GAN is employed to enhance underwater images. With this goal, the inputs to the generative network can be selected as low-quality or degraded underwater images, while the outputs of the generative network would be enhanced underwater images, provided a well-trained GAN is obtained. Because the output images have the same dimension as the input images, the U-net architecture is selected in constructing the generative network. The U-net is a kind of fully convolutional network. It can be viewed as a combination of encoder and decoder, as shown in Figure 4a. In the encoder module, downsampling is performed along the contracting path: the spatial information is reduced, while the feature information is increased. In the decoder module, the feature and spatial information are combined through upsampling and skip connections (concatenation) with high-resolution features from the contracting path in the encoder. In the study, the size of the input images is selected as 256 × 256.
Each contraction step consists of 4 × 4 filtering with stride 2 followed by Leaky-ReLU activation and batch normalization (BN). In the decoder, the activation function is selected as ReLU.
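The spatial bookkeeping of the contracting path can be checked with the standard output-size formula. The padding value of 1 is an assumption here (the text specifies only the 4 × 4 filter and stride 2); with it, each contraction step halves the feature-map size.

```python
# Spatial size of each encoder stage: a 4x4 convolution with stride 2
# (padding 1 assumed) halves the feature-map size at every step.

def conv_out(n, f=4, s=2, p=1):
    """Standard output-size formula: floor((n - f + 2p)/s) + 1."""
    return (n - f + 2 * p) // s + 1

sizes = [256]                 # 256 x 256 input, as stated in the text
while sizes[-1] > 1:
    sizes.append(conv_out(sizes[-1]))
print(sizes)   # [256, 128, 64, 32, 16, 8, 4, 2, 1]
```

The decoder then mirrors this path, doubling the size at each upsampling step and concatenating the matching encoder features.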
In the discriminative network, PatchGAN is selected as the network architecture.
As a Markovian discriminator, PatchGAN has proven to be a better discriminator than the regular GAN discriminator for image processing, especially in terms of image resolution and image details. The main difference is that a regular GAN discriminator outputs a scalar that indicates whether the input image is real or fake, while the PatchGAN discriminator outputs a matrix in which each element indicates whether a patch (receptive field) of the input image is real or fake. In the study, the size of the output of the discriminator is 32 × 32. Each convolution layer consists of 4 × 4 filtering with stride 2 along with Leaky-ReLU activation. The architecture of the discriminator in the GAN is depicted in Figure 4b. The width and height of the boxes represent the width and height of the feature map, and different boxes correspond to different convolutional layers of the network.
The loss functions used to train the discriminator and generator are selected as the Wasserstein GAN (WGAN) function and the L1-norm, respectively. To improve the stability of training, a gradient penalty term is added to the WGAN. Based on (8), the overall loss function of the UGAN can be described as

L_UGAN = min_G max_D L_WGAN(D, G) + λ_1 L_{L1}(G), (9)

where λ_1 is a weight factor and the loss function with the penalty term is defined as

L_WGAN(D, G) = E_{x∼P_data}[D(x)] - E_{z∼P_z}[D(G(z))] + λ_GP E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖_2 - 1)^2], (10)

where λ_GP is the penalty coefficient and P_x̂ is obtained by sampling uniformly along straight lines between pairs of points drawn from P_data and P_z. The loss function for the generator in Equation (9) is defined as

L_{L1}(G) = E[‖x - G(z)‖_1]. (11)

It is noted that among the applications of GAN, some focused on the generation of realistic underwater images (e.g., [45]), while some employed GAN for the color correction of underwater images (e.g., [49]). In this paper, GAN is used for the color balance of underwater images. Similar to [49], the generator network in the study is a fully convolutional encoder-decoder and the U-net architecture is used. Different from [49], in this study the PatchGAN structure is used as the discriminator to guarantee the image resolution and image details.

CNN-Based Integrated Dehazing Model
The model (6) originated from the treatment of images in an air environment. Now it is widely used in underwater image processing to obtain a clean or restored image J(x). Based on (6), the clean image can be determined by

J(x) = (I(x) - B_∞(x)(1 - t(x))) / t(x). (12)

As can be inferred, to obtain a clean image J(x), t(x) and B_∞(x) should be estimated in advance. Estimating t(x) and B_∞(x) individually accumulates errors. In the study, a comprehensive index is used to decrease such errors. Based on (12), an all-in-one dehazing model can be defined as

J(x) = K(x) I(x) - K(x) + b, (13)

where b is a constant and K(x) is determined by t(x) and B_∞(x) as

K(x) = ((I(x) - B_∞(x))/t(x) + (B_∞(x) - b)) / (I(x) - 1). (14)

It is obvious from definition (13) that the clean image J(x) depends on the estimation of K(x), which is achieved by using a deep CNN in the study, as shown in Figure 5.
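The algebra of the all-in-one formulation can be verified numerically: folding t(x) and B_∞ into the single index K(x) reproduces exactly the image recovered by inverting (6) directly. The arrays below are toy values (in the paper, K(x) is estimated by the deep CNN rather than computed from known t and B_∞).

```python
import numpy as np

# Sketch of the all-in-one dehazing model (Equations (12)-(14)):
#   J(x) = K(x) I(x) - K(x) + b, with K folding t(x) and B_inf together.

b = 1.0                                    # constant bias of the model
I = np.array([[0.70, 0.55], [0.62, 0.48]]) # observed (degraded) intensities
t = np.array([[0.90, 0.60], [0.75, 0.50]]) # transmittance t(x)
B_inf = 0.8                                # background light

K = ((I - B_inf) / t + (B_inf - b)) / (I - 1.0)   # Equation (14)

J_all_in_one = K * I - K + b                      # Equation (13)
J_direct = (I - B_inf * (1.0 - t)) / t            # Equation (12)
print(np.allclose(J_all_in_one, J_direct))        # the two forms agree
```

The benefit is that a single network output K(x) replaces two separate estimates, so the estimation errors of t(x) and B_∞(x) no longer accumulate.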
In constructing the CNN for estimating K(x), multi-scale networks are designed to increase the accuracy and efficiency of estimation. Moreover, a coarse-scale network is concatenated with a fine-scale network to decrease the information loss during the convolution operations. In the study, five convolutional layers are designed and concatenation is performed among the convolutional layers. As can be recognized from Figure 6, there is a concatenation between the first and second convolutional layers (labelled by the black line), and between the second and third convolutional layers as well (labelled by the red line). The last concatenation is based on the first four layers (labelled by the blue line).
The size of the filter is selected as 1 × 1 in the first convolutional layer (denoted by con1 in Figure 6); 3 × 3 in the second layer (con2); 5 × 5 in the third layer (con3); 7 × 7 in the fourth layer (con4); and 3 × 3 in the last layer (con5). The activation function is selected as ReLU. Furthermore, to guarantee that the size of the output is the same as that of the input image, in the study the padding varies with the size of the filter in the different convolutional layers, which simplifies the network structure compared with the pooling and upsampling operations that constant padding would require.
According to the size relationship in the convolution operation, that is,

W = (N - F + 2P)/S + 1, (15)

the padding can be determined as

P = ((W - 1)S - N + F)/2 (16)

in the case of N = W, where N represents the size of the input image; W the size of the output image; F the size of the filter; S the stride; and P the padding. Usually, the value of the stride is kept as 1. Therefore, the calculation of the padding can be simplified as

P = (F - 1)/2. (17)

In this way, the paddings in the five convolutional layers are 0, 1, 2, 3, and 1, respectively.
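The padding rule can be checked against the five filter sizes of the dehazing network:

```python
# Quick check of the 'same-size' padding rule P = (F - 1)/2
# (stride 1, odd filter sizes) for the layers con1 .. con5.

def same_padding(f, s=1):
    """Padding that keeps the output size equal to the input size."""
    assert s == 1 and f % 2 == 1, "rule assumes stride 1 and an odd filter"
    return (f - 1) // 2

filters = [1, 3, 5, 7, 3]                       # con1 .. con5
paddings = [same_padding(f) for f in filters]
print(paddings)   # [0, 1, 2, 3, 1], as stated in the text
```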

Adaptive Contrast Improvement of Underwater Images
Due to the particularity of underwater environments, the contrast of underwater images is usually low. Conventional means, such as histogram equalization (HE), do not perform well, and unexpected effects may occur, such as excessive enhancement, artifacts, and distortion. To further improve the details of underwater images, an adaptive contrast improvement is proposed in this study. Figure 7 presents the procedure. The core of the approach is the histogram treatment, which is composed of adjustable histogram equalization (AHE) and contrast limited adaptive histogram equalization (CLAHE). Preprocessing includes linear stretching and transformation, while postprocessing mainly refers to a fusion algorithm under a hue-preserving framework. The fusion algorithm combines the image obtained by AHE and the image obtained by CLAHE.
Linear stretching makes the pixel values lie within the range [0, 255] by using the expression

X̂_c = 255 (X_c - X_min)/(X_max - X_min), (18)

where c ∈ {r, g, b}, and X_max and X_min are the maximal and minimal pixel values, respectively. Furthermore, a transformation from the RGB image to a grayscale image is performed by using the expression in [61].
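The per-channel stretch of Equation (18) can be sketched as follows; the 2 × 2 channel below is a toy value for illustration.

```python
import numpy as np

# Minimal sketch of the linear stretch in Equation (18): map a color
# channel onto the full [0, 255] range.

def linear_stretch(X):
    """Per-channel min-max stretch to [0, 255]."""
    X = X.astype(float)
    x_min, x_max = X.min(), X.max()
    return 255.0 * (X - x_min) / (x_max - x_min)

channel = np.array([[60, 90], [120, 180]])   # toy single-channel image
stretched = linear_stretch(channel)
print(stretched.min(), stretched.max())      # 0.0 255.0
```

Applying this to each of the r, g, and b channels independently is what can shift hues, which is why the later fusion step works under a hue-preserving framework.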
Normalization is then conducted to obtain the histogram h_I by

h_I(s) = n_s / N, (20)

where n_s is the number of pixels that have the same scale value s, and N is the number of all pixels. A uniformly distributed histogram h_U can be obtained on the basis of h_I: the size of h_U is the same as that of h_I, and each element of h_U is 1/256. Different from standard HE, adjustable histogram equalization (AHE) is used in the study by optimizing

h = argmin_h ( ‖h - h_I‖^2 + λ ‖h - h_U‖^2 ), (21)

where λ is a trade-off parameter. The solution of the above quadratic optimization problem is

h = (h_I + λ h_U)/(1 + λ). (22)

To obtain a proper λ, a tone distortion index D(T) is used [62], where T is the transfer function in contrast enhancement. A smaller tone distortion D indicates a smoother tone in the images reproduced by T. Therefore, the trade-off parameter λ can be determined by λ = argmin D(T).
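The closed-form blend in (22) can be sketched directly. The degenerate histogram below (all pixels at one gray level) is a toy input; the point of the check is that the blend remains a valid distribution and moves mass toward uniformity as λ grows.

```python
import numpy as np

# Sketch of adjustable histogram equalization: blend the image histogram
# h_I with the uniform histogram h_U, per Equation (22).

def ahe_histogram(h_I, lam):
    """Return the blended target histogram (h_I + lam*h_U)/(1 + lam)."""
    h_U = np.full_like(h_I, 1.0 / h_I.size)   # uniform histogram
    return (h_I + lam * h_U) / (1.0 + lam)

h_I = np.zeros(256)
h_I[100] = 1.0            # degenerate histogram: every pixel at level 100
h = ahe_histogram(h_I, lam=1.0)

print(abs(h.sum() - 1.0) < 1e-12)   # still a valid distribution
# lam -> 0 recovers h_I (no equalization); large lam approaches h_U.
```

Equalizing toward h rather than toward a fully uniform histogram is what lets the trade-off parameter λ limit the tone distortion measured by D(T).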
By using the above AHE, the contrast of an image can be improved globally. However, this global approach to contrast enhancement might not be suitable when local details of an image are important. Therefore, in the study, a local contrast enhancement technique, contrast limited adaptive histogram equalization (CLAHE), is combined with the AHE to comprehensively improve the contrast of underwater images. Moreover, to avoid the gamut problem caused by linear stretching (Equation (18)) and by the transformation from RGB space to grayscale space (Equation (19)), a hue-preserving framework is adopted in the contrast enhancement. The hue-preserving algorithm is expressed in Equation (25), where G_I(k) is the image processed by global or local contrast enhancement.
After the global contrast enhancement by AHE, the local contrast enhancement by CLAHE, and the treatment by hue preservation (HP) via Equation (25), the channels of the final image are obtained by fusing A_c and P_c, where A_c is the image processed by AHE and HP while P_c is the image processed by CLAHE and HP. The fusion weights Ŵ_A(k) and Ŵ_P(k) are determined by W_d, which in turn is determined by the contrast measure C_d and the well-exposedness measure B_d.
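As a hedged illustration of this fusion step, the blend below uses a per-pixel normalization W_A/(W_A + W_P), which is an assumption here; the paper derives its weights from the contrast and well-exposedness measures C_d and B_d, whose exact formulas are not reproduced.

```python
import numpy as np

# Hypothetical sketch of the weighted fusion of the globally enhanced
# image A_c (AHE + HP) and the locally enhanced image P_c (CLAHE + HP).
# The normalization of the weights is an assumption, not the paper's
# exact formula.

def fuse(A_c, P_c, W_A, W_P, eps=1e-8):
    """Blend two enhanced images with normalized per-pixel weights."""
    W_A_hat = W_A / (W_A + W_P + eps)
    return W_A_hat * A_c + (1.0 - W_A_hat) * P_c

A_c = np.full((2, 2), 200.0)              # toy AHE + HP result
P_c = np.full((2, 2), 100.0)              # toy CLAHE + HP result
W_A = np.array([[1.0, 3.0], [1.0, 0.0]])  # toy per-pixel weight maps
W_P = np.array([[1.0, 1.0], [3.0, 1.0]])
F = fuse(A_c, P_c, W_A, W_P)
print(F)   # stays between 100 and 200; 150 where the weights are equal
```

Normalizing the weights keeps the fused pixel values inside the range spanned by the two inputs, which avoids re-introducing over-enhancement.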

Results and Analysis
To evaluate the proposed image enhancement measures, several underwater datasets are used and the evaluation is performed from both subjective and quantitative aspects. The datasets include Imagenet [51], EUVP [52], NYU2 [63], and RUIE [64]. The Imagenet dataset is a large visual database designed for visual object recognition, containing more than 14 million images. The EUVP dataset is constructed for the enhancement of underwater visual perception and contains paired and unpaired collections of 20 K underwater images of poor and good perceptual quality. The NYU2 dataset consists of more than 400 K images, containing 35,064 distinct objects and covering 894 different classes. The RUIE dataset, containing over 4000 images, is constructed as a large-scale underwater benchmark under natural light and targets tasks including visibility degradation, color cast, and higher-level detection/classification. On the Imagenet and EUVP datasets, the UGAN for color balance is trained and validated: a total of 13,863 images are used as the training set, while 2513 images are used for validation. On the NYU2 dataset, the CNN for dehazing is trained and validated: the training set includes 24,443 images, while 2813 images are used for validation. Based on the proposed UGAN, CNN, and contrast enhancement, images from the RUIE dataset are evaluated and compared with six conventional and commonly used algorithms in underwater image processing: the multi-scale retinex enhancement algorithm with color restoration (MSRCR) [65], red channel prior (RCP) [18], underwater dark channel prior (UDCP) [15], integrated color model (ICM) [24], relative global histogram stretching (RGHS) [66], and Retinex and multilayer perceptron (R-MLP) [67]. The comparison algorithms are briefly described as follows.

MSRCR algorithm [65]:

r_i^{MSRCR}(x, y) = C_i(x, y) r_i^{MSR}(x, y),

where r_i^{MSRCR}(x, y) represents the output image processed by MSRCR, r_i^{MSR}(x, y) is obtained by using MSR, and C_i(x, y) denotes the color restoration factor.
RCP algorithm [18]:

J_red(x) = min_{y∈Ω(x)} min{1 − J_R(y), J_G(y), J_B(y)}

where J_R(y), J_G(y), and J_B(y) are the channels of the original image, and Ω(x) is a neighborhood of pixels around the location x.

UDCP algorithm [15]:

J(x) = (I(x) − A) / max(t(x), t_0) + A

where A is the water background light and t_0 is the threshold of the transmittance t(x), which is estimated from the green and blue channels only.

ICM algorithm [24]: the image is transferred from the RGB model to the HSI color model, where H denotes the hue, S denotes the saturation, and I denotes the intensity, respectively, and contrast stretching is applied.

The RGHS algorithm adopts the histogram stretching function [66]:

p_o = (p_i − I_min) × (O_max − O_min) / (I_max − I_min) + O_min

where p_i and p_o are the input and output pixels, respectively, and I_min, I_max, O_min, and O_max are the adaptive pixel limits for the images before and after stretching.

R-MLP algorithm [67]:

t′(x, y) = MLP[t(x, y)]

where r′(x, y) is the Gamma-corrected map after the Retinex algorithm, t(x, y) is the transmission map of the dark channel of r′(x, y), and the multilayer perceptron outputs the refined transmission t′(x, y).

Figure 8 presents the visual effects of ten underwater images processed by seven different enhancement algorithms, namely the above six algorithms and the proposed algorithm. In consideration of the particularity of the underwater environment, the ten images are selected to cover different underwater scenes and degradation types. As can be seen, although the degraded underwater images can be enhanced by the different algorithms in general, the proposed algorithm performs better than the others in terms of color balance, clarity, contrast, and details. The images treated by MSRCR are reddish. RCP and UDCP do not deal well with bluish images, e.g., images No.1, No.2, No.7, and No.10, in which the color deviation cannot be diminished effectively. For the ICM algorithm, the color deviation in bluish and blue-green images cannot be diminished well; moreover, low brightness occurs in images No.1, No.5, and No.8. For the RGHS algorithm, although the brightness and details are improved, the color deviation cannot be dealt with well, especially for bluish and blue-green images. For the R-MLP algorithm, the color deviation is effectively diminished.
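As a rough illustration of the UDCP recovery described above, the following NumPy sketch estimates the transmission from the green and blue channels only and then applies the recovery formula J = (I − A)/max(t, t_0) + A. It assumes the background light A is already known rather than estimated from the image, and the patch size and threshold are illustrative defaults, not values from the paper.

```python
import numpy as np

def udcp_recover(img, A, patch=15, t0=0.1):
    """UDCP-style recovery sketch.
    img: HxWx3 float array in [0, 1]; A: length-3 background light (assumed given)."""
    # Per-pixel minimum of the G and B channels normalized by the background light
    gb = np.minimum(img[..., 1] / A[1], img[..., 2] / A[2])
    # Local minimum over a patch: the "dark channel" of the G-B pair
    pad = patch // 2
    padded = np.pad(gb, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    dark = windows.min(axis=(-2, -1))
    t = 1.0 - dark                          # transmission estimate
    # Scene recovery: J = (I - A) / max(t, t0) + A
    t_clip = np.maximum(t, t0)[..., None]
    J = (img - A) / t_clip + A
    return t, np.clip(J, 0.0, 1.0)
```

The `max(t, t0)` clamp is what the threshold t_0 in the formula above is for: it prevents division by a near-zero transmission from amplifying noise in heavily scattered regions.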
Its enhancement is generally better than that of the other competing algorithms; nevertheless, its clarity and contrast need to be improved further. By contrast, the color deviation can be effectively diminished by the proposed GAN, the clarity can be obviously improved by the proposed all-in-one model-based CNN, and the contrast and details can be enhanced by the proposed adaptive contrast enhancement method.
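The RGHS stretching function given above is a simple per-channel linear mapping; a minimal sketch follows, in which the output range [O_min, O_max] is an illustrative choice rather than the adaptive limits used by RGHS itself.

```python
import numpy as np

def linear_stretch(channel, o_min=0.0, o_max=1.0):
    """p_o = (p_i - I_min) * (O_max - O_min) / (I_max - I_min) + O_min."""
    i_min, i_max = channel.min(), channel.max()
    if i_max == i_min:                      # flat channel: nothing to stretch
        return np.full_like(channel, o_min)
    return (channel - i_min) * (o_max - o_min) / (i_max - i_min) + o_min
```

Applied per channel, this maps the occupied intensity range onto the full output range, which is where the contrast gain of histogram stretching comes from.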

Subjective Vision
Although the proposed fusion algorithm outperforms the other algorithms in enhancing the quality of underwater images, it should be noted that its real-time performance needs to be improved. Table 1 compares the time spent processing 100 images by the different algorithms. The running environment is PyCharm on Windows 10, with 16 GB RAM, an Intel i5-4590 CPU @ 3.30 GHz, and an NVIDIA GeForce GTX 2070 (8 GB). As can be seen, the proposed algorithm spends the most time compared with the other algorithms when dealing with the same image samples under the same computer environment.

Table 1. Time spent processing 100 images by the different algorithms.

Algorithm      Time
RCP [18]       77.03
UDCP [15]      84.57
ICM [24]       118.83
RGHS [66]      155.89
R-MLP [67]     123.82
Proposed       167.56

Ablation Experiments
To verify the effectiveness of each module in the proposed image enhancement algorithm, ablation experiments are conducted. Five images out of Figure 8 are taken for verification. Figure 9 presents the visual effects of removing each module. The images in the first column, denoted by A3, represent the images processed by the GAN, the all-in-one model-based CNN, and adaptive contrast enhancement. The second column, A2, represents the results of the GAN and the all-in-one model-based CNN, i.e., with the contrast enhancement removed. The third column, A1, keeps only the GAN, with the all-in-one model-based CNN and the contrast enhancement removed. The last column shows the original images. As can be recognized, the visual effect improves as more modules are added. In detail, only the color deviation is diminished when only the GAN is used, while the clarity and contrast remain to be improved. By adding the all-in-one model-based CNN, the clarity is improved besides the color balance. With the incorporation of the contrast improvement, the overall visual effect is improved comprehensively in terms of color balance, clarity, contrast, and details.

Quantitative Evaluation
Due to the discrepancy of individual perception, the visual effect of a processed image might be perceived differently by different people. Therefore, quantitative evaluation metrics are necessary. In the study, four commonly used metrics, including root mean square (RMS) contrast, average gradient, underwater color image quality evaluation (UCIQE), and information entropy, are used to evaluate the quality of the processed underwater images. The RMS contrast reflects the degree of grayscale difference in an image. The average gradient indicates not only the contrast but also the clarity of an image. Comparatively, UCIQE is a more comprehensive metric, since the chroma, saturation, and luminance contrast of an image are all considered. The last metric, information entropy, is a general metric, since it measures the amount of information contained in an image. The four dimensionless metrics are defined as follows. In general, the higher the value of any metric, the better the quality of an image.
The metric of RMS contrast is defined as:

C_RMS = sqrt( (1/(MN)) Σ_{x=1}^{M} Σ_{y=1}^{N} (I(x, y) − Ī)² )

where M and N are the image width and height; I(x, y) is the pixel gray value at the point (x, y); and Ī is the mean gray value of the image.
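The RMS contrast above is simply the standard deviation of the gray values; a minimal sketch:

```python
import numpy as np

def rms_contrast(gray):
    """RMS contrast: root of the mean squared deviation from the mean gray value."""
    g = np.asarray(gray, dtype=float)
    return float(np.sqrt(np.mean((g - g.mean()) ** 2)))
```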
The metric of average gradient is defined as:

AG = (1/((M−1)(N−1))) Σ_{x=1}^{M−1} Σ_{y=1}^{N−1} sqrt( ((∂f/∂x)² + (∂f/∂y)²) / 2 )

where f = f(x, y) is the gray value.
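The average gradient can be computed by approximating the partial derivatives with finite differences; the sketch below uses forward differences, which is one common choice (the paper does not specify the discretization).

```python
import numpy as np

def average_gradient(gray):
    """Mean over pixels of sqrt((fx^2 + fy^2) / 2), with forward differences."""
    g = np.asarray(gray, dtype=float)
    fx = np.diff(g, axis=1)[:-1, :]   # horizontal differences, cropped to (M-1, N-1)
    fy = np.diff(g, axis=0)[:, :-1]   # vertical differences, cropped to (M-1, N-1)
    return float(np.mean(np.sqrt((fx ** 2 + fy ** 2) / 2.0)))
```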
The UCIQE metric is defined as:

UCIQE = c_1 σ_c + c_2 con_l + c_3 μ_s

where c_i are weighting constants; σ_c is the standard deviation of the chroma; con_l is the brightness contrast; and μ_s is the mean saturation.

The metric of information entropy is defined as:

H = − Σ_{i=0}^{N} p(i) log₂ p(i)

where i is the gray level with N as the maximum, and p(i) is the probability that a pixel value equals i.

By means of the above four metrics, Tables 2-5 present the evaluation results of the ten aforementioned images processed by MSRCR, RCP, UDCP, ICM, RGHS, R-MLP, and the proposed hybrid enhancement method. It is noted that the best metric is shown in bold to distinguish it from the others. Figures 10-13 show the evaluation metrics visually. Figure 10. Comparison of RMS contrast [15,18,24,65-67]. Figure 11. Comparison of average gradient [15,18,24,65-67]. As can be recognized from the comparison results, in terms of RMS contrast and average gradient, the proposed method clearly gains better enhancement than all the other algorithms. In terms of UCIQE, the proposed method generally outperforms the others, with three exceptions: image No.4 by R-MLP, image No.5 by ICM, and image No.6 by RCP. Similarly, in terms of information entropy, the proposed method generally outperforms the other algorithms, with four exceptions: image No.2 by RGHS, images No.5 and No.9 by R-MLP, and image No.10 by MSRCR. Nevertheless, it is noted that in these four exceptions the proposed method still performs well.
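The information entropy defined above can be computed directly from the gray-level histogram; a minimal sketch for 8-bit images:

```python
import numpy as np

def entropy(gray_u8):
    """Information entropy H = -sum p(i) log2 p(i) over the 256 gray levels."""
    hist = np.bincount(np.asarray(gray_u8).ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

An image using all 256 levels uniformly reaches the maximum of 8 bits, which is why higher entropy is read here as more information content.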
To further verify the effectiveness of the proposed enhancement method, the performance of edge detection is evaluated. In the field of image processing, edge detection is important for image classification, target recognition, and feature identification, since the results of detection reflect the quality of an image. Usually, a high-quality image contains more edge information than a low-quality image. In the study, the Canny edge detector is employed. This detector applies a multi-step algorithm to detect a wide range of edges in images and has been widely used in various computer vision systems during the past decades. Figure 14 presents the comparison results of three images by the different algorithms. As can be recognized, the proposed enhancement method gains more edge information in images than the other algorithms.
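The comparison above counts edge pixels in the detector output; as a dependency-free sketch of that idea, the following computes an edge density from the gradient magnitude. This is a deliberately simplified stand-in for the full Canny pipeline (no Gaussian smoothing, non-maximum suppression, or hysteresis), and the threshold is an illustrative choice.

```python
import numpy as np

def edge_density(gray, thresh=0.2):
    """Fraction of pixels whose gradient magnitude exceeds a threshold --
    a crude proxy for counting edge pixels in a Canny edge map."""
    g = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(g)            # central differences along rows, columns
    mag = np.hypot(gx, gy)
    return float((mag > thresh).mean())
```

Under this proxy, an image with more recoverable structure yields a higher edge density, matching the qualitative reading of Figure 14.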

Conclusions
In the study, a CNN-based underwater image enhancement method is proposed. The enhancement strategy involves three modules. First, due to the color deviation commonly existing in underwater images, a CNN-based generative adversarial network (GAN) is constructed to achieve the color balance of underwater images. From the ablation experiments, it is confirmed that the GAN can effectively diminish the color deviation; however, the processed images are still blurred and the contrast is low. To improve the clarity of the images, the conventional imaging model is modified into an all-in-one model, in which a comprehensive index is introduced and estimated by a deep CNN. This introduction reduces the accumulated errors resulting from the individual estimation of the transmittance and the background light. The third measure to enhance underwater images is to improve the overall contrast, and to increase the local details as well, by using a fusion algorithm. To demonstrate the effectiveness of the proposed method, six commonly used algorithms are compared. The comparison is conducted through subjective visual effects and quantitative evaluation. In the quantitative evaluation, several classical evaluation metrics are employed and edge detection is performed. From the comparison results, it can be seen that the proposed method gains better results than the other algorithms. The proposed method can effectively enhance underwater image quality since it addresses the problems of underwater image color degradation, image blur, and low contrast. The enhanced results are more in line with the visual perception of human eyes and are conducive to recognition by both human eyes and machines.
Due to the complexity and uncertainty of underwater environments, the enhancement of underwater images is challenging and more work remains to be conducted. It is noted that noise was not considered in the study; in future work, efforts will be devoted to noise removal. Moreover, the method proposed in this paper is a combination of three algorithms, which implies that it is time-consuming; it is therefore not suitable for occasions where real-time performance is required, and the improvement of its real-time performance will be studied in the next work. Additionally, it should be noted that the quality measures used in the study do not measure color correctness and are consequently skewed towards rewarding the over-sharpening of images. This issue will also be studied in future work.