RGB-Based Triple-Dual-Path Recurrent Network for Underwater Image Dehazing

Abstract: In this paper, we present a powerful underwater image dehazing technique that exploits two image characteristics: RGB color channels and image features. In using RGB color channels, each color channel is decomposed into two units based on pixel similarities via k-means. This markedly improves the adaptability and identification of similar pixels and thus removes pixels with a weak correlation, leaving only pixels with a higher correlation. We use an infinite impulse response (IIR) model in the triple-dual and parallel interaction structure to suppress hazed pixels via pixel comparison and amplification, increasing the visibility of even very minor features. This improves the visual perception of the final image, thus improving the overall usefulness and quality of the image. A softmax-weighted fusion is finally used to fuse the output color-channel features into the final image. This preserves the color, leaving our proposed method's output very true to the original scene. This is accomplished by taking advantage of adaptive learning based on the confidence levels of the pixel-contribution variation in each color channel during successive fusions. The proposed technique outperforms the existing methods both visually and objectively in several rigorous tests.


Introduction
Underwater images have tremendous usage in marine engineering. However, poor underwater image quality, caused by wavelength-dependent light absorption and scattering [1], often hinders their use. Underwater image dehazing combats this by processing underwater images to improve their quality, thereby increasing their applicability in the marine environment. The processing focuses on reducing the effects of wavelength-dependent light absorption and scattering. According to Alenezi et al. [1,2], an underwater dehazing model can be defined as

Γ c (x) = Λ c (x)τ c (x) ∗ η c (x) + Θ c (1 − τ c (x)), (1)

where Γ c (x) denotes the intensity of the c ∈ {R,G,B} color channel at pixel x in an input underwater image, Λ c (x) denotes the scene radiance image, Θ c denotes the ambient light and ∗ denotes convolution. Λ c (x)τ c (x) is the direct transmission, representing the scene radiance attenuated by the transmission, and η c (x) denotes the point spread function at pixel x. Similar to in-air dehazing models [3], underwater dehazing models aim to reduce the effect of haze in images. However, unlike the atmospheric model, the underwater model presented in (1) treats the scene radiance Λ c (x) as a function of the point spread to take into consideration the effects of wavelength-dependent light absorption. This makes underwater image dehazing a complex problem that requires the continual exploration of effective techniques to improve the quality and usability of underwater images. Recent years have seen underwater image dehazing attract attention, leading to many proposed techniques. Traditional methods, such as image restoration and enhancement, estimate the dehazing model's parameters to reduce the effect of haze. The proposed network comprises a feature extraction block, a TDPRN block with a parallel interaction function, an image reconstruction block and the softmax function for image fusion. The network is modified from the existing network of [21].
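As an illustration of the forward model in (1), the following numpy sketch synthesizes one hazed color channel from a radiance map, a transmission map, the ambient light and a point spread function. The naive convolution loop and the function name are illustrative choices, not the paper's implementation:

```python
import numpy as np

def haze_channel(radiance, transmission, ambient, psf):
    """Synthesize one hazed channel following (1): the direct transmission
    (radiance * transmission) blurred by the point spread function, plus
    the backscatter term ambient * (1 - transmission)."""
    direct = radiance * transmission
    # naive 'same' 2D convolution with a small, odd-sized PSF
    kh, kw = psf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(direct, ((ph, ph), (pw, pw)), mode="edge")
    blurred = np.zeros_like(direct)
    for i in range(direct.shape[0]):
        for j in range(direct.shape[1]):
            blurred[i, j] = np.sum(padded[i:i + kh, j:j + kw] * psf)
    return blurred + ambient * (1.0 - transmission)
```

With a delta PSF (a single 1.0), the blur is a no-op and the output reduces to the familiar in-air hazing model, which makes the role of η c (x) easy to isolate.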
Given the hazy underwater image, the network decomposes the image into the RGB channels, and then the TDPRN uses the feature extraction block and the transmission-map estimation block to extract features from the color channels of the hazed underwater images. These features are then fed into the dual-path block via three parallel branches to restore the image features and improve the color of the dehazed images. Unlike the structure of [21], the proposed structure has three units of convolutional long short-term memory (ConvLSTM) in each branch and a convolution layer based on the corresponding color channel's pixels. The ConvLSTM can learn and store pixel-correlation information from the input image and compare it with the output. The communication between the interacting layers enables a comparison of the correlation patterns, thus enhancing the extraction of features in the output images. This communication and comparison help approximate the infinite impulse response (IIR) model, as previously proposed in [21,22]. A parallel interaction function is also proposed to fuse the intermediary features between the branches. Thus, the basic features and information of the dehazed image are recovered alternately. The corresponding features of each color channel are then processed stepwise to obtain the ultimate dehazed image via a series of softmax-weighted fusions, detailed by Zhao et al. [23].
The proposed technique, presented in Figure 1, can produce an output image with improved visual perception. Figure 1 shows a summary of the visual perception improvement of the proposed method compared to the input images. The top row contains the raw (hazed) images. The bottom (second) row shows the corresponding output of the proposed method. The summary presented in Figure 1 indicates that the proposed technique can learn and reduce the effects of haze in the output images.

Contribution
This paper makes the following significant contributions:

1.
The input image is decomposed according to the RGB color channels and the features, with each color channel decomposed into two units based on pixel similarities via k-means, which is described in detail in [24,25]. This improves the adaptability and identification of similar pixels and, by extension, removes pixels with a weak correlation, leaving only pixels with a higher correlation.

2.
The structure's triple-dual and parallel interaction allows a comprehensive comparison; hence, even minor features, i.e., pixels with the weakest correlations, are considered. This improves the visual perception of the final image.

3.
The use of softmax-weighted fusion in the arrangement of the proposed structure also preserves the color, which explains why the proposed result's color is very similar to the input color. This is achieved via adaptive learning based on the confidence levels of the pixel-contribution variation in each color channel during the successive fusions.

The proposed TDPRN consists of feature extraction and transmission-map blocks. The triple-dual-path block has a series of parallel interaction functions and a softmax-weighted fusion block. The first and second convolution layers increase the width to 16, reduce the resolution of the feature maps in each color channel and increase the width to 32. After each convolution layer, a Leaky ReLU with a slope of −0.01 (based on the experimental findings) is added to the feature extraction block. The transmission-map block uses the RGB color channels of the underwater hazy image to estimate the transmission maps. The respective color-channel image features and their corresponding transmission maps are fed into the dual-path blocks. The blocks contain parallel branches for the restoration and dynamic fusion of the basic content of the intermediate image details. The reconstruction block consists of a 9 × 9 convolution layer, a bi-linear up-sampling layer and a 3 × 3 convolution layer. The bi-linear up-sampling layer up-samples the color-channel image features to twice the input size. The width of the color channels' feature maps is reduced by the 3 × 3 convolution layer. The reconstructed dehazed underwater color channels are then fused via softmax-weighted fusion to attain the final image.
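The per-channel two-unit decomposition of contribution 1 can be sketched with a minimal intensity-based k-means (k = 2). The extreme-intensity initialization and the iteration cap here are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def split_channel_kmeans(channel, iters=20):
    """Partition one color channel's pixels into two units by intensity
    similarity via k-means with k = 2. Returns a boolean mask that is
    True for the higher-intensity unit."""
    pixels = channel.ravel().astype(float)
    # initialize the two centroids at the extreme intensities
    c_lo, c_hi = pixels.min(), pixels.max()
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        assign = np.abs(pixels - c_hi) < np.abs(pixels - c_lo)
        if assign.all() or (~assign).all():
            break  # degenerate split (e.g. constant channel)
        c_hi, c_lo = pixels[assign].mean(), pixels[~assign].mean()
    return assign.reshape(channel.shape)
```

Running this independently on the R, G and B planes yields the six units (two per channel) that the subsequent branches operate on.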

Triple-Dual-Path Block
Equation (1) suggests that the underwater dehazed image can be recovered from (2), where Ω c and δ c are the empirical coefficients of the c color channel related to the hazed image scene, such that |Ω c | < |δ c |. F −1 c denotes the inverse Fourier transform taken over the radial frequency. The term Θ c (1 − τ c (x)) in (1) is the backward scattering term, with Θ c being the background light of the c color channel.
where d c (x) is the underwater scene depth at pixel x, c ∈ {i, j}. φ c i are color channel-based linear coefficients derived from the pixel-difference plots between the highest and lowest pixel values, and ε c (x) is the mean intensity function giving the absolute difference between the pixels in the color channels. As one of the improvements of the proposed method over the existing ones [2,26–30], the proposed scene depth's use of pixel intensities in the color channels strengthens scene artifacts. We re-write (5) as (6). Equation (6) suggests that the scene depth value increases with an increase in the pixel difference between the maximum and minimum values. The decomposition of (6) into three different color channels enhances the accuracy of the representation of the original image, because each pixel is a sample of the original image. This further enables an accurate estimation of the scene depth, and the global background light helps improve the accuracy of underwater image dehazing. It also allows the network to concentrate on features per color channel, which helps control and identify features more accurately.
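The depth relation in (6) can be sketched as follows. The coefficients phi0 and phi1 are hypothetical stand-ins for the fitted linear coefficients φ c i, whose values are not given in this excerpt:

```python
import numpy as np

def scene_depth(image, phi0=0.5, phi1=0.5):
    """Illustrative scene-depth estimate in the spirit of (6): depth grows
    linearly with the per-pixel gap between the maximum and minimum
    channel intensities. image is an H x W x 3 array in [0, 1]."""
    diff = image.max(axis=2) - image.min(axis=2)  # per-pixel max-min gap
    return phi0 + phi1 * diff
```

A pixel with a large inter-channel gap (strong wavelength-dependent attenuation) thus receives a larger depth value than a gray pixel, matching the monotonic behavior the text describes.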
Equation (3) suggests that the dehazed image has two components: the basic content details τ c (x) and the image details θ c Ψ c . In in-air dehazing models, η c = 0; in underwater imaging, however, η c is given by (4). Therefore, the proposed dehazing model is assumed to be composed of two functions, as in (7). Equation (7) indicates that underwater image dehazing comprises two parts, the basic content details and the image details, divided by the point spread function η c . The motivation for this approach is that the hazing effect varies throughout the image; thus, assuming fixed parameter values does not give accurate results. Therefore, in order to dehaze images accurately, the treatment of the image pixels should be homogeneous but non-static. Using Λ c 1 and Λ c 2 alone to approximate the clear image may render the estimation of the transmission map and global light useless. In this paper, we consider d c (x) as given by (6) to tighten the approximation of the transmission map. This is indicated in Figure 2 and helps increase the color concentration of the output image compared to the input.
The estimates of Ξ 1 (Γ c , τ c ) and Ξ 2 (Θ c , Ψ c ) are fundamental to meeting the objective of the paper. We employ the infinite impulse response (IIR) model due to its versatility, ease of computation and low cost. Its use in this paper was also guided by the need to amplify the pixels with strong correlation in order to suppress the hazed pixels. This also explains why the proposed method has a more extensive concentration of color channels than the existing methods, as presented in Figure 2. IIR models are often approximated as a cascade or summation of lower-order structures via recurrent neural networks [21,31]. Using the IIR model, Λ c 1 is estimated as in (8).

Figure 2. A summary of the effectiveness of including the transmission-map function to extract more color channels compared to existing techniques. From left to right: the raw underwater images and the results from Fus [32], WCID [33], Ts [34], LD [35] and the proposed method. The image and its R, G, B color-channel concentrations are shown from top to bottom.
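A first-order IIR recurrence, the kind of lower-order section such cascades are built from, takes only a few lines. The coefficients a and b are illustrative, not the network's learned values:

```python
def iir_first_order(x, a=0.6, b=0.4):
    """Minimal first-order IIR section: y[n] = a*y[n-1] + b*x[n].
    Cascades or sums of such sections are what the recurrent branches
    approximate; state y[n-1] plays the role of the ConvLSTM memory."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + b * sample
        y.append(prev)
    return y
```

Because each output depends on the entire input history through the feedback term, correlated pixel sequences are amplified over steps while isolated (weakly correlated) spikes decay geometrically.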
We use a similar approach to estimate Λ c 2 . With this, we propose a dual-path block for underwater image dehazing based on the IIR models summarized by (8) and (9). Figure 3 illustrates that the recurrent neural network used to approximate the IIR in the proposed technique consists of five units with three branches, each branch representing a different color channel. Each unit contains a ConvLSTM and a pixel-wise convolution layer. The ConvLSTM decides which information to store and which to omit from the network at every step; each branch gradually reduces the effect of haze in the image, as indicated in Figures 4 and 5. Furthermore, the LSTM's proven capability of handling long-range dependencies makes it suitable for establishing correlations between local and global pixel neighborhoods. Thus, image features are extracted and preserved throughout the process. The output features from each branch are fused via softmax-weighted fusion to obtain the final output image (see Figures 3 and 4). The details of the softmax-weighted fusion stack are summarized in [23]. The stack is chosen for its ability to adaptively learn the variation of pixels in the corresponding {R, G, B} color-channel output images and to fuse the images based on their contributions to the final image. The branches alternately exchange the image content and features to produce fine-tuned details in the final images. Thus, every arrow in the block performs a unique function: the blue arrow estimates the global atmospheric light, the red arrow estimates the transmission maps and the yellow arrow transfers features to the next unit. The process is repeated until the last stage. The arrows enable the solution of complex processes that would otherwise require complex algorithms. The network is reactive, thus emphasizing the visual aspect of the images in terms of features.
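The softmax-weighted fusion step can be illustrated with a minimal numpy sketch. The per-pixel confidence maps and the function below are simplifications of the fusion stack described in [23], not its exact implementation:

```python
import numpy as np

def softmax_fuse(features, confidences):
    """Fuse per-branch outputs with softmax weights derived from per-pixel
    confidence maps. features and confidences are lists of H x W arrays,
    one per {R, G, B} branch."""
    conf = np.stack(confidences)                   # (branches, H, W)
    conf = conf - conf.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(conf)
    w /= w.sum(axis=0, keepdims=True)              # softmax over branches
    return (np.stack(features) * w).sum(axis=0)
```

With equal confidences the fusion degenerates to a plain average; as one branch's confidence dominates at a pixel, the fused value converges to that branch's output, which is the adaptive per-pixel behavior the text describes.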
Therefore, it does not predict the final appearance of the images, such as the end color; hence, the images tend to retain the initial colors of the input image in the final output. This is the weakness of the proposed method compared to existing techniques. In addition, the arrows ensure the continuity of the image features, thus restoring, preserving and emphasizing the strong pixel correlations. The dual interaction enables the control and identification of features and content. This is the strength of the proposed technique compared to the existing methods. This ability makes the network focus on suppressing the haze effects while strengthening the image features.

Dataset
In order to test, analyze and compare the proposed algorithm, we performed many tests and simulations. The experiments were conducted on Zorin OS 16 (15 April 2021) using the TensorFlow deep learning framework. The computer used was a BIZON X5000 G2 with 16 GB of RAM.

Comparison Methods
In order to compare the visual and perceptual competitiveness of the proposed method, the proposed results are compared with those from common or recently developed techniques. These techniques are listed in Table 1.

Objective Evaluation of the Proposed Images' Visual Quality
The proposed technique was evaluated objectively because a subjective evaluation is time-consuming, though the latter is more accurate. In this paper, a mixed objective evaluation was employed. One evaluation relied on a reference approach, the mean Average Precision (mAP). Three non-reference approaches were also used: the Naturalness Image-Quality Evaluator (NIQE) [43], the Normalized Underwater Image-Quality Metric (UIQM norm ) [44] and the Underwater Color Image-Quality Evaluator (UCIQE) [45], whose values are presented in Tables 2-6. The later sections also present the underwater image sharpness measure (UISM) [44].

Subjective Assessment
The perceptual quality of the images is evaluated through the different categories of images shown in Figures 6-10.

Figure 6. Visual comparison of real underwater images sourced from [46]. From left to right: raw underwater images and the proposed results. The proposed results are compared with the results from Ancuti [11], dark channel prior (DCP) [4], histogram distribution prior (HP) [36] and Water-Net [26].

Figure 6 shows that the proposed method tends to retain the original color of the input image compared to the existing methods. Visual inspection shows that the results of Ancuti [11] are almost similar to the raw images. There is not much change in the final results, which indicates a failure of either the transmission-map estimation or the background-light estimation. The DCP [4] results exhibit traits similar to those of Ancuti [11] but are even closer to the raw images. The HP [36] results have exaggerated red colors. The Water-Net [26] results have a grayish color in the top image, while the bottom one is almost similar to the input image. The proposed results have evidently improved the color output in both images and are visually more appealing than those of the existing techniques.

Figure 7. Visual comparison of underwater scenes. From left to right: raw underwater images and results from Fus [32], WCID [33], Ts [34], LD [35] and the proposed method.

Figure 7 shows a subjective comparison of the proposed output with the existing techniques using different scenes. Fus [32] tends to produce an exaggerated red color in three out of five image samples, making its results less appealing. WCID [33] also has exaggerated red colors in four out of five samples, showing a trend of increasing unwanted artifacts in the final results. The Ts [34] results are more appealing than those of Fus and WCID but tend to darken the whale image (third from the top). The LD [35] results are better than the first three; however, the method tends to overexpose the surfaces (see the whale image).
Finally, the proposed technique's results have more exaggerated colors but are visually appealing. The coral reefs (first image from the top) and the whale (third image from the top) show the strength of the proposed color balancing. The second and fourth images suggest a weakness of the proposed method, namely the exaggeration of blue colors in overexposed regions.

Figure 8. Visual comparison of synthetic underwater images. From left to right: raw underwater images, the proposed results and the clean (ground-truth) images. The proposed and clean images are compared with results from Ancuti [11], Guo [36], Berman [37], Cosman [8], Zhuang [19] and gl [38].

Figure 8 shows a visual comparison of the performance of the existing and proposed techniques on synthetic underwater images. The synthetic images have ground-truth images (last column). Visual inspection indicates that the Guo [36] results were closest to the ground-truth images but failed on the first (top) image, which is darker than the ground truth. The Zhuang [19] and gl [38] results are almost similar, with slight differences in the top and bottom images. However, the proposed results contain more details than those of the existing techniques. This is a strength of the proposed method, because one of the main aims of image dehazing is to expose image details in addition to suppressing haze effects.

Figure 9. Visual comparison of natural underwater images sourced from [38]. From left to right: raw underwater images and the proposed results. The proposed images are compared with the results from Ancuti [11], Guo [36], Berman [37], Cosman [8], Zhuang [19] and gl [38].

Figure 9 shows the visual effect of the proposed technique on synthetic images without ground-truth images.
While the Ancuti [11], Guo [36] and Berman [37] results exhibit reddish regions in the final images, the Cosman [8] results appear overexposed. The Zhuang [19] results have exaggerated colors. The gl [38] results are so gray that the algae in the final image, known to be greenish, are also gray; this means the technique of [38] is not versatile. As in the previous examples, the proposed method tends to retain the input image colors while enhancing the image details. The proposed results are more visually appealing than their counterparts.

Figure 10. Subjective comparison of underwater images from [42]. From left to right: raw underwater images and results from CBF [32], ULAP [39], UWCNN [20], MLFcGAN [40], FUnIEGAN [41], waterNet [26], UICoE-Net [42] and the proposed method.

Figure 10 shows the subjective evaluation of the proposed technique compared to others for images with rich colors. The aim here is to show that the proposed method can detect color variation. In comparison with the existing methods (CBF [32], ULAP [39], UWCNN [20], MLFcGAN [40], FUnIEGAN [41], waterNet [26] and UICoE-Net [42]), the proposed method appears to outperform them on two occasions (second- and third-row images). For the first- and last-row images, the proposed results exhibit their weakness: an exaggeration of the green color channel. This trait might be due to the network's failure to estimate the global background light accurately; this is plausible because the network estimated the global ambient light while the transmission map was partly predetermined. Tables 2-6 present the objective evaluation of the overall quality indicators for the images presented in Figures 6-10. Besides the per-image indicators, we report average statistics, since the full values cannot all be presented in the tables. Table 2 indicates that the proposed technique has better results in UIQM norm and UCIQE.
Table 3 indicates that the proposed technique has better UIQM norm values. Table 4 indicates that the proposed technique outperforms the existing techniques in all metrics. Tables 5 and 6 indicate that the proposed technique has better UIQM norm and UCIQE values. The consistently better UIQM norm and UCIQE values are due to the ability of the proposed technique to restore rich image colors. While this could be a weakness, because dehazing methods should restore the natural colors, it is also an advantage, because the resulting output images tend to be more appealing than those of the existing methods. Figure 11 shows the box plot of the mean Average Precision of the proposed technique compared to the existing methods. The box plot indicates that the proposed technique has higher values and a higher mean (red line) than its counterparts. The mAP values of the proposed method are also more concentrated, as its box-plot body is shorter and higher than the other box plots. The subjective and objective evaluations indicate that the proposed algorithm has the best effect on the overall underwater quality in various scenes compared to the existing techniques.

Table 2. Average NIQE, UIQM norm and UCIQE comparison of the different techniques whose results are partially presented in Figure 6. The best result is in bold.
Table 3. Average NIQE, UIQM norm and UCIQE comparison of the different techniques whose results are partially presented in Figure 7. The best result is in bold.
Table 4. Average mAP, NIQE, UIQM norm and UCIQE comparison of the different techniques whose results are partially presented in Figure 8. The best result is in bold.
Table 5. Average NIQE, UIQM norm and UCIQE comparison of the different techniques whose results are partially presented in Figure 9. The best result is in bold.
Table 6. Average NIQE, UIQM norm and UCIQE comparison of the different techniques whose results are partially presented in Figure 10. The best result is in bold.
The proposed technique also focused on increasing the sharpness of the final image. Figure 12 presents a graphical summary of the underwater image sharpness measure (UISM). The figure indicates that the proposed technique outperforms the existing techniques on average. The effectiveness of the proposed network in improving the pixel correlations is presented in Figure 13. This is one of the main aims of the proposed technique: removing pixels with a weak correlation and leaving only pixels with a higher correlation. The IIR attains this by amplifying the pixels with a strong correlation, thereby suppressing the hazed pixels. This leads to a smoother pixel correlation compared to the original image.
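A rough sharpness proxy in the spirit of UISM [44] is the mean Sobel gradient magnitude of a channel. The full UISM additionally applies per-channel edge maps and a block-wise enhancement-measure (EME) computation, both omitted here for brevity:

```python
import numpy as np

def sharpness_proxy(channel):
    """Simplified sharpness score: mean Sobel gradient magnitude of one
    H x W channel. Higher values mean sharper (more edge energy)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # Sobel vertical kernel is the transpose of the horizontal
    h, w = channel.shape
    pad = np.pad(channel.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return float(np.hypot(gx, gy).mean())
```

A dehazed output that recovers edge detail should score higher under such a measure than its hazy input, which is the behavior Figure 12 reports for the proposed method.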

Conclusions
We presented an underwater image dehazing technique based on two image characteristics: RGB color channels and image features. Using RGB color channels markedly improved the adaptability and identification of similar pixels and effectively removed pixels with a weak correlation, leaving only pixels with a high correlation. The IIR in the triple-dual and parallel interaction structure suppressed hazed pixels and made even minute features, such as pixels with weak correlations, visible. This improved the visual perception of the final image and thus also the overall usefulness and quality of the image. The softmax-weighted fusion used to attain the final image helped preserve the original scene's color. This was accomplished thanks to adaptive learning based on the confidence levels of the pixel-contribution variation in each color channel during the successive fusions. The proposed technique was compared with existing state-of-the-art algorithms, both visually and objectively, using various metrics: NIQE, mAP, UIQM norm , UCIQE and UISM. The results indicated that the proposed technique outperforms the existing methods. The one significant weakness of the proposed technique is that it predominantly exaggerates green colors in some environments. Future studies may consider an external control mechanism, such as using ground-truth images, so that the color of the final image can be restored; this would also help address this weakness of the network.

Conflicts of Interest:
The authors declare no conflict of interest.