A Lightweight Multi-Branch Context Network for Unsupervised Underwater Image Restoration

Abstract: Underwater images commonly experience degradation caused by light absorption and scattering in water. Developing lightweight and efficient neural networks to restore degraded images is challenging because of the difficulty in obtaining high-quality paired images and the delicate trade-off between model performance and computational demands. To provide a lightweight and efficient solution for restoring images in terms of color, structure, and texture details, enabling the underwater image restoration task to be applied in real-world scenes, we propose an unsupervised lightweight multi-branch context network. Specifically, we design two lightweight multi-branch context subnetworks that enable multiple receptive field feature extraction and long-range dependency modeling to estimate scene radiance and transmission maps. Gaussian blur is adopted to approximate the global background light on the twice-downsampled degraded image. We design a comprehensive loss function that incorporates multiple components, including self-supervised consistency loss and reconstruction loss, to train the network using degraded images in an unsupervised learning manner. Experiments on several underwater image datasets demonstrate that our approach achieves good performance with very few model parameters (0.12 M), and is even comparable to state-of-the-art methods (up to 149 M) in color correction and contrast restoration.


Introduction
Underwater images serve as invaluable repositories of oceanic information, with their quality playing a pivotal role in advancing marine resource exploration. However, due to the presence of suspended particles and light absorption and scattering in water, underwater images often suffer from quality degradation, such as color distortion, low contrast, poor clarity, and blurred details. Undoubtedly, these challenges pose significant obstacles in various applications such as underwater archaeology [1,2], underwater biological research, underwater object detection [3-5], and underwater autonomous vehicle navigation. Therefore, our goal is to investigate a more efficient method to recover the color, structure, and texture details of degraded underwater images in order to improve the image quality and visual effects. Underwater image restoration obtains a clean image by estimating unknown parameters (medium transmission maps, global background light, etc.) in the degradation image formation model [6], which is a highly ill-posed problem. Accurately estimating these unknown parameters is difficult because of the diverse underwater environments and their unique optical properties.
Existing underwater image restoration methods mainly include traditional restoration methods [6-18] and learning-based restoration approaches [19-32]. Most traditional restoration methods rely on underwater imaging models, utilizing prior knowledge and optical characteristics to estimate unknown parameters and then reconstruct the real underwater scene. The performance of these methods is heavily dependent on the reliability of prior knowledge, such as the bright channel prior [11], red channel prior [12,13], maximum intensity prior [14], blur prior [15], and light attenuation prior [17]. However, because each prior holds only under specific conditions, the estimated model parameters may be significantly in error when the prior does not match the target scene, resulting in inaccurate restoration results [16]. Learning-based restoration methods explore the relationship between underwater images and unknown model parameters in a synthetic or real-world data-driven manner to obtain more accurate and robust estimates, resulting in more realistic and precise restoration images. Deep learning-based restoration approaches are broadly partitioned into two types: those based on Convolutional Neural Networks (CNN), such as [21-23,25,27,29-31], and those based on Generative Adversarial Networks (GAN), such as [19,20,24,26]. In addition, there are a few studies based on contrastive learning, such as [28], and a few based on vision transformers, such as [32]. Learning-based methods improve the accuracy of estimating unknown parameters in imaging models to a certain extent. Moreover, these methods do not use prior knowledge to estimate unknown parameters, which avoids the problem of mismatch between prior knowledge and the target scene and makes them more generally adaptable than traditional restoration methods.
However, existing learning-based methods still have some limitations. On one hand, many of these methods rely on large-scale synthetic underwater images as either the source or target domain for training datasets. This reliance stems from the scarcity of real-world paired data. However, synthetic underwater images often fail to fully capture the intricacies of real-world underwater environments, leading to suboptimal accuracy and generalization performance in restoring degraded images. On the other hand, restoration approaches based on deep learning mostly design complex and parameter-rich networks to enhance learning capabilities and obtain more accurate restoration results. This makes the training process time-consuming and demands significant computational resources.
To restore degraded underwater images more efficiently while eliminating the need for paired training data, we propose a lightweight unsupervised underwater image restoration method. We decompose the underwater image into a transmission map, global background light, and scene radiance (clean image), and design a lightweight multi-branch context network consisting of three submodules to estimate them separately. Simultaneously, based on the principle of underwater image formation, we design a comprehensive loss function, including self-supervised consistency loss and reconstruction loss, to guide unsupervised network training. On one hand, the lightweight network structure ensures that the restoration process can be performed efficiently, even on resource-constrained devices. On the other hand, the comprehensive loss function ensures that the network learns meaningful representations of the underwater scene, which helps guide the training process and improves the quality of the restoration. The lightweight architecture makes our method simpler and more practical than existing methods with complex models and many learnable parameters. Additionally, because it is trained without supervision, our method does not require ground truth labels, making it more versatile than existing supervised methods.
The primary contributions of this paper include:

1. We propose a lightweight and effective unsupervised network for real-world underwater image restoration. This network decomposes the degraded underwater image into three components: the transmission map, global background light, and scene radiance. Specifically, two distinct lightweight subnetworks are utilized to predict the transmission map and scene radiance, and Gaussian blur is employed in the BL module to compute the global background light.

2. We design a multi-branch context attention module and construct different lightweight global context blocks combined with residual learning for modeling long-range dependency rapidly and efficiently in the transmission map estimation subnetwork and scene radiance estimation subnetwork.

3. Experimental results on multiple datasets demonstrate that our method is comparable to state-of-the-art methods in terms of color restoration and contrast improvement, despite being an unsupervised restoration method with very few parameters.

Related Works
Most underwater image restoration methods typically derive undegraded clean images by constructing a model for the degradation process of an underwater image. This involves estimating crucial unknown parameters, such as the medium transmission map and global background light, and subsequently inverting the model. The accuracy of parameter estimation and the correctness of model construction are pivotal to achieving satisfactory restoration results. Inaccurate parameter estimation can often result in distorted restoration outcomes.

Traditional Restoration Methods
Traditional restoration methods primarily utilize optical properties and various priors to extract features from degraded underwater images and then estimate unknown parameters to achieve underwater image restoration. Dark Channel Prior (DCP) [7] is an image prior widely used in outdoor image dehazing, which estimates the medium transmission map by assuming that there is at least one channel with very low pixel values in any region of the image, thereby achieving image dehazing. Since the simplified underwater image formation model is similar to the outdoor hazing model, numerous underwater image restoration methods proposed by researchers are based on DCP or its variants. For example, Drews et al. [8] introduced the Underwater Dark Channel Prior, which utilizes reliable prior information to estimate the scene depth map and, subsequently, the medium transmission map for underwater image restoration. Liang et al. [10] proposed a method based on the Generalized Underwater Dark Channel Prior, considering the rapid attenuation of all color channels in different water types. It uses prior information from all channels to estimate the medium transmission map under different water types and utilizes image brightness, image blurriness, and wavelength-dependent attenuation as prior knowledge to estimate the global background light, thereby restoring underwater images. In addition, they proposed a white balance method to further enhance the visual effects of the images. Furthermore, some traditional restoration methods infer the transmission map from depth information obtained with different priors. For example, Song et al. [17] introduced the Underwater Light Attenuation Prior (ULAP), which calculates the depth map by exploiting the correlation between scene depth and the difference between the maximum intensity of the blue-green channels and the red channel intensity in underwater images, and then estimates the medium transmission map and global background light, thereby restoring real underwater scenes.
In summary, traditional underwater image restoration methods can accurately estimate transmission maps and global background light by utilizing prior knowledge and optical properties in some cases, enabling the generation of clear underwater images. However, because it is challenging to accurately estimate multiple underwater imaging parameters simultaneously and the assumed underwater optical imaging model does not hold in all scenarios, traditional restoration methods usually produce unstable results, exhibiting limited robustness and flexibility and incurring heavy time overhead for parameter solving.

Learning-Based Restoration Approaches
Learning-based approaches mainly learn to map degraded images to their corresponding reference images by training neural networks. Therefore, they may need large amounts of paired training data. Researchers have developed many approaches for underwater image restoration based on CNN and GAN, and have achieved notable visual enhancement effects. For instance, Fabbri et al. [19] collected numerous real underwater images and categorized them into two groups: high-quality clear images and severely degraded distorted images. They then used these two categories of images to train the CycleGAN [33] network to generate paired images for training underwater image restoration networks. In contrast, Li et al. [23] substituted indoor scene RGB-D images into a simplified underwater imaging model and added randomly generated degradation parameters as prior knowledge based on attenuation coefficients corresponding to ten degradation levels of the underwater scene, in order to generate synthetic underwater-style image datasets of ten different types. They then designed a lightweight convolutional neural network and conducted supervised training with the different types of synthetic datasets for restoring various underwater images. Additionally, Li et al. [22] introduced a real-world underwater image dataset (UIEBD) and proposed a CNN-based network (Water-Net), achieving better results. Li et al. [25] introduced a deep network (Ucolor) based on medium transmission guidance and multi-color space embedding for underwater image enhancement. By effectively combining the benefits of learning-based and physics-based methods and using multiple color space embeddings, Ucolor restored the color and quality of underwater images effectively. Han et al. [28] built a real-world dataset (HICRD) comprising 9676 degraded images and 2000 reference images and introduced a restoration model called CWR based on image-to-image translation. This model maximizes mutual information between degraded and reference images through contrastive learning to learn image content and color feature correspondence. It executes a realistic underwater image restoration process through unsupervised learning.
Compared with traditional methods, learning-based approaches learn rich features from the degraded image and estimate unknown parameters through the neural network, relying on the effectiveness of the network structure and needing a substantial amount of underwater paired data to train the network. However, obtaining ground truth for real-world degraded images is very difficult, and synthetic underwater images cannot approximate real-world underwater images perfectly. As a result, networks trained with synthetic data have limited effectiveness for restoration in a broader range of underwater scenarios.

Proposed Method
We design a lightweight multi-branch context network based on unsupervised learning, and the overall structure is depicted in Figure 1. Within the designed network, we utilize various submodules to individually estimate the scene radiance, the transmission map, and the global background light, and then recombine these estimates through the imaging model to obtain the reconstructed degraded image. To design the loss function for training the entire network in an unsupervised manner, we exploit two constraints: the degraded image and the estimated clean image share the same scene radiance in the underwater imaging model, and the error between the reconstructed degraded image and the original degraded image should be minimized.

Underwater Image Formation Model
As shown in Figure 2, Jaffe et al. [34] proposed a model for underwater imaging under scattering media called the Jaffe-McGlamery model, which decomposes the light received by an underwater camera and abstracts the process of underwater image formation as:

$$E_t = E_d + E_f + E_b \tag{1}$$

where $E_t$ is the total captured light intensity, which is a linear superposition of three components: the direct component $E_d$, the forward scattering $E_f$, and the background scattering $E_b$. Since the light reflected from the surface of the target object will be attenuated and the degree of attenuation increases with the depth of the scene, $E_d$ can be further represented as:

$$E_d(d, \lambda) = E_o(\lambda)\, e^{-\beta(\lambda) d} \tag{2}$$

where $E_o(\lambda)$ denotes the unattenuated intensity of the reflected light from the target, $\beta(\lambda)$ denotes the attenuation coefficient, and $d$ denotes the distance between the underwater camera and the target object in the scene (i.e., scene depth). The forward scattering component represents the small-angle scattering loss incurred by the reflected light as it propagates through the water, which can be obtained by calculating the convolution of the direct component $E_d(d, \lambda)$ with the point spread function $g(d, \lambda)$:

$$E_f(d, \lambda) = E_d(d, \lambda) * g(d, \lambda) \tag{3}$$

The background scattering component represents the loss caused by the scattering of the background light as it passes over the suspended particles in the water and can be defined as:

$$E_b(d, \lambda) = E_\infty(\lambda) \left(1 - e^{-\beta(\lambda) d}\right) \tag{4}$$

where $E_\infty(\lambda)$ denotes the ambient light intensity at infinity. When the underwater camera is close to the object being photographed, the forward scattering component is negligible, and Schechner and Karpel [35] simplified the direct component $E_d$ and the background scattering $E_b$ to be defined as follows:

$$E_d = J(x)\, t(x), \qquad E_b = B \left(1 - t(x)\right) \tag{5}$$

where $J(x)$ denotes the scene radiance (clean image), which replaces the light intensity $E_o(\lambda)$ directly reflected from the target object, $t(x)$ denotes the transmission map, which replaces the residual energy ratio $e^{-\beta(\lambda) d}$ arriving at the underwater camera after scattering and attenuation of $E_o(\lambda)$, and $B$ denotes the background light, which replaces the ambient light intensity $E_\infty(\lambda)$. Thus, the underwater imaging model is simplified as:

$$I_c(x) = J_c(x)\, t_c(x) + B_c \left(1 - t_c(x)\right) \tag{6}$$

where $I_c(x)$ represents the degraded underwater image captured by the camera, $c \in \{R, G, B\}$ represents the red, green, and blue color channels, and $J_c(x)$ represents the undegraded underwater image. It can be seen that if the transmission map $t_c(x)$ and the background light $B_c$ can be estimated relatively accurately, the clean image $J_c(x)$ can be restored from the degraded underwater image $I_c(x)$.
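To make the simplified imaging model concrete, the following is a minimal per-pixel sketch in Python of composing a degraded pixel from scene radiance, transmission, and background light, and of recovering the scene radiance when the transmission and background light are known. The function names and the `t_min` clamp are illustrative assumptions, not part of the proposed method.

```python
def degrade(J, t, B):
    """Forward model for one RGB pixel: I_c = J_c * t_c + B_c * (1 - t_c)."""
    return tuple(j * tc + b * (1.0 - tc) for j, tc, b in zip(J, t, B))


def restore(I, t, B, t_min=0.05):
    """Solve the model for J: J_c = (I_c - B_c) / t_c + B_c.

    t_min is a hypothetical safeguard against dividing by a near-zero
    transmission value; it is not taken from the paper.
    """
    return tuple((i - b) / max(tc, t_min) + b for i, tc, b in zip(I, t, B))


# Toy pixel with values in [0, 1]; red is attenuated most strongly.
J = (0.8, 0.6, 0.4)  # scene radiance
t = (0.3, 0.7, 0.8)  # per-channel transmission
B = (0.1, 0.5, 0.6)  # global background light

I = degrade(J, t, B)
J_rec = restore(I, t, B)
```

With exact values of the transmission and background light, `J_rec` matches `J` up to floating-point error, which illustrates why the accuracy of these two estimates governs the quality of the restored image.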

Network Architecture
As shown in Figure 1, the designed network consists of three submodules: J-Net, TM-Net, and BL. J-Net is used to predict the scene radiance (clean image), TM-Net is used to estimate the transmission map, and BL is used to estimate the global background light.
For J-Net, the degraded raw image is taken as input, and a 3 × 3 convolutional layer is first used to learn detail features such as image color and texture from the input. Then three lightweight global context blocks (ResMCA BlockA), constructed by combining the Multi-Branch Context Attention module (MCA module) with residual learning, are utilized for multiple receptive field feature extraction and long-range dependency modeling. Finally, a normalized estimation of the scene radiance is output by a 1 × 1 convolutional layer with a Sigmoid activation function. The inner structure of the MCA module is depicted in Figure 3. Since scene radiance is a property closely related to the scene itself and typically requires more parameters for accurate estimation, ResMCA BlockA with more parameters is utilized in J-Net.
The structure of TM-Net is similar to that of J-Net. It also takes the degraded raw image as input, starting with a 3 × 3 convolutional layer to learn detail features, and then employs three lightweight global context blocks (ResMCA BlockB) to capture local features with different receptive fields and perform global context interaction. It finally outputs a normalized transmission map with three channels through a 1 × 1 convolutional layer with a Sigmoid activation function. This three-channel output accounts for the wavelength-selective nature of light attenuation, with corresponding transmission values for each color channel. To more effectively reduce the redundant parameters of the network, the lightweight global context block here is a residual block constructed from two 1 × 1 convolutional layers and the MCA module, while the lightweight global context block in J-Net is a residual block constructed from a 3 × 3 convolutional layer and the MCA module. As the transmission map describes the underwater environment's impact on light propagation, reflecting more of the optical and physical characteristics underwater, it can be approximated with fewer parameters. Therefore, ResMCA BlockB with fewer parameters is employed in TM-Net.
For the BL submodule, we employ Gaussian blur to remove image content and obtain an estimate of the global background light, since the global background light reflects global image attributes and is independent of image content. In our experiments, we found that applying Gaussian blur to the input image after downsampling it by a factor of 0.5 twice yields restored images with better-recovered color and structure (higher PSNR, SSIM, and UCIQE). Therefore, we first downsample the input by a factor of 0.5 twice and then apply Gaussian blur with a filter size of $2\lfloor (h + w)/4 \rfloor + 1$ to predict the global background light from the downsampled image, where $\lfloor (h + w)/4 \rfloor$ represents the largest integer less than or equal to $(h + w)/4$, and $h$ and $w$ are the height and width of the input, respectively. Since Gaussian blur enforces smoothness on the global background light, high-frequency details in the image will be forced onto the outputs of J-Net and TM-Net.
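The BL step described above can be sketched in plain Python for a single channel stored as a 2-D list. Several details here are assumptions rather than the authors' implementation: downsampling is done by 2 × 2 average pooling, borders are handled by edge replication, the Gaussian standard deviation defaults to one sixth of the filter size, and $h$ and $w$ are taken from the image being blurred (the text leaves open whether they refer to the original or the downsampled input).

```python
import math


def blur_kernel_size(h, w):
    """Filter size 2 * floor((h + w) / 4) + 1, which is always odd."""
    return 2 * math.floor((h + w) / 4) + 1


def downsample_half(img):
    """Halve both sides by 2x2 average pooling (assumed resampling method)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def gaussian_kernel(size, sigma):
    """Normalized 1-D Gaussian weights of the given odd length."""
    c = size // 2
    weights = [math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [v / total for v in weights]


def blur_1d(values, kernel):
    """Filter one row or column, replicating the edge samples."""
    r, n = len(kernel) // 2, len(values)
    return [sum(kernel[k] * values[min(max(i + k - r, 0), n - 1)]
                for k in range(len(kernel))) for i in range(n)]


def gaussian_blur(img, size, sigma):
    """Separable Gaussian blur: filter rows first, then columns."""
    kernel = gaussian_kernel(size, sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def estimate_background_light(channel, sigma=None):
    """Downsample by 0.5 twice, then blur with the size-dependent filter."""
    small = downsample_half(downsample_half(channel))
    size = blur_kernel_size(len(small), len(small[0]))
    if sigma is None:
        sigma = size / 6.0  # hypothetical default; the paper gives no sigma
    return gaussian_blur(small, size, sigma)


# A 16 x 16 channel becomes 4 x 4 after two halvings; the filter size is 5.
demo = [[(x + y) / 30.0 for x in range(16)] for y in range(16)]
background = estimate_background_light(demo)
```

Under this reading, a 128 × 128 training image is reduced to 32 × 32 and blurred with a filter of size 33, so the filter spans the whole downsampled image and the result varies only slowly across positions.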

Multi-Branch Context Attention Module
This module aims to capture information at various scales and sizes within the image and effectively model long-range dependency with fewer learnable parameters. First, in order to reduce memory access and obtain features with different receptive fields, we split shallow feature maps into four branches. Conventional convolution and depthwise convolution operations are utilized for extracting features from the first three branches, while the maps of the fourth branch, referred to as identity mapping, remain unchanged. As shown in Figure 3, the kernel size of the conventional convolution is 3 × 3. To speed up computation, we use depthwise convolutions with 1 × k and k × 1 kernels instead of k × k depthwise convolutions (k = 5 in this paper; please refer to the ablation study for details). Then, a 1 × 1 convolutional layer followed by the Softmax function is applied to the multi-branch feature maps to obtain attention weights, after which attention pooling is conducted to derive global context features. Subsequently, a two-layer bottleneck is employed to transform the obtained global context features. Finally, the obtained multi-branch context features are aggregated to all positions by addition, thus efficiently capturing long-range dependency.
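The attention pooling and aggregation steps at the end of the module can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' implementation: the feature map is flattened to a list of positions, `score_weights` plays the role of the 1 × 1 convolution that produces one attention logit per position, the multi-branch convolutions are omitted, and the two-layer bottleneck is replaced by an identity transform.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_context(features, score_weights):
    """Attention pooling followed by broadcast addition.

    features: list of N positions, each a list of C channel values.
    score_weights: C weights standing in for the 1x1 convolution that
    yields one attention logit per position (hypothetical parameter).
    Returns the updated features and the attention weights.
    """
    logits = [sum(w * v for w, v in zip(score_weights, f)) for f in features]
    attn = softmax(logits)  # attention weights over all N positions
    channels = len(features[0])
    # Attention pooling: a single C-dimensional context vector for the map.
    context = [sum(a * f[c] for a, f in zip(attn, features))
               for c in range(channels)]
    # The bottleneck transform is omitted here (identity) for brevity.
    # Aggregation: add the same context vector to every position.
    out = [[v + context[c] for c, v in enumerate(f)] for f in features]
    return out, attn


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = global_context(feats, [0.0, 0.0])
```

Because every position receives the same pooled vector, each output location depends on all input locations after a single pass, which is how this kind of block models long-range dependency at low cost.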

Loss Functions
To train our network from the degraded underwater images in an unsupervised learning manner, we design a comprehensive loss function that includes self-supervised consistency loss, reconstruction loss, color loss, and contrast enhancement loss. By combining these different loss components into a single comprehensive loss function, the network can effectively learn to restore underwater images by jointly optimizing multiple aspects such as image fidelity, color accuracy, and contrast enhancement. Each loss function is elaborated on in detail below.
Reconstruction Loss. The proposed method decomposes the degraded image into scene radiance (clean image), transmission map, and global background light, estimates them from the degraded underwater image through three different submodules in the designed network, and recombines the estimates through the imaging model to reconstruct the degraded image. By minimizing this loss, the network learns to reconstruct the degraded image faithfully, which is essential for generating high-quality restored images. The reconstruction loss $L_{RC}$ penalizes the error between $x$ and $I(x)$, where $x$ is the raw degraded image and $I(x)$ is the reconstructed degraded image obtained by substituting the estimated scene radiance, transmission map, and global background light into the imaging model.

Self-supervised Consistency Loss. This loss is employed to supervise the consistency of the shared scene radiance in both the raw image and the estimated clean image within the underwater imaging model. It ensures that the estimated clean image remains consistent within the underwater imaging model, regardless of whether it is used as input or generated internally by the network. In accordance with the simplified underwater imaging model, when the raw degraded image $I$ is the input, the obtained scene radiance component is the estimated clean image $\hat{J}$ and the transmission map component is $t$; when the estimated clean image $\hat{J}$ is the input, the scene radiance component should still be the estimated clean image $\hat{J}$, and the transmission map component becomes 1. Therefore, our goal is to minimize the loss $L_{SCon}$, which penalizes the difference between $J_2$ and $\mathrm{stopgrad}(J_1)$, where $J_1$ and $J_2$ refer to the output scene radiance estimates using the degraded underwater image and the estimated clean image as inputs, respectively. Note that a stop-gradient operation (stopgrad) is performed on $J_1$, which helps to stabilize the training process.

Color Loss. Based on the Gray-World color constancy assumption of natural image statistics [36], we adopt an unsupervised color loss to ensure that the colors in the output image are natural and visually appealing, enhancing the overall perceptual quality of the restoration. This loss is calculated from the per-channel average intensities, where $J$ is the estimated value of the scene radiance and $\mu(J_c)$ denotes the average intensity of the channel $c$.

Contrast Enhancement Loss. Similar to Zhu et al. [37], we also employ the contrast enhancement loss $L_{CE}$ to supervise the scene radiance estimation subnetwork, which helps enhance the contrast in the restored image, leading to improved visibility of details and textures. It is calculated from $V(J)$ and $S(J)$, where $V(J)$ is the luminance and $S(J)$ is the saturation of the estimated scene radiance $J$.
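The quantities these losses are built from can be sketched in Python as follows. The exact distance functions are assumptions made for illustration and should not be read as the authors' formulas: the reconstruction term is written as a mean absolute error, and the Gray-World term is written as the sum of squared deviations of the channel means from their common mean. The luminance and saturation statistics are computed with the standard-library HSV conversion.

```python
import colorsys


def reconstruction_error(x, recon):
    """Mean absolute error between two flattened images (L1 is assumed)."""
    return sum(abs(a - b) for a, b in zip(x, recon)) / len(x)


def gray_world_color_loss(pixels):
    """Gray-World statistic for a list of (r, g, b) pixels in [0, 1].

    Returns the sum of squared deviations of each channel mean mu(J_c)
    from the mean of the three channel means (assumed form of the loss).
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    return sum((m - gray) ** 2 for m in means)


def mean_value_and_saturation(pixels):
    """Average HSV value V(J) and saturation S(J) over all pixels."""
    hsv = [colorsys.rgb_to_hsv(*p) for p in pixels]
    n = len(hsv)
    mean_v = sum(v for _, _, v in hsv) / n
    mean_s = sum(s for _, s, _ in hsv) / n
    return mean_v, mean_s


bluish = [(0.1, 0.4, 0.7)]
neutral = [(0.5, 0.5, 0.5), (0.2, 0.2, 0.2)]
color_cast = gray_world_color_loss(bluish)
no_cast = gray_world_color_loss(neutral)
```

A bluish pixel set yields a positive Gray-World statistic while a neutral one yields zero, which is the behavior an unsupervised color term relies on to push the output away from a color cast.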

Experiments and Analysis
We conduct a quantitative evaluation as well as a qualitative comparison between our method and 11 other methods on several datasets. These 11 methods include 3 traditional methods (IBLA [15], UDCP [8], and ULAP [17]), 7 deep learning-based methods (UWCNN [23], Water-Net [22], UGAN [19], FUnIE-GAN [24], Ucolor [25], USUIR [29], and Peng et al. [32]), and 1 contrastive learning-based method (CWR [28]). This comparison helps to evaluate the effectiveness of the proposed method in addressing various challenges associated with underwater image restoration. We first present the experimental datasets, evaluation metrics, and implementation details, then present the quantitative results, visual comparisons, and results analysis. Finally, we conduct an ablation study and an edge detection application test.

Datasets and Implementation Details
We use five datasets in our experiments: UIEBD [22], LSUI [32], EUVP [24], RUIE [38], and SQUID [39]. UIEBD consists of 890 pairs of real-world underwater images and 60 challenging unpaired degraded images. The reference images in UIEBD are obtained by subjectively comparing the results of 12 approaches and selecting the best results. LSUI contains 4279 underwater images along with their corresponding reference images. EUVP includes paired and unpaired images captured under various conditions such as different locations, visibility levels, and natural variations. RUIE is divided into three subsets: the image quality subset, the color cast subset, and the high-level task-driven subset. None of these images has a corresponding reference image. The underwater image quality subset of RUIE covers various underwater conditions and different levels of image quality. SQUID includes a substantial number of images captured at different locations with varying water properties. These images display color charts in the scene. Together, these datasets cover a wide range of underwater conditions, including different locations, visibility levels, and natural variations. Each dataset has its unique characteristics, such as paired or unpaired images, the presence or absence of reference images, and diverse underwater scenes.
For training, we extract 700 raw underwater images from the UIEBD dataset and augment them by rotation and by horizontal and vertical flipping. All training images are resized to 128 × 128 pixels before being fed into the network because of memory footprint limitations. For testing, we use the remaining 190 underwater images from UIEBD as the first test set (T-U190). We randomly select 120 images from LSUI to create the second test set (T-L120). The 60 unpaired images in UIEBD are used as the third test set (T-U60). The fourth test set (T-E100) is created by randomly sampling 100 unpaired images from EUVP. We select 630 images from the underwater image quality subset of RUIE to form the fifth test set (T-R630). Sixteen demonstrative images from SQUID are used as the sixth test set (T-S16).
We train the proposed method on Ubuntu 18. During training, the learning rate is $10^{-4}$, the batch size is 1, and the network is trained for a maximum of 50 epochs. We optimize the designed network using the Adam optimizer. The network training time is less than 30 min.

Evaluation Metrics
Peak Signal-to-Noise Ratio (PSNR) [40] calculates the relationship between the peak signal strength and the Mean Square Error (MSE) of an image. It is used to assess the closeness to the reference image, where a higher PSNR (lower MSE) generally represents a closer match in image content. The Structural Similarity Index (SSIM) [41] evaluates the structural similarity to a reference image, with a higher SSIM score typically indicating greater similarity in image structure and texture.

Table 1 shows that the proposed approach achieves the highest SSIM on T-U190, while its MSE and PSNR results are inferior to those of Ucolor [25] and CWR [28]. However, it is worth noting that our model has a significantly lower parameter count of only 0.12 M, which is 0.08% of Ucolor [25] and 0.81% of CWR [28]. As shown in Table 2, we also achieve the highest SSIM values on T-L120, with MSE and PSNR results inferior to those of Peng et al. [32] and Ucolor [25], despite our model having only 0.38% of the parameters of Peng et al. [32]. The GFLOPs and GMac metrics in Table 1 show that, except for UWCNN [23] and Peng et al. [32], our method has the highest computational efficiency, and our method restores the structure and texture of degraded images more accurately (higher SSIM) than UWCNN [23] and Peng et al. [32]. Both tables underscore that our method maintains good underwater image quality restoration capabilities despite its minimal model parameters, and it has a clear advantage in restoring the structure and textures of underwater images.
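For reference, the relationship between MSE and PSNR used in the full-reference comparisons can be written as a short Python sketch. It assumes 8-bit images flattened into lists of pixel values, with a peak value of 255; the function names are illustrative.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr(ref, test, peak=255.0):
    """PSNR in decibels: 10 * log10(peak^2 / MSE)."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


reference = [100, 100, 100, 100]
restored = [110, 90, 110, 90]
score = psnr(reference, restored)
```

Since PSNR decreases monotonically as MSE increases, a method that ranks lower on one of the two metrics for a given image also ranks lower on the other.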
Table 3 shows that for the UIQM evaluation metric, our method secures the second-best result on T-U60 and the third-best on T-E100. Regarding the UCIQE evaluation metric, our method achieves the second-best result on T-U60 and is only slightly below the third-best USUIR [29] on T-E100. In Table 4, our approach attains the second-highest UCIQE values on T-R630 and T-S16. On the T-R630 test set, our method obtains a UIQM result inferior to CWR [28] and Water-Net [22], while on the T-S16 test set, it trails only the top-performing UGAN [19]. Both tables illustrate that our approach delivers competitive performance in terms of color restoration, contrast enhancement, and improved image clarity. It is capable of adapting to underwater image restoration across various environmental conditions.
As shown in Figures 5-9, ULAP [17] and FUnIE-GAN [24] somewhat correct the color deviation in the bluish image but exhibit severe color cast in restoration results for shallow water, yellowish, and greenish underwater images, with almost no improvement in brightness for low-illuminated images. Water-Net [22] enhances the brightness of low-illuminated images and alleviates color deviation in shallow water, bluish, and greenish images, but the restored images still appear dark overall. Ucolor [25] and CWR [28] improve the brightness and contrast in low-illuminated underwater images but do not exhibit a significant improvement in contrast and clarity for shallow water images, and some color casts remain in the restored underwater images. Peng et al. [32] effectively restores the color in greenish and bluish underwater images and enhances the contrast in shallow water images, though there is room for improvement in brightness enhancement for low-illuminated images and color restoration for yellowish underwater images. Our method demonstrates robust color restoration ability for shallow water, greenish, bluish, and yellowish underwater images captured at different depths in the sea. Additionally, our method substantially improves brightness and contrast in low-illuminated underwater images.

Analysis and Discussion
IBLA [15], UDCP [8], and ULAP [17] use handcrafted priors to estimate the parameters, making their predictions vulnerable to the dominance of prior information and thus prone to over/under-recovery artifacts. UWCNN [23], trained on a synthetic underwater dataset generated using underwater scene priors, may face challenges when adapting to real-world underwater image restoration. Water-Net [22] introduces white balance, which is not always reliable, often resulting in visually darker restored images. UGAN [19] trains the CycleGAN [33] network with unpaired real underwater images to generate paired images for network training. However, the performance of UGAN may be constrained by the quality of the training data. FUnIE-GAN [24] may encounter a bottleneck in the intricate feature learning process due to its simplistic model design, potentially causing color cast in the restoration results. Ucolor [25] combines the physical model and deep learning to build a model with many parameters, leading to high-quality restoration for most degraded images in various scenarios. However, it may not fully utilize the introduced color spaces, resulting in some color casts in the restoration of specific scenarios. CWR [28] incorporates a generative adversarial network, which might face challenges in generalizing across diverse underwater environments due to its limitations in generating varied images. USUIR [29] ignores long-range dependency modeling during the underwater image feature learning process, potentially preventing accurate restoration of degraded underwater images across all scenarios. Peng et al. [32] generates images that may exhibit boundary artifacts because the transformer in the model segments the input image into small patches and processes them independently. Our method, although unsupervised and lightweight with very few parameters, effectively models long-range dependency during the feature learning process. This enables accurate estimation of essential parameters for the underwater imaging model, resulting in good restoration performance and strong generalization capabilities.

Ablation Study
To explore the impact of the depthwise convolution kernel parameter k in the Multi-Branch Context Attention (MCA) module on the effectiveness of the whole network, we conduct an ablation study with different values of k in the MCA module. The results on the T-U190 and T-L120 test sets are presented in Table 5.
Table 5 shows that when k is set to 5, the underwater images restored by the model are closest to the reference images and the image quality is best, so k is set to 5 in this paper.
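To give a rough sense of what varying k costs, the sketch below counts the learnable parameters of a single depthwise convolution, assuming k denotes the spatial kernel size (a k x k filter per channel). The helper name `depthwise_conv_params`, the channel count, and the candidate k values are illustrative assumptions, not settings reported in Table 5.

```python
def depthwise_conv_params(channels: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a depthwise k x k convolution.

    A depthwise convolution applies one k x k filter per channel,
    so the weight count is channels * k * k (plus one bias per channel).
    """
    weights = channels * k * k
    return weights + (channels if bias else 0)


# Assumption: 64 channels, borrowed from the J-Net width (C1) in Figure 1.
# The MCA module splits channels, so the real per-branch count may differ.
for k in (3, 5, 7):  # illustrative kernel sizes only
    print(f"k={k}: {depthwise_conv_params(64, k)} parameters")
```

Because the count grows with k squared, larger kernels widen the receptive field at a modest but non-negligible cost for a network of this size.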

Application Test
We use the well-known Canny edge detection operator to extract edges from the degraded images in T-U190 and T-U60, as well as from the images restored by our approach. The lower and upper thresholds for edge detection are set to 100 and 200, respectively. Partial results are presented in Figure 10. They show that, compared with the raw degraded images, the images restored by our approach yield more prominent edges. Our method can effectively reproduce and enhance edge details in underwater images.
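The role of the two thresholds can be illustrated with the final stage of the Canny operator, hysteresis thresholding. The following is a minimal standard-library sketch of that one stage, not the full detector (which also includes smoothing, gradient computation, and non-maximum suppression) and not the implementation used in the experiments. The function name `hysteresis` and the toy input are assumptions for illustration. A library routine such as OpenCV's `cv2.Canny(image, 100, 200)` runs the complete pipeline.

```python
from collections import deque


def hysteresis(magnitude, low=100, high=200):
    """Double-threshold edge linking on a 2D list of gradient magnitudes.

    Pixels above `high` are strong edges. Pixels above `low` are kept
    only if they are 8-connected to a strong edge, directly or through
    other retained pixels.
    """
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the search with every strong edge pixel.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] > high:
                edges[y][x] = True
                queue.append((y, x))

    # Grow edges into neighbouring weak pixels (breadth-first search).
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx]
                        and magnitude[ny][nx] > low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges


# Toy gradient magnitudes (hypothetical values, one image row).
row = [[250, 150, 50, 150]]
print(hysteresis(row))
```

In the toy row, the second pixel is retained because it touches a strong edge, while the last pixel exceeds the lower threshold but is discarded because it is isolated from any strong edge.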

Conclusions
This paper introduces a lightweight multi-branch context network based on unsupervised learning for underwater image restoration. We decompose the underwater image into three latent components of the imaging process: scene radiance (clean image), transmission map, and global background light, and design different submodules to estimate them separately. We impose two constraints: the scene radiance shared between the degraded image and the estimated clean image, and the minimization of the error between the reconstructed degraded image and the original degraded image. These constraints yield a comprehensive loss function that considers content, color, contrast, and more. In this manner, we accomplish unsupervised training of the entire network on degraded underwater images. Experiments on multiple datasets indicate that our approach achieves satisfying results and is comparable to state-of-the-art methods, even though it is an unsupervised lightweight approach. The proposed method avoids the need for paired training data and achieves good performance with low computational complexity. It helps accelerate the iteration and optimization of the algorithm and lays the foundation for more advanced underwater image restoration technology in the future.
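The reconstruction constraint summarized above can be sketched as follows, assuming the commonly used simplified formation model I = J * t + B * (1 - t), where J is the scene radiance, t the transmission map, and B the background light. The exact model and loss terms of the paper are not restated in this section, so the single-channel layout, the function names `reconstruct` and `l1_loss`, and the choice of an L1 error are illustrative assumptions.

```python
def reconstruct(J, t, B):
    """Re-synthesize a degraded image pixel by pixel: I = J * t + B * (1 - t).

    J, t, and B are equally sized 2D lists with values in [0, 1]
    (single channel for simplicity).
    """
    return [
        [j * tt + b * (1.0 - tt) for j, tt, b in zip(row_j, row_t, row_b)]
        for row_j, row_t, row_b in zip(J, t, B)
    ]


def l1_loss(a, b):
    """Mean absolute error between two equally sized 2D lists.

    Assumption: a plain L1 error stands in for the reconstruction loss.
    """
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


# Hypothetical network outputs for a 1 x 2 image.
J = [[0.8, 0.2]]   # estimated scene radiance
t = [[0.5, 1.0]]   # estimated transmission map
B = [[0.4, 0.4]]   # estimated background light
observed = [[0.6, 0.2]]  # the degraded input image

loss = l1_loss(reconstruct(J, t, B), observed)
print(loss)
```

When the three estimates explain the observed image well, the reconstruction error approaches zero, which is what allows training without clean reference images.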
The proposed method may have limitations in generalizing to diverse underwater environments, especially those with extreme conditions not well-represented in the training data. In future work, we aim to develop a more efficient framework to improve the ability to restore image sharpness and enhance robustness to noise and underwater condition variations. Additionally, we plan to create larger and more diverse underwater image datasets, encompassing various underwater environments, lighting conditions, water qualities, and more, to provide a more powerful benchmark for unsupervised underwater image restoration.

Figure 1. The overall network architecture of the proposed method. C1 and C2 represent the channel number of feature maps in J-Net and TM-Net, respectively. In this paper, C1 is set to 64 and C2 to 128.

Figure 3. The Multi-Branch Context Attention (MCA) module. C, H, and W denote the number of channels, height, and width of the input maps, respectively. c denotes the split channels, and r is the bottleneck ratio, which is set to 4. The symbol ⊗ signifies matrix multiplication, while ⊕ indicates element-wise addition.

Figure 10. Visual comparison of edges produced by the proposed method on T-U190 and T-U60: the first row presents degraded images and their edges, and the second row shows restored images and their edges.

Table 3. The non-reference evaluation results of various methods on T-U60 and T-E100. The up arrow (↑) indicates that higher metric values are better.

Table 4. The non-reference evaluation results of various methods on T-R630 and T-S16. The up arrow (↑) indicates that higher metric values are better.

Table 5. Results of the proposed method with different k settings on T-U190 and T-L120.