Deblurring Network for UV Images of Colorless Chemicals Floating on Sea Surface



Introduction
As technology advances, computer vision tasks increasingly demand high-quality, clear images. However, in many computer vision applications, such as target recognition and segmentation, the input often contains a significant number of low-quality, blurred images. The unclear target boundaries in these blurred images can degrade the quality of downstream computer vision tasks.
Motion blur is a common cause of image blurring, which occurs when there is relative motion between the object being photographed and the capturing equipment, such as an optical camera, during the shooting process.
Deblurring techniques are designed to restore and reconstruct clear images from blurred ones. Deblurring improves image quality, which in turn increases the performance of computer vision tasks.
The rapid development of neural networks has garnered significant attention from researchers, many of whom have applied this technology to image-processing tasks, including image deblurring. In this paper, we build upon the work in [1], which proposed a multi-scale recurrent neural network for deblurring, by incorporating the CBAM attention mechanism [2], which combines channel and spatial attention. Unlike the network proposed in [1], our approach utilizes a long short-term memory (LSTM) network to establish connections between the intermediate features of the three resolutions, progressing from coarse to fine.
This paper proposes a deblurring network designed for UV images. Our work is briefly summarized as follows: (1) our proposed network builds upon the multi-scale recurrent neural network and introduces the CBAM attention mechanism, which focuses attention on latent features in both the channel and spatial dimensions; (2) the multi-scale recurrent network employs an LSTM network, which is capable of capturing and utilizing relational temporal features; (3) we constructed a UV-image dataset of colorless chemicals floating on water surfaces and used it to test the deblurring network's ability to restore clear images.
Finally, we compared our proposed method to other advanced methods on a self-collected UV-image dataset of colorless chemicals floating on water, and the results demonstrate its superior deblurring effectiveness.

Related Work
In this section, we review related works, including image-deblurring methods and attention mechanisms.

Image-Deblurring Methods
Image deblurring aims to improve image quality and falls under the category of image-enhancement methods [3-5]. Image-deblurring methods are broadly divided into traditional deblurring methods and deep learning deblurring methods.
Traditional methods for non-blind deblurring essentially explore various deconvolution algorithms. Classical examples include the Lucy-Richardson algorithm, which recovers the image through iterative convergence [6]; the Wiener filtering method [7]; and inverse filtering. Inverse filtering is the earliest and simplest deblurring method. In 1967, Harris and colleagues [8] implemented image deblurring with Wiener filtering (minimum mean square error), which recovers the image by minimizing the mean square error between the estimate and the original. The Lucy-Richardson algorithm, based on Bayesian theory, uses a non-linear iterative process to solve for the maximum-likelihood solution of the blurred image, yielding a clear image. In 2017, Zhang et al. [9] used fully convolutional networks to deblur images with known blur kernels. To address the problem of biased blur-kernel estimation in non-blind deblurring, Vasu et al. [10] proposed a convolutional-neural-network-based deblurring method in 2018.
All of the above are non-blind deblurring algorithms that rely on known blur kernels. However, image blur has many causes, such as shaking of the shooting device, movement of the target, and lens defocusing, and in most real shooting scenes the blur kernel is unknown. Therefore, non-blind deblurring algorithms are limited in practical applications.
The crux of blind deblurring algorithms is estimating the blur-kernel function, which is then used with the image-degradation model to estimate the clear image. The main research question in blind deblurring is how to accurately estimate the blur kernel from the image's blur information. Nah et al. [11] proposed DeepDeblur, which refines the output gradually from the coarse scale to the fine scale and first accomplished image deblurring in an end-to-end training manner. In 2019, inspired by Spatial Pyramid Matching [12], Zhang et al. [13] proposed the Deep Multi-Patch Hierarchical Network (DMPHN), which processes blurred images from fine to coarse. In 2021, Zamir et al. [14] built upon the DMPHN model and proposed a multi-stage progressive image-recovery model (MPRNet) to strike a better balance between spatial details and high-level contextual information.
CNNs are typically designed to extract local features from images, which may not generalize well to dynamic-scene deblurring. Moreover, image blur has various causes and varies from image to image, making it challenging to address directly with CNNs. To meet these challenges, we propose a method that combines CNNs with attention mechanisms to extract image features from multiple domains. By leveraging attention mechanisms across multiple domains, our approach improves the model's deblurring ability.

Attention Mechanism
The attention mechanism in deep learning is inspired by human behavior, and its various implementations differ in the domains they focus on. Some methods, such as those proposed in [15,16], concentrate on the spatial domain; they transform the spatial information in the original image into another space while retaining the most important features. Other methods, such as those proposed in [2,17,18], concentrate on the channel domain and focus on establishing relationships between channels.
In our work, we primarily focus on the attention mechanism in both the spatial and channel domains. Our key idea is to leverage the attention mechanism to identify image regions with residual blur. We learn features in both the spatial and channel domains, and we further improve the deblurring performance of the model by combining attention mechanisms from multiple domains.

Proposed Deblurring Network
Based on SRN-DeblurNet [1], we propose the network architecture shown in Figure 1. The main structure of SRN-DeblurNet is augmented with the CBAM attention mechanism to learn features in both the spatial and channel domains. By combining the attention mechanisms of multiple domains, the model's deblurring ability is further improved.

CBAM Attention Mechanism
The CBAM attention mechanism is an important part of the network, and its structure is shown in Figure 2. The preliminary restored image output by this module has a multi-scale structure, giving the network different receptive fields. The output attention feature map adjusts the spatial importance of the feature map while the image is reconstructed on the encoder-decoder side, making the network more adaptable. The Convolutional Block Attention Module consists of two sub-modules, the channel attention module and the spatial attention module, making it a lightweight and general-purpose module. The module takes the intermediate feature map generated by a convolutional layer of the neural network as input and infers attention maps along the channel and spatial dimensions. The attention maps are then multiplied by the input feature map to adaptively refine the features. Its lightweight and general characteristics make it easy to integrate into CNNs with negligible computational overhead.
The channel attention module is designed to capture the inter-channel relationships of the features and generate channel attention maps, which help to identify the most relevant channels in the feature map. Channel attention is concerned with the "what" of the input image, such as detecting specific objects or patterns. In contrast, the spatial attention module is designed to capture the spatial relationships of the features and generate spatial attention maps, which highlight the most informative spatial locations in the feature map. Spatial attention is concerned with the "where" of the input image, such as identifying the important regions of an image. Together, the channel and spatial attention modules work in a complementary manner to improve the overall feature representation of the network [2].
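The channel-then-spatial refinement described above can be sketched in NumPy. This is a minimal illustration, not the trained module: the MLP weights `w1`/`w2` are placeholders, and the learned 7x7 convolution of the spatial branch is replaced by a simple fusion of the channel-wise average and maximum maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W); w1: (C//r, C) and w2: (C, C//r) form the shared MLP
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return feat * att[:, None, None]  # rescale each channel

def spatial_attention(feat):
    # channel-wise average and max maps, fused here by a plain mean in place
    # of CBAM's learned 7x7 convolution (illustrative shortcut)
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    att = sigmoid((avg + mx) / 2.0)
    return feat * att[None, :, :]  # rescale each spatial location

def cbam(feat, w1, w2):
    # sequential refinement: channel attention first, then spatial attention
    return spatial_attention(channel_attention(feat, w1, w2))
```

The output keeps the input shape, so the module can be dropped between any two convolutional layers, as noted above.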

Image-Deblurring Network
The UV-image-deblurring network adopts a "coarse-to-fine" strategy: the characteristics of the input UV image of floating colorless chemicals are extracted at three different resolution scales. The network structure is based on the U-net architecture and consists of two parts: an encoder and a decoder.
First, the encoder side uses residual dense blocks to improve the feature-extraction and learning capabilities for the input blurred image and to retain its high-frequency detail. The encoder consists of three encoder modules. As shown in Figure 2, each encoder module is composed of a convolution layer and three residual dense blocks in series; each residual dense block consists of two ReLU-activated convolution layers in series joined by a skip connection. The input of each residual dense block is the output of the previous one.
An LSTM module is added between the encoder and the decoder to connect the intermediate features of the three resolution scales "from coarse to fine", so that features at different scales can interact with each other.
The decoder side consists of three decoder modules. As shown in Figure 2, each decoder module consists of a residual dense block, a CBAM, two residual dense blocks, and a deconvolution layer in series. The addition of the CBAM attention mechanism enables the deblurring network to learn features in both the spatial domain and the channel domain, thereby improving the network's deblurring ability by integrating attention mechanisms from multiple fields. The CBAM is an important part of the deblurring network. The preliminary restored image output by this module has a multi-scale structure, so the network has different receptive fields. The output attention feature map adjusts the spatial importance of the feature map during image reconstruction on the encoding-decoding sides, resulting in stronger adaptability to different degrees of blur.
The entire network is composed of three identical sub-networks. The deblurred image output by the first sub-network is upsampled and used as the input of the next sub-network; similarly, the output of that sub-network is upsampled and fed to the following one. In the encoding stage, the image is gradually downsampled to reduce the network parameters; correspondingly, in the decoding stage, it is gradually upsampled to restore a clear image. The entire network thus uses a "coarse-to-fine" strategy to deblur the blurred image at three scales.
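The scale-to-scale data flow described above can be sketched as follows. This is a structural illustration only: `subnet` is a placeholder for one encoder-LSTM-decoder sub-network, and box-average/nearest-neighbour resampling stand in for the bilinear resampling a real implementation would use.

```python
import numpy as np

def downsample(img, factor):
    # box-average downsampling by an integer factor (stand-in for bilinear)
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img):
    # nearest-neighbour 2x upsampling (stand-in for bilinear)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def subnet(blurred, prev_estimate):
    # placeholder for one sub-network; here it just blends the blurred input
    # with the upsampled estimate carried over from the coarser scale
    return 0.5 * (blurred + prev_estimate)

def coarse_to_fine_deblur(blurred):
    # three resolution scales, processed from coarse to fine
    scales = [downsample(blurred, 4), downsample(blurred, 2), blurred]
    estimate = scales[0]  # coarsest estimate initialised from the input
    outputs = []
    for s, img in enumerate(scales):
        estimate = subnet(img, estimate)
        outputs.append(estimate)
        if s < len(scales) - 1:
            estimate = upsample(estimate)  # feed result to the next scale
    return outputs  # one restored image per scale, coarse to fine
```

Each scale's output is retained because, as the next subsection notes, the training loss is applied at every scale.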

Loss and Training Details
We adopt a Euclidean loss function at each scale, downsampling the ground-truth (GT) image to the size of the network output using bilinear interpolation. The loss function is defined as follows:

\mathcal{L} = \sum_{i=1}^{3} \frac{\kappa_i}{N_i} \left\| I_i - I_i^{*} \right\|_2^2,

where I_i and I_i^{*} are the network output and the ground-truth image at the i-th scale, respectively; {\kappa_i} are the weights of each scale, which we set empirically to 1.0; and N_i is the number of elements in I_i to be normalized. Although we also experimented with total-variation and adversarial losses, we found that the L2-norm loss is sufficient to produce clear and unambiguous results.
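A minimal sketch of this per-scale normalized L2 loss, assuming square single-channel images whose side lengths divide evenly; box averaging stands in for the bilinear interpolation used to resize the ground truth.

```python
import numpy as np

def box_downsample(img, factor):
    # box-average downsampling (stand-in for bilinear interpolation)
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multi_scale_l2_loss(outputs, gt, kappas=(1.0, 1.0, 1.0)):
    # outputs: list of network outputs from coarse to fine
    # gt: full-resolution ground truth, resized to match each output's scale
    loss = 0.0
    for out, kappa in zip(outputs, kappas):
        factor = gt.shape[0] // out.shape[0]
        ref = box_downsample(gt, factor) if factor > 1 else gt
        # kappa_i / N_i * ||I_i - I_i*||_2^2, with N_i the element count
        loss += kappa * np.sum((out - ref) ** 2) / out.size
    return loss
```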

Experimental Results
Our experiments were conducted on a PC equipped with an NVIDIA GTX 1080Ti GPU (Santa Clara, CA, USA) and an Intel i7-4790 @ 3.60 GHz CPU (Santa Clara, CA, USA). We implemented our framework [13] on the TensorFlow platform. Our evaluation was comprehensive and aimed to validate different network structures and various network parameters. To ensure fairness, all experiments were performed on the same dataset with the same training configuration unless otherwise specified.
The input UV images of colorless chemicals floating on water surfaces differ from ordinary RGB images in their number of channels: while RGB images have three channels, UV images have only one. Therefore, for UV images, the model's channel number is set to one.

Dataset
In our study of ultraviolet (UV) images of colorless chemicals floating on water surfaces, we utilized a variety of technical tools for data collection. These included Unmanned Aerial Vehicles (UAVs) and handheld UV cameras, which both played significant roles. The former facilitated broad high-altitude photography, while the latter allowed for the acquisition of detailed, close-up images.
Additionally, we took the diversity of the collection environments into account. The collection points ranged from a self-constructed swimming pool to aqueducts, man-made lakes, and seawater. The swimming pool served as a controlled environment for preliminary experiments, and its wide water surface offered richer samples. Aqueducts, with their diverse environmental conditions and inherent dynamism, introduced fresh challenges to our collection efforts. Finally, sampling from artificial lakes and seawater enabled our data to cover broader and more varied areas, which was significantly beneficial to our research.
Our team proactively faced various challenges, from the controlled environment of the self-built swimming pool to the aqueducts and the natural environments of artificial lakes and seawater, and succeeded in acquiring a considerable amount of high-quality data. These data provided an essential foundation for our subsequent in-depth examination of colorless chemicals floating on water surfaces.
We collected clear ultraviolet images of colorless chemicals floating on water surfaces and applied a simulated motion-blur algorithm to blur the clear images, resulting in a total of 1079 pairs of clear-blurred UV images. Figure 3 illustrates the effect of the motion-blur processing on the clear images.
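One common way to simulate motion blur, which may differ in detail from the algorithm used here, is to convolve the clear image with a normalized linear point-spread function; a minimal single-channel sketch:

```python
import numpy as np

def linear_motion_kernel(length, angle_deg):
    # normalised point-spread function of straight-line camera/target motion
    kernel = np.zeros((length, length))
    c = (length - 1) / 2.0
    rad = np.deg2rad(angle_deg)
    for t in np.linspace(-c, c, length * 4):
        i = int(round(c + t * np.sin(rad)))
        j = int(round(c + t * np.cos(rad)))
        if 0 <= i < length and 0 <= j < length:
            kernel[i, j] = 1.0
    return kernel / kernel.sum()

def motion_blur(img, length=7, angle_deg=0.0):
    # 'same'-size 2-D convolution of a single-channel image with the kernel,
    # using edge padding at the borders
    k = linear_motion_kernel(length, angle_deg)
    pad = length // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + length, j:j + length] * k)
    return out
```

Applying this to a clear UV image yields a blurred counterpart, producing the clear-blurred pairs used for training.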

Evaluation Methods
The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) were used to evaluate the effectiveness of the deblurring algorithm.
The PSNR is an engineering term used to measure the fidelity of a signal by comparing it to a reference signal and calculating the ratio of the maximum possible power of the signal to the noise power that affects its accuracy. A higher value indicates lower distortion. A PSNR value higher than 40 dB indicates excellent image quality (i.e., very close to the original image); a range of 30-40 dB usually indicates good image quality (i.e., noticeable but acceptable distortion); a range of 20-30 dB indicates poor image quality; and finally, images with PSNRs below 20 dB are considered unacceptable.
The mean square error (MSE) measures the difference between a clean image I and a noisy image K of size m x n and is defined as follows:

\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2.

The PSNR (in dB) is then defined as follows:

\mathrm{PSNR} = 10 \cdot \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right),

where MAX_I denotes the maximum possible pixel value of the image. With an eight-bit binary representation of each pixel, this value is 255; in general, if each pixel is represented by B bits, then MAX_I = 2^B - 1. For uint8 data, the maximum pixel value is usually 255, whereas for floating-point data, it is 1. The PSNR formula represents the ratio of the maximum possible power of a signal to its mean square error, where higher values indicate better image quality.
The above is the calculation method for grayscale images. For a color image, there are typically three methods: (1) calculate the PSNR separately for each of the three RGB channels and average the three PSNR values; (2) calculate the MSE for each of the three RGB channels, average the three MSE values, and compute the PSNR from that average; or (3) convert the image to the YCbCr color space and compute the PSNR only for the Y component, which represents the image brightness.
The SSIM measures both the distortion and the similarity between two images. It evaluates image similarity from three aspects: luminance, contrast, and structure. Unlike the MSE and PSNR, which measure absolute error, the SSIM is a perceptual model that takes into account the visual perception of the human eye. SSIM values range between 0 and 1, with larger values indicating smaller image distortion and higher similarity between the two images:

\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},

where \mu_x and \mu_y are the mean intensities of x and y, \sigma_x^2 and \sigma_y^2 are their variances, \sigma_{xy} is their covariance, and c_1 and c_2 are small constants that stabilize the division (conventionally c_1 = (0.01L)^2 and c_2 = (0.03L)^2, with L the dynamic range of the pixel values).
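A global (whole-image) SSIM sketch with the conventional constants; practical implementations instead compute the statistics in a sliding Gaussian window and average the resulting map, so values from this simplification will differ from library results.

```python
import numpy as np

def ssim(x, y, max_val=255.0):
    # single-window SSIM: luminance, contrast, and structure terms combined
    c1 = (0.01 * max_val) ** 2  # stabilizes the luminance ratio
    c2 = (0.03 * max_val) ** 2  # stabilizes the contrast/structure ratio
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

An image compared with itself gives exactly 1, the maximum similarity.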

Ablation Experiments
In this section, we evaluate the performances of our key methods. We used a self-built dataset of UV images of colorless chemicals floating on water surfaces for the ablation research. To investigate the influence of the CBAM attention mechanism on the model, we conducted ablation experiments with and without adding the CBAM attention mechanism to the encoder and decoder sides. We kept the rest of the methods fixed for a fair comparison when changing one method. The results are presented in Table 1. Figures 4 and 5 depict the network structure diagrams of the deblurring network with the CBAM added to the encoder side and with the CBAM added to both the encoder and decoder sides, respectively.
In each of our ablation experiments, we maintained a consistent number of training epochs at 14,000. Ensuring this consistency allows for an accurate and impartial evaluation of the impact of different network components and their respective contributions to overall performance. As shown in Table 1, with the same training schedule, adding the CBAM on the decoder side can significantly improve the model's performance in terms of the PSNR. The model with the CBAM added only on the decoder side performed best in terms of the PSNR, and its SSIM also improved significantly, with no significant difference from the optimal SSIM. However, adding the CBAM on the encoder side worsened the model's PSNR and SSIM. In summary, adding the CBAM only on the decoder side significantly improved the deblurring effect of the model, demonstrating that the deblurring network based on the CBAM attention mechanism proposed in this work is superior to the original deblurring network.

Deblurring Results
The clear ultraviolet images of colorless chemicals floating on water surfaces underwent simulated motion-blur processing, and the effect of this processing is shown in Figure 3. It is evident that the outline boundary of the ultraviolet image processed by motion blur becomes unclear, which simulates the blur caused by the relative motion of the shooting equipment and the target object during shooting.
The blurred ultraviolet image of colorless chemicals floating on a water surface was fed into the proposed deblurring network that utilizes the CBAM attention mechanism, and the deblurring process was performed. The deblurring effect is shown in Figure 6. The improvement in image clarity after passing the blurred ultraviolet image through the proposed deblurring network is evident from Figure 6. The resulting image appears sharper, and details, such as the outline and area of hazardous chemicals, are easier to observe. These deblurred images will be immensely helpful in subsequent monitoring tasks, such as hazardous chemical target segmentation and target detection, as they significantly enhance the precision and accuracy of these tasks.
In order to highlight the merit of the proposed method compared to other deblurring methods designed for visible-light images, the performance was tested on a dataset of motion-blurred ultraviolet images of colorless chemicals floating on water surfaces. Contemporary, advanced visible-light deblurring methods were adopted for this test. The metrics included the PSNR, SSIM, inference time for a single blurry image, and number of model parameters, which were selected as benchmarks for comparison. The comparative results are shown in Table 2. Table 2 illustrates the significant advantage of the enhanced UV-image-deblurring network designed and explained in this paper for floating-chemical UV images over the current advanced deblurring methods for visible-light images. It shows a remarkable advantage in the PSNR and SSIM values. The PSNR reached 32.173 dB, an improvement of 1.6% compared to the next best method, MPRNet. The SSIM reached 0.962, an improvement of approximately 0.4% compared to the next best method. Moreover, the UV-image-deblurring network has the smallest number of parameters, meeting the design objective of being lightweight, and its inference time does not significantly differ from those of the other methods. This validates the advantage of the improved UV-image-deblurring method designed according to the characteristics of UV images.
To further validate the performance of the UV-image-deblurring network, various deblurring methods were tested on the chemical UV images. Figure 7 shows the comparison results of the different deblurring methods, where Figure 7a is the original blurry UV image input, and Figure 7b-e depict the results of applying the various deblurring methods to UV images. As shown in Figure 7b, when the DeblurGANv2 method is applied for UV-image deblurring, the detail recovery is mediocre, with significant blurring still present in the details; the overall image edge has artifacts and ringing phenomena, resulting in poor image quality. Figure 7c,d present the results of the DMPHN and MPRNet methods, which are currently known for their excellent recovery of RGB images. When these two methods are applied to blurry UV images, the deblurring yields a significant improvement in the details compared to the DeblurGANv2 method, with clearer detail information. However, just like with the DeblurGANv2 method, the edges of the entire image still show severe artifacts and ringing phenomena, making the deblurring results unsatisfactory. Figure 7e displays the recovery effect of the deblurring method proposed for UV images in this paper. Compared to the previous methods, the detail information of the image is better restored and preserved after deblurring. Furthermore, there are no prominent artifacts or ringing phenomena at the image edge. The overall image is smoother, which leads to better deblurring results.

Conclusions
After a chemical spill, most of the chemicals float on the sea surface and appear colorless, making it difficult to distinguish between the chemicals and seawater in visible-light images. Although UV sensors can detect thin liquid films more effectively, equipment vibrations and the impact of waves can blur the obtained UV images, causing the boundaries of the chemicals to be unclear and bringing challenges to subsequent segmentation tasks. Deep learning holds great potential for image deblurring, but most applications concentrate on visible-light images, lacking specific methods for deblurring UV images. Therefore, it is extremely important to study methods specifically for deblurring UV images of colorless chemicals floating on water surfaces. This helps to recover the contour information of the chemicals in the UV images, improve the precision of segmentation, and more accurately determine the area of the leak, which is of great significance for the emergency response to marine chemical spill incidents. For this reason, we introduced the CBAM attention mechanism and improved the U-net deep learning deblurring framework based on methods for deblurring visible-light images. We extracted image features from both the spatial and channel domains to restore the contour information of the chemicals in the blurred UV images, thereby improving the image quality. After comparison and verification, we found that our method outperforms other methods in terms of its deblurring effects and segmentation precision. Compared to the most advanced deblurring methods currently available, such as DeblurGANv2, DeepDeblur, DMPHN, and MPRNet, our method improved the PSNR and SSIM by 1.6% and 0.4%, respectively, over the next best method.
It has been verified that the improved UV-image-deblurring method, designed specifically for UV images of colorless chemicals floating on water surfaces, yields superior visual quality and better recovery of detail information in actual UV images after deblurring, compared with the existing deblurring methods designed for visible-light images.

Figure 2. Encoder-decoder module structure with the CBAM attention mechanism added (the description of the modules in different colors is consistent with Figure 1).

Figure 6. Deblurring results: (a) original blurred UV image; (b) UV image after deblurring using the method proposed in this study.

Table 1. Results of ablation experiments.

Table 2. Deblurring effect of each model.