Article

Area Contrast Distribution Loss for Underwater Image Enhancement

College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(5), 909; https://doi.org/10.3390/jmse11050909
Submission received: 14 March 2023 / Revised: 11 April 2023 / Accepted: 20 April 2023 / Published: 24 April 2023
(This article belongs to the Section Physical Oceanography)

Abstract

In this paper, we design a lightweight underwater image enhancement algorithm that effectively addresses the color distortion and low contrast of underwater images. Recent enhancement methods typically optimize a perceptual loss function, using high-level features extracted from pre-trained networks to train a feed-forward network for image enhancement tasks. This loss function measures the perceptual and semantic differences between images, but it is applied globally across the entire image and does not consider semantic information within the image, which limits its effectiveness. We therefore propose an area contrast distribution loss (ACDL), which trains a flow model to optimize the difference between the output and the reference in real time during training. We also propose a novel lightweight neural network. Because underwater image acquisition is difficult, our experiments show that our model can be trained with only half the data and half the image size used by Shallow-UWnet. The RepNet network reduces the parameter count by at least 48% compared with previous algorithms, and its inference is five times faster. After incorporating ACDL, SSIM increases by 2.70% and PSNR by 9.72%.

1. Introduction

Underwater robotics is quickly becoming an increasingly data-intensive and advanced engineering domain, necessitating complex operations to enable autonomous navigation and robotic automation. Autonomous underwater vehicles (AUVs) have a wide range of applications in tasks such as underwater scene analysis, oceanic biological monitoring, and human–robot collaboration [1,2,3,4].
Underwater image degradation reduces image quality through light scattering and attenuation. Backscattering dominates the scattering process, while forward scattering is generally negligible. Color casts arise from light attenuation, which depends on wavelength and imaging range: red light is absorbed the most, whereas blue–green light is attenuated the least. Many of these applications, especially on real underwater devices, require real-time interpretation of high-resolution images [5,6,7,8]. Figure 1 shows underwater images degraded by color casts, low illumination, and detail blur. At the same time, there is a need to design lightweight models that can adapt to a variety of environments.
In recent years, the most popular image improvement methods have typically used a perceptual loss, as described by Johnson et al. [9], to reduce the error between the output and the reference image in tasks such as super-resolution. The approach of obtaining images by minimizing a loss function has also been used by Mahendran et al. [10] for feature inversion and by Simonyan et al. and Yosinski et al. [11,12] for feature visualization.
Johnson et al. [9] used perceptual loss functions to train a network that relies on high-level features from a pre-trained loss network. While these methods can produce high-quality images, they are computationally expensive because inference requires solving an optimization problem.
To illustrate the proposed approach, we assume that the noise observed in different marine environments, across various oceans, follows a particular distribution family, such as a Laplace or Gaussian distribution. We denote the patches sampled at the same location in the output and reference images as R_A and R_B, respectively. Minimizing the image error requires not only low-level and high-level image features but also attention to regional information. To this end, the contrastive coherence preserving loss (CCPL) [13] samples patches from the output and encourages them to be close to the corresponding patches of the ground-truth image. We treat neighboring patches in an image as sharing the same regional character, view the error of each patch as a sample from an error distribution, and then match the error distribution of the output patches to that of the ground-truth patches. We call this loss the area contrast distribution loss (ACDL).
We propose a lightweight and efficient network for enhancing underwater images, called Shallow-RepNet. The RepBlock is the key element of Shallow-RepNet, designed to enhance the underwater image features and content. During inference, this method uses only one 3 × 3 convolutional layer to replace the multi-branch module used during training. It can reduce the number of parameters while maintaining the module’s performance [14]. Compared to Shallow-UWnet [15], our method is simpler and can efficiently enhance images. In summary, we have three contributions:
1. The area contrast distribution loss (ACDL) for enhancing underwater images, which encourages the distribution of output patches to be similar to the distribution of patches in the ground-truth image. ACDL has been proven to be effective in enhancing underwater images.
2. We propose Shallow-RepNet, which combines a lightweight design with effective feature extraction. The resulting network is simple and efficient, making it highly promising for enhancing underwater images.
3. We compare Shallow-RepNet with other lightweight underwater image enhancement models, and the results show that it performs significantly better in regions with severe color distortion.

2. Related Works

Underwater image enhancement methods. These algorithms aim to generate an image that preserves the important information of the input image. A natural question is whether this information can be maintained in different parts of the scene when generating images with different structures. Hou et al. [16] introduced a novel approach for improving the quality of underwater scenes through a joint residual learning framework. The proposed method focuses on simultaneously learning and enhancing residual information to enhance underwater imagery. He et al. [17] introduced the dark-channel prior (DCP), which is widely adopted to calculate the transmission ratio and restore underwater images. Drews et al. [18] designed an underwater dark-channel prior method (UDCP) that focuses on handling blue and green underwater scenarios. Li et al. [19] proposed a color transfer method to correct color distortion in underwater images. This method maps the color distribution of underwater images to the color distribution of reference images through a color transfer process. By effectively transferring the color information from reference images to underwater images, color distortion can be corrected, resulting in enhanced underwater images with improved visual quality. Liu et al. [20] proposed a method to enhance underwater images using a residual network, which has shown promising results in improving image quality. The network architecture consists of residual blocks that enable the model to learn a residual mapping, which enhances the details and colors of the underwater images. The authors also introduced a color attenuation prior to reduce the color distortion caused by water absorption and scattering. Li et al. [21] proposed a novel approach for underwater image enhancement using a gated fusion method. Specifically, they integrated three color correction techniques, namely white balance (WB), gamma correction (GC), and histogram equalization (HE), into a single framework using a gating mechanism. Islam et al. [22] proposed a method based on generative adversarial networks (GANs) to solve the image-to-image translation problem for underwater image enhancement. Their method involves training a generator network to map the degraded underwater images to the corresponding enhanced images, and a discriminator network to distinguish between the generated and ground-truth images. Naik et al. [15] proposed an approach to improve machine learning models by achieving comparable performance with lightweight models. The authors introduced a shallow residual network that uses only a few convolutional layers to extract content and detail features, reducing the model's complexity and computational cost. Liu et al. [23] introduced an innovative approach to address degradation problems in image processing. Specifically, they proposed an adaptive learning attention network, which utilizes an attention mechanism to adaptively allocate weights to features at different levels of the network. Zhou et al. [24] proposed a new approach for underwater image enhancement by incorporating a physical constraint as a feedback controller. The physical model is designed to simulate the underwater imaging process, and the feedback controller adjusts the model parameters to guide the enhancement process. Verma et al. [25] developed a lightweight fusion-based convolutional neural network (FCNN) for enhancing the visual content of underwater images.
Their method uses multi-scale fusion to combine information from different scales of the input image, followed by a convolutional neural network to learn the image features. Liu et al. [26] introduced a conditional generative adversarial network (CGAN) that is designed for underwater image color correction. This approach leverages multilevel feature fusion to better capture the complex and dynamic color distortions present in underwater images.
Normalizing flow [27] in image enhancement. Normalizing flow has been proven to be an effective technique for capturing complex distributions and modeling data, and it has shown great potential for improving the quality of underwater images. Several recent articles have used normalizing flows to construct priors for super-resolution. Lugmayr et al. [28] proposed a novel approach, called super-resolution space with normalizing flow (SRFlow), to address the super-resolution problem. Their method utilizes normalizing flows to learn the probability distribution of the high-resolution images given their low-resolution counterparts, allowing for more effective image super-resolution. By leveraging the power of normalizing flows, their approach can model complex distributions while maintaining computational efficiency. Liang et al. [29] presented a novel method, called hierarchical conditional flow (HCFlow), for learning a bijective mapping between high-resolution and low-resolution image pairs. HCFlow is based on the concept of normalizing flows and uses a hierarchical structure of conditional affine coupling layers to capture complex dependencies between image patches. Wang et al. [30] used a normalizing flow model to obtain better exposed images. We leverage normalizing flows to estimate the error of the output distribution and the distribution of the true ground-truth image, to estimate the noise of the marine environment.
Image enhancement loss function. In this paper, we propose that the noise distribution of the marine environment is learnable. Several studies have explored the use of adaptive loss functions. Johnson et al. [9] used feed-forward networks to extract high-level features to optimize a perceptual loss function. Li et al. [31] proposed a novel residual log-likelihood estimation (RLE) method to address the distributional shift problem. The RLE method employs a residual neural network to learn the residual distribution shift between a source domain and a target domain. It has achieved excellent results in various computer vision tasks, including human pose estimation, and has demonstrated its effectiveness in handling complex distributions. Wu et al. [32] presented a novel contrastive learning loss function that maintains the coherence of the content source. In contrast to previous methods, we apply normalizing flows to the loss function of image processing, enabling the effective fitting of distributions.
Due to the harsh underwater environment and poor image quality, underwater image enhancement is particularly challenging. In this paper, we propose a compressed model, Shallow-RepNet, which achieves performance comparable to existing state-of-the-art algorithms but with fewer parameters and faster image processing. ACDL is a novel method proposed in our research to address the challenge of maintaining content coherence during training for image enhancement. We randomly crop image patches and apply the RLE loss to encourage the noise distributions of corresponding patches to be similar.

3. Method

3.1. Area Contrast Distribution Loss (ACDL)

Our work focuses on addressing issues with the perceptual loss function and the flow model’s loss function when processing underwater images. Specifically, the perceptual loss function does not consider semantic information within the image. The flow model’s loss function directly uses the image distribution, which makes it difficult to derive the correct distribution. To solve these problems, we have designed a new loss function that consists of two main parts.
The first part involves randomly sampling a point from the model output image and extracting a feature of consistent width and height based on the sampled point. This feature can be understood as a randomly cropped image patch from the original image, with red, green, and blue color channels. We extract the eight-neighborhood images of the sampled image. The purpose of this is to use the information from the surrounding images to constrain the optimization of this local information and retain the image features of this local region.
Due to the complexity of the marine environment, the noise in underwater images is difficult to determine, and care is needed in how the data are processed. Inspired by Kalman filtering [33], we adopt a processing method that uses the difference image between each eight-neighborhood image and the sampled image. This method can effectively deal with the noise introduced by the ocean environment. Note that this paper samples only eight points per image, resulting in 64 difference images (eight neighbors for each of the eight sampled patches); increasing the number of sampling points can help with training the model. By keeping the sampling positions of the output image unchanged, the same procedure is applied to the reference ground-truth image, and a corresponding difference image is obtained.
In the concrete application of ACDL, the input image and the ground-truth image both have three channels: R, G, and B. We randomly sample N image patches from each of the two images, denoted as R_A and R_B (shown in Figure 2). Assuming that the patch randomly cropped from R_A is I_i^x, where x = 1, ..., N, its eight neighborhoods can be represented by I_j^{x,y}, where y = 1, ..., 8. We then sample from R_B at the same locations to obtain E_i^x and E_j^{x,y}. The equations are as follows:
F_g^{x,y} = I_i^x \ominus I_j^{x,y}, \qquad F_c^{x,y} = E_i^x \ominus E_j^{x,y}
d_k = F_g^{x,y} \ominus F_c^{x,y}
where ⊖ represents subtraction used to compute the image differences. F_g and F_c are the difference images obtained from the randomly cropped patches of the input and ground-truth images, respectively.
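To make this first part of the loss concrete, the following sketch (a minimal PyTorch illustration, not the released implementation; the patch size of 16 and the tensor layout are assumptions) computes the eight-neighborhood difference images F_g, F_c and the quadratic difference d_k for one sampled location.

```python
import torch

# Offsets of the eight neighbours around a sampled patch, in units of one patch.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                     ( 0, -1),          ( 0, 1),
                     ( 1, -1), ( 1, 0), ( 1, 1)]

def crop(img, top, left, size):
    """Crop a (C, H, W) image tensor to a (C, size, size) patch."""
    return img[:, top:top + size, left:left + size]

def neighbourhood_differences(output, reference, top, left, size=16):
    """Return the stack of quadratic differences d_k = F_g - F_c for one sampled patch.

    output, reference: (C, H, W) tensors; the same location is used in both images.
    The sampled location is assumed to lie at least one patch away from the borders.
    """
    centre_out = crop(output, top, left, size)      # I_i^x
    centre_ref = crop(reference, top, left, size)   # E_i^x
    diffs = []
    for dy, dx in NEIGHBOUR_OFFSETS:
        nb_out = crop(output, top + dy * size, left + dx * size, size)     # I_j^{x,y}
        nb_ref = crop(reference, top + dy * size, left + dx * size, size)  # E_j^{x,y}
        f_g = centre_out - nb_out   # difference image of the output, F_g^{x,y}
        f_c = centre_ref - nb_ref   # difference image of the ground truth, F_c^{x,y}
        diffs.append((f_g - f_c).flatten())  # quadratic difference vector d_k
    return torch.stack(diffs)  # shape (8, C * size * size)
```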
The second part of this method focuses on predicting the distribution of ocean noise. In previous methods, the noise distribution of different channels was not considered, even though the light of different wavelengths experiences different effects in water. To better infer this distribution, we flatten the difference images into a one-dimensional vector, called the difference vector. The difference vector of the ground-truth image needs to be subtracted from that of the output image to obtain the quadratic difference vector.
To avoid overfitting, we use block information to infer the noise distribution. Our goal is not to obtain the true noise distribution, but to infer an approximate noise distribution that can represent all ocean noise distributions.
We first map the quadratic difference vector to a high-dimensional distribution space and then compute the residual log-likelihood estimation (RLE) loss. We assume that all underlying distributions belong to the same density function family, but their mean and variance differ based on the input I. The RLE loss is defined as follows:
\bar{d}_k = (d_k - \mu) / \sigma
L_{rle} = -\log P_{\Theta,\phi}(x \mid I)\big|_{x = d_k} = -\log P_{\phi}(\bar{d}_k) + \log \sigma = -\log Q(\bar{d}_k) - \log G_{\phi}(\bar{d}_k) - \log s + \log \sigma
where I represents the reference image. P_{\Theta,\phi}(x | I) is a probability density function that depends on the regression model \Theta and the flow model. \Theta is used to predict the mean \mu and variance \sigma, and \phi denotes the learnable parameters of the flow model. P_{\phi}(\cdot) uses the flow model to map a zero-mean initial distribution to a zero-mean deformed distribution. We assume that Q(\bar{d}_k) is approximately close to the target distribution, but does not completely match the true distribution; for simplicity, we temporarily set it to a Gaussian distribution. G_{\phi}(\bar{d}_k) is the distribution that the flow model needs to learn, and s is a constant.
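A hedged sketch of how this term could be evaluated is given below. The regression head that predicts \mu and \sigma and the flow model are abstracted away; in particular, the flow.log_prob interface and the standard-Gaussian choice of Q are assumptions made for illustration.

```python
import math
import torch

def rle_loss(d_k, mu, sigma, flow, log_s=0.0):
    """Residual log-likelihood estimation loss for a batch of quadratic differences d_k.

    d_k, mu, sigma : tensors of shape (B, D); mu and sigma come from the regression model Theta.
    flow           : a normalizing-flow model whose log_prob(x) returns the per-sample
                     log-density of the learned zero-mean residual distribution G_phi.
    Q is taken to be a standard Gaussian, as in the simplification described above.
    """
    d_bar = (d_k - mu) / sigma                                       # standardised residual
    log_q = (-0.5 * d_bar.pow(2) - 0.5 * math.log(2.0 * math.pi)).sum(dim=-1)  # log Q(d_bar)
    log_g = flow.log_prob(d_bar)                                     # log G_phi(d_bar), shape (B,)
    loss = -log_q - log_g - log_s + torch.log(sigma).sum(dim=-1)     # -logQ - logG - log s + log sigma
    return loss.mean()
```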

3.2. Reparameterization Underwater Image Enhancement

ACDL can maintain consistency in the enhanced regions of underwater images. Hence, we design a lightweight network that is simple, efficient, and produces better underwater image enhancement results at a lower cost. Ding et al. [14] proposed the RepVGG algorithm, which offers higher accuracy and lower computational cost than other lightweight convolutional neural networks, making it more convenient for applications on embedded devices. Naik et al. [15] proposed a shallow underwater image enhancement algorithm called Shallow-UWnet, which employs dense connections across three connected convolutional blocks and generates enhanced underwater images using a final convolutional layer with three kernels. However, we believe that the structure of Shallow-UWnet can be simplified, and the use of dense connections can lead to an excessively large memory access cost (MAC) [34,35,36,37]. Models with lower MAC are typically preferred, because they require less memory access, resulting in lower power and computational resource consumption.
We propose a novel model, Shallow-RepNet, that enhances underwater images by integrating RepVGG and Shallow-UWnet. As shown in Figure 3, Shallow-RepNet comprises three connected RepBlocks. Our model takes RGB underwater images with a resolution of 112 × 112 as input. First, the original image undergoes feature extraction through a 3 × 3 convolutional kernel. Then, the features are concatenated and further enhanced by the RepBlocks, where each module includes multi-scale convolutions and nonlinear activation functions. Finally, a 3 × 3 convolutional layer integrates these features to generate the enhanced underwater image. The RepBlock comprises three parallel branches: a residual (identity) connection, a 1 × 1 convolution, and a 3 × 3 convolution. The outputs of the branches are concatenated and activated by ReLU.
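For concreteness, a minimal PyTorch sketch of this training-time topology is shown below. It is an illustration rather than the released implementation: the channel width of 64 is an assumption, and the branch outputs are summed, following the standard RepVGG formulation, so that they can later be folded into a single 3 × 3 convolution.

```python
import torch.nn as nn

class RepBlock(nn.Module):
    """Training-time multi-branch block: a 3x3 conv, a 1x1 conv and an identity branch,
    each followed by BN; the branch outputs are merged and passed through ReLU."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels))
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False), nn.BatchNorm2d(channels))
        self.identity = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.conv3(x) + self.conv1(x) + self.identity(x))

class ShallowRepNet(nn.Module):
    """Head 3x3 conv -> three RepBlocks -> tail 3x3 conv producing the enhanced RGB image."""
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.Sequential(RepBlock(channels), RepBlock(channels), RepBlock(channels))
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        return self.tail(self.blocks(self.head(x)))
```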
Batch normalization (BN) layers are challenging to use in image enhancement networks, particularly for low-light underwater image enhancement, because the complexity and diversity of the underwater environment make it difficult to stabilize the mean and variance of the training data. Additionally, the internal covariate shift problem is not severe in shallow networks, so adding BN layers may not significantly improve network performance but rather increases computational complexity and training time. However, we use the reparameterization method to increase the capacity of the model during training, which largely preserves the advantages of BN layers while mitigating their drawbacks. In the RepBlock, skip connections are used to avoid overfitting and to propagate initial information to later layers, improving the flow of gradients and avoiding the problems of gradient vanishing and explosion. This compensates for the high MAC caused by the dense connections in Shallow-UWnet. The following formulas show how a BN layer is computed and fused with the preceding convolutional layer.
BN(x_i) = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}} \cdot \gamma + \beta
W = w_i \cdot \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}, \qquad b = \beta - \frac{\mu \gamma}{\sqrt{\sigma^2 + \varepsilon}}
where \gamma and \beta are the two trainable parameters of the BN layer, \mu and \sigma^2 are its running mean and variance, and \varepsilon is a small constant that prevents division by zero. w_i represents the weights of the original convolutional layer, while W and b represent the weights and bias, respectively, of the convolutional layer after fusion with the BN layer.
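Once training is finished, this fusion can be performed offline. The sketch below folds a BatchNorm layer into the preceding convolution in the standard RepVGG manner; it is shown for illustration and assumes the convolution and BN shapes match.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single convolution equivalent, at inference time, to conv followed by bn."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)          # sqrt(sigma^2 + eps)
    scale = bn.weight / std                            # gamma / sqrt(sigma^2 + eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))       # W = w_i * gamma / std
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)  # b = beta - mu * gamma / std
    return fused
```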
Our underwater image enhancement network uses only five convolutional layers during inference, to extract content and detail features, resulting in enhanced underwater images that meet real-world engineering requirements. Experimental results demonstrate that our Shallow-RepNet performs better than existing methods in terms of underwater image enhancement.

3.3. Loss Function

In addition to the proposed ACDL loss function, we also employed three commonly used underwater image loss functions.
The pixel loss, L_1, is utilized to maintain pixel consistency. It is defined as:
L_1 = \frac{1}{M} \sum_{i=1}^{M} \left| I_g^i - I_l^i \right|
where I_g^i and I_l^i represent the enhanced image and the reference image, respectively, and M is the number of image pairs in the dataset.
Perceptual loss based on the VGG network [38], which was pre-trained on the ImageNet dataset [39], is utilized for feature reconstruction of underwater images. The definition of the perceptual loss function is as follows:
L_{vgg} = \frac{1}{CWH} \sum_{c=1}^{C} \sum_{w=1}^{W} \sum_{h=1}^{H} \left\| V(I^c)_{c,w,h} - V(I^g)_{c,w,h} \right\|_2
C, W, and H denote the channel, width, and height of the image, respectively. V represents the nonlinear transformation through the VGG-19 network.
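As an illustration, a perceptual loss of this kind can be implemented with the pre-trained VGG-19 from torchvision as sketched below; the cut-off at relu3_3 and the plain mean-squared feature error are illustrative choices, not necessarily those used in the paper (ImageNet input normalization is also omitted for brevity).

```python
import torch
import torch.nn as nn
from torchvision import models

class VGGPerceptualLoss(nn.Module):
    """Mean-squared error between VGG-19 feature maps of the enhanced and reference images."""
    def __init__(self):
        super().__init__()
        # Feature extractor truncated after relu3_3; weights are frozen.
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:16]
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg.eval()

    def forward(self, enhanced, reference):
        return torch.mean((self.vgg(enhanced) - self.vgg(reference)) ** 2)
```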
The following equation is the structural similarity index (SSIM) [40] loss function. The purpose of this function is to enhance local structure and details. SSIM can be mathematically defined as:
SSIM(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
where C_1 and C_2 are constants used to maintain stability. Here, x and y represent the input image and the reference image, respectively. The SSIM loss L_{ssim} can be written as:
L_{ssim}(I_g, I_l) = 1 - SSIM(I_g, I_l)
The overall training loss is a weighted sum of these four losses, as described below:
L_{total} = \lambda_1 \cdot L_1 + \lambda_{vgg} \cdot L_{vgg} + \lambda_{ssim} \cdot L_{ssim} + \lambda_{acd} \cdot L_{acd}
where \lambda_1, \lambda_{vgg}, \lambda_{ssim}, and \lambda_{acd} are adjustable weighting parameters.
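The weighted combination itself is straightforward; a small helper is sketched below, where the default \lambda values are placeholders rather than the weights used in the paper.

```python
def total_loss(l_1, l_vgg, l_ssim, l_acd,
               lam_1=1.0, lam_vgg=0.1, lam_ssim=1.0, lam_acd=0.1):
    """Weighted sum of the pixel, perceptual, SSIM and ACDL terms (lambda values are placeholders)."""
    return lam_1 * l_1 + lam_vgg * l_vgg + lam_ssim * l_ssim + lam_acd * l_acd
```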

4. Experiments

4.1. Implementation Details

We utilized UIEBD to replicate realistic underwater environments. This dataset comprises 890 sets of paired underwater images, captured under varying lighting conditions, featuring diverse color ranges and degrees of contrast. The reference images were meticulously generated through pairwise comparisons. Although these data are not perfect for data-driven neural networks, they can effectively test the validity of the algorithm. During the testing phase, the images are randomly cropped to 256 × 256.

4.2. Test Datasets

We aim to demonstrate the effectiveness of our proposed model in various underwater scenarios and its ability to process underwater images in real time. To achieve this, we utilized the UIEBD as our training set because it offers a comprehensive range of scenarios at a minimal cost. In practical engineering situations, acquiring a vast number of underwater images is challenging. Furthermore, higher-resolution images generally yield clearer results, as is widely acknowledged. Hence, by using smaller images as our training set, we aim to improve the algorithm’s robustness when the resolution increases.
To assess the performance of our algorithm, we employed different test sets. Specifically, we utilized the UFO-120 dataset [41], which includes images of various complex underwater environments and contains a subset of 120 images for testing. Additionally, we used the EUVP dataset [22], which comprises 5500 pairs of underwater images with low-light backgrounds, and we used a sample of 1000 images as our test set. The reference images in the ground-truth of EUVP were generated using CycleGAN [42]. The input image sizes varied, and we resized them to 256 × 256.
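A possible preprocessing pipeline for the paired data is sketched below: training pairs are cropped to 112 × 112 at the same random location in the raw and reference image, and test pairs are resized to 256 × 256. File paths and the exact transforms are illustrative assumptions.

```python
import random
from PIL import Image
from torchvision.transforms import functional as TF

def load_train_pair(raw_path, ref_path, crop=112):
    """Load a raw/reference training pair and take the same random crop from both images."""
    raw = Image.open(raw_path).convert("RGB")
    ref = Image.open(ref_path).convert("RGB")
    top = random.randint(0, raw.height - crop)
    left = random.randint(0, raw.width - crop)
    raw = TF.crop(raw, top, left, crop, crop)
    ref = TF.crop(ref, top, left, crop, crop)
    return TF.to_tensor(raw), TF.to_tensor(ref)

def load_test_pair(raw_path, ref_path, size=256):
    """Test images of varying resolution are resized to a fixed 256 x 256."""
    raw = Image.open(raw_path).convert("RGB").resize((size, size))
    ref = Image.open(ref_path).convert("RGB").resize((size, size))
    return TF.to_tensor(raw), TF.to_tensor(ref)
```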

4.3. Metrics

The performance of the model is evaluated using two metrics: peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). PSNR, revisited by Korhonen et al. [43], measures the ratio of the maximum possible power of a signal to the power of the corrupting noise that affects the fidelity of its representation; SSIM measures the structural similarity between two images by comparing their luminance, contrast, and structure. Using both metrics, we can assess both the quality of image reconstruction and the preservation of structural features in the enhanced output.
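These two metrics can be computed, for example, with scikit-image as sketched below (the channel_axis argument assumes a recent scikit-image version; inputs are uint8 RGB arrays of shape (H, W, 3)).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced: np.ndarray, reference: np.ndarray):
    """Compute PSNR and SSIM for one enhanced/reference pair of uint8 RGB images."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim
```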
Unsupervised evaluation metrics are equally important, since real reference images may not always exist in practical scenarios. Therefore, we introduced the underwater image quality measure (UIQM) [44], which measures three underwater image characteristics: image colorfulness (UICM), sharpness (UISM), and contrast (UIConM). Each attribute evaluates a different aspect of the degradation in underwater images. The UIQM metrics are given by:
UIQM = c_1 \times UICM + c_2 \times UISM + c_3 \times UIConM
where c_1 = 0.0282, c_2 = 0.2953, and c_3 = 3.5753.

4.4. Comparison with Former Methods

Subjective evaluation is the evaluation of image quality based on human perception. In our study, we conducted a subjective evaluation of the underwater images enhanced by our proposed algorithm and compared it with other state-of-the-art algorithms. We randomly selected images from the UIEBD dataset, and the evaluators were asked to consider factors such as color restoration, contrast enhancement, and overall visual quality (Figure 4). The computer used to test the performance of the algorithms has an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz processor, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti graphics card. It runs the Ubuntu 18.04.5 LTS (Bionic Beaver) operating system. The VGG-19 network was run on this computer with a batch size of eight.
On the other hand, objective evaluation is the quantitative analysis of image quality using various metrics such as PSNR, SSIM, and UIQM. We used these metrics to evaluate the performance of our proposed algorithm and compared it with other algorithms. We used the UIEBD dataset for training and testing, and other state-of-the-art algorithms were trained on EUVP datasets. We restricted the training conditions of our algorithm as much as possible, to demonstrate its effectiveness. In summary, our evaluation includes both subjective and objective evaluations, which together provide a comprehensive assessment of the performance of our proposed algorithm.
The underwater image enhancement algorithm we propose was compared with four other algorithms: Water-Net [21], FUnIE-GAN [22], Deep SESR [45], and Shallow-UWnet [15]. Our algorithm can overcome the problem of over-enhancement in images, which can be clearly observed in the first and second rows of Figure 5. After subjective comparison, although our algorithm is not perfect, it is entirely sufficient for practical applications. The Deep SESR algorithm is almost perfect in detail and content restoration, especially in color transformation. Water-Net appears subjectively similar to the previous algorithm, but its color restoration is slightly unsatisfactory, owing to the influence of its reference images. The FUnIE-GAN algorithm is a lightweight model, and its images may be over-enhanced. The overall performance of Shallow-UWnet is relatively average, and there is no significant subjective difference from the above algorithms.
In Figure 6, we show several examples of enhancements from the ablation study. Compared with its ablated versions, the proposed Shallow-RepNet produces more visually pleasing enhancements. In contrast, L_{vgg} usually focuses only on local enhancement and sometimes fails to consider semantic information within the image. With ACDL but without L_{vgg}, Shallow-RepNet sometimes recovers local details less well. The red rectangles highlight some obvious visual quality improvements.
Underwater image enhancement is an important research area due to the unique challenges posed by underwater imaging. To evaluate the performance of different algorithms, objective analysis using various metrics is necessary. In Table 1, we conducted an objective analysis of various algorithms using metrics such as PSNR, SSIM, and UIQM. The proposed algorithm, trained using the UIEBD dataset, outperformed all other algorithms in terms of objective metrics on the same dataset. However, on other datasets, the metrics of the proposed algorithm were similar to those of other algorithms. The Shallow-UWnet algorithm has the highest values for PSNR, SSIM, and UIQM, likely due to the dataset used. The Shallow-RepNet algorithm has a low PSNR score, mainly due to the limited quantity of data available. Overall, objective analysis using various metrics is essential to evaluate the performance of underwater image enhancement algorithms.
In our study, we used Shallow-UWnet as the baseline model and sought to improve its performance. The UIEBD test set is used for experiments, and the results are presented in Table 2. We replaced ConvBlock with RepBlock and observed a significant improvement in both the SSIM and PSNR metrics. Furthermore, we integrated our proposed ACDL method into the model, which led to a further improvement in performance. These results demonstrate the effectiveness of both techniques in improving the model's performance. Comparing the SSIM and PSNR of different models, it is evident that the Shallow-RepNet+BN+L_{acd} model achieved significant performance improvement over the baseline model and is well-suited for underwater image enhancement applications. The overall advantage of our proposed model is its efficiency, as it can process test images much faster than other state-of-the-art models, making it a practical solution for real-time underwater image enhancement.
We evaluated the model compression performance metric proposed by He et al. [46] and compared our model's performance to that of other algorithms (Table 3). Our proposed model has fewer parameters during both the training and testing phases: only about half the number of parameters of Shallow-UWnet, and more than an order of magnitude fewer than the other algorithms. The compression ratio represents the ratio of the compressed model size to the original model size, which is a crucial factor in real engineering applications.
As shown in Table 3, our algorithm can process roughly five times as many images per second as Shallow-UWnet. During the inference phase, only five convolutions are used to enhance the image, making it much faster than the other algorithms. These results demonstrate that our proposed algorithm is not only effective but also efficient, making it a practical solution for real-time underwater image enhancement applications. We use the best-performing model from Table 2.

5. Conclusions

We introduce a lightweight and efficient model for enhancing underwater images that utilizes multilayer convolutional neural networks. Our approach combines two techniques, ACDL and the efficient Shallow-RepNet model, to produce satisfying results in terms of enhanced image quality. We also demonstrate that our model is much faster at processing test images, making it a practical option for real-time underwater image enhancement applications. The effectiveness and efficiency of our method are validated on a large dataset of underwater images, which highlights its potential in a variety of practical applications in the field of underwater imaging.
A potential application of Shallow-RepNet is for lightweight underwater vehicles with limited computing power. By reducing the size and complexity of the model, our algorithm may be more practical and efficient for such devices.
However, there are still some limitations in our approach. First, the dataset we used for testing was relatively small, and future research could expand upon this with larger and more diverse datasets. Additionally, our model currently focuses on enhancing image quality, but there may be opportunities to incorporate other features such as object detection or recognition.

Author Contributions

Conceptualization, J.Z. (Jiajia Zhou); methodology, J.Z. (Jiajia Zhou); software, J.Z. (Jiajia Zhou) and J.Z. (Junbin Zhuang); validation, J.Z. (Junbin Zhuang); formal analysis, J.Z. (Junbin Zhuang); investigation, J.Z. (Junbin Zhuang) and Y.Z.; resources, J.Z. (Junbin Zhuang) and Y.Z.; data curation, Y.Z. and J.Z. (Junbin Zhuang); writing—original draft preparation, J.Z. (Junbin Zhuang); writing—review and editing, Y.Z. and J.Z. (Junbin Zhuang); visualization, J.Z. (Junbin Zhuang) and Y.Z.; supervision, J.L.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant nos. 51609048 and 62101156.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robot. Autom. Lett. 2017, 3, 387–394. [Google Scholar] [CrossRef] [Green Version]
  2. Islam, M.J.; Ho, M.; Sattar, J. Understanding human motion and gestures for underwater human–robot collaboration. J. Field Robot. 2019, 36, 851–873. [Google Scholar] [CrossRef] [Green Version]
  3. Du, X.; Hu, X.; Hu, J.; Sun, Z. An adaptive interactive multi-model navigation method based on UUV. Ocean Eng. 2022, 267, 113217. [Google Scholar] [CrossRef]
  4. Schettini, R.; Corchs, S. Underwater image processing: State of the art of restoration and image enhancement methods. EURASIP J. Adv. Signal Process. 2010, 2010, 746052. [Google Scholar] [CrossRef] [Green Version]
  5. Ancuti, C.; Ancuti, C.O.; Haber, T.; Bekaert, P. Enhancing underwater images and videos by fusion. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 81–88. [Google Scholar]
  6. Ghani, A.S.A.; Isa, N.A.M. Underwater image quality enhancement through integrated color model with Rayleigh distribution. Appl. Soft Comput. 2015, 27, 219–230. [Google Scholar] [CrossRef]
  7. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar]
  8. Shortis, M.; Abdo, E.H.D. A review of underwater stereo-image measurement for marine biology and ecology applications. In Oceanography and Marine Biology; CRC Press: Boca Raton, FL, USA, 2016; pp. 269–304. [Google Scholar]
  9. Johnson, J.; Alahi, A.; Li, F. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer International Publishing: Cham, Switzerland, 2016; pp. 694–711. [Google Scholar]
  10. Mahendran, A.; Vedaldi, A. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5188–5196. [Google Scholar]
  11. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  12. Yosinski, J.; Clune, J.; Nguyen, A.; Fuchs, T.; Lipson, H. Understanding neural networks through deep visualization. arXiv 2015, arXiv:1506.06579. [Google Scholar]
  13. Wu, Z.; Zhu, Z.; Du, J.; Bai, X. CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XVI. Springer Nature: Cham, Switzerland, 2022; pp. 189–206. [Google Scholar]
  14. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
  15. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-uwnet: Compressed model for underwater image enhancement (student abstract). Proc. AAAI Conf. Artif. Intell. 2021, 35, 15853–15854. [Google Scholar] [CrossRef]
  16. Hou, M.; Liu, R.; Fan, X.; Luo, Z. Joint residual learning for underwater image enhancement. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4043–4047. [Google Scholar]
  17. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar]
  18. Drews, P.L.; Nascimento, E.R.; Botelho, S.S.; Campos, M.F.M. Underwater depth estimation and image restoration based on single images. IEEE Comput. Graph. Appl. 2016, 36, 24–35. [Google Scholar] [CrossRef]
  19. Li, C.; Guo, J.; Guo, C. Emerging from water: Underwater image color correction based on weakly supervised color transfer. IEEE Signal Process. Lett. 2018, 25, 323–327. [Google Scholar] [CrossRef] [Green Version]
  20. Liu, C.; Wu, Q.; Wang, R.; Yang, W. Underwater Image Enhancement using a Residual Network. IEEE Trans. Image Process. 2018, 28, 6076–6087. [Google Scholar]
  21. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef] [Green Version]
  23. Liu, S.; Fan, H.; Lin, S.; Wang, Q.; Ding, N.; Tang, Y. Adaptive learning attention network for underwater image enhancement. IEEE Robot. Autom. Lett. 2022, 7, 5326–5333. [Google Scholar] [CrossRef]
  24. Zhou, Y.; Yan, K.; Li, X. Underwater image enhancement via physical-feedback adversarial transfer learning. IEEE J. Ocean. Eng. 2021, 47, 76–87. [Google Scholar] [CrossRef]
  25. Verma, G.; Kumar, M.; Raikwar, S. FCNN: Fusion-based underwater image enhancement using multilayer convolution neural network. J. Electron. Imaging 2022, 31, 063039. [Google Scholar] [CrossRef]
  26. Liu, X.; Gao, Z.; Chen, B.M. MLFcGAN: Multilevel feature fusion-based conditional GAN for underwater image color correction. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1488–1492. [Google Scholar] [CrossRef] [Green Version]
  27. Papamakarios, G.; Nalisnick, E.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 2021, 22, 2617–2680. [Google Scholar]
  28. Lugmayr, A.; Danelljan, M.; Van Gool, L.; Timofte, R. Srflow: Learning the super-resolution space with normalizing flow. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V 16. Springer International Publishing: Cham, Switzerland, 2020; pp. 715–732. [Google Scholar]
  29. Liang, J.; Lugmayr, A.; Zhang, K.; Danelljan, M.; Van Gool, L.; Timofte, R. Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 4076–4085. [Google Scholar]
  30. Wang, Y.; Wan, R.; Yang, W.; Li, H.; Chau, L.P.; Kot, A. Low-light image enhancement with normalizing flow. Proc. AAAI Conf. Artif. Intell. 2022, 36, 2604–2612. [Google Scholar] [CrossRef]
  31. Li, J.; Bian, S.; Zeng, A.; Wang, C.; Pang, B.; Liu, W.; Lu, C. Human pose regression with residual log-likelihood estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 11025–11034. [Google Scholar]
  32. Wu, L.; Tian, F.; Xia, Y.; Fan, Y.; Qin, T.; Jian-Huang, L.; Liu, T.Y. Learning to teach with dynamic loss functions. In Advances in Neural Information Processing Systems; Cornell University: Ithaca, NY, USA, 2018; Volume 31. [Google Scholar]
  33. Kalman, R.E.; Bucy, R.S. New results in linear filtering and prediction theory. J. Basic Eng. 1961, 83, 95–108. [Google Scholar] [CrossRef] [Green Version]
  34. Cameron, K.W.; Sun, X.H. Quantifying locality effect in data access delay: Memory logP. In Proceedings of the International Parallel and Distributed Processing Symposium, Nice, France, 22–26 April 2003. [Google Scholar]
  35. Lam, M.D.; Rothberg, E.E.; Wolf, M.E. The cache performance and optimizations of blocked algorithms. ACM SIGOPS Oper. Syst. Rev. 1991, 25, 63–74. [Google Scholar] [CrossRef]
  36. Wang, Z.; Liu, X.; Yang, J.; Michailidis, T.; Swanson, S.; Zhao, J. Characterizing and modeling non-volatile memory systems. In Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, 17–21 October 2020; pp. 496–508. [Google Scholar]
  37. Sakr, M.F.; Levitan, S.P.; Chiarulli, D.M.; Horne, B.G.; Giles, C.L. Predicting multiprocessor memory access patterns with learning models. ICML 1997, 97, 305–312. [Google Scholar]
  38. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  39. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  40. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  41. Islam, M.J.; Luo, P.; Sattar, J. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv 2020, arXiv:2002.01155. [Google Scholar]
  42. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  43. Korhonen, J.; You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Proceedings of the 2012 Fourth International Workshop on Quality of Multimedia Experience, Melbourne, VIC, Australia, 5–7 July 2012; pp. 37–38. [Google Scholar]
  44. Yang, M.; Sowmya, A. An underwater color image quality evaluation metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef]
  45. Cai, J.; Zhang, W.; Lin, W.; Liu, S.; Wang, Y. Deep self-example-based super-resolution for underwater images. IEEE Trans. Image Process. 2019, 29, 3599–3612. [Google Scholar]
  46. He, Y.; Lin, J.; Liu, Z.; Wang, H.; Li, L.J.; Han, S. Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–800. [Google Scholar]
Figure 1. Underwater images taken in diverse underwater scenes.
Figure 2. Proposed ACDL: R_A and R_B represent the input and reference RGB images, and we simultaneously extract blocks from the three channels. ⊖ denotes vector subtraction, and normalizing flow refers to the residual log-likelihood estimation loss. The green dashed lines depict the generation process of the positive pair.
Figure 3. Model architecture: Models A and B are equivalent, with model A being the training network during the training phase and model B being the network used during the testing phase. RepBlock contains information from three different scales, and during the testing phase we only need to use one convolution to replace it. Therefore, in the end, we need to use five convolutional networks, instead of the original large network, during the testing phase.
Figure 4. Underwater image enhancement by Shallow-RepNet. Top row: degraded underwater images. Bottom row: the corresponding results of our model. These images are from the underwater image enhancement benchmark dataset (UIEBD) [21], with a size of 112 × 112.
Figure 5. Experimental comparison of underwater images.
Figure 6. Examples of the ablation study are shown from left to right: raw images, enhancements with Shallow-RepNet without ACDL and with L_{vgg}, enhancements with ACDL without L_{vgg}, and enhancements with both ACDL and L_{vgg}. The red rectangles highlight some obvious visual quality improvements.
Table 1. Underwater image enhancement performance metric.
| Metric | Dataset | Water-Net [21] | FUnIE-GAN [22] | Deep SESR [45] | Shallow-UWnet [15] | Shallow-RepNet |
|---|---|---|---|---|---|---|
| PSNR | EUVP-Dark | 24.43 ± 4.64 | 26.19 ± 2.87 | 25.30 ± 2.63 | 27.39 ± 2.70 | 24.49 ± 2.45 |
| PSNR | UFO-120 | 23.12 ± 3.31 | 24.72 ± 2.57 | 26.46 ± 3.13 | 25.20 ± 2.88 | 22.32 ± 2.42 |
| PSNR | UIEBD | 19.11 ± 3.68 | 19.13 ± 3.91 | 19.26 ± 3.56 | 18.99 ± 3.60 | 19.80 ± 2.76 |
| SSIM | EUVP-Dark | 0.82 ± 0.08 | 0.82 ± 0.08 | 0.81 ± 0.07 | 0.83 ± 0.07 | 0.79 ± 0.06 |
| SSIM | UFO-120 | 0.73 ± 0.07 | 0.74 ± 0.06 | 0.78 ± 0.07 | 0.78 ± 0.07 | 0.72 ± 0.07 |
| SSIM | UIEBD | 0.79 ± 0.09 | 0.73 ± 0.11 | 0.73 ± 0.11 | 0.67 ± 0.13 | 0.77 ± 0.08 |
| UIQM | EUVP-Dark | 2.97 ± 0.32 | 2.84 ± 0.46 | 2.95 ± 0.32 | 2.98 ± 0.38 | 2.82 ± 0.29 |
| UIQM | UFO-120 | 2.94 ± 0.38 | 2.88 ± 0.41 | 2.98 ± 0.37 | 2.85 ± 0.37 | 2.98 ± 0.33 |
| UIQM | UIEBD | 3.02 ± 0.34 | 2.99 ± 0.39 | 2.95 ± 0.39 | 2.77 ± 0.43 | 2.79 ± 0.32 |
Table 2. Shallow-RepNet structure ablation experiments on UIEBD.
| Method | SSIM (↑) | PSNR (↑) |
|---|---|---|
| Shallow-UWnet | 0.65 ± 0.09 | 15.81 ± 2.79 |
| Shallow-RepNet+BN | 0.74 ± 0.09 | 18.05 ± 2.73 |
| Shallow-RepNet+BN+L_{acd} | 0.76 ± 0.07 | 19.80 ± 2.93 |
Table 3. Model compression performance metric.
| Model | #Parameters | Compression Ratio | Testing per Image (s) | Speed-Up |
|---|---|---|---|---|
| Shallow-RepNet (Train) | 127,555 | - | - | - |
| Shallow-RepNet (Test) | 114,307 | 0.9 | 0.0039 | 0.7 |
| Shallow-UWnet | 219,840 | 1 | 0.02 | 1 |
| Water-Net | 1,090,668 | 3.96 | 0.50 | 24 |
| Deep SESR | 2,454,023 | 10.17 | 0.16 | 7 |
| FUnIE-GAN | 4,212,707 | 18.17 | 0.18 | 8 |

