Low-Light Image Enhancement via Retinex-Style Decomposition of Denoised Deep Image Prior

Low-light images commonly result from taking photos in poorly lit environments with inadequate camera equipment, and they suffer from low contrast, color distortion, uneven brightness, and heavy loss of detail. These shortcomings are not only subjectively annoying but also degrade the performance of many computer vision systems. Enhanced low-light images can be better applied to image recognition, object detection, and image segmentation. This paper proposes a novel RetinexDIP method to enhance images. Noise is treated as a component in the image decomposition, which is carried out with deep learning generative strategies. The involvement of noise makes the reconstruction more realistic, weakens the coupling between the three components, avoids overfitting, and improves generalization. Extensive experiments demonstrate that our method outperforms existing methods both qualitatively and quantitatively.


Introduction
With the great breakthroughs of deep learning in computer vision, image processing has been widely applied in many fields, e.g., face recognition [1], defect detection [2], medical image retrieval [3], traffic information systems [4], and text recognition [5]. Image defects can be attributed to uncontrolled factors during image capture, such as insufficient and non-uniform lighting; typical causes include backlighting, underexposure, and night-time conditions. Low-light images are usually noisy, low-contrast, color-distorted, and quality-impaired. These shortcomings not only result in an unpleasant visual experience but also degrade the performance of many computer vision systems, e.g., for image recognition, object detection, and image segmentation.
Image enhancement has a wide range of applications in different fields, e.g., underwater images [6], high-speed railway images [7], and robot vision [8]. In general, there are two ways to improve the image quality. One is to improve the hardware performance of photographic equipment and the other is to process the obtained image. However, the former has disadvantages such as manufacturing difficulties, high cost, and complicated technology. Therefore, in practical applications, improving the quality of low-light images through enhancement algorithms is of great significance. Low-light image enhancement has two main purposes: improving contrast and suppressing noise. The enhanced image is more suitable for human observation and computer vision systems.
Related studies on low-light image enhancement are reviewed below, covering both conventional methods and deep learning methods. Traditional low-light enhancement methods include those based on histogram equalization (HE) and those based on the Retinex model. Histogram equalization adjusts image contrast using the image histogram (BPDHE [9], DHE [10], histogram modification [11]). HE methods may increase the contrast of noise while reducing the contrast of useful signals. In view of these shortcomings, many improved versions have been proposed, e.g., clipped AHE [12], CLAHE [13], CVC [14], and the contrast enhancement algorithm of [15].
Retinex model-based methods decompose low-light images into reflectance and illumination components [16]. Given a low-light image S, it can be decomposed as S = R ∘ I, where S represents the low-light image, R represents the reflectance, I represents the illumination map, and ∘ represents the element-wise product. In addition, many improved versions of Retinex models have been derived from the Retinex theory, including the single-scale Retinex model [17], the multi-scale Retinex model [18], the naturalness preserved enhancement algorithm [19], the fusion-based enhancing method [20], and illumination map estimation [21]. There are also algorithms based on the variational Retinex model, e.g., the variational Retinex model formulated as a quadratic optimization problem [22], a variational framework for Retinex introducing a bright channel [23], the variational Retinex model based on the L2-norm [24], the hybrid L2-Lp variational model with a bright channel prior [25], and the maximum-entropy-based Retinex model [26]. Owing to the computational complexity of variational methods, their main disadvantage is that processing images is time-consuming.
With the development of artificial intelligence, deep learning methods have also been widely used in the field of low-light image enhancement. Lore et al. [27] proposed a method of enhancing natural low-light images using a stacked sparse denoising autoencoder. Tao et al. [28] introduced a CNN method utilizing multi-scale feature maps to perform low-light image enhancement. Ignatov et al. [29] proposed a residual convolutional network that combines composite perceptual error functions of content, color, and texture losses to improve the color and detail sharpness of the image. Shen et al. [30] put forward a convolutional neural network that directly learns an end-to-end mapping between dark and bright images for low-light image enhancement. Gharbi et al. [31] introduced a neural network architecture using input/output image pairs to perform image enhancement in real time on full-resolution images. Wei et al. [32] designed a deep network called Retinex-Net based on the Retinex model, including Decom-Net for decomposition and Enhance-Net for illumination adjustment. Wang et al. [33] proposed a convolutional neural network based on the global prior information generated in an encoder-decoder network to enhance images. Chen et al. [34] presented a fully end-to-end convolutional network for processing low-light images using raw image data. Chen et al. [35] proposed an unpaired learning method for image enhancement based on a bidirectional generative adversarial network (GAN) framework. Zhang et al. [36] constructed an efficient network (KinD) trained on paired images shot under different exposure conditions. Wang et al. [37] proposed a neural network for enhancing underexposed photos by introducing intermediate lighting into the network to correlate the input with the expected enhancement result. Jiang et al. [38] proposed an unsupervised generative adversarial network trained with unpaired images. Yang et al. [39] suggested a semi-supervised learning method for low-light image enhancement based on a deep recursive band network (DRBN). Lv et al. [40] presented an end-to-end lightweight network for non-uniform illumination image enhancement that retains the advantages of the Retinex model and overcomes its limitations. Wang et al. [41] proposed the Deep Lightening Network (DLN), composed of several lightening back-projection (LBP) blocks, to estimate the residual between low-light and normal-light images. Zhu et al. [42] proposed the Edge-Enhanced Multi-Exposure Fusion Network (EEMEFN), which includes a multi-exposure fusion module and an edge enhancement module to enhance extremely low-light images. Liu et al. [43] developed Retinex-inspired Unrolling with Architecture Search (RUAS), where a cooperative architecture search discovers low-light prior architectures from a compact search space and reference-free losses are used to train the network. Li et al. [44] presented a progressive-recursive image enhancement network (PRIEN) that uses a recursive unit to progressively enhance the input image. Zhang et al. [45] proposed learning dynamic fields from a single image for inference and then enforcing temporal consistency.
Fu et al. [46] suggested a novel unsupervised low-light image enhancement network (LE-GAN) based on generative adversarial networks, using unpaired low-light/normal-light images for training. Zhao et al. [47] proposed a unified deep zero-reference framework termed RetinexDIP for enhancing low-light images; however, noise was not considered in the decomposition process. Liu et al. [48] proposed a Retinex-based fast algorithm (RBFA) to achieve low-light image enhancement. Liang et al. [49] proposed a low-light image enhancement model based on deep learning. Li et al. [50] presented a low-light image enhancement method based on a deep symmetric encoder-decoder convolutional network. Han et al. [51] proposed a DIP-based noise-robust super-resolution method. Ai and Kwon [52] used an attention U-Net for extreme low-light image enhancement. Zhao et al. [53] proposed a multi-path interaction network to improve image quality.
In this paper, we propose a novel RetinexDIP method to enhance images. Noise components are introduced into our network, and three components are generated by the DIP networks. The involvement of noise makes the reconstruction more realistic, weakens the coupling relationship between the three components, avoids overfitting, and improves generalization. The illumination map is obtained by iterating and adjusting the input noise, and the enhanced image is then generated based on the Retinex model. Our training process is a zero-reference process and does not require any paired or even unpaired data, similar to existing methods (EnlightenGAN [38], CycleGAN [54], Zero-DCE [55]). The novel RetinexDIP method can be applied to various poorly lit environments and generalizes well. The loss function in this paper is composed of four parts: the reconstruction loss, illumination-consistency loss, reflectance loss, and illumination smoothness loss. The experimental results show that the normal-light images generated by our method are natural and clear, and that the method performs excellently according to both visual observation and objective evaluation indicators. The main contributions of this paper are as follows:
1. We propose a novel noise-added RetinexDIP method to enhance images.
2. Three components are generated by the DIP networks.
3. The zero-reference process avoids the risk of overfitting and improves generalization.
4. The experimental results show that our method significantly outperforms some current state-of-the-art methods.
The rest of the paper is organized as follows. Section 2 details our proposed approach. Section 3 presents the experimental results, and the last section concludes the paper.

Materials and Methods
Given a low-light image S and considering noise, the image can be decomposed as S = R ∘ I + N, where S represents the low-light image, R represents the reflectance, I represents the illumination map, N denotes the noise, and ∘ represents the element-wise product. Adding hand-crafted priors to the components makes them more strongly coupled. The Deep Image Prior (DIP) shows that complex prior knowledge need not be introduced explicitly, as it can be encoded in the structure of the neural network itself [56]. In practical problems, it is difficult to collect pairs of low-light and normal-light images; therefore, generative models are becoming increasingly important.
In this paper, we implement image decomposition based on Retinex theory and generative strategies, taking the noise factor into account. The overall framework of the method is shown in Figure 1. As can be seen from Figure 1, the model contains three encoder-decoder networks (DIP1, DIP2, and DIP3), all of which are convolutional. DIP1 generates the noise N, while DIP2 and DIP3 generate the reflectance R and the latent illumination I, respectively. All three DIP networks take white Gaussian noise as input, with z1, z2, z3 ∼ N(0, σ²), where σ² is the variance of the Gaussian distribution. The noise inputs are obtained by random sampling and have the same size as the image.
Figure 1. Overall framework of our method, where z1, z2, z3 ∼ N(0, σ²) denote Gaussian noise inputs. DIP1 generates the noise N, and DIP2 and DIP3 generate the reflectance R and the latent illumination I. l_rec, l_ic, l_ref, and l_is denote the reconstruction, illumination-consistency, reflectance, and illumination smoothness losses, respectively. I0 is the initial illumination, S0 is the input image, and Ŝ is the enhanced image.
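As an illustrative sketch only (the real DIP1-DIP3 are encoder-decoder CNNs, which are not reproduced here), the following shows how the three Gaussian noise inputs are sampled at the image size and how the three generated components recompose the observation; `toy_generator` is a hypothetical stand-in for a DIP network, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64          # image size
sigma = 0.1            # std of the Gaussian inputs, z ~ N(0, sigma^2)

# Three independent noise inputs, one per DIP network, same size as the image.
z1, z2, z3 = (rng.normal(0.0, sigma, (H, W)) for _ in range(3))

def toy_generator(z, scale):
    """Hypothetical stand-in for an encoder-decoder DIP network: any mapping
    from the noise input to a component map of the same size."""
    return np.clip(0.5 + scale * z, 0.0, 1.0)

N = toy_generator(z1, 0.05)   # noise component (DIP1)
R = toy_generator(z2, 1.0)    # reflectance (DIP2)
I = toy_generator(z3, 1.0)    # illumination (DIP3)

# Retinex-style recomposition with noise: S = R ∘ I + N (element-wise).
S = R * I + N
```

In the actual method, the generator weights (and input noise) are iteratively optimized so that this recomposition matches the observed low-light image.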
To evaluate the quality of the enhanced images, the following four losses are employed to train our model.
Reconstruction Loss. The reconstruction loss is defined as l_rec = ‖S0 − (R ∘ I + N)‖², where N is the noise generated by DIP1 (denoted g_N), R is the latent reflectance generated by DIP2 (denoted g_R), I is the illumination generated by DIP3 (denoted g_I), and S0 is the observed image.
Illumination-consistency Loss. As in [47], we also consider the illumination-consistency loss, defined as l_ic = ‖I − I0‖², where I0 is the initial illumination obtained by I0(p) = max over c ∈ {R, G, B} of S0^c(p) for every pixel p.
Reflectance Loss. In this paper, the reflectance R is constrained by the total variation (TV) constraint [57], l_ref = ‖∇R‖_1, where ∇ denotes the first-order difference operator, containing a horizontal component ∇h and a vertical component ∇v.

Illumination Smoothness Loss. We also use an illumination gradient-weighted TV constraint, defined as l_is = ‖W ∘ ∇I‖_1, where W is the weight matrix. Following the weight strategy in [21], it is set via W = 1/(|∇I0| + ε), where ε is a small positive constant ensuring that the denominator is not 0.
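A minimal sketch of the gradient-weighted TV term and of combining the four losses, assuming the 1/(|∇I0| + ε) weight strategy of [21] applied per gradient direction; the λ defaults below are arbitrary placeholders, not the paper's tuned values.

```python
import numpy as np

def _grad(x, axis):
    # first-order difference, padded so the output keeps the input shape
    last = x[:, -1:] if axis == 1 else x[-1:, :]
    return np.diff(x, axis=axis, append=last)

def smoothness_weights(I0, eps=1e-3):
    """W = 1 / (|∇I0| + eps), one weight map per gradient direction."""
    return (1.0 / (np.abs(_grad(I0, 1)) + eps),
            1.0 / (np.abs(_grad(I0, 0)) + eps))

def illumination_smoothness_loss(I, I0, eps=1e-3):
    """l_is: gradient-weighted TV on the illumination."""
    Wh, Wv = smoothness_weights(I0, eps)
    return float(np.sum(Wh * np.abs(_grad(I, 1)) + Wv * np.abs(_grad(I, 0))))

def total_objective(l_rec, l_ic, l_ref, l_is,
                    lam1=0.1, lam2=0.01, lam3=0.1):
    """l = l_rec + λ1·l_ic + λ2·l_ref + λ3·l_is (placeholder λ values)."""
    return l_rec + lam1 * l_ic + lam2 * l_ref + lam3 * l_is
```

The weighting down-weights smoothness where the initial illumination already has strong edges, so structural edges survive while flat regions are smoothed.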
Combining the four losses, we minimize the objective l = l_rec + λ1 l_ic + λ2 l_ref + λ3 l_is, where λ1, λ2, and λ3 are balance parameters. The enhanced image is composed from the noise N, the reflectance R, and the latent illumination I. Next, enhancement using only the estimated illumination is described. There are two commonly used composition strategies: one removes the illumination component and takes the reflectance as the enhancement result, i.e., Ŝ = S/I; the other adjusts the illumination and reconstructs the result with the reflectance, i.e., Ŝ = Î ∘ R. In this paper, we use a variant of the former strategy, namely Ŝ = S/Î (refer to [21] for details).
We adjust the illumination obtained from the decomposition using the gamma correction Î = I^γ, where γ is the correction factor. To sum up, the enhanced result is given by Ŝ = S/I^γ. The whole operation process is shown in Algorithm 1.
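The final enhancement step can be sketched as follows; the γ value, the clipping, and the small constant guarding the division are illustrative assumptions rather than the paper's tuned settings.

```python
import numpy as np

def enhance(S0, I, gamma=0.6, eps=1e-4):
    """Gamma-correct the estimated illumination and divide it out:
    I_hat = I ** gamma, S_hat = S0 / I_hat (element-wise, per channel)."""
    I_hat = np.clip(I, eps, 1.0) ** gamma
    if S0.ndim == 3:                 # broadcast over RGB channels
        I_hat = I_hat[..., None]
    return np.clip(S0 / I_hat, 0.0, 1.0)

# Example: a uniformly dark image with estimated illumination 0.25 everywhere.
dark = np.full((8, 8, 3), 0.2)
illum = np.full((8, 8), 0.25)
bright = enhance(dark, illum)
assert bright.mean() > dark.mean()   # the output is brighter than the input
```

Since 0 < γ < 1 lifts the illumination toward 1, dividing by the corrected illumination brightens dark regions while leaving well-lit regions nearly unchanged.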

Experiment
In this section, the experimental parameter settings, public low-light image datasets, and performance metrics are introduced. The results of our approach and of the compared methods are also discussed.

Performance Criteria
In this paper, we assess the experimental results using both visual observation and objective evaluation indicators. The following evaluation indicators were used.
Natural Image Quality Evaluator (NIQE). NIQE constructs a series of features for measuring image quality and fits them with a multivariate Gaussian model. In the evaluation process, the distance between the feature model parameters of the image to be evaluated and the pre-established model parameters determines the image quality. A lower NIQE score indicates better preservation of naturalness. For details, refer to [60].
No-reference Image Quality Metric for Contrast Distortion (NIQMC). NIQMC is defined as a simple linear fusion of global and local quality measures [61]. A higher NIQMC score represents better image contrast.

Colorfulness-Based Patch-Based Contrast Quality Index (CPCQI).
CPCQI is a color-based extension of the PCQI metric that evaluates the enhancement between the input and the enhanced output in terms of mean intensity, signal strength, and signal structure components [62]. A larger CPCQI value indicates a higher contrast ratio.
The specific process of our method is shown step by step in Figure 2. First, we evaluate the different methods qualitatively. As shown in Figures 3-6, we select local regions and zoom in on them for intuitive comparison with other methods. The following conclusions can be drawn from Figure 3: the enhancement effect of the NPE, SRIE, and KinD methods is not obvious; the LIME and RetinexDIP methods over-enhance these regions; the result of Zero-DCE has unnatural color; and our method yields natural exposure and clear details. In Figure 4, it can be seen that our method enhances the image with clearly visible edges, while the result of KinD has unnatural color. Figure 5 shows that the proposed method improves contrast without introducing overexposure or artifacts. From Figure 6, it can also be concluded that our method improves contrast effectively while maintaining natural color.
In the following, we compare the proposed method with the other methods quantitatively. The red, green, and blue scores represent the top three results on the corresponding dataset, respectively. Table 1 presents the NIQE metrics of the different methods on the six datasets; notably, a lower NIQE score indicates better preservation of naturalness. Our method achieves the best results on the MEF and VV datasets and the second-best results on LIME and on the average of the six datasets. Table 2 presents the NIQMC metrics of the different methods on the six datasets; a higher NIQMC score represents better image contrast. Our method is in the top three on DICM, LIME, MEF, NPE, VV, and the average of the six datasets. Table 3 presents the CPCQI of the different methods on the six datasets; a larger CPCQI value indicates a higher contrast ratio.
Our method achieves the best results on DICM, Fusion, NPE, and the average of the six datasets, and also performs well on several other datasets.
As shown in Table 4, the runtimes of the different methods were compared. In the experiment, we compared the runtimes of three traditional methods (LIME, NPE, SRIE) and three deep learning methods (KinD, Zero-DCE, RetinexDIP) with that of our method for eight different input image sizes. Compared with NPE, SRIE, and RetinexDIP, our method is more efficient on high-resolution images. Unlike traditional methods such as NPE and SRIE, the proposed method uses DIP networks to compute the reflectance and illumination. Benefiting from the convolutional structure, the runtime of the DIP model changes very little as the image resolution grows. Compared with RetinexDIP, the proposed method converges faster and requires less runtime due to the consideration of noise. Compared with Zero-DCE and KinD, our method also saves memory: Zero-DCE and KinD are pixel-wise methods, whereas the proposed method is based on the Retinex decomposition, so its memory footprint does not depend directly on the actual image resolution and does not increase significantly as the resolution grows.

Conclusions
In this paper, we propose a novel low-light image enhancement method via Retinex-style decomposition of a denoised Deep Image Prior. Noise is considered in the image decomposition using deep learning generative strategies. For comparison, we also consider six other methods, i.e., LIME, NPE, SRIE, KinD, Zero-DCE, and RetinexDIP. Extensive experiments demonstrate that our method outperforms existing methods qualitatively and quantitatively. Unlike some other learning-based methods, the method proposed in this paper is a no-reference method: only the input images are required, without any extra data. Taking the reflection noise into consideration, our experiments show that the denoised Deep Image Prior can produce images with less noise.
In real scenes, noise always conforms to some scene-dependent distribution, such as the Poisson distribution. In future work, other approaches such as normalizing flows will be considered to simulate a more realistic noise distribution than that assumed by DIP.