Article

DEGAN: Decompose-Enhance-GAN Network for Simultaneous Low-Light Image Lightening and Denoising

1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
2 School of Mathematics and Science, Shanghai Normal University, Shanghai 200234, China
3 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(14), 3038; https://doi.org/10.3390/electronics12143038
Submission received: 30 May 2023 / Revised: 5 July 2023 / Accepted: 6 July 2023 / Published: 11 July 2023
(This article belongs to the Special Issue Recent Advances in Object Detection and Image Processing)

Abstract

Images captured in low-light conditions frequently suffer a significant reduction in quality. Addressing these degradation problems is essential for raising the visual quality of low-light images and for improving the performance of high-level visual tasks. However, because of the inherent information loss in dark images, conventional Retinex-based approaches to low-light image enhancement frequently fail to achieve effective denoising. This paper introduces DEGANet, a deep-learning framework designed specifically for enhancing and denoising low-light images. To overcome these limitations, DEGANet exploits the strength of a Generative Adversarial Network (GAN). Our Retinex-based DEGANet architecture consists of three connected subnets: a Decom-Net, an Enhance-Net, and a GAN. The Decom-Net separates the reflectance and illumination components of the input low-light image. This decomposition enables the Enhance-Net to effectively enhance the illumination component, thereby improving the overall image quality. Denoising low-light images is challenging because of their complicated noise patterns, fluctuating intensities, and intrinsic information loss. By incorporating a GAN into the architecture, DEGANet can effectively denoise and smooth the enhanced image, recover the original information, and fill in missing details, producing an output that is visually pleasing while retaining key features. Through a comprehensive set of experiments, we demonstrate that DEGANet surpasses current state-of-the-art methods in both image enhancement and denoising quality.

1. Introduction

The quality of an image can be significantly degraded by insufficient illumination during capture, leading to issues such as reduced contrast and limited visibility. This, in turn, makes it more difficult to accomplish complex visual tasks such as semantic segmentation [1], object detection, and image recognition. By enhancing low-light images, sharp and visually clear results can be achieved, greatly improving the efficiency of these advanced visual tasks and, consequently, the performance and reliability of intelligent systems in real-world applications, such as autonomous vehicles and other systems that rely on visual information for navigation. The importance of low-light image enhancement therefore cannot be overstated.
Over the past several decades, many techniques, including [1,2,3,4,5,6,7], have been developed to improve images captured in low-light conditions. These methods [8,9] produce enhanced images with better visual quality and have made substantial progress in improving image contrast. However, noise is another significant degrading factor in low-light photographs. Numerous approaches [8,9,10] therefore integrate supplementary denoising steps, applied as pre-processing or refinement, but such techniques may blur or lose original pixel information and can even amplify noise [11].
To overcome these difficulties, we present DEGANet, a novel low-light image enhancement method that exploits both spatial and frequency information [8] to produce enhanced images with good visual quality. Conventional Retinex-based techniques [4,5,7,8,12,13,14,15] have been widely employed for improving low-light images. By decomposing an image into a reflectance component and an illumination component, these techniques enable the independent adjustment of each. Although Retinex-based approaches have achieved some success, they often struggle to achieve true denoising and to restore lost information in dark images because of the inherent information loss. This limitation highlights the need for a more robust and effective solution.
To move beyond the drawbacks of conventional Retinex-based techniques, DEGANet exploits the strength of a Generative Adversarial Network (GAN) [16]. Our proposed architecture consists of three connected subnets: a Decom-Net, an Enhance-Net, and a GAN. Together, these subnets allow considerable improvements in low-light image quality. The Decom-Net separates the reflectance component and the illumination component of the input low-light image. This decomposition is a crucial step, as it enables the Enhance-Net to effectively augment the illumination component, which significantly improves the overall image quality. Denoising low-light images poses a substantial challenge because of the complex noise patterns and non-uniform intensity distributions caused by insufficient light, as well as the inherent information loss in dark images. To address these challenges, we integrate a GAN into our framework. The GAN not only efficiently denoises the enhanced image [10], but also recovers the original information and supplements the missing details, resulting in a visually pleasing output with important details preserved.
The rest of this paper is structured as follows. Section 2 briefly examines the pertinent studies on low-light improvement approaches and GAN denoising. In Section 3, we present the architecture of the proposed DEGANet. Additionally, we discuss the settings of the loss function. The experimental findings are presented in Section 4, and some closing notes are provided in Section 5.

2. Related Work

2.1. Low-Light Image Enhancement Methods

Several strategies for improving the contrast of photographs with low illumination have been developed during the last several decades. These strategies can be broadly divided into two categories: traditional methods and learning-based methods. Traditional methods, including histogram equalization and the Retinex hypothesis, have been the mainstay of contrast enhancement techniques for years.
Histogram equalization, including the brightness-preserving bi-histogram equalization (BBHE) variant of [17], is a simple but efficient approach to increasing contrast by modifying the image's histogram. It has been widely used for its simplicity and effectiveness in many contrast-enhancement scenarios. However, one drawback of this family of methods is that it does not consider the visual perception and contextual information of the image, which may lead to over-enhancement or under-enhancement in some regions.
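To illustrate the basic mechanism (plain global equalization rather than the brightness-preserving BBHE variant of [17]), a minimal NumPy sketch is given below; the function name and the assumption of an 8-bit grayscale input are ours.

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization for an 8-bit grayscale image (illustrative sketch)."""
    hist = np.bincount(gray.ravel(), minlength=256)    # per-intensity pixel counts
    cdf = hist.cumsum() / hist.sum()                   # normalized cumulative distribution
    lut = np.round(255 * cdf).astype(np.uint8)         # map each gray level through the CDF
    return lut[gray]                                   # apply the lookup table pixel-wise
```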
Retinex theory, another traditional approach, has a broader scope and aims to provide more natural enhancement by separating an image into its illumination and reflectance components. In the Retinex model, the illumination component refers to the overall lighting conditions, and the reflectance component to the intrinsic properties of the objects in the scene. Contrast can be increased by adjusting the dynamic range of the illumination component. In practice, however, applying Retinex theory is often challenging because the illumination component is difficult to estimate accurately.
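To make the decomposition concrete, the sketch below applies the Retinex model $S = R \circ I$ in its simplest form: recover reflectance from a given illumination estimate, stretch the illumination's dynamic range with a gamma curve, and recombine. The gamma value and the assumption that an illumination map is already available are illustrative choices, not taken from any particular method discussed here.

```python
import numpy as np

def retinex_enhance(image, illumination, gamma=0.45):
    """Toy Retinex-style enhancement on float images in [0, 1]:
    S = R * I, so R = S / I; brighten I and recombine."""
    eps = 1e-6
    reflectance = image / (illumination + eps)       # recover the reflectance component
    illumination_boosted = illumination ** gamma     # stretch the dynamic range of I
    return np.clip(reflectance * illumination_boosted, 0.0, 1.0)
```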
Inspired by the Retinex theory, numerous approaches have been presented, including MSRCR [12], SRIE [7], MF [15], BIMEF [18], and others. Although these methods enhance image contrast and provide visually pleasing results, they may sometimes fail to handle images with severe illumination variations or noise.
On the other hand, with the advancement of computational power and the availability of large-scale datasets, learning-based models built on deep learning techniques such as CNNs and GANs have achieved remarkable results in many computer vision tasks, including low-light image enhancement. Jiang et al. proposed EnlightenGAN [2], an effective unsupervised generative adversarial network for low-light image enhancement. Fan et al. introduced Mllen-IC [19], a low-light image enhancement model leveraging multiscale features and an illumination constraint. Learning-based models such as MSR-Net [20] and RetinexNet [9] have exploited the power of deep learning to better handle complex lighting conditions and noise. However, these models still have limitations: they rely heavily on the spatial information of low-light images and may fail to fully utilize the frequency information, which can lead to insufficient enhancement and loss of detail.

2.2. Adversarial Generative Network Denoising Methods

In recent years, Generative Adversarial Networks (GANs) [16] have attracted significant attention due to their excellent performance in various image restoration tasks, including denoising [10]. GANs represent a shift from earlier techniques such as K-SVD [21], GMM [22], and K-LLD [23], which traditionally rely on manual feature extraction or separate denoising steps.
GANs comprise two primary components: a generator network and a discriminator network. The generator aims to produce realistic images, while the discriminator strives to distinguish between authentic and generated images. As the training progresses, the two networks continually compete, resulting in the generator eventually producing high-quality images nearly indistinguishable from the real ones.
In the field of denoising, GANs can effectively reduce the noise in images while preserving the crucial details and structure [10]. The generator network generates a denoised version from a noisy input image, while the discriminator contrasts the denoised image with the original, noise-free version. The goal is to minimize the discrepancy between the original noise-free image and the denoised one. However, the choice of loss functions and the balance between the generator and discriminator can impact the denoising performance of GAN, presenting a challenge in the practical application of these models.
A popular approach to utilizing GANs for denoising tasks is to integrate them into an end-to-end trainable framework that features an encoder–decoder architecture as the generator network. The encoder extracts key details and structure from the noisy input image, and the decoder reconstructs the denoised image from the encoded representation. Then, the discriminator network compares the denoised image with the original, noise-free image, thereby offering guidance for the generator network’s optimization.
GANs bring several advantages to denoising tasks. First, they can generate high-quality denoised images that retain crucial details and structures while effectively eliminating noise, thereby outperforming conventional denoising techniques. Second, GANs can be trained on a broad range of noise types and levels, which makes them adaptable to various denoising tasks. Third, GANs can handle complex noise patterns and distributions, effectively mitigating the information loss and artifacts that frequently arise in traditional denoising techniques [21,22,23]. Finally, end-to-end training obviates the need for hand-crafted features or separate denoising steps, streamlining the entire procedure.
Incorporating these strengths of GANs, we propose DEGANet for denoising, delivering a high-quality, adaptable, and robust solution. Our network leverages the capabilities of GANs to denoise images effectively while preserving critical details and structures, making it a valuable tool for image enhancement and restoration.

3. Proposed Method

In this section, we present the architecture of the proposed DEGANet. Additionally, we discuss the settings of the loss function.

3.1. DEGAN Network Architecture

In this work, we address these issues by proposing DEGANet, a deep-learning framework that combines the Retinex theory with modern deep-learning techniques and, notably, exploits both the spatial and the frequency information of poorly lit images. The DEGANet architecture comprises three connected subnetworks: the Decom-Net, the Enhance-Net, and a Generative Adversarial Network (GAN). The Decom-Net decomposes an input low-light image into a reflectance component and an illumination component, which allows the Enhance-Net to effectively enhance the illumination component. By including a GAN in the architecture, DEGANet can effectively denoise and refine the enhanced image while recovering the original information and supplementing missing details. Together, these subnets decompose the poorly illuminated input into its illumination and reflectance components, enhance the illumination component, and then denoise and refine the enhanced image, producing high-quality, visually appealing output. In terms of image enhancement and denoising quality, our technique outperforms current state-of-the-art methods, making it suitable for a wide range of real-time applications.
As shown in Figure 1, DEGANet produces the intermediate results of the pipeline: the illumination and reflectance components from the Decom-Net, the enhanced illumination component from the Enhance-Net, and the denoised output from the GAN. As shown in Figure 2, a low-light image $S_{low}$ is first processed by the Decom-Net, which divides it into two distinct components: the illumination map $I_{low}$ and the reflectance map $R_{low}$. The illumination component $I_{low}$ encapsulates the estimated lighting conditions within the image, while the reflectance component $R_{low}$ preserves the fundamental scene information. These two components are then supplied as inputs to the Enhance-Net, which is specifically designed to enhance the lighting conditions of low-light images and yields an enhanced illumination map $\hat{I}_{low}$. The aim of the Enhance-Net is to adeptly augment the illumination component, which significantly improves the visibility and perceptual quality of low-light images. Combining the enhanced illumination component $\hat{I}_{low}$ and the reflectance component $R_{low}$ by element-wise multiplication yields the enhanced result $S_{enhanced}$, an image with heightened brightness and improved visual appeal. The GAN result $S_{denoised}$ is the final image, enhanced by the Enhance-Net and then denoised by the GAN.
Decom-Net: Retinex-based techniques depend on obtaining accurate illumination and reflectance maps, which greatly influence the subsequent enhancement and denoising processes. Consequently, designing an effective network for decomposing weakly illuminated images is crucial. Residual networks [24] are widely employed in image processing with excellent results: their skip connections ease the optimization of deep neural networks during training without causing vanishing or exploding gradients. Inspired by these advantages, we strengthen the Decom-Net with Residual Modules (RMs). Each RM consists of five convolutional layers with kernel sizes of 1, 3, 3, 3, 1 and 64, 128, 256, 128, 64 kernels, respectively, together with a shortcut connection implemented as a 64-channel 1 × 1 convolutional layer. A 64-channel 3 × 3 convolutional layer is also added before and after each RM.
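A minimal PyTorch sketch of such a Residual Module is given below. The paper specifies the kernel sizes, the kernel counts, and the 64-channel 1 × 1 shortcut; the 64-channel input, the padding, and the ReLU activations are our assumptions.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """Sketch of an RM: five convolutions (kernels 1, 3, 3, 3, 1; channels
    64, 128, 256, 128, 64) plus a 64-channel 1x1 shortcut connection."""
    def __init__(self, in_ch=64):
        super().__init__()
        layers, c = [], in_ch
        for k, out_c in zip([1, 3, 3, 3, 1], [64, 128, 256, 128, 64]):
            layers += [nn.Conv2d(c, out_c, kernel_size=k, padding=k // 2),
                       nn.ReLU(inplace=True)]
            c = out_c
        self.body = nn.Sequential(*layers)
        self.shortcut = nn.Conv2d(in_ch, 64, kernel_size=1)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)
```

Following the description above, a 64-channel 3 × 3 convolution would then be placed before and after each RM inside the Decom-Net.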
The Decom-Net operates on paired images captured under different lighting conditions ($S_{low}$ and $S_{normal}$) simultaneously, learning to decompose both images based on the principle of a shared reflectance component. No explicit ground-truth (GT) decomposition is required during training; instead, the necessary constraints, such as reflectance consistency and illumination smoothness, are imposed through carefully designed loss terms. Notably, the illumination and reflectance components of the normal-light image do not directly contribute to the final enhancement but merely serve as a reference for the decomposition. For a detailed description of the Decom-Net architecture, please refer to Figure 2.
Enhance-Net: The Enhance-Net aims to increase the brightness of the illumination component, producing visually pleasing results. It comprises two modules: the CEM (Contrast Enhancement Module) and the DRM (Detail Reconstruction Module). Drawing inspiration from previous work on image restoration such as [8,25], the Enhance-Net leverages both spatial and frequency features. To mitigate the loss of feature information, the CEM applies multi-scale fusion and concatenation to the outputs of each deconvolution layer in the expanding path, enhancing the image with spatial features while preserving information for contrast enhancement. The DRM, in contrast, operates in the frequency domain: the Fourier transform converts the image from the spatial domain to the frequency domain, where it is treated as a signal, and the inverse Fourier transform brings the processed result back to the spatial domain.
Therefore, the Fourier transform can be used to extract spectral information from the image. High-frequency signals correspond to rapid changes in the image, such as fine details or noise, whereas low-frequency signals reflect smooth, gradual variations, such as background regions. By amplifying the high-frequency signals, a degraded image can be used to reconstruct a crisper picture and recover additional detail.
The Fourier transform produces a matrix with the same dimensions as the original image, whose entries represent the frequency-domain data of the image. Each entry is a complex number $A + jB$, with amplitude $\sqrt{A^{2} + B^{2}}$ and phase angle $\arctan(B/A)$. We build our DRM on the complex convolution approach proposed in [26], which exploits the frequency-domain data for detail reconstruction.
As a result, the amplitude and phase information can be exploited in the frequency domain. The DRM consists of one FIP (Frequency Information Processing) block and two SFSC (Spatial-Frequency-Spatial Conversion) blocks. The primary goal of the SFSC blocks is to consolidate the information flow between the frequency and spatial domains. The first Resblock processes the features in the spatial domain before the fast Fourier transform converts them to the frequency domain; a complex Resblock then handles the frequency-domain data, enabling seamless information transfer between the two domains. The FIP block acts as a high-pass filter for fine-detail reconstruction by boosting the edge outlines of the image. The fast Fourier transform is applied directly to the input image to produce the image-level signal, while the SFSC blocks produce the feature-level signal. Combining the outputs of the DRM and the CEM yields the enhanced illumination component. Note that both the DRM and the CEM have 64 output channels; to reduce the dimensionality, we use a 3 × 3 convolution layer followed by a 1 × 1 convolution layer. The architecture of the Enhance-Net is shown in Figure 3.
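The sketch below illustrates the spirit of an SFSC block: process features spatially, move them to the frequency domain with a fast Fourier transform, filter the real and imaginary parts, and transform back. Replacing the complex convolutions of [26] with an ordinary convolution over the stacked real and imaginary channels, as well as the layer widths, are simplifications of ours.

```python
import torch
import torch.nn as nn

class SFSCBlock(nn.Module):
    """Simplified Spatial-Frequency-Spatial Conversion block."""
    def __init__(self, ch=64):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.freq = nn.Conv2d(2 * ch, 2 * ch, 1)      # stands in for a complex Resblock

    def forward(self, x):
        x = self.spatial(x)                            # spatial-domain processing
        f = torch.fft.fft2(x)                          # spatial -> frequency
        ri = self.freq(torch.cat([f.real, f.imag], dim=1))
        real, imag = torch.chunk(ri, 2, dim=1)
        return torch.fft.ifft2(torch.complex(real, imag)).real   # frequency -> spatial
```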
As described above, the Decom-Net splits a low-light image into the illumination component $I_{low}$ and the reflectance component $R_{low}$, and the Enhance-Net then yields the enhanced illumination component $\hat{I}_{low}$. Combining $\hat{I}_{low}$ and $R_{low}$ by element-wise multiplication gives the final output $S_{enhanced}$, the enhanced image with heightened brightness and improved visual appeal. Its formula is as follows:
$$S_{enhanced} = R_{low} \circ \hat{I}_{low} \quad (1)$$
Figure 1e displays the enhanced result. Meanwhile, the Generative Adversarial Network (GAN) generates the denoised result, complementing the enhanced image and ensuring that the final output is visually appealing while preserving important information from the original low-light image.
Adversarial Generative Network: In this GAN-based denoising framework, the goal is to train a generator that can produce high-quality denoised images while being adversarially challenged by a discriminator.
The training process iteratively updates both the generator and the discriminator using a combination of losses to ensure realistic image generation and accurate denoising. The generator uses a U-Net architecture, which is well suited to image-to-image translation tasks and is capable of preserving spatial information and fine features. During training, the generator is optimized with a mixture of MSE (mean squared error) loss, BCE (binary cross-entropy) loss, and SSIM (Structural Similarity Index Measure [27]) loss: the BCE loss encourages the discriminator to classify generated images as genuine, the SSIM loss enforces structural similarity between generated and ground-truth images, and the MSE loss concentrates on pixel-level accuracy. The discriminator is built from CNNs (convolutional neural networks) and is trained with the BCE loss to distinguish genuine denoised images from the generator's outputs.

The training procedure alternates between updating the discriminator and updating the generator. In each cycle, the discriminator is updated first so that it better discriminates between real and generated images; the generator is then trained to produce images that are more likely to mislead the discriminator. This adversarial process continues until the generator produces denoised images that are indistinguishable from the ground truth. Learning-rate schedulers are applied to both the generator and discriminator optimizers, so that the learning rates are adjusted adaptively during training, which leads to more stable and efficient convergence. Overall, this GAN-based denoising framework leverages the strengths of the U-Net architecture and adversarial training to produce high-quality denoised images that are both visually appealing and structurally accurate. The architecture of the GAN is illustrated in Figure 4.
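A rough sketch of this alternating procedure is shown below. It assumes that a generator, a discriminator with sigmoid outputs, their optimizers, an SSIM-based loss returning 1 - SSIM, and paired noisy/clean batches already exist; the default weights follow the λ7, λ8, and λ9 values reported in Section 4.1. This is an illustration of the training scheme, not the authors' code.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
mse = nn.MSELoss()

def train_step(generator, discriminator, opt_g, opt_d, noisy, clean, ssim_loss, w=(0.1, 0.1, 0.1)):
    """One alternating update. `ssim_loss` is assumed to return 1 - SSIM."""
    # 1) Discriminator: push real images toward 1 and generated images toward 0.
    opt_d.zero_grad()
    fake = generator(noisy).detach()
    d_real, d_fake = discriminator(clean), discriminator(fake)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    opt_d.step()

    # 2) Generator: stay close to the target (SSIM + MSE) while fooling the discriminator (BCE).
    opt_g.zero_grad()
    fake = generator(noisy)
    d_gen = discriminator(fake)
    g_loss = (w[0] * ssim_loss(fake, clean)
              + w[1] * mse(fake, clean)
              + w[2] * bce(d_gen, torch.ones_like(d_gen)))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```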
In our GAN model, the discriminator is a deep convolutional neural network designed to distinguish between real and generated images. It consists of six convolutional layers, each followed by a leaky ReLU activation function except for the last. The leaky ReLU activation mitigates the vanishing-gradient problem that frequently affects deep neural networks. Using a stride of 2 and a kernel size of 4, the convolutional layers gradually increase the number of channels while decreasing the spatial dimensions, which allows the network to extract hierarchical features from the input image efficiently while keeping the computational cost low. Finally, a sigmoid activation function determines whether the input image is real or fake. This discriminator design allows the model to learn and capture complicated patterns in the input images while preserving computational efficiency during training.
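A compact PyTorch version of this discriminator is sketched below; the channel widths are not given in the paper, so the doubling scheme (64 up to 512) is an assumption.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Six stride-2, kernel-4 convolutions; LeakyReLU after all but the last,
    followed by a sigmoid that outputs per-patch real/fake probabilities."""
    def __init__(self, in_ch=3):
        super().__init__()
        layers, c = [], in_ch
        for out_c in [64, 128, 256, 512, 512]:
            layers += [nn.Conv2d(c, out_c, kernel_size=4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = out_c
        layers += [nn.Conv2d(c, 1, kernel_size=4, stride=2, padding=1),  # sixth conv, no LeakyReLU
                   nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```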
The generator adopts a U-Net-based architecture tailored to generating high-quality images. This architecture is chosen for its proven effectiveness in image-to-image translation tasks and its ability to preserve spatial information throughout the network. The U-Net structure consists of an encoding path that captures hierarchical features and a decoding path that reconstructs the output image while maintaining spatial information. In addition, skip connections between corresponding encoding and decoding layers help retain fine-grained details in the generated images. For our task, the U-Net architecture enables the generator to produce high-quality images with preserved spatial information and fine details, making it well suited to producing realistic outputs in GAN applications.
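The generator can be sketched along the same lines; the depth and channel widths below are illustrative assumptions, but the encode/decode structure and the skip connections follow the description above.

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Compact U-Net-style generator: encoder, bottleneck, decoder, skip connections.
    Assumes input height and width are divisible by 4."""
    def __init__(self, in_ch=3, out_ch=3, base=64):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1), nn.ReLU(inplace=True),
                                 nn.Conv2d(co, co, 3, padding=1), nn.ReLU(inplace=True))
        self.enc1, self.enc2 = block(in_ch, base), block(base, base * 2)
        self.down = nn.MaxPool2d(2)
        self.mid = block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = block(base * 2, base)
        self.out = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        m = self.mid(self.down(e2))
        d2 = self.dec2(torch.cat([self.up2(m), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.out(d1))                    # output in [0, 1] (our choice)
```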

3.2. Loss Function

The total loss function comprises three components, each learned separately during the training phase: the decomposition loss $\zeta_{Dc}$, the enhancement loss $\zeta_{Eh}$, and the GAN loss $\zeta_{GAN}$.
Decomposition loss: $\zeta_{Dc}^{con}$ denotes the content loss and $\zeta_{Dc}^{per}$ the perceptual loss. We use the L1 loss as the content loss:
$$\zeta_{Dc}^{con} = \frac{1}{N}\sum_{i=1}^{N}\left\| R_{low}^{i} \circ I_{low}^{i} - S_{low}^{i} \right\|_{1} + \frac{1}{N}\sum_{i=1}^{N}\left\| R_{nor}^{i} \circ I_{nor}^{i} - S_{nor}^{i} \right\|_{1} + \lambda_{1}\,\frac{1}{N}\sum_{i=1}^{N}\left\| R_{nor}^{i} \circ I_{low}^{i} - S_{low}^{i} \right\|_{1} + \lambda_{2}\,\frac{1}{N}\sum_{i=1}^{N}\left\| R_{low}^{i} \circ I_{nor}^{i} - S_{nor}^{i} \right\|_{1} \quad (2)$$
In all the formulas, the index $i \in [1, N]$ denotes the $i$th sample used to compute the loss, and $N$ is the total number of samples. $R_{low}$ and $I_{low}$ denote the reflectance component and the illumination component of a low-light image $S_{low}$, while $R_{nor}$ and $I_{nor}$ denote those of a normal-light image $S_{nor}$. The illumination component encapsulates the estimated lighting conditions within the image, while the reflectance component preserves the fundamental scene information. The perceptual loss is computed from the features of a pre-trained VGG16 model; in contrast to other methods, we use features extracted before the activation layer, as follows:
$$\zeta_{Dc}^{per} = \frac{1}{N}\sum_{i=1}^{N}\left( \frac{1}{C_{j}H_{j}W_{j}}\left\| \phi_{j}\!\left(R_{low}^{i} \circ I_{low}^{i}\right) - \phi_{j}\!\left(S_{low}^{i}\right) \right\| + \frac{1}{C_{j}H_{j}W_{j}}\left\| \phi_{j}\!\left(R_{nor}^{i} \circ I_{nor}^{i}\right) - \phi_{j}\!\left(S_{nor}^{i}\right) \right\| \right) \quad (3)$$
The index $j$ indicates the $j$th layer of the pre-trained VGG16 model and is set to 4 by default. $C_{j}$ (channels), $H_{j}$ (height), and $W_{j}$ (width) describe the feature map of the $j$th layer; for an RGB input image the number of color channels is 3, and height and width are measured in pixels. Their product $C_{j}H_{j}W_{j}$ is the total number of elements in the $j$th-layer feature map and is used to normalize the perceptual loss. The decomposition loss $\zeta_{Dc}$ is expressed as follows:
$$\zeta_{Dc} = \lambda_{3}\,\zeta_{Dc}^{con} + \lambda_{4}\,\zeta_{Dc}^{per} \quad (4)$$
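For reference, the content part of the decomposition loss in Equation (2) can be written directly in PyTorch as in the sketch below; the reflectance and illumination tensors are assumed to be broadcastable to the image shape, and the default weights are the λ1 and λ2 values reported in Section 4.1.

```python
import torch

def decomposition_content_loss(R_low, I_low, S_low, R_nor, I_nor, S_nor,
                               lam1=0.01, lam2=0.01):
    """L1 reconstruction terms of Eq. (2): each reflectance/illumination pair must
    reproduce its source image, plus two cross terms weighted by lambda1, lambda2."""
    l1 = lambda a, b: torch.mean(torch.abs(a - b))
    return (l1(R_low * I_low, S_low)
            + l1(R_nor * I_nor, S_nor)
            + lam1 * l1(R_nor * I_low, S_low)
            + lam2 * l1(R_low * I_nor, S_nor))
```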
Enhancement loss: The enhancement loss consists of a content loss, a perceptual loss, and a detail-preservation (frequency) loss. The content loss and perceptual loss are constructed in the same way as for the decomposition loss:
$$\zeta_{Eh}^{con} = \frac{1}{N}\sum_{i=1}^{N}\left\| S_{low}^{i} - S_{nor}^{i} \right\|_{1} \quad (5)$$
$$\zeta_{Eh}^{per} = \frac{1}{C_{j}H_{j}W_{j}}\left\| \phi_{j}(S_{low}) - \phi_{j}(S_{nor}) \right\|_{2}^{2} \quad (6)$$
Furthermore, to facilitate the recovery of additional details, we incorporate the frequency loss from R2RNet [8] into the Enhance-Net, leveraging the frequency information used within the network. The Wasserstein distance is employed to reduce the disparity between the real and imaginary parts of the frequency components of the enhanced and normal-light images. The frequency loss is expressed as follows:
$$\zeta_{Eh}^{fre} = \frac{1}{2N}\sum_{k \in \{real,\,imag\}} \inf_{\gamma \in \Pi\left(S_{low}^{k},\, S_{nor}^{k}\right)} \mathbb{E}_{(x,y)\sim\gamma}\left[ \left\| S_{low}^{k} - S_{nor}^{k} \right\| \right] \quad (7)$$
In this equation, $N$ represents the total number of samples and $k$ indexes the frequency-domain dimension, which can be either real or imaginary. The symbol $\inf$ denotes the infimum, and $\Pi(S_{low}^{k}, S_{nor}^{k})$ is the set of joint distributions of the frequency components of the low-light and normal-light images. $\mathbb{E}_{(x,y)\sim\gamma}\left[\left\| S_{low}^{k} - S_{nor}^{k}\right\|\right]$ is the expected difference between the frequency components under the joint distribution. The enhancement loss is formulated as
$$\zeta_{Eh} = \zeta_{Eh}^{con} + \lambda_{5}\,\zeta_{Eh}^{per} + \lambda_{6}\,\zeta_{Eh}^{fre} \quad (8)$$
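A simplified sketch of the frequency term in Equation (7) is shown below. It compares the real and imaginary FFT components of the two images and, as a stand-in for the Wasserstein distance over joint distributions, treats each flattened component as a 1-D empirical distribution, for which the Wasserstein-1 distance reduces to an L1 distance between sorted values. This reduction is our simplification, not necessarily the paper's exact implementation.

```python
import torch

def frequency_loss(s_low, s_nor):
    """Approximate frequency loss: 1-D Wasserstein-1 distance between the sorted
    real parts and the sorted imaginary parts of the two images' 2-D FFTs."""
    f_low, f_nor = torch.fft.fft2(s_low), torch.fft.fft2(s_nor)
    loss = 0.0
    for low_k, nor_k in ((f_low.real, f_nor.real), (f_low.imag, f_nor.imag)):
        a, _ = torch.sort(low_k.flatten())
        b, _ = torch.sort(nor_k.flatten())
        loss = loss + torch.mean(torch.abs(a - b))
    return loss / 2
```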
GAN loss: The GAN loss plays a crucial role in the performance of generative adversarial networks and comprises two components that work together to optimize the model. The discriminator loss $\zeta_{GAN}^{di}$ is responsible for the accurate classification of real and fake images, while the generator loss $\zeta_{GAN}^{ge}$ focuses on producing images that closely resemble real ones while maintaining structural and pixel-level accuracy. By carefully balancing these two components, the GAN achieves its goal of generating high-quality, realistic denoised images.
The discriminator loss is composed of a term for classifying real images and a term for classifying fake (generated) images. For both terms, we use the binary cross-entropy loss (BCELoss), as follows:
$$\zeta_{GAN}^{di} = BCE_{loss}\left(D(y_{gt}),\, y_{real}\right) + BCE_{loss}\left(D(y_{gen}),\, y_{fake}\right) \quad (9)$$
In this equation, $y_{gt}$ denotes the ground-truth (noise-free) image, $y_{gen}$ the generated image, $y_{real}$ the label for real images, $y_{fake}$ the label for fake images, and $D(\cdot)$ the classification output of the discriminator.
The generator loss consists of three components: the SSIM loss, the MSE loss, and the BCE loss. The MSE loss maintains pixel-level accuracy, the SSIM loss keeps the output image structurally similar to the reference, and the BCE loss encourages the discriminator to classify the generated image as real. The specific formulas for the three components are as follows:
SSIM loss (Structural Similarity Index Measure Loss):
$$SSIM_{loss}(x, y) = 1 - SSIM(x, y) \quad (10)$$
Here, $x$ and $y$ denote the generated image and the reference image, respectively; the SSIM compares the structural similarity of the two images.
MSE loss (Mean Squared Error Loss):
$$MSE_{loss}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\left( x_{i} - y_{i} \right)^{2} \quad (11)$$
$N$ stands for the total number of pixels, and $x_{i}$ and $y_{i}$ denote the pixel values of the generated image and the reference image, respectively. The MSE loss computes the mean squared error between the generated image and the reference image.
BCE loss (Binary Cross-Entropy Loss):
$$BCE_{loss}(y, t) = -\frac{1}{N}\sum_{i=1}^{N}\left[\, t_{i}\log(y_{i}) + (1 - t_{i})\log(1 - y_{i}) \,\right] \quad (12)$$
Here, the target labels (real or fake) are represented by t and the predicted probabilities of the discriminator are represented by y. The BCE loss calculates the binary cross-entropy between the target labels and the anticipated probabilities. We formulate the generator loss as follows:
$$\zeta_{GAN}^{ge} = \lambda_{7}\cdot SSIM_{loss}\left(y_{gen}, y_{gt}\right) + \lambda_{8}\cdot MSE_{loss}\left(y_{gen}, y_{gt}\right) + \lambda_{9}\cdot BCE_{loss}\left(D(y_{gen}),\, y_{real}\right) \quad (13)$$
The weights of the SSIM loss, MSE loss, and BCE loss in this equation are $\lambda_{7}$, $\lambda_{8}$, and $\lambda_{9}$, respectively.
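Putting Equations (10) to (13) together, the generator loss can be assembled as in the sketch below; `ssim` is assumed to be an external SSIM function returning a similarity in [0, 1], and the default weights follow Section 4.1.

```python
import torch
import torch.nn as nn

bce_loss = nn.BCELoss()
mse_loss = nn.MSELoss()

def generator_loss(d_fake, y_gen, y_gt, ssim, lam7=0.1, lam8=0.1, lam9=0.1):
    """Weighted sum of the SSIM, MSE, and adversarial BCE terms of Eq. (13)."""
    ssim_term = 1.0 - ssim(y_gen, y_gt)                    # Eq. (10)
    mse_term = mse_loss(y_gen, y_gt)                       # Eq. (11)
    adv_term = bce_loss(d_fake, torch.ones_like(d_fake))   # Eq. (12) with "real" labels
    return lam7 * ssim_term + lam8 * mse_term + lam9 * adv_term
```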

4. Experiments

4.1. Implementation Details

Our approach is implemented in PyTorch and runs on an NVIDIA RTX 3080 Ti GPU with an Intel Core i9-12900H CPU. We train for 20 iterations on the LSRW dataset, achieving rapid convergence and obtaining the best results. The Adam optimizer [28] is employed with a learning rate of $10^{-3}$, $\beta_{1} = 0.9$, and $\beta_{2} = 0.999$. The LSRW dataset [8] consists of paired low/normal-light images captured in real scenes; it contains 5650 image pairs captured with a Nikon D7500 camera and a mobile phone. We chose LSRW because it covers a diverse range of scenes and lighting conditions, providing a comprehensive evaluation of our method, and because its high image quality further benefits training.

In Equations (2), (4), (8) and (13), the contribution of each term is weighted by $\lambda_{1}$ to $\lambda_{9}$, which are set to [0.01, 0.01, 0.1, 0.1, 0.01, 0.01, 0.1, 0.1, 0.1]. Regarding the number of decimal places, we balance two factors: sufficient precision to capture small performance variations, and the computational cost that excessive precision would incur. We therefore retain two decimal places. To determine the optimal coefficients, we perform 11 experiments for each parameter, starting from 0.00 and incrementing by 0.01, and evaluate the network's performance for each setting. This fine-tuning ensures that the proposed network achieves the best balance and performance in improving low-light images.

Our system accepts low-light images as input, that is, images taken in very dark environments. In this work, we define "low-light" images as those whose average pixel intensity is below or close to 50 (on a scale from 0 to 255) across the RGB channels; this threshold was set based on the datasets used, our preliminary investigations, and an analysis of various low-light conditions. The system adapts to various image resolutions: we have tested it on images ranging from 640 × 480 to 2304 × 1728 pixels, and it remains effective in enhancing and denoising them. DEGANet is designed to process low-light images, which generally have low contrast and color saturation. We do not apply any pre-processing beyond the standard input normalization used in deep learning. To evaluate performance, we use several metrics, including the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and the no-reference Natural Image Quality Evaluator (NIQE [29]). These metrics provide quantitative assessments of image quality, structural similarity, and perceptual quality, allowing a comprehensive evaluation of the method's effectiveness.
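As a concrete illustration of this training setup, the sketch below instantiates the reported optimizer settings; the generator and discriminator modules are assumed to exist, and the StepLR schedule is our placeholder, since the paper states that learning-rate schedulers are used but not which kind.

```python
import torch

# Adam with the reported hyperparameters (lr = 1e-3, beta1 = 0.9, beta2 = 0.999).
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3, betas=(0.9, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Placeholder schedulers (type and step size are assumptions, not from the paper).
sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=5, gamma=0.5)
sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=5, gamma=0.5)

# Loss weights lambda_1 ... lambda_9 as reported in Section 4.1.
lambdas = [0.01, 0.01, 0.1, 0.1, 0.01, 0.01, 0.1, 0.1, 0.1]
```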

4.2. Ablation Study

In this section, we test the effectiveness of the different components of DEGAN, including its modules and several loss functions. After removing each element, we evaluate the quality of the resulting images using the PSNR and SSIM metrics. The results are shown in Table 1, and Figure 5 provides a visual comparison for the ablation study.
We performed trials by independently removing the GAN, the perceptual loss, the frequency loss, the SSIM loss, and the MSE loss in order to examine the effect of each module and loss setting. Performance degraded whenever the perceptual loss, frequency loss, SSIM loss, or MSE loss was removed. From Figure 5, it can be observed that removing the perceptual loss makes the image details noticeably blurry. Removing the frequency loss greatly reduces visual contrast and impairs the model's ability to restore colors. Removing the SSIM loss likewise degrades the image, whereas the MSE loss is what effectively restores pixel-level information: in Figure 5e, removing the MSE loss leads to increased blurriness and noticeable degradation and distortion. Finally, when the GAN module is omitted, the image produced by the Decom-Net and Enhance-Net alone recovers poorly, owing to the loss of color information and the significant noise present in low-light images; the role of our GAN module is precisely to denoise and comprehensively improve image quality. This can also be observed in Table 2, where each module plays a crucial role in the overall performance. Furthermore, removing the GAN leads to a significant decrease in both PSNR and SSIM, emphasizing the crucial role each component plays in achieving optimal performance.

4.3. Comparison with State-of-the-Art on the Real Datasets

The proposed approach is compared with a number of state-of-the-art methods (MF [15], NPE, SRIE [7], BIMEF [18], MSRCR [12], LIME, RetinexNet [9], DSLR [32], MBLLEN [6], EnlightenGAN [2], Mllen-IC [19], and Zero-DCE [3]) on the following seven public datasets: LOL [9], LSRW [8], LIME [4], DICM [30], NPE [5], MEF [31], and VV. For Zero-DCE, which uses unpaired data, we use its publicly available pre-trained model. We do not apply any pre-processing before the images are fed into DEGANet; the raw low-light images are input directly, with only the standard normalization used in deep learning. The results in Table 3 show how well our technique performs on the LOL dataset: DEGANet achieves the best FSIM score of 0.955 and a PSNR of 19.907 dB. Figure 6 shows a visual comparison on the LOL and LSRW datasets. The traditional method NPE causes under-enhancement, while Retinex-based methods (e.g., LIME, RetinexNet) blur details or amplify noise; our method enhances contrast, preserves details, and suppresses noise. Despite potential cross-domain issues with learning-based methods, our method performs best on the LIME and NPE datasets and comparably on the others. Visual comparisons on the LIME, DICM [30], and MEF [31] datasets are given in Figure 7.
Because certain datasets, such as LIME, NPE, DICM, MEF, and VV, contain only low-light images, evaluation metrics that rely on paired data, such as SSIM and PSNR, cannot be used. We therefore use the NIQE metric for these unpaired datasets; the corresponding results are given in Table 4. EnlightenGAN provides the highest image quality (lowest NIQE score) on the MEF and VV datasets, with scores of 3.577 and 2.582, respectively, while our method excels on the LIME dataset with a score of 3.333.
We also considered quality measures designed specifically for low-light images, among which we chose NLIEE [33]. NLIEE evaluates the effect of low-light image enhancement algorithms (LIEAs) by extracting 18 features covering light enhancement, color comparison, noise measurement, and structure evaluation. On the LIME and NPE datasets, our method outperforms the others, scoring 55.402 and 50.664, respectively. The NLIEE scores of the different methods on the different datasets are reported in Table 5.

4.4. Preprocessing for Improving Face Detection

In recent times, image enhancement has gained importance as a pre-processing step for improving the performance of high-level vision tasks, such as face detection in the dark [34]. We investigate the impact of light enhancement on the DARK FACE dataset, which was constructed specifically for face detection under low-light conditions and contains real-world nighttime low-light photographs. We randomly selected 1000 images from the training set, enhanced them with different enhancement techniques, processed them with a state-of-the-art pre-trained face detector, RetinaFace [35], and validated the accuracy against the corresponding labels. For comparison, we also ran experiments using RetinexNet [9] and MBLLEN [6] as the enhancement step. Visual examples of the face detection results are shown in Figure 8, and the precision of the detections is reported in Table 6. The results confirm that our method greatly improves face detection accuracy compared with applying no pre-processing, and that it also outperforms MBLLEN.

4.5. Limitations and Contributions

Our DEGANet model relies on supervised learning, which necessitates a large quantity of labeled data for training. Obtaining high-quality low-light images and their corresponding noise-free clear images for training can be challenging. Additionally, our model does not perform well when presented with special cases such as non-uniform illumination and complex light sources in low-light images. As depicted in Figure 9, we can observe that enhanced images tend to be overexposed when the original image contains non-uniform illumination and complex light sources. Consequently, many details are lost, leading to subpar image restoration results.
Next, we summarize the main contributions of the proposed method. We propose DEGANet, a deep-learning framework for enhancing and denoising low-light images. DEGANet overcomes the constraints of existing Retinex-based approaches by exploiting the power of a Generative Adversarial Network (GAN), delivering higher performance in both image enhancement and denoising quality. Our Retinex-based DEGANet architecture consists of three connected subnets, the Decom-Net, the Enhance-Net, and a GAN, which collaborate to decompose low-light images, enhance the illumination component, and denoise the enhanced images, resulting in visually pleasing output with crucial features intact. Experimental results on publicly available datasets show that our strategy surpasses current cutting-edge methods, yielding remarkable outcomes in terms of brightness, information recovery, and noise reduction. This improved performance has the potential to greatly increase the effectiveness of complex visual tasks, such as face detection, in numerous real-world applications.

5. Conclusions

In this paper, a novel deep-learning framework called DEGANet is introduced to address the degradation issues in low-light images, which impact their visual quality and hinder high-level visual tasks. Traditional Retinex-based methods struggle with true denoising due to inherent information loss in dark images. DEGANet leverages a Generative Adversarial Network (GAN) to overcome these limitations and significantly improve image quality. The architecture consists of three interconnected subnets: Decom-Net, Enhance-Net, and an Adversarial Generative Network (GAN). Decom-Net decomposes the input low-light image into reflectance and illumination components, allowing Enhance-Net to augment the illumination component effectively. By integrating a GAN into the framework, DEGANet efficiently denoises the enhanced image, recovers the original information, and supplements the missing details. Comprehensive experiments show the superior performance of DEGANet compared to existing state-of-the-art techniques in terms of both image enhancement and denoising quality.
Although our proposed DEGANet is promising, future works towards further improving the performance may focus on the following directions: (1) future work could explore its use in tasks such as image inpainting, super-resolution, and image synthesis; (2) future research can explore other architectures, like transformer-based models or capsule networks, which might provide additional improvements in denoising performance. Further research and discussion on non-uniform illumination are warranted; (3) other promising strategies, such as attention mechanisms or the integration of other generative models, could also be incorporated into the framework to potentially boost the network’s capabilities further.

Author Contributions

Conceptualization, M.J.; Methodology, J.Z.; Software, J.Z.; Validation, R.J.; Formal analysis, R.J.; Investigation, H.S.; Data curation, H.S.; Writing—original draft, J.W.; Writing—review & editing, M.J.; Visualization, J.W. and H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61902198; in part by the Natural Science Foundation of Jiangsu Province under Grant BK20190730; in part by the Research Foundation of Nanjing University of Posts and Telecommunications under Grant NY219135; and in part by the Key Laboratory of Radar Imaging and Microwave Photonics, Ministry of Education, Nanjing University of Aeronautics and Astronautics.

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

The study did not involve humans.

Data Availability Statement

The data presented in this study are not publicly available due to privacy concerns.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this paper:
DEGAN   Decompose-Enhance-GAN Network
GAN     Generative Adversarial Network
RM      Residual Modules
GT      Ground truth
CEM     Contrast Enhancement Module
DRM     Detail Reconstruction Module
FIP     Frequency Information Processing
SFSC    Spatial–Frequency–Spatial Conversion
MSE     Mean Squared Error
BCE     Binary Cross-Entropy
SSIM    Structural Similarity Index Measure
CNNs    Convolutional Neural Networks
PSNR    Peak Signal-to-Noise Ratio
FSIM    Feature SIMilarity Index
NIQE    Natural Image Quality Evaluator

References

  1. Ke, X.; Lin, W.; Chen, G.; Chen, Q.; Qi, X.; Ma, J. EDLLIE-Net: Enhanced Deep Convolutional Networks for Low-Light Image Enhancement. In Proceedings of the 2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC), Beijing, China, 10–12 July 2020; pp. 59–68.
  2. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. EnlightenGAN: Deep Light Enhancement Without Paired Supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349.
  3. Guo, C.; Li, C.; Guo, J.; Loy, C.; Hou, J.; Kwong, S.; Cong, R. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1780–1789.
  4. Guo, X.; Li, Y.; Ling, H. LIME: Low-Light Image Enhancement via Illumination Map Estimation. IEEE Trans. Image Process. 2017, 26, 982–993.
  5. Wang, S.; Zheng, J.; Hu, H.M.; Li, B. Naturalness Preserved Enhancement Algorithm for Non-Uniform Illumination Images. IEEE Trans. Image Process. 2013, 22, 3538–3548.
  6. Feifan, L.; Feng, L.; Wu, J.; Lim, C. MBLLEN: Low-light Image/Video Enhancement Using CNNs. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018.
  7. Cai, J.; Gu, S.; Zhang, L. Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images. IEEE Trans. Image Process. 2018, 27, 2049–2062.
  8. Hai, J.; Xuan, Z.; Yang, R.; Hao, Y.; Zou, F.; Lin, F.; Han, S. R2RNet: Low-light image enhancement via real-low to real-normal network. J. Vis. Commun. Image Represent. 2023, 90, 103712.
  9. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. In Proceedings of the British Machine Vision Conference, British Machine Vision Association, Scotland, UK, 20–24 November 2018.
  10. Lyu, Q.; Guo, M.; Pei, Z. DeGAN: Mixed noise removal via generative adversarial networks. Appl. Soft Comput. 2020, 95, 106478.
  11. Ren, X.; Li, M.; Cheng, W.H.; Liu, J. Joint Enhancement and Denoising Method via Sequential Decomposition. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5.
  12. Jobson, D.; Rahman, Z.; Woodell, G. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 1997, 6, 965–976.
  13. Ren, X.; Yang, W.; Cheng, W.H.; Liu, J. LR3M: Robust Low-Light Enhancement via Low-Rank Regularized Retinex Model. IEEE Trans. Image Process. 2020, 29, 5862–5876.
  14. Singh, K.; Parihar, A.S. A comparative analysis of illumination estimation based Image Enhancement techniques. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–5.
  15. Fu, X.; Zeng, D.; Huang, Y.; Liao, Y.; Ding, X.; Paisley, J. A fusion-based enhancing method for weakly illuminated images. Signal Process. 2016, 129, 82–96.
  16. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661.
  17. Kim, Y.T. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Trans. Consum. Electron. 1997, 43, 1–8.
  18. Ying, Z.; Li, G.; Gao, W. A Bio-Inspired Multi-Exposure Fusion Framework for Low-light Image Enhancement. arXiv 2017, arXiv:1711.00591.
  19. Fan, G.D.; Fan, B.; Gan, M.; Chen, G.Y.; Chen, C.L.P. Multiscale Low-Light Image Enhancement Network With Illumination Constraint. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7403–7417.
  20. Shen, L.; Yue, Z.; Feng, F.; Chen, Q.; Liu, S.; Ma, J. MSR-net: Low-light Image Enhancement Using Deep Convolutional Network. arXiv 2017, arXiv:1711.02488.
  21. Scetbon, M.; Elad, M.; Milanfar, P. Deep K-SVD Denoising. arXiv 2020, arXiv:1909.13164.
  22. Parameswaran, S.; Deledalle, C.A.; Denis, L.; Nguyen, T.Q. Accelerating GMM-Based Patch Priors for Image Restoration: Three Ingredients for a 100× Speed-Up. IEEE Trans. Image Process. 2019, 28, 687–698.
  23. Chatterjee, P.; Milanfar, P. Clustering-Based Denoising With Locally Learned Dictionaries. IEEE Trans. Image Process. 2009, 18, 1438–1451.
  24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  25. Zou, W.; Jiang, M.; Zhang, Y.; Chen, L.; Lu, Z.; Wu, Y. SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring. arXiv 2021, arXiv:2110.05803.
  26. Trabelsi, C.; Bilaniuk, O.; Zhang, Y.; Serdyuk, D.; Subramanian, S.; Santos, J.F.; Mehri, S.; Rostamzadeh, N.; Bengio, Y.; Pal, C.J. Deep complex networks. arXiv 2017, arXiv:1705.09792.
  27. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
  29. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “Completely Blind” Image Quality Analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212.
  30. Lee, C.; Lee, C.; Kim, C.S. Contrast Enhancement Based on Layered Difference Representation of 2D Histograms. IEEE Trans. Image Process. 2013, 22, 5372–5384.
  31. Ma, K.; Zeng, K.; Wang, Z. Perceptual Quality Assessment for Multi-Exposure Image Fusion. IEEE Trans. Image Process. 2015, 24, 3345–3356.
  32. Lim, S.; Kim, W. DSLR: Deep Stacked Laplacian Restorer for Low-Light Image Enhancement. IEEE Trans. Multimed. 2021, 23, 4272–4284.
  33. Zhang, Z.; Sun, W.; Min, X.; Zhu, W.; Wang, T.; Lu, W.; Zhai, G. A No-Reference Evaluation Metric for Low-Light Image Enhancement. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6.
  34. Yang, W.; Yuan, Y.; Ren, W.; Liu, J.; Scheirer, W.J.; Wang, Z.; Zhang, T.; Zhang, Q.; Xie, D.; Pu, S.; et al. Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study. IEEE Trans. Image Process. 2020, 29, 5737–5752.
  35. Serengil, S.I.; Ozpinar, A. LightFace: A Hybrid Deep Face Recognition Framework. In Proceedings of the 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 23–27.
Figure 1. DEGAN produced results for decomposition. The illumination and reflectance components are generated by Decom-Net, the enhanced illumination component by Enhance-Net, and the denoised output by GAN.
Figure 2. The proposed DEGANet architecture. The three subnets of DEGANet work together to decompose low-light images into illumination and reflectance maps, enhance the illumination map, and reduce noise.
Figure 3. The proposed Enhance-Net architecture. The Enhance-Net consists of two modules, the CEM and the DRM: the CEM uses spatial features to improve brightness, while the DRM employs frequency features to preserve image details.
Figure 4. The proposed GAN architecture. A GAN-based denoising framework employing a U-Net architecture in the generator and a deep convolutional neural network in the discriminator, together producing visually appealing and structurally accurate high-quality denoised images.
Figure 5. Ablation study of the contribution of each component.
Figure 6. Visual comparison of the decomposed illumination maps and enhanced results with state-of-the-art low-light image enhancement methods on the LOL and LSRW datasets.
Figure 7. Visual comparison of the decomposed illumination maps and enhanced results with state-of-the-art low-light image enhancement methods on the LIME, DICM [30], and MEF [31] datasets.
Figure 8. Face detection results. As a pre-processing stage, we employ RetinexNet, MBLLEN, and our DEGAN, followed by RetinaFace detection.
Figure 9. This demonstrates the limitations of DEGANet in handling non-uniform illumination and overexposure caused by complex light sources.
Table 1. This table displays the findings of the ablation investigation conducted on the LOL dataset. In this table, “w/o” means without.

| Conditions | PSNR (dB) | SSIM |
|---|---|---|
| Default | 20.001 | 0.755 |
| w/o per-loss | 19.707 | 0.691 |
| w/o fre-loss | 19.883 | 0.655 |
| w/o SSIM loss | 18.165 | 0.654 |
| w/o MSE loss | 17.761 | 0.667 |
| w/o GAN | 17.561 | 0.634 |
Table 2. This table displays the findings of the ablation investigation conducted on the LOL dataset. In this table, “w/o” means without.

| Conditions | PSNR (dB) | SSIM |
|---|---|---|
| Default | 20.001 | 0.755 |
| w/o per-loss | 19.707 | 0.691 |
| w/o fre-loss | 19.883 | 0.655 |
| w/o SSIM loss | 18.165 | 0.654 |
| w/o MSE loss | 17.761 | 0.667 |
| w/o GAN | 17.561 | 0.634 |
Table 3. On the LOL dataset, we conducted a quantitative evaluation of low-light image-enhancing algorithms. Underlining is used to emphasize the best outcomes.

| Methods | PSNR (dB) | SSIM | FSIM | MAE | GMSD |
|---|---|---|---|---|---|
| MF | 16.925 | 0.634 | 0.926 | 0.109 | 0.095 |
| NPE | 17.355 | 0.546 | 0.875 | 0.093 | 0.115 |
| SRIE | 11.557 | 0.531 | 0.887 | 0.220 | 0.103 |
| BIMEF | 13.769 | 0.640 | 0.907 | 0.103 | 0.085 |
| MSRCR | 13.964 | 0.514 | 0.827 | 0.046 | 0.151 |
| LIME | 17.267 | 0.513 | 0.850 | 0.097 | 0.123 |
| RetinexNet | 16.013 | 0.661 | 0.851 | 0.071 | 0.146 |
| DSLR | 15.036 | 0.667 | 0.883 | 0.196 | 0.144 |
| MBLLEN | 18.860 | 0.756 | 0.904 | 0.032 | 0.103 |
| Zero-DCE | 14.584 | 0.610 | 0.911 | 0.161 | 0.087 |
| EnlightenGAN | 17.246 | 0.654 | 0.923 | 0.079 | 0.085 |
| Mllen-IC | 16.454 | 0.677 | 0.925 | 0.081 | 0.101 |
| Ours | 20.001 | 0.755 | 0.955 | 0.087 | 0.087 |
Table 4. NIQE values on five unpaired datasets; the last column reports the average over the datasets. Underlining is used to emphasize the best outcomes.

| Methods | MEF | LIME | NPE | VV | DICM | AVG |
|---|---|---|---|---|---|---|
| NPE | 4.256 | 3.905 | 3.403 | 3.031 | 2.845 | 3.488 |
| LIME | 4.447 | 4.155 | 3.796 | 2.750 | 3.001 | 3.630 |
| RetinexNet | 4.408 | 4.361 | 3.943 | 3.816 | 4.209 | 4.147 |
| MBLLEN | 3.654 | 4.073 | 5.000 | 4.294 | 3.442 | 4.063 |
| Zero-DCE | 4.024 | 3.912 | 3.667 | 3.217 | 2.835 | 3.531 |
| EnlightenGAN | 3.577 | 3.725 | 4.089 | 2.582 | 3.566 | 3.446 |
| Mllen-IC | 4.215 | 3.631 | 3.766 | 3.458 | 2.933 | 3.541 |
| Ours | 4.174 | 3.333 | 3.405 | 3.076 | 3.526 | 3.502 |
Table 5. NLIEE score on different datasets. Underlining is used to emphasize the best outcomes.

| Methods | LOL | LIME | DICM | MEF | NPE | VV |
|---|---|---|---|---|---|---|
| NPE | 39.772 | 45.423 | 45.334 | 39.072 | 37.881 | 42.758 |
| SRIE | 47.354 | 46.785 | 47.379 | 42.347 | 45.369 | 43.354 |
| LIME | 44.532 | 49.753 | 52.347 | 44.532 | 43.679 | 43.743 |
| RetinexNet | 37.438 | 47.377 | 52.732 | 46.377 | 39.735 | 44.343 |
| MBLLEN | 42.395 | 49.577 | 53.254 | 44.879 | 39.898 | 45.764 |
| Zero-DCE | 45.607 | 50.330 | 51.259 | 47.803 | 43.773 | 45.772 |
| EnlightenGAN | 46.806 | 53.766 | 52.364 | 45.806 | 43.227 | 49.228 |
| Mllen-IC | 43.905 | 53.477 | 50.909 | 46.986 | 40.007 | 43.095 |
| Ours | 44.447 | 55.402 | 50.309 | 47.602 | 50.664 | 44.452 |
Table 6. Accuracy of the face detection results. As a pre-processing stage, we employ RetinexNet, MBLLEN, and our DEGAN, followed by RetinaFace detection. “w/o” means without.

| Methods | Accuracy (%) |
|---|---|
| Ours | 47.62 |
| w/o Pre-processing | 31.50 |
| MBLLEN | 43.87 |
| RetinexNet | 51.75 |
