Article

Joint Luminance Adjustment and Color Correction for Low-Light Image Enhancement Network

1 Academy of Broadcasting Science, National Radio and Television Administration, Beijing 100866, China
2 School of Information and Communication Engineering, Communication University of China, Beijing 100024, China
3 School of Computer and Cyber Sciences, Communication University of China, Beijing 100024, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 6320; https://doi.org/10.3390/app14146320
Submission received: 15 May 2024 / Revised: 16 July 2024 / Accepted: 16 July 2024 / Published: 19 July 2024
(This article belongs to the Special Issue Recent Advances in Image Processing)

Abstract

Most of the existing low-light enhancement research focuses on global illumination enhancement while ignoring the issues of brightness unevenness and color distortion. To address this dilemma, we propose a low-light image enhancement method that can achieve good performance in luminance adjustment and color correction simultaneously. Specifically, the Luminance Adjustment Module is designed to model the global luminance adjustment parameters while taking into account the relationship between global and local illumination features, in order to prevent overexposure or underexposure. Furthermore, we design a Color Correction Module based on the attention mechanism, which utilizes the attention mechanism to capture global color features and correct the color deviation in the illumination-enhanced image. Additionally, we design a color loss function based on a 14-dimensional statistical feature vector related to color, enabling further restoration of the image’s true color. We conduct empirical studies on multiple public low-light datasets, demonstrating that the proposed method outperforms other representative state-of-the-art models regarding illumination enhancement and color correction.

1. Introduction

Insufficient illumination, environmental factors, or technical limitations of capture equipment often result in low-light images. The captured images and videos suffer from problems such as inadequate brightness, low contrast, and color distortion. However, as society progresses and lifestyles evolve, it is increasingly necessary to capture images and videos in scenes with insufficient lighting. For example, people often record their travels at night. Urban surveillance systems may need to search for persons, track vehicles, or collect evidence of illegal and criminal activities, especially when lighting is insufficient or at night. Additionally, most images and videos shot in special scenes such as underwater or in mines suffer from inadequate lighting. As one of the primary image quality enhancement problems, low-light image enhancement technology aims to improve the visual quality of such low-light images and restore lost image information. This technology has been widely used in medical image processing, autonomous driving, industrial inspection, and other fields, thereby attracting the attention of researchers. Low-light image enhancement algorithms can be divided into two categories: methods based on traditional techniques and methods based on deep learning.
Low-light image enhancement methods based on traditional techniques generally include the following approaches: histogram equalization algorithms, transmittance estimation-based algorithms, Retinex decomposition-based algorithms, and fusion-based algorithms. Histogram equalization increases contrast and highlights some details by expanding the coverage of grayscale values, but excessive enhancement causes the loss of image details. Local histogram equalization [1] considers the local characteristics of the image and applies histogram equalization to different areas to preserve image details. Dynamic histogram equalization [2] segments the image histogram according to local minima, assigns a specific gray level range to each segment, and performs equalization separately, reducing the additional noise introduced during contrast enhancement. Enhancing low-light images via transmittance estimation is inspired by the image dehazing task [3]. Li et al. refined the transmittance and optimized the atmospheric light value, which handled noise and edge problems well and provided good theoretical support for low-light image enhancement [4]. Retinex theory decomposes images into reflectance and illumination components to achieve a stable perception of color. The single-scale Retinex (SSR) algorithm [5] generally uses Gaussian filtering to estimate the low-frequency illumination component, from which the reflectance component is then computed for subsequent processing. MSR [6] builds on the SSR method by applying Gaussian filter operators at multiple scales and then weighting the results. Based on Retinex theory, researchers have also explored solutions combined with other low-light enhancement methods. For example, Guo et al. proposed a low-light illumination estimation algorithm that models illumination by taking the maximum value among the three RGB channels to estimate the illumination of each pixel individually, forming an initial illumination map [7]; an illumination estimation coefficient is then used to adjust the illumination of each pixel to produce the final illumination map. Ren et al. focused on noise and designed a Low-Rank Regularized Retinex Model that adds a low-rank prior to the Retinex decomposition process to suppress noise in the reflectance image [8]. This method has led to great improvements in both image and video enhancement.
The above-mentioned traditional techniques generally require considerable mathematical skill and rigorous derivation, and the resulting iterative procedures are often complex, which limits their practical application. With the emergence of large-scale datasets, low-light image enhancement based on deep learning has become the mainstream approach. Supervised image enhancement methods use paired training data and leverage neural networks to learn the mapping from low-light images to enhanced images. Deep learning enhancement methods based on Retinex theory combine the classic Retinex image decomposition theory with deep learning to learn the mapping from the original image to the Retinex components, thereby achieving image enhancement. Wei et al. collected a low-light image dataset called LOL (Low-Light Dataset) and proposed the RetinexNet network on the LOL dataset [9]. The network comprises a Decom-Net for decomposition and an Enhance-Net for illumination adjustment. In low-light image enhancement, simply increasing brightness amplifies the artifacts hidden in the image. LLDiffusion [10] proposed a joint learning framework of image generation and image enhancement to learn degradation representations and developed a dynamic diffusion module that simultaneously considers color mapping and the latent degradation representations to guide the diffusion process. The PyDiff algorithm [11] proposes a pyramid diffusion model that samples in a pyramid manner by gradually increasing the resolution during the reverse diffusion process and uses a global corrector to alleviate the global degradation problem. Unsupervised methods do not require paired datasets, which significantly improves research efficiency. Ni et al. proposed a cycle-interactive generative adversarial network for unsupervised low-light image enhancement, which better transfers the illumination distribution between low-light images and normal-light images [12]; the cyclic interaction process effectively suppresses the realistic noise introduced during synthesis. Methods based on curve fitting have also achieved good results in low-light enhancement tasks. Guo et al. proposed a lightweight deep network called Zero-DCE, which iteratively applies pixel-wise, high-order curves to map the input within a wide dynamic range and enhance low-light images [13]. Li et al. proposed the Zero-DCE++ algorithm [14] based on Zero-DCE: depthwise separable convolutions replace the traditional convolutions, and the input image is downsampled before curve estimation and then upsampled to restore the enhanced image. The improved model has significantly fewer parameters and computations, which improves its inference speed.
Although many methods improve the quality of low-light images, the enhanced images still fall short of normal-light images. On the one hand, the brightness of the captured image is low or uneven due to insufficient or uneven lighting in the shooting environment. Uneven brightness leads to significant differences between different areas of the image, with some regions being brighter and others extremely dark, which poses a greater challenge to brightness adjustment. On the other hand, in low-light conditions, the amount of incident light may not be enough to accurately capture the color and detail of a scene or object, and the white balance may be affected, resulting in color deviations. For example, images taken under yellow light may show a yellow color cast. These factors can all lead to desaturated colors, insufficient contrast, and overall color distortion. Finally, cameras typically use longer exposure times in low-light environments, which easily causes blurred image content and unclear object edge structures. Reducing content blur while adjusting image brightness is therefore another research difficulty in low-light image enhancement.
To address the issues of low brightness, uneven brightness, and color distortion in low-light images, this paper proposes a novel network with joint luminance adjustment and color correction for low-light image enhancement, called LACCNet. Considering that inadequate illumination leads to comprehensive quality problems such as low brightness, uneven brightness, and blurry content in images, it is challenging to improve the overall image quality by solving only one problem. Therefore, this paper implements low-light image enhancement from two aspects: brightness adjustment and color correction. The main contributions of this paper are summarized as follows.
First, to address the problem of uneven brightness, we designed a Luminance Adjustment Module. This Luminance Adjustment Module simulates the idea of color space image transformation, models adaptive contrast and brightness adjustment curves, and employs a non-overlapping window attention mechanism to model the relationship between global and local brightness features. This module reduces the extent to which parts of the image become overly bright or dark after enhancement.
Second, to address the color distortion problem in enhanced images, we designed a Color Correction Module. This module leverages the attention mechanism to capture global color features and act on the image features after brightness adjustment. This module can correct color deviations in images to obtain colors that are more accurate and closer to the real ones.
Third, by introducing color-based statistical information as prior knowledge, we also designed a color loss function. The color loss function can further correct the color of the enhanced low-light image and improve the color quality.

2. Related Work

In this section, we discuss existing research work related to luminance adjustment and color correction in low-light image enhancement tasks.

2.1. Luminance Adjustment in Low-Light Image Enhancement

Luminance adjustment is the most important task in low-light image enhancement. Some methods exploit the structural characteristics of the generative network itself to directly enhance image lighting. Given that enhanced images are prone to overexposure or underexposure under different lighting conditions, Wu et al. proposed the Retinex-based deep unfolding network URetinex-Net, which combines an implicit prior regularization model with Retinex theory to effectively suppress noise and retain more feature details [15]. Diffusion-based low-light enhancement methods improve image quality by modeling the enhancement process with diffusion models. ExposureDiffusion [16] addresses the noise problem by integrating the diffusion process with a physics-based exposure model and proposes an adaptive residual layer that applies dynamic denoising strategies to areas with different signal-to-noise ratios. Some luminance adjustments are accomplished through adaptive brightness curves. Wen et al. proposed a self-reference deep adaptive curve estimation method (Self-DACE), which uses a neural network to map each pixel through a deep adaptive adjustment curve while preserving the local image structure [17]. For more accurate luminance adjustment, some algorithms design dedicated luminance adjustment components. Zhang et al. were inspired by Retinex and proposed the KinD network [18], which consists of two components: one adjusts the image brightness, and the other removes image degradation. Based on KinD, Zhang et al. added a mapping function and proposed the KinD++ algorithm to make the enhanced image more realistic [19].

2.2. Color Correction in Low-Light Image Enhancement

Previous researchers have done some work to address the color distortion problem in low-light image enhancement. For example, tone-mapping-based low-light enhancement algorithms adjust pixel colors while enhancing illumination, bringing the result closer to the real, well-lit image. The tone-mapping algorithm achieves contrast enhancement by processing each pixel together with its neighboring pixels, so the enhanced image retains better detail information. MSRCR [20] adds color restoration on top of MSR and can achieve dynamic range compression, color consistency, and brightness correction simultaneously to further improve the enhancement effect. Wang et al. proposed the LLNeRF algorithm to directly synthesize normal-light images from sRGB low-light images in an unsupervised manner, using a neural radiance field decomposition method to reduce noise and correct distorted color information [21]. In order to address unpredictable brightness degradation and noise during low-light image enhancement, Yang et al. proposed NeRCo, an implicit Neural Representation method for Cooperative low-light image enhancement [22]. It robustly recovers perceptually friendly results in an unsupervised manner and unifies the diverse degradation factors of real-world scenes with a controllable fitting function, leading to better robustness.

3. Method

3.1. Framework

Firstly, we introduce the overall network structure of our proposed LACCNet, as shown in Figure 1. The low-light image enhancement task is essentially an image generation task, so we adopt a UNet-like structure as the base network. The encoder is mainly composed of three Basic Encoder Blocks (BEBs) and three Luminance Adjustment Modules (LAMs). The BEBs extract features from the input image. Each LAM includes one Attention Luminance Adjustment Module (ALAM) and two Convolution Luminance Adjustment Modules (CLAMs). The Luminance Adjustment Module can adaptively adjust the image illumination and obtain luminance-adjusted image feature information. The neck consists of two Middle Blocks (MBs) that do not change the shape and size of the feature maps. The Decoder Module is mainly responsible for multi-scale decoding of the image features extracted by the encoder and for generating the enhanced image content. Considering that the content of the input image is blurred, directly adding the features extracted by the encoder to the decoder features would leave the generated image blurred. Inspired by LEDNet, we apply the Deblur Decoder Block (DDB), which includes a content deblurring module based on dynamic filters [23], to obtain a brightness-enhanced image with clearer content. A residual connection, represented by a blue line, links each pair of Encoder and Decoder modules operating at the same scale. Residual connections both improve the stability of the network and help it better learn the complex nonlinear features of images. We also design a Color Correction Module (CCM), which includes a Color Feature Extractor Block (CFEB), a Self-Attention Block (SAB), and a Learnable Color Matrix. The global color features of the shallow feature map of the input image are captured through the transformer structure, and color correction is performed before outputting the final enhanced image. In addition, on the basis of content loss and perceptual loss, we introduce a color loss function to strengthen the network's ability to reconstruct image color information and further improve the visual quality of the enhanced image.
The structure of the Basic Encoder Block, Deblur Decoder Block, and Middle Block is shown in Figure 2. These blocks are mainly composed of the superposition of convolution and activation functions. The residual connection can prevent gradient vanishing and improve network performance. In the Basic Encoder Block, the downsampling of features is performed through convolution with a stride of 2. The Deblur Decoder Block adopts an interpolation operation for feature upsampling. In the Deblur Decoder Block, we use PReLU as the activation function, which can better prevent gradient vanishing problems and improve network stability compared to the ReLU function. Below, we provide a detailed introduction to the design of the Luminance Adjustment Module, Color Correction Module, and loss function in separate sections.
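To make the block designs above concrete, the following is a minimal PyTorch sketch of a Basic Encoder Block and a Deblur Decoder Block. Channel counts, kernel sizes, and layer ordering are our assumptions, and the dynamic-filter deblurring component of the DDB is omitted; Figure 2 defines the actual structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicEncoderBlock(nn.Module):
    """Stride-2 convolution for downsampling, then stacked convolutions with a residual connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)  # downsample by 2
        self.body = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, x):
        x = self.down(x)
        return x + self.body(x)  # residual connection helps prevent vanishing gradients

class DeblurDecoderBlock(nn.Module):
    """Interpolation upsampling followed by convolutions with PReLU activations and a residual connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.PReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.PReLU(),
        )

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        x = self.reduce(x)
        return x + self.body(x)

# example: encode then decode a feature map
feat = BasicEncoderBlock(3, 32)(torch.rand(1, 3, 256, 256))   # (1, 32, 128, 128)
out = DeblurDecoderBlock(32, 16)(feat)                        # (1, 16, 256, 256)
```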

3.2. Luminance Adjustment Module

The goal of the Luminance Adjustment Module is to adjust the brightness of low-light images in order to obtain enhanced images. The brightness adjustment principle is designed around the perceptual characteristics of the human eye. The human eye's perception of light is nonlinear, meaning that the same brightness change is perceived differently at different brightness levels. Therefore, brightness adjustment needs to be achieved with a nonlinear method. We propose to simulate a nonlinear color space image transformation, improving image brightness by modeling adaptive contrast and brightness adjustment parameters. In addition, low-light images not only have low brightness but also exhibit uneven brightness. For images with uneven brightness, directly enhancing the overall brightness to the same extent produces local over-brightness or over-darkness. Adjusting such images so that the enhanced result has consistent overall brightness requires a larger receptive field during learning, in order to obtain more comprehensive brightness distribution information. Therefore, to achieve consistent brightness adjustment, a convolutional brightness adjustment module is added to the image encoding stage, composed of convolutional layers and a nonlinear brightness adjustment module. This module models global brightness adjustment parameters, thereby guiding the model to perform better brightness enhancement. On the other hand, many parts of an image contain light sources or shadows. If the same brightness adjustment is applied to all areas, the image will suffer from local over-brightness or over-darkness. We therefore also design a brightness adjustment module based on non-overlapping window attention [24] to avoid uneven brightness after adjustment. The attention mechanism weights and sums the patches of each window so that the model can focus on important regions and details in the image while also attending to the relationship between global and local brightness features, capturing the global brightness distribution. Using this mechanism to model the relationship between global and local brightness features, we adjust locally over-dark or over-bright areas and thereby reduce the degree of local over-brightness or over-darkness in the enhanced image. The structure of the Luminance Adjustment Module is shown in Figure 3.
As shown in Figure 3, we designed two types of Luminance Adjustment Modules: the Attention Luminance Adjustment Module (ALAM) and the Convolution Luminance Adjustment Module (CLAM). In ALAM, the illumination is adjusted using a Window Attention Block (WAB) based on non-overlapping sliding windows, where q is a learnable query, while k and v are simple copies of the output of the upper layer. The WAB can capture contextual information in the image, model the relationship between windows, and capture the relationship between global illumination features and the local illumination of each window, thereby producing an adaptively illumination-enhanced feature map. The Nonlinear Luminance Adjustment Block (NLAB) is a component shared by ALAM and CLAM. The NLAB simulates the image transformation process of gamma brightness correction, where $\gamma_1$ and $\gamma_2$ are two learnable matrix parameters whose elements are all initialized to 1.
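As an illustration of the NLAB, the sketch below applies a gamma-style correction with two learnable parameter tensors initialized to 1. The exact form in which $\gamma_1$ and $\gamma_2$ enter the adjustment curve is our assumption; the paper specifies only that both are learnable and initialized to 1.

```python
import torch
import torch.nn as nn

class NLAB(nn.Module):
    """Nonlinear luminance adjustment: a learnable gain and a learnable exponent per channel."""
    def __init__(self, channels):
        super().__init__()
        # both parameter tensors are initialized with all elements equal to 1
        self.gamma1 = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.gamma2 = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        x = torch.clamp(x, min=1e-6)                     # keep the power operation well defined
        return self.gamma1 * torch.pow(x, self.gamma2)   # gamma-style brightness curve

# usage: adjust a (B, C, H, W) feature map
out = NLAB(32)(torch.rand(2, 32, 64, 64))
```

At initialization this block is an identity mapping, so training starts from the unmodified features and gradually learns the brightness curve.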

3.3. Color Correction Module

Low-light images that suffer from low and uneven brightness are often accompanied by color distortion. Poor lighting in the shooting environment leads to insufficient light reaching the camera sensor, ultimately resulting in incomplete or distorted color information in the captured image. On the other hand, low-light enhancement algorithms usually perform light compensation on images to improve their quality under low-light conditions. If the light compensation is inappropriate, the image's color may appear unnatural or inaccurate, exacerbating color distortion. If the brightness issue is solved without addressing color distortion, the enhanced image cannot meet the human visual demand for high-quality images. Therefore, while enhancing the brightness of low-light images, color correction is also necessary to improve the overall quality of the enhanced image. We have designed a Color Correction Module based on the attention mechanism, as shown in Figure 4. It captures global color features using the attention mechanism and applies them to the image features after illumination adjustment to correct color deviation and obtain more accurate and realistic colors. Firstly, in the CFEB, two shallow feature encoders extract shallow features of the image through convolutional operations, which reduces computational complexity and provides global color features of the image. This is followed by an SAB, whose main operation is the self-attention mechanism. In the Self-Attention Block, the query vector q is initialized as an all-zero matrix, and no additional multi-head attention is required. Here, q is a learnable embedding query, while the key k and value v are generated from the encoded features; the positional encoding of the original transformer is not added. After a linear fully connected layer, we add a learnable color matrix as an additional parameter. The color matrix is initialized with all elements equal to 1 and is finally applied to the enhanced image generated by the decoder for final color correction.
Low-light images exhibit incomplete color information due to insufficient light, so enhanced images are often accompanied by color distortion. The goal of a color correction algorithm is to improve color accuracy and restore the image by correcting its color, so that the image represents the original scene more realistically, clearly, and accurately. The principle of image color correction is to adjust the pixel values in the image and change their color distribution. Compared with linear algorithms, nonlinear color correction algorithms restore complex scenes and images with uneven colors more faithfully. A commonly used nonlinear color correction tool is the color correction matrix, which multiplies the RGB color values captured by the sensor by a 3 × 3 coefficient matrix to obtain colors close to what the human eye actually sees. It can be adjusted and optimized according to the color characteristics and needs of the image to achieve better visual effects and user experience. The color correction matrix is shown in Equation (1):
$$\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}, \tag{1}$$
where R, G, and B represent the three color channel values of each pixel of the original image, and R′, G′, and B′ are the corresponding corrected values.
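The sketch below illustrates Equation (1) as a learnable per-pixel 3 × 3 matrix multiply over the RGB channels, in the spirit of the Learnable Color Matrix described above. The identity initialization shown here is only a neutral illustrative choice, not the paper's initialization.

```python
import torch
import torch.nn as nn

class LearnableColorMatrix(nn.Module):
    """Applies a learnable 3x3 color correction matrix to every pixel of an RGB image."""
    def __init__(self):
        super().__init__()
        self.ccm = nn.Parameter(torch.eye(3))  # neutral (identity) start, for illustration only

    def forward(self, img):
        # img: (B, 3, H, W); computes [R' G' B']^T = CCM @ [R G B]^T at each pixel
        return torch.einsum('oc,bchw->bohw', self.ccm, img)

corrected = LearnableColorMatrix()(torch.rand(1, 3, 256, 256))
```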

3.4. Loss Functions

In image quality enhancement, content loss and perceptual loss are the most commonly used loss functions. Content loss focuses on the similarity between the corresponding pixels in the two images, and perceived loss focuses on the semantic similarity of features. Both of these loss functions constrain the overall content of the image. However, many existing low-light enhancement methods often generate enhanced images with issues such as color distortion and weak contrast. Color features are volatile visual features that are easily affected by lighting changes. Inspired by the theory of color constancy, eliminating the influence of lighting on image color and obtaining color attributes on object surfaces that are independent of lighting can provide computer vision systems with a perception function of color constancy similar to that of human vision systems. This article designs a color loss function that helps automatically adjust the image’s contrast to the optimal state while performing color correction. Specifically, we extract color-related features from the image, including color and lighting, to form a 14-dimensional feature vector.
The color moment feature globally reflects the color distribution of an image. Compared with the commonly used color histogram, it does not require color quantization and has the advantage of being simple and effective. The first-order moment of the image color distribution is the mean, which reflects the overall brightness of the image: the larger the value, the brighter the image. The second-order moment is the standard deviation, which reflects the color distribution range of the image: the larger the value, the wider the color distribution. The third-order moment is the skewness, the cube root of the third central moment, which reflects the asymmetry of the image color distribution. In the RGB color space, the color moments of an image are calculated as in Equations (2)–(4):
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} p_{i,j} \tag{2}$$
$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N} \left(p_{i,j} - \mu_i\right)^2\right)^{\frac{1}{2}} \tag{3}$$
$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N} \left(p_{i,j} - \mu_i\right)^3\right)^{\frac{1}{3}}, \tag{4}$$
where $p_{i,j}$ represents the $i$-th color component of the $j$-th pixel of the color image, and $N$ represents the number of pixels in the image.
Color cast indices can more accurately reflect the distribution characteristics of image colors. The calculation should be performed in a color space with independent luminance and chrominance components; we choose the RLAB color space. First, calculate the means $\mu_{a^R}, \mu_{b^R}$ and variances $\sigma_{a^R}^2, \sigma_{b^R}^2$ of the two chrominance components $a^R$ and $b^R$ in the RLAB color space. Then, calculate the three color cast indices $\sigma$, $D$, and $D_\sigma$ as in Equations (5)–(7):
$$\sigma = \sqrt{\sigma_{a^R}^2 + \sigma_{b^R}^2} \tag{5}$$
$$D = \mu - \sigma \tag{6}$$
$$D_\sigma = \frac{D}{\sigma}, \tag{7}$$
where $\mu = \sqrt{\mu_{a^R}^2 + \mu_{b^R}^2}$.
Chroma reflects the depth of color and corresponds to a component of the RLAB color space in cylindrical coordinates. The higher the chroma value, the more intense the color. The average chroma is calculated as in Equation (8):
$$\mu_{c^R} = \frac{1}{N}\sum_{x,y} c^R(x,y), \tag{8}$$
where $c^R = \sqrt{(a^R)^2 + (b^R)^2}$.
The average lighting value reflects the overall brightness of the image. We calculate it in the RLAB perceptual color space by averaging its lightness component $L^R$, as in Equation (9):
$$\mu_{L^R} = \frac{1}{N}\sum_{x,y} L^R(x,y). \tag{9}$$
Then, we concatenate these color-related features into a 14-dimensional vector C, which serves as the guide feature for the color loss. C consists of the 9-dimensional color moment vector, the 3-dimensional color cast indices, the 1-dimensional average lighting value, and the 1-dimensional average chroma value. The color loss function is defined as in Equation (10):
$$L_{color} = \left\lVert C(X) - C(Y) \right\rVert_2^2, \tag{10}$$
where X represents the enhanced image generated by the network, and Y represents the normal illumination image.
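As a hedged sketch of this loss, the code below computes the nine RGB color moments of Equations (2)–(4) and compares them between the enhanced and reference images. The three color cast indices, average chroma, and average lightness additionally require an RLAB conversion, which is omitted here for brevity, so this is a reduced version of the full 14-dimensional feature.

```python
import torch

def color_moments(img):
    """img: (B, 3, H, W) in [0, 1]; returns (B, 9): per-channel mean, std, and skewness."""
    x = img.flatten(2)                                    # (B, 3, N)
    mu = x.mean(dim=2)
    sigma = ((x - mu.unsqueeze(2)) ** 2).mean(dim=2).sqrt()
    third = ((x - mu.unsqueeze(2)) ** 3).mean(dim=2)
    skew = third.sign() * third.abs().pow(1.0 / 3.0)      # signed cube root of the third central moment
    return torch.cat([mu, sigma, skew], dim=1)

def color_loss(enhanced, reference):
    """Squared L2 distance between the color feature vectors, as in Equation (10)."""
    diff = color_moments(enhanced) - color_moments(reference)
    return (diff ** 2).sum(dim=1).mean()

loss = color_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```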
The content loss function and perceptual loss function are calculated using Equations (11) and (12), respectively. The convolutional layers in the VGG-19 network can extract the texture and structural information of the image, while the fully connected layers of the network can extract the semantic information of the image. Therefore, we use the pre-trained VGG-19 network as an abstract feature extractor for images in the perceptual loss:
$$L_{rec} = \left\lVert X - Y \right\rVert_1 \tag{11}$$
$$L_{per} = \left\lVert \phi_i(Y) - \phi_i(X) \right\rVert_2^2, \tag{12}$$
where ϕ represents the VGG-19 network, and i represents the i-th feature extraction layer of the VGG-19 network.
Finally, the overall loss function of the network is obtained by weighting multiple loss functions, and the calculation is as follows:
$$L_{all} = \lambda_{color} L_{color} + \lambda_{rec} L_{rec} + \lambda_{per} L_{per}, \tag{13}$$
where $\lambda_{color}$, $\lambda_{rec}$, and $\lambda_{per}$ are the weighting coefficients of the color loss, content loss, and perceptual loss, set to 0.01, 1, and 0.08, respectively.
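The following sketch assembles the overall objective with the stated weights, assuming an L1 content loss and a single VGG-19 feature layer (up to relu3_3) for the perceptual term; the specific VGG-19 layer used by the paper is not stated here, and color_loss refers to the reduced sketch given after Equation (10).

```python
import torch
from torchvision import models

vgg_feat = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:16].eval()  # up to relu3_3
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def total_loss(x, y, lambda_color=0.01, lambda_rec=1.0, lambda_per=0.08):
    # x: enhanced image, y: normal-light reference, both (B, 3, H, W) in [0, 1]
    # (for a faithful perceptual term, inputs would normally be ImageNet-normalized first)
    l_rec = (x - y).abs().mean()                          # content (L1) loss
    l_per = ((vgg_feat(x) - vgg_feat(y)) ** 2).mean()     # perceptual loss on VGG-19 features
    l_col = color_loss(x, y)                              # color loss sketched above
    return lambda_color * l_col + lambda_rec * l_rec + lambda_per * l_per
```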

4. Experiments

4.1. Datasets

We select several publicly available and widely used low-light image enhancement datasets: LOL-Blur [25], LOL-V1 [9], LOL-V2 [26], LSRW [27], SICE [28], DICM [29], and MEF [30]. Among them, LOL-Blur, LOL-V1, LOL-V2, LSRW, and SICE contain paired images, while DICM and MEF are unpaired. LOL-V1 was captured in natural environments and contains 500 paired images covering multiple scenes; the training set contains 485 pairs and the test set 15 pairs, with an image resolution of 400 × 600. LOL-V2 is divided into indoor and outdoor scenes. Its first subset consists of real low-light images, with 689 training pairs and 100 test pairs; its second subset consists of synthetic images, with 900 training pairs and 100 test pairs. The image resolutions are 400 × 600 and 384 × 384, respectively. LOL-Blur is a large-scale joint low-light enhancement and deblurring dataset containing image pairs with different darkness levels and motion blur in dark dynamic scenes; it provides 10,200 training pairs and 1800 test pairs with an image resolution of 1120 × 640. LSRW was captured with Huawei and Nikon devices; the content is primarily indoor objects or architectural images, with a few outdoor images. The Huawei part contains 2450 training pairs and 30 test pairs, and the Nikon part contains 3150 training pairs and 20 test pairs, with resolutions of 960 × 720 and 960 × 640, respectively. SICE contains 589 large-scale multi-exposure sequences of natural and indoor scenes, with image sizes ranging from 3872 × 2592 to 6000 × 4000. DICM comprises 69 unpaired dark images under natural lighting, with resolutions ranging from 931 × 480 to 720 × 960. MEF contains 17 unpaired images, including natural scenery, indoor and outdoor scenes, and buildings, with resolutions ranging from 512 × 339 to 512 × 384. We selected LOL-Blur as the training dataset and used the test sets of the other datasets to verify the effectiveness of our proposed method.

4.2. Experiment Settings

In all experiments in this article, the input images were randomly cropped to 256 × 256, and the batch size was set to 16. The Adam optimizer was used to update the network weights during training, with the first-order moment parameter set to 0.9 and the second-order moment parameter set to 0.99. The initial learning rate was set to $1 \times 10^{-4}$ and was adjusted with a cosine annealing schedule. The window size in the window attention mechanism was set to 8. All experiments were conducted on Ubuntu 22.04 LTS with the PyTorch framework. The server configuration is a 2-core CPU (2.9 GHz) with 256 GB of memory and two NVIDIA GeForce RTX 3090 GPUs (24 GB video memory each).
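A minimal sketch of this training configuration in PyTorch is given below. The model, dataset, and number of epochs are placeholders (the epoch count is not stated in this section), while the optimizer, learning rate, scheduler, and batch size follow the settings above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Conv2d(3, 3, 3, padding=1)            # placeholder standing in for LACCNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))
total_epochs = 500                                     # assumed; not given in this section
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_epochs)

# dummy 256x256 pairs; real training would load LOL-Blur with random 256x256 crops
train_set = TensorDataset(torch.rand(64, 3, 256, 256), torch.rand(64, 3, 256, 256))
loader = DataLoader(train_set, batch_size=16, shuffle=True)

for epoch in range(total_epochs):
    for low, gt in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.l1_loss(model(low), gt)  # stand-in for the full LACCNet loss
        loss.backward()
        optimizer.step()
    scheduler.step()
```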

4.3. Quantitative Evaluation

The effect of image quality enhancement is mainly evaluated from two aspects. The subjective evaluation is completed through comparisons of visual results. Objective evaluation makes comparisons through the calculation results of objective indicators. Objective evaluation indicators are divided into reference image evaluation methods and non-reference image evaluation methods. We use three indicators—PSNR, SSIM, and LPIPS—as the reference image evaluation method and NIQE as the non-reference image evaluation method to quantitatively analyze the low-light image enhancement effect. We detail the principles and calculation methods of each evaluation method below.

4.3.1. Peak Signal-to-Noise Ratio (PSNR)

PSNR is currently the most widely used image quality evaluation method. This method evaluates image quality by measuring the average energy ratio between the maximum signal and the noise signal and requires a real reference image for calculations. The unit is dB. The larger the value, the better the image quality. The calculation formula is as follows:
$$PSNR = 10 \log_{10}\left(\frac{MAX_X^2}{MSE}\right) \tag{14}$$
$$MSE = \frac{1}{H \times W}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}\left[X(i,j) - Y(i,j)\right]^2, \tag{15}$$
where X and Y represent the enhanced and real reference images, respectively; H and W are the height and width of the image; and $MAX_X$ is the maximum pixel value of the image, generally 255 for 8-bit integer data and 1 for floating-point data. Since PSNR evaluates similarity based on the mean square error between corresponding pixels of two images, it does not consider the visual characteristics of the human eye. The human eye is more sensitive to contrast differences at lower spatial frequencies, more sensitive to luminance than to chrominance, and its perception of one area is affected by the adjacent areas. Therefore, the evaluation results of PSNR are sometimes inconsistent with the subjective perception of the human eye.
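A direct implementation of the two formulas above, assuming floating-point images in [0, 1] so that $MAX_X = 1$:

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

print(psnr(np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)))
```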

4.3.2. Structural Similarity Index (SSIM)

SSIM is a metric that evaluates the structural similarity between two images. SSIM imitates the human visual system (HVS) to implement the theory of structural similarity and is sensitive to local structural changes in the image. It quantifies image quality from three aspects: luminance, contrast, and structure. The mean is used to estimate luminance, the variance to estimate contrast, and the covariance to estimate structural similarity. SSIM values range from 0 to 1: the larger the SSIM value between two images, the more similar they are, and when it reaches the maximum value of 1, the two images are identical. Given two images x and y, the luminance comparison $L(x,y)$, contrast comparison $C(x,y)$, and structure comparison $S(x,y)$ between them are defined as follows:
$$L(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1} \tag{16}$$
$$C(x,y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2} \tag{17}$$
$$S(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}. \tag{18}$$
SSIM is calculated as in Equation (19).
$$SSIM(x,y) = \left[L(x,y)\right]^{\alpha} \cdot \left[C(x,y)\right]^{\beta} \cdot \left[S(x,y)\right]^{\gamma}. \tag{19}$$
Letting $\alpha = \beta = \gamma = 1$ and $c_3 = c_2 / 2$, the SSIM can be calculated as in Equation (20):
$$SSIM(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \tag{20}$$
where $\mu_x$ and $\mu_y$ represent the mean pixel gray values of images x and y, respectively; $\sigma_x$ and $\sigma_y$ represent their pixel gray standard deviations; $\sigma_{xy}$ represents the covariance between the pixel gray values of x and y; $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are two constants that avoid division by zero; and L represents the maximum range of pixel values, generally 255 for 8-bit integer data and 1 for floating-point data. Typically, $k_1 = 0.01$ and $k_2 = 0.03$.
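For illustration, the snippet below evaluates the simplified formula in Equation (20) using global image statistics. Standard SSIM is computed over local windows and averaged, so this single-window version is only a sketch of the formula, not a drop-in replacement for library implementations.

```python
import numpy as np

def ssim_global(x, y, L=1.0, k1=0.01, k2=0.03):
    """Equation (20) evaluated over the whole image; x, y are float arrays in [0, L]."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

print(ssim_global(np.random.rand(64, 64), np.random.rand(64, 64)))
```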

4.3.3. Learned Perceptual Image Patch Similarity (LPIPS)

Learned Perceptual Image Patch Similarity (LPIPS) is a learning-based perceptual image patch similarity metric used to evaluate the perceptual quality of images. The design of LPIPS is inspired by the human eye’s perception of images, and it approximates the visual similarity perceived by humans by learning a neural network model. The model uses a convolutional neural network (CNN) to extract features from local patches of an image and calculate similarity scores between patches. The calculation formula of LPIPS is not a simple mathematical formula but is implemented through a deep neural network. Typically, LPIPS models use two images as input, and they output a perceptual similarity score between them. Specifically, the calculation process of LPIPS is as follows: a pre-trained CNN model (usually a deep learning-based image classification model) is used to extract feature representations of the original image and the reconstructed image. Using the extracted feature representation as input, a distance metric function calculates the similarity score between images. The similarity score represents the perceptual difference between images. The smaller the value, the smaller the perceptual difference between images, and the better the image quality. The LPIPS score usually ranges from 0 to 1. The smaller the value, the higher the perceived quality of the image. Compared with traditional image quality evaluation indicators (such as PSNR and SSIM), LPIPS focuses more on factors the human eye perceives and can better capture the perceptual differences between images. It is widely used in tasks such as image generation and editing and is especially suitable for scenarios where perceptual quality needs to be considered. It should be noted that LPIPS is a learning-based metric, and its performance is affected by the CNN model and training data used. Therefore, when using LPIPS for image quality assessment, it is necessary to use datasets and pre-trained models similar to the training model to ensure the accuracy and reliability of the assessment results.
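In practice, LPIPS is usually computed with the reference implementation published by its authors (the `lpips` Python package). The snippet below shows typical usage; the AlexNet backbone is one of several available choices, and inputs are expected as RGB tensors scaled to [-1, 1].

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')             # AlexNet-based perceptual metric
img0 = torch.rand(1, 3, 256, 256) * 2 - 1     # enhanced image, scaled to [-1, 1]
img1 = torch.rand(1, 3, 256, 256) * 2 - 1     # reference image
print(loss_fn(img0, img1).item())             # lower values mean higher perceptual similarity
```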

4.3.4. Natural Image Quality Evaluator (NIQE)

In image quality evaluation, the PSNR and SSIM scores are sometimes very high while the perceived visual quality is poor. In particular, since the introduction of GAN-based methods, image super-resolution results with higher PSNR and SSIM do not necessarily look better perceptually. In addition, PSNR and SSIM require reference images, which often cannot be obtained in practical applications. Therefore, NIQE [31] was proposed as a no-reference indicator of image quality. It extracts a series of features from high-quality natural images and uses them to fit a multivariate Gaussian model. This Gaussian model is then used to measure the difference between the feature distribution of the evaluated image and that of natural images, and the image quality is estimated from this difference. The smaller the NIQE value, the better the perceived quality of the image. The NIQE algorithm has good prediction stability, monotonicity, and consistency; it agrees well with subjective human quality judgments, is closer to the human visual system, and can effectively evaluate image quality.

4.4. Comparison with State-of-the-Art Methods

In order to fully evaluate the effectiveness and advancement of the proposed method, comparison experiments were conducted with other advanced low-light image enhancement methods, including Retinexformer [32], SMG [33], LEDNet [25], SNR [34], Zero-DCE++ [14], and EnlightenGAN [35]. Retinexformer designs an illumination-guided Transformer that uses illumination representations to model non-local interactions; the framework restores corruptions by estimating lighting information to enhance low-light images. SMG is a framework for low-light image enhancement that models appearance and structure simultaneously, using structural features to guide appearance enhancement and produce precise and realistic results. LEDNet combines low-light enhancement and deblurring and uses adaptive skip connections to exploit the synergy between the encoder and decoder; it introduces a pyramid pooling module and filter-adaptive convolutions in the encoder and decoder, performs illumination enhancement and deblurring simultaneously, and released the public LOL-Blur dataset. SNR jointly exploits a signal-to-noise-ratio-aware transformer and convolutional models to enhance pixels dynamically with spatially varying operations; it computes an SNR prior to guide feature fusion and formulates the SNR-aware transformer with a new self-attention model that avoids tokens from noisy image regions with very low SNR. Li et al. [14] proposed the Zero-DCE++ algorithm based on the original Zero-DCE structure, in which depthwise separable convolutions replace the traditional convolutions and the input image is downsampled for curve estimation and then upsampled to restore the enhanced image; the improved model has significantly fewer parameters and computations and faster inference. Jiang et al. [35] proposed the unsupervised algorithm EnlightenGAN, which uses a U-Net as the generator and adopts a global-local dual discriminator structure and a self-regularized perceptual loss function to make the enhanced image closer to the real image.
Figure 5 shows a visual comparison of the low-light enhancement results of our method and the comparison methods on an image from the LOL-Blur test set, and Figure 6 shows the results on the LOL-V1 test data. It can be seen that the images reconstructed by Zero-DCE++ and EnlightenGAN are still very dark and their colors are distorted. The images reconstructed by Retinexformer, SMG, and SNR suffer from color distortion and overexposure, and their content is blurry. The brightness of the image reconstructed by LEDNet is close to that of our proposed method. In summary, the image reconstructed by our proposed method has clear advantages in brightness and content clarity.
In addition to the above subjective evaluation, this section also compares our method with the other methods on objective evaluation indicators. Table 1 reports PSNR, SSIM, and LPIPS for the proposed LACCNet and the comparison methods on the LOL-Blur and LOL-V1 test sets. The results show that, compared with the SOTA methods, LACCNet achieves the best overall performance on the LOL-Blur dataset: PSNR reaches 26.728, an increase of 0.998, and LPIPS is 0.199, a decrease of 0.035. LACCNet's results on the LOL-V1 dataset also reach a competitive level. LACCNet performs better on the LOL-Blur dataset, which has more severe blur; this may reflect the addition of the Deblur Decoder Block, which makes LACCNet more suitable for blurred scenes.
Table 2 shows the PSNR, SSIM, and LPIPS results of our LACCNet and the SOTA methods on the LOL-V2-real, LOL-V2-syn, and LSRW test sets. The results show that LACCNet performs well on the LOL-V2-real and LOL-V2-syn datasets, and all indicators on LSRW reach the best level. In particular, the PSNR on LSRW is 20.087, a significant improvement of 2.006 over the previous best method. Since no reference images exist for DICM and MEF, their results are not listed in Table 1 and Table 2.
Table 3 shows the results of our LACCNet and other comparison methods on the non-reference evaluation index NIQE. The NIQE index is more consistent with the subjective quality evaluation of the human eye, closer to the human visual system, and can effectively evaluate image quality. The smaller the NIQE value is, the better the visual perception quality of the image is. As can be seen from Table 3, the method proposed in this paper achieves the best NIQE index on the LOL-Blur, LOL-V1, LOL-V2-real, LOL-V2-syn, LSRW, and DICM datasets.

4.5. Ablation Experiments

4.5.1. Effectiveness Analysis of Luminance Adjustment Module

First, we conducted ablations on the components of the Luminance Adjustment Module architecture. In actual low-light shooting environments, there are situations where the overall brightness is low, and there are also low-light environments with local highlights. Therefore, to enhance low-light images, we must not only improve the overall brightness of the image but also avoid improper adjustment of local areas when the brightness is enhanced, resulting in local over-brightness and thus affecting the overall visual performance of the image. The Luminance Adjustment Module mentioned in this paper considers the demand for overall brightness enhancement and an adjustment for uneven brightness. In order to verify the effectiveness of the Luminance Adjustment Module, different contrast networks were constructed to perform ablation experiments as follows: (a) LACCNet (w/o LAM) indicates a network in which the entire Luminance Adjustment Module is removed, and other structures remain unchanged; (b) LACCNet (w/o ALAM) involves removing the window attention adjustment component in the Luminance Adjustment Module and keeping other structures unchanged; (c) LACCNet (w/o CLAM) involves removing the convolution adjustment component in the Luminance Adjustment Module, while other structures remain unchanged; and (d) LACCNet represents the complete network structure.
Figure 7 compares the enhancement effects of these four network structures on an example image from the LOL-V1 dataset. It can be seen from the figure that the network with the complete Luminance Adjustment Module achieves the best lighting enhancement, without overexposed or overly dark regions.

4.5.2. Effectiveness Analysis of Color Correction Module

In order to verify and analyze the effectiveness of the Color Correction Module proposed in this paper, two comparison experiments were conducted. The two networks are: (a) LACCNet (w/o CCM), in which the entire Color Correction Module is removed and the decoder directly outputs the enhanced image, with the other network structures unchanged; and (b) LACCNet, the network with the complete Color Correction Module. Figure 8 shows the enhancement results of these two networks on two sample images from the LOL-Blur dataset. It can be seen that the enhanced images obtained by network (b) are closest to the colors of the corresponding ground truth.

4.5.3. Effectiveness Analysis of Color Loss Function

In addition to the content loss and perceptual loss, this paper also designs a color loss function to correct the color distortion that arises during low-light image enhancement. In order to verify the effectiveness of the color loss function, we designed two experiments: (a) LACCNet (w/o color loss), in which only the content loss and perceptual loss are used, without the color loss function; and (b) LACCNet, in which all three loss functions are used: content loss, perceptual loss, and color loss. Figure 9 shows test sample images from the LOL-Blur dataset for these two experiments. It can be seen that the image colors generated by network (b) with the color loss are closer to the ground truth and more vivid, and the color distortion problem is alleviated to a certain extent. Therefore, the addition of the color loss function is effective.
In order to further objectively analyze the effectiveness of each designed module, the four indicators PSNR, SSIM, LPIPS, and NIQE were evaluated on the LOL-Blur test set. Table 4 shows the comparison of the objective evaluation results. As can be seen from Table 4, in the ablation of the Luminance Adjustment Module, the network without the Luminance Adjustment Module performs the worst in terms of PSNR, SSIM, LPIPS, and NIQE. The network without the Color Correction Module performs worse on three indicators: SSIM, LPIPS, and NIQE; likewise, the network without the color loss function performs worse on SSIM, LPIPS, and NIQE. In contrast, the network with the complete Luminance Adjustment Module, Color Correction Module, and color loss function performs best on all four metrics.

5. Conclusions

This paper proposes a low-light image enhancement method called LACCNet that combines luminance adjustment and color correction to address low-light problems including uneven brightness, color distortion, and blurred content. We introduce the non-overlapping window attention mechanism and the brightness adjustment curve to formulate a new Luminance Adjustment Module (LAM). Then, we design a CCM that utilizes the attention mechanism to simulate and learn the color correction matrix to restore authentic colors. Finally, we introduce a novel color loss function to further improve the color similarity between the enhanced image and the ground truth. Extensive quantitative and qualitative experiments show that our proposed LACCNet substantially outperforms other SOTA methods on six public datasets.

Author Contributions

The authors confirm their contribution to the paper as follows: study conception and design: N.Z., Y.C., X.H., C.L., R.G. and S.M.; data collection: N.Z. and X.H.; analysis and interpretation of results: X.H. and N.Z.; draft manuscript preparation: Y.C. and X.H. All authors reviewed the results and approved the final version of the manuscript.

Funding

This research was supported by the project (JGKFKT2304) of the Key Laboratory of Convergent Media and Intelligent Technology of the Ministry of Education, Communication University of China, and the project “Research on Key Technologies and Standards for Virtual Shooting Production” (JBKY20240230) of the Academy of Broadcasting Science, National Radio and Television Administration of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in LOL-Blur [25], LOL-V1 [9], LOL-V2 [26], LSRW [27], SICE [28], DICM [29], and MEF [30].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, J.; Li, Y.; Cao, L.; Li, Y.; Li, N.; Gao, H. Range-restricted pixel difference global histogram equalization for infrared image contrast enhancement. Opt. Rev. 2021, 28, 145–158. [Google Scholar] [CrossRef]
  2. Abdullah-Al-Wadud, M.; Kabir, M.H.; Dewan, M.A.A.; Chae, O. A dynamic histogram equalization for image contrast enhancement. IEEE Trans. Consum. Electron. 2007, 53, 593–600. [Google Scholar] [CrossRef]
  3. Dong, X.; Pang, Y.; Wen, J. Fast efficient algorithm for enhancement of low lighting video. In Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, Barcelona, Spain, 11–15 July 2011; pp. 1–6. [Google Scholar]
  4. Li, D.; Shi, H.; Wang, H.; Liu, W.; Wang, L. Image Enhancement Method Based on Dark Channel Prior. In Proceedings of the 2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), Shijiazhuang, China, 22–24 July 2022; pp. 200–204. [Google Scholar]
  5. Jobson, D.J.; Rahman, Z.u.; Woodell, G.A. Properties and performance of a center/surround retinex. IEEE Trans. Image Process. 1997, 6, 451–462. [Google Scholar] [CrossRef] [PubMed]
  6. Rahman, Z.u.; Jobson, D.J.; Woodell, G.A. Multi-scale retinex for color image enhancement. In Proceedings of the 3rd IEEE International Conference on Image Processing, Lausanne, Switzerland, 19 September 1996; Volume 3, pp. 1003–1006. [Google Scholar]
  7. Guo, X.; Li, Y.; Ling, H. LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 2016, 26, 982–993. [Google Scholar] [CrossRef] [PubMed]
  8. Ren, X.; Yang, W.; Cheng, W.H.; Liu, J. LR3M: Robust low-light enhancement via low-rank regularized retinex model. IEEE Trans. Image Process. 2020, 29, 5862–5876. [Google Scholar] [CrossRef] [PubMed]
  9. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep retinex decomposition for low-light enhancement. arXiv 2018, arXiv:1808.04560. [Google Scholar]
  10. Wang, T.; Zhang, K.; Shao, Z.; Luo, W.; Stenger, B.; Kim, T.K.; Liu, W.; Li, H. LLDiffusion: Learning degradation representations in diffusion models for low-light image enhancement. arXiv 2023, arXiv:2307.14659. [Google Scholar]
  11. Zhou, D.; Yang, Z.; Yang, Y. Pyramid diffusion models for low-light image enhancement. arXiv 2023, arXiv:2305.10028. [Google Scholar]
  12. Ni, Z.; Yang, W.; Wang, H.; Wang, S.; Ma, L.; Kwong, S. Cycle-interactive generative adversarial network for robust unsupervised low-light enhancement. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10 October 2022; pp. 1484–1492. [Google Scholar]
  13. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13 June 2020; pp. 1780–1789. [Google Scholar]
  14. Li, C.; Guo, C.; Loy, C.C. Learning to enhance low-light image via zero-reference deep curve estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4225–4238. [Google Scholar] [CrossRef] [PubMed]
  15. Wu, W.; Weng, J.; Zhang, P.; Wang, X.; Yang, W.; Jiang, J. Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18 June 2022; pp. 5901–5910. [Google Scholar]
  16. Wang, Y.; Yu, Y.; Yang, W.; Guo, L.; Chau, L.P.; Kot, A.C.; Wen, B. Exposurediffusion: Learning to expose for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 12438–12448. [Google Scholar]
  17. Wen, J.; Wu, C.; Zhang, T.; Yu, Y.; Swierczynski, P. Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement. arXiv 2023, arXiv:2308.08197. [Google Scholar]
  18. Zhang, Y.; Zhang, J.; Guo, X. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1632–1640. [Google Scholar]
  19. Zhang, Y.; Guo, X.; Ma, J.; Liu, W.; Zhang, J. Beyond brightening low-light images. Int. J. Comput. Vis. 2021, 129, 1013–1037. [Google Scholar] [CrossRef]
  20. Jobson, D.J.; Rahman, Z.u.; Woodell, G.A. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 1997, 6, 965–976. [Google Scholar] [CrossRef] [PubMed]
  21. Wang, H.; Xu, X.; Xu, K.; Lau, R.W. Lighting up nerf via unsupervised decomposition and enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 12632–12641. [Google Scholar]
  22. Yang, S.; Ding, M.; Wu, Y.; Li, Z.; Zhang, J. Implicit neural representation for cooperative low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 12918–12927. [Google Scholar]
  23. Jia, X.; De Brabandere, B.; Tuytelaars, T.; Gool, L.V. Dynamic filter networks. Adv. Neural Inf. Process. Syst. 2016, 29, 667–675. [Google Scholar]
  24. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
  25. Zhou, S.; Li, C.; Change Loy, C. Lednet: Joint low-light enhancement and deblurring in the dark. In Proceedings of the European Conference on Computer Vision, Tel-Aviv, Israel, 23–27 October 2022; pp. 573–589. [Google Scholar]
  26. Yang, W.; Wang, W.; Huang, H.; Wang, S.; Liu, J. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Trans. Image Process. 2021, 30, 2072–2086. [Google Scholar] [CrossRef] [PubMed]
  27. Hai, J.; Xuan, Z.; Yang, R.; Hao, Y.; Zou, F.; Lin, F.; Han, S. R2rnet: Low-light image enhancement via real-low to real-normal network. J. Vis. Commun. Image Represent. 2023, 90, 103712. [Google Scholar] [CrossRef]
  28. Cai, J.; Gu, S.; Zhang, L. Learning a deep single image contrast enhancer from multi-exposure images. IEEE Trans. Image Process. 2018, 27, 2049–2062. [Google Scholar] [CrossRef] [PubMed]
  29. Lee, C.; Lee, C.; Kim, C.S. Contrast enhancement based on layered difference representation of 2D histograms. IEEE Trans. Image Process. 2013, 22, 5372–5384. [Google Scholar] [CrossRef] [PubMed]
  30. Ma, K.; Zeng, K.; Wang, Z. Perceptual quality assessment for multi-exposure image fusion. IEEE Trans. Image Process. 2015, 24, 3345–3356. [Google Scholar] [CrossRef] [PubMed]
  31. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a ’Completely Blind’ Image Quality Analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
  32. Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 12504–12513. [Google Scholar]
  33. Xu, X.; Wang, R.; Lu, J. Low-light image enhancement via structure modeling and guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 9893–9903. [Google Scholar]
  34. Xu, X.; Wang, R.; Fu, C.W.; Jia, J. Snr-aware low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17714–17724. [Google Scholar]
  35. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. Enlightengan: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The framework of the proposed LACCNet. A low-light image is taken as input; a Luminance Adjustment Module is embedded in the encoder–decoder structure, a Color Correction Module corrects color distortion, and the final enhanced image is output. The blue line represents the residual connection.
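To make the data flow in Figure 1 concrete, the following minimal PyTorch-style sketch wires together the components named in the caption. The class name `LACCNetSketch` and the placeholder sub-modules passed to it are illustrative assumptions rather than the authors' implementation; the internals of the encoder, decoder, Luminance Adjustment Module, and Color Correction Module are shown in Figures 2–4 and are not reproduced here.

```python
import torch.nn as nn

class LACCNetSketch(nn.Module):
    """Illustrative wiring of the Figure 1 pipeline (not the released code).

    The four sub-modules are placeholders for the blocks in Figures 2-4;
    any nn.Module with compatible tensor shapes can be plugged in.
    """

    def __init__(self, encoder, lam, decoder, ccm):
        super().__init__()
        self.encoder = encoder  # Basic Encoder Blocks (Figure 2a)
        self.lam = lam          # Luminance Adjustment Module (Figure 3)
        self.decoder = decoder  # Deblur Decoder Blocks (Figure 2b)
        self.ccm = ccm          # Color Correction Module (Figure 4)

    def forward(self, low_light):            # (B, 3, H, W) low-light input
        feats = self.encoder(low_light)      # extract features
        feats = self.lam(feats)              # adjust luminance in feature space
        restored = self.decoder(feats)       # decode back to image space
        restored = restored + low_light      # residual connection (blue line);
                                             # the exact tap point is assumed here
        return self.ccm(restored)            # correct the remaining color cast
```

Passing `nn.Identity()` for all four arguments is enough to smoke-test the tensor shapes of this wiring.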
Figure 2. The structure of the basic blocks: (a) Basic Encoder Block; (b) Deblur Decoder Block; (c) Middle Block.
Figure 3. The structure of the Luminance Adjustment Module: (a) the Attention Luminance Adjustment Module; (b) the Nonlinear Luminance Adjustment Block, which is shared by the Attention and Convolution Luminance Adjustment Modules; (c) the Window Attention Block; (d) the Convolution Luminance Adjustment Module.
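For intuition only, the snippet below shows one common form a learnable nonlinear luminance adjustment can take: a tiny head pools global illumination statistics from a feature map and predicts a per-image exponent for a gamma-style curve. The class `GammaCurveAdjust` is a generic, hypothetical example and does not reproduce the Nonlinear Luminance Adjustment Block of Figure 3b.

```python
import torch
import torch.nn as nn

class GammaCurveAdjust(nn.Module):
    """Generic learnable gamma-style curve (illustration, not the paper's block)."""

    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global illumination statistics
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Softplus(),                     # keep the exponent strictly positive
        )

    def forward(self, image: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) in [0, 1]; feats: (B, C, H, W) intermediate features
        gamma = self.head(feats)               # (B, 1, 1, 1): one exponent per image
        return image.clamp(min=1e-6) ** gamma  # gamma < 1 brightens dark regions
```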
Figure 4. The structure of the Color Correction Module (CCM): (a) the overall CCM structure, including a CFEB, an SAB, and a Learnable Color Matrix; (b) the CFEB; (c) the SAB; (d) the Self-Attention mechanism.
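The Learnable Color Matrix in Figure 4a can be read as a 3 × 3 linear transform that mixes the RGB channels of every pixel. A minimal sketch of such a module is given below; the parameterization (a plain matrix initialized to the identity) is an assumption for illustration and is not taken from the paper.

```python
import torch
import torch.nn as nn

class LearnableColorMatrix(nn.Module):
    """A 3x3 color transform learned end-to-end (illustrative sketch only)."""

    def __init__(self):
        super().__init__()
        # Start from the identity so training begins with "no correction".
        self.matrix = nn.Parameter(torch.eye(3))

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); apply the same learned 3x3 channel mix to every pixel.
        return torch.einsum("ij,bjhw->bihw", self.matrix, image)
```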
Figure 5. Visual comparison of different low-light image enhancement methods on the LOL-Blur test set.
Figure 6. Visual comparison of different low-light image enhancement methods on the LOL-V1 test set.
Figure 7. Visual comparison of the ablation results for the Luminance Adjustment Module on the LOL-V2-real dataset.
Figure 8. Visual comparison of the ablation results for the Color Correction Module on the LOL-V1 dataset.
Figure 9. Visual comparison of the ablation results for the color loss function on the LOL-V2-syn dataset.
Table 1. Comparison of PSNR, SSIM, and LPIPS with other SOTA methods on the LOL-Blur and LOL-V1 datasets.

| Methods | LOL-Blur (PSNR / SSIM / LPIPS) | LOL-V1 (PSNR / SSIM / LPIPS) |
|---|---|---|
| Retinexformer | 17.525 / 0.686 / 0.455 | 25.160 / 0.845 / 0.252 |
| SMG | 19.669 / 0.746 / 0.429 | 24.306 / 0.893 / 0.308 |
| LEDNet | 25.740 / 0.850 / 0.224 | 14.857 / 0.746 / 0.374 |
| SNR | 16.191 / 0.687 / 0.431 | 24.610 / 0.842 / 0.257 |
| Zero-DCE | 18.448 / 0.643 / 0.481 | 14.861 / 0.681 / 0.372 |
| EnlightenGAN | 16.677 / 0.633 / 0.478 | 17.555 / 0.733 / 0.381 |
| LACCNet (Ours) | 26.728 / 0.841 / 0.199 | 24.388 / 0.870 / 0.227 |
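As a reference for how numbers like those in Tables 1 and 2 are obtained, the snippet below computes PSNR and SSIM between an enhanced image and its ground truth using scikit-image; the file names are placeholders, and the exact protocol of each benchmark (color space, data range, cropping) may differ from this sketch. LPIPS additionally requires a learned perceptual model, such as the one provided by the `lpips` package.

```python
import numpy as np
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder file names: substitute an enhanced result and its ground truth.
enhanced = io.imread("enhanced.png").astype(np.float64) / 255.0
reference = io.imread("ground_truth.png").astype(np.float64) / 255.0

psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=1.0)
print(f"PSNR: {psnr:.3f} dB  SSIM: {ssim:.4f}")
```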
Table 2. Comparison of PSNR, SSIM, and LPIPS with other SOTA methods on the LOL-V2-real, LOL-V2-syn, and LSRW datasets.

| Methods | LOL-V2-real (PSNR / SSIM / LPIPS) | LOL-V2-syn (PSNR / SSIM / LPIPS) | LSRW (PSNR / SSIM / LPIPS) |
|---|---|---|---|
| Retinexformer | 22.800 / 0.840 / 0.288 | 25.670 / 0.930 / 0.105 | 17.779 / 0.537 / 0.407 |
| SMG | 24.620 / 0.867 / 0.293 | 25.620 / 0.905 / 0.156 | 18.081 / 0.636 / 0.384 |
| LEDNet | 18.752 / 0.783 / 0.392 | 18.612 / 0.811 / 0.258 | 16.508 / 0.521 / 0.424 |
| SNR | 21.480 / 0.849 / 0.261 | 24.140 / 0.928 / 0.087 | 17.653 / 0.579 / 0.496 |
| Zero-DCE | 18.058 / 0.705 / 0.352 | 17.756 / 0.845 / 0.178 | 15.857 / 0.472 / 0.417 |
| EnlightenGAN | 18.684 / 0.740 / 0.368 | 16.486 / 0.811 / 0.226 | 17.592 / 0.508 / 0.404 |
| LACCNet (Ours) | 24.711 / 0.869 / 0.259 | 24.921 / 0.869 / 0.198 | 20.087 / 0.672 / 0.359 |
Table 3. Comparison of NIQE with other SOTA methods on six datasets.

| Methods | LOL-Blur | LOL-V1 | LOL-V2-real | LOL-V2-syn | LSRW | DICM |
|---|---|---|---|---|---|---|
| Retinexformer | 4.715 | 3.489 | 3.966 | 4.022 | 3.481 | 3.686 |
| SMG | 7.285 | 6.131 | 5.858 | 6.123 | 6.156 | 6.139 |
| LEDNet | 4.764 | 5.491 | 5.358 | 5.093 | 4.832 | 4.789 |
| SNR | 8.241 | 5.217 | 4.638 | 4.129 | 7.215 | 4.643 |
| Zero-DCE | 5.088 | 7.496 | 7.666 | 4.392 | 3.698 | 3.954 |
| EnlightenGAN | 4.779 | 4.778 | 5.156 | 4.073 | 3.320 | 3.570 |
| LACCNet (Ours) | 4.685 | 4.897 | 3.912 | 3.986 | 3.294 | 3.589 |
Table 4. The objective evaluation results of LACCNet’s ablation experiments on the LOL-Blur test set.

| Modules | Networks | PSNR | SSIM | LPIPS | NIQE |
|---|---|---|---|---|---|
| LAM | LACCNet (w/o LAM) | 23.50 | 0.71 | 0.41 | 5.71 |
| LAM | LACCNet (w/o ALAM) | 25.96 | 0.81 | 0.36 | 5.09 |
| LAM | LACCNet (w/o CLAM) | 24.75 | 0.79 | 0.39 | 5.38 |
| CCM | LACCNet (w/o CCM) | 23.19 | 0.75 | 0.29 | 4.77 |
| Color Loss | LACCNet (w/o Color Loss) | 23.32 | 0.69 | 0.35 | 4.92 |
| — | LACCNet | 26.73 | 0.84 | 0.20 | 4.69 |