Article

Semantic-Aware Low-Light Image Enhancement by Learning from Multiple Color Spaces

1 School of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210049, China
2 Key Laboratory of Flexible Electronics (KLOFE), School of Flexible Electronics (Future Technologies), Nanjing Tech University (NanjingTech), Nanjing 211816, China
3 Institute of Advanced Materials (IAM), Nanjing Tech University (NanjingTech), Nanjing 211816, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2025, 15(10), 5556; https://doi.org/10.3390/app15105556
Submission received: 26 February 2025 / Revised: 26 April 2025 / Accepted: 6 May 2025 / Published: 15 May 2025

Abstract: Extreme low-light image enhancement presents persistent challenges due to compounded degradations involving underexposure, sensor noise, and structural detail loss. Traditional low-light image enhancement methods predominantly employ global adjustment strategies that disregard semantic context, often resulting in incomplete detail recovery or color distortion. To address these limitations, we propose a semantic-aware knowledge-guided framework (SKF) that systematically integrates semantic priors for improved illumination recovery. Our framework introduces three key modules: a Semantic Feature Enhancement Module for integrating hierarchical semantic features, a Semantic-Guided Color Histogram Loss to enforce color consistency, and a Semantic-Guided Adversarial Loss to enhance perceptual realism. Furthermore, we improve the semantic-guided color histogram loss by leveraging multi-color space constraints. Inspired by human visual perception mechanisms, our enhanced loss function calculates color discrepancies across three color spaces (RGB, LAB, and LCH) through three components: $L_{rgb}$, $L_{lab}$, and $L_{lch}$. These components collaboratively optimize image contrast and saturation, thereby simultaneously enhancing contrast preservation and chromatic naturalness.

1. Introduction

With the continuous advancement of image processing and computer vision technologies, Low-Light Image Enhancement (LLIE) has emerged as a critical research area within computer vision. LLIE focuses on recovering visual information from images captured under insufficient illumination, addressing challenges such as underexposure, nonlinear sensor noise contamination, and structural detail loss. These issues are prevalent in various practical applications, including nighttime surveillance, autonomous driving, and medical imaging. For instance, in nighttime surveillance, low-light conditions often result in poor visibility, making it difficult to identify objects or individuals. Similarly, in autonomous driving, low-light environments can impair the performance of object detection and recognition systems, thereby posing significant safety risks. The ramifications extend to medical imaging diagnostics, where low-light conditions may lead to clinically consequential misinterpretations of pathological signatures due to poor image quality. Therefore, effective low-light image enhancement is essential for ensuring the successful completion of these high-level vision tasks and has become a subject worthy of further exploration.
Conventional methods for low-light image enhancement, such as histogram equalization and Retinex-based techniques, have been widely used due to their simplicity and computational efficiency. However, these methods face several limitations. For example, gray-level transformations struggle with higher luminance ranges, which can result in the loss of details in both over-exposed and under-exposed regions [1]. Histogram equalization variants such as Adaptive Histogram Equalization (AHE) and Contrast-Limited AHE (CLAHE) partially mitigate these issues but often introduce noise amplification and localized over-enhancement artifacts, ultimately degrading contrast homogeneity across spatial regions [2]. Retinex-inspired approaches, while theoretically grounded, suffer from two persistent drawbacks: (i) chromatic inaccuracies accompanied by halo artifacts near high-intensity light sources; (ii) non-generalizable parameter configurations requiring laborious per-image tuning, a critical limitation for real-world deployment [3].
In recent years, deep learning has offered a new path for improving low-light images through complex neural network models. It has achieved remarkable success, most notably with the groundbreaking See-in-the-Dark (SID) method [6], which is particularly effective for processing extremely low-light images. Since its inception, SID has guided the evolution of deep learning architectures for LLIE. Lamba et al. [4] improved the SID methodology by simplifying the system architecture and introducing a novel amplifier module. This modification enables real-time LLIE restoration while directly utilizing dark, raw image intensity values for amplification ratio estimation. Architectural parallelism yielded an additional 30% acceleration, though without a significant improvement in restoration quality. Maharjan et al. [5] replaced the U-net architecture proposed by Chen et al. [6] with residual learning to better preserve critical image features. Their method demonstrates superior performance in both Peak Signal-to-Noise Ratio (PSNR) and color representation compared to SID. Despite these advances, existing deep learning methods often overlook the importance of semantic information, leading to issues such as color distortion and loss of fine details.
Semantic information serves as a pivotal element in low-light image enhancement, guiding networks to differentiate regional characteristics during enhancement and facilitating more vivid image restoration. However, existing approaches typically apply global, homogeneous adjustments to low-light images without considering region-specific semantic contexts. Networks devoid of semantic priors tend to distort original color distributions, exemplified by erroneous transformations such as converting black hair to gray during enhancement. Liang et al. [7] introduced a semantic brightness consistency loss to ensure smooth and natural luminance recovery within identical semantic categories. Although the enhanced images improve downstream semantic segmentation performance, semantic information remains underutilized. The method proposed by Zhang et al. focuses on maintaining color consistency during enhancement rather than performing simplistic global brightness adjustment; however, it still processes color attributes holistically.
To address these challenges, we propose a Semantic-Aware Knowledge-Guided Framework (SKF) that systematically integrates semantic priors to improve illumination recovery and enhance visual quality. The specific contributions of our work can be summarized as follows:
(1) Semantic Feature Enhancement (SE) Module: This module integrates hierarchical semantic features with the enhancement network, enabling more precise and context-aware image enhancement.
(2) Semantic-Guided Color Histogram (SCH) Loss: By leveraging multi-color space constraints (RGB, LAB, and LCH), this loss function ensures color consistency and naturalness while preserving local color information.
(3) Semantic-Guided Adversarial (SA) Loss: This loss function enhances the perceptual realism of the enhanced images through adversarial training, ensuring global consistency and semantic integrity.

2. Related Work

2.1. Low-Light Image Enhancement

2.1.1. Traditional Methods

In early stages, histogram equalization [8,9,10] and gamma correction [11,12] emerged as two classical approaches for low-light image enhancement. The Retinex model [13,14] enhances the brightness and contrast of the image by decomposing and processing the reflectance and illumination components. Although this model effectively improves low-light image quality with better visual perception, it suffers from high computational complexity, prolonged processing time, and noise susceptibility when enhancing extreme low-light images characterized by severe noise, color distortion, detail loss, and non-uniform illumination. These limitations render traditional approaches inadequate for handling extreme low-light conditions beyond their operational thresholds.

2.1.2. Deep Learning-Based Methods

Recent years have witnessed remarkable progress in neural networks [6,15,16] for this domain. The learning strategies employed encompass supervised learning (SL), reinforcement learning (RL), unsupervised learning (UL), zero-shot learning (ZSL), and semi-supervised learning (SSL).
(1) Supervised Learning
Supervised learning trains models through paired datasets where input-output mappings are explicitly defined. For LLIE, this involves using low/normal-light image pairs for training. The pioneering deep learning work in LLIE was LLNet [15], which employed stacked sparse denoising autoencoders for simultaneous low-light brightening and denoising. MBLLEN [16] first extended LLIE techniques to low-light video restoration. Subsequently, Chen et al. proposed “Learning to See in the Dark” (SID) [6], introducing an end-to-end network alongside the SID dataset containing raw short-exposure low-light images and corresponding long-exposure references. Despite being a milestone for extreme low-light restoration, SID exhibits two critical limitations: (1) extremely high computational costs hinder deployment on resource-constrained devices; (2) reliance on prior gain knowledge about real exposure levels, which proves impractical in real-world scenarios.
Addressing these issues, Lamba et al. [4] proposed a lightweight parallel architecture with a novel amplifier module that estimates magnification factors directly from input intensities, achieving 30% acceleration without quality degradation. Recent advancements integrate Retinex theory with data-driven enhancement. KinD [17] and KinD++ [18] combined Retinex decomposition with deep networks for illumination adjustment. Zhu et al. [19] developed the EEMEFN framework with multi-exposure fusion and edge enhancement stages. Other notable architectures include luminance pyramid networks (LPNet) [20], residual networks [21], and Laplacian pyramid-based DSLR [22]. Recently, Wang et al. proposed FourLLIE [23], incorporating Fourier frequency information through amplitude transformation estimation in Fourier space and SNR-guided spatial detail restoration.
(2) Unsupervised Learning
In Low-Light Image Enhancement (LLIE), unsupervised learning refers to a learning strategy without paired images. Unsupervised learning methods learn from unlabeled data, providing a novel approach to tackle the problem of low-light image enhancement. For instance, unsupervised learning might utilize one dataset of low-exposure images and another of normally exposed images. Compared to supervised learning, unsupervised learning does not require a vast amount of labeled data, making it more suitable for practical scenarios where high-quality labels are scarce. However, unsupervised learning also faces some challenges in low-light image enhancement, such as difficulties in evaluation due to the lack of label information and network generalization performance. To address these challenges, researchers have proposed a range of innovative solutions, including adaptive loss function design and adversarial training strategies. Among them, EnlightenGAN [24] regularizes unpaired training by extracting information from the input itself and introduces a series of innovations for low-light image enhancement, including a global-local discriminator structure, self-regularized perceptual loss fusion, and attention mechanisms. It employs an attention-guided U-Net as the generator to ensure that the enhanced results resemble real, normally illuminated images. Additionally, they proposed global and local self-feature preservation losses to preserve the image content before and after enhancement. This is crucial for stabilizing the training of such unidirectional Generative Adversarial Network (GAN) structures.
(3) Semi-Supervised Learning
Semi-supervised learning is a learning approach that lies between supervised learning and unsupervised learning. In Low-Light Image Enhancement (LLIE), semi-supervised learning can be understood as a learning strategy with a small number of paired images and many unpaired images. A representative method of semi-supervised learning in LLIE is DRBN [25]. It proposes a deep recursive band network for semi-supervised low-light enhancement. By introducing Long Short-Term Memory (LSTM) networks and an image quality assessment network pre-trained on an aesthetic visual analysis dataset, better enhancement performance is achieved. The advantage of semi-supervised learning lies in its ability to more fully utilize existing data, especially when it is costly or difficult to obtain a large amount of labeled data. For LLIE, semi-supervised learning can help the model better adapt to different scenes and lighting conditions, improving its generalization ability.
(4) Zero-Shot Learning
Supervised and unsupervised learning methods either exhibit limited generalization capabilities or suffer from unstable training processes. Supervised learning demonstrates strong dependence on the training dataset—when the parameter differences between low-light images and the training set become significant, the quality of restored images deteriorates dramatically. Moreover, collecting sufficiently effective training data itself constitutes a challenging task. To enhance generalization capacity, zero-shot learning proposes to learn enhancement solely from test images. The learned model becomes image-specific, and this approach inherently adapts to different settings of unseen images. Notably, in low-level vision tasks, the concept of zero-shot learning emphasizes that the method requires neither paired nor unpaired training data, which differs from its definition in high-level vision tasks. In LLIE, Zhang et al. [26] first introduced zero-shot learning for low-light image restoration. They proposed the ExCNet framework where, given a test backlit image I, ExCNet estimates the optimal “s-curve” parameters for I within limited iterations. A block-based loss function was designed to maximize visibility across all image blocks while preserving relative differences between adjacent blocks, ensuring properly exposed outputs. Zhu et al. [27] developed the RRDNet with three branches that explicitly predict illumination, reflectance, and noise components of input images. This architecture relies on internal optimization for individual inputs, guaranteeing generalization across diverse shooting scenarios and illumination conditions. Furthermore, they proposed a novel loss function to preserve rich textural details in restored results through decomposition optimization. Liu et al. [28] presented the RUAS network. Unlike CNN-based LLIE methods requiring heuristic neural architecture design, RUAS first establishes a principled approach for constructing base network structures, then automatically discovers embedded atomic prior architectures. A cooperative two-level search strategy was developed for RUAS to simultaneously discover architectures from a compact search space for illumination estimation and noise removal. Empowered by zero-reference loss functions, zero-shot learning methods demonstrate remarkable generalization capabilities while requiring minimal parameters and achieving fast inference speeds.
Unsupervised learning, semi-supervised learning, and zero-shot learning, despite their respective merits, present inherent constraints: (1) The absence of ground-truth labels in unsupervised learning may result in unstable model performance, particularly in complex scenes. The lack of supervisory signals complicates optimization processes, potentially causing models to converge to suboptimal solutions. (2) Semi-supervised methods inherit limitations from both supervised and unsupervised paradigms while failing to fully leverage their respective strengths, resulting in limited practical adoption within real-world applications. (3) Although zero-shot learning methods eliminate data dependency through principled design of zero-reference loss functions, they cannot comprehensively encapsulate all essential attributes of real-world low-light conditions. Consequently, their restoration outputs generally underperform compared to data-driven approaches in terms of quantitative metrics.

2.2. Semantic-Guided Methods

In recent years, with the emergence of semantic information concepts in computer vision, an increasing number of semantic-guided methods have been developed across various applications, demonstrating the reliability of semantic priors. Semantic-guided approaches can be categorized into two paradigms: loss-level semantic guidance and feature-level semantic guidance.

2.2.1. Loss-Level Semantic Guidance

This methodology integrates semantic awareness by designing or adapting loss functions, enabling networks to adjust parameters based on semantic constraints during training. Liu et al. [29] introduced high-level vision tasks (e.g., image segmentation) into image denoising through a joint loss function combining a reconstruction loss ($L_D$) and a high-level vision task loss ($L_H$). The latter employs semantic information via cross-entropy loss to measure discrepancies between segmentation predictions and ground-truth labels, ensuring enhanced images retain accuracy for downstream tasks like object classification. In super-resolution tasks, Aakerberg et al. [30] proposed a guided loss function that jointly optimizes semantic segmentation and super-resolution by aligning segmentation outputs with ground-truth labels. For LLIE, Liang et al. [7] designed a brightness consistency loss, enforcing luminance uniformity across identical semantic regions (e.g., clustered objects) to prevent local over-/under-exposure, which proved critical for natural enhancement.

2.2.2. Feature-Level Semantic Guidance

Diverging from loss-level approaches, feature-level methods focus on extracting intermediate features from semantic segmentation networks and integrating semantic priors into feature representation spaces. Wang et al. [31] leveraged semantic maps to guide texture restoration in super-resolution through Spatial Feature Transformation (SFT) layers. These layers modulate network behavior via affine transformations conditioned on segmentation probability maps, dynamically adjusting feature maps. Fan et al. [32] developed a semantic-aware Retinex framework comprising three components: information decomposition, reflectance restoration, and illumination adjustment. Semantic features enhanced reflectance estimation, while reconstructed reflectance guided illumination refinement. Wu et al. [33] proposed a Semantic-Aware Embedding (SE) module that computes cross-modal similarity between segmentation and enhancement features to generate semantic-aware activation maps, enabling context-sensitive enhancement.

2.3. Section Summary

This section reviews traditional methods, deep learning approaches, and semantic-guided techniques for Low-Light Image Enhancement (LLIE). Traditional methods like histogram equalization and Retinex models improve brightness and contrast but suffer from computational complexity and noise sensitivity. Recent deep learning advances include supervised (LLNet, MBLLEN, SID), unsupervised (EnlightenGAN), semi-supervised (DRBN), and zero-shot methods (ExCNet, RRDNet), which enhance performance yet face challenges like high computational costs and data dependency. Semantic-guided methods optimize enhancement through semantic constraints, categorized into loss-level (through adaptive loss functions) and feature-level (via semantic priors) strategies. Future research should focus on refining existing approaches by deeper integration of semantic information to improve enhancement quality and broaden applicability.

3. Overall Architecture

Semantic-aware knowledge-guided low-light image enhancement aims to restore low-light images to their normal-light counterparts by integrating semantic information with image enhancement techniques. This approach effectively enhances low-light images, improving both their visual quality and semantic perception. Traditional low-light image enhancement algorithms often overlook the inherent semantic information within images. In contrast, the Semantic-Aware Knowledge-Guided Framework (SKF) incorporates semantic information into the low-light image enhancement network, thereby providing richer contextual information and greater robustness. This section first introduces the original LLFlow network, detailing its normalized flow model and training process, along with a comparative analysis against conventional algorithms based on deterministic processes. Subsequently, the technical design of the semantic-aware knowledge-guided low-light image enhancement algorithm is elaborated, focusing on the three key components of the SKF framework: the Semantic Feature Enhancement (SE) module, the Semantic-guided Color Histogram (SCH) loss, and the Semantic-guided Adversarial (SA) loss. Finally, a macro-level overview of the low-light image enhancement problem is provided.

3.1. Original Network: LLFlow

LLFlow proposes a low-light image enhancement model based on normalizing flow, which consists of two key components. First, the encoder $g$ takes the low-light image $x_l$ as input and generates an illumination-invariant color map $g(x_l)$, which can be interpreted as a reflectance map inspired by Retinex theory. Second, LLFlow employs an invertible network to map normally exposed images to latent codes $z$, thereby learning the distribution of normally exposed images under low-light conditions. During training, the method maximizes the log-likelihood of the well-exposed image $x_h$ based on the change-of-variables theorem. Additionally, LLFlow utilizes a stochastic selection mechanism to determine the mean of the Gaussian-distributed latent variable $z$ from either the color map $C(x_h)$ of a reference image or the color map $g(x_l)$ extracted from the low-light image by a conditional encoder. During inference, $z$ can be randomly sampled from $\mathcal{N}(g(x_l), 1)$, allowing for the generation of diverse normally exposed images based on the learned conditional distribution $f_{\mathrm{flow}}(x \mid x_l)$. Compared to existing pixel-wise reconstruction loss methods based on deterministic processes, LLFlow employs the negative log-likelihood (NLL) loss for normalizing flow training and conditions it on low-light images or features. This approach naturally captures the structural information of images and measures visual distances on the image manifold. Consequently, LLFlow can more accurately model the complex conditional distribution of normally exposed images, thereby improving the quality of low-light image enhancement, including more balanced exposure, effective noise and artifact suppression, and richer color representation.
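For illustration, the minimal PyTorch-style sketch below computes the negative log-likelihood of a well-exposed image under a conditional flow via the change-of-variables theorem; the `flow(x_high, cond)` interface and the unit-variance Gaussian prior centered at the color map are assumptions for exposition, not the released LLFlow code.

```python
import math

def flow_nll_loss(flow, x_high, cond, mu):
    """Negative log-likelihood of a normally exposed image under a conditional
    normalizing flow, via the change-of-variables theorem:
        log p(x_high | x_low) = log N(z; mu, 1) + log|det dz/dx_high|
    Here mu plays the role of the color map g(x_low) predicted by the
    conditional encoder, and `flow` is assumed to return (z, logdet)."""
    z, logdet = flow(x_high, cond)  # assumed interface: latent code and log|det J|
    log_pz = (-0.5 * (z - mu) ** 2 - 0.5 * math.log(2 * math.pi)).sum(dim=[1, 2, 3])
    return -(log_pz + logdet).mean()  # minimize the NLL
```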

3.2. SKF Framework

The SKF framework consists of three main components: the Semantic Feature Enhancement (SE) module, the Semantic-Guided Color Histogram (SCH) loss, and the Semantic-Guided Adversarial (SA) loss.
First, the Semantic Feature Enhancement (SE) module is introduced into the enhancement network. This module integrates semantic-level features with the enhancement network’s features, leveraging semantic information to guide image enhancement. By fusing semantic information with image features, the enhancement network can better preserve and enhance the semantic details of the image, resulting in more semantically aware enhancement effects.
After passing through the enhancement network, the image is constrained by the Semantic-Guided Color Histogram (SCH) loss, which ensures that the enhanced image maintains color consistency with the original image while taking semantic information into account. This ensures that the colors in the enhanced image appear more natural and realistic.
In addition to the color histogram loss, the framework employs the Semantic-Guided Adversarial (SA) loss. By using adversarial training, this loss enhances the robustness and stability of the enhancement network while ensuring that the enhanced images appear more visually realistic and natural.

3.3. Overall Architecture of the LLIE Method

For a low-light image $I_{low}$ with width $W$ and height $H$, the entire process can be described as follows:
$I_{out} = F(I_{low}; \theta)$
where $I_{low}$ represents the low-light image, $I_{out}$ denotes the restored image, and $F$ is a framework with trainable parameters $\theta$. The goal of deep learning is to find the optimal $\theta$ that minimizes the discrepancy between the restored image and the ground-truth (gt) image captured under normal lighting conditions:
$\hat{\theta} = \arg\min_{\theta} L(I_{gt}, I_{out})$
where $I_{gt}$ is the normally illuminated image, and $I_{out}$ is the enhanced image. The loss function $L$ governs the optimization of the network.
For the semantic-guided extreme low-light image enhancement network, the process can be divided into two steps. First,
$M = F_{segment}(I_{low}; \theta_s)$
where $M$ represents the semantic prior, including segmentation results and multi-scale intermediate features. $F_{segment}$ is a pre-trained semantic segmentation network that serves as the Semantic Knowledge Base (SKB), and its parameters $\theta_s$ are frozen during training. Then, $M$ is used as an input to the enhancement stage:
$I_{out} = F_{enhancement}(I_{low}, M; \theta_e)$
where $I_{out} \in \mathbb{R}^{W \times H \times 3}$, and $F_{enhancement}$ represents the enhancement network. During training, the optimization objective is minimized while keeping $\theta_s$ fixed, allowing updates to be guided by $M$:
$\hat{\theta}_e = \arg\min_{\theta_e} L(I_{out}, I_{gt}, M)$
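A minimal PyTorch-style sketch of this two-step formulation is given below; the `seg_net`, `enh_net`, and `loss_fn` objects, the optimizer choice, and the learning rate are illustrative assumptions rather than the actual implementation.

```python
import torch

def build_trainer(seg_net, enh_net, loss_fn, lr=1e-4):
    """Sketch of the two-step objective above: theta_s (segmentation) stays
    frozen, only theta_e (enhancement) is optimized against L(I_out, I_gt, M)."""
    for p in seg_net.parameters():
        p.requires_grad = False                               # freeze theta_s
    optimizer = torch.optim.Adam(enh_net.parameters(), lr=lr)  # updates theta_e only

    def step(i_low, i_gt):
        with torch.no_grad():
            M = seg_net(i_low)                 # semantic prior: segmap + multi-scale features
        i_out = enh_net(i_low, M)              # I_out = F_enhancement(I_low, M; theta_e)
        loss = loss_fn(i_out, i_gt, M)         # L(I_out, I_gt, M)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    return step
```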

4. Model Details

In this section, we first provide a systematic review of the low-light image enhancement problem and introduce the segmentation network adopted in this study, along with the proposed Semantic-Aware Knowledge-Guided Framework (SKF), as illustrated in Figure 1. Specifically, we conduct an in-depth analysis of the core components of the SKF framework, including the Semantic Feature Enhancement (SE) module, the Semantic-Guided Color Histogram (SCH) loss, and the Semantic-Guided Adversarial (SA) loss. The implementation mechanisms of these modules and their roles in the low-light image enhancement task are thoroughly explained.
Furthermore, based on the SKF framework, we propose a semantic-guided multi-color space loss to improve the color fidelity and visual quality of enhanced images. The design principles and optimization strategies of this loss function are discussed in detail.

4.1. Segmentation Network

Mainstream semantic segmentation networks include U-Net [34], SegNet [35], DeepLab [36], PSPNet [37], and HRNet [31]. This study selects HRNet for extreme low-light image enhancement due to the following advantages: (1) Multi-Scale Feature Retention: HRNet preserves multi-resolution features, enhancing feature extraction in low-light conditions and improving segmentation accuracy. (2) Cross-Branch Feature Fusion: Its multi-branch parallel structure and cross-branch feature aggregation enhance representation capability and generalization in semantic segmentation. (3) High Computational Efficiency: HRNet maintains high precision while enabling efficient inference, reducing the computational cost, and accelerating low-light image restoration. We integrate a pre-trained HRNet into the low-light image enhancement network to perform semantic segmentation on the enhanced image I out , generating a segmentation map (segmap) and extracting multi-scale segmentation features to further assist enhancement.

4.2. Semantic Feature Enhancement (SE) Module

Low-light image enhancement networks focus on illumination adjustment and detail restoration, while semantic segmentation networks specialize in scene understanding and object recognition. Due to the heterogeneity between these two types of features, semantic information fusion must account for their differences.
This study proposes the Semantic Feature Enhancement (SE) module, which fuses semantic and image features. It uses a pre-trained HRNet to extract multi-scale semantic features, which are then combined with image features via a cross-modal interaction mechanism. This involves computing similarity between reference and target features and converting semantic information into attention weights that enhance the network's performance. Its detailed structure is shown in Figure 2.
As shown in Figure 1, the SKF framework includes three SE modules, extracting semantic features $F_s^b$ and image features $F_i^b$ at different resolutions ($H/2^{4-b} \times W/2^{4-b}$, $b = 0, 1, 2$). Pixel-wise interactions are performed within the SE module to generate the fused feature map $F_o^b$.

4.2.1. Feature Alignment and Fusion

First, convolution layers align the feature channels of the segmentation and enhancement networks to ensure consistent dimensions. Then, the transposed attention mechanism is adopted for efficient fusion. Proposed by Zamir et al. [38], this mechanism emphasizes the relative relationships between different positions in the input sequence, providing a more efficient way to model global dependencies than traditional Self-Attention.
Since the computational complexity of self-attention increases quadratically with resolution, its application to high-resolution images results in excessive computational costs. In contrast, transposed attention exhibits linear complexity, significantly reducing computation and memory consumption, making it more suitable for low-light image enhancement. The structure is illustrated in Figure 3, where segmentation features serve as Query (Q), while the original image features act as Key (K) and Value (V).

4.2.2. Semantic-Aware Attention Map

The SE module first computes the semantic-aware attention map $A^b$:
$A^b = \mathrm{Softmax}\left( W_k(F_i^b) \, W_q(F_s^b) / C \right)$
where $W_k(\cdot)$ and $W_q(\cdot)$ denote channel transformation operations implemented via convolution layers, $C$ is the number of feature channels, and $A^b \in \mathbb{R}^{C \times C}$ models the similarity between semantic and image features. The final fused feature $F_o^b$ is computed as follows:
$F_o^b = \mathrm{FN}\left( W_v(F_i^b) \, A^b + F_i^b \right)$
where $W_v(\cdot)$ performs channel transformation, and $\mathrm{FN}$ represents a feedforward network. The output $F_o^b$ serves as the input to the $(b+1)$-th layer of the low-light enhancement network. In summary, the SE module optimizes the enhancement network by integrating semantic and image features through cross-modal interaction. By leveraging the transposed attention mechanism, this module reduces computational costs while improving the efficiency of semantic prior utilization, thereby significantly enhancing low-light image enhancement quality.
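The following PyTorch-style sketch illustrates one possible realization of the SE module described above; the 1×1 convolutions, the feed-forward design, and other layer choices are assumptions for exposition rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SemanticFeatureEnhancement(nn.Module):
    """Minimal sketch of the SE module: semantic features provide the query,
    image features provide the key and value, and a C x C transposed-attention
    map fuses them with linear complexity in the spatial resolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.w_q = nn.Conv2d(channels, channels, kernel_size=1)  # on semantic features
        self.w_k = nn.Conv2d(channels, channels, kernel_size=1)  # on image features
        self.w_v = nn.Conv2d(channels, channels, kernel_size=1)
        self.ffn = nn.Sequential(                                 # FN(.) in the formula
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        self.channels = channels

    def forward(self, f_img: torch.Tensor, f_sem: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_img.shape
        q = self.w_q(f_sem).flatten(2)        # (B, C, HW)  query from semantic features
        k = self.w_k(f_img).flatten(2)        # (B, C, HW)  key from image features
        v = self.w_v(f_img).flatten(2)        # (B, C, HW)  value from image features
        attn = torch.softmax(k @ q.transpose(1, 2) / self.channels, dim=-1)  # (B, C, C)
        out = (attn @ v).view(b, c, h, w)     # apply the channel-wise attention map
        return self.ffn(out + f_img)          # F_o = FN(W_v(F_i) A + F_i)
```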

4.3. Semantic-Guided Color Histogram (SCH) Loss

Color histograms are an important tool for describing the color statistics of an image. They provide global features of the color distribution and are commonly used to measure the color similarity between two images. If the color histograms of two images are similar, it indicates that their color distributions are also consistent. In image enhancement tasks, color histograms are often used to constrain the color restoration process. However, traditional color histograms primarily capture global color information and overlook color feature differences between instances, which may lead to local color biases and detail loss. To address this issue, this study proposes the semantic-guided color histogram (SCH) loss. The core idea is to leverage semantic information to guide the calculation of color histograms, allowing the color distribution to be adjusted for each instance, thus preserving local color information more accurately during the enhancement process. The overall structure of the SCH loss is shown in Figure 4.

4.3.1. Instance-Level Color Histogram Calculation

To perform instance-level color histogram calculation, the image is first segmented into different instance regions using the semantic map obtained from the semantic segmentation network. The specific calculation is as follows:
$p^c = I \cdot I_{seg}^c, \quad P = \{p^0, p^1, \ldots, p^n\}$
where $I$ is the enhanced image, $I_{seg}^c$ is the semantic segmentation mask for class $c$, and $\cdot$ denotes element-wise multiplication. $p^c$ represents the image instance region of class $c$, and $P$ is the set of all instance regions. This method allows for the calculation of color histograms for each semantic instance separately, thus enabling more precise modeling of the color distribution for different instances.

4.3.2. Differentiable Color Histogram Estimation

Traditional color histograms are discrete and cannot be directly used for gradient computation, making them difficult to use in end-to-end training of neural networks. Backpropagation relies on the chain rule to compute gradients, and the discrete nature of traditional histogram calculation makes it non-differentiable, preventing the gradient from propagating.
To resolve this issue, this study borrows the concept of Kernel Density Estimation (KDE) and adopts a differentiable color histogram estimation method. This approach allows color histograms to be optimized end-to-end during training and seamlessly integrated into the neural network framework. Inspired by DeepHist [39], the method smooths the histogram calculation using a Gaussian kernel, making it differentiable.
For the R channel of the $c$-th instance block, the color histogram estimation process is defined as follows:
$x_{ij}^h = x_j - \frac{i - 0.5}{255}, \quad x_{ij}^l = x_j - \frac{i + 0.5}{255}$
where $x_j$ represents the $j$-th pixel value in the instance block, and $i$ is the current pixel intensity being computed, with $i \in [0, 255]$. $x_{ij}^h$ and $x_{ij}^l$ represent the high and low anchor points for estimating the color histogram, respectively. Next, the differentiable color histogram estimate is computed as follows:
$H_i^c = \sum_j \left( \mathrm{Sigmoid}(\alpha x_{ij}^h) - \mathrm{Sigmoid}(\alpha x_{ij}^l) \right), \quad H^c = \{i, H_i^c\}_{i=0}^{255}$
where $H_i^c$ represents the estimated number of pixels with intensity $i$, and $\mathrm{Sigmoid}(\cdot)$ smoothly approximates the binary histogram counting, making the process differentiable. The factor $\alpha$ is set to 400 in this study to ensure higher calculation precision. In this formulation, the difference between the two Sigmoid functions estimates the number of pixels with a specific color intensity $i$; for example, when $x_j$ corresponds exactly to intensity $i$, its contribution to $H_i^c$ is 1. Therefore, $H^c$ represents the final instance-level differentiable color histogram, which can be optimized during back-propagation.

4.3.3. Semantic-Guided Color Histogram Loss

To ensure that the color distribution of the enhanced image closely matches that of the ground truth, an $L_1$ loss is employed to constrain the differentiable color histogram. The final semantic-guided color histogram loss is defined as follows:
$L_{SCH} = \sum_c \left\| H^c(I_{out}) - H^c(I_{gt}) \right\|_1$
where $H^c(I_{out})$ and $H^c(I_{gt})$ represent the instance-level color histograms of the enhanced and ground truth images, respectively. The SCH loss effectively guides the optimization of the neural network, ensuring more natural and accurate color restoration.
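A minimal PyTorch-style sketch of the SCH loss, combining the instance masking of Section 4.3.1 with the sigmoid-smoothed histogram of Section 4.3.2, is shown below; the tensor layouts and the per-channel loop are illustrative assumptions.

```python
import torch

def soft_histogram(x: torch.Tensor, alpha: float = 400.0, bins: int = 256) -> torch.Tensor:
    """Differentiable histogram of pixel values in [0, 1], following the
    sigmoid-smoothed (KDE-style) estimate of Section 4.3.2."""
    i = torch.arange(bins, device=x.device, dtype=x.dtype)
    x = x.reshape(-1, 1)                                       # (N, 1) pixel values
    high = torch.sigmoid(alpha * (x - (i - 0.5) / 255.0))      # Sigmoid(alpha * x_ij^h)
    low = torch.sigmoid(alpha * (x - (i + 0.5) / 255.0))       # Sigmoid(alpha * x_ij^l)
    return (high - low).sum(dim=0)                             # H_i for i = 0..255

def sch_loss(i_out: torch.Tensor, i_gt: torch.Tensor, seg_mask: torch.Tensor) -> torch.Tensor:
    """Semantic-guided color histogram loss (sketch). i_out and i_gt are (3, H, W)
    images in [0, 1]; seg_mask is a one-hot (num_classes, H, W) map from the
    frozen segmentation network."""
    loss = i_out.new_zeros(())
    for c in range(seg_mask.shape[0]):                         # per semantic instance
        m = seg_mask[c].bool()
        if not m.any():
            continue
        for ch in range(3):                                    # per color channel
            h_out = soft_histogram(i_out[ch][m])
            h_gt = soft_histogram(i_gt[ch][m])
            loss = loss + torch.abs(h_out - h_gt).sum()        # L1 between histograms
    return loss
```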

4.4. Semantically Guided Adversarial (SA) Loss

Traditional adversarial losses often fail to capture subtle semantic inconsistencies, especially under low-light conditions where global structures and object boundaries become ambiguous. Adversarial loss is a key component in training Generative Adversarial Networks (GANs), optimizing the interaction between the generator (G) and the discriminator (D), and global and local discriminators are widely used in image inpainting to enhance realism [40,41]. Inspired by EnlightenGAN [24], we propose an improved adversarial loss, the Semantically Guided Adversarial (SA) loss, which incorporates semantic priors to guide both the local and global discriminators and thereby refine the discrimination process. The overall structure of the SA loss is illustrated in Figure 5.

4.4.1. Semantically Guided Local Adversarial Loss

Traditional local discriminators rely on random cropping, which may not focus on the most challenging forged regions. Instead, we introduce a segmentation-guided strategy that selects the most likely forged regions for optimization. Specifically, given an instance set P, we compute discriminator scores for each instance and select the one with the lowest score for training:
$x_f = P_t, \quad D(P_t) = \min\left( D(P_0), \ldots, D(P_{class}) \right)$
The local adversarial loss is defined as follows:
$L_{local} = \min_G \max_D \; \mathbb{E}_{x_r \sim p_{real}}\left[ \mathrm{MSE}(D(x_r), 0) \right] + \mathbb{E}_{x_f \sim p_{fake}}\left[ \mathrm{MSE}(D(x_f), 1) \right]$
where $x_f \sim p_{fake}$ represents the least realistic instance from the generated image $I_{out}$, and $x_r \sim p_{real}$ is a randomly cropped real image patch.
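The segmentation-guided selection of the least realistic instance can be sketched as follows (PyTorch-style; the discriminator interface `d_net` and the masking strategy are illustrative assumptions, and the real/fake targets simply follow the formulation above).

```python
import torch
import torch.nn.functional as F

def least_realistic_instance(d_net, i_out: torch.Tensor, seg_mask: torch.Tensor) -> torch.Tensor:
    """Score each semantic instance region of the enhanced image with the local
    discriminator and return the one judged least realistic (lowest score)."""
    patches, scores = [], []
    for c in range(seg_mask.shape[0]):
        m = seg_mask[c:c + 1]                          # (1, H, W) mask for class c
        if m.sum() == 0:
            continue
        patch = i_out * m                              # P_c: I_out restricted to class c
        patches.append(patch)
        with torch.no_grad():                          # scores only used for selection
            scores.append(d_net(patch.unsqueeze(0)).mean())
    idx = int(torch.stack(scores).argmin())            # x_f = argmin_c D(P_c)
    return patches[idx]

def local_adversarial_d_loss(d_net, x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    """Least-squares adversarial term for the local discriminator; the 0/1
    targets follow the min-max formulation written above."""
    d_real = d_net(x_real.unsqueeze(0))
    d_fake = d_net(x_fake.unsqueeze(0).detach())
    return F.mse_loss(d_real, torch.zeros_like(d_real)) + \
           F.mse_loss(d_fake, torch.ones_like(d_fake))
```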

4.4.2. Semantically Guided Global Adversarial Loss

To ensure global consistency and preserve semantic integrity, we incorporate semantic features into the global discriminator. By integrating the generated image $I_{out}$ with the pre-Softmax output of the segmentation network, $I_{seg}$, we form a new feature representation:
$x_{f\text{-}seg} = f(I_{out}, I_{seg})$
The global adversarial loss is defined as follows:
$L_{global} = \min_G \max_D \; \mathbb{E}_{x_r \sim p_{real}}\left[ \mathrm{MSE}(D(x_r), 0) \right] + \mathbb{E}_{x_{f\text{-}seg} \sim p_{fake}}\left[ \mathrm{MSE}(D(x_{f\text{-}seg}), 1) \right]$
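A minimal sketch of how the semantic-conditioned input to the global discriminator might be formed is given below; channel concatenation is used here as the fusion operator $f(\cdot)$, which is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def semantic_global_input(i_out: torch.Tensor, seg_logits: torch.Tensor) -> torch.Tensor:
    """Form x_f-seg for the global discriminator by fusing the enhanced image
    with the pre-Softmax segmentation output (simple channel concatenation)."""
    if seg_logits.shape[-2:] != i_out.shape[-2:]:
        seg_logits = F.interpolate(seg_logits, size=i_out.shape[-2:],
                                   mode="bilinear", align_corners=False)
    return torch.cat([i_out, seg_logits], dim=1)       # (B, 3 + num_classes, H, W)
```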

4.4.3. Overall Loss Function

The final SA loss is formulated as follows:
$L_{SA} = L_{global} + L_{local}$
To ensure image enhancement quality during adversarial training, we incorporate a reconstruction loss (e.g., $L_1$ loss, MSE loss, or SSIM loss) and the Semantically Guided Color Histogram (SCH) loss. The overall loss function is the following:
$L_{all} = L_{recon} + \lambda_{SCH} L_{SCH} + \lambda_{SA} L_{SA}$
where $L_{recon}$ represents the reconstruction loss, and $\lambda_{SCH}$ and $\lambda_{SA}$ are hyperparameters balancing the different loss components.
In conclusion, compared with conventional discriminators that rely solely on appearance cues, our semantically guided discriminators leverage class-aware information to better distinguish between realistic and unrealistic content, especially at semantically sensitive regions like object contours and scene boundaries. Specifically, the Semantically Guided Local Adversarial Loss focuses on the most challenging regions identified by segmentation priors, while the Semantically Guided Global Adversarial Loss ensures holistic semantic consistency across the entire image. Together, they enable the discriminator to perform both fine-grained and global semantic validation, significantly improving the realism and structural integrity of enhanced images.

4.5. Semantically Guided Multi-Color Space Loss

Building upon the SCH loss in the SKF framework, we propose an improved approach—Semantically Guided Multi-Color Space Loss—to enhance color restoration in low-light image enhancement.

4.5.1. Selection and Analysis of Color Spaces

Color images consist of multiple channels, with commonly used color spaces including RGB, LAB, and LCH. The RGB color space has strong physical significance and is suitable for storage and display. However, its three channels (R, G, B) are highly correlated and sensitive to brightness, shadows, and noise, leading to color distortion in extreme low-light conditions. Processing color solely in the RGB space often fails to restore the original colors of objects accurately. In contrast, LCH and LAB color spaces provide color representations more aligned with human vision.
The LCH color space, comprising hue (H), chroma (C), and lightness (L), reflects human color perception more directly. Since LCH separates color attributes, adjustments become more intuitive and flexible. This separation particularly benefits low-light environments by preserving color clarity and fidelity, ensuring more accurate color analysis and processing. The LAB color space decomposes color into lightness (L), a red-green channel (a), and a yellow-blue channel (b), where the a and b channels are independent of the L channel. This allows separate processing of color and brightness information, enabling more precise color recovery in low-light scenarios. To address the limitations of color processing solely in the RGB space, we introduce a multi-color space loss function that integrates RGB, LAB, and LCH color spaces.

4.5.2. Semantically Guided Color Histogram Loss

The color histogram is a statistical tool for describing the color distribution in images, often used for color analysis. However, traditional color histograms primarily represent global color distributions, overlooking distinctions between different local instances. This can lead to local color deviation and loss of details.
To mitigate these issues, we first generate a semantic segmentation map using an HRNet semantic segmentation network. We then compute semantically guided color loss across different color spaces.
In the RGB space, we employ the Semantically Guided Color Histogram (SCH) Loss, as previously described, and thus omit redundant details here.
In the LAB color space, we compute losses separately for the L channel (lightness) and AB channels (color). The L-channel loss is measured using Mean Squared Error (MSE). The A and B channel losses are calculated using a histogram distance-based probability distribution method. The formulation is as follows:
$L^{I_{gt}}, A^{I_{gt}}, B^{I_{gt}} = \mathrm{RGB2LAB}(I_{gt})$
$L^{I_{out}}, A^{I_{out}}, B^{I_{out}} = \mathrm{RGB2LAB}(I_{out})$
$L_{lab} = \mathbb{E}_{I_{out}, I_{gt}}\left[ (L^{I_{gt}} - L^{I_{out}})^2 - \sum_{i=1}^{n} Q(A_i^{I_{gt}}) \log\left( Q(A_i^{I_{out}}) \right) - \sum_{i=1}^{n} Q(B_i^{I_{gt}}) \log\left( Q(B_i^{I_{out}}) \right) \right]$
where $I_{gt}$ and $I_{out}$ represent the ground truth and enhanced images, respectively. $Q$ denotes the quantization operator, and $\mathrm{RGB2LAB}$ represents the color space conversion function.
Similarly, in the LCH color space, we compute losses for the L, C, and H channels: The L and C channel losses use MSE loss. The H channel loss is computed using cross-entropy based on probability distributions. The corresponding formulations are as follows:
$L^{I_{gt}}, C^{I_{gt}}, H^{I_{gt}} = \mathrm{RGB2LCH}(I_{gt})$
$L^{I_{out}}, C^{I_{out}}, H^{I_{out}} = \mathrm{RGB2LCH}(I_{out})$
$L_{lch} = \mathbb{E}_{I_{out}, I_{gt}}\left[ (L^{I_{gt}} - L^{I_{out}})^2 + (C^{I_{gt}} - C^{I_{out}})^2 - \sum_{i=1}^{n} Q(H_i^{I_{gt}}) \log\left( Q(H_i^{I_{out}}) \right) \right]$
where $I_{gt}$ and $I_{out}$ denote the ground truth and enhanced images, respectively, and $\mathrm{RGB2LCH}$ is the color space conversion function.
Total Semantically Guided Multi-Color Space Loss. By integrating the losses across multiple color spaces, we define the total semantically guided multi-color space loss as follows:
$L_{color} = L_{rgb} + L_{lab} + L_{lch}$
In the improved SKF framework, we incorporate this semantically guided multi-color space loss into the overall loss function, which is formulated as follows:
$L_{all} = L_{recon} + \lambda_{color} L_{color} + \lambda_{SA} L_{SA}$
where $L_{recon}$ represents the reconstruction loss, and $\lambda_{color}$ and $\lambda_{SA}$ are hyperparameters that balance the contributions of the different loss terms.
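A PyTorch-style sketch of the LAB and LCH components is given below (the RGB component is the SCH loss of Section 4.3); the kornia rgb_to_lab conversion, the channel normalizations, and the reuse of the sigmoid-smoothed histogram as the quantization $Q(\cdot)$ are assumptions for exposition.

```python
import torch
import kornia.color

def hist_prob(x: torch.Tensor, bins: int = 256, alpha: float = 400.0) -> torch.Tensor:
    """Normalized, sigmoid-smoothed histogram over values in [0, 1], standing in
    for the quantization operator Q(.) while keeping the loss differentiable."""
    i = torch.arange(bins, device=x.device, dtype=x.dtype)
    x = x.reshape(-1, 1)
    counts = (torch.sigmoid(alpha * (x - (i - 0.5) / 255.0))
              - torch.sigmoid(alpha * (x - (i + 0.5) / 255.0))).sum(dim=0)
    return counts / counts.sum().clamp_min(1e-8)

def multi_color_space_loss(i_out: torch.Tensor, i_gt: torch.Tensor) -> torch.Tensor:
    """Sketch of L_lab + L_lch for (B, 3, H, W) RGB images in [0, 1]."""
    eps = 1e-8
    lab_out, lab_gt = kornia.color.rgb_to_lab(i_out), kornia.color.rgb_to_lab(i_gt)
    # Normalize L to [0, 1] and a/b roughly to [-1, 1] before comparing.
    l_out, a_out, b_out = lab_out[:, 0] / 100.0, lab_out[:, 1] / 128.0, lab_out[:, 2] / 128.0
    l_gt, a_gt, b_gt = lab_gt[:, 0] / 100.0, lab_gt[:, 1] / 128.0, lab_gt[:, 2] / 128.0
    loss_lab = (torch.mean((l_gt - l_out) ** 2)
                - torch.sum(hist_prob(a_gt * 0.5 + 0.5) * torch.log(hist_prob(a_out * 0.5 + 0.5) + eps))
                - torch.sum(hist_prob(b_gt * 0.5 + 0.5) * torch.log(hist_prob(b_out * 0.5 + 0.5) + eps)))
    # LCH from LAB: C = sqrt(a^2 + b^2), H = atan2(b, a) mapped to [0, 1].
    c_out, c_gt = torch.sqrt(a_out ** 2 + b_out ** 2 + eps), torch.sqrt(a_gt ** 2 + b_gt ** 2 + eps)
    h_out = (torch.atan2(b_out, a_out) / (2 * torch.pi)) % 1.0
    h_gt = (torch.atan2(b_gt, a_gt) / (2 * torch.pi)) % 1.0
    loss_lch = (torch.mean((l_gt - l_out) ** 2) + torch.mean((c_gt - c_out) ** 2)
                - torch.sum(hist_prob(h_gt) * torch.log(hist_prob(h_out) + eps)))
    return loss_lab + loss_lch
```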

4.6. Module Interactions Within the Framework

The SE module first extracts and fuses semantic features with image features. The fused features are then processed by the enhancement network, with the SCH loss ensuring color consistency. Finally, the SA loss refines the output through adversarial training. This collaborative workflow allows the framework to effectively integrate semantic information, achieving superior low-light image enhancement results.

5. Experiments

This section conducts comparative and ablation studies between the original semantic-aware knowledge-guided low-light image enhancement algorithm and its improved variant. Evaluations are performed on the LOL [42] and LOL-v2 [43] datasets, with performance metrics including PSNR, SSIM, LPIPS, and LRC PSNR. Hardware specifications: NVIDIA GeForce RTX 4090 GPU (24 GB VRAM) and Intel i9-13900k processor.

5.1. Experimental Settings

Datasets. The proposed framework was evaluated on two benchmark datasets: the LOL dataset, which contains 485 low/normal-light training pairs and 15 test pairs captured in real-world scenarios, and the LOL-v2 dataset (real subset), an expanded version with 689 training pairs and 100 test pairs, offering greater diversity in scene types and illumination conditions.
Evaluation Metrics. Five quantitative metrics were adopted:
  • Peak Signal-to-Noise Ratio (PSNR): Measures pixel-level reconstruction accuracy. Higher values indicate better fidelity.
  • Structural Similarity Index (SSIM): Assesses structural preservation in luminance, contrast, and texture. Values range [0, 1], with higher scores denoting superior structural consistency.
  • Learned Perceptual Image Patch Similarity (LPIPS): Quantifies perceptual similarity using deep features. Lower values reflect better perceptual quality.
  • Natural Image Quality Evaluator (NIQE): Blind image quality assessment without reference images. Lower scores indicate more natural outputs.
  • Peak Signal-to-Noise Ratio under Low-Resolution Conditions (LRC PSNR): PSNR computed under low-resolution conditions to evaluate robustness to resolution degradation.
Comparative Methods. Both the original SKF framework and its modified variant utilize the LLFlow network as the baseline architecture, designated as LLFlow-SKF and LLFlow-SKF+, respectively.
Implementation Details. For LLFlow-SKF, we conducted testing on GPUs allocated via Google Colab, utilizing the official LLFlow network codebase with identical testing configurations. The Semantic Feature Enhancement (SE) module was strategically integrated into the decoder of the LLFlow network to ensure optimal feature interaction. Notably, the Semantic-Guided Adversarial (SA) loss was excluded during training due to the unavailability of enhanced outputs in this phase. As for LLFlow-SKF+, the training process was fully executed on Google Colab GPUs under the same hardware and software constraints, ensuring consistency in experimental conditions.

5.2. Qualitative Analysis

The qualitative assessments on the LOL-v2 and LOL datasets are illustrated in Figure 6 and Figure 7. As shown in Figure 6, the SKF framework strengthens the LLFlow method's enhancement capability, generating images with more realistic color effects, more uniform color distribution, and stronger contrast, thereby achieving higher authenticity. Specifically, due to color unevenness, illumination inconsistency, and noise interference, LLFlow's results appear unrealistic. These issues can be mitigated through SKF, which achieves more consistent color restoration for objects like tables, walls, and clothing, along with more natural detail recovery, making the outputs closer to the reference images.

5.3. Quantitative Analysis

The quantitative results on the LOL and LOL-v2 datasets are summarized in Table 1. The experimental results demonstrate that the SKF framework achieves significant performance improvements over the baseline LLFlow network across most metrics. Specifically, LLFlow-SKF attains PSNR values of 30.804 dB on the LOL dataset and 29.194 dB on the LOL-v2 dataset, establishing new state-of-the-art (SOTA) benchmarks. The SSIM values also show consistent enhancements, with average increases of 0.003 (LOL) and 0.002 (LOL-v2), indicating improved luminance recovery and structural preservation. Notably, the substantial reduction in LPIPS scores further validates that the semantic priors introduced by SKF align better with human perceptual intuition. However, a slight degradation in LRC PSNR is observed on the LOL-v2 dataset, suggesting potential trade-offs in low-resolution robustness. These results collectively underscore the effectiveness of SKF in enhancing both objective metrics and visual quality while highlighting areas for future refinement.
The improvements translate directly into practical value across critical application domains such as nighttime surveillance, autonomous driving, and medical imaging. For instance, in nighttime surveillance systems, the higher PSNR and lower LPIPS values correspond to enhanced image detail clarity and more accurate object recognition capabilities, which are essential for security monitoring. In the autonomous driving domain, our model’s superior preservation of structural information in low-light scenarios has decisive significance for obstacle detection and path planning during nighttime navigation, directly impacting driving safety.

5.4. Ablation Studies

The ablation study presented in Table 2 demonstrates the effectiveness of different components in the LLFlow-SKF model across two datasets: LOL and LOL-v2. The baseline LLFlow model achieves respectable performance with PSNR values of 27.476 and 27.524 on the respective datasets. With the addition of the Semantic Feature Enhancement (SE) module, there is a noticeable improvement in performance, increasing PSNR to 28.121 on LOL and 27.979 on LOL-v2 while also reducing LPIPS values, indicating better perceptual quality. Further enhancement is observed when incorporating the Semantic-Guided Color Histogram (SCH) loss, which significantly boosts PSNR to 29.901 on LOL and 28.805 on LOL-v2. The complete LLFlow-SKF model, which includes all components (LLFlow + SE + SCH + SA), achieves the best performance with PSNR values of 30.804 and 29.194 on LOL and LOL-v2, respectively, along with the lowest LPIPS scores of 0.064 and 0.093, indicating superior perceptual quality. These results clearly validate the contribution of each component to the overall performance of the LLFlow-SKF model for low-light image enhancement.

6. Conclusions

In this paper, we present a systematic enhancement of semantic-aware low-light image enhancement algorithms, focusing on the Semantic-aware Knowledge-Guided Framework (SKF). The key findings demonstrate that integrating semantic priors into the enhancement process—through modules such as Semantic Feature Enhancement, Semantic-Guided Color Histogram Loss, and Semantic-Guided Adversarial Loss—significantly improves both the visual quality and quantitative performance of low-light image restoration. It is notable that replacing the traditional RGB-only color constraint with multi-color space losses (incorporating RGB, LAB, and LCH) leads to more robust color fidelity and better perceptual realism, especially under challenging lighting conditions.
Experimental results on benchmark datasets show that the SKF framework achieves state-of-the-art performance in terms of PSNR, SSIM, and perceptual metrics while also providing more natural and consistent color restoration. These improvements highlight the importance of semantic information for guiding both global and local enhancement, reducing color distortion, and preserving structural details.
In summary, the SKF framework marks a significant advancement in low-light image enhancement by systematically leveraging semantic knowledge. In future work, further exploration into lightweight architectures, real-time inference, and broader semantic integration could extend the applicability of SKF to a wider range of real-world scenarios and devices.

Author Contributions

Conceptualization, B.J. and N.Y.; Methodology, B.J., X.W., N.Y. and Q.W.; Software, B.J. and Q.W.; Resources, B.J.; Data curation, X.W.; Writing—original draft, X.W., N.Y., Y.L. and X.C.; Writing—review & editing, Y.L. and X.C.; Project administration, Q.W.; Funding acquisition, B.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 61907025).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at: https://github.com/langmanbusi/Semantic-Aware-Low-Light-Image-Enhancement (accessed on 26 February 2025).

Acknowledgments

The authors would like to thank all the anonymous reviewers for their valuable suggestions to improve this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Raji, A.; Thaibaoui, A.; Petit, E.; Bunel, P.; Mimoun, G. A gray-level transformation-based method for image enhancement. Pattern Recognit. Lett. 1998, 19, 1207–1212. [Google Scholar] [CrossRef]
  2. Yuan, Z.; Zeng, J.; Wei, Z.; Jin, L.; Zhao, S.; Liu, X. CLAHE-based low-light image enhancement for robust object detection in overhead power transmission system. Procedia Eng. 2023, 38, 2240–2243.
  3. Hao, W.; He, M.; Ge, H.; Wang, C.; Gao, Q. Retinex-like Method for Image Enhancement in Poor Visibility Conditions. Procedia Eng. 2023, 15, 2798–2803.
  4. Lamba, M.; Mitra, K. Restoring extremely dark images in real time. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3487–3497.
  5. Maharjan, P.; Li, L.; Li, Z.; Xu, N.; Ma, C.; Li, Y. Improving extreme low-light image denoising via residual learning. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 916–921.
  6. Chen, C.; Chen, Q.; Xu, J.; Koltun, V. Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3291–3300.
  7. Liang, D.; Li, L.; Wei, M.; Yang, S.; Zhang, L.; Yang, W.; Du, Y.; Zhou, H. Semantically contrastive learning for low-light image enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 1555–1563.
  8. Reza, A.M. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2004, 38, 35–44.
  9. Ibrahim, H.; Kong, N.S.P. Brightness preserving dynamic histogram equalization for image contrast enhancement. IEEE Trans. Consum. Electron. 2007, 53, 1752–1758.
  10. Abdullah-Al-Wadud, M.; Kabir, M.H.; Dewan, M.A.A.; Chae, O. A dynamic histogram equalization for image contrast enhancement. IEEE Trans. Consum. Electron. 2007, 53, 593–600.
  11. Veluchamy, M.; Subramani, B. Image contrast and color enhancement using adaptive gamma correction and histogram equalization. Optik 2019, 183, 329–337.
  12. Chiu, Y.S.; Cheng, F.C.; Huang, S.C. Efficient contrast enhancement using adaptive gamma correction and cumulative intensity distribution. In Proceedings of the 2011 IEEE International Conference on Systems, Man, and Cybernetics, Anchorage, AK, USA, 9–12 October 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2946–2950.
  13. Ng, M.K.; Wang, W. A total variation model for Retinex. SIAM J. Imaging Sci. 2011, 4, 345–365.
  14. Kimmel, R.; Elad, M.; Shaked, D.; Keshet, R.; Sobel, I. A variational framework for Retinex. Int. J. Comput. Vis. 2003, 52, 7–23.
  15. Lore, K.G.; Akintayo, A.; Sarkar, S. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognit. 2017, 61, 650–662.
  16. Lv, F.; Lu, F.; Wu, J.; Lim, C. MBLLEN: Low-Light Image/Video Enhancement Using CNNs. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle upon Tyne, UK, 3–6 September 2018; Northumbria University: Newcastle upon Tyne, UK, 2018; Volume 220, p. 4.
  17. Zhang, Y.; Zhang, J.; Guo, X. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1632–1640.
  18. Zhang, Y.; Guo, X.; Ma, J.; Liu, W.; Zhang, J. Beyond brightening low-light images. Int. J. Comput. Vis. 2021, 129, 1013–1037.
  19. Zhu, M.; Pan, P.; Chen, W.; Yang, Y. EEMEFN: Low-light image enhancement via edge-enhanced multi-exposure fusion network. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13106–13113.
  20. Li, J.; Li, J.; Fang, F.; Li, F.; Zhang, G. Luminance-aware pyramid network for low-light image enhancement. IEEE Trans. Multimed. 2020, 23, 3153–3165.
  21. Wang, L.W.; Liu, Z.S.; Siu, W.C.; Lun, D.P. Lightening network for low-light image enhancement. IEEE Trans. Image Process. 2020, 29, 7984–7996.
  22. Lim, S.; Kim, W. DSLR: Deep stacked Laplacian restorer for low-light image enhancement. IEEE Trans. Multimed. 2020, 23, 4272–4284.
  23. Wang, C.; Wu, H.; Jin, Z. FourLLIE: Boosting low-light image enhancement by Fourier frequency information. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 7459–7469.
  24. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349.
  25. Yang, W.; Wang, S.; Fang, Y.; Wang, Y.; Liu, J. From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3063–3072.
  26. Zhang, L.; Zhang, L.; Liu, X.; Shen, Y.; Zhang, S.; Zhao, S. Zero-shot restoration of back-lit images using deep internal learning. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1623–1631.
  27. Zhu, A.; Zhang, L.; Shen, Y.; Ma, Y.; Zhao, S.; Zhou, Y. Zero-shot restoration of underexposed images via robust Retinex decomposition. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  28. Liu, R.; Ma, L.; Zhang, J.; Fan, X.; Luo, Z. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10561–10570.
  29. Liu, D.; Wen, B.; Liu, X.; Wang, Z.; Huang, T.S. When image denoising meets high-level vision tasks: A deep learning approach. arXiv 2017, arXiv:1706.04284.
  30. Aakerberg, A.; Johansen, A.S.; Nasrollahi, K.; Moeslund, T.B. Semantic segmentation guided real-world super-resolution. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 449–458.
  31. Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 606–615.
  32. Fan, M.; Wang, W.; Yang, W.; Liu, J. Integrating semantic segmentation and Retinex model for low-light image enhancement. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2317–2325.
  33. Wu, Y.; Pan, C.; Wang, G.; Yang, Y.; Wei, J.; Li, C.; Shen, H.T. Learning semantic-aware knowledge guidance for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1662–1671.
  34. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Granada, Spain, 20 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–11.
  35. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
  36. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
  37. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
  38. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
  39. Avi-Aharon, M.; Arbelle, A.; Raviv, T.R. DeepHist: Differentiable joint and color histogram layers for image-to-image translation. arXiv 2020, arXiv:2005.03995.
  40. Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. (ToG) 2017, 36, 1–14.
  41. Li, Y.; Liu, S.; Yang, J.; Yang, M.H. Generative face completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919.
  42. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; British Machine Vision Association: Durham, UK, 2018.
  43. Yang, W.; Wang, W.; Huang, H.; Wang, S.; Liu, J. Sparse Gradient Regularized Deep Retinex Network for Robust Low-Light Image Enhancement. IEEE Trans. Image Process. 2021, 30, 2072–2086.
Figure 1. SKF framework architecture diagram.
Figure 2. SE module. An × inside a circle denotes matrix multiplication, a dot inside a circle denotes element-wise multiplication, and a + inside a circle denotes element-wise addition. The green (Fib) and blue (Fsb) feature maps are explained in the text.
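To make the circled operations in Figure 2 concrete, the following is a minimal, purely illustrative PyTorch sketch that fuses an image-branch feature (Fib) with a semantic-branch feature (Fsb) using exactly those three operations. The 1 × 1 projections, tensor shapes, and sigmoid gating are assumptions made for illustration only; this is not the authors' exact SE module.

```python
import torch
import torch.nn as nn


class SemanticFusionSketch(nn.Module):
    """Illustrative fusion of image features (Fib) and semantic features (Fsb).

    This only mirrors the operations labeled in Figure 2: a matrix
    multiplication that builds a channel affinity map, an element-wise
    multiplication that modulates the image features, and an element-wise
    addition that keeps a residual path.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.proj_img = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_sem = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_ib: torch.Tensor, f_sb: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_ib.shape
        q = self.proj_img(f_ib).flatten(2)   # (B, C, HW), image branch
        k = self.proj_sem(f_sb).flatten(2)   # (B, C, HW), semantic branch
        # Matrix multiplication: channel-to-channel affinity between branches.
        affinity = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (B, C, C)
        # Element-wise multiplication to modulate the image features,
        # followed by a residual (element-wise) addition.
        modulated = (affinity @ q).view(b, c, h, w)
        return f_ib + torch.sigmoid(modulated) * f_ib


if __name__ == "__main__":
    fuse = SemanticFusionSketch(channels=64)
    f_ib = torch.randn(1, 64, 32, 32)
    f_sb = torch.randn(1, 64, 32, 32)
    print(fuse(f_ib, f_sb).shape)  # torch.Size([1, 64, 32, 32])
```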
Figure 3. Transposed attention mechanism. An × inside a circle denotes matrix multiplication, and a + inside a circle denotes element-wise addition.
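The transposed attention in Figure 3 computes attention across channels rather than spatial positions, in the style of Restormer [38]. Below is a minimal single-head PyTorch sketch under that assumption; the 1 × 1 projections and learnable temperature are simplifications for illustration, not the exact block used in the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransposedAttentionSketch(nn.Module):
    """Single-head, Restormer-style transposed attention (simplified).

    Attention is computed over channels (a C x C map) instead of over pixels,
    keeping the cost linear in the number of pixels. The circled x in Figure 3
    corresponds to the two matrix multiplications below; the circled +
    corresponds to the residual addition.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.to_qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.to_out = nn.Conv2d(channels, channels, kernel_size=1)
        self.temperature = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)        # each (B, C, H, W)
        q = F.normalize(q.flatten(2), dim=-1)           # (B, C, HW)
        k = F.normalize(k.flatten(2), dim=-1)
        v = v.flatten(2)
        attn = torch.softmax((q @ k.transpose(1, 2)) * self.temperature, dim=-1)  # (B, C, C)
        out = (attn @ v).view(b, c, h, w)
        return x + self.to_out(out)                     # residual addition
```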
Figure 4. Structure of the semantic-guided color histogram (SCH) loss.
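As a companion to Figure 4, the sketch below shows one way to realize a differentiable, region-wise color histogram loss in the spirit of DeepHist [39]: each channel is soft-binned with a Gaussian kernel inside a semantic mask, and the L1 gap between the histograms of the enhanced and reference images is penalized. The bin count, bandwidth, and masking interface are assumptions; the full SCH loss additionally aggregates this kind of term over the RGB, LAB, and LCH color spaces.

```python
import torch


def soft_histogram(x: torch.Tensor, bins: int = 64, sigma: float = 0.02) -> torch.Tensor:
    """Differentiable histogram of values in [0, 1] via Gaussian soft-binning.

    x: (N,) flattened channel values for one semantic region.
    Returns a normalized histogram of shape (bins,).
    """
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)               # (bins,)
    weights = torch.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / (hist.sum() + 1e-8)


def region_histogram_loss(pred: torch.Tensor, target: torch.Tensor,
                          mask: torch.Tensor, bins: int = 64) -> torch.Tensor:
    """L1 distance between per-channel histograms inside a semantic mask.

    pred, target: (C, H, W) images in [0, 1] (any color space rescaled to
    [0, 1]); mask: (H, W) boolean region taken from a segmentation map.
    """
    loss = pred.new_zeros(())
    for c in range(pred.shape[0]):
        h_pred = soft_histogram(pred[c][mask], bins)
        h_tgt = soft_histogram(target[c][mask], bins)
        loss = loss + torch.abs(h_pred - h_tgt).sum()
    return loss / pred.shape[0]
```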
Figure 5. Structure of the semantic-guided adversarial (SA) loss.
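For Figure 5, the snippet below is a heavily simplified, hypothetical sketch of a generator-side adversarial term weighted by semantic regions; the patch-discriminator interface and the weighting scheme are illustrative assumptions rather than the paper's exact SA loss.

```python
import torch
import torch.nn.functional as F


def semantic_adversarial_loss(d_fake_logits: torch.Tensor,
                              region_weights: torch.Tensor) -> torch.Tensor:
    """Generator-side adversarial loss weighted by semantic regions (sketch).

    d_fake_logits: (B, 1, H, W) patch-discriminator logits on enhanced images.
    region_weights: (B, 1, H, W) non-negative weights derived from a
    segmentation map, e.g. emphasizing classes where realism matters most.
    Both the patch discriminator and the weighting are illustrative assumptions.
    """
    per_pixel = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits), reduction="none")
    return (region_weights * per_pixel).sum() / (region_weights.sum() + 1e-8)
```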
Figure 6. Test results of the LLFlow and LLFlow-SKF networks on the LOL_v2 dataset, where I_out-LLFlow denotes the LLFlow output and I_out-LLFlow_SKF denotes the LLFlow-SKF output.
Figure 7. Test results of the LLFlow and LLFlow-SKF networks on the LOL dataset, where I_out-LLFlow denotes the LLFlow output and I_out-LLFlow_SKF denotes the LLFlow-SKF output.
Table 1. Performance comparison of different models on the LOL and LOL_v2 datasets.

Model         | LOL Dataset                       | LOL-v2 Dataset
              | PSNR    SSIM   LPIPS   LRC PSNR   | PSNR    SSIM   LPIPS   LRC PSNR
LLFlow        | 27.476  0.942  0.078   6.502      | 27.524  0.910  0.111   7.787
LLFlow-SKF    | 30.804  0.945  0.064   6.503      | 29.194  0.912  0.093   7.755
KinD++        | 18.970  0.804  0.175   5.605      | 19.087  0.817  0.180   6.372
Zero-DCE      | 14.861  0.562  0.335   5.103      | 18.059  0.580  0.313   5.843
EnlightenGAN  | 17.483  0.652  0.322   5.406      | 18.640  0.677  0.309   6.193
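For transparency about how the fidelity numbers in Tables 1 and 2 are conventionally computed, a short reference implementation of PSNR (assuming images scaled to [0, 1]) is given below. SSIM and LPIPS are typically taken from standard packages such as scikit-image and the official lpips library, and the LRC PSNR column follows the paper's own definition, which is not reproduced here.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


# Example: identical images give infinite PSNR; small Gaussian noise lowers it.
img = np.random.rand(256, 256, 3)
noisy = np.clip(img + np.random.normal(0, 0.01, img.shape), 0, 1)
print(round(psnr(img, noisy), 2))  # roughly 40 dB for sigma = 0.01
```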
Table 2. Ablation study of different components of LLFlow-SKF.

Model                                | LOL Dataset                       | LOL-v2 Dataset
                                     | PSNR    SSIM   LPIPS   LRC PSNR   | PSNR    SSIM   LPIPS   LRC PSNR
LLFlow                               | 27.476  0.942  0.078   6.502      | 27.524  0.910  0.111   7.787
LLFlow + SE                          | 28.121  0.942  0.074   6.502      | 27.979  0.910  0.101   7.778
LLFlow + SE + SCH                    | 29.901  0.944  0.068   6.503      | 28.805  0.911  0.093   7.766
LLFlow + SE + SCH + SA (LLFlow-SKF)  | 30.804  0.945  0.064   6.503      | 29.194  0.912  0.093   7.755
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
