DFE-Net: A Dual-Frequency Enhancement Network for Low-Light and Overexposed Image Restoration

Zhou, Shengyou; Chen, Han; Cui, Wen; Chen, Shiming; Wu, Zhaojie; Chen, Yan

doi:10.3390/electronics15112398


by Shengyou Zhou, Han Chen, Wen Cui, Shiming Chen, Zhaojie Wu and Yan Chen *

PLA Joint Logistics Support Force University of Engineering, Chongqing 401331, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(11), 2398; https://doi.org/10.3390/electronics15112398

Submission received: 8 May 2026 / Revised: 29 May 2026 / Accepted: 31 May 2026 / Published: 1 June 2026

(This article belongs to the Section Artificial Intelligence)


Abstract

In practical imaging applications, low light and overexposure are two common and inherently conflicting types of image degradation, and existing methods struggle to restore both accurately within a unified framework. To address this challenge, this paper proposes DFE-Net based on explicit frequency decoupling. The network adopts a symmetric U-Net architecture and embeds discrete wavelet transform (DWT) and inverse discrete wavelet transform (IWT) to construct an explicit dual-frequency processing mechanism, which separately optimizes the low-frequency information carrying global illumination and the high-frequency information containing detailed textures. In the encoder, DWT decouples features into low-frequency and high-frequency sub-bands and feeds them into dedicated enhancement modules. The low-frequency enhancement block integrates SS2D and a gated convolutional feed-forward network to efficiently model global contextual dependencies with linear complexity and accurately restore image illumination and contrast; the high-frequency enhancement block adopts CMT attention combined with a matching convolutional feed-forward network, enabling the detail restoration process to be guided by the optimized low-frequency information and ensuring the collaborative optimization of global structure and local textures. The decoder completes the reconstruction and fusion of the processed sub-bands through IWT. The quantitative and qualitative experimental results on the MSEC, SICE, and LOLv1 datasets demonstrate that DFE-Net matches or surpasses existing state-of-the-art methods in various metrics while maintaining low model complexity.

Keywords:

low-light image enhancement; overexposure correction; discrete wavelet transform; U-Net

1. Introduction

High-quality images serve as the foundation for system perception and decision-making in numerous fields such as computer vision [1], autonomous driving [2], and security surveillance [3]. However, real-world imaging environments are complex and variable, and captured images frequently suffer from severe abnormal exposure due to insufficient or excessive illumination, namely, the problems of low light and overexposure [4]. Such degraded images not only have poor visual effects but also lose contrast, color, and detailed information, which will significantly reduce the reliability of subsequent high-level visual tasks [5]. Therefore, developing enhancement techniques that can effectively handle both low-light and overexposed images simultaneously is of great value for improving the robustness and practicality of vision systems in complex real-world scenarios.

Although existing studies have begun to explore the unified correction of variable exposure conditions, enabling a single model to perform both low-light enhancement and overexposure correction still faces core challenges. The fundamental reason is that the “brightening” mapping from low-light images to normal images and the “darkening” mapping from overexposed images to normal images are essentially two distinct physical processes with inherent conflicts. When learning these two reverse operations simultaneously, existing methods tend to fall into performance trade-offs and struggle to achieve optimality under both extreme conditions. In addition, the global illumination and color structure of images are mainly contained in low-frequency components, while details such as edges and textures are concentrated in high-frequency components [6]. Designing a network architecture that can explicitly understand and separately optimize these different frequency components is the key to improving restoration accuracy and adaptive capability.

Deep learning-based image enhancement methods have achieved remarkable progress [7]. In recent years, researchers have recognized the importance of unified processing of variable exposure problems and proposed a variety of solutions. For instance, some works specialize in the multi-exposure correction task and attempt to integrate Retinex theory or design dual-branch structures to handle different exposure inputs [8]. Meanwhile, to address low-light problems in ultra-high-resolution images [9], some methods [10] innovatively introduce state space models (e.g., Mamba [11]) combined with wavelet transform to reduce computational complexity while avoiding information loss. These works demonstrate that exploring efficient global modeling architectures and frequency domain tools is an important trend in the current field.

Nevertheless, existing methods still share a common limitation: they do not explicitly and structurally decouple the intrinsic frequency components of images and enhance them in a targeted manner. Architectures based on CNNs, Transformers, or Mamba mostly process image features in a mixed way in the spatial domain [12]. This approach prevents the network from clearly distinguishing and separately optimizing the low-frequency information that determines the main content of images and the high-frequency information that carries detailed textures, which makes the enhancement inaccurate. Specifically, high-frequency details may be lost when adjusting global illumination, or detail restoration cannot be coordinated with global illumination, thereby impairing the naturalness and fidelity of the restored images.

To address the above challenges, this paper proposes a dual-frequency enhancement network (DFE-Net) based on explicit frequency decoupling. We innovatively integrate DWT into the network core, separate low-frequency and high-frequency components of images at multiple scales, and design dedicated enhancement paths for them, respectively. The low-frequency path focuses on restoring correct global illumination and contrast, while the high-frequency path is dedicated to reconstructing clear details and textures. The main contributions of this paper can be summarized as the following three points:

A unified dual-frequency enhancement network, DFE-Net, is proposed, which deeply embeds DWT and IWT into the U-Net architecture to construct a complete processing pipeline involving explicit frequency domain decoupling, targeted feature enhancement and multi-dimensional collaborative reconstruction. It effectively resolves the inherent task conflict between low-light enhancement and overexposure correction within a single framework and achieves synchronous and accurate restoration of two types of extremely exposed degraded images.
Two core components, namely, the low-frequency enhancement block (LFEBlock) and the high-frequency enhancement block (HFEBlock), are designed. The LFEBlock integrates SS2D and a gated convolutional feed-forward network (GCFFN) to efficiently model global contextual dependencies with linear complexity and complete adaptive feature modulation, thereby accurately restoring image illumination and contrast. The HFEBlock adopts CMT attention combined with an L2 nearest neighbor channel template matching mechanism, so that high-frequency detail restoration is guided by the optimized low-frequency global information, ensuring the collaborative optimization of global structure and local textures.
Comprehensive experiments are conducted on the multi-exposure datasets, MSEC and SICE, and the low-light dataset, LOLv1. Both the quantitative and qualitative results show that DFE-Net surpasses existing mainstream state-of-the-art methods in multiple metrics, with outstanding advantages in overexposure correction and low-light detail recovery while maintaining lightweight computational efficiency. Adequate ablation experiments verify the rationality and effectiveness of the proposed overall framework, core designs, and key parameter configurations.

2. Related Works

2.1. Low-Light Image Enhancement

Low-light image enhancement aims to improve the visibility and detail of images captured under insufficient illumination. Traditional methods are mainly based on Retinex theory [13], which performs enhancement by estimating illumination and reflection components. With the development of deep learning, data-driven approaches have become mainstream. Zero-DCE proposed by Guo et al. [14] and its improved version Zero-DCE++ [15] estimate image-specific high-order curves through zero-reference learning, achieving flexible and efficient unpaired enhancement and alleviating the demand for large-scale paired data. Retinexformer, introduced by Cai et al. [16], integrates Retinex theory with the Transformer architecture in a single stage and models long-range dependencies via an illumination-guided multi-head self-attention mechanism, significantly boosting enhancement performance in complex scenes. Luo et al. [17] presented SPNet, which adopts a self-paced learning strategy and illumination gradient constraints to enable the network to process samples from easy to hard, improving the stability of unsupervised training and enhancement quality. Zhou et al. [18] proposed GPP-LLIE, which leverages vision–language models to extract generative perceptual priors for guiding Transformer-based diffusion models, making progress in producing visually realistic results. Nevertheless, although these methods perform excellently in brightening dark regions and restoring details, their core network architectures and training objectives are designed for the unidirectional mapping “from dark to bright”. Consequently, when directly applied to overexposed images that require the inverse adjustment “from bright to dark”, they often suffer from limited performance or unsatisfactory correction results due to the mismatch between task objectives and model capabilities.

2.2. Single-Image Exposure Correction

The core goal of single-image exposure correction is to restore the visual quality lost due to improper exposure (including underexposure and overexposure) from a single input. As shown in Figure 1, pixel values of underexposed images are mostly concentrated in the low-brightness range (0–100), appearing globally dark and requiring brightness enhancement; in contrast, pixel values of overexposed images cluster in the high-brightness range (150–255), appearing overly bright and requiring brightness suppression. The inherent challenge of this task lies in the fact that the model must uniformly learn two mapping relationships with completely opposite brightness adjustment directions, namely, simultaneous “brightening” and “darkening”. Early methods were mostly based on physical models and fusion strategies. Chang et al. [19] proposed to handle overexposed and underexposed regions separately, using WLS filtering and an extended Retinex model for respective correction, and finally obtained results through a saliency-based fusion algorithm. With the popularity of deep learning, end-to-end unified correction networks have become a research hotspot. Afifi et al. [20] constructed a large-scale multi-exposure dataset and proposed a coarse-to-fine deep neural network, which first corrects global color and then refines local details. Eyiokur et al. [21] introduced an end-to-end exposure correction model that synthesizes corrected images via an image encoder, continuous residual blocks, and a decoder and employs perceptual loss, feature matching loss, and a multi-scale discriminator to improve generation quality. Nsampi et al. [22] designed a novel network framework to address color distortion during exposure correction and introduced feature matching loss to ensure exposure consistency. 
These methods have made remarkable progress in unified correction frameworks, but most conduct mixed feature processing in the spatial domain and fail to explicitly decouple and target the frequency components that carry different physical information in images.

2.3. Image Enhancement Methods Based on Frequency Domain Analysis

Frequency domain analysis provides a new perspective for image enhancement beyond the pixel domain. Fourier transform can convert images into the frequency domain and separate global amplitude and phase information. Zhang et al. [23] utilized the Discrete Cosine Transform (DCT) in an exposure correction network to separate information in the frequency domain for guiding spatial domain restoration. However, Fourier transform lacks spatial localization capability. Owing to its favorable time-frequency localization property, wavelet transform can losslessly decompose an image into sub-bands of different scales, thereby separating the smooth content and detailed textures of an image more finely. Zou et al. [24] employed wavelet transform to achieve information-lossless downsampling in low-light enhancement and designed network modules for processing low-frequency and high-frequency components separately based on the observation that “information mainly exists in low frequencies”. The WavEnhancer model [25] unifies wavelet transform and Transformer, optimizing different frequency bands simultaneously via the self-attention mechanism in the wavelet domain to balance local details and high-level features. Recently, hybrid methods integrating the advantages of wavelet and Fourier transforms have shown great potential. The WalMaFa model [10] combines the global brightness enhancement ability of wavelet transform with the local detail restoration ability of Fourier transform. SPJFNet [26] further proposes a self-mined prior-guided joint frequency enhancement mechanism, which automatically mines intrinsic feature priors of images and synchronously optimizes low-frequency brightness and high-frequency details. These works demonstrate that explicit decoupling and targeted processing of different frequency components in the frequency domain are effective.
Nevertheless, how to deeply integrate this advanced frequency domain decoupling idea with network architectures at the structural level and apply it to the more challenging task of unified exposure correction remains to be further explored.

3. Method

3.1. Network Architecture

The proposed DFE-Net is an end-to-end dual-frequency enhancement network designed for the unified restoration of single low-light or overexposed images. As illustrated in Figure 2, the network adopts a symmetric U-Net [27] architecture as a whole, whose core innovation lies in the explicit dual-frequency decoupling and collaborative enhancement mechanism constructed based on DWT [28] in the encoder and decoder.

First, the network performs shallow feature extraction on the input image $I_{in} \in \mathbb{R}^{H \times W \times 3}$ via a 3 × 3 convolutional layer to obtain the initial feature map $F_{0}$. Meanwhile, the downsampled versions $I_{\downarrow 2}$, $I_{\downarrow 4}$, and $I_{\downarrow 8}$ generated from $I_{in}$ through the PixelUnshuffle [29] operation are each embedded by a separate convolutional layer. The resulting multi-scale (MS) features are fused with features at the corresponding levels of the encoder, so that rich context information of the original image is directly injected into each processing stage. Unlike simple downsampling, PixelUnshuffle rearranges pixels to preserve fine-grained spatial details across scales, serving as a direct supplement to the subsequent frequency domain processing.
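The pixel rearrangement described above can be illustrated with a minimal sketch. It uses plain Python lists and a single-channel image to stay dependency-free; the function names `pixel_unshuffle` and `pixel_shuffle` are illustrative helpers, not the authors' code. The sketch shows that a factor-r rearrangement moves every pixel into one of r × r lower-resolution channels and can be undone exactly.

```python
def pixel_unshuffle(img, r):
    """Rearrange an H x W image into r*r channels of size (H/r) x (W/r)."""
    h, w = len(img), len(img[0])
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    channels = []
    for dy in range(r):          # row offset inside each r x r cell
        for dx in range(r):      # column offset inside each r x r cell
            channels.append([[img[y * r + dy][x * r + dx]
                              for x in range(w // r)]
                             for y in range(h // r)])
    return channels


def pixel_shuffle(channels, r):
    """Inverse rearrangement: r*r channels back to one (h*r) x (w*r) image."""
    h, w = len(channels[0]), len(channels[0][0])
    img = [[0] * (w * r) for _ in range(h * r)]
    for idx, ch in enumerate(channels):
        dy, dx = divmod(idx, r)
        for y in range(h):
            for x in range(w):
                img[y * r + dy][x * r + dx] = ch[y][x]
    return img


# Toy 4 x 4 image with pixel values 0..15.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
sub_images = pixel_unshuffle(image, 2)   # four 2 x 2 channels
restored = pixel_shuffle(sub_images, 2)  # identical to `image`
```

No pixel is averaged or discarded, which is the property the text relies on when it contrasts PixelUnshuffle with simple downsampling.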

In the encoder, each encoding block follows the workflow of decomposition–enhancement–separate output. Specifically, the input features are first decomposed into one low-frequency sub-band and three directional high-frequency sub-bands by DWT. After fusing with the image features at the corresponding scale, the low-frequency sub-band is fed into LFEBlock for processing. LFEBlock integrates SS2D [30] to model long-range dependencies, thereby restoring the global illumination and contrast of the image. The three high-frequency sub-bands are first integrated by the SKFF module [31] and then sent to the subsequent HFEBlock together with the enhanced low-frequency features. These modules utilize the CMT attention mechanism [32] to repair detailed textures and suppress noise under the guidance of the optimized global context. Each encoding block produces two outputs: the low-frequency features processed by LFEBlock and the high-frequency features processed by HFEBlock. Among them, the low-frequency features are passed to the next-layer encoding block for further downsampling and feature abstraction at a deeper level, while the high-frequency features are directly transmitted to the corresponding decoding blocks in the decoder via skip connections to preserve and convey rich, detailed information.

In the decoder, each decoding block is responsible for receiving and fusing the low-frequency features from the deep encoding blocks and the high-frequency features from the same-layer encoding blocks. Concretely, the low-frequency features are first refined by their own sequence of LFEBlock modules. At the same time, the high-frequency features are calibrated and enhanced by HFEBlock under the guidance of the low-frequency features optimized at the current step. Finally, the processed low-frequency features and the channel-expanded high-frequency features are concatenated along the channel dimension, upsampled, and reconstructed via IWT [33], outputting feature maps with doubled resolution to gradually recover high-resolution details.

Through the cascaded encoding–decoding process across three scales, the network achieves coarse-to-fine image restoration. Eventually, the output of the decoder is mapped back to the RGB space via a convolutional layer and connected with the input image via a residual connection to generate the enhanced image $I_{out}$. The entire model is trained end to end by minimizing the L1 loss between the output and the ground truth image. By virtue of its hierarchical multi-scale guidance, consistent dual-frequency decoupling enhancement throughout the network, and the dual-path feature propagation mechanism with vertical deepening of low frequencies and horizontal supplementation of high frequencies, this framework realizes the collaborative and accurate restoration of global illumination and local textures in exposure-degraded images.

3.2. Motivation: Exposure Correction Based on Frequency Domain Decoupling

The essence of image enhancement is to reconstruct the information lost or distorted due to degradation. Under abnormal exposure conditions (low light and overexposure), degradation does not affect all image components uniformly. Observations show that insufficient or excessive illumination mainly distorts the global smooth regions and contrast of an image, which correspond to low-frequency information in the frequency domain, while blurred details, amplified noise, or saturation are mainly reflected in edge and texture regions, corresponding to high-frequency information in the frequency domain. However, most existing deep learning methods perform end-to-end mapping in the spatial domain, where the internal feature representations are highly coupled. The network implicitly learns to mix and process these components with distinct physical meanings, which limits the model’s transparency, controllability, and effectiveness in handling extreme and opposite degradations.

To address the above issues, we advocate an explicit frequency domain decoupling strategy. Its core idea is as follows: Inside the model, separate image features into sub-components carrying different physical information, design dedicated sub-networks for targeted processing, and finally fuse them through a collaborative mechanism. This requires that the mathematical tools used can not only achieve frequency separation but also satisfy other key requirements of image restoration tasks, such as lossless multi-scale analysis, spatial localization capability, and reversibility.

Although the classic Fourier transform enables global frequency domain analysis, it lacks spatial localization and is thus unsuitable for image restoration that requires preserving local structures. In contrast, DWT and its inverse provide an ideal solution. By using a set of separable low-pass and high-pass filters, DWT decomposes an image $I$ into four sub-bands:

$$
\begin{aligned}
I_{LL} &= \phi_{h} * (\phi_{v} * I) \downarrow_{2} \\
I_{LH} &= \psi_{h} * (\phi_{v} * I) \downarrow_{2} \\
I_{HL} &= \phi_{h} * (\psi_{v} * I) \downarrow_{2} \\
I_{HH} &= \psi_{h} * (\psi_{v} * I) \downarrow_{2}
\end{aligned}
\tag{1}
$$

where $*$ denotes convolution, $\downarrow_{2}$ denotes 2× downsampling, and $\phi$ and $\psi$ represent low-pass and high-pass filters, respectively. $I_{LL}$ is the low-frequency approximation sub-band containing the main structure and illumination information of the image; $I_{LH}$, $I_{HL}$, and $I_{HH}$ are high-frequency detail sub-bands that capture horizontal, vertical, and diagonal edges and textures, respectively. This transformation can be inverted via the inverse discrete wavelet transform:

$$
I = \mathrm{IWT}(I_{LL}, I_{LH}, I_{HL}, I_{HH})
\tag{2}
$$

IWT upsamples and filters the sub-bands using the corresponding reconstruction filter banks and sums them to reconstruct the original image. This property of DWT and IWT brings key advantages to network design. First, unlike ordinary pooling operations, the DWT-based downsampling process causes no information loss, thus preserving complete image details and providing a richer information basis for subsequent enhancement tasks. Second, decomposition and reconstruction are fully reversible, offering a natural mathematical foundation for constructing symmetric encoders and decoders. Finally, the separated low-frequency and high-frequency sub-bands have clear physical interpretations: the low-frequency sub-band encodes global illumination and structure, while the high-frequency sub-bands carry details and textures. This allows us to design targeted dedicated neural network modules (LFEBlock and HFEBlock) to perform precise and interpretable intervention on different types of degradation.
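The lossless, invertible decomposition described above can be checked with a minimal sketch. The source does not name the wavelet, so the orthonormal Haar filters used here are an assumption, as are the helper names `haar_dwt2` and `haar_iwt2`; the sub-band naming follows the filter order of Equation (1) (for example, $I_{LH}$ is high-pass horizontally and low-pass vertically).

```python
def haar_dwt2(img):
    """One-level 2D Haar DWT on an H x W list of lists (H, W even)."""
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a, b = img[2 * y][2 * x], img[2 * y][2 * x + 1]
            c, d = img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1]
            ll[y][x] = (a + b + c + d) / 2.0  # low-pass in both directions
            lh[y][x] = (a - b + c - d) / 2.0  # high-pass horizontally
            hl[y][x] = (a + b - c - d) / 2.0  # high-pass vertically
            hh[y][x] = (a - b - c + d) / 2.0  # high-pass in both directions
    return ll, lh, hl, hh


def haar_iwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the 2H x 2W image from four sub-bands."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            s, p, q, r = ll[y][x], lh[y][x], hl[y][x], hh[y][x]
            img[2 * y][2 * x] = (s + p + q + r) / 2.0
            img[2 * y][2 * x + 1] = (s - p + q - r) / 2.0
            img[2 * y + 1][2 * x] = (s + p - q - r) / 2.0
            img[2 * y + 1][2 * x + 1] = (s - p - q + r) / 2.0
    return img


# A flat "illumination-only" patch: all energy lands in the LL sub-band.
flat = [[5.0] * 4 for _ in range(4)]
flat_ll, flat_lh, flat_hl, flat_hh = haar_dwt2(flat)

# A textured patch survives the DWT -> IWT round trip unchanged.
textured = [[float((3 * y + 7 * x) % 11) for x in range(4)] for y in range(4)]
reconstructed = haar_iwt2(*haar_dwt2(textured))
```

The four half-resolution sub-bands together hold exactly as many values as the input, which is why this form of downsampling loses no information, unlike pooling.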

Based on the above analysis, we propose the core design motivation of DFE-Net: to construct an explicit dual-frequency processing pipeline running through the encoder and decoder by using DWT and IWT. In the encoder, we decouple features via DWT and process different sub-bands through the low-frequency enhancement module and high-frequency enhancement module, respectively, where high-frequency processing is guided by the enhanced low-frequency information. In the decoder, the processed sub-bands are reconstructed and upsampled via IWT. This design lets the network explicitly correct, in a targeted manner, the different degradations that abnormal exposure causes in each frequency component of the image, thereby laying a solid foundation for unified, accurate, and interpretable restoration of low-light and overexposed images.

3.3. Low-Frequency Enhancement Block

The LFEBlock serves as the core component in DFE-Net dedicated to processing the low-frequency components of images, whose structure is illustrated in Figure 3. It takes the low-frequency sub-band features decomposed by DWT as input, and its main task is to restore the global illumination distribution and overall contrast of the image. Its design integrates the high efficiency of state-space models in long-sequence modeling with the trainability of traditional residual connections, and the detailed implementation is as follows.

The input to the block is the serialized low-frequency feature $x_{in} \in \mathbb{R}^{B \times L \times C}$, where $B$ denotes the batch size, $L = H \times W$ is the total number of spatial locations, and $C$ denotes the number of channels. First, the features are normalized by a layer normalization layer. Subsequently, the normalized features are fed into the SS2D module. As an efficient implementation of state-space models on 2D image data, the SS2D module traverses the feature map along four parallel scanning directions (horizontal, vertical, and their reverses), thereby modeling long-range dependencies between arbitrary positions in the image with linear complexity. This mechanism enables the block to integrate global contextual information of the entire image so as to achieve globally consistent adjustment of illumination. The output of the SS2D module is then added to the input of the block via a residual connection. Notably, this residual path introduces a learnable channel scaling parameter $\lambda_{1}$, which allows the network to adaptively adjust the mixing weight of the identity mapping and the transformation path, enhancing optimization stability. The output of this step is formulated as:

$$
x_{1} = \mathrm{LayerNorm}(x_{in}) \cdot \lambda_{1} + \mathrm{SS2D}(\mathrm{LayerNorm}(x_{in}))
\tag{3}
$$

Here, $\lambda_{1}$ and $\lambda_{2}$ (in Equation (5)) are learnable scaling parameters initialized to 0.0. This initialization facilitates a near-identity mapping at the beginning of training, stabilizing the optimization process. Both parameters are updated via standard backpropagation without any explicit regularization constraints.
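The four scanning directions used by SS2D can be made concrete with a small sketch. It covers only the ordering step: the state-space recurrence itself is omitted, and the merge-by-summation step is an assumption based on how cross-scan modules are commonly built, not a detail stated in the source. The helper names `four_direction_scan` and `merge_scans` are hypothetical.

```python
def four_direction_scan(feat):
    """Flatten an H x W grid into four 1D sequences:
    row-major, column-major, and the reverse of each."""
    h, w = len(feat), len(feat[0])
    row_major = [feat[y][x] for y in range(h) for x in range(w)]
    col_major = [feat[y][x] for x in range(w) for y in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def merge_scans(seqs, h, w):
    """Undo each ordering and sum the four sequences back onto the grid."""
    row_major, col_major, row_rev, col_rev = seqs
    row_back, col_back = row_rev[::-1], col_rev[::-1]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = (row_major[y * w + x] + row_back[y * w + x]
                         + col_major[x * h + y] + col_back[x * h + y])
    return out


feat = [[1, 2, 3],
        [4, 5, 6]]
scans = four_direction_scan(feat)
# With an identity "sequence model", merging returns 4x the input,
# confirming that every position is visited once per direction.
merged = merge_scans(scans, 2, 3)
```

Each sequence has length H × W, so a linear-time sequence model applied to all four keeps the overall cost linear in the number of pixels.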

Afterwards, the features go through another sub-stage consisting of layer normalization, a feed-forward network, and a residual connection. First, $x_{1}$ is normalized by layer normalization again and then fed into GCFFN. GCFFN expands the channel dimension via a 1 × 1 convolution and extracts spatial features using depthwise convolution. It implements a simple gating mechanism through channel splitting and completes adaptive feature modulation combined with the GELU activation, finally outputting the reduced-dimensional features via a 1 × 1 convolution. This process can be expressed as:

$$
\begin{aligned}
x_{1}', x_{1}'' &= \mathrm{Chunk}(\mathrm{Conv}(\mathrm{Conv}(x_{1}))) \\
x_{2} &= \mathrm{Conv}(\mathrm{GELU}(x_{1}') \odot x_{1}'')
\end{aligned}
\tag{4}
$$

Similarly, this sub-stage also adopts a residual connection with a learnable scaling parameter $\lambda_{2}$. Therefore, the final output of LFEBlock is:

$$
x_{out} = x_{1} \cdot \lambda_{2} + x_{2}
\tag{5}
$$
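The gating step of Equation (4) and the scaled residual of Equation (5) can be sketched on a single channel vector. This is a simplified illustration, not the authors' implementation: the three convolutions are left out (treated as identity), and the helper names `gelu`, `gated_unit`, and `scaled_residual` are hypothetical.

```python
import math


def gelu(v):
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def gated_unit(expanded):
    """Chunk a channel vector into two halves and gate:
    GELU(first half) multiplied elementwise by the second half."""
    half = len(expanded) // 2
    first, second = expanded[:half], expanded[half:]
    return [gelu(a) * b for a, b in zip(first, second)]


def scaled_residual(x, branch_out, lam):
    """Equation (5) form: x * lambda + branch output, elementwise."""
    return [xi * lam + bi for xi, bi in zip(x, branch_out)]


# Four "expanded" channels: the first two are gates, the last two are values.
expanded = [0.0, 1.0, 2.0, 3.0]
gated = gated_unit(expanded)  # two output channels

x1 = [1.0, 2.0]
at_init = scaled_residual(x1, [0.5, 0.5], lam=0.0)
after_training = scaled_residual(x1, [0.5, 0.5], lam=1.0)
```

Because the scaling parameter multiplies the skip term, evaluating the expression exactly as written with `lam = 0.0` returns the branch output alone, while `lam = 1.0` recovers an ordinary residual connection.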

The entire LFEBlock ensures smooth gradient flow through two residual connections with learnable scaling and allows the network to flexibly balance the roles of different components. Ultimately, the output features serve two key purposes: first, they are passed as low-frequency features to the next-level encoding block for deeper processing; second, they act as enhanced global context to provide guidance for the HFEBlock at the same level.

3.4. High-Frequency Enhancement Block

The HFEBlock is the core component in DFE-Net responsible for restoring image details and textures, whose structure is illustrated in Figure 4. Its design is based on a key principle: the recovery of image details should not be carried out in isolation but under the guidance of optimized global illumination and structural information. This block takes two inputs: the high-frequency feature $x_{h} \in \mathbb{R}^{B \times C \times H \times W}$ fused by SKFF, and the low-frequency feature $x_{l} \in \mathbb{R}^{B \times C \times H \times W}$ from the LFEBlock at the same level, which serves as the semantic guidance. The core of the block is a CMT attention integrated with a cross-modal matching transformation mechanism, which enables the low-frequency context to provide adaptive and fine-grained guidance for the high-frequency processing.

The data flow inside the block starts with layer normalization of the features. Subsequently, the normalized high-frequency features and guidance features are fed into the CMT attention. This block first generates query $Q$, key $K$, and value $V$ vectors via convolutional projection. Different from conventional self-attention, this block introduces a matching transformation mechanism. First, according to the L2 distance between feature vectors, the top-K most relevant feature prototypes are dynamically selected from the guidance feature $x_{l}$ for each position of the high-frequency feature $x_{h}$, achieving non-local feature matching. The matched feature prototypes are concatenated with the original query feature, and then the adaptive fusion of the original feature and the matched feature is completed through PAConv [34] attention convolution:

$$
\begin{aligned}
M(x_{h}, x_{l}) &= \mathrm{TopK}\left(\left\| \mathrm{Flatten}(x_{h}) - \mathrm{Flatten}(x_{l}) \right\|_{2}\right) \\
Q' &= \mathrm{PAConv}\left(x_{h} \oplus M(x_{h}, x_{l})\right)
\end{aligned}
\tag{6}
$$

where $\| \cdot \|_{2}$ denotes the L2 distance and $\oplus$ denotes channel concatenation. Note that the TopK operation is non-differentiable. To enable end-to-end training, we adopt the Straight-Through Estimator (STE) during backpropagation. The gradients are directly copied from the output to the input of the TopK module, treating the selection as an identity mapping. This allows the network to learn stable feature matching patterns without hindering gradient flow. Thus, semantic information from the low-frequency context $x_{l}$ is injected into the query vector $Q'$. The enhanced query $Q'$ is used to compute attention with key $K$ and value $V$, and a learnable temperature parameter $\tau$ is introduced to adaptively adjust each attention head:

$$
\mathrm{Attention}(Q', K, V) = \mathrm{Softmax}\left(\frac{Q' K^{T}}{\tau}\right) V
\tag{7}
$$
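The effect of the temperature in Equation (7) can be seen in a minimal single-head sketch. It assumes small row-vector lists and a fixed scalar `tau` (learnable in the paper), and the function names are illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, tau):
    """Softmax(q k^T / tau) v for q: n x d, k: m x d, v: m x d_v."""
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / tau for kj in k]
        weights = softmax(scores)
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]   # identity values expose the attention weights

sharp = attention(q, k, v, tau=0.1)    # small tau: nearly one-hot weights
soft = attention(q, k, v, tau=10.0)    # large tau: nearly uniform weights
```

A small temperature concentrates each head on its best-matching key, while a large one averages over keys, which is the degree of freedom a per-head temperature gives the network.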

The output of the attention module is added to the original high-frequency feature via a residual connection. Then, the features go into a matching convolutional feed-forward network (MCFFN), which replaces the standard fully connected layers with a combination of 1 × 1 convolution and depthwise convolution to construct input and output projection branches of features. The MCFFN also integrates the aforementioned matching transformation mechanism to ensure that the guidance information takes effect continuously in deeper feature transformation. In addition, the MCFFN adopts a residual connection as well.

The output of the HFEBlock is the detailed feature calibrated under dual guidance (attention and feed-forward network). This feature contains texture information consistent with global illumination and is directly transmitted to the corresponding level of the decoder through skip connections, finally participating in reconstructing clear and natural enhanced images. Through the formulated feature matching and conditional fusion mechanism, this block achieves precise collaboration between low-frequency and high-frequency information, which is the key for DFE-Net to realize high-quality exposure correction.
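The L2 top-K matching shared by the CMT attention and the MCFFN (Equation (6)) can be sketched as a nearest-neighbour lookup followed by channel concatenation. The sketch assumes that "most relevant" means smallest L2 distance, treats each position as a short feature vector, and omits PAConv and the straight-through gradient; `top_k_matches` and `concat_matches` are hypothetical helper names.

```python
def top_k_matches(queries, prototypes, k):
    """For each query vector, return the indices of the k prototypes
    with the smallest squared L2 distance (same ordering as L2)."""
    result = []
    for q in queries:
        dists = [sum((a - b) ** 2 for a, b in zip(q, p)) for p in prototypes]
        order = sorted(range(len(prototypes)), key=lambda i: dists[i])
        result.append(order[:k])
    return result


def concat_matches(queries, prototypes, k):
    """Concatenate each query with its k matched prototypes (channel axis)."""
    indices = top_k_matches(queries, prototypes, k)
    return [q + [c for i in ids for c in prototypes[i]]
            for q, ids in zip(queries, indices)]


# High-frequency positions (queries) and low-frequency guidance (prototypes).
x_h = [[0.0, 0.0], [1.0, 1.0]]
x_l = [[0.1, 0.0], [0.9, 1.0], [5.0, 5.0]]

matched = top_k_matches(x_h, x_l, k=2)
fused_input = concat_matches(x_h, x_l, k=2)   # 2 + 2*2 = 6 channels per position
```

The index selection is the non-differentiable part; in training, a straight-through estimator as described in the text would pass gradients through it unchanged.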

4. Experiments

4.1. Experimental Settings

Datasets: To comprehensively evaluate the performance of the proposed DFE-Net in handling various exposure degradation problems, we conduct experiments on three widely used benchmark datasets. MSEC is a multi-exposure image correction dataset that contains paired multi-exposure abnormal images and normal-exposure images. The SICE [35] dataset also focuses on multi-exposure scenes and provides abundant underexposed and overexposed image samples. LOLv1 [36] is a classic dataset in the field of low-light image enhancement, dedicated to evaluating the model’s restoration ability in extremely dark conditions. The combined use of these three datasets can effectively verify the generalization and robustness of the model in addressing underexposure and overexposure issues within a unified framework.

Evaluation metrics: We adopt PSNR and SSIM as quantitative evaluation metrics to measure the gap between the enhanced images and the ground truth reference images in terms of pixel-level fidelity and structural similarity. A higher PSNR value and an SSIM value closer to 1 indicate better enhancement performance.
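For reference, PSNR can be computed from the mean squared error as in the sketch below. The peak value of 255 (8-bit images) is an assumption, since the source does not state the data range used, and SSIM is omitted because it needs windowed local statistics.

```python
import math


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    mse = sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)
    if mse == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


reference = [[100.0] * 4 for _ in range(4)]
too_bright = [[110.0] * 4 for _ in range(4)]   # uniform error of 10 levels

score = psnr(too_bright, reference)            # MSE = 100
perfect = psnr(reference, reference)
```

Because PSNR is logarithmic in the MSE, a gain of 0.6 dB corresponds to roughly a 13% reduction in mean squared error.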

Implementation details: To ensure fair comparison and reproducibility, we adhere to the standard dataset splits established in prior works. For the MSEC dataset [20], we follow the official fixed split released by the authors: 17,675 training pairs, 750 validation pairs, and 5905 testing pairs. For the SICE dataset [35], we follow the multi-exposure data source and paired construction protocol defined by Cai et al. [35]: we use the publicly released Part1 (589 sequences, 4413 images) with the author-provided reference images to build exposure–correction pairs, and we partition the 589 sequences at the sequence level into training/validation/testing subsets with a 7:1:2 ratio so that entire scenes stay intact. For the LOLv1 dataset [36], we adopt its official split with 485 training pairs and 15 testing pairs.
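A sequence-level split of this kind can be sketched as follows. This is an illustration only: the random seed, the floor-based rounding, and the function name `split_sequences` are assumptions, so the subset sizes it produces are not the counts used in the experiments.

```python
import random


def split_sequences(seq_ids, ratios=(7, 1, 2), seed=0):
    """Shuffle sequence IDs and split them so that no scene spans two subsets."""
    ids = list(seq_ids)
    random.Random(seed).shuffle(ids)  # hypothetical fixed seed for repeatability
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder goes to the test subset
    return train, val, test


train, val, test = split_sequences(range(589))
print(len(train), len(val), len(test))
```

Splitting by sequence ID rather than by image keeps all exposures of one scene in the same subset, which avoids leakage between training and testing.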

4.2. Comparison with State-of-the-Art Methods

To evaluate the performance of the proposed DFE-Net, we select 11 advanced image enhancement methods for comparison. These methods cover three mainstream technical routes: CNN-based methods, including RetinexNet, MIRNet, and HWMNet [37]; Transformer-based methods, including UFormer [38], LLFormer [39], RetinexFormer, HVI-CIDNet [40], and LYT-Net [41]; and emerging state-space model (Mamba)-based methods, including Wave-Mamba, MambaLLIE [42], and Retinexmamba [43]. To ensure a fair comparison, all baseline results presented in this section are obtained by retraining the official implementations using exactly the same training protocols (detailed in Section 4.1) as our proposed method. Quantitative and qualitative evaluations are conducted on the multi-exposure correction datasets, MSEC and SICE, and the low-light enhancement dataset, LOLv1.

4.2.1. Quantitative Result Analysis

Table 1 presents the PSNR and SSIM metrics of DFE-Net and all comparative methods on the MSEC and SICE datasets. Table 2 shows the PSNR and SSIM results on the LOLv1 dataset.

On the MSEC dataset, DFE-Net is competitive across the board. On the underexposure subset, DFE-Net achieves a PSNR of 23.63 dB and an SSIM of 0.897. Its PSNR is slightly lower than the 23.67 dB of MambaLLIE, and its SSIM is below the latter’s 0.911, but its PSNR exceeds that of every other method (the next best is LLFormer at 23.23 dB), while its SSIM is on par with HWMNet (0.897) and Wave-Mamba (0.898). This supports the effectiveness of DFE-Net in handling low-light degradation. On the more challenging overexposure subset, the advantage of DFE-Net is more prominent: its PSNR reaches 23.10 dB, surpassing the second-ranked HWMNet (22.76 dB) by 0.34 dB and MambaLLIE (22.49 dB) by 0.61 dB, which demonstrates its capability in recovering overexposed information. In terms of average metrics, DFE-Net achieves the highest PSNR of 23.41 dB, and its SSIM of 0.894 is close to the best value of 0.905 (MambaLLIE). These results indicate that DFE-Net achieves strong and balanced performance on the MSEC dataset, especially in the challenging task of overexposure correction.

On the SICE dataset, the advantages of DFE-Net are more comprehensive. Its average PSNR of 23.45 dB and average SSIM of 0.799 rank first among all methods. On the underexposure subset, DFE-Net achieves a PSNR of 25.47 dB, surpassing the closest competitor, Retinexmamba, by 1.54 dB. On the overexposure subset, its PSNR of 21.44 dB exceeds that of the next best method, HVI-CIDNet (20.74 dB), by 0.70 dB. Notably, the overall performance of DFE-Net on SICE surpasses that of the Mamba-based architectures Wave-Mamba, MambaLLIE, and Retinexmamba, which supports the design choice of combining the efficient sequence modeling of Mamba with explicit frequency decoupling in the wavelet domain.

On the LOLv1 dataset, DFE-Net achieves the best performance among all comparative methods, with a PSNR of 24.80 dB and an SSIM of 0.883. This result carries two implications. First, it outperforms methods designed specifically for low-light scenarios, including RetinexFormer (23.95 dB, 0.828) and HWMNet (21.53 dB, 0.804), indicating that the unified framework of DFE-Net is also highly capable on the classical low-light problem. Second, in comparison with its Mamba-based counterparts, DFE-Net shows clear advantages over Retinexmamba (23.31 dB), Wave-Mamba (21.64 dB), and MambaLLIE (21.43 dB). This suggests that integrating the global modeling ability of Mamba with the explicit wavelet-domain dual-frequency decoupling and collaboration mechanism proposed in this paper addresses the compound challenges of global brightness reduction and local detail–noise mixing caused by insufficient illumination more effectively than simply applying Mamba or combining it with Retinex theory. This result is consistent with the performance on the MSEC and SICE datasets; together, they indicate that the DFE-Net framework generalizes well across the full range of abnormal exposure, from underexposure to overexposure.

4.2.2. Model Efficiency Analysis

Excellent performance that comes at an excessive computational cost limits the practical application of a model. Therefore, we compare the parameters and computational complexity of each model in Table 3. To ensure a fair comparison, all GFLOPs are calculated at a unified input resolution of 256 × 256. DFE-Net requires only 3.90 GFLOPs and 8.70 M parameters. Compared with Transformer-based methods of similar performance, it has significant advantages in computational efficiency. For instance, the computational cost of LLFormer is about 5.6 times that of DFE-Net, and that of UFormer is about 10.5 times. Compared with large-scale CNN models such as MIRNet and HWMNet, whose computation is far heavier, the efficiency advantage of DFE-Net is even more obvious. Even among lightweight Mamba-based methods, DFE-Net has a computational cost comparable to that of MambaLLIE while achieving better overall performance on the SICE and LOLv1 datasets and better performance in the overexposure task on MSEC. This reflects the favorable performance–efficiency trade-off of DFE-Net.
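The cost ratios quoted above follow directly from the GFLOPs column of Table 3. The short Python sketch below reproduces the arithmetic; the dictionary and the helper name `cost_ratio` are illustrative.

```python
# GFLOPs at a 256 x 256 input resolution, copied from Table 3.
gflops = {
    "MIRNet": 196.33,
    "HWMNet": 254.21,
    "UFormer": 41.09,
    "LLFormer": 22.03,
    "MambaLLIE": 3.99,
    "DFE-Net": 3.90,
}


def cost_ratio(method, reference="DFE-Net"):
    """Computational cost of `method` relative to `reference`."""
    return gflops[method] / gflops[reference]


for name in ("LLFormer", "UFormer", "MIRNet", "HWMNet", "MambaLLIE"):
    print(f"{name}: {cost_ratio(name):.1f}x the GFLOPs of DFE-Net")
```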

4.2.3. Qualitative Result Analysis

Figure 5, Figure 6 and Figure 7 show the enhancement results of DFE-Net and the top three methods in the quantitative analysis on the MSEC, SICE, and LOLv1 datasets, respectively. On the MSEC dataset, visual comparison and analysis are performed on the enhancement results of one underexposed sample and one overexposed sample. For the underexposed image, the output of HWMNet deviates from the ground truth in color reproduction with an overall pale tone, especially reflected in the color distortion of distant buildings; the result of LLFormer shows an obvious color shift, with an unnatural red tint in the sky region. Both MambaLLIE and DFE-Net produce relatively natural overall visual effects, but MambaLLIE is slightly insufficient in detail sharpness, e.g., the outlines of trees and streets behind the figures are blurred. In a comprehensive visual comparison, the enhanced result of DFE-Net is closest to the ground truth in color authenticity, detail sharpness, and overall naturalness. For the overexposed image, methods such as HWMNet, LLFormer, and MambaLLIE perform poorly in restoring the texture structure of highlight regions; for example, the window outlines of building facades are blurred with obvious detail loss. In contrast, DFE-Net can better recover the texture hierarchy and structural information of overexposed regions.

On the SICE dataset, we also select one underexposed and one overexposed image for qualitative comparison. For the underexposed image, RetinexFormer and Wave-Mamba suffer from insufficient color restoration, with low saturation of the red sign, especially the latter; Retinexmamba performs poorly in the sharpness of character edges in this region. Although the color saturation of some building surfaces in DFE-Net is slightly lower than that of the ground truth, its overall enhancement effect is closest to the reference in color balance, detail preservation, and visual authenticity. In the overexposed image, due to severe local overexposure, all comparative methods struggle to recover details in extremely bright regions, such as rooftops and distant mountains. However, DFE-Net reconstructs significantly more content and structural information in these regions with higher sharpness than other models. In regions with relatively normal exposure (e.g., trees), the color and texture sharpness restored by DFE-Net are closest to the ground truth among all methods.

On the low-light dataset LOLv1, visual evaluation is conducted on two underexposed images. All comparative models perform well in overall brightness improvement but differ in detail and color restoration. The output images of LLFormer suffer from insufficient overall color saturation. Retinexmamba fails to accurately restore the colors around the lights in the second image, where the ground truth shows distinct red tones while its restoration lacks adequate color expression. RetinexFormer presents insufficient detail sharpness in the bookshelf area of the first image, with dark book colors deviating from the real tones. Overall, the enhanced results of DFE-Net are closest to the ground truth in local detail sharpness, color fidelity, and global contrast, showing superior visual consistency.

4.3. Ablation Studies

To thoroughly verify the effectiveness of each core design module in DFE-Net, we conduct systematic ablation studies on the MSEC and SICE datasets. Several model variants are constructed by gradually removing or replacing key components in the network, aiming to quantitatively evaluate the contribution of each module to the final enhancement performance. The quantitative results of the ablation experiments are summarized in Table 4 and Table 5.

Parameter Selection Rationale: Beyond the core modules, the performance of DFE-Net depends on the K-value in CMT attention and the softmax temperature τ. These hyperparameters were determined through preliminary development. Specifically, K = 3 was chosen because it aligns with the typical number of dominant frequency components in natural images, providing sufficient contextual guidance without introducing noise. τ = 1.0 was adopted to maintain the standard softmax distribution, balancing focus and context. The results reported here indicate that DFE-Net performs well under these default settings.
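The effect of the temperature can be illustrated with a generic temperature-scaled softmax. This is a minimal sketch, assuming that τ divides the attention scores before normalization; the scores below are made up, and the code does not reproduce how the scores are formed inside the CMT attention.

```python
import math


def softmax(scores, tau=1.0):
    """Softmax with temperature tau; tau = 1.0 gives the standard softmax."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


scores = [2.0, 1.0, 0.0]  # hypothetical attention scores
for tau in (0.5, 1.0, 2.0):
    print(tau, [round(p, 3) for p in softmax(scores, tau)])
```

A temperature below 1 concentrates the weights on the largest score, while a temperature above 1 flattens them, which is the sense in which τ = 1.0 balances focus and context.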

The effectiveness of the core components is demonstrated through the comparative experiments in Table 4. First, removing either the LFEBlock or the HFEBlock leads to a drastic drop in performance, confirming that explicit dual-frequency decoupling and parallel processing are fundamental to DFE-Net. Specifically, on the SICE dataset, the model without the LFEBlock achieves an average PSNR of only 18.35 dB, and the model without the HFEBlock yields 21.40 dB, whereas the full model reaches 23.45 dB. Furthermore, removing the CMT attention from the HFEBlock, or replacing the SS2D module (designed for long-range dependency modeling) in the LFEBlock with ordinary operations, results in clear performance degradation. For example, on the MSEC dataset, removing the SS2D module from the LFEBlock reduces the average PSNR by 0.41 dB (from 23.41 dB to 23.00 dB), and discarding the CMT module from the HFEBlock decreases it by 0.56 dB (to 22.85 dB). This indicates that both the collaboration mechanism, which guides high-frequency detail restoration with optimized low-frequency information, and the use of SS2D to model global illumination dependencies efficiently are crucial. In addition, removing the multi-scale input mechanism also lowers performance: as evidenced by the “MS” column in Table 4, the average PSNR drops on both datasets (by 0.68 dB on MSEC and 1.13 dB on SICE), indicating that directly injecting multi-scale information from the original image into the encoding process helps preserve richer details and context. This supports the complementarity between PixelUnshuffle and DWT: while DWT performs structural frequency decoupling, the multi-scale inputs supply spatially dense pixel context from the original image, preventing the loss of fine details during hierarchical abstraction.

Furthermore, we examine the different exposure conditions to analyze the targeted contribution of each component. The “underexposure” and “overexposure” subsets in Table 4 show that the CMT module brings particularly large gains for underexposed images (removing it costs 0.73 dB on the MSEC underexposure subset versus 0.33 dB on the overexposure subset, and 2.08 dB versus 0.94 dB on SICE), suggesting that the restoration of high-frequency details under extremely dark conditions relies heavily on reliable global structure priors provided by the low-frequency path. On MSEC, the SS2D module contributes more to the correction of overexposed images (0.56 dB versus 0.31 dB for underexposed images), which is consistent with the view that overexposure correction depends heavily on precise global brightness and contrast adjustment; on SICE, its contribution is larger for underexposed images. These differences further support the rationale for each component.
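The per-subset contributions discussed above can be recomputed from the PSNR columns of Table 4. The following Python sketch does so for the three single-component ablations of the full model; the variant labels and the helper name `psnr_drop` are illustrative.

```python
# PSNR (dB) from Table 4, in the order:
# MSEC under, MSEC over, SICE under, SICE over.
labels = ("MSEC under", "MSEC over", "SICE under", "SICE over")
full = (23.63, 23.10, 25.47, 21.44)
variants = {
    "w/o SS2D": (23.32, 22.54, 23.32, 20.99),
    "w/o CMT": (22.90, 22.77, 23.39, 20.50),
    "w/o MS": (23.01, 22.32, 23.95, 20.68),
}


def psnr_drop(name):
    """PSNR lost on each subset when one component is removed from the full model."""
    return {lab: round(f - v, 2) for lab, f, v in zip(labels, full, variants[name])}


for name in variants:
    print(name, psnr_drop(name))
```

Reading the output per subset shows where each component matters most: removing CMT hurts the underexposure subsets more on both datasets, whereas removing SS2D hurts the overexposure subset more on MSEC only.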

The wavelet decomposition level is a critical hyperparameter. In Table 5, we analyze the changes in model performance and parameter count on multiple datasets when the decomposition level varies from 1 to 5. The experimental results show that there exists an optimal interval for the decomposition level. On the MSEC and LOLv1 datasets, 3-level decomposition achieves the best overall performance; on the SICE dataset, 2-level and 3-level decomposition yield close and optimal results. When the level is less than 3, the model may fail to perform effective frequency domain information decoupling on sufficient scales. When the level exceeds 3, the performance does not continue to improve and even declines, accompanied by a significant increase in model parameters. Therefore, choosing 3-level decomposition strikes the best balance between model performance and computational efficiency, which is adopted as the default configuration of DFE-Net.
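To make the notion of a decomposition level concrete, the following is a minimal pure-Python sketch of a multi-level 2D DWT and its inverse. It assumes the orthonormal Haar wavelet and recursion on the LL band only, which is the textbook construction and not necessarily the exact configuration inside DFE-Net; the function names and the LH/HL naming convention are illustrative choices.

```python
def haar_dwt2(x):
    """One level of the 2D Haar DWT on a 2D list with even height and width.

    Returns (LL, LH, HL, HH), each half the size of x in both dimensions.
    """
    h, w = len(x) // 2, len(x[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # scaled local average (low frequency)
            lh[i][j] = (a - b + c - d) / 2  # difference across columns
            hl[i][j] = (a + b - c - d) / 2  # difference across rows
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x


def decompose(x, levels):
    """Multi-level DWT: re-apply haar_dwt2 to the LL band `levels` times."""
    details, ll = [], x
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details


def reconstruct(ll, details):
    """Multi-level IWT: undo `decompose`, starting from the coarsest level."""
    for lh, hl, hh in reversed(details):
        ll = haar_idwt2(ll, lh, hl, hh)
    return ll


# Toy 8 x 8 "image"; three levels shrink the LL band to 4 x 4, 2 x 2, then 1 x 1.
image = [[float(8 * i + j) for j in range(8)] for i in range(8)]
ll, details = decompose(image, levels=3)
print([len(bands[0]) for bands in details])  # side length of the detail bands per level
restored = reconstruct(ll, details)
print(max(abs(restored[i][j] - image[i][j]) for i in range(8) for j in range(8)))
```

In this construction, each additional level halves the side length of the low-frequency band, so a 256 × 256 input would reach 32 × 32 after three levels. Deeper levels therefore operate on increasingly coarse features.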

The ablation studies verify from multiple dimensions that every core design module in DFE-Net is indispensable. The explicit dual-frequency processing framework serves as the foundation, the CMT attention and SS2D are the keys to the efficient operation of the high-frequency and low-frequency paths, respectively, the multi-scale input provides beneficial supplementation, and 3-level wavelet decomposition is a verified effective configuration. The collaborative operation of all these components jointly contributes to the superior restoration ability of DFE-Net under diverse exposure degradation conditions.

5. Conclusions

To address the performance trade-off in the unified restoration of low-light and overexposed images caused by the inherent conflict between “brightening” and “darkening” mappings, this paper proposes DFE-Net, a dual-frequency enhancement network based on explicit frequency domain decoupling. The core of this method lies in deeply integrating DWT into the network architecture to construct an interpretable processing pipeline of “decomposition–targeted enhancement–collaborative reconstruction”. Specifically, the network uses DWT to explicitly separate, at multiple scales, the low-frequency sub-bands carrying global illumination and structure from the high-frequency sub-bands containing detailed textures, and it processes them in dedicated paths. The low-frequency path introduces the SS2D module to efficiently model long-range dependencies and achieve precise correction of illumination and contrast; the high-frequency path utilizes the CMT attention mechanism, enabling the detail restoration process to be adaptively guided by the enhanced low-frequency semantic information so that global and local information remain consistent. Extensive experiments on the MSEC, SICE, and LOLv1 benchmark datasets show that DFE-Net matches or outperforms current state-of-the-art CNN-based, Transformer-based, and Mamba-based methods in both quantitative metrics and visual quality. It shows particular advantages in recovering textures of overexposed images and details in extremely dark scenes while maintaining high computational efficiency. Ablation studies verify the necessity and effectiveness of the proposed dual-frequency framework and its core components. This paper thus provides an effective, efficient, and interpretable solution for handling complex and variable exposure degradation problems. Future work can explore the generalization ability of the framework in other image restoration tasks (such as dehazing and deraining) as well as further optimization in directions such as adaptive wavelet basis selection.

Author Contributions

Conceptualization, investigation, methodology, and writing—original draft, S.Z.; formal analysis, validation, and writing—review and editing, H.C.; conceptualization and methodology, W.C.; visualization, writing—review and editing, and software, S.C.; validation and writing—review and editing, Z.W.; funding acquisition, resources, and supervision, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant 12304518), the China Postdoctoral Science Foundation (Grant GZC20233617), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant KJZD-M202412901), and the PLA Joint Logistics Support Force University of Engineering Youth Independent Innovation Fund Project (Grant QN26-71).

Data Availability Statement

Data will be made available upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Wu, P.; He, X.; Dai, W.; Zhou, J.; Shang, Y.; Fan, Y.; Hu, T. A Review on Research and Application of AI-Based Image Analysis in the Field of Computer Vision. IEEE Access 2025, 13, 76684–76702.
2. Ashraf, M. Enhancing Perception and Decision-Making in Autonomous Systems through Vision-based Technologies: Focus on Robotics, Drones, and Self-Driving Cars. Int. J. Sci. Res. 2024, 13, 794–797.
3. Zhou, W.; Yang, L.; Zhao, L.; Zhang, R.Y.; Cui, Y.F.; Huang, H.P.; Qie, K.; Wang, C. Vision Technologies with Applications in Traffic Surveillance Systems: A Holistic Survey. ACM Comput. Surv. 2026, 58, 58.
4. Zhao, C.; Wang, H.; Yan, Q.; Zhang, J.; Zhu, Y.; Sun, J.; Zhang, Y. A Review of Optical Image Enhancement for Extreme Space Environments. Adv. Astronaut. 2025, 8, 171–199.
5. Pei, Y.; Huang, Y.; Zou, Q.; Zhang, X.; Wang, S. Effects of Image Degradation and Degradation Removal to CNN-Based Image Classification. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1239–1253.
6. Zhou, M.L.; Leng, H.Y.; Fang, B.; Xiang, T.; Wei, X.K.; Jia, W.J. Low-light Image Enhancement via a Frequency-based Model with Structure and Texture Decomposition. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 187.
7. Qi, Y.; Yang, Z.; Sun, W.; Lou, M.; Lian, J.; Zhao, W.; Deng, X.; Ma, Y. A Comprehensive Overview of Image Enhancement Techniques. Arch. Comput. Methods Eng. 2022, 29, 583–607.
8. Guo, J.; Ma, J.; García-Fernández, Á.F.; Zhang, Y.; Liang, H. A survey on image enhancement for Low-light images. Heliyon 2023, 9, e14558.
9. Liu, X.; Wu, Z.; Li, A.; Vasluianu, F.A.; Zhang, Y.; Gu, S.; Zhang, L.; Zhu, C.; Timofte, R.; Jin, Z.; et al. NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; pp. 6571–6594.
10. Tan, J.; Pei, S.; Qin, W.; Fu, B.; Li, X.; Huang, L. Wavelet-Based Mamba with Fourier Adjustment for Low-Light Image Enhancement. In Proceedings of the Computer Vision – ACCV 2024; Springer: Berlin/Heidelberg, Germany, 2025; pp. 160–175.
11. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
12. Vijayalakshmi, D.; Nath, M.K.; Acharya, O.P. A Comprehensive Survey on Image Contrast Enhancement Techniques in Spatial Domain. Sens. Imaging 2020, 21, 40.
13. Hussein, R.R.; Hamodi, Y.I.; Sabri, R.A. Retinex theory for color image enhancement: A systematic review. Int. J. Electr. Comput. Eng. 2019, 9, 5560–5569.
14. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1777–1786.
15. Li, C.; Guo, C.; Loy, C.C. Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4225–4238.
16. Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 12470–12479.
17. Luo, Y.; Chen, X.; Ling, J.; Huang, C.; Zhou, W.; Yue, G. Unsupervised Low-Light Image Enhancement With Self-Paced Learning. IEEE Trans. Multimed. 2025, 27, 1808–1820.
18. Zhou, H.; Dong, W.; Liu, X.; Zhang, Y.; Zhai, G.; Chen, J. Low-light image enhancement via generative perceptual priors. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intelligence; AAAI Press: Washington, DC, USA, 2025; pp. 10752–10760.
19. Meng, C.; Huajun, F.; Zhihai, X.; Qi, L. Exposure Correction and Detail Enhancement for Single LDR Image. Acta Photon. Sin. 2018, 47, 0410003.
20. Afifi, M.; Derpanis, K.G.; Ommer, B.; Brown, M.S. Learning Multi-Scale Photo Exposure Correction. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9153–9163.
21. Eyiokur, F.I.; Yaman, D.; Ekenel, H.K.; Waibel, A. Exposure Correction Model to Enhance Image Quality. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 675–685.
22. Nsampi, N.E.; Hu, Z.; Wang, Q. Learning Exposure Correction Via Consistency Modeling. In Proceedings of the 32nd British Machine Vision Conference 2021, Online, 22–25 November 2021.
23. Zhang, J.M.; Jiang, J.; Wu, M.S.; Feng, Z.J.; Shi, X.N. Illumination-guided dual-branch fusion network for partition-based image exposure correction. J. Vis. Commun. Image Represent. 2025, 106, 104342.
24. Zou, W.; Gao, H.; Yang, W.; Liu, T. Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement. arXiv 2024, arXiv:2408.01276.
25. Li, Z.N.; Chen, X.H.; Guo, S.N.; Wang, S.Q.; Pun, C.M. WavEnhancer: Unifying Wavelet and Transformer for Image Enhancement. J. Comput. Sci. Technol. 2024, 39, 336–345.
26. Zhang, T.; Liu, P.; Zhang, Z.; Zhou, Q. SPJFNet: Self-Mining Prior-Guided Joint Frequency Enhancement for Ultra-Efficient Dark Image Restoration. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 12798–12806.
27. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Munich, Germany, 5–9 October 2015; pp. 234–241.
28. Edwards, T.S. Discrete Wavelet Transforms: Theory and Implementation. 1991. Available online: https://www.semanticscholar.org/paper/Discrete-Wavelet-Transforms%3A-Theory-and-Edwards/f7efbe4055f84612ec0851f6ccd11d2d4999141b (accessed on 1 January 2025).
29. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883.
30. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; pp. 62429–62442.
31. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Learning Enriched Features for Real Image Restoration and Enhancement. In Proceedings of the Computer Vision – ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 492–511.
32. Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. CMT: Convolutional neural networks meet vision transformers. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12165–12175.
33. Po-Cheng, W.; Chao-Tsung, H.; Liang-Gee, C. An efficient architecture for two-dimensional inverse discrete wavelet transform. In Proceedings of the 2002 IEEE International Symposium on Circuits and Systems (ISCAS), Phoenix-Scottsdale, AZ, USA, 26–29 May 2002; pp. II-312–II-315.
34. Xu, M.; Ding, R.; Zhao, H.; Qi, X. PAConv: Position adaptive convolution with dynamic kernel assembling on point clouds. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 20–25 June 2021; pp. 3172–3181.
35. Cai, J.; Gu, S.; Zhang, L. Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images. IEEE Trans. Image Process. 2018, 27, 2049–2062.
36. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. arXiv 2018, arXiv:1808.04560.
37. Fan, C.M.; Liu, T.J.; Liu, K.H. Half Wavelet Attention on M-Net+ for Low-Light Image Enhancement. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 3878–3882.
38. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general U-shaped transformer for image restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17662–17672.
39. Wang, T.; Zhang, K.H.; Shen, T.R.; Luo, W.H.; Stenger, B.; Lu, T. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), Washington, DC, USA, 7–14 February 2023; pp. 2654–2662.
40. Yan, Q.; Feng, Y.; Zhang, C.; Pang, G.; Shi, K.; Wu, P.; Dong, W.; Sun, J.; Zhang, Y. HVI: A New Color Space for Low-light Image Enhancement. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 10–17 June 2025; pp. 5678–5687.
41. Brateanu, A.; Balmez, R.; Avram, A.; Orhei, C.; Ancuti, C. LYT-NET: Lightweight YUV Transformer-Based Network for Low-Light Image Enhancement. IEEE Signal Process. Lett. 2025, 32, 2065–2069.
42. Weng, J.; Yan, Z.; Tai, Y.; Qian, J.; Yang, J.; Li, J. MambaLLIE: Implicit retinex-aware low light enhancement with global-then-local state space. arXiv 2024, arXiv:2405.16105v1.
43. Bai, J.; Yin, Y.; He, Q.; Li, Y.; Zhang, X. Retinexmamba: Retinex-Based Mamba for Low-Light Image Enhancement. In Proceedings of the Neural Information Processing, San Diego, CA, USA, 2–7 December 2025; pp. 427–442.

Figure 1. Underexposure, normal exposure, and overexposure histograms.

Figure 2. Network architecture.

Figure 3. Schematic diagram of LFEBlock.

Figure 4. Schematic diagram of HFEBlock.

Figure 5. Qualitative comparison of different methods on the MSEC dataset.

Figure 6. Qualitative comparison of different methods on the SICE dataset.

Figure 7. Qualitative comparison of different methods on the LOLv1 dataset.

Table 1. Quantitative comparison of different methods on the MSEC and SICE datasets.

Method	Source	MSEC						SICE
		Under		Over		Average		Under		Over		Average
		PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
RetinexNet	BMVC’18	15.94	0.721	17.64	0.8060	16.62	0.7552	18.04	0.613	7.72	0.429	12.88	0.521
MIRNet	ECCV’20	21.84	0.855	19.04	0.832	20.72	0.846	15.42	0.527	14.04	0.562	14.73	0.544
HWMNet	ICIP’22	22.99	0.897	22.76	0.889	22.90	0.894	18.67	0.639	16.74	0.648	17.71	0.644
UFormer	CVPR’22	19.18	0.736	15.17	0.694	17.57	0.719	19.42	0.679	15.33	0.609	17.37	0.644
LLFormer	AAAI’23	23.23	0.896	22.68	0.888	23.01	0.893	21.27	0.723	19.36	0.719	20.31	0.721
RetinexFormer	ICCV’23	22.35	0.896	22.73	0.893	22.52	0.896	23.32	0.706	20.54	0.699	21.93	0.703
HVI-CIDNet	CVPR’25	20.80	0.821	20.50	0.818	20.68	0.819	7.74	0.068	20.74	0.758	14.24	0.413
LYT-Net	SPL’25	21.37	0.857	20.40	0.841	20.98	0.851	20.69	0.734	18.17	0.710	19.43	0.722
Retinexmamba	NIP’25	22.40	0.890	19.89	0.848	21.40	0.873	23.93	0.805	20.26	0.750	22.09	0.778
Wave-Mamba	arXiv’24	22.73	0.898	22.10	0.886	22.48	0.893	23.35	0.799	20.62	0.777	21.98	0.788
MambaLLIE	arXiv’24	23.67	0.911	22.49	0.895	23.20	0.905	23.38	0.804	20.09	0.743	21.73	0.774
DFE-Net	Ours	23.63	0.897	23.10	0.890	23.41	0.894	25.47	0.816	21.44	0.782	23.45	0.799

Table 2. Quantitative comparison of different methods on the LOLv1 dataset.

Method	Source	PSNR	SSIM
RetinexNet	BMVC’18	17.55	0.573
MIRNet	ECCV’20	17.71	0.725
HWMNet	ICIP’22	21.53	0.804
UFormer	CVPR’22	18.72	0.699
LLFormer	AAAI’23	21.69	0.781
RetinexFormer	ICCV’23	23.95	0.828
HVI-CIDNet	CVPR’25	21.42	0.801
LYT-Net	SPL’25	20.05	0.775
Retinexmamba	NIP’25	23.31	0.798
Wave-Mamba	arXiv’24	21.64	0.822
MambaLLIE	arXiv’24	21.43	0.820
DFE-Net	Ours	24.80	0.883

Table 3. Computational cost and parameter count of different methods. GFLOPs are calculated at an input resolution of 256 × 256. Parameters are given in millions (M).

Method	GFLOPs	Parameters (M)
RetinexNet	2.04	0.84
MIRNet	196.33	31.76
HWMNet	254.21	66.56
UFormer	41.09	5.29
LLFormer	22.03	24.51
RetinexFormer	17.01	0.61
HVI-CIDNet	2.03	7.90
LYT-Net	0.42	0.18
Retinexmamba	9.49	14.36
Wave-Mamba	1.79	8.88
MambaLLIE	3.99	6.08
DFE-Net	3.90	8.70

Table 4. Ablation experiment of key modules and sub-modules on the MSEC and SICE datasets. Note: LF indicates the low-frequency path is active; HF indicates the high-frequency path is active; SS2D denotes the Spatial State Space Module used in LFEBlock; CMT denotes the CMT attention used in HFEBlock; MS denotes the multi-scale input via PixelUnshuffle. The symbol “✓” indicates the component is present, while “×” indicates its removal.

Dataset	LF	HF	SS2D	CMT	MS	Under		Over		Average
						PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
MSEC	×	✓	✓	✓	✓	19.11	0.837	18.78	0.826	18.98	0.833
	×	✓	✓	×	✓	18.59	0.817	18.17	0.802	18.43	0.811
	✓	×	✓	✓	✓	23.17	0.887	22.23	0.878	22.80	0.884
	✓	×	×	✓	✓	23.24	0.887	22.22	0.875	22.83	0.883
	✓	✓	×	✓	✓	23.32	0.893	22.54	0.885	23.00	0.890
	✓	✓	✓	×	✓	22.90	0.890	22.77	0.885	22.85	0.887
	✓	✓	✓	✓	×	23.01	0.891	22.32	0.883	22.73	0.888
	✓	✓	✓	✓	✓	23.63	0.897	23.10	0.890	23.41	0.894
SICE	×	✓	✓	✓	✓	18.82	0.703	17.89	0.677	18.35	0.690
	×	✓	✓	×	✓	15.97	0.661	15.53	0.667	15.75	0.664
	✓	×	✓	✓	✓	23.03	0.792	19.77	0.770	21.40	0.781
	✓	×	×	✓	✓	21.96	0.766	19.19	0.739	20.57	0.753
	✓	✓	×	✓	✓	23.32	0.797	20.99	0.772	22.16	0.784
	✓	✓	✓	×	✓	23.39	0.800	20.50	0.771	21.94	0.785
	✓	✓	✓	✓	×	23.95	0.805	20.68	0.789	22.32	0.797
	✓	✓	✓	✓	✓	25.47	0.816	21.44	0.782	23.45	0.799
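
The MS rows in Table 4 refer to a multi-scale input built with PixelUnshuffle. As a rough illustration of that rearrangement, the pure-Python sketch below moves each r × r spatial block into r² channels, so the resolution drops without discarding any pixel. The function `pixel_unshuffle` and its nested-list layout are written here for illustration only, with the channel ordering chosen to follow PyTorch's `torch.nn.PixelUnshuffle`. How DFE-Net feeds the resulting tensors into its encoder is not described in this section.

```python
def pixel_unshuffle(image, r):
    """Rearrange a (C, H*r, W*r) nested list into (C*r*r, H, W).

    Output channel c*r*r + i*r + j holds the pixels at row offset i and
    column offset j inside every r x r block of input channel c.
    """
    rows, cols = len(image[0]), len(image[0][0])
    if rows % r or cols % r:
        raise ValueError("height and width must be divisible by r")
    height, width = rows // r, cols // r
    out = []
    for channel in image:
        for i in range(r):
            for j in range(r):
                out.append([[channel[h * r + i][w * r + j] for w in range(width)]
                            for h in range(height)])
    return out


# One 4 x 4 channel holding the values 0..15 in row-major order.
image = [[[4 * row + col for col in range(4)] for row in range(4)]]
sub_images = pixel_unshuffle(image, 2)
print(len(sub_images), len(sub_images[0]), len(sub_images[0][0]))  # 4 2 2
print(sub_images[0])  # [[0, 2], [8, 10]]
```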

Table 5. Ablation experiment of wavelet decomposition levels on the MSEC, SICE, and LOLv1 datasets.

DWT Levels	MSEC Under PSNR	MSEC Under SSIM	MSEC Over PSNR	MSEC Over SSIM	MSEC Average PSNR	MSEC Average SSIM	SICE Under PSNR	SICE Under SSIM	SICE Over PSNR	SICE Over SSIM	SICE Average PSNR	SICE Average SSIM	LOLv1 PSNR	LOLv1 SSIM	Parameters (M)
1	22.66	0.889	21.91	0.878	22.36	0.884	24.73	0.820	21.04	0.782	22.89	0.801	20.97	0.860	2.89
2	22.89	0.891	21.98	0.877	22.52	0.885	24.83	0.816	21.61	0.801	23.22	0.809	23.94	0.873	5.97
3	23.63	0.897	23.10	0.890	23.41	0.894	25.47	0.816	21.44	0.782	23.45	0.799	24.80	0.883	8.70
4	22.98	0.887	21.93	0.874	22.56	0.882	24.56	0.815	21.24	0.802	22.90	0.808	23.96	0.869	11.73
5	22.13	0.880	21.75	0.875	21.98	0.878	24.78	0.812	21.42	0.796	23.10	0.804	23.34	0.862	15.20
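
Table 5 varies the number of DWT levels. The sketch below shows what an additional level does to the signal: each level splits the current low-frequency band into one half-resolution low-frequency band and three detail bands, and the inverse transform (IWT) restores the input exactly. The Haar wavelet, the function names, and the sub-band labels are assumptions made for illustration, since this section does not state which wavelet basis DFE-Net uses.

```python
def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on a nested list (even sizes)."""
    h, w = len(x) // 2, len(x[0]) // 2
    ll, lh, hl, hh = ([[0.0] * w for _ in range(h)] for _ in range(4))
    for i in range(h):
        for j in range(w):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # local average (low frequency)
            lh[i][j] = (a - b + c - d) / 2  # difference across columns
            hl[i][j] = (a + b - c - d) / 2  # difference across rows
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, (lh, hl, hh)


def haar_iwt2(ll, details):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    lh, hl, hh = details
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x


def decompose(x, levels):
    """Apply the DWT repeatedly to the low-frequency band."""
    details, low = [], x
    for _ in range(levels):
        low, d = haar_dwt2(low)
        details.append(d)
    return low, details


def reconstruct(low, details):
    """Undo decompose by applying the IWT from the coarsest level up."""
    for d in reversed(details):
        low = haar_iwt2(low, d)
    return low


image = [[(3 * i + 5 * j) % 17 for j in range(16)] for i in range(16)]
low, details = decompose(image, 3)
rebuilt = reconstruct(low, details)
error = max(abs(rebuilt[i][j] - image[i][j]) for i in range(16) for j in range(16))
print(len(low), len(low[0]))  # 2 2
print(error < 1e-9)  # True
```

Under this scheme a 256 × 256 input yields low-frequency bands of side 128, 64, 32, 16, and 8 after one to five levels. Deeper decompositions therefore leave very little spatial extent in the coarsest band, while Table 5 reports the parameter count growing from 2.89 M to 15.20 M.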
