Article

NRGS-Net: A Lightweight Uformer with Gated Positional and Local Context Attention for Nighttime Road Glare Suppression

1 School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China
2 Novel Product R & D Department, Truly Opto-Electronics Co., Ltd., Shanwei 516600, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8686; https://doi.org/10.3390/app15158686
Submission received: 13 July 2025 / Revised: 2 August 2025 / Accepted: 3 August 2025 / Published: 6 August 2025
(This article belongs to the Special Issue Computational Imaging: Algorithms, Technologies, and Applications)

Abstract

Existing nighttime visibility enhancement methods primarily focus on improving overall brightness under low-light conditions. However, nighttime road images are also affected by glare, glow, and flare from complex light sources such as streetlights and headlights, making it challenging to suppress locally overexposed regions and recover fine details. To address these challenges, we propose a Nighttime Road Glare Suppression Network (NRGS-Net) for glare removal and detail restoration. Specifically, to handle diverse glare disturbances caused by the uncertainty in light source positions and shapes, we designed a gated positional attention (GPA) module that integrates positional encoding with local contextual information to guide the network in accurately locating and suppressing glare regions, thereby enhancing the visibility of affected areas. Furthermore, we introduced an improved Uformer backbone named LCAtransformer, in which the downsampling layers adopt efficient depthwise separable convolutions to reduce computational cost while preserving critical spatial information. The upsampling layers incorporate a residual PixelShuffle module to achieve effective restoration in glare-affected regions. Additionally, channel attention is introduced within the Local Context-Aware Feed-Forward Network (LCA-FFN) to enable adaptive adjustment of feature weights, effectively suppressing irrelevant and interfering features. To advance the research in nighttime glare suppression, we constructed and publicly released the Night Road Glare Dataset (NRGD) captured in real nighttime road scenarios, enriching the evaluation system for this task. Experiments conducted on the Flare7K++ and NRGD, using five evaluation metrics and comparing six state-of-the-art methods, demonstrate that our method achieves superior performance in both subjective and objective metrics compared to existing advanced methods.

1. Introduction

With the rapid advancement of autonomous driving and advanced driver assistance systems (ADASs), there is an increasing demand for enhanced environmental perception capabilities of in-vehicle sensing systems under nighttime and low-light conditions. As a crucial component of intelligent driving perception, electronic rearview mirrors capture real-time external scenes through cameras and project them onto in-vehicle displays, effectively overcoming the limited field of view inherent in traditional mirrors. However, complex artificial light sources at night, such as streetlights, oncoming headlights, and illuminated billboards, often generate glare, glow, and flare effects. These bright areas overexpose parts of the image and hide important surrounding details. This loss of visual clarity makes it harder for drivers to see the road and potential hazards accurately, compromising safety. Consequently, effectively suppressing glare interference and recovering details in occluded regions have become a key challenge for ensuring nighttime driving safety and enhancing the reliability of intelligent driving systems.
Current solutions for overexposure correction and nighttime image enhancement primarily include traditional methods and deep learning-based approaches. Traditional methods, such as histogram equalization [1] and Retinex-based image enhancement [2], have long been discussed in the classical literature on digital signal processing [3,4] and digital image processing [5].
However, these methods often result in over-enhancement, loss of details, and unnatural visual appearances, while also struggling to distinguish and suppress localized strong glare regions. Additionally, gamma correction, although capable of improving overall brightness, is ineffective in addressing local overexposure caused by glare. With the development of computer vision [6] and deep learning, methods based on generative adversarial networks (GANs), convolutional neural networks (CNNs), and Transformer architectures have become research hotspots. While effective at enhancing visibility in low-light conditions, these methods lack precise mechanisms to detect and suppress localized glare. As a result, local overexposure remains a challenge, limiting their generalization and robustness in complex nighttime scenarios. Additionally, many of these models are computationally heavy, which further limits their practical deployment.
To address these challenges, we improved the network architecture and proposed the Nighttime Road Glare Suppression Network (NRGS-Net), which effectively suppresses glare and restores details while reducing the number of model parameters. Specifically, to handle the diverse glare interference caused by the uncertainty of light source positions and shapes, we designed a gated positional attention (GPA) module, which integrates positional encoding and local contextual information to guide the network in accurately locating and suppressing glare regions, thereby improving the visibility of affected areas. Additionally, we proposed an improved Uformer backbone architecture, wherein the downsampling stage employs an efficient structure combining depthwise separable convolutions with pointwise convolutions, reducing computational load while preserving critical spatial information. The upsampling stage incorporates a residual PixelShuffle module to enable efficient restoration of affected regions. In the Locally Context-Aware Feed-Forward Network (LCA-FFN), we utilize depthwise convolutions to capture local contextual features, combined with Squeeze-and-Excitation (SE) channel attention to adaptively adjust channel-wise feature weights. This design effectively suppresses irrelevant or interfering features, further enhancing the model’s capability to recover fine details in glare-affected regions. In summary, the main contributions of this work are as follows:
  • We propose a lightweight enhancement network (NRGS-Net) for nighttime road glare suppression, which effectively mitigates glare interference in nighttime road images.
  • We design a lightweight and efficient Uformer backbone by reengineering the upsampling and downsampling structures, reducing model parameters from 20.47 M to 17.88 M while preserving essential spatial information. In addition, we enhance the feed-forward network with adaptive channel weighting to improve detail restoration and structural fidelity under nighttime conditions.
  • We introduce a gated positional attention (GPA) module, which integrates positional encoding and local contextual information to effectively guide the network in accurately locating and suppressing glare regions of various shapes, thereby enhancing local contrast and visibility in affected areas.
  • We construct and release the Night Road Glare Dataset (NRGD), a real-world dataset featuring diverse nighttime glare scenes. This dataset supports future research and enables fair performance comparison.
  • We conduct extensive experiments on the NRGD and Flare7K++ datasets using five commonly used evaluation metrics to compare our method with six state-of-the-art approaches. The results demonstrate the effectiveness and generalization ability of the proposed method in glare suppression, visibility enhancement, and detail restoration under nighttime road scenarios.

2. Related Work

2.1. Traditional Methods

In terms of traditional methods, Vitoria et al. [7] proposed an automatic detection and removal method for glare artifacts caused by bright light sources in images captured by cameras. This method utilizes geometric, morphological, luminance, and chromatic features for glare detection, combined with the optical characteristics of the lens system and filters to screen candidate glare regions. By defining confidence measures, the method adaptively selects the true glare regions and generates glare masks, which are then used with sample-based inpainting methods to reconstruct the affected areas, achieving effective and automatic glare removal in images. Subsequently, Zhang et al. [8] introduced a single-image glare removal method based on image decomposition, further enhancing glare suppression and image quality. Specifically, the method decomposes the glare image into scene and glare layers, corrects brightness attenuation and color deviation, and employs a local standard deviation-based contrast enhancement algorithm to restore details and contrast, addressing color distortion and contrast reduction problems typically induced under strong light sources. To address the quality degradation in coal mine surveillance videos caused by fogging, low illumination, and glare, Si et al. [9] proposed a hybrid enhancement algorithm (SSR-BF) that combines single-scale Retinex (SSR) with bilateral filtering (BF). By incorporating BF within the SSR framework, the method reduces noise and preserves edge details, thereby enhancing the quality of coal mine surveillance images. Additionally, Mandal et al. [10] proposed a region-based local image enhancement method, which divides each frame into three independent regions and applies a dynamic patch-based low-light vision enhancement strategy, adapting patch sizes according to pixel brightness. This approach effectively suppresses over-enhancement in the sky and dark regions while emphasizing the brightness of roads and surrounding areas. By introducing state judgment to avoid frequent atmospheric light calculations and using local Gaussian filtering along regional boundaries, the method reduces computational time and improves efficiency. In terms of adaptive enhancement, Rahman et al. [11] proposed an adaptive gamma correction method that dynamically adjusts gamma values to enhance image contrast, thereby improving adaptability to various lighting conditions. Chen et al. [12] proposed a minimum mean brightness error bi-histogram equalization method, aiming to enhance contrast while better preserving image details. Tang et al. [13] built upon the dark channel prior algorithm by integrating denoising and dehazing techniques to effectively mitigate halo artifacts in images. In summary, traditional methods for glare suppression in low-light images have achieved certain levels of success in improving image quality. However, these methods still face challenges such as reliance on physical or rule-based modeling, limited adaptability, overall brightness reduction, halo issues, and high computational complexity, which constrain their further application in complex and dynamic scenes.

2.2. Deep Learning Methods

In recent years, deep learning methods have achieved remarkable progress in low-light image enhancement and glare removal [14,15,16]. To address the limitations of traditional methods in adaptability, detail preservation, and handling of complex scenes, researchers have proposed various efficient and generalizable deep network architectures.
In Retinex theory-driven image enhancement, Wu et al. [17] proposed URetinex, which leverages deep learning to replace handcrafted priors used in traditional methods, thereby improving decomposition efficiency. Liu et al. [18] designed a lightweight network, RUAS, which adopts a collaborative no-reference learning strategy to enable fast low-light image enhancement. However, merely increasing brightness often amplifies hidden artifacts, degrading image quality. To address this issue, Zhang et al. [19] proposed a decomposition network that separates the input image into illumination and reflectance components, effectively suppressing artifacts while further improving enhancement quality.
In the domain of unsupervised image enhancement, Jiang et al. [20] introduced EnlightenGAN, which enables image enhancement without requiring paired training data, while Cui et al. [21] proposed a lightweight Transformer-based enhancement model that achieves efficient enhancement with low computational costs.
For multimodal fusion and exposure correction, Guo and Zhou et al. [22,23] utilized the fusion of infrared and visible images to improve nighttime visibility, while Huang et al. [24] combined image inversion and exposure fusion to effectively address underexposure and overexposure issues. However, these methods often require input images from the same scene, making data acquisition costly, and are generally not specifically designed for glare suppression.
In glare removal and adaptation to specific scenarios, researchers have integrated physical modeling with deep networks for improvement. Liu et al. [25] proposed a unified variational retinal model for haze and glow suppression, while Sharma et al. [26] utilized HDR to expand the dynamic range for glare reduction. Jin et al. [27] developed a layered decomposition and light effect suppression network for unsupervised light effect suppression and dark region enhancement, albeit with high computational complexity. Dong et al. [28] designed a lightweight network for low-light glare removal and detail restoration, incorporating dynamic detection and modular processing for efficient resource allocation. Jin et al. [29] further combined guided APSF simulation with gradient-adaptive convolution to achieve nighttime glare suppression, although domain discrepancies remain an issue. He et al. [30] proposed OENet, an overexposure correction network that integrates Transformer structures to effectively fuse local and global information for detail enhancement; but, its large model size limits deployment on embedded devices. Niu et al. [31] introduced GR-GAN, which utilizes a glare attention detector to adapt to irregular glare patterns for single-image glare removal, although it relies on paired supervised data for training.
For glare removal in specific scenarios, Dai et al. [32] proposed a nighttime reflection glare removal method based on optical central symmetry priors, leveraging the real-world glare dataset BracketFlare combined with a network-guided approach to enhance glare removal. Chen et al. [33] proposed DTDN, which decomposes license plate images into occlusion, foreground, and roughly de-glared components to improve license plate recognition accuracy. Wu et al. [34] combined physical modeling with semi-synthetic data, utilizing scattering and reflective glare models to generate realistic training pairs, and proposed a deep learning-based glare removal method that does not require real paired data, achieving effective glare removal on single RGB images. However, this method still exhibits localized distortion and limited generalizability under extreme conditions. Additionally, Yan et al. [35] introduced a nighttime dehazing method that integrates grayscale and color modules, achieving dehazing under complex illumination by frequency domain decomposition and module collaboration, although its performance is limited under extremely low illumination and strong interference lighting. Zhang et al. [36] proposed a user-guided edge-aware cascaded network for single-image reflection removal, which uses interactive user hints to guide the decomposition of reflections and background, enabling fast and effective reflection removal. However, the method’s reliance on user interaction limits its applicability in fully automated scenarios.
In summary, deep learning methods have significantly advanced the development of low-light image enhancement and glare removal, demonstrating strong capabilities in detail restoration and adaptability. Nevertheless, these methods still face challenges, including reliance on paired data, limited generalization under extreme conditions, high computational complexity, and constrained adaptability in specialized scenarios, necessitating further research to enhance their practicality and robustness.

3. The Methods

3.1. Overview

We propose a Nighttime Road Glare Suppression Network (NRGS-Net) to enhance the visibility and clarity of glare-affected regions in nighttime road scenarios for autonomous driving. As illustrated in Figure 1, NRGS-Net adopts an encoder–decoder structure with an improved Uformer as its backbone. Specifically, we design lightweight downsampling modules (L-Downsampling) and lightweight upsampling modules (L-Upsampling) to preserve spatial details and enable structural restoration with low computational cost. Within the vanilla Transformer blocks, we incorporate a Local Context-Aware Feed-Forward Network (LCA-FFN) to adaptively adjust channel-wise feature weights, effectively suppressing irrelevant feature interference and further recovering structural details in glare-affected regions, with the new block incorporating LCA-FFN referred to as LCAWin-Transformer. Additionally, we introduce a gated positional attention (GPA) module in the skip connections. It combines positional encoding and local context to guide the network in locating and suppressing glare, improving visibility in affected areas. To facilitate effective learning, paired training data consisting of flare-corrupted images, their corresponding clean background images, and estimated flare images are utilized. This allows the network to learn to separate glare components from the background explicitly during training. The network jointly predicts the restored background and the estimated flare component, with background loss and flare loss used as supervision signals to ensure the effectiveness of glare suppression while preserving scene details. The following sections provide a detailed description of the NRGS-Net architecture and the working principles of each module.

3.2. Paired Dataset Synthesis

Although glare removal has been explored in general image enhancement, it remains under-investigated in nighttime road scenarios for autonomous driving. This is largely due to the lack of annotated datasets and the complexity of nighttime lighting—such as multiple light sources, reflections, and severe localized overexposure. These factors make it impractical to manually annotate large-scale glare removal datasets, as pixel-level annotation using software tools for glare artifacts is extremely labor-intensive and costly.
To address these challenges, synthetic paired datasets have become a practical and effective alternative for training and evaluation in glare removal tasks. Inspired by previous work [34], we employ the same algorithms to synthesize datasets. Specifically, we use a large collection of natural-scene background images as the base and composite them with glare-only images. A previously released dataset [37] provides the glare images used in our synthesis. The details of the procedure are described below.
First, the background image $I_b$ and the glare image $I_f$ are gamma-corrected, where the gamma value $\gamma$ is randomly sampled from a Gaussian distribution $\gamma \sim \mu(1.8, 2.2)$. The gamma correction is defined as follows:
$$ I_b^{\gamma} = (I_b)^{\gamma} $$
$$ I_f^{\gamma} = (I_f)^{\gamma} $$
Next, a mixing coefficient $\alpha$ is randomly sampled from a Gaussian distribution $\alpha \sim \mu(0.5, 1.0)$ and used as the blending weight for $I_b^{\gamma}$ and $I_f^{\gamma}$. The synthetic image is generated as follows:
$$ I_{bf} = I_b^{\gamma} \times (1 - \alpha \cdot M_f) + I_f^{\gamma} \times \alpha \cdot M_f $$
where $M_f$ denotes the glare mask.
Afterward, the synthesized image undergoes inverse gamma correction, expressed as follows:
$$ I_{bf} = (I_{bf})^{1/\gamma} $$
During the synthesis process, the weighted blending before and after gamma correction may result in pixel values exceeding the range of [0, 1]. Finally, we clip $I_{bf}$ to [0, 1] to ensure valid pixel values.
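To make the above procedure concrete, the following minimal PyTorch sketch mirrors the four steps (gamma correction, masked alpha blending, inverse gamma correction, and clipping). The function name is illustrative, and reading the sampling ranges as uniform draws over [1.8, 2.2] and [0.5, 1.0] is our assumption rather than the exact released pipeline.

```python
# Illustrative sketch of the paired-data synthesis; names and sampling details are assumptions.
import torch

def synthesize_pair(bg, flare, flare_mask):
    """Blend a clean background with a glare-only layer; all tensors are (3, H, W) in [0, 1]."""
    # Sample gamma and the blending weight over the ranges stated in the text
    # (read here as uniform draws, which is an assumption).
    gamma = torch.empty(1).uniform_(1.8, 2.2).item()
    alpha = torch.empty(1).uniform_(0.5, 1.0).item()

    # Gamma correction of the background and glare layers: I^gamma = I ** gamma.
    bg_g = bg.clamp(0, 1) ** gamma
    flare_g = flare.clamp(0, 1) ** gamma

    # Masked, alpha-weighted blend of the two gamma-corrected layers.
    blended = bg_g * (1 - alpha * flare_mask) + flare_g * (alpha * flare_mask)

    # Inverse gamma correction, then clip to keep pixel values in [0, 1].
    corrupted = blended.clamp(0, 1) ** (1.0 / gamma)
    return corrupted.clamp(0, 1), gamma
```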

3.3. Lightweight and Effective Uformer

In recent years, Transformer models have achieved remarkable progress in computer vision. Vision Transformer (ViT) [38] has demonstrated superior image classification performance over CNNs by leveraging pre-training on large-scale datasets. Recently, the integration of Transformers with U-Net architecture has further expanded their application to image reconstruction and enhancement tasks. By employing a U-shaped structure combined with local window-based attention mechanisms, these methods have achieved effective low-level vision restoration, verifying the feasibility of Transformers for image enhancement tasks [39]. However, these methods still suffer from high computational complexity and limited inference efficiency [40]. To address these issues, we propose a lightweight and effective Uformer (LEUformer), as illustrated in Figure 2. Specifically, only a single Transformer block is used in each encoder and decoder layer to effectively reduce computational overhead. Additionally, to further reduce the model’s parameter count and computational burden, we introduce L-Downsampling and L-Upsampling modules to replace the conventional downsampling and upsampling operations.
In the L-Downsampling module, a DWConv-PWConv structure (depthwise separable convolution followed by pointwise convolution) [41] is used instead of the conventional 4 × 4 convolution, enabling efficient feature downsampling while preserving spatial information. For upsampling, we adopt a residual PixelShuffle-based L-Upsampling module instead of deconvolution. This approach suppresses artifacts commonly introduced during upsampling and improves both inference efficiency and restoration quality in nighttime glare suppression tasks. Furthermore, we improve the Feed-Forward Network (FFN) of the original Win-Transformer and propose an LCAWin-Transformer, which incorporates channel attention mechanisms on top of the original structure to further enhance feature representation capabilities. The overall network architecture is depicted in Figure 2.
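As a rough illustration of the DWConv-PWConv idea behind L-Downsampling, the minimal PyTorch sketch below halves the spatial resolution with a stride-2 depthwise convolution and then mixes channels with a pointwise convolution; the 3 × 3 kernel size and the channel widths are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class LDownsample(nn.Module):
    """Sketch: depthwise stride-2 convolution followed by a pointwise (1x1) projection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dwconv = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=2,
                                padding=1, groups=in_ch)    # depthwise, halves H and W
        self.pwconv = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # pointwise channel mixing

    def forward(self, x):
        return self.pwconv(self.dwconv(x))

x = torch.randn(1, 32, 64, 64)
print(LDownsample(32, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```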
In the main branch of L-Upsampling, we first apply a 3 × 3 convolution before upsampling to further extract local spatial features (such as textures and edges), enhancing the effective feature representation after upsampling. We then utilize PixelShuffle to rearrange the channel dimensions into spatial dimensions, achieving spatial upsampling while preserving the contextual information extracted by the convolution without introducing additional parameters. Specifically, given an input tensor of shape $(C \cdot r^2, H, W)$, where $r$ is the upsampling factor, PixelShuffle rearranges the data from the channel dimension into the spatial dimensions, producing an output tensor of shape $(C, H \cdot r, W \cdot r)$. The upsampling operation can be expressed as follows:
$$ U_m = \mathrm{PixelShuffle}(\mathrm{Conv}_{3 \times 3}(X)) $$
In the residual path, we apply a 1 × 1 convolution for channel adjustment and linear projection to improve channel expressiveness and flexibility before upsampling. We then perform bilinear interpolation for spatial upsampling, which efficiently and smoothly enlarges the image size while avoiding feature damage and artifacts. This can be expressed as follows:
$$ U_S = \mathrm{Bilinear}(\mathrm{Conv}_{1 \times 1}(X)) $$
Finally, ReLU activation is applied to enhance nonlinearity and obtain the upsampled output feature map. This can be expressed as follows:
$$ U = \mathrm{ReLU}(U_m + U_S) $$
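A minimal sketch of this residual PixelShuffle upsampler is given below; the channel widths and the exact ordering of layers are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LUpsample(nn.Module):
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        # Main branch: 3x3 conv expands channels so PixelShuffle can trade them for resolution.
        self.conv3 = nn.Conv2d(in_ch, out_ch * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        # Residual branch: 1x1 projection followed by bilinear interpolation.
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        u_m = self.shuffle(self.conv3(x))                            # U_m
        u_s = F.interpolate(self.conv1(x), scale_factor=self.scale,
                            mode="bilinear", align_corners=False)    # U_S
        return F.relu(u_m + u_s)                                     # U = ReLU(U_m + U_S)

x = torch.randn(1, 64, 32, 32)
print(LUpsample(64, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```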
In the LCAWin-Transformer, we introduce an attention mechanism within the LCA-FFN to further enhance the effectiveness of feature representation. As illustrated in Figure 3, we incorporate a simple yet efficient SE attention module within the feed-forward network (FFN), which effectively improves channel feature extraction, enabling the network to better focus on glare-related features. Specifically, the LCAWin-Transformer can be formulated as follows:
$$ X_l' = \text{W-MSA}(\mathrm{LN}(X_{l-1})) + X_{l-1} $$
$$ X_l = \text{LCA-FFN}(\mathrm{LN}(X_l')) + X_l' $$
where W-MSA denotes the window-based multi-head self-attention layer used in Uformer, and LCA-FFN represents the locally channel-enhanced feed-forward network; $X_l'$ and $X_l$ denote the outputs after the W-MSA and LCA-FFN layers, respectively, LN indicates layer normalization, and $X_{l-1}$ is the input.
In the LCA-FFN, we replace the fully connected layers in the standard Transformer feed-forward network with convolution operations to better capture local spatial dependencies. To further enhance channel-wise feature discrimination, we incorporate a Squeeze-and-Excitation (SE) [42] attention mechanism. Unlike previous works that apply SE modules in global feature refinement or image classification, we integrate SE attention specifically within a localized Transformer block to support fine-grained feature selection under challenging glare conditions. This helps the model prioritize structure-relevant features and suppress those dominated by glare or noise. As shown in Figure 3, the computation within the LCA-FFN is defined as follows:
$$ F_1 = \mathrm{GELU}(\mathrm{Conv}_{1 \times 1}(X)) $$
where $\mathrm{Conv}_{1 \times 1}$ denotes the convolution operation with a kernel size of 1 × 1, $\mathrm{GELU}(\cdot)$ represents the GELU activation function, and $F_1$ is the resulting feature map after the 1 × 1 convolution and GELU activation.
Subsequently, $F_1$ is reshaped to $\mathbb{R}^{B \times C \times H \times W}$, followed by a depthwise separable convolution and GELU activation to capture local contextual information. After applying a flatten operation, a simple yet effective SE attention module is used to obtain channel-weighted features. Finally, a pointwise convolution is applied to restore the dimensionality, and the result is added to the residual input to obtain the output $Y$ of the LCA-FFN:
$$ Y = \mathrm{Conv}_{1 \times 1}\big(\mathrm{SE}(\mathrm{GELU}(\mathrm{DWConv}_{3 \times 3}(F_1)))\big) \oplus X $$
where $\mathrm{Conv}_{1 \times 1}$ denotes a 1 × 1 convolution operation, $\mathrm{GELU}(\cdot)$ represents the GELU activation function, $X$ is the input feature map, $\mathrm{SE}(\cdot)$ denotes the SE channel attention mechanism, $\mathrm{DWConv}_{3 \times 3}(\cdot)$ represents a 3 × 3 depthwise separable convolution, and $\oplus$ indicates element-wise addition.
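The sketch below shows one possible realization of the LCA-FFN. For brevity it operates directly on (B, C, H, W) feature maps instead of flattened token sequences, and the hidden width and SE reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class SE(nn.Module):
    """Squeeze-and-Excitation channel attention."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)   # channel-wise reweighting

class LCAFFN(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.pw_in = nn.Conv2d(dim, hidden_dim, 1)
        self.dw = nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1, groups=hidden_dim)
        self.se = SE(hidden_dim)
        self.pw_out = nn.Conv2d(hidden_dim, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):                       # x: (B, C, H, W)
        f1 = self.act(self.pw_in(x))            # F_1 = GELU(Conv1x1(X))
        f2 = self.se(self.act(self.dw(f1)))     # SE(GELU(DWConv3x3(F_1)))
        return self.pw_out(f2) + x              # Y = Conv1x1(...) + X (residual)

x = torch.randn(1, 32, 16, 16)
print(LCAFFN(32, 64)(x).shape)  # torch.Size([1, 32, 16, 16])
```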

3.4. Gated Positional Attention

Attention mechanisms commonly used in visual tasks—such as SE-Net [42], CBAM [43], PSANet [44], and Gated Convolutional Networks [45]—have demonstrated the effectiveness of gating and spatial/channel attention for feature enhancement. However, these methods are not tailored to handle the instability of glare positions and the intense regional interference commonly found in nighttime road scenes.
To address these limitations, we propose a gated positional attention (GPA) module specifically designed for nighttime glare suppression. Unlike conventional attention modules that focus on channel reweighting or global spatial enhancement, GPA explicitly integrates positional encoding into the spatial attention process to improve the model’s sensitivity to glare locations. It combines parallel depthwise separable and dilated convolutions to capture multi-scale local context, and employs a gating mechanism to filter out glare-dominated regions. This design improves visibility and detail restoration under complex nighttime lighting. An overview of the GPA module is shown in Figure 4.
Firstly, based on the spatial dimensions H and W of the input feature map, a fixed normalized positional encoding is constructed, which can be expressed as follows:
$$ P = \mathrm{Concat}\big(\mathrm{linspace}(-1, 1, H),\ \mathrm{linspace}(-1, 1, W)\big) $$
where $\mathrm{linspace}(-1, 1, H)$ denotes a linear sequence of length H along the vertical direction to generate normalized row position coordinates, while $\mathrm{linspace}(-1, 1, W)$ denotes a linear sequence of length W along the horizontal direction to generate normalized column position coordinates. $P \in \mathbb{R}^{2 \times H \times W}$ represents the positional embedding matrix, providing normalized spatial positional information for the network, where H and W represent the height and width of the input feature map, respectively.
The positional encoding P is concatenated with the input features X to obtain $X_p$, which can be formulated as follows:
$$ X_p = \mathrm{Concat}(X, P) $$
where $\mathrm{Concat}(\cdot)$ denotes concatenation along the channel dimension.
Subsequently, a 1 × 1 convolution is employed for dimensionality reduction:
$$ X_r = \mathrm{Conv}_{1 \times 1}(X_p), \quad X_r \in \mathbb{R}^{B \times \frac{C}{r} \times H \times W} $$
where B denotes batch size, C is the number of channels, and r is the reduction ratio.
Next, standard depthwise separable convolution is used to capture local contextual information. In nighttime glare removal scenarios, there often exist large regions of blur and overexposure; thus, dilated convolution is employed to capture local contextual features, background, and structural information over a wider receptive field. To this end, depthwise separable convolution ($\mathrm{DWConv}$) and dilated depthwise separable convolution ($\mathrm{DDWConv}$) are applied in parallel to extract local contextual features, and the outputs are concatenated and fused using an additional 1 × 1 convolution; these can be denoted as follows:
$$ X_d = \mathrm{DWConv}_{3 \times 3}(X_r) $$
$$ X_k = \mathrm{DDWConv}_{3 \times 3}(X_r) $$
$$ X_f = \mathrm{Conv}_{1 \times 1}\big(\mathrm{Concat}(X_d, X_k)\big), \quad X_f \in \mathbb{R}^{B \times C \times H \times W} $$
Finally, a Sigmoid gating branch is utilized to generate the gating weight G, which adaptively predicts the importance weights from the input features for dynamic control of the fusion ratio between the input features and the context-enhanced features. Using this gating fusion mechanism, the output can be formulated as follows:
$$ G = \sigma(\mathrm{Conv}_{1 \times 1}(X)) $$
$$ Y = X \otimes G + X_f \otimes (1 - G) $$
where $G$ denotes the gating weight, $\sigma$ represents the Sigmoid activation function, and $\otimes$ denotes element-wise multiplication.
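A compact sketch of the GPA computation described above is given below; the reduction ratio and the dilation rate of the dilated branch are assumptions.

```python
import torch
import torch.nn as nn

class GPA(nn.Module):
    def __init__(self, ch, reduction=4, dilation=2):
        super().__init__()
        red = ch // reduction
        self.reduce = nn.Conv2d(ch + 2, red, 1)                   # 1x1 reduction after concatenating 2-channel P
        self.dw = nn.Conv2d(red, red, 3, padding=1, groups=red)   # DWConv branch
        self.ddw = nn.Conv2d(red, red, 3, padding=dilation,
                             dilation=dilation, groups=red)       # dilated DWConv branch
        self.fuse = nn.Conv2d(2 * red, ch, 1)                     # fuse back to C channels
        self.gate = nn.Conv2d(ch, ch, 1)                          # Sigmoid gating branch

    def forward(self, x):
        b, c, h, w = x.shape
        # Fixed normalized positional encoding P in [-1, 1] for rows and columns.
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        xp = torch.cat([x, ys, xs], dim=1)                              # X_p = Concat(X, P)
        xr = self.reduce(xp)                                            # X_r
        xf = self.fuse(torch.cat([self.dw(xr), self.ddw(xr)], dim=1))   # X_f
        g = torch.sigmoid(self.gate(x))                                 # G
        return x * g + xf * (1 - g)                                     # Y

x = torch.randn(1, 32, 24, 24)
print(GPA(32)(x).shape)  # torch.Size([1, 32, 24, 24])
```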

3.5. Loss Function

The total loss includes background loss $L_B$, glare loss $L_F$, and reconstruction loss $L_{rec}$, which can be expressed as follows:
$$ L = w_1 L_B + w_2 L_F + w_3 L_{rec} $$
where $w_1$, $w_2$, and $w_3$ are set to 0.5, 0.5, and 1.0, respectively, in our experiments.
In the computation of the background loss, the glare removal network can be defined as $\Phi$, which takes the glare-corrupted image $I$ as input. The estimated output image $\hat{I}_0$ and the estimated flare image $\hat{F}$ can thus be expressed as follows:
$$ \hat{I}_0, \hat{F} = \Phi(I) $$
Furthermore, the background loss $L_B$ can be expressed as follows:
$$ L_B = L_1(\hat{I}_0, I_0) + L_{vgg}(\hat{I}_0, I_0) $$
where $L_1$ denotes the L1 loss, $L_{vgg}$ denotes the VGG perceptual loss, and $I_0$ is the ground-truth background image.
The flare loss $L_F$ shares the same formulation as the background loss, which can be denoted as follows:
$$ L_F = L_1(\hat{F}, F) + L_{vgg}(\hat{F}, F) $$
where $F$ denotes the ground-truth glare image.
Finally, the reconstruction loss $L_{rec}$ is defined as follows:
$$ L_{rec} = \big\| I - \mathrm{Clip}(\hat{I}_0 \oplus \hat{F}) \big\| $$
where $\oplus$ denotes the element-wise addition performed in the linearized, gamma-decoded domain using the previously sampled $\gamma$. The resulting values are subsequently clipped to the range [0, 1] to ensure valid intensity levels.
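The full objective can be sketched as follows. The VGG feature extractor is left as a placeholder callable, and the treatment of the reconstruction term (gamma-decode the two predictions with the sampled γ, add, clip, and re-encode before comparing with the input) is one plausible reading of the description above rather than the authors' exact implementation.

```python
import torch.nn as nn

l1 = nn.L1Loss()

def perceptual_loss(pred, target, vgg_features):
    """L_vgg: L1 distance between features from a frozen VGG backbone (placeholder callable)."""
    return l1(vgg_features(pred), vgg_features(target))

def total_loss(pred_bg, pred_flare, gt_bg, gt_flare, corrupted, gamma,
               vgg_features, w=(0.5, 0.5, 1.0)):
    # L_B and L_F: L1 + VGG perceptual terms on the background and flare predictions.
    loss_b = l1(pred_bg, gt_bg) + perceptual_loss(pred_bg, gt_bg, vgg_features)
    loss_f = l1(pred_flare, gt_flare) + perceptual_loss(pred_flare, gt_flare, vgg_features)
    # L_rec: recombine predictions in the gamma-decoded (linear) domain, clip, re-encode.
    recon = (pred_bg.clamp(0, 1) ** gamma
             + pred_flare.clamp(0, 1) ** gamma).clamp(0, 1) ** (1.0 / gamma)
    loss_rec = l1(recon, corrupted)
    return w[0] * loss_b + w[1] * loss_f + w[2] * loss_rec
```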

4. Experiments

4.1. Experimental Setup Details

4.1.1. Model Training Details

We train our model on two NVIDIA GeForce RTX 3090 GPUs with CUDA 11.2 and NVIDIA driver version 460.32.03. During training, the Adam optimizer is utilized with hyperparameters β1 = 0.9 and β2 = 0.99. The batch size is set to two, and the learning rate is initialized at 0.0001, which is decayed by a factor of 0.5 after 200,000 iterations. The total number of training iterations is set to 300,000. To further stabilize and smooth the training process, we employ the Exponential Moving Average (EMA) mechanism with a decay rate of 0.9.
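A minimal sketch of this optimizer and EMA configuration is shown below; the model is a stand-in module, and torch.optim.swa_utils.AveragedModel is one possible way to realize the EMA described here.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for NRGS-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))

# Halve the learning rate after 200,000 of the 300,000 training iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200_000], gamma=0.5)

# Exponential moving average of the weights with a decay rate of 0.9;
# call ema.update_parameters(model) after each optimizer step.
ema = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda avg, cur, num_averaged: 0.9 * avg + 0.1 * cur)
```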

4.1.2. Datasets

We utilize the widely recognized public dataset Flare7k++ [39] for training and testing. Additionally, to evaluate the generalization capability of our glare removal model, we construct a nighttime glare image dataset named NRGD, with example images illustrated in Figure 5 and Figure 6. NRGD is collected by capturing videos in real nighttime road and street scenarios using a Vivo S19 smartphone with a 0.6× ultra-wide-angle setting to capture glare regions caused by strong light sources such as headlights and streetlights, ensuring diversity and realism in the captured scenes. The videos are recorded at a resolution of 1920 × 1080 to maintain clarity while facilitating subsequent processing and frame extraction. To obtain a uniform and diverse frame distribution, frames are extracted at 3-s intervals, resulting in approximately 500 images for model training and evaluation. Among them, images without obvious glare are paired with synthetically generated glare counterparts using the method described in Section 3.2, forming paired glare/non-glare data, as shown in Figure 5, where (a), (b), (c), and (d) are four pairs of clean images and their corresponding synthetic glare images. Meanwhile, images with noticeable glare are used as unpaired data, as shown in Figure 6, where (a), (b), (c), and (d) are four examples of unpaired images from the NRGD.

4.1.3. Evaluation Metrics

For quantitative evaluation, we adopt four full-reference metrics—PSNR [46], SSIM [46], LPIPS [47], and MSE [46]—and three no-reference metrics—NIQE [48], BRISQUE [48], and PIQE [49].
PSNR and SSIM measure the fidelity of the restored/generated images against reference images, with higher values indicating better quality (PSNR is measured in dB, typically ranging from 10 to 50 in low-light enhancement tasks; SSIM ranges from 0 to 1). LPIPS evaluates perceptual similarity using deep features, with lower values indicating better perceptual quality (LPIPS ranges from 0 to 1); meanwhile, MSE measures pixel-level differences, where lower values are preferable (MSE ≥ 0, and with pixel intensities in the range [0, 255], its maximum possible value is 65,025). The MSE formula is as follows:
$$ \mathrm{MSE} = \frac{1}{H \times W \times C} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{C} \big( I_{gt}(i, j, k) - I_{pred}(i, j, k) \big)^2 $$
where $H$, $W$, and $C$ denote the height, width, and number of channels of the image, respectively, and $I_{gt}$ and $I_{pred}$ represent the pixel values of the ground truth and predicted images, respectively.
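As a small worked check of this definition (assuming 8-bit intensities in [0, 255]), the snippet below computes the MSE and the PSNR derived from it for a pair of constant images differing by 10 gray levels.

```python
import numpy as np

def mse(gt, pred):
    return np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)

def psnr(gt, pred, max_val=255.0):
    return 10.0 * np.log10(max_val ** 2 / mse(gt, pred))

gt = np.full((4, 4, 3), 200, dtype=np.uint8)
pred = np.full((4, 4, 3), 190, dtype=np.uint8)
print(mse(gt, pred), round(psnr(gt, pred), 2))  # 100.0 28.13
```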
For no-reference assessment, NIQE evaluates naturalness and is sensitive to noise and color distortion, BRISQUE detects blur and noise based on local statistics, and PIQE assesses perceptual consistency in detecting artifacts such as halos. These metrics use a 0–100 scale, with lower scores indicating better quality, and are widely used for evaluating low-light image enhancement methods, with PIQE providing stable assessments in extremely low-light scenarios.

4.2. Quantitative Comparison with State-of-the-Art Methods

4.2.1. Result on NRGD

As shown in Table 1, on the NRGD, our proposed method demonstrates significant improvements across all quantitative metrics compared to existing state-of-the-art methods. Specifically, our method achieves a PSNR of 26.4192 dB, which is substantially higher than the best competing method IAT (16.4169 dB) and significantly outperforms all other methods, indicating superior fidelity in restored images. In terms of structural similarity, our method attains an SSIM of 0.9602, which exceeds the next best method IAT (0.6588) by a considerable margin, highlighting the effectiveness of our approach in preserving structural details and consistency under severe nighttime glare conditions. Regarding perceptual quality, our method achieves an LPIPS of 0.0792, substantially lower than IAT (0.3671) and other methods, demonstrating the enhanced perceptual quality and naturalness of the restored images in alignment with human visual perception. Furthermore, our method obtains an MSE of 400.5131, which is notably lower than all baseline methods, indicating reduced pixel-wise reconstruction errors. These results collectively demonstrate that our proposed method not only enhances the visual quality of glare-affected nighttime images but also effectively preserves structural and perceptual consistency while minimizing pixel-level errors, thus validating the effectiveness and superiority of our approach under challenging real-world nighttime glare scenarios.

4.2.2. Result on Flare7K++ Dataset

As shown in Table 2, on the Flare7K++ dataset, our proposed method also demonstrates outstanding performance, achieving a PSNR of 31.4508 dB and an SSIM of 0.9822 while reducing the MSE to 87.9677, significantly outperforming all competing methods. These results indicate that our approach ensures high-fidelity image reconstruction, excellent structural consistency, and reduced pixel-level errors even under challenging nighttime glare scenarios.

4.3. Qualitative Comparison with State-of-the-Art Methods

4.3.1. Visual Analysis on NRGD

Figure 7 presents the qualitative comparison results on the NRGD, where GT denotes the ground truth images. As shown, our method effectively suppresses glare of various colors and shapes, significantly removing scattered glare around light sources while preserving fine details that were previously occluded by glare. In contrast, although EnlightenGAN and URetinex-Net improve overall image brightness, they fail to suppress glare and, in some cases, even amplify it. For instance, EnlightenGAN enlarges the green glare region in the third example and the white glare in the fourth example. Zero-DCE leads to blurred image details, while RUAS exhibits severe overexposure, degrading overall image quality. PSENet shows limited glare suppression capabilities but struggles with linear glare patterns, and the IAT method produces results that are nearly identical to the original images. By comparison, our method successfully suppresses the dispersed glare in the first image and the linear glare in the second image while accurately restoring the underlying details, demonstrating superior glare removal and detail recovery capabilities. Overall, these qualitative results are consistent with our quantitative evaluations, further confirming the effectiveness and superiority of our proposed method in nighttime glare suppression and visibility enhancement tasks.

4.3.2. Visual Analysis on Flare7K++ Dataset

Figure 8 presents the qualitative comparison results on the Flare7k++ dataset. As shown, EnlightenGAN, Zero-DCE, and URetinex-Net all amplify glare regions, while RUAS exhibits severe overexposure, resulting in significant detail loss in road scenes. PSENet shows a limited ability to suppress glare, making the light source in the third image appear more focused, but it fails to address linear glare artifacts. The IAT method only slightly reduces the extent of glare in the second and third images. Benefiting from the proposed GPA module, which accurately identifies glare regions, our method effectively suppresses glare around light sources while preserving the integrity of the light source itself. As seen in the third image, large halo areas surrounding the light source are significantly reduced. Additionally, the efficient and lightweight sampling modules facilitate the recovery of background details previously obscured by glare. For example, in the fourth image, the building structures covered by glare are fully restored. Overall, these qualitative results further demonstrate the unique advantages of our method in glare suppression.

4.4. Ablation Study

4.4.1. Quantitative Ablation of Modules

To systematically validate the effectiveness of each module proposed in this paper, we conducted comprehensive ablation studies on the NRGD and Flare7K++ datasets. The L-Downsampling and L-Upsampling strategies used in our method are collectively referred to as L-Sampling. We evaluate the contribution of each component to the overall performance by individually removing the GPA and L-Sampling modules. Here, removing L-Sampling indicates replacing it with the original convolution-based downsampling and transposed convolution-based upsampling modules.
Table 3 presents the results on the NRGD. It can be observed that when the GPA module is removed (w/o GPA), the PSNR decreases to 24.2491, SSIM drops to 0.9114, while LPIPS increases to 0.1361 and MSE rises to 508.977. This indicates that removing GPA degrades the model’s ability to capture global structures and brightness distributions, leading to noticeably lower de-glared image quality. When the L-Sampling strategy is removed (w/o L-Sampling), the PSNR decreases to 25.1757, SSIM drops to 0.9317, LPIPS increases to 0.1015, and MSE increases to 454.0037. Although this shows clear improvements compared to the removal of GPA, there is still a noticeable performance gap compared to our complete method. This demonstrates that L-Sampling plays a positive role in helping the model remove localized strong glare while maintaining global structural consistency. It is worth noting that the lightweight design of L-Sampling effectively reduces the model parameters from 20.47 M to 17.88 M, indicating that L-Sampling is not only effective but also highly efficient. Overall, our complete method achieves the best results across all metrics, demonstrating that the synergistic integration of the GPA and L-Sampling modules effectively enhances the model’s de-glaring capability under complex lighting and strong glare conditions, improving both the structural integrity and perceptual quality of the restored images.
Table 4 presents the ablation results on the Flare7k++ dataset. The results show that our method outperforms both the w/o GPA and w/o L-Sampling configurations across all four metrics, including PSNR, SSIM, LPIPS, and MSE.
Among these, the removal of the GPA module leads to the most significant performance degradation across all evaluation metrics. These results validate the critical role of the GPA module and the L-Sampling strategy in effectively enhancing image quality and perceptual quality in nighttime de-glaring tasks.
Table 5 presents the experimental results on the real-world, unpaired subset of the NRGD, which contains glare captured in natural scenes rather than synthetically generated glare. Since there are no corresponding glare-free ground truth images available for this subset, we employ three no-reference evaluation metrics suitable for unpaired data: NIQE, BRISQUE, and PIQE. As shown in the table, our method achieves the best performance in terms of BRISQUE and PIQE, while the NIQE score is slightly higher than that obtained after removing the L-Sampling module. This is because NIQE is particularly sensitive to the overall brightness and contrast distribution in images. The L-Sampling module tends to enhance brightness and detail in darker regions during training, resulting in a more balanced and visually appealing image, but this may slightly reduce global contrast, leading to a marginally higher NIQE score. Overall, our complete method still demonstrates superior perceptual quality, further illustrating the effectiveness of both the GPA and L-Sampling modules.

4.4.2. Qualitative Ablation of Modules

Figure 9 presents the glare suppression and glare prediction results on the paired NRGD. Here, w/o GPA and w/o GPA-flare represent the de-glared image and the predicted glare map obtained after removing the GPA module, respectively. Similarly, w/o L-Sampling and w/o L-Sampling-flare denote the de-glared image and the predicted glare map after removing the L-Sampling strategy. Ours and Ours-flare correspond to the de-glared image and the predicted glare map produced by our complete model. As shown in the figure, removing either the GPA or L-Sampling module weakens the glare suppression capability, and the predicted glare regions also become smaller and less accurate. In contrast, our complete model demonstrates superior glare suppression performance and more accurate glare region prediction.
Figure 10 visualizes the ablation results on the Flare7k++ dataset. It can be observed that when the GPA and L-Sampling modules are removed, the streak-like glare around the light sources is not effectively suppressed. Notably, as shown in the first image, the linear glare from the two illuminated lamps remains partially visible. The removal of the GPA module significantly weakens the glare suppression capability, while the removal of the L-Sampling strategy shows slightly better suppression compared to the GPA removal, but it is still inferior to our complete method. In contrast, our full model almost entirely removes the glare around the light sources while preserving the light sources themselves and restoring the details occluded by glare.
Figure 11 shows the visualization results on the unpaired NRGD. It can be observed that there is still a streak-like glare near the first light source in the second image, which is detected only in Ours-flare but not in w/o GPA-flare or w/o L-Sampling-flare. Overall, the above analysis demonstrates the critical role of the proposed GPA and L-Sampling modules in nighttime glare suppression tasks.

5. Conclusions

In this paper, we proposed a Nighttime Road Glare Suppression Network (NRGS-Net) to address the challenges of glare suppression and detail restoration in nighttime road scenarios, which are critical for enhancing the reliability of electronic rearview mirrors in autonomous driving and ADASs. By introducing the gated positional attention (GPA) module, the network effectively identifies and suppresses diverse glare patterns caused by complex nighttime lighting conditions while preserving the visibility and integrity of the underlying scene content. The redesigned lightweight Uformer backbone (LCAtransformer), incorporating efficient downsampling and upsampling modules and an improved LCA-FFN with channel attention, significantly reduces computational complexity while maintaining the capability to recover fine details in glare-affected regions.
To facilitate further research in this domain, we constructed and released the Night Road Glare Dataset (NRGD), which contains real-world nighttime road scenes with various glare conditions, enriching the evaluation resources available for glare suppression research. Extensive experiments conducted on the Flare7k++ and NRGD using multiple evaluation metrics demonstrate that NRGS-Net consistently outperforms six state-of-the-art methods in both subjective and objective assessments, achieving superior glare suppression, detail preservation, and visibility enhancement under challenging nighttime conditions.
Overall, the proposed NRGS-Net offers a practical and efficient solution for nighttime glare removal and visibility enhancement, contributing to safer and more reliable environmental perception in autonomous driving and advanced driver assistance systems under nighttime scenarios. However, solely suppressing glare is insufficient to fully address the issue of overall low brightness in nighttime images. In future work, we will further explore a joint optimization strategy that integrates glare suppression with global brightness enhancement, aiming to improve image visibility while preserving structural details and visual naturalness, thereby enhancing the perceptual robustness of autonomous driving systems under complex nighttime conditions.

Author Contributions

Conceptualization, R.Y.; methodology, R.Y. and S.L.; software, R.Y.; validation, R.Y.; formal analysis, S.L.; investigation, S.L.; resources, R.Y. and Z.W.; data curation, R.Y.; writing—original draft preparation, R.Y.; writing—review and editing, R.Y.; visualization, S.L. and R.Y.; supervision, H.C.; project administration, H.C.; funding acquisition, H.C. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the “Yang Fan” major project in Guangdong Province, China, No. [2020]05.

Data Availability Statement

The dataset will be made publicly available at https://github.com/RuoyuYoung/NRGS-Net (accessed on 12 July 2025).

Conflicts of Interest

Author Zhixi Wang was employed by the company Truly Opto-Electronics Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Lu, L.; Zhou, Y.; Panetta, K.; Agaian, S. Comparative study of histogram equalization algorithms for image enhancement. In Proceedings of Mobile Multimedia/Image Processing, Security, and Applications, SPIE Defense, Security, and Sensing, Orlando, FL, USA, 5–9 April 2010; Volume 7708, pp. 337–347. [Google Scholar]
  2. Wu, T.; Wu, W.; Yang, Y.; Fan, F.; Zeng, T. Retinex image enhancement based on sequential decomposition with a plug-and-play framework. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 14559–14572. [Google Scholar] [CrossRef]
  3. Proakis, J.G. Digital Signal Processing: Principles, Algorithms, and Applications, 4th ed.; Pearson Education: Noida, India, 2007. [Google Scholar]
  4. Lyons, R.G. Understanding Digital Signal Processing, 3rd ed.; Pearson Education: Noida, India, 1997. [Google Scholar]
  5. Jähne, B. Digital Image Processing; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  6. Prince, S.J.D. Computer Vision: Models, Learning, and Inference; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  7. Vitoria, P.; Ballester, C. Automatic flare spot artifact detection and removal in photographs. J. Math. Imaging Vis. 2019, 61, 515–533. [Google Scholar] [CrossRef]
  8. Zhang, Z.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. Single image veiling glare removal. J. Mod. Opt. 2018, 65, 2220–2230. [Google Scholar] [CrossRef]
  9. Si, L.; Wang, Z.; Xu, R.; Tan, C.; Liu, X.; Xu, J. Image enhancement for surveillance video of coal mining face based on single-scale retinex algorithm combined with bilateral filtering. Symmetry 2017, 9, 93. [Google Scholar] [CrossRef]
  10. Mandal, G.; Bhattacharya, D.; De, P. Real-time automotive night-vision system for drivers to inhibit headlight glare of the oncoming vehicles and enhance road visibility. J. Real-Time Image Process. 2021, 18, 2193–2209. [Google Scholar] [CrossRef]
  11. Rahman, S.; Rahman, M.M.; Abdullah-Al-Wadud, M.; Al-Quaderi, G.D.; Shoyaib, M. An adaptive gamma correction for image enhancement. EURASIP J. Image Video Process. 2016, 2016, 35. [Google Scholar] [CrossRef]
  12. Chen, S.D.; Ramli, A.R. Minimum mean brightness error bi-histogram equalization in contrast enhancement. IEEE Trans. Consum. Electron. 2003, 49, 1310–1319. [Google Scholar] [CrossRef]
  13. Tang, C.; Wang, Y.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. Low-light image enhancement with strong light weakening and bright halo suppressing. IET Image Process. 2019, 13, 537–542. [Google Scholar] [CrossRef]
  14. Zhou, Y.; Liang, D.; Chen, S.; Huang, S.J.; Yang, S.; Li, C. Improving lens flare removal with general-purpose pipeline and multiple light sources recovery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12969–12979. [Google Scholar]
  15. Dai, Y.; Li, C.; Zhou, S.; Feng, R.; Zhu, Q.; Sun, Q.; Loy, C.C.; Gu, J.; Liu, S.; Wang, H.; et al. Mipi 2023 challenge on nighttime flare removal: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2853–2863. [Google Scholar]
  16. Reinhard, E. High dynamic range imaging. In Computer Vision: A Reference Guide; Springer International Publishing: Cham, Switzerland, 2021; pp. 558–563. [Google Scholar]
  17. Wu, W.; Weng, J.; Zhang, P.; Wang, X.; Yang, W.; Jiang, J. Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5901–5910. [Google Scholar]
  18. Liu, R.; Ma, L.; Zhang, J.; Fan, X.; Luo, Z. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2021; pp. 10561–10570. [Google Scholar]
  19. Zhang, Y.; Zhang, J.; Guo, X. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1632–1640. [Google Scholar]
  20. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z.; Xu, J.; et al. Enlightengan: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef]
  21. Cui, Z.; Li, K.; Gu, L.; Su, S.; Gao, P.; Jiang, Z.; Qiao, Y.; Harada, T. You only need 90k parameters to adapt light: A light weight transformer for image enhancement and exposure correction. arXiv 2022, arXiv:2205.14871. [Google Scholar] [CrossRef]
  22. Guo, Q.; Wang, H.; Yang, J. Night Vision Anti-Halation Method Based on Infrared and Visible Video Fusion. Sensors 2022, 22, 7494. [Google Scholar] [CrossRef] [PubMed]
  23. Zhou, Y.; Xie, L.; He, K.; Xu, D.; Tao, D.; Lin, X. Low-light image enhancement for infrared and visible image fusion. IET Image Process. 2023, 17, 3216–3234. [Google Scholar] [CrossRef]
  24. Huang, W.; Li, K.; Xu, M.; Huang, R. Self-Supervised Non-Uniform Low-Light Image Enhancement Combining Image Inversion and Exposure Fusion. Electronics 2023, 12, 4445. [Google Scholar] [CrossRef]
  25. Liu, Y.; Yan, Z.; Wu, A.; Ye, T.; Li, Y. Nighttime image dehazing based on variational decomposition model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 640–649. [Google Scholar]
  26. Sharma, A.; Tan, R.T. Nighttime visibility enhancement by increasing the dynamic range and suppression of light effects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2021; pp. 11977–11986. [Google Scholar]
  27. Jin, Y.; Yang, W.; Tan, R.T. Unsupervised night image enhancement: When layer decomposition meets light-effects suppression. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 404–421. [Google Scholar]
  28. Dong, S.W.; Lu, C.H. Dynamically Activated De-glaring and Detail-Recovery for Low-light Image Enhancement Directly on Smart Cameras. IEEE Trans. Emerg. Top. Comput. 2024, 13, 222–233. [Google Scholar] [CrossRef]
  29. Jin, Y.; Lin, B.; Yan, W.; Yuan, Y.; Ye, W.; Tan, R.T. Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 2446–2457. [Google Scholar]
  30. He, Q.; Zhang, J.; Chen, W.; Zhang, H.; Wang, Z.; Xu, T. OENet: An overexposure correction network fused with residual block and transformer. Expert Syst. Appl. 2024, 250, 123709. [Google Scholar] [CrossRef]
  31. Niu, C.; Li, K.; Wang, D.; Zhu, W.; Xu, H.; Dong, J. Gr-gan: A unified adversarial framework for single image glare removal and denoising. Pattern Recognit. 2024, 156, 110815. [Google Scholar] [CrossRef]
  32. Dai, Y.; Luo, Y.; Zhou, S.; Li, C.; Loy, C.C. Nighttime smartphone reflective flare removal using optical center symmetry prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20783–20791. [Google Scholar]
  33. Chen, B.H.; Ye, S.; Yin, J.L.; Cheng, H.Y.; Chen, D. Deep trident decomposition network for single license plate image glare removal. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6596–6607. [Google Scholar] [CrossRef]
  34. Wu, Y.; He, Q.; Xue, T.; Garg, R.; Chen, J.; Veeraraghavan, A.; Barron, J.T. How to train neural networks for flare removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 11–15 June 2021; pp. 2239–2247. [Google Scholar]
  35. Yan, W.; Tan, R.T.; Dai, D. Nighttime defogging using high-low frequency decomposition and grayscale-color networks. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 473–488. [Google Scholar]
  36. Zhang, H.; Xu, X.; He, H.; He, S.; Han, G.; Qin, J.; Wu, D. Fast user-guided single image reflection removal via edge-aware cascaded networks. IEEE Trans. Multimed. 2019, 22, 2012–2023. [Google Scholar] [CrossRef]
  37. Dai, Y.; Li, C.; Zhou, S.; Feng, R.; Luo, Y.; Loy, C.C. Flare7k++: Mixing synthetic and real datasets for nighttime flare removal and beyond. arXiv 2023, arXiv:2306.04236. [Google Scholar] [CrossRef]
  38. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  39. Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 12504–12513. [Google Scholar]
  40. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17683–17693. [Google Scholar]
  41. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  42. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  43. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  44. Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 267–283. [Google Scholar]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV; Springer International Publishing: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
  46. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  47. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  48. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  49. Zheng, S.; Gupta, G. Semantic-guided zero-shot learning for low-light image/video enhancement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 581–590. [Google Scholar]
  50. Li, C.; Guo, C.; Loy, C.C. Learning to enhance low-light image via zero-reference deep curve estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4225–4238. [Google Scholar] [CrossRef] [PubMed]
  51. Nguyen, H.; Tran, D.; Nguyen, K.; Nguyen, R. PSENet: Progressive self-enhancement network for unsupervised extreme-light image enhancement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 1756–1765. [Google Scholar]
Figure 1. The structure of NRGS-Net.
Figure 2. The structure of LEUformer.
Figure 3. The structure of LCAWin-Transformer.
Figure 4. The structure of gated positional attention.
Figure 5. An example of paired data in the NRGD.
Figure 6. An example of unpaired data in the NRGD.
Figure 7. Visual comparison with state-of-the-art glare suppression methods on the NRGD.
Figure 8. Visual comparison with state-of-the-art glare suppression methods on the Flare7K++ dataset.
Figure 9. Visualization results of ablation experiments on the NRGD.
Figure 10. Visualization results of ablation experiments on the Flare7K++ dataset.
Figure 11. Visualization results of the ablation experiment on unpaired data on the NRGD.
Table 1. Quantitative comparison of state-of-the-art methods on the NRGD, where bold indicates the best results.

Methods             PSNR↑     SSIM↑     LPIPS↓    MSE↓
EnlightenGAN [13]    9.5874   0.5346    0.4660     7762.5161
Zero-DCE [50]       10.8350   0.5562    0.4461     5900.8370
RUAS [18]            6.1093   0.3674    0.6472    17,553.7916
URetinex [17]       10.4410   0.5630    0.4521     6338.3320
PSENet [51]         10.5074   0.5341    0.4459     6105.1058
IAT [21]            16.4169   0.6588    0.3671     1802.1397
Ours                26.4192   0.9602    0.0792      400.5131
Table 2. Quantitative comparison of state-of-the-art methods on the Flare7K++ dataset, where bold indicates the best results.

Methods             PSNR↑     SSIM↑     LPIPS↓    MSE↓
EnlightenGAN [20]   11.9049   0.7219    0.2665     4490.1825
Zero-DCE [50]       12.6071   0.7168    0.2656     3694.5139
RUAS [18]            6.3398   0.4290    0.5594    15,444.7054
URetinex [17]       13.1262   0.7256    0.2704     3374.2936
PSENet [51]         14.1257   0.7443    0.2257     2814.5622
IAT [21]            16.4169   0.6588    0.3671     1802.1397
Ours                31.4508   0.9822    0.0317       87.9677
Table 3. Ablation on the NRGD, where bold indicates the best results.

Methods           PSNR↑     SSIM↑     LPIPS↓    MSE↓
w/o GPA          24.2491   0.9114    0.1361    508.977
w/o L-Sampling   25.1757   0.9317    0.1015    454.0037
Ours             26.4192   0.9602    0.0792    400.5131
Table 4. Ablation on the Flare7K++ dataset, where bold indicates the best results.

Methods           PSNR↑     SSIM↑     LPIPS↓    MSE↓
w/o GPA          23.1328   0.8844    0.1764    607.8928
w/o L-Sampling   30.0409   0.9752    0.0444    105.7256
Ours             31.4508   0.9822    0.0317     87.9677
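
For readers who wish to reproduce the full-reference scores in Tables 1–4 (PSNR, SSIM, LPIPS, and MSE), the snippet below is a minimal sketch of one plausible evaluation routine; it is not the authors' released code. It assumes images are loaded as HxWx3 float arrays in [0, 1], uses scikit-image for PSNR/SSIM/MSE and the lpips package [47] (AlexNet backbone) for LPIPS, and assumes MSE is reported on the 0–255 intensity scale, which matches the magnitude of the tabulated values.

# Illustrative sketch only (assumed tooling, not the authors' evaluation script).
import numpy as np
import torch
import lpips  # assumed: the "lpips" PyPI package implementing Zhang et al. [47]
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

lpips_fn = lpips.LPIPS(net="alex")  # perceptual distance with an AlexNet backbone

def to_lpips_tensor(img: np.ndarray) -> torch.Tensor:
    """HxWx3 float image in [0, 1] -> 1x3xHxW tensor in [-1, 1], as LPIPS expects."""
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() * 2.0 - 1.0

def full_reference_metrics(restored: np.ndarray, gt: np.ndarray) -> dict:
    """Score one restored/ground-truth pair of HxWx3 float images in [0, 1]."""
    return {
        "PSNR": peak_signal_noise_ratio(gt, restored, data_range=1.0),
        "SSIM": structural_similarity(gt, restored, channel_axis=-1, data_range=1.0),
        "LPIPS": lpips_fn(to_lpips_tensor(restored), to_lpips_tensor(gt)).item(),
        # Assumption: MSE is reported on the 0-255 scale, consistent with Tables 1-4.
        "MSE": mean_squared_error(gt * 255.0, restored * 255.0),
    }

In practice these per-image scores would be averaged over the test split of the NRGD or Flare7K++ to obtain the table entries.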
Table 5. Ablation of unpaired data on the NRGD, where bold indicates the best results.

Methods           NIQE↓    BRISQUE↓    PIQE↓
w/o GPA          4.9766    41.7599    35.5739
w/o L-Sampling   4.8686    41.9334    35.4058
Ours             4.8866    41.0672    35.2119
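
Table 5 scores unpaired outputs with no-reference metrics (NIQE [48], BRISQUE, and PIQE), all of which are lower-is-better. As a hedged illustration only, the sketch below shows one way such scores can be obtained with the third-party pyiqa package; the package choice and its create_metric interface are assumptions rather than the authors' tooling.

# Illustrative sketch only (assumed tooling): no-reference quality scores via pyiqa.
import pyiqa
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Build one callable per metric; lower values indicate better perceptual quality.
metrics = {name: pyiqa.create_metric(name, device=device)
           for name in ("niqe", "brisque", "piqe")}

def no_reference_metrics(image_path: str) -> dict:
    """Score a single restored image; pyiqa metrics accept an image path or tensor."""
    return {name.upper(): metric(image_path).item() for name, metric in metrics.items()}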