CISPD: Complementary Illumination–Semantic Prompt Diffusion for Low-Light Remote Sensing Image Enhancement

Gao, Huan; Liao, Yuntai; Ma, Zongfang; Song, Lin

doi:10.3390/rs18091347


College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710311, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(9), 1347; https://doi.org/10.3390/rs18091347

Submission received: 9 February 2026 / Revised: 3 April 2026 / Accepted: 10 April 2026 / Published: 28 April 2026

(This article belongs to the Special Issue Applications to Internet of Things with Images and Videos from Remote Sensing)


Highlights

What are the main findings?

We propose CISPD, a prompt-guided diffusion framework that explicitly separates illumination correction and structure preservation for low-light remote sensing image enhancement.
The illumination-aware prompt and semantic-invariant prompt, regulated by a contrastive constraint, cooperatively guide the diffusion process toward well-exposed results with faithful geometric structures.

What are the implications of the main findings?

CISPD consistently achieves state-of-the-art performance on iSAID-dark and demonstrates strong real-scene generalization on darkrs, enabling region-wise illumination correction while preserving thin structures and boundaries.
The prompt-guided diffusion paradigm remains effective on natural image datasets LOLv1 and LOLv2, indicating a generalizable enhancement strategy beyond a single remote sensing domain.

Abstract

When performing nighttime passive visible remote sensing of non-emissive land surfaces, illumination is typically dominated by weak moonlight that varies with lunar phase, producing low-radiance images with degraded textures and thus motivating low-radiance visible remote sensing image enhancement. We propose a Complementary Illumination–Semantic Prompt Diffusion framework (CISPD) that incorporates a semantic-invariant prompt and a self-learned illumination-aware prompt to guide diffusion-based low-light remote sensing image enhancement. During denoising, we sequentially inject two complementary prompts. We first retrieve a self-learned illumination-aware prompt from a learnable pool conditioned on the current latent context to correct non-uniform brightness, and then apply a semantic-invariant prompt extracted from a vision foundation model to reinforce geometric structures and suppress artifacts. To keep the two prompts complementary rather than redundant, we introduce a contrastive constraint that encourages their representations to remain distinct, and the dual prompts jointly steer the diffusion trajectory toward well-exposed results with faithful structures. Experiments on iSAID-dark and darkrs, together with LOLv1 and LOLv2, demonstrate that CISPD achieves the best PSNR and SSIM on iSAID-dark, strong qualitative generalization on darkrs, and competitive quantitative performance on LOLv1 and LOLv2.

Keywords: low-light remote sensing image enhancement; diffusion models; prompt learning; illumination-aware prompt; semantic-invariant prompt; contrastive prompt constraint

1. Introduction

With the rapid deployment of remote-sensing-enabled Internet of Things systems, high-resolution aerial and satellite images and videos are increasingly used as perceptual inputs for large-scale, continuous monitoring applications. High-resolution remote sensing images typically contain dense man-made structures and heterogeneous land-cover types. However, under nighttime acquisition, illumination is often dominated by weak, phase-dependent moonlight, which makes remote sensing images prone to severe underexposure, color degradation, and loss of fine details, thereby reducing the visibility of thin structures such as roads and roof boundaries. This inevitably affects downstream tasks that rely on high-resolution remote sensing imagery, including disaster assessment, wildlife monitoring, and environmental protection. Therefore, the development of algorithms dedicated to enhancing low-light remote sensing images is crucial.

Recent CNN-based and Transformer-based low-light enhancement methods predominantly follow single-pass feed-forward pipelines. CNN-based methods typically learn direct illumination compensation or Retinex decomposition and restoration from paired supervision, aiming to lift exposure while suppressing noise and recovering details [1,2,3]. In low-light remote sensing, the same paradigm is extended with domain-oriented representations and high-resolution modeling, including dual-domain feature fusion and data-efficient adaptation for challenging shadowed regions [1,2]. Transformer-based approaches further introduce global dependency modeling and long-range interaction to improve region-wise correction and detail recovery under spatially varying illumination [4,5,6]. Despite these advances, both CNN and Transformer paradigms typically adopt single-pass feed-forward mappings optimized with pixel-wise objectives, which often lead to over-averaged solutions in severely underexposed regions, thereby compromising fine details and structural consistency. This degradation is particularly detrimental for remote sensing, as it can obscure small-scale targets and blur object boundaries.

To mitigate the limitations of single-pass pixel-wise optimization, recent studies increasingly formulate low-light enhancement within diffusion frameworks, where restoration is modeled as an iterative denoising trajectory over image distributions, enabling progressive refinement of exposure, contrast, and structural details [7,8,9,10,11]. Such formulations provide a flexible backbone for incorporating additional guidance signals during sampling, which is particularly beneficial for handling severe degradation and non-uniform illumination. Existing methods typically rely on external modulation or guidance to stabilize the denoising process, including priors such as Fourier-domain constraints and Retinex-inspired decomposition [7,8], as well as conditioning through pretrained latent diffusion models for zero-shot enhancement and training-free attribute guidance [9,11,12]. More recently, prompt-based conditioning has emerged as an effective mechanism to inject auxiliary cues into diffusion models, where learnable or retrieved prompts modulate intermediate representations to provide degradation-aware guidance without modifying the backbone architecture [13,14,15,16,17]. Despite these advances, the guidance adopted in existing diffusion-based low-light enhancement is often single and static, remaining fixed across the denoising trajectory and tightly coupled to a specific prior or prompt form. Such monolithic guidance makes it difficult to simultaneously accommodate spatially varying illumination correction and preserve semantically meaningful structures, particularly when real-world scene conditions diverge from the priors assumed by the enhancement model. In remote sensing images, this often manifests as region-wise inconsistent correction and distorted local structures, which can obscure small-scale targets and blur object boundaries. 
These observations indicate that effective diffusion-based enhancement requires complementary and adaptive guidance mechanisms, which motivates the design of dual, learnable prompts to cooperatively steer the diffusion process.

This decomposition is necessary because low-light remote sensing enhancement must address two different objectives at the same time. The first is adaptive correction of spatially varying underexposure. The second is preservation of semantic structure and object boundaries under low contrast. A single guidance signal tends to couple these two roles and is therefore less suitable for handling both illumination adjustment and structure preservation in a stable manner.

To address these issues, we propose a Complementary Illumination–Semantic Prompt Diffusion (CISPD) framework tailored for low-light remote sensing image enhancement. Instead of relying on a single, fixed guidance mechanism, CISPD separates illumination correction from structure preservation through two complementary prompts injected into the denoiser. Specifically, we introduce a self-learned illumination-aware prompt (IAP) retrieved from a learnable prompt pool conditioned on the current latent context, which provides adaptive exposure-related guidance to handle spatially varying underexposure. Meanwhile, a semantic-invariant prompt (SIP) extracted from a vision foundation model supplies stable structural cues that are less affected by illumination variation, improving geometric consistency and suppressing artifacts. During the denoising trajectory, CISPD applies the illumination-aware prompt first to correct non-uniform brightness, and then uses the semantic-invariant prompt to reinforce structures and recover details after exposure correction. To prevent the two prompts from collapsing into redundant guidance, we further impose a contrastive prompt constraint loss (CPL) that keeps their representations distinct, encouraging complementary information flow throughout refinement. With this design, CISPD steers diffusion refinement with adaptive illumination control and a semantic-invariant prompt in a coordinated manner, leading to well-exposed results with faithful structures on low-light remote sensing imagery.

Our contributions are summarized as follows:

We develop CISPD as a prompt-guided residual diffusion framework for low-light remote sensing image enhancement, which couples adaptive illumination-aware prompt retrieval with semantic-invariant structural guidance to address spatially varying underexposure during iterative refinement.
We design a self-learned illumination-aware prompt and a semantic-invariant prompt, injected sequentially into the diffusion denoiser, together with a contrastive constraint to encourage complementary guidance.
Extensive experiments on both low-light remote sensing and natural-image datasets demonstrate that CISPD achieves superior enhancement quality and robust generalization under spatially varying illumination conditions.

2. Related Work

2.1. Diffusion Models in Image Restoration

Recent image restoration research increasingly adopts diffusion models as flexible image priors, where restoration is achieved by steering the reverse diffusion process toward solutions that satisfy observation constraints. A representative line of work keeps a pretrained diffusion prior fixed and designs guidance rules that couple data consistency with the denoising trajectory. For example, iteratively preconditioned guidance [18] refines the guidance formulation to better handle noisy observations in inverse problems, while frequency-guided posterior sampling [19] exploits the frequency structure of reverse diffusion to reduce approximation error and improve restoration quality. Beyond guidance design, the efficiency and controllability of diffusion sampling are also studied. DeqIR [20] reformulates the long sampling chain as a fixed-point system, enabling parallel sampling and fast gradient computation without additional training. For degradations that are difficult to specify in parametric form, Diffusion Image Prior [21] avoids an explicit degradation model by leveraging the optimization dynamics of a pretrained diffusion prior and an early-stopping strategy for blind restoration.

Another direction trains or adapts diffusion models to better align the diffusion process with restoration objectives, rather than relying solely on test-time guidance. DTPM [22] focuses the diffusion prior on texture recovery and then adapts it with conditional components for downstream restoration tasks. RDDM [23] proposes a dual diffusion process that explicitly models residual diffusion together with noise diffusion, aiming to make diffusion-based restoration more interpretable and generic across tasks. Diffusion priors are also combined with task structure and additional modalities to improve robustness under complex degradations. DiffBIR [24] decouples blind restoration into degradation removal and information regeneration, leveraging generative diffusion priors for detail synthesis within a unified pipeline. Osmosis [25] extends diffusion priors to RGBD by learning a joint diffusion prior and applying model-based guidance for underwater restoration. For low-light enhancement, LightenDiffusion [8] integrates Retinex-motivated decomposition in latent space with diffusion-based restoration under unpaired supervision.

Diffusion-based restoration is further explored in real-world super-resolution, where balancing fidelity and hallucination is critical. FaithDiff [26] studies how to better exploit latent diffusion priors while maintaining structural consistency with the input, and diffusion inversion [27] enables flexible sampling with an adjustable number of steps for super-resolution. UPSR [28] investigates how to utilize low-resolution information more effectively via region-dependent uncertainty guidance during diffusion. Since diffusion inference can be computationally demanding, compression and distillation have become increasingly important for deployment. Adversarial Diffusion Compression [29] distills diffusion-based super-resolution into a lighter diffusion-GAN hybrid and reduces computation cost while preserving generation ability. Overall, these recent studies indicate a trend toward combining principled guidance, task-aligned diffusion formulations, and efficiency-oriented designs to make diffusion restoration more practical [18,19,20,21,28,29]. Equivariant sampling further improves diffusion-based image restoration by exploiting dual sampling trajectories and a timestep-aware schedule to better utilize diffusion priors [30]. TPGDiff further improves diffusion-based image restoration by introducing hierarchical triple-prior guidance, where degradation priors, structural priors, and semantic priors are integrated across the denoising trajectory for unified image reconstruction [31].

2.2. Prompt Learning in Image Restoration

Prompt learning has become a common conditioning mechanism for image restoration, especially under mixed or unknown degradations where a single model is expected to adapt its restoration behavior to diverse inputs. Different from fixed task tokens, restoration prompts are often designed to be degradation-aware and dynamically adjusted, so the model can select or compose informative cues conditioned on the current input. MPerceiver [13] leverages both textual and visual prompts to exploit pretrained diffusion priors for all-in-one restoration, with emphasis on transferability under unseen or few-shot degradations. UHD-Processor [32] introduces degradation-aware prompts together with progressive frequency learning to support unified restoration while remaining resource-efficient for ultra-high-resolution images.

Recent works also study how prompts can encode degradations more explicitly and reduce ambiguity in conditioning signals. Instead of treating text prompts as external instructions, removing degradations in textual representations maps degraded images into the text space, edits degradation-related information in text, and then converts restored textual representations into guidance for image restoration [33]. AutoDIR [34] couples degradation identification with latent diffusion, where an assessment stage predicts degradation descriptions that can serve as prompts for an all-in-one restoration stage. SPIRE [14] further supports instruction-based restoration by enabling semantic prompts and degradation-aware restoration prompts, providing a more controllable interface for fine-grained restoration objectives. Beyond text, FPro [15] proposes frequency prompt guidance that injects prompt components across frequency bands to address both structure and detail in multiple restoration tasks.

Prompt pools and retrieval provide another mechanism to handle the combinatorial complexity of degradations. T3-DiffWeather [35] uses a prompt pool to construct weather prompts and introduces a contrastive objective to keep different prompt types distinct, improving robustness to mixed adverse weather inputs. In diffusion-based restoration, DPIR [17] incorporates dual prompts together with diffusion transformers to extract conditional information from low-quality inputs using both visual and textual cues. In contrast, CISPD is designed for low-light remote sensing enhancement under spatially non-uniform illumination. It uses a self-learned illumination-aware prompt retrieved from the current latent feature and a semantic-invariant prompt for structure-preserving guidance during residual diffusion. Defusion [36] constructs explicit visual instructions aligned with degradation patterns and guides a diffusion process that operates in degradation space, targeting stable generalization across diverse tasks. Feature Difference Instruction [37] adapts text-to-image diffusion for generic restoration by extracting degradation guidance from vision-language representations and injecting it into the diffusion process through lightweight tuning.

Prompt learning is also extended beyond RGB natural images. MP-HSIR [38] integrates spectral, textual, and visual prompts for hyperspectral image restoration under diverse degradations, showing that prompt conditioning can generalize to modalities with different priors and degradation characteristics. Overall, these works suggest that prompt learning provides a structured way to encode degradation cues and controllable restoration intent, and it becomes increasingly intertwined with diffusion-based restoration to improve adaptability and controllability under complex real-world degradations [13,14,17,36,37,38]. Neural discrimination-prompted Transformers further show that prompt-guided attention and feature modulation can improve both restoration quality and computational efficiency for UHD image restoration and enhancement [39].

2.3. Low-Light Remote Sensing Image Enhancement

Low-light enhancement for remote sensing imagery is a specialized challenge characterized by complex imaging conditions, including non-uniform illumination, vast spatial scales, and a high diversity of ground features. Wu et al. [40] tackled this by developing a Retinex-based deep unfolding network that decomposes low-light images into reflection and illumination layers within a learnable optimization framework. To maximize feature extraction, Fu et al. [41] utilized pairs of low-light images for training. Exploring alternative representations, Yang et al. [42] applied implicit neural representation to the enhancement process. In a more modular approach, Liu et al. [43] implemented a restoration scheme comprising three core elements: the Visual State Space Module (VSSM), the Local Feature Module (LFM), and the Dual-Gate Dconv Feed-Forward Network (DGDFFN). From an unsupervised learning perspective, Xing et al. [44] introduced a GAN-based framework that optimizes mutual information between low-light and restored images via Self-Similarity Contrast Learning (SSCL). Furthermore, Yao et al. [1] bifurcated the enhancement task into two stages: restoring global brightness through amplitude information and refining structural details using phase information. Finally, addressing the specific issue of non-uniform lighting, Zhao et al. [3] proposed an enhancement method derived from the atmospheric scattering model, tailored for complex remote sensing environments. HDCGAN+ combines weakly paired training with structure-aware fusion for low-illumination UAV remote sensing image enhancement [45]. Another recent study proposes a quaternion-wavelet Retinex framework for low-light image enhancement with applications to remote sensing, where illumination-reflectance decomposition is combined with wavelet-based denoising to improve color constancy and noise suppression [46]. These methods mainly rely on explicit decomposition or handcrafted physical modeling. 
In contrast, CISPD does not estimate illumination and reflectance as separate restoration outputs. Instead, it performs residual diffusion and uses prompt-guided conditioning to handle illumination correction and structure preservation during iterative denoising.

3. Methodology

3.1. Residual Diffusion with Implicit Priors

As shown in Figure 1, we propose CISPD, which aims to enhance a low-light remote sensing image $y \in \mathbb{R}^{H \times W \times 3}$ by learning the illumination-degradation residual $r_d = x - y$, where $x$ is the clear reference. In the denoising UNet, the illumination-aware prompt is injected first and the semantic-invariant prompt is injected afterward through sequential cross-attention. Let $z_0$ be the latent representation of $r_d$. The forward diffusion process yields a noisy latent $z_t$ at timestep $t$ as follows:

$$z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I). \tag{1}$$

The proposed framework optimizes a conditional denoiser $\epsilon_\theta$ to predict the noise $\epsilon$, guided by two synergistic prompts that separate the semantic reflectance and complex illumination context. This design follows the task decomposition in low-light remote sensing enhancement, where illumination correction requires adaptive degradation-aware guidance, while structure preservation requires stable semantic guidance that is less affected by illumination variation.
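The forward process in Equation (1) can be illustrated with a minimal NumPy sketch. The toy degradation `y = 0.2 * x`, the identity mapping used in place of the latent encoder, and the value `alpha_bar_t = 0.5` are assumptions made only for illustration and are not taken from the paper.

```python
import numpy as np

def forward_diffuse(z0, alpha_bar_t, rng):
    """Sample z_t from Equation (1) for a given cumulative coefficient alpha_bar_t."""
    eps = rng.standard_normal(z0.shape)  # eps ~ N(0, I)
    z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return z_t, eps

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))   # toy clear reference
y = 0.2 * x                 # toy low-light input (hypothetical degradation)
r_d = x - y                 # illumination-degradation residual
z0 = r_d                    # assumption: identity stands in for the latent encoder
z_t, eps = forward_diffuse(z0, alpha_bar_t=0.5, rng=rng)
```

As $\bar{\alpha}_t$ approaches 1 the sample stays close to $z_0$, and as it approaches 0 the sample approaches pure Gaussian noise.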

3.2. Semantic-Invariant Prompting

The semantic component of a remote sensing image represents the intrinsic physical attributes of land-cover objects, which should remain invariant regardless of illumination shifts. To preserve these intrinsic properties, we introduce the Semantic-Invariant Prompt $P_S$.

We leverage a vision foundation model, DINOv3 [47], as the feature extractor for SIP. We choose DINOv3 because SIP is designed to encode semantic structure rather than illumination intensity, and a pretrained vision foundation model provides semantically richer and more stable descriptors than directly using low-level image features. This is suitable for remote sensing images, where land-cover layouts and object boundaries should remain stable under illumination variation. Given the input $y$, we extract the semantic feature $F_{inv} = \Phi_{DINO}(y)$, which provides stable geomorphological descriptors. These features are projected into the prompt space via a projection layer $\Psi_S(\cdot)$ to form the static semantic feature:

$$P_S = \Psi_S(F_{inv}) \in \mathbb{R}^{L_s \times D}. \tag{2}$$

$P_S$ serves as a semantic anchor in the diffusion process, ensuring that the geomorphological integrity is preserved during the intensity restoration.
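The shape bookkeeping behind Equation (2) can be sketched as follows. No real backbone is called here: `F_inv` is a random stand-in for the DINOv3 token features, the projection $\Psi_S$ is assumed to be a single linear layer, and the sizes `L_s`, `C`, and `D` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L_s, C, D = 16, 32, 8   # hypothetical token count, feature width, prompt width

F_inv = rng.standard_normal((L_s, C))           # stand-in for Phi_DINO(y)
W_s = rng.standard_normal((C, D)) / np.sqrt(C)  # assumed linear form of Psi_S
b_s = np.zeros(D)

def project_semantic_prompt(F_inv, W, b):
    """Map token features (L_s, C) to the prompt space (L_s, D), as in Equation (2)."""
    return F_inv @ W + b

P_S = project_semantic_prompt(F_inv, W_s, b_s)
```

Because $P_S$ depends only on the input $y$, it can be computed once per image and reused at every denoising step.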

3.3. Self-Learned Illumination-Aware Prompting

We propose a self-learned illumination-aware prompt $P_I$ to complement the semantic-invariant prompt by modeling illumination-specific variation rather than semantic structure. This prompt is designed for adaptive exposure correction under spatially varying degradation. We define a Prompt Pool $\mathcal{P} = \{p_i\}_{i=1}^{N}$ as a learnable library of degradation prompt embeddings, where each embedding $p_i \in \mathbb{R}^{L_a \times D}$ implicitly represents a latent mode of illumination-related degradation.

For a specific input, we extract its contextual feature $F_e$ from the latent space of the denoising UNet and perform top-k sparse retrieval to fit its unique degradation distribution. We calculate the similarity between a query $Q(F_e)$ and learnable keys $K = \{k_i\}_{i=1}^{N}$ associated with the pool:

$$s_i = \mathrm{Softmax}\left(\frac{Q(F_e) \cdot k_i^{\top}}{\sqrt{\mathrm{dim}}}\right), \quad i \in \{1, \dots, N\}. \tag{3}$$

To provide a locally inductive bias for non-uniform lighting, we select the top-k most relevant descriptors to construct the illumination-aware prompt:

$$P_I = \sum_{j \in \text{Top-}k} s_j \cdot p_j \in \mathbb{R}^{L_a \times D}. \tag{4}$$

This sparse selection mechanism allows the model to fit complex illumination distributions without being constrained by idealized physical assumptions. In CISPD, illumination adjustment is not controlled by a hand-crafted exposure threshold. Instead, it is governed by similarity-based sparse retrieval, where the similarity scores and the top-k selection determine which illumination descriptors are activated for the current latent context.
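The retrieval in Equations (3) and (4) can be sketched in NumPy as below. The form of the query $Q(F_e)$ (mean pooling followed by a linear map `W_q`) is an assumption, as are all sizes. Following Equation (4) literally, the selected scores are not renormalized after the top-k selection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_illumination_prompt(F_e, W_q, keys, pool, k):
    """Top-k sparse retrieval following Equations (3) and (4).

    F_e:  (M, C) context tokens from the denoiser latent
    W_q:  (C, dim) query projection (assumed: mean-pool, then linear)
    keys: (N, dim) learnable keys k_i
    pool: (N, L_a, D) learnable prompts p_i
    """
    q = F_e.mean(axis=0) @ W_q                      # query Q(F_e), shape (dim,)
    s = softmax(keys @ q / np.sqrt(keys.shape[1]))  # scores s_i over the pool
    top = np.argsort(s)[-k:]                        # indices of the k largest scores
    P_I = np.tensordot(s[top], pool[top], axes=1)   # sum_j s_j * p_j, shape (L_a, D)
    return P_I, s, top

rng = np.random.default_rng(0)
N, L_a, D, C, dim, k = 6, 4, 8, 16, 8, 2            # hypothetical sizes
F_e = rng.standard_normal((10, C))
W_q = rng.standard_normal((C, dim))
keys = rng.standard_normal((N, dim))
pool = rng.standard_normal((N, L_a, D))
P_I, s, top = retrieve_illumination_prompt(F_e, W_q, keys, pool, k)
```

Only the `k` selected pool entries contribute to `P_I`, so in a trainable version only those entries and their keys would receive gradient from a given sample.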

3.4. Illumination–Semantic Prompt Injection via Sequential Cross-Attention

To ensure that the prompts precisely guide the denoising process, we implement a Sequential Cross-Attention mechanism within the latent space of the denoising UNet. The denoiser $\epsilon_\theta$ interacts with $P_I$ and $P_S$ in a cascading manner to progressively refine the latent features.

Let $F_{lat}$ be the intermediate feature map of the UNet. In the first stage, the feature map attends to the illumination descriptor $P_I$ to perform localized brightness compensation:

$$F'_{lat} = \mathrm{Softmax}\left(\frac{Q(F_{lat})\, K(P_I)^{\top}}{\sqrt{\mathrm{dim}}}\right) V(P_I). \tag{5}$$

Here, $Q$, $K$, and $V$ denote learnable linear projections that generate the query, key, and value representations, respectively. The symbol dim denotes the feature dimension used for scaled dot-product normalization.

In the second stage, the updated feature $F'_{lat}$ attends to the semantic-invariant anchor $P_S$ to reinforce geomorphological structures and eliminate potential artifacts introduced by illumination enhancement:

$$\hat{F}_{lat} = \mathrm{Softmax}\left(\frac{Q(F'_{lat})\, K(P_S)^{\top}}{\sqrt{\mathrm{dim}}}\right) V(P_S). \tag{6}$$

This sequential design follows the restoration requirement of low-light remote sensing images. The latent feature should first be corrected for spatially varying underexposure through $P_I$. After this correction, $P_S$ provides semantic-invariant structural guidance to preserve boundaries and suppress artifacts. Therefore, brightness lifting is followed by structure-aware correction, which helps suppress over-enhancement and preserve local details during illumination adjustment. The comparison with alternative prompt integration strategies is provided in Section 4.7.
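The two-stage injection of Equations (5) and (6) can be sketched as a pair of single-head cross-attention calls. This is a minimal illustration that follows the equations as written: separate projection weights per stage, a single head, no residual connection or normalization, and all sizes are assumptions rather than details confirmed by the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(F, P, W_q, W_k, W_v):
    """Single-head cross-attention: features F (M, C) attend to prompt P (L, D)."""
    Q, K, V = F @ W_q, P @ W_k, P @ W_v         # (M, dim), (L, dim), (L, C)
    A = softmax(Q @ K.T / np.sqrt(Q.shape[1]))  # (M, L) attention weights
    return A @ V                                # (M, C)

def sequential_prompt_injection(F_lat, P_I, P_S, params_I, params_S):
    F_prime = cross_attention(F_lat, P_I, *params_I)  # Equation (5): illumination stage
    F_hat = cross_attention(F_prime, P_S, *params_S)  # Equation (6): semantic stage
    return F_hat

rng = np.random.default_rng(0)
M, C, D, dim, L_a, L_s = 12, 16, 8, 8, 4, 6           # hypothetical sizes

def make_params():
    # (W_q, W_k, W_v) for one stage; random values stand in for learned weights
    return (rng.standard_normal((C, dim)),
            rng.standard_normal((D, dim)),
            rng.standard_normal((D, C)))

F_lat = rng.standard_normal((M, C))
P_I = rng.standard_normal((L_a, D))
P_S = rng.standard_normal((L_s, D))
F_hat = sequential_prompt_injection(F_lat, P_I, P_S, make_params(), make_params())
```

The value projection maps back to width `C` so that the output of the first stage can serve directly as the query source of the second.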

Formally, the two prompts condition the denoiser through the intermediate latent feature transformation $F_{lat} \to F'_{lat} \to \hat{F}_{lat}$ defined by Equations (5) and (6). Therefore, the denoiser can be written as $\epsilon_\theta(z_t, t, y; P_I, P_S)$, where the dependence on $P_I$ and $P_S$ is realized by replacing the original intermediate feature $F_{lat}$ with the prompt-refined feature $\hat{F}_{lat}$ inside the denoising UNet. In this way, prompt information is injected into the reverse process through latent feature modulation rather than through an external sampling guidance term.

3.5. Illumination–Semantic Prompt Disentanglement via Contrastive Prompt Learning

To further ensure that $P_S$ and $P_I$ are truly separated into the semantic and illumination domains, we implement a Contrastive Prompt Loss ($\mathcal{L}_{cp}$). It is defined with cosine similarity, penalizing similarity between the semantic-invariant anchor and the illumination-contextual descriptor while rewarding similarity between the anchor and the invariant feature:

$$\mathcal{L}_{cp} = \frac{1}{B} \sum_{b=1}^{B} \left[ \mathrm{sim}(P_S^{b}, P_I^{b}) - \mathrm{sim}(P_S^{b}, F_{inv}^{b}) \right]. \tag{7}$$

Here, $B$ denotes the batch size, and $b$ denotes the sample index within the batch. $P_S^{b}$ and $P_I^{b}$ denote the semantic-invariant prompt and the illumination-aware prompt of the $b$-th sample, respectively. The function sim denotes cosine similarity. By minimizing $\mathcal{L}_{cp}$, we decrease the similarity between $P_S$ and $P_I$ while increasing the similarity between $P_S$ and $F_{inv}$. Therefore, $P_S$ is anchored to semantic-invariant features, and $P_I$ is encouraged to encode information that is distinct from this semantic anchor. In this way, the loss suppresses representational overlap between the two prompts and promotes functional disentanglement between illumination guidance and structure guidance.
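Equation (7) can be sketched as follows. The text does not state how $P_S$, $P_I$, and $F_{inv}$ are brought to a common shape before the cosine similarity is taken, so this sketch assumes each has already been pooled to one vector of equal length per sample; the toy vectors are hypothetical.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (B, D) arrays."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

def contrastive_prompt_loss(P_S, P_I, F_inv):
    """Equation (7): batch mean of sim(P_S, P_I) - sim(P_S, F_inv)."""
    return float(np.mean(cosine_sim(P_S, P_I) - cosine_sim(P_S, F_inv)))

# Toy batch (B = 2): P_S aligned with F_inv and orthogonal to P_I.
P_S = np.array([[1.0, 0.0], [0.0, 1.0]])
F_inv = P_S.copy()
P_I = np.array([[0.0, 1.0], [1.0, 0.0]])
loss_separated = contrastive_prompt_loss(P_S, P_I, F_inv)   # close to -1 (the minimum)
loss_collapsed = contrastive_prompt_loss(P_S, P_S, P_I)     # close to +1 (prompts identical)
```

The loss is bounded between -2 and 2, and it reaches lower values as the two prompts move apart and $P_S$ stays aligned with the invariant feature.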

3.6. Sampling and Optimization

The final training objective combines the residual diffusion loss, the contrastive prompt loss, and a pixel-level fidelity term:

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{res} + \lambda_2 \mathcal{L}_{cp} + \lambda_3 \mathcal{L}_{PSNR}, \tag{8}$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting hyperparameters that balance the contribution of each loss component. We set them to 1 as a unified default configuration for all datasets. This equal-weight setting avoids additional dataset-specific manual tuning and keeps the optimization setup consistent across experiments.

$\mathcal{L}_{res}$ supervises the denoiser to recover the illumination-degradation residual $r_d$, and therefore mainly drives illumination enhancement in the residual diffusion space. $\mathcal{L}_{cp}$ enforces the functional separation between $P_I$ and $P_S$, so illumination-related correction and semantic-invariant structure guidance do not collapse into redundant conditioning. $\mathcal{L}_{PSNR}$ constrains the restored result to remain close to the clear reference, which helps reduce structural distortion during enhancement. As a result, the optimization promotes exposure restoration while preserving structural consistency.

Accordingly, the reverse transition is parameterized as $p_\theta(z_{t-1} \mid z_t, y, P_I, P_S)$, where the prompts affect the reverse process only through the prompt-conditioned noise prediction network. During inference, the denoising process is implemented through the Sequential Cross-Attention mechanism, which updates the intermediate latent feature before noise prediction:

$$z_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(z_t, t, y, P_I, P_S)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(z_t, t, y, P_I, P_S). \tag{9}$$

The enhanced remote sensing image is ultimately reconstructed as $\hat{x} = \mathrm{Dec}(z_{final}) + y$. By pivoting on the diffusion residual, the framework restores localized radiance gradients and topographic details in a highly efficient sampling pass.
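The update in Equation (9) has the form of a deterministic, DDIM-style step: the noise prediction first yields an estimate of the clean latent, which is then re-noised to the previous noise level with the same prediction. A minimal sketch follows; the true noise stands in for the network output, the decoder is taken to be the identity, and the toy degradation and coefficient values are assumptions for illustration only.

```python
import numpy as np

def reverse_step(z_t, eps_hat, abar_t, abar_prev):
    """One deterministic update in the form of Equation (9)."""
    z0_hat = (z_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)  # predicted clean latent
    return np.sqrt(abar_prev) * z0_hat + np.sqrt(1.0 - abar_prev) * eps_hat

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))   # toy clear reference
y = 0.2 * x                 # toy low-light input (hypothetical degradation)
z0 = x - y                  # residual; identity stands in for the latent encoder

abar_t = 0.3
eps = rng.standard_normal(z0.shape)
z_t = np.sqrt(abar_t) * z0 + np.sqrt(1.0 - abar_t) * eps   # Equation (1)

# With a perfect noise prediction, a step to abar_prev = 1 recovers z0 exactly.
z_final = reverse_step(z_t, eps_hat=eps, abar_t=abar_t, abar_prev=1.0)
x_hat = z_final + y         # x_hat = Dec(z_final) + y with an identity decoder
```

In practice the prediction is imperfect, so the step is repeated over a decreasing sequence of timesteps before the residual is decoded and added back to $y$.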

4. Experiments

4.1. Dataset

We evaluate the proposed method on both low-light remote sensing datasets and paired natural Low-Light Image Enhancement (LLIE) datasets.

Remote sensing datasets. We adopt two datasets for low-light remote sensing image enhancement. iSAID-dark [1] is a paired dataset constructed from high-resolution remote sensing images. It takes 751 images as the base set and generates paired low-/normal-light samples via a synthetic degradation process. To increase scene diversity and make training feasible, multiple random crops are extracted and resized to $500 \times 500$, yielding 3755 training image pairs and 66 validation image pairs. darkrs [1] contains 86 real nighttime remote sensing images captured by drones. Since paired ground truth is unavailable, it is mainly used to evaluate real-world generalization through qualitative comparisons.

Together, these two remote sensing datasets cover complementary challenging conditions. iSAID-dark evaluates restoration under paired low-light degradation with strong spatial illumination variation, while darkrs evaluates robustness on real nighttime scenes with complex illumination and sensor-dependent noise.

Natural image datasets. We further evaluate on three common paired datasets with official splits. LOLv1 [48] contains 500 paired low/normal-light images captured in real-world environments, where 485 pairs are used for training and 15 pairs for testing. LOLv2-Real [49] provides 689 training pairs and 100 testing pairs collected from real-scene captures. LOLv2-Syn [49] is a synthetic paired dataset with 900 training pairs and 100 testing pairs, constructed to simulate diverse low-light scenarios.

4.2. Metrics

For paired datasets, we report PSNR and SSIM to measure reconstruction fidelity and structural similarity. To evaluate perceptual quality, we adopt LPIPS for remote sensing evaluation on iSAID-dark and FID for natural image evaluation on LOLv1, LOLv2-Real, and LOLv2-Syn. We retain these metrics because they are widely used in low-light enhancement and provide direct comparability with existing baselines. Higher PSNR and SSIM indicate better fidelity and structure preservation, while lower LPIPS and FID reflect better perceptual quality.
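For reference, PSNR follows the standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below assumes images scaled to $[0, 1]$ (so `max_val = 1.0`); the evaluation code used in the paper may differ in value range or color handling.

```python
import numpy as np

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

reference = np.zeros((4, 4, 3))
estimate = np.full((4, 4, 3), 0.1)   # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, estimate)    # approximately 20 dB
```

A uniform error of 0.1 on a unit range corresponds to roughly 20 dB, and each tenfold reduction in MSE adds 10 dB.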

4.3. Training Schedules

The proposed diffusion framework is implemented in PyTorch 2.1.0 and trained on two NVIDIA RTX 4090 GPUs (NVIDIA Corporation, Santa Clara, CA, USA). Training starts with a learning rate of $1.5 \times 10^{-4}$, which is decayed by cosine annealing. Parameters are optimized with Adam, and an exponential moving average of the model weights is maintained with a weight of 0.995. The diffusion process uses $T = 1000$ timesteps, with $\beta_{t}$ increasing linearly from 0.0001 to 0.02. Inputs are processed as $256 \times 256$ pixel patches with a batch size of 2. Data augmentation includes horizontal flips and random rotations by $90^{\circ}$, $180^{\circ}$, or $270^{\circ}$.
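The schedule settings above can be illustrated with a short, framework-free sketch. It builds the linear $\beta_{t}$ schedule, accumulates the products of $1 - \beta_{t}$, and shows the standard cosine annealing and exponential moving average update rules. The helper names, the minimum learning rate of zero, the total number of steps, and the reading of 0.995 as the EMA decay are assumptions for illustration; the actual training code may differ.

```python
import math

T = 1000                           # number of diffusion timesteps
BETA_START, BETA_END = 1e-4, 0.02  # endpoints of the linear schedule


def linear_betas(num_steps=T, start=BETA_START, end=BETA_END):
    """Linearly spaced noise variances beta_1 ... beta_T."""
    step = (end - start) / (num_steps - 1)
    return [start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), often written alpha-bar_t."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def cosine_lr(step, total_steps, base_lr=1.5e-4, min_lr=0.0):
    """Cosine-annealed learning rate; min_lr and total_steps are assumed."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine


def ema_update(ema_params, params, decay=0.995):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```

With these endpoints, the cumulative product decreases monotonically and is close to zero at the final timestep (on the order of $10^{-5}$), so the last latent is nearly pure Gaussian noise.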

4.4. Qualitative Evaluation

We provide qualitative comparisons on both remote sensing and natural image datasets, including iSAID-dark, darkrs, LOLv1, and LOLv2-Real. Among them, Figure 2 and Figure 3 correspond to the target remote sensing task, while Figure 4 and Figure 5 are included as supplementary cross-domain validation on standard paired natural-image low-light benchmarks to show that the proposed prompt-guided diffusion mechanism is not restricted to remote sensing data. For a fair visual assessment, we focus on three key aspects: (i) whether severely underexposed regions are sufficiently lifted, (ii) whether object boundaries, structural layouts, and low-contrast details are preserved after enhancement, and (iii) whether common artifacts are avoided.

Results on iSAID-dark. Figure 2 presents qualitative comparisons on iSAID-dark. This dataset is characterized by large-scale underexposure and strong spatial illumination variation, where effective enhancement requires lifting dark regions while preserving thin structures and low-contrast textures in aerial views. As shown in Figure 2, some competing methods increase global brightness but tend to compress local contrast, which weakens the visibility of subtle scene cues such as parking-slot markings and boundary transitions on the asphalt. In contrast, methods that emphasize conservative correction may leave shadowed regions insufficiently recovered, yielding limited visibility gains in severely dark areas. More specifically, CUE exhibits noticeable over-brightening in the underexposed region, leading to washed-out appearances and reduced texture separability. SCI introduces evident chromatic noise-like patterns over relatively homogeneous areas, which distracts structural perception and harms visual consistency. NeRCo produces relatively limited illumination lifting, so the dark region remains less informative compared with other results. By comparison, CISPD achieves a better balance between exposure correction and structure preservation: it lifts the shadowed area to reveal meaningful scene content while keeping the parking-line patterns and low-contrast object boundaries clearer after brightness enhancement, and it avoids the obvious color corruption observed in SCI and the contrast collapse caused by aggressive brightening.

Results on darkrs. Figure 3 shows comparisons on darkrs, which consists of real nighttime remote sensing images with complex illumination conditions and sensor-dependent noise. On this dataset, enhancement quality is largely reflected by whether a method avoids over-amplification and maintains stable color statistics across the scene. Overall, several competing approaches produce overly strong exposure lifting, which makes the scene appear over-exposed and reduces the visibility of structural transitions, especially around the circular layout and surrounding paths. For instance, FourLLIE and LLFormer significantly brighten the entire image, resulting in a pale appearance that weakens tonal separation between different regions and reduces depth cues in the layout. NeRCo introduces a clear warm color bias, making the overall tone deviate from a natural nighttime distribution and affecting the consistency between illuminated and non-illuminated areas. SCI shows an evident brightness over-correction accompanied by a strong tint, which further harms visual realism. In contrast, CISPD performs a more controlled illumination adjustment: it improves visibility in dark regions while maintaining stable global tone and avoiding the strong color shift seen in NeRCo. It also better preserves the structural layout and boundary transitions of the scene without the over-exposure effect that appears in several baselines, indicating stronger real-scene generalization on darkrs.

This result also indicates that CISPD remains stable under challenging real nighttime imaging conditions, where non-uniform illumination and sensor-dependent noise appear simultaneously but paired ground truth is unavailable.

Results on LOLv1. Figure 4 demonstrates that CISPD recovers natural brightness and contrast while maintaining faithful colors and textures. On this dataset, a common failure mode of competing methods is to improve global brightness but sacrifice local contrast or color fidelity, resulting in washed-out regions, tone shifts, or detail loss in challenging areas. Some methods also introduce over-smoothing when suppressing noise, which removes high-frequency textures. In contrast, CISPD yields cleaner details and more visually coherent results: it enhances dark regions without excessive saturation, maintains more natural tone transitions, and preserves textures and edges in local areas.

Focusing on the zoomed regions in Figure 4, RUAS tends to leave the red-box area under-enhanced, where the fur texture and the boundary against the background remain indistinct. LLFormer largely lifts the exposure, yet the red-box crop shows weakened micro-texture on the fur, and the green-box crop exhibits a more saturated yarn tone with less clear thread patterns. RetinexFormer suppresses noise but also smooths fine structures in the green-box region, leading to reduced high-frequency details. In contrast, CISPD restores the exposure in both crops while retaining the fur strands in the red-box region and preserving yarn grooves and grape boundaries in the green-box region, without introducing noticeable saturation drift.

Results on LOLv2-Real. Figure 5 further validates CISPD on LOLv2-Real, which contains more diverse real-scene degradations and is generally more difficult than LOLv1. Existing approaches can produce inconsistent correction across regions, leading to remaining dark areas or over-brightened outputs with degraded local structures. In addition, uneven illumination correction may cause local contrast collapse or unnatural appearance. CISPD delivers more coherent illumination adjustment across the image and preserves local geometric structures with fewer artifacts, indicating improved robustness under diverse real-world lighting conditions.

The zoomed crops in Figure 5 further reveal the local behavior under real-scene degradations. QuadPrior shows a visible color cast and reduced local contrast in the green-box crop, and the red-box crop exhibits softened contours around the cables and the bag. CWNet improves overall brightness, but the green-box region presents weaker separation between adjacent stripe transitions, and the red-box region still contains less stable edge definition. In comparison, CISPD preserves clearer stripe patterns in the green-box crop and maintains sharper cable boundaries in the red-box crop, while keeping the global tone consistent, which leads to more reliable local structures on LOLv2-Real.

4.5. Quantitative Evaluation

We quantitatively compare CISPD with state-of-the-art LLIE methods on the paired remote sensing dataset iSAID-dark and the paired natural image datasets LOLv1, LOLv2-Real, and LOLv2-Syn. We report fidelity-oriented metrics together with perceptual metrics to reflect both reconstruction accuracy and perceptual quality and to ensure consistent comparison with prior low-light enhancement methods.

Results on iSAID-dark. Table 1 reports results under two settings. The column headed iSAID-dark denotes direct evaluation on the iSAID-dark test set without retraining on iSAID-dark, which assesses cross-domain generalization. The column headed iSAID-dark retrain denotes results obtained after retraining on the iSAID-dark training split, which assesses in-domain performance under dataset-specific supervision. Without retraining, CISPD achieves the best PSNR (21.51 dB) and SSIM (0.707), outperforming the second-best method by 3.07 dB in PSNR and 0.128 in SSIM, although its LPIPS is not the best in this setting. After retraining, CISPD achieves the best overall performance with 26.53 dB in PSNR, 0.856 in SSIM, and 0.101 in LPIPS. Compared with the second-best method, CISPD improves PSNR by 1.26 dB and SSIM by 0.035, and reduces LPIPS by 0.028. These results show that CISPD remains effective against a stronger, more up-to-date set of remote sensing baselines.

Results on LOLv1 and LOLv2. Table 2 summarizes quantitative comparisons on LOLv1, LOLv2-Real, and LOLv2-Syn. On LOLv1, CISPD achieves the best PSNR and the lowest FID among methods that report this metric, while PyDiff reports a slightly higher SSIM. On LOLv2-Real, CISPD remains competitive with 23.31 dB in PSNR and 0.888 in SSIM, while CUGD reports a slightly higher SSIM and PyDiff reports a higher PSNR. On LOLv2-Syn, CISPD achieves the best PSNR and SSIM, exceeding the second-best PSNR by 1.23 dB and the second-best SSIM by 0.004. Overall, the comparison shows that CISPD remains highly competitive across standard paired natural-image benchmarks. We note that CISPD does not obtain the top PSNR, SSIM, or FID on LOLv2-Real. This dataset contains more diverse real-scene degradations than LOLv1 and LOLv2-Syn, and its residual misalignment and sensor-specific noise can penalize pixel-wise fidelity and feature-distribution metrics. Even so, CISPD maintains competitive quantitative results and strong qualitative structure preservation.

4.6. Efficiency Analysis

We analyze the model complexity of diffusion-based approaches on the LOLv1 dataset. Since diffusion models can be computationally demanding, we report parameter count and MACs to characterize computational cost at the architecture level. As summarized in Table 3, CISPD has a moderate parameter count among the compared diffusion-based baselines and requires substantially fewer MACs, indicating a favorable balance between restoration performance and architectural complexity. We do not claim deployment-oriented efficiency from these metrics alone.
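To make the two complexity measures concrete, the sketch below counts parameters and multiply-accumulate operations (MACs) for a single 2D convolution using the standard closed-form expressions. The layer shape in the usage note is hypothetical and is not taken from the CISPD architecture; the function names are likewise illustrative.

```python
def conv2d_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a square-kernel 2D convolution."""
    return c_out * (c_in * kernel * kernel + (1 if bias else 0))


def conv2d_macs(c_in, c_out, kernel, h_out, w_out):
    """MACs of one forward pass: one kernel dot product per output element."""
    return c_in * kernel * kernel * c_out * h_out * w_out


def sampling_macs(macs_per_step, num_steps):
    """Total MACs when the denoiser is evaluated once per sampling step."""
    return macs_per_step * num_steps
```

For a hypothetical 3 × 3 convolution with 64 input and 64 output channels on a 256 × 256 feature map, this gives 36,928 parameters and about 2.42 GMACs. For a diffusion model, the per-evaluation MACs are multiplied by the number of sampling steps, which is one reason architecture-level MACs alone do not determine inference time.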

4.7. Ablation Study

Most ablation experiments are conducted on the iSAID-dark dataset under the same training and evaluation setting as the main comparison, and we report PSNR and SSIM for quantitative analysis. We further supplement cross-dataset validation on LOLv1 for the key hyperparameters of IAP length and top-k selection.

Effect of IAPs and SIPs. Table 4 evaluates the contributions of the two prompts by removing each component from CISPD. When removing IAPs, PSNR drops from 26.53 dB to 26.32 dB and SSIM decreases from 0.856 to 0.821, indicating that illumination-aware guidance is necessary for improving exposure correction and maintaining structural consistency. When removing SIPs, performance further degrades to 26.24 dB in PSNR and 0.818 in SSIM, showing that the semantic-invariant guidance plays a key role in preserving stable structures during enhancement. We also evaluate a variant without learnable keys of IAPs, which obtains 26.39 dB in PSNR and 0.826 in SSIM. Compared with the full model, this variant shows lower SSIM, suggesting that learnable keys help retrieve illumination prompts that better match the latent context and thus improve structural fidelity. Overall, combining SIPs and IAPs achieves the best performance, validating the necessity of using dual prompts in CISPD.

Effect of the IAP length. Table 5 studies the length of IAPs by varying it among 32, 64, and 128. Using length 64 yields the best results with 26.53 dB in PSNR and 0.856 in SSIM. A shorter prompt length of 32 reduces performance to 26.31 dB in PSNR and 0.829 in SSIM, indicating insufficient representational capacity for capturing illumination cues needed by the denoiser. Increasing the length to 128 does not improve performance, and instead results in 26.33 dB in PSNR and 0.831 in SSIM. This suggests that excessively long prompts may introduce redundant information and weaken the effectiveness of prompt injection. Therefore, we adopt length 64 as the default setting in CISPD. Its cross-dataset stability is further validated on LOLv1 in Table 6.

Effect of the contrastive prompt loss. Table 7 validates the role of the contrastive prompt loss that encourages the two prompts to remain complementary. Without this loss, performance decreases to 26.09 dB in PSNR and 0.802 in SSIM, which indicates that simply using dual prompts is insufficient, and their interaction needs explicit regulation. We further examine two reduced variants by removing the negative or positive component. Removing the negative component yields 26.19 dB in PSNR and 0.817 in SSIM, while removing the positive component yields 26.15 dB in PSNR and 0.813 in SSIM. Both variants perform worse than the full loss formulation, showing that the two terms contribute jointly to learning distinct and useful prompt representations. This result is consistent with the design goal of CPL, which is to enforce functional disentanglement between illumination guidance and structure guidance. With the full contrastive prompt loss, CISPD achieves 26.53 dB in PSNR and 0.856 in SSIM, giving a gain of 0.44 dB in PSNR and 0.054 in SSIM over the variant without the contrastive constraint.
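The exact form of the contrastive prompt loss is given in the method section and is not reproduced here. Purely to illustrate how a positive term and a negative term interact in a contrastive objective, the sketch below implements a generic InfoNCE-style loss on plain vectors. It is a swapped-in, well-known formulation rather than the CPL itself; the function names, the temperature of 0.1, and the choice of anchor, positive, and negative embeddings are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: pull the positive closer, push negatives away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: a negative orthogonal to the anchor is cheap, while a
# negative that collapses onto the anchor (redundant guidance) is penalized.
loss_distinct = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_redundant = info_nce([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]])
```

In this toy setting the loss is larger when the negative embedding coincides with the anchor than when the two are orthogonal, which mirrors the stated goal of keeping the two prompt representations distinct.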

Effect of the number of IAPs. Figure 6 investigates the influence of the IAP pool size on iSAID-dark. As the pool size increases from a small scale, PSNR improves steadily and reaches its best value at a moderate pool size. When the pool is too small, the retrieved illumination cues are less diverse, which limits the ability of CISPD to handle spatially varying exposure. When the pool becomes excessively large, performance drops from the peak, indicating that enlarging the candidate set does not necessarily improve retrieval quality and can introduce less relevant prompts that weaken the guidance signal. Therefore, we adopt the pool size corresponding to the best performance in Figure 6 as the default setting.

Effect of the top-k selection. Figure 6 also studies the top-k selection used to construct IAPs from retrieved candidates. The results show that intermediate top-k values achieve the best performance. Using a very small top-k may miss cues required for correcting non-uniform brightness. Using an excessively large top-k tends to mix less relevant candidates, which reduces the selectivity of retrieval and weakens the illumination prior. This indicates that illumination adjustment in CISPD is practically governed by retrieval selectivity rather than by a fixed brightness threshold. Accordingly, we use the top-k value that achieves the best performance in Figure 6 as the default setting. Its cross-dataset stability is further validated on LOLv1 in Table 6.
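The role of retrieval selectivity can be illustrated with a minimal key-query retrieval sketch. A query vector, standing in for the latent context, is compared with the learnable keys by cosine similarity, the top-k entries are selected, and their prompts are fused. The softmax-weighted averaging, the vector-valued prompts, and the function names are assumptions made for illustration; the aggregation used in CISPD may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def retrieve_prompt(query, keys, prompts, k):
    """Select the k prompts whose keys best match the query and fuse them.

    Returns the selected pool indices (best first) and the fused prompt.
    Fusion here is a softmax-weighted average, which is an assumption.
    """
    scores = [cosine(query, key) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    fused = [
        sum(w / total * prompts[i][d] for w, i in zip(weights, top))
        for d in range(len(prompts[0]))
    ]
    return top, fused
```

With a small k, the fused prompt is dominated by the best-matching entries. As k approaches the pool size, every entry contributes and the fused prompt moves toward a pool-wide average, which is one way to read the loss of selectivity discussed above.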

Effect of the prompt integration strategy. We compare three prompt integration strategies on the iSAID-dark dataset. As shown in Table 8, direct input concatenation gives the weakest performance, which indicates that simple joint fusion cannot effectively separate illumination correction from structure preservation. Sequential fusion improves the results under both orders. Among them, the proposed IAP → SIP strategy achieves the best PSNR and SSIM, while the reverse order remains inferior. This result suggests that correcting illumination-related distortion before semantic-invariant structural refinement is more effective for low-light remote sensing image enhancement. Therefore, CISPD adopts IAP → SIP as the default integration strategy.

Cross-dataset validation of key hyperparameters. We further validate the two key hyperparameters on LOLv1 to examine whether the default settings selected on iSAID-dark remain stable across datasets. As shown in Table 6, the default IAP length 64 still achieves the best performance on LOLv1, while both shorter and longer prompts lead to inferior results. A similar trend is observed for top-k selection, where the default value also gives the best PSNR and SSIM. These results suggest that the selected hyperparameters are not specific to a single dataset and remain stable across both remote sensing and natural-image low-light enhancement benchmarks.

5. Conclusions

In this work, we studied low-light remote sensing image enhancement under the practical challenges of spatially varying illumination, sensor noise, and scene-dependent degradations, where preserving thin structures and boundary fidelity is critical for reliable remote sensing interpretation. To this end, we proposed CISPD, a complementary illumination–semantic prompt diffusion framework that reformulates enhancement as an iterative denoising process and introduces complementary prompt guidance to separate illumination correction from structure preservation. Concretely, CISPD retrieves a self-learned illumination-aware prompt from a learnable prompt pool conditioned on the current latent context, providing adaptive cues for correcting non-uniform underexposure. Meanwhile, a semantic-invariant prompt extracted from a vision foundation model supplies stable structural priors that help maintain geometric consistency and suppress artifacts after brightness correction. By injecting the two prompts sequentially along the diffusion trajectory, CISPD enables targeted exposure adjustment while retaining scene structures and fine details. In addition, we adopt a contrastive prompt constraint to prevent redundant guidance and encourage the two prompts to encode complementary information, which stabilizes refinement and improves structural fidelity.

Extensive experiments on a paired low-light remote sensing dataset and real nighttime remote sensing imagery demonstrate that CISPD consistently improves both reconstruction fidelity and perceptual quality, while producing more coherent region-wise correction and clearer structures in qualitative comparisons. We further validated the generalization of CISPD on standard paired natural-image datasets, showing that the proposed guidance mechanism is not limited to a single domain. Efficiency analysis indicates that CISPD has a moderate parameter count and requires fewer MACs than recent diffusion-based baselines. Ablation studies further confirm the roles of the illumination-aware and semantic-invariant prompts, the prompt design choices, and the contrastive constraint, and verify that appropriate prompt pool size and retrieval configuration are important for stable performance.

Current remote sensing evaluation is constrained by the available benchmark setting, which consists of one paired synthetic dataset and one real nighttime dataset without paired ground truth. Future work will extend the validation to broader remote sensing benchmarks and more diverse acquisition conditions. In addition, actual inference speed and memory usage will be further optimized for practical high-resolution processing. We also plan to explore more robust prompt retrieval under domain shifts and real sensor noise, so that the framework can better adapt to diverse acquisition settings without additional supervision.

Author Contributions

Conceptualization, Y.L. and Z.M.; methodology, H.G. and Z.M.; software, H.G. and Y.L.; investigation, Y.L.; resources, Z.M.; writing—original draft preparation, H.G. and Y.L.; writing—review and editing, Z.M. and L.S.; visualization, H.G. and Y.L.; supervision, Z.M. and L.S.; and funding acquisition, Z.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62276207.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yao, Z.; Fan, G.; Fan, J.; Gan, M.; Philip Chen, C.L. Spatial–Frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4706516.
Zhang, F.; Tu, Z.; Hao, W.; Chen, Y.; Li, F.; Ye, M. Zero-Shot Parameter Learning Network for Low-Light Image Enhancement in Permanently Shadowed Regions. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5630216.
Zhao, X.; Huang, L.; Li, M.; Han, C.; Nie, T. Atmospheric Scattering Model and Non-Uniform Illumination Compensation for Low-Light Remote Sensing Image Enhancement. Remote Sens. 2025, 17, 2069.
Wang, T.; Zhang, K.; Shen, T.; Luo, W.; Stenger, B.; Lu, T. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method. Proc. AAAI Conf. Artif. Intell. 2023, 37, 2654–2662.
Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2023; pp. 12470–12479.
Wu, J.; Ai, H.; Zhou, P.; Wang, H.; Zhang, H.; Zhang, G.; Chen, W. Low-Light Image Dehazing and Enhancement via Multi-Feature Domain Fusion. Remote Sens. 2025, 17, 2944.
Lv, X.; Zhang, S.; Wang, C.; Zheng, Y.; Zhong, B.; Li, C.; Nie, L. Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 25378–25388.
Jiang, H.; Luo, A.; Liu, X.; Han, S.; Liu, S. Lightendiffusion: Unsupervised low-light image enhancement with latent-retinex diffusion models. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 161–179.
Huang, Y.; Liao, X.; Liang, J.; Quan, Y.; Shi, B.; Xu, Y. Zero-shot low-light image enhancement via latent diffusion models. Proc. AAAI Conf. Artif. Intell. 2025, 39, 3815–3823.
Wang, T.; Zhang, K.; Zhang, Y.; Luo, W.; Stenger, B.; Lu, T.; Kim, T.K.; Liu, W. LLDiffusion: Learning degradation representations in diffusion models for low-light image enhancement. Pattern Recognit. 2025, 166, 111628.
Lan, G.; Ma, Q.; Yang, Y.; Wang, Z.; Wang, D.; Li, X.; Zhao, B. Efficient Diffusion as Low Light Enhancer. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2025; pp. 21277–21286.
Lin, Y.; Ye, T.; Chen, S.; Fu, Z.; Wang, Y.; Chai, W.; Xing, Z.; Li, W.; Zhu, L.; Ding, X. AGLLDiff: Guiding diffusion models towards unsupervised training-free real-world low-light image enhancement. Proc. AAAI Conf. Artif. Intell. 2025, 39, 5307–5315.
Ai, Y.; Huang, H.; Zhou, X.; Wang, J.; He, R. Multimodal Prompt Perceiver: Empower Adaptiveness Generalizability and Fidelity for All-in-One Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 25432–25444.
Qi, C.; Tu, Z.; Ye, K.; Delbracio, M.; Milanfar, P.; Chen, Q.; Talebi, H. SPIRE: Semantic Prompt-Driven Image Restoration. In Proceedings of the Computer Vision—ECCV 2024: 18th European Conference, Milan, Italy, 29 September–4 October 2024, Proceedings, Part XL; Springer: Cham, Switzerland, 2024; pp. 446–464.
Zhou, S.; Pan, J.; Shi, J.; Chen, D.; Qu, L.; Yang, J. Seeing the unseen: A frequency prompt guided transformer for image restoration. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 246–264.
Luo, Z.; Gustafsson, F.K.; Zhao, Z.; Sjölund, J.; Schön, T.B. Controlling Vision-Language Models for Multi-Task Image Restoration. In Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, 7–11 May 2024.
Kong, D.; Li, F.; Wang, Z.; Xu, J.; Pei, R.; Li, W.; Ren, W. Dual prompting image restoration with diffusion transformers. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 12809–12819.
Garber, T.; Tirer, T. Image restoration by denoising diffusion models with iteratively preconditioned guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 25245–25254.
Thaker, D.; Goyal, A.; Vidal, R. Frequency-guided posterior sampling for diffusion-based image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2025; pp. 12873–12882.
Cao, J.; Shi, Y.; Zhang, K.; Zhang, Y.; Timofte, R.; Van Gool, L. Deep Equilibrium Diffusion Restoration with Parallel Sampling. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 2824–2834.
Chihaoui, H.; Favaro, P. Diffusion Image Prior. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2025; pp. 24636–24644.
Ye, T.; Chen, S.; Chai, W.; Xing, Z.; Qin, J.; Lin, G.; Zhu, L. Learning Diffusion Texture Priors for Image Restoration. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 2524–2534.
Liu, J.; Wang, Q.; Fan, H.; Wang, Y.; Tang, Y.; Qu, L. Residual Denoising Diffusion Models. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 2773–2783.
Lin, X.; He, J.; Chen, Z.; Lyu, Z.; Dai, B.; Yu, F.; Qiao, Y.; Ouyang, W.; Dong, C. Diffbir: Toward blind image restoration with generative diffusion prior. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 430–448.
Nathan, O.B.; Levy, D.; Treibitz, T.; Rosenbaum, D. Osmosis: Rgbd diffusion prior for underwater image restoration. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 302–319.
Chen, J.; Pan, J.; Dong, J. Faithdiff: Unleashing diffusion priors for faithful image super-resolution. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 28188–28197.
Yue, Z.; Liao, K.; Loy, C.C. Arbitrary-steps image super-resolution via diffusion inversion. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 23153–23163.
Zhang, L.; You, W.; Shi, K.; Gu, S. Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 17980–17989.
Chen, B.; Li, G.; Wu, R.; Zhang, X.; Chen, J.; Zhang, J.; Zhang, L. Adversarial diffusion compression for real-world image super-resolution. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 28208–28220.
Wu, C.; Kong, Q.; Zhao, P.; Yang, W.; Ma, W.; Tang, F.; Jiang, Z.; Zhou, S.K. Equivariant Sampling for Improving Diffusion Model-based Image Restoration. arXiv 2025, arXiv:2511.09965.
Tu, Y.; Yan, Q.; Niu, A.; Tang, J. TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration. arXiv 2026, arXiv:2601.20306.
Liu, Y.; Li, D.; Fu, X.; Lu, X.; Huang, J.; Zha, Z.J. UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 23121–23130.
Lin, J.; Zhang, Z.; Wei, Y.; Ren, D.; Jiang, D.; Tian, Q.; Zuo, W. Improving image restoration through removing degradations in textual representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 2866–2878.
Jiang, Y.; Zhang, Z.; Xue, T.; Gu, J. Autodir: Automatic all-in-one image restoration with latent diffusion. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 340–359.
Chen, S.; Ye, T.; Zhang, K.; Xing, Z.; Lin, Y.; Zhu, L. Teaching tailored to talent: Adverse weather restoration via prompt pool and depth-anything constraint. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 95–115.
Luo, W.; Qin, H.; Chen, Z.; Wang, L.; Zheng, D.; Li, Y.; Liu, Y.; Li, B.; Hu, W. Visual-Instructed Degradation Diffusion for All-in-One Image Restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 12764–12777.
Wang, C.; Fan, H.; Yang, H.; Karimi, S.; Yao, L.; Yang, Y. Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 23539–23550.
Wu, Z.; Chen, Y.; Yokoya, N.; He, W. MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration. arXiv 2025, arXiv:2503.09131.
Wang, C.; Pan, J.; Wang, L.; Wang, W.; Yang, Y. Neural Discrimination-Prompted Transformers for Efficient UHD Image Restoration and Enhancement. Int. J. Comput. Vis. 2026, 134.
Wu, W.; Weng, J.; Zhang, P.; Wang, X.; Yang, W.; Jiang, J. URetinex-Net: Retinex-based Deep Unfolding Network for Low-light Image Enhancement. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 5891–5900.
Fu, Z.; Yang, Y.; Tu, X.; Huang, Y.; Ding, X.; Ma, K.K. Learning a Simple Low-Light Image Enhancer from Paired Low-Light Instances. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2023; pp. 22252–22261.
Yang, S.; Ding, M.; Wu, Y.; Li, Z.; Zhang, J. Implicit Neural Representation for Cooperative Low-light Image Enhancement. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2023; pp. 12872–12881.
Liu, M.; Cui, Y.; Ren, W.; Zhou, J.; Knoll, A.C. LIEDNet: A Lightweight Network for Low-Light Enhancement and Deblurring. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 6602–6615.
Xing, L.; Qu, H.; Xu, S.; Tian, Y. CLEGAN: Toward Low-Light Image Enhancement for UAVs via Self-Similarity Exploitation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5610714.
Ke, K.C.; Sun, M.; Wang, X.; Liu, D.; Yang, H. HDCGAN+: A Low-Illumination UAV Remote Sensing Image Enhancement and Evaluation Method Based on WPID. Remote Sens. 2026, 18, 999.
Frants, V.; Agaian, S.; Panetta, K.; Grigoryan, A. QWR-Dec-Net: A Quaternion-Wavelet Retinex Framework for Low-Light Image Enhancement with Applications to Remote Sensing. Information 2026, 17, 89.
Siméoni, O.; Vo, H.V.; Seitzer, M.; Baldassarre, F.; Oquab, M.; Jose, C.; Khalidov, V.; Szafraniec, M.; Yi, S.; Ramamonjisoa, M.; et al. DINOv3. arXiv 2025, arXiv:2508.10104.
Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. In Proceedings of the British Machine Vision Conference; British Machine Vision Association: Durham, UK, 2018.
Yang, W.; Wang, S.; Fang, Y.; Wang, Y.; Liu, J. From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 3063–3072.
Wang, C.; Wu, H.; Jin, Z. FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information. In Proceedings of the 31st ACM International Conference on Multimedia; ACM: New York, NY, USA, 2023; pp. 7459–7469.
Zheng, N.; Zhou, M.; Dong, Y.; Rui, X.; Huang, J.; Li, C.; Zhao, F. Empowering Low-Light Image Enhancer through Customized Learnable Priors. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2023; pp. 12525–12535.
Yang, K.F.; Cheng, C.; Zhao, S.X.; Yan, H.M.; Zhang, X.S.; Li, Y.J. Learning to Adapt to Light. Int. J. Comput. Vis. 2023, 131, 1022–1041.
Ma, L.; Ma, T.; Liu, R.; Fan, X.; Luo, Z. Toward Fast, Flexible, and Robust Low-Light Image Enhancement. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 5627–5636. [Google Scholar] [CrossRef]
Wang, W.; Yang, H.; Fu, J.; Liu, J. Zero-reference low-light enhancement via physical quadruple priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 26057–26066. [Google Scholar]
Liu, Y.; Huang, T.; Dong, W.; Wu, F.; Li, X.; Shi, G. Low-light image enhancement with multi-stage residue quantization and brightness-aware attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2023; pp. 12140–12149. [Google Scholar]
Liu, R.; Ma, L.; Zhang, J.; Fan, X.; Luo, Z. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 10561–10570. [Google Scholar]
Yan, Q.; Feng, Y.; Zhang, C.; Pang, G.; Shi, K.; Wu, P.; Dong, W.; Sun, J.; Zhang, Y. Hvi: A new color space for low-light image enhancement. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 5678–5687. [Google Scholar]
Zhang, T.; Liu, P.; Lu, Y.; Cai, M.; Zhang, Z.; Zhang, Z.; Zhou, Q. Cwnet: Causal wavelet network for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2025; pp. 8789–8799. [Google Scholar]
Xu, R.; Niu, Y.; Li, Y.; Xu, H.; Liu, W.; Chen, Y. URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 21267–21276. [Google Scholar]
Li, C.; Guo, C.; Loy, C.C. Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4225–4238. [Google Scholar] [CrossRef]
Xu, X.; Wang, R.; Fu, C.W.; Jia, J. Snr-aware low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 17714–17724. [Google Scholar]
Yi, X.; Xu, H.; Zhang, H.; Tang, L.; Ma, J. Diff-Retinex++: Retinex-Driven Reinforced Diffusion Model for Low-Light Image Enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 6823–6841. [Google Scholar] [CrossRef] [PubMed]
He, C.; Fang, C.; Zhang, Y.; Ye, T.; Li, K.; Tang, L.; Guo, Z.; Li, X.; Farsiu, S. Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model. In Proceedings of the Thirteenth International Conference on Learning Representations; OpenReview.net: Singapore, 2025. [Google Scholar]
Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. Enlightengan: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef]
Wang, Y.; Wan, R.; Yang, W.; Li, H.; Chau, L.P.; Kot, A. Low-light image enhancement with normalizing flow. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2022; Volume 36, pp. 2604–2612. [Google Scholar]
Xu, X.; Wang, R.; Lu, J. Low-light image enhancement via structure modeling and guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 9893–9903. [Google Scholar]
Hou, J.; Zhu, Z.; Hou, J.; Liu, H.; Zeng, H.; Yuan, H. Global structure-aware diffusion process for low-light image enhancement. Adv. Neural Inf. Process. Syst. 2023, 36, 79734–79747. [Google Scholar]
Zhou, D.; Yang, Z.; Yang, Y. Pyramid diffusion models for low-light image enhancement. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence; IJCAI Press: Macao SAR, China, 2023. [Google Scholar] [CrossRef]
Li, G.; Zhao, B.; Li, X. Low-light image enhancement with sam-based structure priors and guidance. IEEE Trans. Multimed. 2024, 26, 10854–10866. [Google Scholar] [CrossRef]
Xu, R.; Li, Y.; Niu, Y.; Xu, H.; Chen, Y.; Zhao, T. Bilateral interaction for local-global collaborative perception in low-light image enhancement. IEEE Trans. Multimed. 2024, 26, 10792–10804. [Google Scholar] [CrossRef]
Liao, M.; Dong, H.; Wang, X.; Ubul, K.; Shao, Y.; Yan, Z. GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2025; pp. 8766–8776. [Google Scholar]
Li, C.; Su, H.; Tan, X.; Zhang, X.; Ma, L. WV-LUT: Wide Vision Lookup Tables for Real-Time Low-Light Image Enhancement. IEEE Trans. Multimed. 2025, 27, 4441–4453. [Google Scholar] [CrossRef]
Zeng, X.; Zhu, L.; Yang, W.; Leung, H.; Wang, S.; Kwong, S. Low-Light Image Enhancement via Diffusion Models With Semantic Priors of Any Region. IEEE Trans. Circuits Syst. Video Technol. 2026, 36, 3754–3767. [Google Scholar] [CrossRef]

Figure 1. The architecture of our proposed method. The denoising UNet is guided by two consecutive cross-attention stages. The illumination-aware prompt is injected first for illumination correction. The semantic-invariant prompt is injected afterward for structure preservation.
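
The two-stage injection described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: it uses single-head residual cross-attention, flattens UNet features to a token matrix, and all names (`cross_attention`, `sequential_prompt_injection`), dimensions, and random weights are hypothetical. Only the injection order (illumination-aware prompt first, semantic-invariant prompt second) comes from the caption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feats, prompt, w_q, w_k, w_v):
    """Residual single-head cross-attention (assumed form).

    feats:  (N, d) flattened UNet feature tokens (queries)
    prompt: (M, d) prompt tokens (keys and values)
    """
    q = feats @ w_q
    k = prompt @ w_k
    v = prompt @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M)
    return feats + attn @ v

def sequential_prompt_injection(feats, iap, sip, params_iap, params_sip):
    # Stage 1: illumination-aware prompt (IAP) for illumination correction.
    h = cross_attention(feats, iap, *params_iap)
    # Stage 2: semantic-invariant prompt (SIP) for structure preservation.
    return cross_attention(h, sip, *params_sip)

rng = np.random.default_rng(0)
d = 16                                  # hypothetical feature width
feats = rng.normal(size=(10, d))        # hypothetical: 10 spatial tokens
iap = rng.normal(size=(64, d))          # 64 tokens, matching the IAP length in Table 5
sip = rng.normal(size=(32, d))          # hypothetical SIP token count
params_iap = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
params_sip = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]

out = sequential_prompt_injection(feats, iap, sip, params_iap, params_sip)
print(out.shape)
```

Because the second stage queries the output of the first, swapping the two stages generally gives a different result, which is the design choice examined in Table 8.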

Figure 2. Visual comparison on the iSAID-dark dataset. The compared methods are FourLLIE [50], CUE [51], LANet [52], LLFormer [4], NeRCo [42], and SCI [53].

Figure 3. Visual comparison on the darkrs dataset. The compared methods are FourLLIE [50], CUE [51], LANet [52], LLFormer [4], NeRCo [42], and SCI [53].

Figure 4. Visual comparison on the LOLv1 dataset. The compared methods are LLFormer [4], QuadPrior [54], RetinexFormer [5], RQ-LLIE [55], and RUAS [56].

Figure 5. Visual comparison on the LOLv2-Real dataset. The compared methods are HVI [57], ReDDiT [11], QuadPrior [54], CWNet [58], and URWKV [59].

Figure 6. Ablation study of the top-k value and the number of IAPs on the iSAID-dark dataset.
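
To make the two ablated quantities concrete, the following sketch shows one plausible way a top-k retrieval from a learnable prompt pool could work. It is an assumption-laden illustration rather than the paper's procedure: the cosine-similarity scoring, the concatenation of the selected prompts, the pool size of 8, and the function name `retrieve_prompts` are all hypothetical. Only the existence of learnable keys, a top-k choice (k = 4), and an IAP length of 64 are taken from the tables.

```python
import numpy as np

def retrieve_prompts(query, keys, pool, k=4):
    """Select the k prompts whose keys best match the query (assumed scheme).

    query: (d,)      pooled latent context
    keys:  (P, d)    one learnable key per prompt in the pool
    pool:  (P, L, d) P prompts, each with L tokens
    Returns the concatenated tokens (k * L, d) and the chosen indices.
    """
    q = query / (np.linalg.norm(query) + 1e-8)
    kn = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    scores = kn @ q                      # cosine similarity per key, shape (P,)
    top = np.argsort(-scores)[:k]        # indices of the k most similar keys
    selected = pool[top]                 # (k, L, d)
    return selected.reshape(-1, pool.shape[-1]), top

rng = np.random.default_rng(0)
num_prompts, length, d = 8, 64, 16       # pool size and width are hypothetical
keys = rng.normal(size=(num_prompts, d))
pool = rng.normal(size=(num_prompts, length, d))
query = keys[2].copy()                   # a query identical to key 2

tokens, chosen = retrieve_prompts(query, keys, pool, k=4)
print(tokens.shape, chosen[0])
```

In this toy setup the query equals key 2, so that prompt ranks first; in training, both keys and prompts would be learned so that similar illumination contexts retrieve similar prompts.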

Table 1. Quantitative comparison of the proposed method with state-of-the-art LLIE methods on the paired iSAID-dark dataset under direct evaluation and retraining settings. The best and second-best results in each metric column are marked in red and blue. ↑ indicates higher is better, and ↓ indicates lower is better.

Methods	Venue	iSAID-Dark PSNR↑	iSAID-Dark SSIM↑	iSAID-Dark LPIPS↓	iSAID-Dark (Retrain) PSNR↑	iSAID-Dark (Retrain) SSIM↑	iSAID-Dark (Retrain) LPIPS↓
Zero-DCE++ [60]	TPAMI22	12.07	0.04	0.818	8.46	0.055	0.95
SCI [53]	CVPR22	12.14	0.194	0.901	13.08	0.205	0.745
SNR [61]	CVPR22	13.31	0.246	0.722	23.17	0.76	0.179
LLFormer [4]	AAAI23	16.66	0.376	0.658	23.11	0.725	0.221
CUE [51]	ICCV23	15.7	0.343	0.717	21.47	0.682	0.263
FourLLIE [50]	ACM MM23	14.49	0.35	0.579	19.78	0.643	0.309
LANet [52]	IJCV23	11.88	0.089	0.804	18.72	0.695	0.231
NeRCo [42]	ICCV23	16.73	0.513	0.381	18.79	0.604	0.326
Diff-Retinex++ [62]	TPAMI25	18.05	0.551	0.392	25.08	0.809	0.141
Reti-Diff [63]	ICLR25	18.44	0.579	0.386	25.27	0.821	0.129
Ours	-	21.51	0.707	0.404	26.53	0.856	0.101

Table 2. Quantitative comparison of the proposed method with state-of-the-art LLIE methods on the paired LOLv1 and LOLv2 datasets. The best and second-best results in each metric column are marked in red and blue. ↑ indicates higher is better, and ↓ indicates lower is better.

Methods	Venue	LOLv1 PSNR↑	LOLv1 SSIM↑	LOLv1 FID↓	LOLv2-Real PSNR↑	LOLv2-Real SSIM↑	LOLv2-Real FID↓	LOLv2-Syn PSNR↑	LOLv2-Syn SSIM↑	LOLv2-Syn FID↓
RUAS [56]	CVPR21	18.23	0.723	127.60	18.27	0.723	151.62	16.55	0.652	91.60
EnGAN [64]	TIP21	17.48	0.656	153.98	18.23	0.617	173.28	16.57	0.734	93.66
SNR-Net [61]	CVPR22	24.61	0.842	66.47	21.48	0.849	68.56	24.14	0.928	30.52
LLFlow [65]	AAAI22	21.13	0.852	65.17	17.43	0.831	70.68	24.81	0.919	20.24
SMG [66]	CVPR23	24.82	0.838	69.47	22.62	0.857	71.76	25.62	0.905	23.36
RetinexFormer [5]	ICCV23	25.16	0.845	72.38	22.80	0.840	79.58	25.67	0.930	22.78
RQ-LLIE [55]	ICCV23	25.24	0.855	53.32	22.37	0.854	68.89	25.54	0.940	20.86
LLFormer [4]	AAAI23	25.75	0.823	76.96	20.05	0.792	70.16	24.03	0.909	24.58
GSAD [67]	NeurIPS23	23.23	0.852	51.64	20.19	0.847	46.77	24.22	0.927	19.24
PyDiff [68]	IJCAI23	27.09	0.930	-	24.01	0.876	-	19.60	0.878	-
QuadPrior [54]	CVPR24	22.84	0.800	80.19	20.59	0.811	69.95	16.10	0.758	76.19
SGF [69]	TMM24	24.97	0.847	-	22.05	0.850	-	25.92	0.938	-
BiFormer [70]	TMM24	24.03	0.856	-	22.93	0.860	-	24.81	0.928	-
GM-MoE [71]	ICCV25	26.66	0.857	-	23.65	0.806	-	26.30	0.937	-
Diff-Retinex++ [62]	TPAMI25	24.67	0.867	50.77	23.41	0.872	49.60	26.06	0.944	15.80
Reti-Diff [63]	ICLR25	25.35	0.866	49.14	22.97	0.858	43.18	25.75	0.958	13.26
WV-LUT [72]	TMM25	23.31	0.833	-	23.27	0.856	-	24.37	0.911	-
CUGD [73]	TCSVT26	-	-	-	23.11	0.892	-	25.83	0.956	-
Ours	-	27.27	0.921	41.02	23.31	0.888	48.50	27.53	0.962	19.34

Table 3. Efficiency analysis of the diffusion-based methods.

Metrics	Diff-Retinex++ [62]	PyDiff [68]	GSAD [67]	Ours
Param (M)	56.88	97.89	17.17	70.04
MACs (G)	396.32	459.69	1340.63	119.64

Table 4. Ablation study of IAPs and SIPs on the iSAID-dark dataset.

Method	PSNR	SSIM
w/o. IAPs	26.32	0.821
w/o. learnable keys of IAPs	26.39	0.826
w/o. SIPs	26.24	0.818
w. SIPs and IAPs (Ours)	26.53	0.856

Table 5. Ablation study of the length of IAPs on the iSAID-dark dataset.

Length	PSNR	SSIM
32	26.31	0.829
64 (Ours)	26.53	0.856
128	26.33	0.831

Table 6. Cross-dataset validation of the IAP length and top-k on the LOLv1 dataset.

Setting	Value	PSNR	SSIM
IAP Length	32	27.08	0.916
IAP Length	64	27.27	0.921
IAP Length	128	27.11	0.917
Top-k	3	27.14	0.918
Top-k	4	27.27	0.921
Top-k	5	27.21	0.919

Table 7. Ablation study of the contrastive prompt loss on the iSAID-dark dataset.

Method	PSNR	SSIM
w/o. CPL	26.09	0.802
w/o. Negative	26.19	0.817
w/o. Positive	26.15	0.813
w. CPL (Ours)	26.53	0.856

Table 8. Ablation study of different prompt integration strategies on the iSAID-dark dataset.

Integration Strategy	PSNR	SSIM
Input Concatenation	25.71	0.827
Sequential Fusion SIP → IAP	26.24	0.842
Sequential Fusion IAP → SIP (Ours)	26.53	0.856

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

