Article

Weather-Corrupted Image Enhancement with Removal-Raindrop Diffusion and Mutual Image Translation Modules

School of Electronic and Electrical Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3176; https://doi.org/10.3390/math13193176
Submission received: 26 August 2025 / Revised: 29 September 2025 / Accepted: 1 October 2025 / Published: 3 October 2025
(This article belongs to the Special Issue Deep Learning in Image Processing and Scientific Computing)

Abstract

Artificial intelligence-based image processing is critical for sensor fusion and image transformation in mobility systems. Advanced driver assistance functions such as forward monitoring and digital side mirrors are essential for driving safety. Degradation due to raindrops, fog, and high-dynamic range (HDR) imbalance caused by lighting changes impairs visibility and reduces object recognition and distance estimation accuracy. This paper proposes a diffusion framework to enhance visibility under multi-degradation conditions. The denoising diffusion probabilistic model (DDPM) offers more stable training and high-resolution restoration than generative adversarial networks. However, the DDPM relies on large-scale paired datasets, which are difficult to obtain in raindrop scenarios. This framework applies the Palette diffusion model, comprising data augmentation and raindrop-removal modules. The data augmentation module generates raindrop image masks and learns inpainting-based raindrop synthesis. Synthetic masks simulate raindrop patterns and HDR imbalance scenarios. The raindrop-removal module reconfigures the Palette architecture for image-to-image translation, incorporating the augmented synthetic dataset for raindrop removal learning. Loss functions and normalization strategies improve restoration stability and removal performance. During inference, the framework operates with a single conditional input, and an efficient sampling strategy is introduced to significantly accelerate the process. In post-processing, tone adjustment and chroma compensation enhance visual consistency. The proposed method preserves fine structural details and outperforms existing approaches in visual quality, improving the robustness of vision systems under adverse conditions.

1. Introduction

Artificial intelligence-based vision systems are critical for advanced driver assistance technology, including forward monitoring and digital side mirrors (DSMs), and are essential for ensuring visibility and driving safety. However, Figure 1 illustrates that these systems are often exposed to visual degradation caused by environmental factors, such as rain, fog, and abrupt illumination changes.
In particular, raindrops on camera lenses and blurring effects induced by fog significantly reduce the accuracy of object detection and distance estimation. Furthermore, inadequate high-dynamic range (HDR) processing under extreme lighting conditions, such as day–night transitions or in tunnels, can result in severe exposure imbalance [1,2,3].
Conventional studies have independently applied raindrop removal, dehazing, and HDR correction techniques to address these problems [3]. However, these approaches are insufficient for handling the compound degradations that occur concurrently in real-world driving environments. Traditional physics-based restoration methods fail to recover nonlinear distortions and global luminance loss effectively, and their performance deteriorates under combined degradation conditions.
Recently, among probabilistic generative models that have been gaining attention in deep learning, the denoising diffusion probabilistic model (DDPM) [4] has been highly regarded for its training stability and superior structural detail preservation compared to generative adversarial network (GAN)-based methods [5]. The DDPM employs a Markov chain to transform the original image gradually into Gaussian noise using a forward process while learning a reverse process to restore the original image [6]. Through these forward and reverse processes, the DDPM achieves high-quality restoration even for severely degraded images, demonstrating robust performance in conditional synthesis and image translation tasks. However, achieving such performance requires large-scale paired datasets. These constraints have been repeatedly highlighted in recent studies [7]. In practice, collecting aligned datasets that capture raindrop degradation in real driving environments is physically challenging due to variations in lens contamination, illumination, viewing angles, and time of day. Such uncontrollable factors have been noted to cause misaligned paired data and oversimplified raindrop patterns in existing datasets, restricting the generalizability of the model and reducing restoration quality in real-world scenarios [8]. Furthermore, many public datasets are biased toward daytime scenes and background-focused views, which induce distribution shift and further simplify the learned patterns [9].
To address these limitations, this study proposes a multi-degradation restoration framework based on the Palette diffusion model [10]. The proposed framework comprises three components: a data augmentation module, a raindrop removal module, and a post-processing stage. First, the data augmentation module generates synthetic data to compensate for the scarcity of real-world samples [11]. This approach enables the creation of diverse training data reflecting raindrop patterns and HDR imbalance, enhancing model generalization. Second, the raindrop removal module optimizes the Palette architecture for image-to-image translation, incorporates the augmented dataset for training, and applies smooth L1 loss [12] along with batch normalization (BN) [13] to improve stability and preserve structural details. Additionally, the denoising diffusion implicit model (DDIM) sampling strategy [14] is adopted to accelerate inference, enabling nearly real-time responsiveness for DSMs and forward-monitoring systems.
Finally, this work incorporates a post-processing stage to ensure the perceptual naturalness of the restored images. This stage employs a mutual image translation module (MITM) [15] for HDR tone correction and color balance adjustments, combined with blending strategies and chroma compensation to prevent excessive luminance variation and color loss. This comprehensive approach removes raindrops and fog and enables HDR restoration, ensuring reliable visibility across diverse weather conditions. Experimental results demonstrate that the proposed framework outperforms existing methods in restoration accuracy and visual quality.

2. Related Work

2.1. GAN-Based Methods and Emergence of Diffusion Models

Early studies on raindrop removal have focused on architectures using convolutional neural networks (CNNs) [16] or GANs [17]. These methods are typically trained on low-resolution images or scenes with simple backgrounds and incorporate various architectural mechanisms to address localized visual degradation caused by raindrops. For example, AttentiveGAN [18] employs an attention module that assigns higher weights to raindrop-affected regions and applies contextual information from surrounding areas for restoration. Similarly, UnfairGAN [19] combines edge priors with mask-based attention to improve restoration performance in severely degraded regions.
However, the GAN relies on adversarial learning between the generator and discriminator, leading to unstable optimization and high sensitivity to normalization strategies. Excessive regularization or dominant discriminator bias can cause mode collapse or produce repetitive and unrealistic outputs. Consequently, restored images often suffer from reduced realism and structural fidelity, which is detrimental in raindrop removal tasks where fine visual details must be preserved.
The DDPM has emerged as a promising alternative to overcome these limitations. The DDPM adopts a probabilistic approach that progressively adds noise to clean images and learns to reverse this process, enabling high-quality image generation. The DDPM is grounded in an explicit likelihood function; hence, it offers greater training stability and more predictable convergence than the GAN [20]. Furthermore, its ability to estimate noise accurately at each timestep enhances the restoration quality. These characteristics have allowed DDPM to surpass GAN-based methods in a range of low-level vision tasks, including super-resolution, inpainting, and colorization.
Figure 2 illustrates the overall workflow of DDPM, comprising forward and reverse processes [4]. In the forward process, Gaussian noise is gradually added to the original image over multiple timesteps until it becomes nearly pure noise. Conversely, the reverse process aims to reconstruct the original image by iteratively removing noise via a learned denoising network.
The forward process $q$ is defined as a one-step conditional distribution in which, at each step $t$, Gaussian noise with variance $\beta_t$ is injected to progressively destroy structural information:
$$ q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(\sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right), \qquad (1) $$
where $q(x_t \mid x_{t-1})$ denotes the forward conditional probability of transitioning from step $t-1$ to $t$. The notation $\mathcal{N}(\mu, \Sigma)$ represents a normal distribution with mean $\mu$ and covariance $\Sigma$. The parameter $\beta_t$ specifies the variance schedule that determines the noise magnitude at step $t$, and $\mathbf{I}$ indicates the identity matrix. In addition, $\sqrt{1-\beta_t}$ serves as an energy preservation factor that regulates noise injection.
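To make the forward process concrete, the following minimal NumPy sketch repeatedly applies the one-step transition of Equation (1) to a normalized image; the linear $\beta$ schedule and the 1000 timesteps are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng=np.random.default_rng()):
    """One forward step of Eq. (1): q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

# Illustrative linear variance schedule (assumed, not the configuration used in this work).
T = 1000
betas = np.linspace(1e-4, 0.02, T)

x = np.random.rand(3, 64, 64)         # stand-in for a normalized clean image x_0
for t in range(T):
    x = forward_step(x, betas[t])     # x gradually approaches pure Gaussian noise
```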
During the reverse process, the injected noise is removed by modeling the conditional probability distribution for estimating the previous state x t 1 at each step t . This distribution is approximated by:
$$ p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(\mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\right), \qquad (2) $$
where the probability $p_\theta(x_{t-1} \mid x_t)$ represents the reverse conditional distribution for restoring from step $t$ to $t-1$, with $\mu_\theta(x_t, t)$ serving as the predicted mean computed from the estimated noise. The term $\sigma_t^2 \mathbf{I}$ denotes the fixed variance at step $t$, which is predefined by the schedule rather than learned during training.
The DDPM learning objective minimizes the discrepancy between the true noise $\epsilon$ injected during the forward process and the predicted noise $\epsilon_\theta(x_t, t)$, formulated as follows:
$$ L_{\mathrm{simple}} = \mathbb{E}_{x_0, t, \epsilon}\!\left[\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|_2^2\right], \qquad (3) $$
where the loss function $L_{\mathrm{simple}}$ measures the squared $L_2$ distance between the actual noise vector $\epsilon$ and the predicted noise $\epsilon_\theta(x_t, t)$. The expectation $\mathbb{E}_{x_0, t, \epsilon}$ is taken over the original image, timestep, and noise samples. The term $\epsilon_\theta(x_t, t)$ indicates the noise prediction output by the model at the current noisy state $x_t$, and $\|\cdot\|_2^2$ corresponds to the squared $L_2$ distance, that is, the mean squared error (MSE) between the predicted and actual noise.
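As an illustration, a minimal PyTorch sketch of this training objective is given below; `model` is a placeholder noise-prediction network, and the closed-form corruption $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ follows from iterating Equation (1).

```python
import torch

def ddpm_loss(model, x0, alphas_bar):
    """L_simple (Eq. (3)): MSE between the injected noise and the model's prediction."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_bar), (b,), device=x0.device)      # random timesteps
    a_bar = alphas_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)                                         # true noise epsilon
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps               # closed-form corruption of x_0
    eps_pred = model(x_t, t)                                           # predicted noise epsilon_theta(x_t, t)
    return torch.mean((eps - eps_pred) ** 2)
```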
Despite these advantages, applying the DDPM to raindrop removal presents two significant limitations. First, constructing a high-quality paired raindrop dataset is challenging. In real-world environments, lens contamination, illumination changes, camera positions, and timing make it physically challenging to obtain well-aligned paired datasets. Second, the sampling process for the DDPM requires hundreds to thousands of iterative steps, resulting in high computational cost that hinders real-time deployment in vision systems.
Therefore, applying the DDPM for raindrop removal requires a domain-specific framework that addresses the difficulty of dataset construction and the computational burden of inference. Given the importance of preserving fine structural details, there has been increased demand for diffusion-based solutions that optimally balance visual fidelity, robustness, and computational efficiency.

2.2. Palette Diffusion Model

The Palette diffusion model was developed for conditional image restoration tasks, including inpainting, uncropping, colorization, and JPEG artifact removal. Palette adopts the DDPM to alleviate the training instability and mode collapse common in GAN-based architectures, achieving structural generality and high-fidelity restoration. The model employs a U-Net–based encoder–decoder structure integrated with self-attention modules, enabling stable optimization and flexible conditional generation. Explicit conditional inputs produce visually coherent and natural output, and the sampling-based generative capability facilitates synthesizing perceptually smooth and realistic images across diverse restoration tasks.
Despite these strengths, Palette presents several limitations in restoration tasks requiring spatially localized processing and fine structural precision, such as raindrop removal. First, as the default loss function, MSE supports pixelwise optimization but is insufficient for preserving sharp edges and distinct object boundaries. Thus, structural details in degraded regions are often lost, and object contours appear blurred. Second, the native architecture of Palette is primarily designed for stylistic restoration driven by conditional input, lacking task-specific priors necessary for detecting and reconstructing complex distortions, such as semi-transparent and spatially distributed raindrop artifacts. Therefore, the ability of Palette to isolate and restore localized degradations is restricted.
Furthermore, although Palette excels at producing perceptually natural output under diverse types of conditional input, this strength stems from its generative nature, which prioritizes visual plausibility over structural fidelity. Even when ground truth (GT) data are available, the model tends to emphasize aesthetics rather than precise restoration accuracy. Additionally, DDPM-based sampling requires hundreds or more iterative steps, resulting in prohibitively long inference times, making real-time raindrop removal applications impractical [21].
Several enhancement strategies have recently been introduced to mitigate these challenges. For example, SDEdit [22] applies stochastic differential equations to improve restoration accuracy under partial conditioning, and UNIT-DDPM [23] imposes latent space consistency constraints to maintain structural coherence in unpaired image translation. These efforts indicate that diffusion-based models alone are insufficient for structure-oriented restoration and underscore the increasing need to integrate structural priors, attention mechanisms, and mask-guided learning strategies. However, such approaches remain highly sensitive to mask quality and initial noise configurations while posing the risk of overfitting to specific domains. Thus, diffusion-based restoration models still face constraints in domain-specific applications that demand localized degradation handling and structural accuracy. In this context, effective deployment requires a refined interpretation of conditional signals and a customized architectural design, with diffusion models that jointly ensure structural robustness and generative flexibility emerging as a critical research direction.

3. Proposed Method

This paper proposes a two-stage diffusion-based framework to mitigate complex degradation problems caused by adverse weather conditions in environments with forward monitoring and DSMs, particularly severe visual distortions induced by raindrops on camera lenses. Figure 3 illustrates the overall architecture of the proposed framework, comprising two core modules: the make-raindrop diffusion (MRD) module in Figure 3a and removal-raindrop diffusion (RRD) module in Figure 3b. Additionally, a post-processing step using the MITM is employed to enhance tone and chromatic balance in the restored images (Figure 3c).
The first stage, the MRD module, is responsible for generating synthetic raindrop images to increase dataset diversity and improve generalization. Difference maps and edge detection results between GT images and actual raindrop images are employed to achieve this improvement by extracting localized degradation patterns, which are transformed into masks. The binary mask processing in Figure 3a generates suitable raindrop detection masks by applying bitwise operations and morphological operations to the extracted maps. These masks are applied during the inpainting process to reflect the spatial characteristics and irregular patterns of raindrop regions. During inference, random raindrop masks simulate diverse degradation conditions.
The second stage, the RRD module, is an image-to-image translation structure and is trained using the augmented dataset generated by the MRD module for raindrop removal. In this process, loss functions and normalization strategies are adjusted to improve removal performance.
During inference, DDIM-based sampling is employed to accelerate the process, enabling the restoration of high-quality images close to the original using a single conditional input containing raindrops.
Finally, in the post-processing stage, MITM is applied to perform HDR tone correction. A compensation technique is applied in the chroma domain to minimize color distortion caused by luminance adjustment. This process produces natural, restored images with reinforced structural detail preservation and color consistency. In summary, the method proceeds through the following core components:
  • The MRD module enhances dataset diversity and improves model generalization by synthesizing raindrop-degraded images using an inpainting approach guided by difference maps and edge-based masks.
  • The RRD module performs image-to-image translation using the augmented dataset, incorporating smooth L1 loss and BN to preserve structural details. During inference, DDIM-based sampling reduces computational cost and significantly accelerates inference.
  • In the post-processing stage, HDR tone correction is performed using MITM, followed by image blending to prevent overexposure artifacts. Chroma compensation is employed to mitigate color distortions caused by luminance adjustments, ensuring overall color consistency.

3.1. Make-Raindrop Diffusion Module

Training raindrop removal models for real-world scenarios requires a sufficiently large and well-aligned dataset. However, collecting large-scale datasets is challenging in practice due to diverse lighting conditions and dynamic scene variations. Moreover, raindrop patterns vary in size, density, and shape, making it difficult to achieve robust generalization with limited training data. This study introduces the MRD module for synthetic data generation to address these limitations.
The MRD module is based on the conditional learning structure of the Palette diffusion model and generates raindrop-degraded images by conditioning on clean input. The architecture adopts a U-Net-based encoder–decoder design with an embedded self-attention mechanism, realistically representing irregular raindrop patterns while preserving the structural integrity of background regions. In this study, U-Net is chosen as the backbone for its efficiency and stability in diffusion-based restoration, which aligns with prior diffusion research reporting that U-Net-style denoisers consistently achieve stable training and strong empirical performance. While Transformers are effective for modeling global dependencies, their computational and memory requirements scale steeply in high-resolution image restoration. In contrast, U-Net preserves local details through multi-scale features and skip connections, which is better aligned with the restoration-oriented objectives of Palette [24,25].

3.1.1. Make-Raindrop Diffusion Module and Binary Mask Processing

The training process of the MRD module begins by generating a precise raindrop mask based on the difference between a clean image and its corresponding raindrop-degraded ground truth image.
The primary stages of the procedure are the initial candidate extraction using a difference map, boundary refinement via edge detection and bitwise operations, and post-processing with morphological operations. Figure 4 illustrates the overall process, revealing the sequential pipeline from edge map generation to final mask construction.
In the first stage, a pixelwise difference map is computed between the clean and raindrop-degraded images. Raindrop-affected regions vary significantly in intensity; thus, this difference map serves as a preliminary reference for raindrop localization. However, the raw difference map often emphasizes irrelevant details, such as object contours and background textures, reducing its accuracy as a direct mask.
The second stage incorporates auxiliary edge information to address this problem. Two distinct edge maps are generated to capture image boundaries and structural details: a morphological gradient edge map [26], highlighting structural contours by calculating the difference between dilation and erosion, and a Canny edge map [27], detecting sharp and fine edges. These edge maps are inverted using a NOT operation and are combined with the initial difference map using AND operations. This process suppresses edge-adjacent details while retaining the irregular regions characteristic of raindrops.
The third stage applies morphological post-processing. A closing operation (dilation followed by erosion) is performed to fill small gaps introduced during bitwise operations, followed by an opening operation (erosion followed by dilation) to remove residual noise [28]. These sequential operations enhance mask continuity and reduce minor error regions. Finally, the two refined results are merged using an OR operation to generate the most comprehensive representation of raindrop regions. Gaussian blurring is applied to smooth the mask boundaries, alleviating edge discontinuities and restoring the natural contours of raindrop regions. Unlike conventional diffusion architectures that inject noise uniformly across an image, MRD selectively injects noise within raindrop regions defined by a refined mask. This inpainting-based strategy redirects the diffusion trajectory toward clean-to-raindrop synthesis and shifts the learning objective from restoration based on noise removal to raindrop formation and data augmentation.
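The following OpenCV sketch summarizes this mask-generation pipeline under stated assumptions: the binarization thresholds, Canny limits, and 3 × 3 elliptical kernel are illustrative choices rather than the authors' tuned parameters, and the edge maps are computed here from the clean image, which is also an assumption.

```python
import cv2
import numpy as np

def raindrop_mask(clean_bgr, rain_bgr):
    """Difference map + edge suppression + morphology, following the pipeline in Section 3.1.1."""
    clean = cv2.cvtColor(clean_bgr, cv2.COLOR_BGR2GRAY)
    rain = cv2.cvtColor(rain_bgr, cv2.COLOR_BGR2GRAY)

    # 1) Pixelwise difference map as the initial raindrop candidate.
    diff = cv2.absdiff(clean, rain)
    _, cand = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)         # assumed threshold

    # 2) Suppress structural contours by AND-ing with inverted edge maps.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    grad_edges = cv2.morphologyEx(clean, cv2.MORPH_GRADIENT, kernel)  # dilation minus erosion
    _, grad_edges = cv2.threshold(grad_edges, 30, 255, cv2.THRESH_BINARY)
    canny_edges = cv2.Canny(clean, 50, 150)                           # assumed limits
    m1 = cv2.bitwise_and(cand, cv2.bitwise_not(grad_edges))
    m2 = cv2.bitwise_and(cand, cv2.bitwise_not(canny_edges))

    # 3) Closing then opening to fill gaps and remove residual noise; merge the two results with OR.
    def refine(m):
        m = cv2.morphologyEx(m, cv2.MORPH_CLOSE, kernel)
        return cv2.morphologyEx(m, cv2.MORPH_OPEN, kernel)
    mask = cv2.bitwise_or(refine(m1), refine(m2))

    # 4) Gaussian blur to soften mask boundaries.
    return cv2.GaussianBlur(mask, (5, 5), 0)
```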
Figure 5 presents the final mask generation results, where (a) represents the input raindrop image, (b) depicts the raw difference map, and (c) illustrates the refined raindrop mask after morphological post-processing. This comparison demonstrates the effectiveness of the proposed pipeline in suppressing unnecessary structural details while isolating irregular raindrop regions. Although the mask does not capture every raindrop, its precision in excluding nonraindrop regions makes it highly suitable for inpainting-based synthesis, where the primary objective is to generate raindrops only in the designated regions. Therefore, this mask was adopted as the basis for raindrop synthesis in the MRD module.
The resulting high-precision raindrop mask is used for inpainting-based synthesis during MRD training. In this process, the clean image is provided as the conditional input, and noise is selectively input into the raindrop regions defined by the mask. This strategy enables the model to learn a generative trajectory in the diffusion process, progressively forming raindrop patterns. Hence, the MRD module can synthesize realistic and diverse raindrop-degraded images from clean input.

3.1.2. Data Augmentation

Data augmentation is a critical functionality of the MRD module, aiming to generate diverse artificial raindrop patterns and shapes that are difficult to capture in real-world environments, increasing dataset diversity and enhancing the generalizability of the model. Specifically, the MRD module performs conditional synthesis via an inpainting-based strategy during training and, in the inference phase, generates new raindrop-degraded images from clean conditional input without requiring GT images. By applying randomly generated raindrop masks, multiple synthetic raindrop images can be produced from a single clean input image.
The random mask generation process is critical for determining the diversity and realism of synthetic data. To this end, a predefined number of random coordinates is sampled based on the resolution of the input image, and raindrop regions are defined around these points. Raindrop sizes are stochastically sampled, with circular shapes as the default; however, elliptical and irregular contours are also introduced to reflect realistic physical characteristics. These variations capture the asymmetry of raindrops and replicate irregularities observed in real-world conditions.
The total number of raindrops in a mask and their spatial coverage are also probabilistically varied. For instance, certain images may contain densely packed small raindrops, whereas others may include a few large raindrops distributed sparsely and asymmetrically. This variability enables the model to adapt to various degradation intensities and visual configurations, including extreme cases, such as oversized droplets or highly irregular distributions.
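A minimal sketch of this stochastic mask sampling is shown below; the ranges for drop count, radius, and ellipse axes are hypothetical values chosen only to illustrate the strategy.

```python
import cv2
import numpy as np

def random_raindrop_mask(h, w, rng=np.random.default_rng()):
    """Sample a binary raindrop mask with random count, position, size, and ellipticity."""
    mask = np.zeros((h, w), dtype=np.uint8)
    n_drops = rng.integers(5, 60)                        # assumed range for drop count
    for _ in range(n_drops):
        cx, cy = rng.integers(0, w), rng.integers(0, h)
        r = rng.integers(3, max(4, w // 20))             # assumed radius range
        if rng.random() < 0.5:
            cv2.circle(mask, (int(cx), int(cy)), int(r), 255, -1)
        else:                                            # elliptical / irregular contour
            axes = (int(r), int(r * rng.uniform(0.5, 1.5)))
            angle = rng.uniform(0, 180)
            cv2.ellipse(mask, (int(cx), int(cy)), axes, angle, 0, 360, 255, -1)
    return cv2.GaussianBlur(mask, (5, 5), 0)             # soften boundaries as in Section 3.1.1
```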
As illustrated in Figure 6, the proposed data augmentation strategy generates numerous synthetic raindrop images with diverse configurations from a single clean image. By varying crucial parameters, such as the number, size, and area ratio of raindrops, the configuration can be adjusted to simulate various degradation scenarios. Figure 6b–f illustrates variations, including the increased drop count, enlarged drop radius, and reduced density, demonstrating the flexibility of the synthesis process in modeling complex raindrop patterns. This approach addresses the fundamental limitations of existing real-world datasets, such as spatial misalignment and limited pattern diversity. The augmented dataset includes raindrop instances with variations in size, shape, density, and transparency, enabling the removal network to achieve higher generalized restoration performance.
Consequently, MRD-based data augmentation is essential for mitigating overfitting caused by limited real-world datasets and significantly improves the robustness and practicality of raindrop removal models.

3.2. Removal-Raindrop Diffusion Module

The RRD module adopts an image-to-image translation framework based on the Palette diffusion model to restore clean images (i.e., GT) from raindrop-degraded input (Cond). Raindrop removal requires preserving high-frequency structural details and maintaining accurate color reproduction; therefore, this study introduces several architectural enhancements to the original Palette network to improve training stability and restoration quality.
First, this approach employs loss function improvement. Although the original Palette employs the MSE, this work replaces it with smooth L1 loss, combining the advantages of the MSE and mean absolute error (MAE). Smooth L1 preserves the differentiability and sensitivity of the MSE to minor errors while applying the robustness of the MAE for significant errors. As illustrated in Figure 7, smooth L1 behaves quadratically near zero, similar to the MSE, ensuring sensitivity to minor deviations, and transitioning to a linear regime, such as MAE, for larger errors. This property mitigates the influence of significant errors caused by severe raindrop distortions during early training, facilitating stable convergence. At later stages, the linear error growth of the L1 component promotes accurate boundary recovery and improved preservation of high-frequency details.
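For clarity, the piecewise definition of the smooth L1 loss can be written as a short PyTorch function; the transition point `beta = 1.0` matches the common PyTorch default and is an assumption with respect to the exact value used in this work.

```python
import torch

def smooth_l1(pred, target, beta=1.0):
    """Quadratic (MSE-like) for |error| < beta, linear (MAE-like) beyond it."""
    diff = torch.abs(pred - target)
    return torch.where(diff < beta,
                       0.5 * diff ** 2 / beta,
                       diff - 0.5 * beta).mean()

# Equivalent built-in: torch.nn.functional.smooth_l1_loss(pred, target, beta=1.0)
```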
Second, the normalization strategy is modified. The original Palette employs group normalization (GN), whereas this study adopts batch normalization (BN). The GN method is robust to batch size variations, benefiting irregular mask-based inpainting and grayscale-to-color restoration tasks. However, GN normalizes in groups, restricting inter-channel interaction and potentially introducing color distortion. In contrast, BN computes statistics for the entire batch for each channel, reinforcing interchannel consistency and reducing color shifts. For raindrop removal, where maintaining a fine color balance in regions with semi-transparent raindrops and background overlap is critical, BN significantly enhances naturalness and structural fidelity.
Third, the attention mechanism is restructured. In Palette, self-attention serves as a core component for modeling global context and structural coherence. However, applying full self-attention across multiple resolutions (e.g., 8 × 8, 16 × 16, and 32 × 32) incurs a high computational cost. Global attention is restricted to intermediate resolutions (16 × 16 and 32 × 32) to address this problem, reducing complexity while retaining global dependencies crucial for boundary refinement and nonlocal pattern modeling [29]. This design preserves the structural advantages of Palette and improves feasibility for real-time applications.
Fourth, inference efficiency is enhanced. Instead of DDPM-based sampling, the proposed framework employs the DDIM for accelerated inference. In the DDPM, the reverse process is defined as follows:
$$ x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t) \right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, \mathbf{I}), \qquad (4) $$
where $x_t$ denotes the noisy image at timestep $t$, $x_{t-1}$ represents the previous step, $\alpha_t = 1 - \beta_t$ indicates the noise retention ratio, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ quantifies the cumulative noise schedule, $\epsilon_\theta(x_t, t)$ denotes the predicted noise, and $\sigma_t z$ represents the stochastic term. Although this approach ensures diversity, it requires hundreds or thousands of steps.
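A direct transcription of this stochastic update into PyTorch is sketched below; the noise predictor `model` is a placeholder, and setting $\sigma_t^2 = \beta_t$ is a common choice assumed here rather than a detail fixed by the paper.

```python
import torch

@torch.no_grad()
def ddpm_reverse_step(model, x_t, t, betas, alphas_bar):
    """One stochastic reverse step following Eq. (4), with sigma_t^2 = beta_t (assumed choice)."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = model(x_t, t_batch)                                         # predicted noise
    mean = (x_t - beta_t / (1.0 - alphas_bar[t]).sqrt() * eps) / alpha_t.sqrt()
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)     # no noise at the final step
    return mean + beta_t.sqrt() * z
```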
In contrast, the DDIM removes the stochastic term and introduces a deterministic trajectory, significantly reducing the number of iterations. The DDIM reverse step is expressed as follows:
$$ x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, x_0 + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t), \qquad (5) $$
where $x_0$ represents the predicted clean image and $\bar{\alpha}_{t-1}$ denotes the cumulative noise decay coefficient. By linearly interpolating toward $x_0$, the DDIM achieves fast sampling with as few as 20 to 50 steps while maintaining visual quality comparable to the DDPM. This acceleration is highly advantageous for DSMs and forward-monitoring applications requiring nearly real-time performance.
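The deterministic DDIM update of Equation (5) can be sketched as follows; the predicted clean image is recovered from the current sample and the predicted noise, and in practice a short sub-schedule of roughly 20 to 50 timesteps would be iterated instead of the full schedule.

```python
import torch

@torch.no_grad()
def ddim_step(model, x_t, t, t_prev, alphas_bar):
    """One deterministic DDIM step (Eq. (5)): estimate x_0, then move along the predicted noise direction."""
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = model(x_t, t_batch)
    a_t, a_prev = alphas_bar[t], alphas_bar[t_prev]
    x0_pred = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()           # predicted clean image
    return a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps
```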
The RRD module is trained using a combination of large-scale synthetic paired raindrop datasets generated using the MRD module and a limited set of real raindrop images. The proposed framework achieves superior stability compared to the original Palette model via this data strategy and architectural refinements, delivering improved boundary detail preservation, reduced color distortion, and enhanced visual naturalness. As illustrated in Figure 8, the proposed RRD module removes raindrops effectively and demonstrates robustness across diverse real-world scenarios, including variations in lighting, background complexity, and raindrop density. The restored outputs display fine structural detail, accurate color reproduction, and overall visual coherence, confirming the practical applicability and reliability of this approach under challenging conditions.

3.3. Post-Processing Stage

The luminance (L) channel extracted from the output of the RRD module undergoes tone correction using the MITM to enhance the perceptual quality of the restored images. The MITM extends the CycleGAN [30] architecture and specializes in adjusting brightness and contrast in regions with significant exposure differences via image translation. This model is trained to emphasize details in low-light areas, improving object visibility under complex illumination conditions and enhancing overall tonal balance. This process is critical for mitigating the HDR imbalance common in multi-degradation environments. However, HDR tone enhancement often results in excessive luminance amplification in bright regions, causing oversaturation and color distortion. Therefore, this study introduces a luminance-domain blending strategy that preserves the benefits of HDR tone correction while restoring luminance balance by blending the RRD output with the HDR-enhanced result. The blending process is defined as follows:
$$ L_{\mathrm{fused}} = L_{\mathrm{surr}} \cdot L_{\mathrm{RR}} + \left(1 - L_{\mathrm{surr}}\right) \cdot L_{\mathrm{HDR}}, \qquad (6) $$
where $L_{\mathrm{RR}}$ denotes the luminance channel of the image restored by the RRD module, and $L_{\mathrm{HDR}}$ represents the HDR-enhanced luminance obtained via the MITM. In addition, $L_{\mathrm{surr}}$ is a surround map derived from the HDR tone map, assigning higher weights to regions with significant brightness variation to preserve contrast during blending. Figure 9b shows that HDR tone enhancement alone can over-brighten highlights, whereas applying blending based on the surround map in Figure 9c, following Equation (6), balances the tone distribution well.
Blending operates only on the luminance channel; therefore, a direct recombination with the original a and b chroma channels often introduces luminance-chroma inconsistencies, resulting in desaturation or color distortion. A chroma compensation technique based on color preservation is employed to resolve this. In the Lab color space, chroma intensity is defined as follows:
$$ C = \sqrt{(a - 128)^2 + (b - 128)^2}, \qquad (7) $$
where $a$ and $b$ represent the two chromatic components in Lab space, and 128 corresponds to the neutral gray baseline. Moreover, $C$ can be interpreted as the norm of the chroma vector $(a - 128,\ b - 128)$, quantifying the color saturation.
The relative luminance ratio induced by blending is computed as follows:
$$ L_{\mathrm{ratio}} = \frac{L_{\mathrm{fused}}}{L_{\mathrm{RR}}}, \qquad C_{\mathrm{new}} = C \cdot L_{\mathrm{ratio}}, \qquad (8) $$
where $L_{\mathrm{ratio}}$ indicates the relative change in luminance, and $C_{\mathrm{new}}$ represents the adjusted chroma magnitude scaled proportionally to the luminance variation. The compensated $a$ and $b$ values are calculated to maintain the original chroma vector direction while updating its magnitude to $C_{\mathrm{new}}$, as follows:
$$ a_{\mathrm{comp}} = (a - 128)\,\frac{C_{\mathrm{new}}}{C} + 128, \qquad b_{\mathrm{comp}} = (b - 128)\,\frac{C_{\mathrm{new}}}{C} + 128. \qquad (9) $$
Finally, the adjusted Lab channels are converted back to red, green, and blue (RGB) to reconstruct the final image:
$$ I_{\mathrm{Blend}} = \mathrm{Lab2BGR}\!\left(L_{\mathrm{fused}},\ a_{\mathrm{comp}},\ b_{\mathrm{comp}}\right). \qquad (10) $$
This series of operations linearly adjusts chroma intensity according to luminance variation while preserving the hue direction of the original color, preventing oversaturation or desaturation after tone adjustment. The post-processing stage, combining HDR tone enhancement with chroma compensation, balances the integration of brightness and color, delivering visually natural and consistent restoration results under diverse illumination conditions and degradation scenarios.
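Equations (6) through (10) can be combined into the following OpenCV/NumPy sketch. It assumes the RRD output is already an 8-bit Lab image, that the surround map `L_surr` supplied by the MITM stage is normalized to [0, 1], and it adds a small epsilon to avoid division by zero; these are implementation assumptions rather than details fixed by the paper.

```python
import cv2
import numpy as np

def blend_and_compensate(lab_rr, L_hdr, L_surr, eps=1e-6):
    """Luminance blending (Eq. (6)) with hue-preserving chroma compensation (Eqs. (7)-(10))."""
    L_rr, a, b = cv2.split(lab_rr.astype(np.float32))

    # Eq. (6): surround-weighted blend of RRD and HDR-enhanced luminance.
    L_fused = L_surr * L_rr + (1.0 - L_surr) * L_hdr.astype(np.float32)

    # Eqs. (7)-(8): chroma magnitude scaled by the relative luminance change.
    C = np.sqrt((a - 128.0) ** 2 + (b - 128.0) ** 2)
    L_ratio = L_fused / (L_rr + eps)
    C_new = C * L_ratio

    # Eq. (9): rescale the chroma vector while keeping its direction (hue).
    scale = C_new / (C + eps)
    a_comp = (a - 128.0) * scale + 128.0
    b_comp = (b - 128.0) * scale + 128.0

    # Eq. (10): convert the adjusted Lab channels back to BGR.
    lab_out = cv2.merge([L_fused, a_comp, b_comp]).clip(0, 255).astype(np.uint8)
    return cv2.cvtColor(lab_out, cv2.COLOR_LAB2BGR)
```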

4. Simulation Results

In this study, the MRD module was trained using a paired dataset comprising 1420 image pairs collected from the clean-raindrop dataset of RainDS [31], the AttentiveGAN [18] dataset, and additional publicly available images sourced from the internet. For the RRD module, training was performed on about 11,000 image pairs from the clean-raindrop dataset, including around 1500 real raindrop pairs obtained from AttentiveGAN, RainDS, and CarsAndTrafficSignal [32], along with about 9500 synthetic pairs generated based on these real samples. Clean images for data augmentation were obtained from the Self-Driving-Car, RaindropClarity [33], AttentiveGAN, RainDS, and CarsAndTrafficSignal datasets.
The evaluation was conducted using images not included in the training set, from AttentiveGAN, RainDS, and additional self-collected images, to ensure objective performance analysis based on quantitative metrics. A diverse set of images representing varying degradation conditions was selected for comparison, and the effectiveness of the proposed tone-mapped removal-raindrop diffusion (TMRRD) module was assessed against existing fusion-based methods.
All experiments were conducted on a system equipped with an RTX 4080 Super graphics processing unit (NVIDIA, Santa Clara, CA, USA) and an i7-10700K central processing unit (Intel, Santa Clara, CA, USA). The MRD module was trained for 300 epochs with a batch size of 3, using 2000 timesteps and a learning rate of $5 \times 10^{-5}$, whereas the RRD module was trained for 445 epochs with a batch size of 4, using 2000 timesteps and a learning rate of $2 \times 10^{-5}$.

4.1. Evaluation Metric

A CNN-based classification experiment was conducted to evaluate the quantitative restoration performance of the RRD module. The objective was to verify the reliability of raindrop removal by assessing how accurately the restored images produced by the RRD module were recognized as clean images.
The CNN model [34] for classification comprised four convolutional layers with rectified linear unit activation functions, followed by max-pooling after each layer. The final stage included a fully connected layer and a softmax classifier. The RMSProp optimizer was employed to minimize the loss during training [35]. All input images were resized to 256 × 256 and normalized, and the model was trained for 100 epochs. The training dataset contained clean and raindrop-degraded images, enabling the CNN to learn the distinction between the two classes.
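A compact PyTorch sketch of such a classifier is provided below; the channel widths and learning rate are illustrative assumptions, while the four convolution + ReLU + max-pooling stages, the fully connected softmax head, the 256 × 256 input, and the RMSProp optimizer follow the description above.

```python
import torch
import torch.nn as nn

class CleanRainClassifier(nn.Module):
    """Four conv + ReLU + max-pool stages followed by a fully connected softmax classifier."""
    def __init__(self, num_classes=2):
        super().__init__()
        chans = [3, 32, 64, 128, 256]                      # assumed channel widths
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*layers)             # 256x256 input -> 16x16 feature maps
        self.classifier = nn.Linear(256 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return torch.softmax(self.classifier(x), dim=1)    # clean vs. raindrop probabilities

model = CleanRainClassifier()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)   # assumed learning rate
```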
For evaluation, 106 restored images generated by the RRD module were input into the CNN, and the proportion of images classified as clean was calculated. Table 1 summarizes the clean classification accuracy (%) under each experimental condition. The proposed RRD module achieved a high clean classification rate in the CNN-based evaluation, indicating that the restored images display characteristics closely resembling those of actual clean images. In addition, the proposed TMRRD also achieved a comparably high clean classification rate, reinforcing the reliability of the framework across configurations. These results confirm that the proposed raindrop removal network provides stable and effective restoration performance for diverse degradation scenarios.

4.2. Ablation Experiments

An ablation study was conducted to verify the stepwise contribution of each component within the proposed framework. As summarized in Table 2, all experiments were performed under identical training settings, and a single component was varied at a time to isolate its individual effect.
  • Case 1: The original Palette diffusion model was configured for an image-to-image setting and trained for raindrop removal on a dataset of 5000 images (1000 real and 4000 synthetic).
  • Case 2: The dataset size was kept at 5000 while replacing group normalization with batch normalization to assess the impact of normalization on color stability.
  • Case 3: The dataset was expanded to approximately 11,000 images, and the same training procedure was repeated to evaluate whether MRD-generated synthetic data improves generalization.
  • Case 4: Only the tone-mapping module was applied in post-processing to examine luminance balancing in the RRD output; color correction was excluded in this setting.
  • Case 5: The full pipeline with blending and chroma compensation was applied. This is the final proposed method, which preserves color balance after tone adjustment and suppresses both oversaturation and desaturation.
The qualitative results clearly demonstrate the effect of each component. Figure 10b and Figure 11b correspond to training raindrop removal by remapping the inputs and outputs of the original colorization-oriented Palette diffusion model, which fails to learn effective removal and introduces color shifts. In contrast, Case 2 reduces color distortion by replacing group normalization with batch normalization inside the attention block, enabling channel-wise image-to-image learning for raindrop removal, although the performance remains limited (Figure 10c and Figure 11c). Case 3 shows consistent improvements in robustness and suppression of residual artifacts by training with a much larger synthetic dataset (Figure 10d and Figure 11d). Case 4 improves visibility in dark regions but causes excessive brightening in some highly exposed areas (Figure 10e and Figure 11e). Finally, Case 5 achieves the best overall perceptual quality by jointly ensuring tone balance and color stability through blending and chroma compensation (Figure 10f and Figure 11f).

4.3. Comparative Experiments

In this study, visual evaluations were performed by comparing the proposed method with existing baseline approaches for raindrop removal. These baseline methods are widely adopted in the removal-raindrop domain and have distinct strengths and limitations. The superiority of the proposed method was validated via this comparative analysis.
Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18 and Figure 19 illustrate the restoration results of each method under environmental and degradation conditions. In each case, (a) represents the input image containing raindrops, whereas (b) is the GT clean image. Panels (c) to (h) display the results of the RRCL [34], TUM [36], DIT [33], proposed RRD output, and proposed TMRRD with post-processing applied, respectively. The comparison images include complex degradation scenarios with diverse raindrop patterns, varying transparency levels, and illumination imbalances.
Figure 12c demonstrates relatively robust removal performance even under severe raindrop contamination; however, its noise suppression capability is limited, resulting in excessive granular noise in the restored image and a loss of fine structural details. In Figure 14d, TUM provides overall stability and satisfactory color restoration but lacks robustness against raindrops of varying size and shape, leaving residual artifacts in certain regions. As illustrated in Figure 18f, DIT delivers restoration results close to the original image; however, faint smudges remain around object boundaries after raindrop removal, degrading the perceptual quality.
In contrast, as observed in Figure 12f, Figure 13f, Figure 14f, Figure 15f, Figure 16f, Figure 17f, Figure 18f and Figure 19f, the proposed RRD consistently achieves high visual clarity across diverse degradation conditions, delivering stable restoration even when raindrop size, position, and transparency vary irregularly. Furthermore, as seen in the red boxes of Figure 16h, Figure 17h, Figure 18h and Figure 19h, the proposed TMRRD significantly enhances the overall perceptual quality by improving details in underexposed regions via HDR tone correction and preventing luminance-chroma inconsistencies via chroma compensation. Compared to conventional approaches, the proposed TMRRD demonstrates superior restoration accuracy under mild and severe degradation scenarios, indicating robust applicability to real-world conditions.

4.4. Quantitative Evaluation

The restoration performance of the proposed methods was quantitatively evaluated using a variety of no-reference image quality assessment (NR-IQA) metrics. This study employs recently proposed deep learning-based indicators and traditional methods based on statistics.
The deep learning-based metrics include patches-to-pictures perceptual quality (PaQ-2-PiQ) [38], contrastive language–image pretraining for image quality assessment (CLIP-IQA+) [39], and the multiscale image quality transformer (MUSIQ) [40]. These models are trained on large-scale human-labeled datasets and are designed to reflect perceptual quality aligned with human preferences. The PaQ-2-PiQ metric processes the global image and local patches, enabling consistent quality prediction across diverse resolutions and distortions. The CLIP-IQA+ metric operates in the embedding space of the CLIP model and estimates image quality based on visual similarity. This metric is extended to work in a self-supervised manner, eliminating the need for textual prompts and allowing direct evaluation of the “look and feel” of the image. The MUSIQ metric segments the input image into multiscale patches and applies a transformer-based architecture for quality prediction, offering robust performance under real-world conditions. These metrics typically assign higher scores to higher-quality images and demonstrate a strong correlation with human subjective assessments. For CLIP-IQA+, which is based on cosine similarity, relative comparison or normalized interpretation is more meaningful than absolute scores.
In addition to the deep learning metrics, this work also employs traditional statistic-based indicators, including the natural image quality evaluator (NIQE) [41], blind/referenceless image spatial quality evaluator (BRISQUE) [42], perception-based image quality evaluator (PIQE) [43], and perceptual index (PI) [44]. These methods rely on natural scene statistics to estimate image quality. The NIQE indicator constructs a statistical model of natural images without requiring training data and measures deviations from this model. The BRISQUE indicator extracts features based on natural scene statistics from distorted images and predicts quality using a support vector regression model trained on mean opinion scores. The PIQE indicator selects spatially complex regions in an image and evaluates local distortion levels to compute an overall quality score. All three indicators produce lower values for better-quality images and effectively capture blur, structural artifacts, and compression-related distortions. The PI metric is defined as the average of the NIQE and Ma scores [45], evaluating structural fidelity and perceptual aesthetics. Lower PI values indicate superior visual quality.
Figure 20 presents the quantitative results, which are visually summarized in metric-specific graphs. Although the performance rankings vary slightly depending on the characteristics of the individual images, the proposed framework consistently achieved high-quality results. These findings indicate that the RRD restoration network and the post-processing module function complementarily, maintaining visual quality and structural consistency under complex and diverse degradation conditions.

5. Discussion

This paper proposes a conditional diffusion-based image restoration framework to address compound degradations in real-world driving environments, particularly localized distortions from raindrops on the camera lens and HDR tone imbalance. Autonomous driving, forward monitoring, and advanced safety vision systems require robust restoration techniques that preserve structural fidelity and color accuracy under diverse weather conditions. Through these improvements, signals that are directly tied to safety decisions, such as lane contours, traffic sign characters, and the outlines of leading vehicles, are restored reliably, which exerts a direct impact on road safety.
The proposed framework comprises three components: the MRD, RRD, and post-processing modules. The MRD module generates synthetic degraded images with raindrops of various shapes, densities, and transparencies to build a large-scale paired clean–raindrop dataset, compensating for the limited availability of real-world data and enhancing model generalization. The RRD module restores clean images from synthetic and real degraded inputs by incorporating the smooth L1 loss, a self-attention-based architecture, BN, and a DDIM-based sampling strategy, achieving training stability and efficient inference. The effectiveness of this module was validated via a CNN-based classification experiment. The diverse synthetic patterns provided by MRD mitigate dataset bias and misalignment, which strengthens RRD against boundary warping and various semi-transparent raindrop residues. This is meaningful because it practically reduces the training instability and boundary artifacts that have been frequent in GAN-based approaches.
The post-processing module integrates tone correction using MITM and chroma compensation, improving visibility in underexposed areas while suppressing overexposure in bright regions. These enhancements promote luminance balance, reduce color distortion, and contribute to the overall consistency and perceptual quality. As a result, even under rapid illumination changes such as rain or tunnel segments, object boundaries and textures are not washed out by excessive brightness variation and color naturalness is preserved. From an integrated perspective, coupling raindrop removal and tone correction rather than treating them in isolation yields an overall improvement for compound degradations.
As summarized in Table 3, regarding traditional no-reference image quality metrics, such as NIQE, BRISQUE, PIQE, and PI, the proposed framework demonstrates superior performance, particularly in terms of blur reduction and structural preservation. Additionally, the proposed framework performs competitively with recent learning-based perceptual metrics, including PaQ-2-PiQ, MUSIQ, and CLIP-IQA+. Notably, the complete framework with post-processing consistently outperforms existing methods under various degradation conditions. Improvements on traditional metrics indicate recovery of edge sharpness and statistical naturalness, while competitiveness on learning-based metrics suggests alignment with human subjective quality. Moreover, the CNN-based clean classification result implies that the restored images recover distributional properties sufficiently to be perceived as clean in an operational sense.
However, the RRD and post-processing modules operate sequentially; hence, the overall computational cost increases. This result highlights the need for model compression to enable real-time deployment. Moreover, although this study focuses on raindrop degradation, further research is required to extend the framework to other weather-related degradations, such as dust and snow, using conditional learning and domain adaptation strategies. Hyperspectral image research has proposed generalization strategies, particularly those advanced through weakly supervised representation learning and studies on foundation models. These strategies provide practical guidance for improving the design of conditioning signals and enhancing data efficiency [46,47]. Given the constraints of real-time operation, system-level optimization is required, including pipeline parallelization, lightweight sampling schedules, and low-precision arithmetic. For multi-degradation extension, the design of conditioning information is central, and the quality of degradation-specific masks together with the interpretability of conditioning signals will largely determine the lower and upper bounds of performance.
Despite these challenges, the proposed framework exhibits robust practical potential across a broad range of applications, including DSMs and front-facing cameras in autonomous driving, drone-based vision, intelligent surveillance, and industrial machine vision. By combining raindrop removal with HDR tone correction, the framework restores structural details and color fidelity under complex environmental conditions. Future work should explore lightweight implementations on edge devices, such as Jetson and Qualcomm NPUs, unified restoration for multiple degradation types, zero-shot restoration via self-supervised learning, and perceptual quality optimization using CLIP-based reinforcement learning. For functions that are sensitive to boundary sharpness and color contrast, such as lane-keeping assistance and forward collision warning, the proposed framework can stabilize detection thresholds and reduce false alarms. In addition, unifying data synthesis and restoration within a single diffusion paradigm offers a practical alternative for adverse-condition vision where data dependency is severe. Overall, this study presents a robust and scalable solution for image restoration under complex degradation scenarios by integrating data synthesis, a conditional diffusion-based restoration network, and post-processing into a unified framework.

6. Conclusions

This paper presents a conditional diffusion-based restoration framework for removing raindrop artifacts and HDR tone distortions under compound degradations. The framework integrates the MRD and RRD modules with a post-processing pipeline, alleviates the scarcity of paired data by constructing a large-scale clean–degraded dataset, and leverages both synthetic and real data to produce restorations with preserved boundaries and stable colors. Tone blending and chroma compensation in post-processing further ensure global luminance balance and saturation stability.
The method demonstrates consistent gains in structural preservation, visual clarity, and perceptual quality under compound degradations and performs competitively with or better than existing approaches across various raindrop size, position, and transparency conditions on both traditional and learning-based IQA metrics. CNN-based evaluation indicates distributional recovery to the clean domain. Key advantages include improved boundary fidelity through the smooth L1 loss and self-attention-based modeling, enhanced color stability with BN and tone–chroma reconciliation, and practical sampling efficiency using the DDIM. Limitations include the sequential computational overhead of RRD and post-processing and the specialization in raindrops, which constrain real-time deployment and cross-weather generalization.
Future work will prioritize model compression and hardware-optimized acceleration for embedded real-time use, a unified conditioning design that handles multiple degradations such as fog, dust, and snow, and the combination of self-supervised learning with CLIP embedding objectives to strengthen perceptual alignment while preserving structure.

Author Contributions

Conceptualization, S.-H.L.; methodology, Y.-H.G. and S.-H.L.; software, Y.-H.G.; validation, Y.-H.G. and S.-H.L.; formal analysis, Y.-H.G. and S.-H.L.; investigation, Y.-H.G. and S.-H.L.; resources, Y.-H.G. and S.-H.L.; data curation, Y.-H.G. and S.-H.L.; writing—original draft preparation, Y.-H.G.; writing—review and editing, S.-H.L.; visualization, Y.-H.G.; supervision, S.-H.L.; project administration, S.-H.L.; funding acquisition, S.-H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Korea Creative Content Agency (KOCCA) grant funded by the Ministry of Culture, Sports and Tourism (MCST) in 2024 (Project Name: Development of optical technology and sharing platform technology to acquire digital cultural heritage for high quality restoration of composite materials cultural heritage, Project Number: RS-2024-00442410, Contribution Rate: 50%) and the Institute of Information & Communications Technology Planning & Evaluation (IITP)—Innovative Human Resource Development for Local Intellectualization program grant—funded by the Korea government (MSIT) (IITP-2025-RS-2022-00156389, 50%).

Data Availability Statement

The data presented in the study are openly available in Qian et al. in reference [16] (https://github.com/rui1996/DeRaindrop?tab=readme-ov-file), Quan et al. in reference [27] (https://github.com/Songforrr/RainDS_CCN?tab=readme-ov-file), Jin et al. in reference [29] (https://github.com/jinyeying/RaindropClarity), and online available in reference [28] (https://universe.roboflow.com/pandyavedant18-gmail-com/carsandtrafficsignal) (all web accessed on 22 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

  1. Zhang, Y.; Carballo, A.; Yang, H.; Takeda, K. Perception and sensing for autonomous vehicles under adverse weather conditions a survey. ISPRS J. Photogramm. Remote Sens. 2023, 196, 146–177. [Google Scholar] [CrossRef]
  2. Hamzeh, Y.; Rawashdeh, S.A. A review of detection and removal of raindrops in automotive vision systems. J. Imaging 2021, 7, 52. [Google Scholar] [CrossRef] [PubMed]
  3. He, D.; Shang, X.; Luo, J. Adherent mist and raindrop removal from a single image using attentive convolutional network. Neurocomputing 2022, 505, 178–187. [Google Scholar] [CrossRef]
  4. Lee, Y.; Jeon, J.; Ko, Y.; Jeon, B.; Jeon, M. Task-driven deep image enhancement network for autonomous driving in bad weather. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation ICRA, Xi’an, China, 30 May–5 June 2021; pp. 13746–13753. [Google Scholar] [CrossRef]
  5. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar] [CrossRef]
  6. Peng, Y. A comparative analysis between GAN and diffusion models in image generation. Trans. Comput. Sci. Intell. Syst. Res. 2024, 5, 189–195. [Google Scholar] [CrossRef]
  7. Sohl-Dickstein, J.; Weiss, E.A.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning ICML, Lille, France, 6–11 July 2015; Volume 37, pp. 2256–2265. [Google Scholar] [CrossRef]
  8. Su, Z.; Zhang, Y.; Shi, J.; Zhang, X.-P. A survey of single image rain removal based on deep learning. ACM Comput. Surv. 2024, 56, 103. [Google Scholar] [CrossRef]
  9. Soboleva, V.; Shipitko, O. Raindrops on windshield dataset and lightweight gradient-based detection algorithm. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence SSCI, Orlando, FL, USA, 5–8 December 2021; pp. 1–7. [Google Scholar] [CrossRef]
  10. Saharia, C.; Chan, W.; Chang, H.; Lee, C.A.; Ho, J.; Salimans, T.; Fleet, D.J.; Norouzi, M. Palette: Image-to-image diffusion models. In Proceedings of the SIGGRAPH 2022 Conference, Vancouver, BC, Canada, 8–11 August 2022. [Google Scholar] [CrossRef]
  11. He, C.; Lu, H.; Qin, T.; Li, P.; Wei, Y.; Loy, C.C. Diffusion models in low-level vision: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2025, 34, 1972–1987. [Google Scholar] [CrossRef]
  12. Barron, J.T. A general and adaptive robust loss function. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, Long Beach, CA, USA, 15–20 June 2019; pp. 4326–4334. [Google Scholar] [CrossRef]
  13. Ioffe, S. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. Adv. Neural Inf. Process. Syst. 2017, 30, 1945–1953. [Google Scholar] [CrossRef]
  14. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. In Proceedings of the International Conference on Learning Representations ICLR, Vienna, Austria, 4 May 2021. [Google Scholar] [CrossRef]
  15. Go, Y.-H.; Lee, S.-H.; Lee, S.-H. Multiexposed image-fusion strategy using mutual image translation learning with multiscale surround switching maps. Mathematics 2024, 12, 3244. [Google Scholar] [CrossRef]
  16. Wang, S.-Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, Seattle, WA, USA, 14–19 June 2020; pp. 8695–8704. [Google Scholar] [CrossRef]
  17. Goodfellow, I. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  18. Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive generative adversarial network for raindrop removal from a single image. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2482–2491. [Google Scholar] [CrossRef]
  19. Nguyen, D.M.; Lee, S.-W. UnfairGAN: An enhanced generative adversarial network for raindrop removal from a single image. Expert Syst. Appl. 2023, 228, 118232. [Google Scholar] [CrossRef]
  20. Zheng, K.; Chen, Y.; Chen, H.; He, G.; Liu, M.-Y.; Zhu, J.; Zhang, Q. Direct discriminative optimization: Your likelihood-based visual generative model is secretly a GAN discriminator. arXiv 2025, arXiv:2503.01103. [Google Scholar] [CrossRef]
  21. Dhariwal, P.; Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar] [CrossRef]
  22. Meng, C.; He, Y.; Song, Y.; Song, J.; Wu, J.; Zhu, J.-Y.; Ermon, S. SDEdit: Guided image synthesis and editing with stochastic differential equations. In Proceedings of the International Conference on Learning Representations ICLR, Virtual, 25 April 2022. [Google Scholar] [CrossRef]
  23. Sasaki, H.; Willcocks, C.G.; Breckon, T.P. UNIT-DDPM: Unpaired image translation with denoising diffusion probabilistic models. arXiv 2021, arXiv:2104.05358. [Google Scholar] [CrossRef]
  24. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops ICCVW, Montreal, QC, Canada, 10–17 October 2021; pp. 1833–1844. [Google Scholar] [CrossRef]
  25. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image restoration. In Proceedings of the Computer Vision—ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 17–33. [Google Scholar] [CrossRef]
  26. Evans, A.N.; Liu, X.U. A morphological gradient approach to color edge detection. IEEE Trans. Image Process. 2006, 15, 1454–1463. [Google Scholar] [CrossRef]
  27. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 6, 679–698. [Google Scholar] [CrossRef]
  28. Haralick, R.M.; Sternberg, S.R.; Zhuang, X. Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. Mach. Intell. 1987, 4, 532–550. [Google Scholar] [CrossRef]
  29. Moritz, N.; Hori, T.; Le Roux, J. Capturing multi-resolution context by dilated self-attention. In Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing ICASSP, Toronto, ON, Canada, 6–11 June 2021; pp. 6429–6433. [Google Scholar] [CrossRef]
  30. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision ICCV, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar] [CrossRef]
  31. Quan, R.; Yu, X.; Liang, Y.; Yang, Y. Removing raindrops and rain streaks in one go. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, Virtual, 19–25 June 2021; pp. 9147–9156. [Google Scholar] [CrossRef]
  32. Roboflow Universe. CarsAndTrafficSignal Dataset. 2025. Available online: https://universe.roboflow.com/pandyavedant18-gmail-com/carsandtrafficsignal (accessed on 22 August 2025).
  33. Jin, Y.; Li, X.; Wang, J.; Zhang, Y.; Zhang, M. Raindrop clarity a dual-focused dataset for day and night raindrop removal. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Eds.; Springer: Cham, Switzerland, 2025; pp. 1–17. [Google Scholar] [CrossRef]
  34. Han, Y.-K.; Jung, S.-W.; Kwon, H.-J.; Lee, S.-H. Rainwater-removal image conversion learning with training pair augmentation. Entropy 2023, 25, 118. [Google Scholar] [CrossRef]
  35. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar] [CrossRef]
  36. Chen, W.-T.; Huang, Z.-K.; Tsai, C.-C.; Yang, H.-H.; Ding, J.-J.; Kuo, S.-Y. Learning multiple adverse weather removal via two-stage knowledge learning and multi-contrastive regularization toward a unified model. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, New Orleans, LA, USA, 19–24 June 2022; pp. 17632–17641. [Google Scholar] [CrossRef]
  37. Özdenizci, O.; Legenstein, R. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10346–10357. [Google Scholar] [CrossRef]
  38. Ying, Z.; Niu, H.; Gupta, P.; Mahajan, D.; Ghadiyaram, D.; Bovik, A. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR, Seattle, WA, USA, 14–19 June 2020; pp. 3575–3585. [Google Scholar] [CrossRef]
  39. Wang, J.; Chan, K.C.K.; Loy, C.C. Exploring CLIP for assessing the look and feel of images. Proc. AAAI Conf. Artif. Intell. 2023, 37, 2555–2563. [Google Scholar] [CrossRef]
  40. Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multiscale image quality transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision ICCV, Montreal, QC, Canada, 10–17 October 2021; pp. 5128–5137. [Google Scholar] [CrossRef]
  41. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a completely blind image quality analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
  42. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef]
  43. Venkatanath, N.; Praneeth, D.; Bh, M.C.; Channappayya, S.S.; Medasani, S.S. Blind image quality evaluation using perception based features. In Proceedings of the 2015 Twenty First National Conference on Communications NCC, Mumbai, India, 27 February–1 March 2015; pp. 1–6. [Google Scholar] [CrossRef]
  44. Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM Challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision ECCV Workshops, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2019; pp. 334–355. [Google Scholar] [CrossRef]
  45. Ma, C.; Yang, C.-Y.; Yang, X.; Yang, M.-H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16. [Google Scholar] [CrossRef]
  46. Yang, J.; Du, B.; Wang, D.; Zhang, L. ITER: Image-to-pixel representation for weakly supervised HSI classification. IEEE Trans. Image Process. 2024, 33, 257–272. [Google Scholar] [CrossRef]
  47. Wang, D.; Hu, M.; Jin, Y.; Miao, Y.; Yang, J.; Xu, Y.; Qin, X.; Ma, J.; Sun, L.; Li, C.; et al. HyperSIGMA: Hyperspectral intelligence comprehension foundation model. IEEE Trans. Pattern Anal. Mach. Intell. 2025, in press. [Google Scholar] [CrossRef]
Figure 1. Image degradation caused by raindrops: (a) raindrop image and (b) clean image.
Figure 2. Illustration of the forward and reverse processes of the denoising diffusion probabilistic model (DDPM).
Figure 3. Proposed framework: (a) make-raindrop diffusion (MRD) module for raindrop synthesis, (b) removal-raindrop diffusion (RRD) module for raindrop removal, and (c) mutual image translation module (MITM)-based post-processing.
Figure 4. Workflow of raindrop mask generation.
Figure 5. Results of mask generation: (a) input raindrop image, (b) raw difference map, and (c) refined raindrop mask.
Figure 6. Generation of synthetic raindrop images conditioned on clean input: (a) clean input image, (b) baseline configuration, (c) increased drop count, (d) increased drop radius, (e) reduced drop count with enlarged radius, and (f) further reduced drop count with enlarged radius.
Figure 7. Graphs of the mean absolute error (MAE), mean squared error (MSE), and smooth L1 loss.
Figure 8. Results of the proposed removal-raindrop diffusion (RRD) module on real-world raindrop images: (a) input images, and (b) RRD result images.
Figure 9. Components of the high-dynamic range (HDR) tone adjustment stage: (a) luminance channel of the image restored by the removal-raindrop diffusion (RRD) module, (b) HDR-enhanced luminance channel obtained via the mutual image translation module (MITM), (c) surround map derived from the HDR-enhanced image, and (d) final tone-adjusted image after blending.
Figure 10. Qualitative results of the ablation study: (a) input, (b) case 1, (c) case 2, (d) case 3, (e) case 4, and (f) case 5.
Figure 11. Qualitative results of the ablation study: (a) input, (b) case 1, (c) case 2, (d) case 3, (e) case 4, and (f) case 5.
Figure 12. Vehicle scene with input and restoration results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 13. Building scene with input and restoration results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 14. Vehicle and roadside building scene with restoration results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 15. Vehicle-centered scene with building background results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 16. Entrance sign scene with raindrop input and restoration results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 17. Apartment exterior scene and restoration outcomes: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 18. Roadside traffic object scene with input and restoration results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 19. Outdoor vegetation scene with input and restoration results: (a) raindrop input image, (b) clean ground-truth image, (c) RRCL, (d) TUM, (e) RDiffusion, (f) DIT, (g) proposed removal-raindrop diffusion, and (h) proposed tone-mapped removal-raindrop diffusion.
Figure 20. Metric scores: (a) Patches to pictures–perceptual quality (PaQ-2-PiQ), (b) contrastive language–image pretraining for image quality assessment plus (CLIP-IQA+), (c) multiscale image quality transformer (MUSIQ), (d) natural image quality evaluator (NIQE), (e) blind/referenceless image spatial quality evaluator (BRISQUE), (f) perception-based image quality evaluator (PIQE), and (g) perceptual index (PI) (the y-axis represents the metric score; the x-axis indicates the image number; the arrows in the titles indicate whether a higher score (↑) or a lower score (↓) is better).
Table 1. Comparisons with metric scores.

Method            RRCL [34]   TUM [36]   RDiffusion [37]   DIT [33]   Proposed RRD   Proposed TMRRD
Clean accuracy    84.9%       79.2%      77.4%             79.2%      95.2%          95.2%

Clean accuracy indicates the percentage of 106 evaluation images classified as clean. The RRCL, TUM, and DIT methods represent existing raindrop removal techniques, whereas the proposed method indicates the output of the RRD module without post-processing.
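As a reading aid for the clean-accuracy figures above, the sketch below shows how such a score might be computed with a pretrained binary clean/raindrop classifier. The names `classifier`, `clean_label`, and the batch layout are hypothetical placeholders and do not reproduce the exact CNN-based evaluation pipeline used here.

```python
import torch

@torch.no_grad()
def clean_accuracy(restored: torch.Tensor, classifier: torch.nn.Module, clean_label: int = 1) -> float:
    # restored: (N, 3, H, W) batch of restored images.
    # classifier: returns (N, 2) logits over {raindrop, clean}.
    # Returns the fraction of images predicted as belonging to the clean domain.
    preds = classifier(restored).argmax(dim=1)
    return (preds == clean_label).float().mean().item()
```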
Table 2. Comparisons of components in each stage.

Name     Components of Each Stage
Case 1   Palette-based image-to-image translation
Case 2   Batch normalization instead of group normalization
Case 3   RRD module
Case 4   RRD with MITM tone mapping only
Case 5   TMRRD with tone blending and color compensation
Table 3. Comparisons of metric scores.

Metric            RRCL      TUM       RDiffusion   DIT       Proposed RRD   Proposed TMRRD
PaQ-2-PiQ (↑)     69.9767   69.3781   71.5601      71.7653   71.5080        71.8008
CLIP-IQA+ (↑)     0.6310    0.5732    0.6349       0.6496    0.6608         0.6452
MUSIQ (↑)         51.1206   51.5334   55.4226      55.4252   54.4827        54.5679
NIQE (↓)          4.7855    4.7384    4.7156       5.1256    4.6992         4.1983
BRISQUE (↓)       31.3016   29.4883   24.1244      28.6625   28.0645        25.8919
PIQE (↓)          39.6359   29.5934   38.9378      46.2912   28.5374        28.9483
PI (↓)            3.4175    3.3398    3.3472       3.9030    3.6340         3.0632

(↑) higher scores are preferable; (↓) lower scores are preferable. Bold font marks the best results in the corresponding metric.
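For reference, the perceptual index (PI) in the last row follows the PIRM 2018 definition [44], combining NIQE [41] with Ma et al.'s no-reference score [45]. The sketch below simply restates that formula; the Ma score in the usage comment is purely illustrative, not a reported result.

```python
def perceptual_index(niqe: float, ma: float) -> float:
    # PI = 0.5 * ((10 - Ma) + NIQE); lower values indicate better perceptual quality.
    return 0.5 * ((10.0 - ma) + niqe)

# Illustrative call with the TMRRD NIQE value from Table 3 and a hypothetical Ma score of 7.0:
# perceptual_index(4.1983, 7.0) -> 3.5992 (example only)
```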
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
