Image Visibility Enhancement Under Inclement Weather with an Intensified Generative Training Set
Abstract
1. Introduction
2. Related Works
2.1. Cycle-Consistent Generative Adversarial Network
2.2. Training Image Classification Model
3. Proposed Method
3.1. Generating Training Data
3.1.1. Training the CycleGAN-Based Virtual Water-Droplet Generation Module
3.1.2. Data Augmentation
3.1.3. Training the Data Classification Model
Algorithm 1: Generating training data. [Pseudocode lost in extraction; recoverable steps: train two CycleGAN-based droplet generators, synthesize virtual water-droplet images with them, train a classification model to distinguish well-formed from poorly formed droplets, and retain only the generated images classified as well formed.]
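The surviving steps of Algorithm 1 can be sketched as follows; this is a minimal illustration, not the authors' implementation, and the function names, the `"well-formed"` label, and the pairing of clean/droplet images are all assumptions.

```python
# Hypothetical sketch of Algorithm 1 (generating training data).
# `build_training_set`, its arguments, and the "well-formed" label are
# illustrative assumptions, not the authors' actual API.

def build_training_set(clean_images, generators, classifier):
    """Synthesize droplet images with each generator, then keep only the
    clean/droplet pairs whose droplets the classifier rates well formed."""
    kept = []
    for img in clean_images:
        # e.g., one generator trained on paired data, one on unpaired data
        for generate_droplets in generators:
            candidate = generate_droplets(img)
            if classifier(candidate) == "well-formed":
                kept.append((img, candidate))
    return kept
```

The filtering step mirrors the table of classification counts later in the paper: only the subset of generated images judged well formed enters the final training set.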
3.2. Image Visibility Enhancement
3.2.1. Training the CycleGAN-Based Rem_Module
3.2.2. Training the CycleGAN-Based Rem&TM_Module
3.2.3. Final Testing with the Two Modules
Algorithm 2: Image Visibility Enhancement. [Pseudocode lost in extraction; recoverable steps: train the Rem_Module and the Rem&TM_Module, run both on the input image, and combine selected channels from the two module outputs to produce the final enhanced image.]
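The two-module inference described above can be sketched as below. The exact channel-wise fusion rule did not survive extraction, so `fuse` is a placeholder assumption rather than the paper's actual combination.

```python
# Hypothetical sketch of Algorithm 2 (two-module inference).
# The fusion rule is an assumption; the paper's channel-wise combination
# details did not survive extraction.

def enhance(image, rem_module, rem_tm_module, fuse):
    """Droplet removal via Rem_Module, removal plus tone mapping via
    Rem&TM_Module, then a channel-wise fusion of the two outputs."""
    droplet_free = rem_module(image)     # removal only
    tone_mapped = rem_tm_module(image)   # removal + tone mapping
    return fuse(droplet_free, tone_mapped)
```

Splitting the task into a removal-only module and a removal-plus-tone-mapping module lets the fusion step take each channel from whichever output renders it better.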
4. Simulations
4.1. Comparative Experiments
4.2. Quantitative Evaluations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kwon, H.-J.; Lee, S.-H. Raindrop-removal image translation using target-mask network with attention module. Mathematics 2023, 11, 3318. [Google Scholar] [CrossRef]
- Zheng, X.; Liao, Y.; Guo, W.; Fu, X.; Ding, X. Single-Image-Based Rain and Snow Removal Using Multi-guided Filter. In Neural Information Processing; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8228 LNCS, pp. 258–265. [Google Scholar] [CrossRef]
- Bijelic, M.; Gruber, T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Halder, S.S.; Lalonde, J.-F.; de Charette, R. Physics-Based Rendering for Improving Robustness to Rain. 2019. Available online: https://team.inria.fr/rits/computer-vision/weather-augment/ (accessed on 5 July 2025).
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Saharia, C.; Chan, W.; Chang, H.; Lee, C.; Ho, J.; Salimans, T.; Fleet, D.; Norouzi, M. Palette: Image-to-image diffusion models. In Proceedings of the SIGGRAPH’22: Special Interest Group on Computer Graphics and Interactive Techniques Conference, Vancouver, BC, Canada, 7–11 August 2022. [Google Scholar] [CrossRef]
- Xia, B.; Zhang, Y.; Wang, S.; Wang, Y.; Wu, X.; Tian, Y.; Yang, W.; Van Gool, L. Diffir: Efficient diffusion model for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; Available online: http://openaccess.thecvf.com/content/ICCV2023/html/Xia_DiffIR_Efficient_Diffusion_Model_for_Image_Restoration_ICCV_2023_paper.html (accessed on 25 August 2025).
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; Available online: http://openaccess.thecvf.com/content/CVPR2022/html/Zamir_Restormer_Efficient_Transformer_for_High-Resolution_Image_Restoration_CVPR_2022_paper.html (accessed on 25 August 2025).
- Ying, X. An overview of overfitting and its solutions. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2019; p. 022022. [Google Scholar]
- Torralba, A.; Efros, A.A. Unbiased look at dataset bias. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1521–1528. [Google Scholar] [CrossRef]
- Tanaka, F.H.K.d.S.; Aranha, C. Data Augmentation Using GANs. arXiv 2019, arXiv:1904.09135. Available online: https://arxiv.org/pdf/1904.09135 (accessed on 5 July 2025). [CrossRef]
- Janocha, K.; Czarnecki, W.M. On loss functions for deep neural networks in classification. arXiv 2017, arXiv:1702.05659. [Google Scholar] [CrossRef]
- Mejjati, Y.A.; Richardt, C.; Tompkin, J.; Cosker, D.; Kim, K.I. Unsupervised Attention-guided Image-to-Image Translation. Adv. Neural Inf. Process. Syst. 2018, 31, 3693–3703. [Google Scholar]
- Kim, J.Y.; Son, D.-M.; Lee, S.-H. Retinex Jointed Multiscale CLAHE Model for HDR Image Tone Compression. Mathematics 2024, 12, 1541. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- Jang, C.Y.; Kim, T.Y. Hand feature enhancement and user decision making for CNN hand gesture recognition algorithm. J. Inst. Electron. Inf. Eng. 2020, 57, 60–70. [Google Scholar] [CrossRef]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; Available online: https://proceedings.mlr.press/v15/glorot11a.html (accessed on 5 July 2025).
- Dumoulin, V.; Visin, F. A guide to convolution arithmetic for deep learning. arXiv 2016, arXiv:1603.07285. Available online: https://arxiv.org/pdf/1603.07285 (accessed on 5 July 2025).
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. Available online: https://arxiv.org/pdf/1409.1556 (accessed on 5 July 2025).
- Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive Generative Adversarial Network for Raindrop Removal from a Single Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2482–2491. [Google Scholar] [CrossRef]
- Jung, S.-W.; Kwon, H.-J.; Lee, S.-H. Enhanced Tone Mapping Using Regional Fused GAN Training with a Gamma-Shift Dataset. Appl. Sci. 2021, 11, 7754. [Google Scholar] [CrossRef]
- Han, Y.-K.; Jung, S.-W.; Kwon, H.-J.; Lee, S.-H. Rainwater-Removal Image Conversion Learning with Training Pair Augmentation. Entropy 2023, 25, 118. [Google Scholar] [CrossRef] [PubMed]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
- Liu, L.; Liu, B.; Huang, H.; Bovik, A.C. No-reference image quality assessment based on spatial and spectral entropies. Signal Process. Image Commun. 2014, 29, 856–863. [Google Scholar] [CrossRef]
- Vu, C.T.; Chandler, D.M. S3: A spectral and spatial sharpness measure. In Proceedings of the First International Conference on Advances in Multimedia, Colmar, France, 20–25 July 2009; pp. 37–43. [Google Scholar] [CrossRef]
- Hassen, R.; Wang, Z.; Salama, M.M.A. Image sharpness assessment based on local phase coherence. IEEE Trans. Image Process. 2013, 22, 2798–2810. [Google Scholar] [CrossRef] [PubMed]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a ‘completely blind’ image quality analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
- Sazzad, Z.M.P.; Kawayoke, Y.; Horita, Y. No reference image quality assessment for JPEG2000 based on spatial features. Signal Process. Image Commun. 2008, 23, 257–268. [Google Scholar] [CrossRef]
| | Module Trained with a Paired Dataset (800 Images) | Module Trained with an Unpaired Dataset (400 Images) | Total Images |
|---|---|---|---|
| Well formed | 685 | 515 | 1200 |
| Poorly formed | 1315 | 1485 | 2800 |
| Total | 2000 | 2000 | 4000 |
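As a quick sanity check on the classification counts above, the row and column totals are consistent, and only 30% of the generated images pass the well-formed filter:

```python
# Droplet-classification counts from the table above (paired- vs.
# unpaired-trained generation modules). Dictionary layout is illustrative.
counts = {
    "paired":   {"well": 685,  "poor": 1315},
    "unpaired": {"well": 515,  "poor": 1485},
}
total_well = sum(c["well"] for c in counts.values())              # 1200
total_all = sum(c["well"] + c["poor"] for c in counts.values())   # 4000
well_rate = total_well / total_all  # fraction of generated images kept
```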
| | RWM [25] | Pix2pix [6] | Han et al. [26] | CycleGAN [7] | Palette [8] | Proposed |
|---|---|---|---|---|---|---|
| Process time | 0.111 s | 0.192 s | 0.201 s | 0.067 s ¹ | 40.060 s | 0.106 s ² |
| | RWM [28] | Pix2pix [6] | Han et al. [29] | CycleGAN [7] | Palette [8] | Proposed |
|---|---|---|---|---|---|---|
| BRISQUE (↓) | 22.095 | 23.435 | 24.186 | 20.308 | 18.191 | 17.795 ¹ |
| SSEQ (↓) | 28.314 | 22.195 | 23.696 | 14.282 | 12.206 | 21.931 ³ |
| S3 (↑) | 0.180 | 0.244 | 0.228 | 0.239 | 0.251 | 0.255 ¹ |
| LPC_SI (↑) | 0.928 | 0.936 | 0.941 | 0.931 | 0.935 | 0.943 ¹ |
| NIQE (↓) | 4.927 | 5.406 | 5.002 | 5.006 | 6.522 | 4.916 ¹ |
| JPEG_2000 (↑) | 80.220 | 80.206 | 80.189 | 80.266 | 80.259 | 80.261 ² |
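The per-metric winners in the table above follow mechanically from the arrow direction (↓ lower is better, ↑ higher is better). A sketch of that ranking, reproducing only two of the reported metric rows:

```python
# Per-metric winner selection for the no-reference quality table above.
# Only the BRISQUE and SSEQ rows are reproduced; layout is illustrative.
scores = {
    "BRISQUE": ("lower", {"RWM": 22.095, "Pix2pix": 23.435, "Han et al.": 24.186,
                          "CycleGAN": 20.308, "Palette": 18.191, "Proposed": 17.795}),
    "SSEQ":    ("lower", {"RWM": 28.314, "Pix2pix": 22.195, "Han et al.": 23.696,
                          "CycleGAN": 14.282, "Palette": 12.206, "Proposed": 21.931}),
}

def best_method(metric):
    """Return the method with the best score under the metric's direction."""
    direction, values = scores[metric]
    pick = min if direction == "lower" else max
    return pick(values, key=values.get)
```

This matches the superscript ranks in the table: the proposed method leads on BRISQUE, while Palette leads on SSEQ.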
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, S.-W.; Lee, S.-H.; Son, D.-M.; Lee, S.-H. Image Visibility Enhancement Under Inclement Weather with an Intensified Generative Training Set. Mathematics 2025, 13, 2833. https://doi.org/10.3390/math13172833