Raindrop-Aware GAN: Unsupervised Learning for Raindrop-Contaminated Coastal Video Enhancement

Abstract: We propose an unsupervised network with adversarial learning, the Raindrop-aware GAN, which enhances the quality of coastal video images contaminated by raindrops. Raindrop removal from coastal videos faces two main difficulties: converting the degraded image into a clean one by visually removing the raindrops, and restoring the background coastal wave information in the raindrop regions. The components of the proposed network, a generator and a discriminator for adversarial learning, are trained on unpaired images degraded by raindrops and clean images free from raindrops. By creating raindrop masks and background-restored images, the generator restores the background information in the raindrop regions alone, preserving the input as much as possible. The proposed network was trained and tested on an open-access dataset and on a dataset collected directly from the coastal area. It was then evaluated by three metrics: the peak signal-to-noise ratio, structural similarity, and a naturalness-quality evaluator. On these three metrics, the proposed method scored 8.2% (+2.012), 0.2% (+0.002), and 1.6% (−0.196) better than the state-of-the-art method, respectively. In the visual assessment of the enhanced video image quality, our method restored the image patterns of steep wave crests and wave breaking better than the other methods. In both quantitative and qualitative experiments, the proposed method removed the raindrops in coastal video and recovered the damaged background wave information more effectively than state-of-the-art methods.


Introduction
Coastal areas play an important role in the national economy, commerce, and recreation. However, these regions are currently threatened by climate change, sea-level rise, beach erosion, extreme storms, and coastal urbanization [1]. Studying coastal areas is a complicated task. Coastal research is strongly interrelated with hydrodynamics, morphodynamics, and anthropogenic interactions, and is also linked to geological, meteorological, hydrological, and biological processes [2]. Furthermore, these processes and their complex interactions vary on temporal scales from seconds to decades and on spatial scales from centimeters to tens of kilometers. Although many processes and interactions have been elucidated over the past few decades, the remaining scientific challenges require advancements in simulation and observation [3].
Remote sensing can observe the large-scale variability over a long-term period, and provides coastal measurements even in extreme events. In particular, land-based remote-sensing equipment such as cameras and video systems can provide long-term, sub-meter-scale consecutive images with high images. The rain streaks are identified by their common movements and appearances in the two models. Roser and Geiger [11] proposed a photometric raindrop model that detects the spherical shapes of raindrops. However, neither of these techniques restores the background information. The region-completion method of You et al. [12] accomplishes both raindrop detection and background restoration using the neighboring pixel information. Although this method restores the raindrop-occluded pixels in the frequency domain, it is limited to static raindrops with spherical shapes.
Lauded for their excellent performance in image processing and translation, convolutional neural networks (CNNs) have been introduced to various data-driven approaches for raindrop removal. CNN-based techniques can be classified into supervised learning and unsupervised learning, depending on whether or not clean images corresponding to distorted images are used as the ground truth in the training phase.
A supervised CNN has been employed to restore the background of raindrop-distorted regions in a patch-wise manner [13], but high performance is achieved only on images contaminated by scattered, small-sized sprinkles. Qian et al. [14] investigated raindrop removal by supervised image-to-image translation methods with adversarial learning, and proposed the attentive generative adversarial network (Attentive GAN). Attentive GAN generates a raindrop mask using a long short-term memory (LSTM)-based network, which guides the subsequent generator network to restore the background in the distorted regions. Attentive GAN achieves higher similarities between the restored images and the corresponding clean images than conventional methods such as Pix2Pix [15]. However, Attentive GAN requires an additional ground-truth dataset for supervising the raindrop-mask generation. More recently, Peng et al. [16] introduced a fully convolutional network with skip connections and concurrent channel and spatial attention modules. Their method removes raindrops from single images, but supervised approaches invariably require paired datasets that include clean images. For backgrounds with dynamically changing subjects, clean images that correspond accurately to the distorted images are not easily acquired. For this reason, supervised methods are usually evaluated on simulated datasets.
Unsupervised image-to-image translation with adversarial learning has been drawing attention in the field of remote sensing, owing to its advantage of using unlabeled and unpaired datasets for training. Recently, a cycle-consistent GAN has been exploited to transcode synthetic-aperture-radar (SAR) images into optical images for building change detection using optical-like features [17,18]. GAN architectures have also proven advantageous in hyperspectral image classification, where the discriminative features of the discriminator are used [19,20]. Unsupervised image-to-image translation has been employed for domain adaptation in aerial image segmentation [21]. In raindrop removal, to avoid the paired-data requirement, Uzun and Temizel [22] and Wei et al. [23] proposed unsupervised and semi-supervised approaches using GAN architectures to map the distorted images to a target distribution of clean images. However, as shown in Qian et al. [14], image-to-image translation without attention to raindrops often fails the background reconstruction task in undistorted regions. In contrast, we present an unsupervised GAN with a raindrop-restoration focus and regularization for improved background learning.

Raindrop-Aware GAN
The proposed raindrop-removal framework (Raindrop-aware GAN) is an unsupervised image-to-image translation method with adversarial learning, which transforms a raindrop-distorted image into a clean image domain. As shown in Figure 1, the proposed network consists of two sub-networks: (1) a conditional generative model called the scene generator, which restores the backgrounds in the raindrop regions, and (2) a discriminator that captures the distributions of the distorted and clean image samples in a patch-wise manner. The scene generator attentively restores the backgrounds by focusing on the raindrop regions determined by a raindrop mask, while preserving the input image as much as possible. The adversarial learning in the scene generator restores the natural appearance of the coastal waves in the backgrounds. The restoration level is sufficient to deceive the discriminator.

Scene Generator for Raindrop-Aware Background Reconstruction
Given an input image ($r_i$) contaminated by raindrops, the scene generator ($G$) produces a background-restored image ($o_{r_i}$) and a raindrop mask ($m_{r_i}$). The mask represents the raindrop regions on a scale from zero (background) to one (raindrops). Our scene generator recognizes the raindrop regions without any supervision, and transforms the distorted images into a distribution of clean images while preserving the backgrounds of the raindrop regions. The reconstruction is formulated as follows:
$$\hat{r}_i = m_{r_i} \odot o_{r_i} + (1 - m_{r_i}) \odot r_i \qquad (1)$$
where $i$ is the image index of coastal videos, the subscript $r$ indicates the distorted images, $\odot$ denotes element-wise multiplication, and $\hat{r}_i$ is the final output of the generator, obtained by fusing the background-restored image ($o_{r_i}$) and the input image using the raindrop mask. The scene generator is implemented as a fully convolutional network with skip connections. Figure 1a shows the structure of the scene generator. It is composed of three parts: (1) an encoder that learns the significant features from the inputs; (2) a bottleneck that builds a deep architecture with a large receptive field; and (3) a decoder that generates the background-restored image and a raindrop mask at the input resolution.
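The mask-guided fusion of the background-restored image and the input described above can be illustrated with a small numerical example. This is a minimal sketch, assuming a per-pixel convex blend with a mask in [0, 1]; the function name `fuse` and the toy 2 × 2 grayscale values are hypothetical and not taken from the paper.

```python
def fuse(mask, restored, original):
    """Blend two images per pixel: mask * restored + (1 - mask) * original.

    All three arguments are 2D lists of floats with identical shapes.
    A mask value of 0 keeps the input pixel; a value of 1 takes the restored pixel.
    """
    fused = []
    for m_row, o_row, r_row in zip(mask, restored, original):
        fused.append([m * o + (1.0 - m) * r for m, o, r in zip(m_row, o_row, r_row)])
    return fused


# Toy values (illustrative only).
r = [[0.2, 0.9], [0.4, 0.5]]   # raindrop-distorted input
o = [[0.3, 0.1], [0.6, 0.7]]   # background-restored image from the generator
m = [[0.0, 1.0], [0.0, 0.5]]   # raindrop mask: 0 = background, 1 = raindrop

r_hat = fuse(m, o, r)
print(r_hat)
```

Pixels with a zero mask are copied from the input unchanged, which is how the generator can preserve the undistorted regions while only the masked regions receive restored content.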
The encoder consists of a series of convolution blocks. Each block comprises 3 × 3 convolutional layers followed by a rectified linear unit (ReLU) activation function [24]. For downsampling in the feature encoding, the second and last convolution blocks use a stride of 2. Thus, the spatial resolution of the encoder output is w/4 × h/4, where w and h are the width and height of the input image, respectively. The bottleneck is a series of residual blocks with dilated convolution layers [25]. Each residual block combines a shortcut connection by addition with a sequence of two 3 × 3 dilated convolutions followed by ReLU functions. The dilated convolution efficiently increases the receptive field while maintaining the spatial resolution. The bottleneck with multiple residual blocks ensures the robustness of Raindrop-aware GAN when detecting raindrop regions of various sizes. In the decoder, transposed convolutional layers with a stride of 2 are used for upsampling, and convolutional blocks are used for reconstruction. Among the four output channels of the decoder, three channels are allocated to the background-restored image, and one channel is reserved for the raindrop mask. To minimize the loss of image details caused by repeated convolution and downsampling operations, we employ long skip connections that deliver the early-layer features of the encoder to the decoder by concatenation.
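The resolution bookkeeping in this paragraph can be checked with the standard convolution output-size formula. The sketch below is illustrative only: it assumes "same"-style zero padding and a dilation rate of 2 in the bottleneck, neither of which is stated in the text, and the helper name `conv_out` is hypothetical. The 720 × 480 input size is the one reported in the implementation details.

```python
def conv_out(n, kernel=3, stride=1, dilation=1, padding=None):
    """Output length of a 1D convolution along one spatial axis."""
    if padding is None:
        # Assumed "same"-style padding for odd kernels.
        padding = dilation * (kernel - 1) // 2
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


w, h = 720, 480
# Two stride-2 convolution blocks in the encoder.
for _ in range(2):
    w, h = conv_out(w, stride=2), conv_out(h, stride=2)
encoder_size = (w, h)

# A dilated 3 x 3 convolution (assumed dilation 2) keeps the resolution
# while covering a 5 x 5 neighborhood.
bottleneck_size = (conv_out(w, dilation=2), conv_out(h, dilation=2))
effective_kernel = 2 * (3 - 1) + 1

print(encoder_size, bottleneck_size, effective_kernel)
```

Under these assumptions the encoder output is 180 × 120, that is, w/4 × h/4, and the dilated layers leave that size unchanged while each one sees a wider neighborhood than an ordinary 3 × 3 kernel.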
The scene generator is trained to minimize three loss functions: $L_{gen}$ for adversarial learning against the discriminator, $L_{mask}$ for enforcing the sparsity of the raindrop mask, and $L_{reg}$ for background learning. $L_{gen}$ and the discriminator loss, which together describe the adversarial learning between the generator and discriminator networks, are discussed in the next section, along with the total loss for training the scene generator. In the reconstruction operation (Equation (1)), the mask sparsity loss ($L_{mask}$) is an $l_1$ loss that prevents easy saturation (1.0) of the mask and drives the focus to the regions distorted by raindrops. The loss function ($L_{mask}$) is given by
$$L_{mask} = \sum_{i \in I_{rain}} \lVert m_{r_i} \rVert_1 \qquad (2)$$
where $I_{rain}$ is the set of training instances of distorted images. Because the raindrop discrimination is unsupervised, we regularize the scene generator to produce a zero-valued mask in the undistorted background regions while correctly reconstructing the backgrounds. To achieve this, we compute the regularization loss ($L_{reg}$) on clean images ($c_j$) that are randomly sampled from the training set. This loss is computed from $m_{c_j}$, the mask, and $o_{c_j}$, the background-restored version of image $c_j$, where $I_{clean}$ is the training set of clean images and the subscript $c$ indicates the outputs of the scene generator from the clean images. Figure 2 shows the forward propagation of the input samples in the scene generator when computing the loss functions of unsupervised training.
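To make the roles of the two mask-related losses concrete, the following sketch computes an $l_1$ sparsity penalty on masks predicted for distorted images and one plausible form of the regularization term on clean images. The exact form of $L_{reg}$ is not recoverable from the text, so the version below (an $l_1$ mask penalty plus an $l_1$ reconstruction error, both averaged per pixel) is an assumption, as are all function names and toy values.

```python
def l1_mean(values):
    """Mean absolute value of a flat list of numbers."""
    return sum(abs(v) for v in values) / len(values)


def mask_sparsity_loss(rain_masks):
    """Average l1 magnitude of the masks predicted for distorted images."""
    return sum(l1_mean(m) for m in rain_masks) / len(rain_masks)


def regularization_loss(clean_images, restored_images, clean_masks):
    """Assumed form: push the mask toward zero on clean inputs and
    keep the restored image close to the clean input."""
    total = 0.0
    for c, o, m in zip(clean_images, restored_images, clean_masks):
        diff = [o_px - c_px for o_px, c_px in zip(o, c)]
        total += l1_mean(m) + l1_mean(diff)
    return total / len(clean_images)


# Toy flattened images and masks (illustrative only).
loss_mask = mask_sparsity_loss([[0.0, 1.0, 0.5, 0.5]])
loss_reg = regularization_loss([[0.2, 0.4]], [[0.2, 0.5]], [[0.0, 0.1]])
print(loss_mask, loss_reg)
```

Both terms shrink when the mask is close to zero, which matches the stated purpose of discouraging the generator from marking undistorted background as raindrops.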

Discriminator for Regional Raindrop Pattern Recognition
During adversarial learning, the discriminator ($D$) is trained to distinguish clean images from the images reconstructed by the scene generator. As a classifier, the discriminator learns the distributions of the real and fake samples; meanwhile, the scene generator attempts to deceive it by producing more realistic images. Because the raindrop distortions are observed locally, we construct the discriminator in the PatchGAN architecture [15]. The discriminator is composed of a series of convolution blocks (convolutional layers followed by a ReLU activation function). After multiple downsampling operations through convolution blocks with a stride of 2, the output dimension of the discriminator is w/32 × h/32 × 1. Each element of the output feature represents one patch of the given input (see the discriminator in Figure 1). The discriminator loss ($L_D$) is computed from the clean images and the images $\hat{r}_i$ reconstructed by Equation (1). To deceive the discriminator, the scene generator is trained with the adversarial loss $L_{gen}$. The total loss of the scene generator combines $L_{gen}$, $L_{mask}$, and $L_{reg}$, where $\omega_1$, $\omega_2$, and $\omega_3$ are hyperparameters that control the importance of each loss function. As the $\omega_1$ and $\omega_2$ values increase, the masks approach zero more rapidly; conversely, increasing $\omega_3$ drives more areas of the mask toward one. Algorithm 1 performs the unsupervised training of Raindrop-aware GAN on the training sets of the distorted and clean images. The Shuffle(·) function in Algorithm 1 shuffles the data indices of the training sets, and the Next(·) function sequentially calls the data instances of the next indices. In the testing stage, single video frames extracted from coastal videos are given to the trained generator. Note that a block of frame sequences can be input to the generator as well, which can help reduce the processing time for entire videos.
Since the generator learns the backgrounds to regenerate them using the loss $L_{reg}$, extracted frame sequences including undistorted images can be given without any further processing.
Algorithm 1. Unsupervised training of Raindrop-aware GAN.
        ... (1) with r_i as an input
        Compute L_mask and L_gen of G with r_i as an input
        Compute L_reg of G with c_i as an input
        Compute L_D of D with c_i and r̂_i as an input
        Update parameters of D using L_D
        Update parameters of G using Equation (6)
    end while
end for
Output: Trained networks G and D
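The control flow of the unpaired training procedure can be sketched as a plain loop. This is a schematic sketch only: the function `train`, the callable `step_fn` (standing in for the loss computations and the updates of D and then G), the independent shuffling of the two index lists, and the truncation to the shorter set are all assumptions introduced for illustration, since the opening lines of Algorithm 1 are not available here.

```python
import random


def train(rain_set, clean_set, epochs, batch_size, step_fn, seed=0):
    """Iterate over unpaired batches of distorted and clean samples.

    step_fn(rain_batch, clean_batch) is a hypothetical callback that would
    compute the losses and update the discriminator, then the generator.
    """
    rng = random.Random(seed)
    for _ in range(epochs):
        # Shuffle the two index lists independently: no pairing is assumed.
        rain_idx = list(range(len(rain_set)))
        clean_idx = list(range(len(clean_set)))
        rng.shuffle(rain_idx)
        rng.shuffle(clean_idx)
        n_batches = min(len(rain_idx), len(clean_idx)) // batch_size
        for b in range(n_batches):
            sl = slice(b * batch_size, (b + 1) * batch_size)
            rain_batch = [rain_set[i] for i in rain_idx[sl]]
            clean_batch = [clean_set[i] for i in clean_idx[sl]]
            step_fn(rain_batch, clean_batch)


# Usage with a stub that only records the batches it receives.
calls = []
train(["r0", "r1", "r2", "r3", "r4"], ["c0", "c1", "c2", "c3"],
      epochs=3, batch_size=2, step_fn=lambda r, c: calls.append((r, c)))
print(len(calls))
```

Because the distorted and clean indices are shuffled separately, each step sees distorted and clean samples with no scene correspondence, which is the setting the unsupervised losses are designed for.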

Raindrop1119
Qian et al. [14] created a dataset of raindrops attached to a glass window or lens, and released it with Attentive GAN for use in raindrop removal studies. Their dataset includes 1119 image pairs, each consisting of one image degraded by raindrops and one raindrop-free image of the same background scene (the ground truth corresponding to the degraded image). The outdoor background scenes and the raindrop sizes and distributions vary among the image pairs. This dataset, hereafter referred to as Raindrop1119, contains 891 training samples and 58 test samples. Samples from Raindrop1119 are shown in Figure 3.

Coastal Wave Monitoring at Anmok Beach
The video enhancement study was conducted at Anmok beach, a straight, almost 4 km-long strip dominated by micro-tidal waves. The beach, located on the east coast of South Korea (Figure 4a,b), has eroded over the last few decades. To understand the associated physical processes, a video monitoring system using general closed-circuit television (CCTV) cameras has been installed, and the video data captured in 2016 and 2017 have been stored. The camera locations and an image of the camera view are indicated in Figure 4b and Figure 4c, respectively. In this study, we additionally installed CCTVs to acquire raindrop-contaminated coastal video images paired with their clean images (see Figure 4d) for a fair comparison with the supervised approaches that use ground truth. Details of the paired coastal video acquisition are described in the following section.

Anmok Paired Dataset
Similarly to Raindrop1119, pairs of raindrop images at Anmok beach were generated to compare the proposed unsupervised method fairly with the state-of-the-art supervised baseline models. The camera location is indicated in Figure 4b, and samples of the acquired video images are shown in Figure 5. An image of the camera view is displayed in Figure 4d. Hereafter, this dataset is referred to as the Anmok paired dataset. The images in Figure 5a are degraded by raindrops, whereas those in Figure 5b are raindrop-free. The raindrop-free images (ground truth) are paired with the degraded images against the same background.
The two CCTVs were connected to record the raindrop-contaminated video images and their corresponding raindrop-free images simultaneously. To replicate the lens distortion caused by raindrops on the instrument under inclement weather conditions, a transparent glass plate (20 × 15 cm) was attached to the front of the CCTV lens, and the raindrop pattern was simulated with a sprayer. Clear images without distortion were acquired as the corresponding synchronous CCTV images taken at the same time as the contaminated images, and with the same glass plate attached. The collected images contained 13 different raindrops and raindrop deformation images collected at 1-min intervals. The samples showed the spatial and temporal changes in the raindrops over approximately 20 min. From each video, we randomly extracted video frames (16,841 frames in total) and divided them into two sets according to acquisition time: 13,605 frames for training and 3236 frames for testing. The training and test sets do not overlap and were not extracted from the same videos.

Anmok Unpaired Dataset
To evaluate the applicability of the proposed method, we also acquired coastal video images from the video monitoring system installed at Anmok beach (see Figure 4c). Samples of the acquired video images are shown in Figure 6. Hereafter, this dataset is called the Anmok unpaired dataset. Unlike in Raindrop1119, the images in Figure 6b are free from raindrops but do not correspond to the raindrop-contaminated images in Figure 6a. The spatial and temporal resolutions of the video system are 1920 × 1080 and 30 frames per second (fps), respectively. From the video clips acquired in November 2016 and 2017, we randomly selected 233 and 360 clips for training and validating the proposed networks, respectively (training:validation ratio = 8:2). All videos were recorded in the daytime and cover different wave-breaking and light conditions. Each video clip is approximately 10 min long. For unsupervised learning, which requires no labeling, all video frames recorded at 30 fps were used without downsampling.

Implementation Details
The kernel parameters of the proposed networks were randomly initialized by He initialization [26] and trained by an Adam optimizer [27]. In both the scene generator and discriminator, the parameters β1, β2, and ε of the Adam optimizer were set to 0.5, 0.999, and 0.001, respectively. The learning rate (η) was 0.0001, and the training datasets were augmented with random horizontal flipping. The size of the images input to the networks was fixed to 720 × 480. The hyperparameters ω1, ω2, and ω3 were empirically set to 1.0, 0.01, and 0.5, respectively, through a hyperparameter tuning process using a tuning set that was randomly selected from the Raindrop1119 dataset. The size of the tuning set was 10% of the training set. The same hyperparameters were applied to all experiments.
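For reference, a single Adam update with the settings listed above can be written out for a scalar parameter. This is a minimal sketch of the standard Adam rule, assuming that the third reported value (0.001) is Adam's numerical-stability constant ε; the function name `adam_step` and the toy gradient are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.5, beta2=0.999, eps=1e-3):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step from theta = 0 with a unit gradient (illustrative values).
theta, m, v = adam_step(theta=0.0, grad=1.0, m=0.0, v=0.0, t=1)
print(theta)
```

With β1 = 0.5 the first-moment average forgets old gradients faster than the common default of 0.9, a setting frequently used when training GANs.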
All experiments were conducted with a batch size of 16 on a workstation equipped with a single TITAN XP GPU (12 GB), an Intel i9 CPU, and 32 GB of main memory. The training time depends on the size of the training data. We monitored the validation loss while training the proposed networks to avoid overfitting.

Baseline Models with Supervised Learning
In this experiment, the proposed method was compared against Pix2Pix [15] and Attentive GAN [14]. Pix2Pix is a supervised image-to-image translation method based on adversarial learning. The generator of Pix2Pix is trained to minimize the $l_1$ loss between the reconstructed images and their corresponding ground-truth images, along with the adversarial loss against the discriminator. The generator of Attentive GAN is composed of two sub-networks: an attentive network based on convolutional LSTM that generates the attention mask of the given input, and a fully convolutional network that produces a de-rained image from the attention mask and input. Attentive GAN requires the binary raindrop masks of all training instances for supervised learning. The hyperparameters of Pix2Pix and Attentive GAN were tuned using the validation sets.

Evaluation Metrics
The accuracy of the background restoration of each method was evaluated by the peak signal-to-noise ratio (PSNR) [28], the structural similarity index (SSIM) [29], and the natural image quality evaluator (NIQE) [30].
The PSNR is the ratio of the maximum possible power of a signal to the power of the corrupting noise. The SSIM metric captures the perceived quality loss between two images. The more similar a reconstructed image is to its paired clean image, the higher its PSNR and the closer its SSIM is to 1.0; a successful reconstruction should therefore score higher on both metrics than the raw distorted image. The PSNR and SSIM are employed to measure the image quality with respect to the ground-truth, undistorted images.
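As a concrete illustration of the PSNR definition, the sketch below computes it from the mean squared error between a reference image and a test image. It assumes 8-bit intensities (peak value 255) and flattened pixel lists; the function name `psnr` and the toy pixel values are hypothetical.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [10, 20, 30, 40]
restored = [12, 18, 33, 39]   # squared errors 4, 4, 9, 1 -> MSE = 4.5
score = psnr(clean, restored)
print(round(score, 2))
```

A smaller mean squared error gives a larger PSNR, so a reconstruction that is closer to the ground truth than the distorted input receives a higher score.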
Meanwhile, the NIQE is a non-referenced image-quality score that assesses the perceptual image-quality enhancement. The NIQE measures the distance between the statistics-based features in an image of a natural scene and the same features obtained from image databases. The features are modeled as multidimensional Gaussian distributions, called the space domain natural scene statistic (NSS) model. The NIQE can assess images with arbitrary distortions such as blurring, ringing, and noise. A lower NIQE score indicates a better image quality.

Comparison with Baseline Models Using the Raindrop1119
For a fair comparison with the supervised-learning baseline methods, we evaluated the accuracy and image-quality improvements on the Raindrop1119 test dataset. Table 1 lists the PSNR, SSIM, and NIQE scores calculated from the images reconstructed by Pix2Pix, Attentive GAN, and the proposed Raindrop-aware GAN. The average PSNR and SSIM between the distorted images and their corresponding clean images (ground truth) in Raindrop1119 were 24.078 and 0.850, respectively. The Pix2Pix method generated images with lower perceptual quality (NIQE = 12.296) than the distorted images. Attentive GAN and our Raindrop-aware GAN improved all three metrics of the raindrop-removal performance. In this experiment, the performance of Attentive GAN was evaluated using the trained model provided by Qian et al. [14].
Table 1. Accuracy and image-quality assessment of the two baseline models and our proposed method on the test data of the Raindrop1119 dataset. All models were trained only on the training data of the Raindrop1119 dataset.
Figure 7 shows the reconstruction results of the two baseline models and Raindrop-aware GAN on the Raindrop1119 test data. As evidenced by its best performance scores, Attentive GAN successfully removed most of the raindrops, whereas the Pix2Pix reconstruction left gray stains in the raindrop regions. Raindrop-aware GAN also generated high-quality images but could not remove tiny raindrops from some test examples.

According to these results, direct supervised learning of the raindrop masks is advantageous for recognizing small-sized and scattered patterns. The acquisition process of the raindrop mask in Qian et al. [14] used empirically derived thresholds for the paired sets in Raindrop1119. It is important to note that Raindrop1119 was acquired in a city with a limited field of view and artificial raindrops. The mask acquisition method using simple thresholds is difficult to apply in practical cases such as coastal videos having moving backgrounds like waves.

Performance Evaluation on the Anmok Paired Dataset
To evaluate the effectiveness of the baselines and the proposed model on actual coastal videos, we acquired a paired dataset of the wave videos at Anmok Beach. As described above, the dataset was captured by two connected CCTV cameras placed side by side. While recording the coastal waves, we sprayed water droplets on one camera multiple times to simulate the image distortions caused by raindrops. The geometric differences between the corresponding images acquired by the cameras were minimized by rigid registration. To simulate various raindrop-distortion patterns, we acquired 15 images splashed with artificial raindrops. Each video clip was filmed for approximately one minute (on average) to capture the wave movements. From the video datasets of both cameras, we randomly sampled 16,841 paired frames at the same time points, and divided them into a 13,605-image training set and a 3236-image test set.
In this experiment, we re-trained the baselines and Raindrop-aware GAN that were previously trained on Raindrop1119, and investigated their validity on the newly acquired datasets. Attentive GAN performed poorly after fine-tuning because the threshold-based method in Qian et al. [14] could not acquire the correct mask images for localizing the raindrops. Table 2 shows the experimental results. After unsupervised re-training and fine-tuning, Raindrop-aware GAN outperformed Pix2Pix and Attentive GAN, yielding the highest performance indicators of raindrop removal (PSNR = 26.505, SSIM = 0.940, NIQE = 11.878). The NIQE was of limited value in comparing the image-quality scores of coastal video images. Although the PSNR and SSIM scores reflected the degraded performance of Attentive GAN, the image reconstruction of Attentive GAN yielded a lower (i.e., better) average NIQE score than the distorted images. These results may be caused by the complex patterns with many white particles observed during wave breaking.
Table 2. Accuracy and image-quality assessment of the tested methods on the Anmok paired dataset.
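The scores reported here for Raindrop-aware GAN can be cross-checked against the relative improvements quoted in the abstract (8.2%, 0.2%, and 1.6%). The short calculation below is a sketch under two assumptions that the text does not state explicitly: that the abstract's absolute differences (+2.012, +0.002, −0.196) refer to the Anmok paired dataset, and that each percentage is the absolute difference divided by the score of the reference method. The implied reference scores are derived values, not figures reported in the paper.

```python
# Scores reported for Raindrop-aware GAN on the Anmok paired dataset.
ours = {"PSNR": 26.505, "SSIM": 0.940, "NIQE": 11.878}
# Absolute differences quoted in the abstract (ours minus the reference method).
delta = {"PSNR": 2.012, "SSIM": 0.002, "NIQE": -0.196}


def relative_gain(metric):
    """Percentage change relative to the implied reference score (ours - delta)."""
    reference = ours[metric] - delta[metric]
    return abs(delta[metric]) / reference * 100.0


gains = {name: round(relative_gain(name), 1) for name in ours}
print(gains)
```

Under these assumptions the computed percentages agree with the abstract, which suggests that the quoted improvements were measured on this dataset; for NIQE the improvement is a decrease, since lower scores indicate better quality.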

In the visual assessment of image quality, the fine-tuned Raindrop-aware GAN restored the image patterns of steep wave crests and wave breaking better than the other methods (see Figure 8). Black stains and blurred regions appeared in the images reconstructed by Pix2Pix, Attentive GAN, and the pre-trained Raindrop-aware GAN. These results indicate that wave-specific patterns in coastal video images are difficult to restore without additional training processes. Whereas supervised learning methods must acquire paired images with and without raindrops, our unsupervised approach can utilize all coastal videos acquired from outdoor visual-sensing systems.

Application to the Anmok Unpaired Dataset
To validate the proposed method in practice, we assessed the image qualities of the baselines (the state-of-the-art models) and Raindrop-aware GAN on the Anmok unpaired dataset. From the coastal videos acquired over two months, we collected separate video clips under wet and dry conditions and then randomly sampled 12,000 video frames from the two datasets. Note that no temporal correspondence exists between the raining and non-raining video datasets. In the validation experiment, we took the Raindrop-aware GAN trained on Raindrop1119 and fine-tuned it on the Anmok unpaired dataset.
In this experiment, we measured only the NIQE because the image sets were not paired. Table 3 gives the image-quality scores of the reconstructed images. Raindrop-aware GAN outperformed Pix2Pix and Attentive GAN; moreover, the fine-tuned Raindrop-aware GAN obtained clearer boundaries of the propagated waves and wave-like patterns in the reconstructed images than the other methods (see Figure 9).
Table 3. Image-quality assessment on raindrop-contaminated video images taken from a CCTV that monitors coastal areas at Anmok Beach (Anmok unpaired dataset).


Discussion
The proposed method provides a learning-based approach to enhancing raindrop-contaminated coastal video. The evaluation metrics commonly used in deep-learning-based image enhancement do not show directly how the enhancement of images from a video monitoring system benefits video-based research on coastal dynamics. We therefore examine the applicability of the proposed method through timestack images.
A temporally stacked image, called a timestack image, is an image in which the intensities of an array of pixels are plotted over time. Timestack images are very useful for monitoring long-term shoreline and bathymetry evolution and for tracking and estimating individual breaking waves in the surf and swash zones. Panels (a)-(e) in the upper part of Figure 10 show the raindrop-contaminated image and the images reconstructed by Pix2Pix, Attentive GAN, the pre-trained Raindrop-aware GAN, and the fine-tuned Raindrop-aware GAN, respectively. The lower part of Figure 10 shows the timestack images created by accumulating 300 consecutive frames for 6 s for each of the five image types (a)-(e) in the upper part, along each of the three cross-shore transects marked 1, 2, and 3.
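The construction of a timestack image from a frame sequence can be sketched in a few lines. This is an illustrative sketch, assuming that each frame is a 2D grid of intensities, that a transect is given as a list of (row, column) pixel coordinates, and that time runs along the rows of the result; the function name `build_timestack` and the toy frames are hypothetical.

```python
def build_timestack(frames, transect):
    """Stack the pixel intensities sampled along a transect over time.

    frames: list of 2D lists (one per video frame, in temporal order).
    transect: list of (row, col) pixel coordinates along a cross-shore line.
    Returns a 2D list with one row per frame and one column per transect pixel.
    """
    return [[frame[r][c] for (r, c) in transect] for frame in frames]


# Three toy 2 x 3 frames whose intensities encode (time, row, col).
frames = [[[t * 10 + r * 3 + c for c in range(3)] for r in range(2)]
          for t in range(3)]
transect = [(0, 0), (1, 1), (1, 2)]

timestack = build_timestack(frames, transect)
print(timestack)
```

Because every row of the timestack is sampled from a single frame, raindrop artifacts that remain on the transect appear as persistent streaks along the time axis, which is why timestacks are a sensitive check on the restoration quality.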
In the timestack images created along transect 1, the refraction caused by large raindrops on the sand side (pink box area) and the contaminated sea area (yellow box area) are well reconstructed overall by the fine-tuned Raindrop-aware GAN (rightmost column). In the timestack images created along transects 2 and 3, the white foam of the breaking waves on the sand side (pink box area) is best reconstructed by the proposed method, and the crest of the breaking wave is clearly displayed in the sea area (yellow box area), as shown in the rightmost column.
The timestack images in the second column from the right show the results of Raindrop-aware GAN trained with the Raindrop1119 dataset only. Their reconstruction is slightly inferior to that of the images in the rightmost column. This shows that, even with the same Raindrop-aware GAN architecture, the best reconstruction performance is obtained when coastal video is used for fine-tuning. The visual assessment of the timestack images confirms that the proposed method performs best among the compared methods. It also indicates that the method is highly applicable to studies of nearshore wave dynamics with video remote sensing, particularly in the data preparation step of tasks such as breaking wave height estimation from coastal video [31,32], video sensing of nearshore bathymetry evolution [33,34], nearshore wave transformation with video imagery [35], shoreline response and resilience through video monitoring [36-39], wave run-up prediction [40,41], and nearshore wave tracking through coastal video [42,43].

Conclusions
We performed unsupervised learning with a GAN-based video generation method that enhances coastal video images contaminated by raindrops. Unlike recent approaches based on supervised learning, which require the pairing of degraded images with clean (ground-truth) images, the proposed raindrop-removal network Raindrop-aware GAN is an unsupervised learning method. Raindrop-aware GAN attentively corrects the degraded region with minimal changes to the raindrop-free areas in the contaminated image. For this purpose, it learns the raindrop region and its surroundings, and then generates a mask image mapped with the spatial-attention information. The scene generator in the proposed method is expanded by adding the mask generated by background learning, and is supplemented with a discriminator that distinguishes the raindrop and raindrop-free regions in a patch-wise manner using adversarial learning.
To evaluate its reference performance, the proposed network was pre-trained on the open dataset Raindrop1119. For unsupervised learning, the paired dataset (the paired clean and degraded images) was ignored and the whole dataset was shuffled in random order. Via transfer learning, the pre-trained network was applied to coastal video images of Anmok Beach. The Anmok video dataset was continuously acquired over a long period by CCTVs. To quantitatively verify the proposed network, we collected an additional dataset of clean images paired with raindrop-contaminated images.
The images from cameras, video systems, and other land-based remote-sensing technologies are severely degraded by bad weather conditions during extreme events. In such situations, unsupervised learning-based data-driven modeling is essential. Therefore, our proposed method is expected to assist the data-preparation stage of vision-based remote-sensing studies.
However, whether the method correctly restores the movement of propagating waves in continuous time is difficult to quantify. To address this limitation, we will encode not only the spatial features but also the temporal features in an extended version of our network. Moreover, we intend to modify the architecture of the scene generator to recognize temporal changes in the raindrop regions. For this purpose, we will employ a recurrent sub-network.