Wasserstein Generative Adversarial Networks Based Data Augmentation for Radar Data Analysis

Abstract: Ground-based weather radar can observe a wide area with high spatial and temporal resolution, providing valuable information for meteorological research and services. Recent weather radar research has focused on applying machine learning and deep learning to solve complicated problems. It is well known that an adequate amount of data is a necessary condition in machine learning and deep learning. Generative adversarial networks (GANs) have received extensive attention for their remarkable data generation capacity since their fascinating competitive structure was first proposed, and a massive number of variants have followed; which model is adequate for a given problem is therefore an inevitable concern. In this paper, we explore the problem of radar image synthesis and evaluate different GANs on authentic radar observation results. The experimental results showed that the improved Wasserstein GAN is more capable of generating similar radar images while achieving higher structural similarity results.


Introduction
Weather radar, which uses electromagnetic waves to identify precipitation in the atmosphere, is one of the most frequently used remote sensing devices. It can provide observation results with remarkably high spatiotemporal resolution over a wide observation area, which is beneficial to meteorological research and services. There are a large number of applications utilizing the observation results, such as quantitative precipitation estimation [1], convective storm nowcasting [2], downburst prediction [3], and post-event analysis [4]. Unfortunately, undesirable echoes, also called non-meteorological echoes, can contaminate the observation results when weather radar determines the location and intensity of precipitation. Sea clutter [5], ground clutter [6], and anomalous propagation [7] are typical examples of non-meteorological echoes, and their characteristics, similar to those of meteorological echoes, complicate the quality control process [8]. Therefore, a large number of domestic and foreign scholars have paid attention to implementing precise prediction methods or automated quality control processes to alleviate the experts' tasks, based on statistical [9] and machine learning methods such as fuzzy logic [10], Bayesian inference [11], random forest [12], and clustering [13]. Furthermore, many meteorological researchers and forecasters have recently focused on deep learning-based approaches, which have shown a superior capability to solve real-world problems in various fields [14].
An adequate amount of data and an even class distribution are necessary to implement an effective and properly working machine learning method. However, it is nearly impossible to obtain ideal datasets when solving real-world problems: most of them tend to be imbalanced and insufficient. In downburst prediction using machine learning techniques [15], for example, gathering a sufficient amount of information on each specific class requires enormous effort, even though weather radar data are free and plentiful in many countries. Data augmentation [16] is a popular method for solving the problems mentioned above by utilizing the given dataset. It can increase the quantity of the entire dataset to make it sufficient, saving time and resources by generating realistic data from a relatively small number of observed cases annotated by experts. Additionally, it can adjust the balance between majority and minority classes by increasing the quantity of minority class data or decreasing that of majority class data. There are several traditional approaches to obtaining augmented data: cropping; brightness control; and simple geometric transformations such as translation, rotation, and resizing. While traditional approaches can relieve some of these problems, they fundamentally generate data highly correlated with the data provided.
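The traditional geometric transformations mentioned above can be sketched on a tiny 2-D "image" represented as a list of lists (an illustration only, not the paper's code; real pipelines would operate on arrays):

```python
# Illustrative sketch of traditional geometric augmentations on a small
# 2-D grid of intensity values (not the paper's implementation).

def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]

def translate_right(img, shift, fill=0):
    """Shift columns to the right, padding the left edge with `fill`."""
    return [([fill] * shift + row[:-shift]) if shift else row[:] for row in img]

img = [[1, 2],
       [3, 4]]
```

Each transform produces a new image that is plainly correlated with the original, which is exactly the limitation the GAN-based approach addresses.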
In this case, the generative adversarial network (GAN) [17] can be the best alternative solution. The GAN consists of two models, the generator and the discriminator, which intensely compete and cooperate during training: the generator endeavors to produce elaborate counterfeit data that thoroughly fools the discriminator, and the discriminator exerts itself to distinguish between the counterfeit data and the original data. Through this intriguing structure, the GAN has provided a novel form of data augmentation and proven its potential in various fields, including natural language processing [18], hyperspectral image classification [19], and medical imaging [20]. Consequently, active research on GANs has yielded a massive number of fascinating methods. However, which model is appropriate for a problem in a given situation remains a concern.
In this paper, we explore the problem of data augmentation for radar data analysis and evaluate different models on actual radar observation results. We select six generative adversarial network models: GAN, conditional GAN (CGAN) [21], deep convolutional GAN (DCGAN) [22], InfoGAN [23], least square GAN (LSGAN) [24], and improved Wasserstein GAN (WGAN-GP) [25,26]. The rest of this paper is organized as follows: Section 2 introduces the theory and architecture of GANs, including principles, variants, and learning schemes. Section 3 presents and discusses the experimental results of the GANs. Finally, concluding remarks are made in Section 4.

Methods
GANs were inspired by game theory and offer a fascinating alternative to maximum likelihood techniques: the generator and discriminator are the key components, which compete with each other to reach the Nash equilibrium during training. This adversarial composition is the architecture shared by the GAN and its variants. In this section, we present the theory and architectures of the basic GAN and its variants: GAN, CGAN, DCGAN, InfoGAN, LSGAN, and WGAN-GP.

GAN
The architecture of the GAN is shown in Figure 1a. The learning scheme of the GAN trains a generator G and a discriminator D simultaneously: the generator captures the data distribution and generates exquisite counterfeit data from a random noise vector z, usually drawn from a uniform (z ∼ U(0, 1)) or normal (z ∼ N(0, 1)) distribution. The discriminator evaluates data for authenticity; in other words, it does its best to distinguish real data from fake data. Training reaches the optimal state when the discriminator cannot determine whether the data come from the real dataset or the counterfeit. The generator and the discriminator have their own loss functions, set to achieve the global optimal solution, as shown in Equations (1) and (2):

L_D = −E_{x∼p_data}[log D(x)] − E_{z∼p_z}[log(1 − D(G(z)))]  (1)

L_G = E_{z∼p_z}[log(1 − D(G(z)))]  (2)

The basic idea of the GAN is outstanding, but it also has deficiencies that originate from various causes, such as vanishing or exploding gradients, mode collapse, and failure to converge. Therefore, various derived generative models have been proposed.
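The opposing objectives in Equations (1) and (2) can be made concrete with a toy computation over discriminator scores (probabilities of being real); this is an illustration only, not the paper's implementation:

```python
import math

# Toy illustration of the GAN losses: D minimizes
# -E[log D(x)] - E[log(1 - D(G(z)))], while G minimizes E[log(1 - D(G(z)))].

def d_loss(real_scores, fake_scores):
    """Discriminator loss: low when real scores are near 1 and fake near 0."""
    real_term = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(math.log(1 - s) for s in fake_scores) / len(fake_scores)
    return -(real_term + fake_term)

def g_loss(fake_scores):
    """Generator loss: low when the discriminator is fooled (fake scores near 1)."""
    return sum(math.log(1 - s) for s in fake_scores) / len(fake_scores)
```

A confident discriminator (real ≈ 1, fake ≈ 0) achieves a low `d_loss`, while those same fake scores leave the generator with a poor objective, which is the competitive pressure that drives training.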

CGAN
An intuitive approach to solving the mode collapse problem is to improve the randomness of the input noise vector z. The conditional GAN (CGAN), as shown in Figure 1b, introduces a conditional variable c in both the generator and the discriminator to add conditional information, which can directly influence the data generation process. The input of the generator is both the random noise vector z and the conditional variable c. The inputs of the discriminator are two-fold: the real data under the control of the same conditional variable c, and the generated data G(z, c). The generator and the discriminator have their own loss functions, including the conditional variable, as shown in Equations (3) and (4).
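A common way to wire this conditioning (an assumption about typical CGAN implementations, not taken from the paper) is to one-hot encode the class label and concatenate it with the noise vector, so the generator receives a single input [z, c]:

```python
import random

# Sketch of CGAN-style conditioning: the class label c is one-hot encoded
# and concatenated with the noise vector z (hypothetical helper names).

def one_hot(label, num_classes):
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def conditional_input(z_dim, label, num_classes, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]  # z ~ N(0, 1)
    return z + one_hot(label, num_classes)           # generator input [z, c]

x = conditional_input(z_dim=200, label=1, num_classes=3)
```

With the paper's 200-dimensional noise vector and three radar classes, the generator input would have 203 entries, the last three carrying the class condition.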

DCGAN
The DCGAN is one of the most successful network architectures for GANs. The main contribution of the DCGAN is to suggest a convolution-filter-based GAN model that provides stable training in most cases. The underlying architecture is the same as the GAN, as shown in Figure 1a. However, it is primarily composed of convolution layers, which downsample and upsample by utilizing convolutional stride and transposed convolution instead of max pooling and fully connected layers. The generator uses the rectified linear unit (ReLU) except in the output layer, which uses a hyperbolic tangent function; the discriminator uses the LeakyReLU instead. The generator and the discriminator have their own loss functions, as shown in Equations (5) and (6).
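The up/downsampling trade mentioned above follows simple output-size arithmetic; the worked example below (not the paper's code) shows how a transposed convolution doubles a feature map while the matching strided convolution halves it:

```python
# Spatial-size arithmetic for strided and transposed convolutions.

def conv_out(size, kernel, stride, pad):
    """Size after a strided convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride, pad):
    """Size after a transposed convolution (no output padding):
    (size - 1) * s - 2p + k."""
    return (size - 1) * stride - 2 * pad + kernel
```

With kernel 4, stride 2, and padding 1, a 4 × 4 map grows to 8 × 8 in the generator, and the discriminator's matching strided convolution maps 8 back to 4, so no pooling or fully connected resampling is needed.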

InfoGAN
The InfoGAN is another popular form of conditional GAN, which works by introducing mutual information. The mutual information, which represents the correlation between the generated data G(z, c) and the latent code c (which, unlike the conditional variable c in the CGAN, is unknown), makes the generation process more controllable and the generated results more easily interpretable. The essential point of the InfoGAN is that the information in the latent code c should not be lost in the generation process, which is enforced by changing the loss functions. As shown in Figure 1c, the InfoGAN has an additional neural network Q(c|x) for the latent code c. The generator and the discriminator have their own loss functions, which include the constraint function I(c, G(z, c)) with a hyper-parameter λ, as shown in Equations (7)-(9),
where H(c) indicates the entropy of the latent code c. Maximizing the constraint function I(c, G(z, c)) keeps the latent code c informative about the generated data.
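For a categorical latent code, the variational lower bound on I(c, G(z, c)) reduces (up to the constant H(c)) to the negative cross-entropy between the sampled code and Q's softmax output. A minimal sketch of that λ-weighted term, under these assumptions and not taken from the paper:

```python
import math

# Sketch of the InfoGAN mutual-information penalty for a categorical code:
# minimizing the cross-entropy between the true code index and Q's softmax
# prediction maximizes the variational bound on I(c, G(z, c)).

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def info_loss(code_index, q_logits, lam=1.0):
    """Lambda-weighted cross-entropy between the sampled code and Q(c|x)."""
    probs = softmax(q_logits)
    return -lam * math.log(probs[code_index])
```

When Q confidently recovers the code from the generated image the penalty is small; when the code is lost in generation, the penalty grows and pushes the generator to preserve it.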

LSGAN
The GAN uses a sigmoid cross-entropy loss function in the discriminator, which can cause the vanishing gradient problem when updating the generator. The underlying idea of the LSGAN starts here: instead of the sigmoid cross-entropy, the LSGAN uses a least-squares loss in the discriminator to penalize data that lie far from the decision boundary. This pushes the generated data closer to the real data. The loss functions in Equations (10) and (11) differ from the others and are intuitive:

min_D V(D) = (1/2) E_{x∼p_data}[(D(x) − b)^2] + (1/2) E_{z∼p_z}[(D(G(z)) − a)^2]  (10)

min_G V(G) = (1/2) E_{z∼p_z}[(D(G(z)) − c)^2]  (11)
where a and b are the labels for fake data and real data, respectively. c denotes the value that the generator wants the discriminator to believe for fake data.
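With the common label choice a = 0, b = 1, c = 1, the least-squares losses of Equations (10) and (11) can be computed directly over discriminator scores (a toy illustration, not the paper's implementation):

```python
# LSGAN least-squares losses with labels a (fake), b (real), and c (the
# value the generator wants the discriminator to output for fake data).

def lsgan_d_loss(real_scores, fake_scores, a=0.0, b=1.0):
    real = sum((s - b) ** 2 for s in real_scores) / (2 * len(real_scores))
    fake = sum((s - a) ** 2 for s in fake_scores) / (2 * len(fake_scores))
    return real + fake

def lsgan_g_loss(fake_scores, c=1.0):
    return sum((s - c) ** 2 for s in fake_scores) / (2 * len(fake_scores))
```

Unlike the sigmoid cross-entropy, the quadratic penalty keeps growing with distance from the target label, so samples that are correctly classified but far from the decision boundary still produce gradients for the generator.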

WGAN-GP
As mentioned before, the loss functions of the GAN are susceptible to hyperparameter choice and random initialization, which is not desirable. The WGAN remedies the problem by redefining the loss function with the Wasserstein distance, which makes the training process stable and less sensitive to hyperparameter selection. The WGAN attempts to minimize an approximation of the intractable Wasserstein-1 distance, also called the earth mover distance, between the distributions of real and generated images. The approximation has requirements, including that the discriminator must implement a K-Lipschitz function; in other words, the first derivative of the function must be bounded everywhere by a constant. The early version of the WGAN enforces this by clipping the weights to a closed interval [−c, c], where c is a constant, but the clipping constant c causes a vanishing gradient problem. Therefore, a novel approach was suggested that solves the problem by utilizing a gradient penalty to enforce the 1-Lipschitz constraint on the discriminator. The improved WGAN (WGAN-GP) allows the discriminator to learn more complex functions and reduces the vanishing and exploding gradient problems. Equations (12) and (13) represent the loss functions of the discriminator and generator, respectively:

L_D = E_{x̃∼p_g}[D(x̃)] − E_{x∼p_r}[D(x)] + λ E_{x̂∼p_x̂}[(‖∇_{x̂} D(x̂)‖_2 − 1)^2]  (12)

L_G = −E_{x̃∼p_g}[D(x̃)]  (13)

where x̂ is sampled uniformly along straight lines between pairs of real and generated samples.
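The gradient penalty can be seen most clearly in one dimension. For a linear critic D(x) = w·x the gradient is w everywhere, so the penalty collapses to λ(|w| − 1)²; the sketch below (an illustration under that assumption, not the paper's code) shows how it enters the discriminator loss of Equation (12):

```python
# 1-D toy of the WGAN-GP discriminator loss with a linear critic D(x) = w*x.
# The gradient of D is w at every interpolate x_hat, so the penalty
# (||grad D|| - 1)^2 is simply (|w| - 1)^2.

def critic(x, w):
    return w * x

def gradient_penalty(w, lam=10.0):
    """Penalty enforcing the 1-Lipschitz constraint on the linear critic."""
    grad_norm = abs(w)  # d/dx (w * x) = w everywhere
    return lam * (grad_norm - 1.0) ** 2

def wgan_gp_d_loss(real_x, fake_x, w, lam=10.0):
    """E[D(fake)] - E[D(real)] + lambda * gradient penalty."""
    fake = sum(critic(x, w) for x in fake_x) / len(fake_x)
    real = sum(critic(x, w) for x in real_x) / len(real_x)
    return fake - real + gradient_penalty(w, lam)
```

A critic with |w| = 1 pays no penalty, while steeper critics are pushed back toward the 1-Lipschitz ball smoothly, instead of having their weights hard-clipped as in the early WGAN.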

Image Quality Assessment
It is widely recognized that GANs lack an objective function, which makes it difficult to compare the performances of different models. There is no absolute standard for evaluating their performance yet, notwithstanding that several measures have been introduced [27]. Therefore, it is essential to select an adequate performance measure carefully. In this paper, we chose the structural similarity (SSIM) [28] index, a well-known image quality measure for deriving the similarity between two images. It evaluates similarity by predicting human perceptual similarity judgments, combining three independent quantities computed between two non-negative images a and b, namely luminance l(a, b), contrast c(a, b), and structure s(a, b), into an overall similarity measure, as shown in Equation (14):

SSIM(a, b) = [l(a, b)]^α · [c(a, b)]^β · [s(a, b)]^γ  (14)

Each quantity takes a value within [0, 1], with values close to 1 indicating high similarity,
where μ_a, μ_b are the averages, σ_a^2, σ_b^2 are the variances, and σ_ab is the covariance of a and b. The constants c_1, c_2 avoid instability when μ_a^2 + μ_b^2 and σ_a^2 + σ_b^2 are very close to zero, respectively. Additionally, α > 0, β > 0, and γ > 0 adjust the relative importance of the three components. Generally, α, β, and γ are set to 1 and c_3 = c_2/2 to simplify the expression, as shown in Equation (15):

SSIM(a, b) = ((2 μ_a μ_b + c_1)(2 σ_ab + c_2)) / ((μ_a^2 + μ_b^2 + c_1)(σ_a^2 + σ_b^2 + c_2))  (15)
The SSIM index takes values between 0 and 1, where 1 corresponds to having identical images, and 0 corresponds to a loss of all structural similarity.
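Equation (15) can be computed globally over two equal-length intensity lists as follows. This is a minimal sketch, not the windowed implementation typically used in practice; the constants assume the usual (k·L)² form with k₁ = 0.01, k₂ = 0.03 and dynamic range L (here the paper's 0-80 dBZ range):

```python
# Global SSIM over two flat intensity sequences, following Equation (15).

def ssim(a, b, L=80.0):
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((x - mu_b) ** 2 for x in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilizing constants
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

Identical inputs score exactly 1, while structurally unrelated inputs score far lower, which is what makes the index suitable for ranking the generated radar images against class references.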

Data Description
In this paper, we conduct experiments utilizing data provided by the Korea Meteorological Administration (KMA), which uses dual-polarization Doppler radars for weather prediction. The dual-polarization Doppler radar can provide useful information, including corrected reflectivity (CZ), differential reflectivity (DR), specific differential phase shift (KD), and cross-correlation (RH). Among them, we chose the CZ feature data, which are essential for observing atmospheric scatterers, from a radar installed at a site covering a 240 km radius observation area that includes both onshore and offshore regions.
Additionally, we chose 5000 radar images, 480 × 480 pixels in size with a 0 to 80 dBZ range, and categorized them into three classes according to the ratio of radar echo present in the observation range, as shown in Figure 2.

Experiments
The primary objectives for the GANs in this paper are as follows: First, the generated patterns should appear inside the circle. Second, the generated patterns should differ between classes. Third, the generated patterns should not be identical to the training images. Furthermore, the common settings of the GANs are as follows: Batch normalization layers are excluded. The dimension of the input vector of the generator is 200, and it follows a normal distribution, N(0, 1). Training finishes at 300 epochs, and the results are derived at specific epochs: 0 for the beginning, 150 for the intermediate, and 300 for the last step.

Figure 3 shows the graphical results of the GAN, and Figure 4 describes the architectures of the generator and discriminator of the GAN. It contains dense layers and activation layers using the rectified linear unit (ReLU), LeakyReLU, and hyperbolic tangent functions. There seem to have been no improvements from training until at least 150 epochs. At 300 epochs, the graph shows that the GAN changes significantly. However, Figure 3g,h,i share almost identical outlines, which might indicate the mode collapse problem; the only difference is the level of the green area. Therefore, the GAN seems inappropriate for generating synthetic radar images, because it does not satisfy the first and second objectives.

Figure 5 describes the graphical results of the CGAN, and Figure 6 indicates the architectures of the generator and discriminator of the CGAN. It also contains dense layers and activation layers using the rectified linear unit (ReLU), LeakyReLU, and hyperbolic tangent function. The difference from the GAN is that class information is added when generating synthetic images. There seem to be no meaningful improvements, only noise, across all epochs. Therefore, the CGAN also seems inadequate for generating synthetic radar images, because it does not satisfy the first and second objectives.
Figure 7 indicates the graphical results of the DCGAN, and Figure 8 presents the architectures of the generator and discriminator of the DCGAN. Unlike the previous models, the DCGAN includes not only dense layers and activation layers using the rectified linear unit (ReLU), LeakyReLU, and hyperbolic tangent function, but also convolution layers, which are appropriate for analyzing and extracting features of the given images. There are remarkable changes compared to the previous models: as the epochs increase, the generated pattern seems to build a circular shape. Unfortunately, the outside of the circle also contains noisy data, as shown in Figure 7g,h,i. Therefore, the DCGAN also seems improper for generating synthetic radar images, because it does not satisfy the first and second objectives. However, it provided helpful evidence that including convolution layers in the architectures of the generator and the discriminator allows obtaining favorable results.

Figure 9 presents the graphical results of the InfoGAN, and Figure 10 shows the architectures of the generator and discriminator of the InfoGAN. The primary difference compared to the previous models is that the discriminator has the Q network, which computes the mutual information in the latent code c using a softmax layer. The other layers are the same as in the DCGAN architecture: convolution layers, dense layers, and activation layers using the rectified linear unit (ReLU), LeakyReLU, and hyperbolic tangent function. In the early training stage, it was possible to expect positive results, because the InfoGAN generated distinct and intense patterns, unlike the previous models. However, as learning progresses, the generated patterns do not seem to converge to the original images; the model merely generates exquisite patterns from the given input vectors. Therefore, the InfoGAN also seems improper for generating synthetic radar images, because it does not satisfy the first and second objectives.
Figure 11 shows the graphical results of the LSGAN, and Figure 12 describes the architectures of the generator and discriminator of the LSGAN. The architectures are identical to the DCGAN, except for the number of convolution filters: convolution layers, dense layers, and activation layers using the rectified linear unit (ReLU), LeakyReLU, and hyperbolic tangent function. The generated images seem to form a circular pattern at all stages of training. Additionally, the LSGAN seems to have learned features of the individual original images in the training process, because there is a slight difference between classes. Unfortunately, the generated images still include noise outside the circle that cannot be neglected. Therefore, the LSGAN also seems improper for generating synthetic radar images, because it does not satisfy the first objective.
Finally, Figure 13 describes the graphical results of the WGAN-GP, and Figure 14 represents the architectures of the generator and discriminator of the WGAN-GP. The architectures are identical to the DCGAN: convolution layers, dense layers, and activation layers using the rectified linear unit (ReLU), LeakyReLU, and hyperbolic tangent function. The generated images form a circular pattern at all stages of training. Additionally, the WGAN-GP seems to have learned features of the individual original images in the training process, because there is a remarkable difference between classes, as shown in Figure 13g,h,i. The final results contain patterns inside the circle; the generated patterns differ between classes; and the generated images are dissimilar to the training images. Therefore, the WGAN-GP seems appropriate for generating synthetic radar images, because it satisfies all three objectives.
The SSIM index measures the perceptual difference between two similar images rather than determining which one is better. Therefore, we chose representative examples of each class of radar image in Figure 2 as references. Furthermore, we generated five images at 300 epochs from each GAN model for comparison. Table 1 shows the derived SSIM index results. For convenience, we separated the generated images at 300 epochs into three groups, A, B, and C, which contain the generated images of the GAN models for Class 0, Class 1, and Class 2, respectively, and report the average SSIM index of each group against its class reference. Therefore, we can conclude that Table 1 numerically underpins the graphical results: the WGAN-GP generally has better results than the others.

Conclusions
The results in Section 3 demonstrate the versatility of GANs for the synthetic data generation of weather radar images. Among the selected variants of GANs, the WGAN-GP verified its ability for radar data augmentation both qualitatively and quantitatively using the SSIM index. Additionally, the results showed that, for generating the synthetic images under the same environment (Ubuntu 17.10, TITAN V, 16 GB RAM), the WGAN-GP takes less time to converge than the other GANs, as shown in Table 2. The WGAN-GP finishes its learning process within 8 min, while the InfoGAN consumes more than 17 h; the other GANs spend 7 to 12 h, which is also considerably longer than the WGAN-GP. Therefore, the WGAN-GP can be applied to build a database within a reasonable time from a small number of expert-annotated examples, to solve challenging real-world problems including quantitative precipitation estimation, convective storm prediction, and non-meteorological echo elimination. Future work should concentrate on applying the WGAN-GP to other features of weather radar, such as DR, KD, and RH, and on combining them as a color image.