Various Generative Adversarial Networks Model for Synthetic Prohibitory Sign Image Generation

Synthetic images are a critical topic in computer vision. Traffic sign images synthesized from standard templates are commonly used to train recognition algorithms, because they provide more diverse training data at low cost. Convolutional Neural Networks (CNNs) achieve excellent traffic sign detection and recognition when sufficient annotated training data are available, and the reliability of the entire vision system depends on these networks. However, traffic sign datasets are difficult to obtain for most countries in the world. This work uses various generative adversarial network (GAN) models, namely Least Squares Generative Adversarial Networks (LSGAN), Deep Convolutional Generative Adversarial Networks (DCGAN), and Wasserstein Generative Adversarial Networks (WGAN), to generate complex images. In particular, this paper discusses the quality of the images produced by the various GANs under different parameters. For processing, we use input sets with a specific number of images and a specific image size. The Structural Similarity Index (SSIM) and Mean Squared Error (MSE) are used to measure image consistency by comparing each generated image with the corresponding real image. The results show that the generated images are more similar to the real images when more training images are used. LSGAN outperformed the other GAN models, achieving the maximum SSIM values with 200 input images, 2000 epochs, and an image size of 32 × 32.


Introduction
Recent developments in deep learning have produced neural networks with more layers [1]. These deeper models are far more capable, but they require more training data, and obtaining an accurate and reliable dataset through manual labeling is often costly. This is a general problem in machine learning and computer vision. Synthesizing images is an effective way to enlarge the training set, which can improve image recognition accuracy, and data augmentation has been used to enlarge training sets for image classification in various studies [2]. Traffic sign detection (TSD) and traffic sign recognition (TSR) have been thoroughly researched in recent years [3,4], and many TSD and TSR systems rely on large quantities of training data. Several traffic sign datasets have been released, including the German Traffic Sign Recognition Benchmark (GTSRB) [5], the Chinese Traffic Sign Database (TSRD), and Tsinghua-Tencent 100K (TT100K) [6]. Because traffic signs differ from country to country and across circumstances, synthetically generated training data are an attractive option, and synthetic images save time and effort in data collection [7,8]. Synthetic training data have not yet been widely used in TSR, but they are worth exploring because very few datasets are available from other countries, in particular from Taiwan. In this research, we focus on Taiwan's prohibitory signs. Our motivation arises from the current lack of a Taiwan traffic sign database, images, and research system.
A generative adversarial network (GAN) [9] is a deep learning framework consisting of two models, a generative model and a discriminative model, which are trained together. GANs have benefited several tasks, such as image synthesis [10][11][12], image-to-image translation [13,14], and image restoration [15]. Image synthesis is a fundamental problem in computer vision [16][17][18]. To obtain more diverse and low-cost training data, traffic sign images synthesized from standard templates have been widely used to train machine-learning-based classification algorithms [12,19]. Radford et al. [20] proposed the Deep Convolutional Generative Adversarial Network (DCGAN) in 2016. DCGAN combines the GAN framework with CNNs to obtain better and more stable training results. Other GAN variants include Least Squares Generative Adversarial Networks (LSGAN) and Wasserstein Generative Adversarial Networks (WGAN) [21,22], both of which mitigate the training instability of GANs. Each of these GANs has achieved excellent results in producing synthetic imagery. Therefore, given the lack of an existing training dataset, our experiments apply DCGAN, LSGAN, and WGAN to generate synthetic images.
Traffic sign images synthesized from standard templates are commonly used to collect additional training data flexibly and at low cost for training classification algorithms [19]. In this paper, DCGAN, LSGAN, and WGAN are used to generate such complex images. Synthetic images are a solution when only a small amount of data is available, and GANs have achieved outstanding results in image data generation. Our experiments therefore use GAN-synthesized images to obtain image data, because this approach does not depend on a vast training dataset. The main contributions of this work are as follows. First, high-quality synthetic images of Taiwan prohibitory signs (Classes T1-T4) are obtained using various GAN models. Second, we analyze and evaluate the performance of DCGAN, LSGAN, and WGAN in generating synthetic images with different numbers of epochs (1000 and 2000), numbers of input images, and image sizes (64 × 64 and 32 × 32). Third, we propose an experimental setting with various GAN styles to generate synthetic images. Finally, we evaluate the synthetic images using SSIM and MSE. The remainder of this article is structured as follows. Section 2 covers materials and methods. Section 3 describes the experiments and results. Lastly, Section 4 offers preliminary conclusions and suggests future work.

Materials and Methods
Synthetic images are widely used to expand datasets. A well-known approach combines original and synthetic data for better detection performance, and multiple approaches such as [23,24] have confirmed the benefit of adding synthetic data when real data are limited. More recently, some approaches [25] have proposed bridging the domain gap between real and synthetic data by applying generative adversarial networks (GANs); such a system obtained more reliable results than training with real data. However, GANs are challenging to train and have shown their value primarily in regression tasks.

Deep Convolutional Generative Adversarial Networks (DCGAN)
Radford et al. evaluated a set of architectural and topological constraints on convolutional GANs in 2016; the resulting model, which trains stably in most settings, is named Deep Convolutional GAN (DCGAN) [20,26]. DCGAN is an image generation framework consisting of a generative network G and a discriminative network D [20,27]. Figure 1 displays the G and D network diagram. The G network is a de-convolutional neural network that creates images from d-dimensional vectors using de-convolutional layers. The D network has the same structure as a traditional CNN and discriminates whether its input is a real image from a predefined dataset or an image produced by G [28]. The training of DCGAN is expressed in Formula (1) as follows [9]:
$$V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (1)$$
where x is a real image, z is a d-dimensional vector of random numbers, and p_data(x) and p_z(z) are the probability distributions of x and z. D(x) is the probability that the input x comes from p_data(x), that is, that it is real, and 1 − D(G(z)) is the probability that G(z) is recognized as generated from p_z(z). D is trained to increase the rate of correct answers, and G is trained to decrease log(1 − D(G(z))) to deceive D. Consequently, optimizing D maximizes V(D, G), and optimizing G minimizes V(D, G). The optimization problem is therefore given by Formulas (2) and (3):
$$D^* = \arg\max_D V(D,G) \quad (2)$$
$$G^* = \arg\min_G V(D^*,G) \quad (3)$$
G captures the distribution of the sample data and, from noise z drawn from a fixed distribution such as a uniform or Gaussian distribution, generates samples that resemble the real training data; the goal is for the generated samples to be as good as actual samples. D estimates the probability that a sample comes from the training data rather than from the generator. If the sample comes from the real training data, D outputs a high probability; otherwise, D outputs a low probability [29,30].
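As a minimal sketch of Formula (1), the value function can be estimated on a minibatch directly from the discriminator's outputs. The inputs are assumed to be probabilities strictly between 0 and 1, and the function names are hypothetical; practical implementations usually work from logits for numerical stability and often replace the generator term with the non-saturating variant −log D(G(z)).

```python
import math


def gan_value(d_real, d_fake):
    """Minibatch estimate of V(D, G) from Formula (1).

    d_real: discriminator outputs D(x) on real images, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    # D maximizes V(D, G), which is the same as minimizing -V(D, G).
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake):
    # G minimizes the only term of V(D, G) that depends on it: E[log(1 - D(G(z)))].
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
```

When D outputs 0.5 for every sample, as at the theoretical equilibrium where the generator distribution matches p_data, the estimate equals −log 4; a more confident, correct discriminator yields a lower discriminator loss.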

Least Squares Generative Adversarial Networks (LSGAN)
The discriminator in LSGANs uses a least-squares cost function [31,32]. Like other GANs, LSGANs are used to generate samples that represent the real data. LSGANs have two advantages over regular GANs. First, LSGANs can produce higher-quality images than conventional GANs. Second, LSGANs are more stable during the learning process [33,34]. Training GANs is difficult in practice because GAN learning is unstable.
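Recent research has pointed out that the instability of GAN learning is affected by the objective function [35]. In particular, minimizing the regular GAN objective can cause vanishing gradients, which makes it difficult to update the generator. LSGAN relieves this problem because it penalizes samples according to their distance from the decision boundary, which produces larger gradients when the generator is updated. In addition, the training instability of regular GANs is technically attributed to the mode-seeking behavior of the objective function, and LSGANs exhibit less mode-seeking behavior. The cost functions of an LSGAN are shown in Formulas (4) and (5) [36]:
$$\min_D V_{LSGAN}(D) = \frac{1}{2}\mathbb{E}_{x \sim p_{data}(x)}[(D(x) - b)^2] + \frac{1}{2}\mathbb{E}_{z \sim p_z(z)}[(D(G(z)) - a)^2] \quad (4)$$
$$\min_G V_{LSGAN}(G) = \frac{1}{2}\mathbb{E}_{z \sim p_z(z)}[(D(G(z)) - c)^2] \quad (5)$$
where a and b are the target labels for generated and real data, respectively, and c is the value that G wants D to assign to generated data.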
LSGANs can generate new data that are highly similar to the original data through the adversarial interplay between the discriminator and the generator in the model [37]. Therefore, this paper chooses LSGAN to augment the dataset and generate more realistic data.
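The vanishing-gradient argument for LSGAN can be made concrete with a small sketch that treats the discriminator's raw score s for one generated sample as a scalar. It assumes the standard non-saturating cross-entropy generator loss −log σ(s) for a regular GAN and the least-squares generator loss ½(s − c)² on a linear discriminator output with an assumed target c = 1; the function names are hypothetical.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def ce_generator_grad(s):
    """Derivative w.r.t. s of the non-saturating GAN generator loss -log(sigmoid(s))."""
    return -(1.0 - sigmoid(s))


def ls_generator_grad(s, c=1.0):
    """Derivative w.r.t. s of the least-squares generator loss 0.5 * (s - c) ** 2."""
    return s - c


# A generated sample that D already scores far on the "real" side of the boundary.
s = 10.0
print(ce_generator_grad(s))  # about -4.5e-05: almost no learning signal
print(ls_generator_grad(s))  # 9.0: still penalized for its distance from c
```

The cross-entropy gradient saturates once a sample lies on the correct side of the decision boundary, whereas the least-squares gradient keeps growing with the distance from the target, which is the extra gradient the text attributes to LSGAN.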

Wasserstein Generative Adversarial Networks (WGANs)
WGAN [22] was developed to solve the problem of unstable network training [38], which is believed to be related to the undesirable vanishing gradients of the GAN discriminator function. Yang et al. [39] applied WGAN to denoise low-dose CT images and achieved a successful application in medical image reconstruction. WGAN has also been used in a synthetic data generation module to produce virtual damage signals, augmenting minority defect classes and balancing the training dataset with synthetic signals.
Two important contributions of WGAN [40] are as follows: (1) in experiments, WGAN showed no sign of mode collapse; (2) the generator can still learn when the critic performs well. To estimate the Wasserstein distance, we need to find a 1-Lipschitz function, so a deep network is trained to learn it. This network is very similar to the discriminator D, but without the sigmoid function, and its output is a scalar score rather than a probability. The score can be interpreted as how real the input images are. In WGAN, the discriminator is renamed the critic to reflect this new role. The difference between a GAN and a WGAN is thus the change from discriminator to critic, together with the cost function. The network design of both is almost the same, except that the critic has no output sigmoid function. The cost functions of the critic and the generator for WGAN are given in Formulas (6) and (7), respectively:
$$L_{critic} = \mathbb{E}_{z \sim p_z(z)}[f(G(z))] - \mathbb{E}_{x \sim p_{data}(x)}[f(x)] \quad (6)$$
$$L_G = -\mathbb{E}_{z \sim p_z(z)}[f(G(z))] \quad (7)$$
However, f has to be a 1-Lipschitz function. To enforce this constraint, WGAN applies simple weight clipping, which limits the magnitude of the weights in f. The weights of the discriminator must lie within a fixed range [−c, c], where c is a hyperparameter. The architecture of WGAN is shown in Figure 2, where z represents random noise, G represents the generator, G(z) represents the samples generated by the generator, C represents the discriminator (critic), and C* represents an approximate expression of the Wasserstein-1 distance.
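A minimal sketch of the critic and generator objectives and the weight clipping described above. The critic scores f(x) and f(G(z)) are assumed to be precomputed floats, and the default c = 0.01 is an assumption taken from the original WGAN paper, since the text does not state c; the function names are hypothetical. Clipping only bounds the Lipschitz constant by some K, so the critic estimates the Wasserstein distance up to a constant factor.

```python
def critic_loss(f_real, f_fake):
    """Critic loss E[f(G(z))] - E[f(x)]; minimizing it maximizes the
    Wasserstein-1 estimate E[f(x)] - E[f(G(z))]."""
    return sum(f_fake) / len(f_fake) - sum(f_real) / len(f_real)


def wgan_generator_loss(f_fake):
    """Generator loss -E[f(G(z))]: G tries to raise the critic's score of its samples."""
    return -sum(f_fake) / len(f_fake)


def clip_weights(weights, c=0.01):
    """Clip every critic weight into [-c, c] after each critic update."""
    return [max(-c, min(c, w)) for w in weights]


print(clip_weights([0.5, -0.5, 0.005]))  # [0.01, -0.01, 0.005]
```

Because the critic output is an unbounded score, no logarithm or sigmoid appears in either loss, which matches the removal of the output sigmoid described in the text.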

SSIM and MSE
The structural similarity (SSIM) index is a good indicator of perceived image quality. The SSIM approach separates the luminance and contrast of an image from its structural detail and uses the structural information to evaluate image quality [41,42]. The structural similarity measurement is split into three parts: the luminance function l(x,y), the contrast function c(x,y), and the structure comparison function s(x,y) [43]. Together, these three components indicate how similar the two images are. The mean value serves as an estimate of brightness, the standard deviation as an estimate of contrast, and the covariance as a measure of structural resemblance. The SSIM functions, Formulas (8)-(11), are as follows [44,45]:
$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \quad (8)$$
$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \quad (9)$$
$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \quad (10)$$
$$SSIM(x,y) = [l(x,y)]^{\alpha}[c(x,y)]^{\beta}[s(x,y)]^{\gamma} \quad (11)$$
where µ_x is the average of x, µ_y is the average of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, C_1, C_2, and C_3 are small constants that stabilize the divisions, and α, β, and γ weight the three components. The input of SSIM [46] is a pair of images: an undistorted image and a distorted image. The structural similarity between the two serves as a quality indicator for the distorted image. Compared with traditional image quality metrics such as Peak Signal-to-Noise Ratio (PSNR) and Mean Squared Error (MSE) [47], structural similarity agrees better with human judgment of image quality. The relation between SSIM and more conventional quality metrics can be demonstrated geometrically in a vector space of image components. These components might be pixels or other derived elements, for example, linear coefficients [48].
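Formulas (8)-(11) can be checked numerically with a single-window sketch. It assumes α = β = γ = 1, C_3 = C_2/2, C_1 = (0.01·L)², and C_2 = (0.03·L)² with L = 255, the common defaults from the original SSIM paper, in which case the three terms collapse into one ratio. It computes one global SSIM over whole flattened grayscale images, whereas standard SSIM averages the index over local windows, so its values will differ from library implementations; the function name is hypothetical.

```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two flattened grayscale images of equal length.

    With alpha = beta = gamma = 1 and C3 = C2 / 2, l * c * s simplifies to
    (2*mu_x*mu_y + C1)(2*cov_xy + C2) / ((mu_x^2 + mu_y^2 + C1)(var_x + var_y + C2)).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("images must have the same, non-trivial number of pixels")
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An image compared with itself scores 1, and an image whose pixel pattern is inverted around the same mean gets a negative score, which illustrates the −1 to 1 range.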
Mean Squared Error (MSE) measures the discrepancy between estimated values and the original values of the quantity being estimated; for images, it is the mean of the squared pixel differences. The error is the amount by which the value implied by the estimator differs from the quantity to be estimated, as shown in Formula (12) [49]:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}(P_i - Q_i)^2 \quad (12)$$
where P_i represents the observed value, Q_i represents the predicted value, and n is the number of data points. In our work, synthetic images generated by DCGAN, LSGAN, and WGAN are evaluated using SSIM and MSE. SSIM ranges from −1 to 1, and higher values are better; in contrast, smaller MSE values indicate a more favorable result.
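Formula (12) translates directly into a few lines. This sketch treats each image as a flattened list of pixel values of the same length; the function name is hypothetical.

```python
def mse(p, q):
    """Formula (12): mean of the squared differences between observed P_i and predicted Q_i."""
    if len(p) != len(q) or not p:
        raise ValueError("inputs must be non-empty and of the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


print(mse([0, 0], [1, 3]))  # (1 + 9) / 2 = 5.0
```

An image compared with itself gives MSE = 0, the best possible value.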

Image Preprocessing
Traditional data augmentation relies on basic transformations such as horizontal flipping, color space changes, and random cropping. These transformations encode several invariances that pose challenges for image classification. A survey [50] covers augmentations based on geometric transformations, color space transformations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, neural style transfer, and meta-learning. While these data augmentation methods are designed manually, recent work has increasingly focused on deep neural network models that automatically create new training samples [49,51].
Cropping a central patch of each image is a practical processing step for image data with mixed width and height dimensions. Random cropping can also be used to produce an effect similar to translation. The difference between random cropping and translation is that cropping reduces the size of the input, whereas translation preserves the spatial dimensions of the image. Depending on the reduction threshold chosen for cropping, this may not be a label-preserving transformation. To obtain better results, we cropped each image to focus on the sign. We use 200, 100, and 50 images as input in each group. Rotation augmentations rotate the image right or left about an axis by between 1° and 359°. The safety of rotation augmentation depends heavily on the rotation degree parameter. Slight rotations, such as between 1° and 20° or −1° and −20°, may be useful for digit recognition tasks, but as the rotation degree increases, the data label is no longer preserved after transformation. Therefore, during data augmentation, these experiments use the following parameters: rotation range = 20, zoom range = 0.10, width shift range = 0.2, height shift range = 0.2, and shear range = 0.15.
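A minimal sketch of the two preprocessing steps described above: a center crop and random sampling of transform parameters within the reported ranges. Interpreting the ranges with the conventions of Keras' ImageDataGenerator (rotation in degrees, zoom sampled from [1 − z, 1 + z], shifts as fractions of width and height) is an assumption, as are the function and key names; applying the sampled transforms to pixels is left to an image library.

```python
import random

# Augmentation ranges reported in the text; their interpretation is assumed.
AUG_PARAMS = {
    "rotation_range": 20,       # rotation sampled in [-20, 20] degrees
    "zoom_range": 0.10,         # zoom factor sampled in [0.9, 1.1]
    "width_shift_range": 0.2,   # horizontal shift as a fraction of width
    "height_shift_range": 0.2,  # vertical shift as a fraction of height
    "shear_range": 0.15,        # shear intensity sampled in [-0.15, 0.15]
}


def center_crop(img, size):
    """Return the central size x size patch of a 2-D image given as a list of rows."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size is larger than the image")
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def sample_augmentation(rng, p=AUG_PARAMS):
    """Draw one random set of transform parameters within the configured ranges."""
    return {
        "rotation_deg": rng.uniform(-p["rotation_range"], p["rotation_range"]),
        "zoom": rng.uniform(1 - p["zoom_range"], 1 + p["zoom_range"]),
        "shift_x": rng.uniform(-p["width_shift_range"], p["width_shift_range"]),
        "shift_y": rng.uniform(-p["height_shift_range"], p["height_shift_range"]),
        "shear": rng.uniform(-p["shear_range"], p["shear_range"]),
    }


img = [[r * 4 + c for c in range(4)] for r in range(4)]
print(center_crop(img, 2))  # [[5, 6], [9, 10]]
```

Keeping the rotation range at ±20 degrees follows the text's reasoning that larger rotations stop being label-preserving.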

Research Workflow
In this section, we describe our proposed method for generating traffic sign images using different GAN methods. Figure 3 illustrates the workflow of this research. We conducted experiments with different settings to create realistic synthetic images with DCGAN, LSGAN, and WGAN. We focus only on Taiwan prohibitory signs, which consist of no entry (Class T1), no stopping (Class T2), no parking (Class T3), and speed limit (Class T4) images; see Table 1. We divide the experiments into categories based on the total number of images used for training. The first category uses 200 images with sizes 64 × 64 and 32 × 32.
For each size combination, it then produces 1000 images of the same size. The second category uses 100 images of 64 × 64 and 32 × 32 dimensions and again generates 1000 images of the same size for each combination. The last category uses 50 images of 64 × 64 and 32 × 32 dimensions and likewise produces 1000 images for each combination. These image sizes were chosen because traffic signs usually appear small. Table 2 describes the experimental settings of the various GANs in our work. The advantages and disadvantages of DCGAN, LSGAN, and WGAN are described in Table 3.

Table 3. Advantages and disadvantages of DCGAN, LSGAN, and WGAN.
DCGAN disadvantages:
(1) The model parameters oscillate, destabilize, and never converge.
(2) The generator collapses and produces a limited variety of samples, and training is highly sensitive to the choice of hyperparameters.
(3) The discriminator becomes so successful that the generator gradient vanishes and the generator learns nothing. An imbalance between the generator and the discriminator causes overfitting.
LSGAN advantages:
(1) LSGAN enhances the primary GAN loss function by substituting the original cross-entropy loss function with a least-squares loss function, which fixes the two major problems of traditional GANs.
(2) LSGAN improves the quality of the output images, makes the training process robust, and converges faster [53].
LSGAN disadvantages:
(1) Excessive penalties for outliers reduce sample diversity.
WGAN advantages:
(1) WGAN solves the problem of training instability through its efficient network architecture: this model removes the sigmoid from the discriminator's last layer [54].
(2) The loss values of WGAN correlate with generated image quality: a lower loss means a better-quality image, which gives a stable training process.
WGAN disadvantages:
(1) WGAN requires a longer training time.
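The experimental design described above can be enumerated as a grid of runs. This sketch assumes a full factorial design with one run per class, model, input count, image size, and epoch setting; Table 2 may define a smaller set of runs, and all names here are hypothetical.

```python
from itertools import product

# Factors from the Research Workflow; the full factorial design and the
# one-run-per-class assumption are illustrative, not taken from Table 2.
CLASSES = ("T1", "T2", "T3", "T4")
MODELS = ("DCGAN", "LSGAN", "WGAN")
INPUT_COUNTS = (200, 100, 50)   # real training images per category
SIZES = (64, 32)                # square image side, in pixels
EPOCHS = (1000, 2000)
GENERATED_PER_RUN = 1000        # synthetic images produced per combination


def experiment_grid():
    """Return one settings dict per (class, model, inputs, size, epochs) run."""
    return [
        {"cls": cls, "model": model, "n_inputs": n, "size": size,
         "epochs": epochs, "n_generated": GENERATED_PER_RUN}
        for cls, model, n, size, epochs in product(CLASSES, MODELS, INPUT_COUNTS, SIZES, EPOCHS)
    ]


runs = experiment_grid()
print(len(runs))  # 4 * 3 * 3 * 2 * 2 = 144
```

Listing runs explicitly in this way makes it easy to check that every reported combination in a results table has a corresponding training run.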

Data Generation Results
This section describes the training environment. The experiments used an Nvidia GTX970 GPU accelerator with 16 GB memory and an Intel E3-1231 v3 Central Processing Unit (CPU) with 16 GB DDR3-1866 memory. Our approach is implemented in Torch and TensorFlow. The generative and discriminative networks are trained with the Adam optimizer [20] with β1 = 0.5, β2 = 0.999, and a learning rate of 0.0002. The batch size is 25, and the hyperparameter λ is set to 0.5. The iterations for pre-training and training are set to 1000 and 2000. The total numbers of input images are 200, 100, and 50, and the image sizes for input and output are 32 × 32 and 64 × 64. The discriminator is trained with the following steps [55,56]: (1) The discriminator classifies both real data and fake data from the generator. (2) The discriminator loss penalizes misclassification, such as classifying a real instance as fake or a fake instance as real. (3) The discriminator updates its weights by backpropagating the discriminator loss through the discriminator network. The generator is trained with the following steps [57,58]: (1) Sample random noise. (2) Produce generator output from the sampled noise. (3) Obtain the discriminator's "Real" or "Fake" classification of the generator output. (4) Compute the loss from the discriminator's classification. (5) Backpropagate through both the discriminator and the generator to obtain gradients. (6) Apply the gradients to update only the generator weights.
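The Adam update with the reported β1 = 0.5, β2 = 0.999, and learning rate 0.0002 can be sketched for a single scalar weight. The ε = 1e-8 value and the class name are assumptions, and Torch and TensorFlow provide their own Adam implementations; this only illustrates the update rule applied to each weight of G and D.

```python
import math


class AdamScalar:
    """Adam update for one scalar parameter, using the hyperparameters reported in the text."""

    def __init__(self, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (first moment)
        self.v = 0.0  # running mean of squared gradients (second moment)
        self.t = 0    # step counter used for bias correction

    def step(self, param, grad):
        """Return the updated parameter after one Adam step."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2 ** self.t)  # bias-corrected second moment
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


opt = AdamScalar()
w = opt.step(1.0, 3.0)  # first step moves w by about one learning rate
```

On the first step the bias-corrected ratio m̂/√v̂ is close to ±1, so each weight moves by roughly the learning rate regardless of the gradient's magnitude; the low β1 = 0.5 makes the momentum term react quickly to the changing adversarial gradients.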
Furthermore, we record the G loss and D loss values in each experiment. Two loss functions are connected to the discriminator: the D loss is used during discriminator training, and the G loss is used during generator training. The discriminator thus estimates the probability that an image is real or fake. Training time increases with the number of epochs. The LSGAN training process is shown in Figure 4. Figure 6 shows the realistic synthetic images generated by DCGAN, LSGAN, and WGAN for all classes with 2000 epochs and size 64 × 64. Figures 7 and 8 show the synthetic image generation results using 1000 epochs with sizes 32 × 32 and 64 × 64, respectively. The generated images look realistic: we cannot easily distinguish which images are fake and which are real, and they appear sharp and natural. The worst generated images occur with 50 input images and 1000 epochs, as seen in Figures 7 and 8; these images appear blurry and unclear and contain much more noise than the others.

Discussion
Our experiments empirically tested data generation by the various GANs by calculating the similarity between the synthesized images and their corresponding real images. We measured SSIM values between generated images and real images of the same kind. SSIM incorporates luminance and contrast masking. Its error calculation also accounts for the strong dependencies among neighboring pixels, and the metric is computed over small image windows. Figure 9 shows examples of SSIM and MSE calculations between original images and synthetic images generated by LSGAN. Every original image in Figure 9 has MSE = 0 and SSIM = 1, because it is compared with itself. We calculated the SSIM and MSE values of each synthetic image against the original image to evaluate which GAN model performs best. For example, Figure 9b shows MSE = 2.11 and SSIM = 0.81 for class T1. The detailed performance evaluation of synthetic images by the various GANs using 1000 and 2000 epochs is presented in Table 4. LSGAN outperforms the other GANs, which is consistent with the advantages of LSGANs over standard GANs. First, LSGANs produce higher-quality images than standard GANs. Second, LSGANs are more stable during the learning process. To evaluate image quality, we conducted qualitative and quantitative experiments, and the results show that LSGANs can generate higher-quality images than regular GANs. LSGAN enhances the primary GAN loss function by substituting the original cross-entropy loss function with a least-squares loss function, which fixes the two major problems of traditional GANs. LSGAN improves the quality of the output images, makes the training process robust, and converges faster. The synthetic images created by LSGAN look clear and realistic.
The Least Squares GAN (LSGAN) is designed to give the generator a more useful training signal. Intuitively, LSGAN sets the discriminator's target label to 1 for real images and 0 for generated images, while the generator's target label for generated images is 1. LSGAN can be implemented with a minor change to the discriminator's output layer and the adoption of the least-squares, or L2, loss function: the output layer of the discriminator must use a linear activation function.
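The label scheme above can be written as a minimal sketch of the least-squares losses on linear (unsquashed) discriminator outputs, with target 0 for generated images, 1 for real images, and 1 as the generator's target, following the description above. The inputs are assumed to be lists of precomputed discriminator scores, and the function names are hypothetical.

```python
def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    """Discriminator loss 0.5 * E[(D(x) - b)^2] + 0.5 * E[(D(G(z)) - a)^2].

    a: target for generated images, b: target for real images.
    """
    real = sum((s - b) ** 2 for s in d_real) / len(d_real)
    fake = sum((s - a) ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real + fake)


def lsgan_g_loss(d_fake, c=1.0):
    """Generator loss 0.5 * E[(D(G(z)) - c)^2]: push fake scores toward the real label."""
    return 0.5 * sum((s - c) ** 2 for s in d_fake) / len(d_fake)


print(lsgan_d_loss([1.0], [0.0]))  # 0.0: a perfect discriminator
print(lsgan_g_loss([0.0, 2.0]))    # 0.5: both fakes are one unit away from c
```

Because the scores are never passed through a sigmoid, the squared error keeps growing with distance from the target, which is why the discriminator's output layer must be linear.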

Conclusions
This paper mainly discusses how synthetic images are produced by various GANs (DCGAN, LSGAN, and WGAN). We analyze and evaluate the performance of DCGAN, LSGAN, and WGAN in generating synthetic images with different numbers of epochs (1000 and 2000), numbers of input images, and image sizes (64 × 64 and 32 × 32). We then evaluate the synthetic image generation results using SSIM and MSE.
Based on our experimental results, we can summarize as follows: (1) The MSE value tends to increase with image size, the number of epochs, and training time. In the future, the synthetic images generated by the various GANs will be combined with real images for training to enhance traffic sign recognition systems. Currently, only images from the setting with 200 input images and 2000 epochs were used. By training models on synthetic images of different sizes, we aim to understand which characteristics of synthetic images affect the method. In future work, we will design a new optimized GAN to generate traffic sign images and compare it with the existing GANs. Future research will also explore other synthetic image generation methods combined with Explainable AI (XAI).