RAHC_GAN: A Data Augmentation Method for Tomato Leaf Disease Recognition

Accurate recognition of tomato diseases is of great significance for agricultural production. In supervised recognition, sufficient and insufficient training data form a symmetric pair of problems: a high-precision neural network needs a large amount of labeled data, and the difficulty of acquiring data samples is the main obstacle to improving disease recognition performance. Tomato leaf data augmented by traditional geometric-transformation methods usually contain little new information and generalize poorly. Therefore, this paper proposes RAHC_GAN, a new data augmentation method based on generative adversarial networks, for expanding tomato leaf data and identifying diseases. In this method, continuous hidden variables are added at the input of the generator to continuously control the size of the generated disease area and to supplement the intra-class information of the same disease. Additionally, a residual attention block is added to the generator to make it pay more attention to the disease region of the leaf image, and a multi-scale discriminator is used to enrich the detailed texture of the generated images, finally producing leaves with obvious disease features. We then combine the images generated by RAHC_GAN with the original training images to build an expanded data set, which is used to train four recognition networks, AlexNet, VGGNet, GoogLeNet, and ResNet, whose performance is evaluated on the test set. Experimental results show that RAHC_GAN can generate leaves with obvious disease features and that the expanded data set significantly improves the recognition performance of the classifier. Furthermore, results on apple, grape, and corn data sets show that RAHC_GAN can also be used to address insufficient data in other plant research tasks.


Introduction
Tomato is one of the crops planted all over the world and is rich in nutritional value [1]. Most studies have shown that the intake of tomatoes or tomato products has an inhibitory effect on some cancers (such as lung cancer, gastric cancer, etc.) and is beneficial for multiple health outcomes in humans [2,3]. However, during cultivation, tomatoes are infected with diseases of different kinds and degrees due to the influence of climate, environment, pests, and other factors. These diseases are highly infectious, widely spread, and harmful; they hinder the growth of tomatoes and also affect the yield and quality of the fruit. Therefore, it is of great significance to accurately identify tomato diseases and carry out early prevention and timely treatment. With the development of deep learning, convolutional neural networks (CNNs) have made major breakthroughs in the field of disease recognition. Many researchers [4][5][6][7][8] have improved neural network architectures and applied them to the recognition of plant diseases, achieving good results. However, a neural network with good performance usually has a large number of parameters, and a large amount of data is required to train these parameters correctly. Obtaining accurate disease data requires manual collection and labeling, which are time-consuming, labor-intensive, and error-prone. Therefore, in the field of tomato disease recognition, it is often impossible to obtain enough data to train the neural network, which is the main factor limiting further improvement of tomato disease recognition accuracy.
In the field of computer vision, data augmentation is a typical and effective way to deal with small data sets, and it includes supervised and unsupervised methods. Unsupervised data augmentation expands samples through geometric transformations [9][10][11][12] (such as flipping, cropping, rotation, scaling, etc.), which are simple to implement and are therefore widely used in the processing of diseased-leaf data. Liu et al. [13] used conventional image rotation, brightness adjustment, and principal component analysis to expand the images, alleviating the shortage of apple disease images and achieving good recognition results. Abas et al. [14] used rotation, translation, scaling, and other methods to augment plant image data and used an improved VGG16 network to classify plants, so as to solve the overfitting caused by too few plant samples. This kind of method can generate a large amount of data and alleviates the data shortage to a certain extent, but it still has shortcomings: images expanded by geometric transformation contain little new information, some of the expanded images are not conducive to model training or are inapplicable to certain tasks, and they can even reduce model performance. At the same time, because the generated data are highly random and lack a consistent target, many redundant samples are produced. In order to generate images that are more beneficial to specific tasks, learning-based unsupervised image generation methods have been proposed [15][16][17][18]. These methods transform images with different sequences of geometric transformations, use reinforcement learning to find the combination most beneficial to classification, and adaptively adjust it according to the current training iteration, finally producing a customized, high-performance data augmentation plan. However, such methods are time-consuming and resource-intensive to train, which limits their use in some tasks.
The GAN [19] (generative adversarial network) is another unsupervised data augmentation method. It has been proven able to generate images whose distribution is similar to that of the original data, showing disruptive performance in image generation [20][21][22][23]. Clément et al. [24] used a GAN-based image augmentation method to enhance training data and segmented apple disease on tree crowns with a smaller data set; compared with the results without image generation, the F1 score increased by 17%. DCGAN (deep convolutional generative adversarial network) [25] replaces the multilayer perceptrons in GAN with CNNs to improve the quality of the generated images. Purbaya et al. [26] improved DCGAN with a regularization method and generated plant leaf data to prevent the model from overfitting. Although DCGAN improves the quality of the generated images, it cannot control their classes, which limits multi-class image generation. To this end, ACGAN (auxiliary classifier generative adversarial network) [27] was proposed, which adds class-information constraints to the generator and an auxiliary classifier to the discriminator to classify the output image, so that it can guide the generator to produce images of different classes. In addition, to generate higher-quality data, many new GAN-based architectures have been proposed and applied in plant science [25,[28][29][30][31][32]. For the problem of insufficient apple disease data, Tian et al. [33] used CycleGAN [34] to generate images, enriching the diversity of training data. Zhu et al. [35] proposed a data augmentation method based on cGAN, adding condition information to the generator input to control the class of the generated image; dense connections were also added to the generator and discriminator to enhance the information input of each layer and significantly improve classification performance. Liu et al. [36] added dense connections to GAN and improved the loss function to generate grape leaves, achieving better recognition results with the same recognition network. However, the goal of GAN is to generate images that are as realistic as possible: it learns the overall information of the image, ignores key local disease information, and cannot generate images with clear disease spots. At the same time, due to the lack of detailed texture, most of the generated images are of poor quality.
In order to solve the problems of insufficient data for tomato disease identification and insufficient disease features in expanded data, this paper proposes an improved ACGAN method, RAHC_GAN, for data augmentation (Figure 1a) and establishes an expanded data set of original data and data generated by RAHC_GAN (Figure 1b), so as to meet the data requirements of neural network training and improve the performance of the classifier in tomato leaf disease recognition. The innovations and contributions are as follows:
• Adding a set of continuous hidden variables to the input of the generative adversarial network to improve the diversity of the generated images. Because the intra-class differences of the same tomato disease are small, a traditional generative adversarial network has difficulty learning them and tends to generate similar images, resulting in mode collapse [37]. To avoid this phenomenon, we use the hidden variable together with the class label to generate tomato leaves with different diseases. The class label specifies the disease class, and the hidden variable captures the potential variation within that class. For each disease class, changing the value of the hidden variable varies intra-class properties such as the area and severity of the disease, supplementing the information within the class and enriching the diversity of the generated images.
• Fusing a residual attention block and a multi-scale discriminator to enrich the disease information of the generated images. We add residual attention blocks to the generator: the residual connections deepen the network while avoiding network degradation, and the attention mechanism makes the generator pay more attention to the disease information in the leaves from both the channel and spatial perspectives, guiding the generation of diseased tomato leaves with obvious disease features. In addition, we introduce a multi-scale discriminator, which captures different levels of information in the image, enriches the texture and edges of the generated leaves, and makes them more complete, richer in detail, and clearer in texture.
• Using RAHC_GAN to expand the training set to meet the large amount of data needed for neural network training. We use the expanded data set as the training set, train four recognition networks (AlexNet, VGGNet, GoogLeNet, and ResNet) through transfer learning, and evaluate performance on the test set. The experimental results show that across multiple recognition networks, the expanded data set outperforms the original training set.
The rest of this paper is organized as follows: Section 2 introduces related work and the proposed model. Section 3 presents the experimental results and analyses. Section 4 provides related discussion, and Section 5 contains the conclusions and future prospects.

Related Works
The working principle of GAN is that the generator produces a fake image while the discriminator determines whether its input is real or fake, and the two reach a balance through an adversarial game. In the traditional GAN, the generator's input is random noise, so it cannot generate images of a specified class. To generate images of different classes accurately, a separate generator would have to be trained for each class, which is cumbersome and involves a heavy workload. To solve this problem, ACGAN was proposed: it concatenates the class label c with the random noise z and adds a classifier to the discriminator to guide training, so that the generator can be controlled to produce images of a specified class, overcoming the limitation of GAN in multi-class data generation.
The main goal of our work is to generate tomato leaf images of six disease classes and use them as an expanded data set to train the disease recognition network, improving the accuracy of disease recognition. However, in tomato leaves, the early features of some diseases are not obvious, and the similarity between samples of the same disease is high. Therefore, relying on the class label alone is not enough to generate diseased leaves with obvious features.

CBAM
The CBAM (convolutional block attention module) [38] is a lightweight attention module composed of a channel attention module and a spatial attention module. For a given feature map, CBAM infers attention maps along the channel and spatial dimensions and then multiplies them with the input feature map to refine the features, so that the network focuses on the parts that are more effective for the task, thereby improving model performance. The network structure of CBAM is shown in Figure 2. For the input feature map F, the channel attention map M_c is first generated by the channel attention module and element-wise multiplied with F to obtain F′. Then the spatial attention map M_s is generated by the spatial attention module, and the final output F″ is obtained by element-wise multiplication with F′. The calculation process is shown in Equations (1) and (2).
The channel attention module aggregates the spatial information of the feature map through average pooling and max pooling, generating the average-pooled feature F^c_avg and the max-pooled feature F^c_max. The channel attention map M_c ∈ R^(C×1×1) is then produced by a shared network. The calculation process of the channel attention module is shown in Equation (3).
The spatial attention module generates F^s_avg ∈ R^(1×H×W) and F^s_max ∈ R^(1×H×W) through average pooling and max pooling and then performs a convolution operation to generate the spatial attention map. The calculation process of the spatial attention module is shown in Equation (4).

RAHC_GAN
In the leaf disease recognition network, there is a strong symmetric relationship between sufficient and insufficient training data: sufficient data allow the neural network to achieve better performance. Therefore, in the case of insufficient training data, we use a GAN to augment the data. Inspired by Odena et al. [27], this paper improves the network architecture of ACGAN and proposes a generative adversarial network, RAHC_GAN, which generates tomato leaf images with obvious disease features so as to improve the accuracy of the recognition network. The overall network structure of RAHC_GAN is shown in Figure 3. RAHC_GAN uses a hidden-variable mechanism to continuously control the generation of the diseased area; a generator with residual blocks and an attention mechanism extracts more salient leaf features to generate tomato leaves with prominent disease; and a multi-scale discriminator captures the high-frequency information lost in the image and enriches its detail texture.
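As a concrete reference for the CBAM refinement in Equations (1)–(4), the following is a minimal PyTorch sketch; the reduction ratio of 8 and the 7 × 7 spatial kernel follow common CBAM defaults and are assumptions here, not values taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Aggregates spatial information via average and max pooling, then
    # passes both pooled features through a shared MLP (Equation (3)).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)  # M_c: B x C x 1 x 1

class SpatialAttention(nn.Module):
    # Pools along the channel axis and convolves the stacked maps (Equation (4)).
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s: B x 1 x H x W

class CBAM(nn.Module):
    # F' = M_c(F) * F, then F'' = M_s(F') * F' (Equations (1) and (2)).
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x
        return self.sa(x) * x
```

The output has the same shape as the input, so the module can be dropped into any convolutional stack without changing its dimensions.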

The Generative Network G
In order to control the disease class of the generated leaves and enhance the generator's attention to the diseased area, this paper uses the generator network structure shown in Figure 4a. It extracts high-level features of the image through residual attention blocks, so it can generate images that are more conducive to recognition tasks.
The generator is composed of multiple deconvolution layers. Its input is a vector formed by concatenating the class label, the hidden variable, and random noise. After six deconvolution layers, the output tensor is obtained, and a 128 × 128 × 3 image is finally generated through the tanh activation layer.
The original ACGAN controls the class of the generated images through a set of discrete class labels. Inspired by this, we add a set of continuous variables as a supplement to the input information. Whereas the class label takes discrete values of 0 or 1, we define a 1 × 1 × 6 hidden variable Hide ∈ [−1, 1], whose six dimensions, like those of the class label, represent the six disease classes. Since this paper studies a single disease per leaf image and does not consider multiple diseases in one image, to generate an image of a specified class we set that class's dimension of the hidden variable to a random value in [−1, 1] and set the other dimensions to 0, so that the size of the disease area and the disease severity of the generated image change with the continuous change of the hidden variable. Thus, the introduced hidden variable can guide the generation of images with different disease sizes and regions and increase the diversity of the generated images.
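The construction of the generator input described above can be sketched as follows. The function name and the noise dimension of 100 are illustrative assumptions; the paper only specifies the 1 × 1 × 6 class label and hidden variable.

```python
import torch

def make_generator_input(class_idx, num_classes=6, noise_dim=100, batch=1):
    """Concatenate class label c, hidden variable h, and random noise z."""
    # Discrete one-hot class label c in {0, 1}^6.
    c = torch.zeros(batch, num_classes)
    c[:, class_idx] = 1.0
    # Hidden variable h: the dimension of the chosen class takes a random
    # value in [-1, 1]; all other dimensions are set to 0, since only a
    # single disease per image is considered.
    h = torch.zeros(batch, num_classes)
    h[:, class_idx] = torch.empty(batch).uniform_(-1.0, 1.0)
    # Gaussian random noise z.
    z = torch.randn(batch, noise_dim)
    return torch.cat([c, h, z], dim=1)  # shape: (batch, 2*num_classes + noise_dim)
```

Sweeping the chosen dimension of h from −1 to 1 while keeping c and z fixed is what varies the disease area and severity of the generated leaf.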
Although the existing generator can already produce images, a deeper network is beneficial to model performance. In this paper, 16 residual blocks [39] are added before the last deconvolution layer of the generator. On one hand, this deepens the network so the generator can obtain high-level features of the images; on the other hand, the residual connections avoid the degradation problem caused by overly deep networks.
Accurate recognition of tomato diseases requires accurate disease features, and some early-stage disease features are relatively small. However, existing GANs often ignore tiny disease features when generating images. In order to attend to the diseased area of the image, we add a CBAM block before the addition operation of each residual block to help the model identify the more beneficial features from two aspects, channel and space. Channel attention makes the network attend to the diseased area, and spatial attention tells the network which locations to attend to, so that the generator focuses on the diseased area, suppresses irrelevant information, and generates images with obvious disease features.
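A minimal sketch of a residual block with an attention gate placed before the skip-connection addition follows. To keep the example self-contained, a simple channel gate stands in for the full CBAM module, so this is an illustrative simplification rather than the paper's exact block.

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    # Residual block: conv -> BN -> ReLU -> conv -> BN, with an attention
    # gate applied before the identity addition, as described in the text.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Simplified channel gate standing in for CBAM.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = self.body(x)
        out = self.gate(out) * out  # attend before the addition
        return torch.relu(out + x)  # identity shortcut
```

Because the block preserves spatial and channel dimensions, 16 of them can be stacked before the generator's last deconvolution layer, as the paper does.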

The Discriminative Network D
The discriminator is composed of sequentially stacked convolutional layers, and its input is a real or generated image. The input image passes through six convolutional layers to obtain a feature vector, which is flattened into one-dimensional vectors and passed through the sigmoid and softmax functions, respectively, to indicate the probability that the input image is real and the probability of each class. In addition, for the hidden variable h = (h_0, ..., h_n) introduced at the input, our purpose is to independently control the intra-class details of the same disease; thus, like the class label c, the hidden variable h should be independent, similar to random noise. Therefore, for an image generated using a hidden variable, it should be possible to separate the components ĥ = (ĥ_0, ..., ĥ_n) of h after the feature extraction of the discriminator [40]; this is called the restored hidden variable. Correspondingly, since the hidden variable at the input takes values in [−1, 1], we apply the tanh function to the flattened feature vector to obtain the restored hidden variable and use the hidden variable loss to constrain it. The overall architecture of the discriminator is shown in Figure 4b.
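The three parallel outputs of the discriminator can be sketched as below, assuming the flattened feature vector has already been computed by the convolutional stack; `feat_dim` and the layer names are illustrative.

```python
import torch
import torch.nn as nn

class DiscriminatorHeads(nn.Module):
    # Three heads on top of the flattened features: sigmoid -> real/fake
    # probability, softmax -> class probabilities, tanh -> restored hidden
    # variable in [-1, 1].
    def __init__(self, feat_dim, num_classes=6):
        super().__init__()
        self.adv = nn.Linear(feat_dim, 1)
        self.cls = nn.Linear(feat_dim, num_classes)
        self.hid = nn.Linear(feat_dim, num_classes)

    def forward(self, features):
        validity = torch.sigmoid(self.adv(features))          # real vs. fake
        class_probs = torch.softmax(self.cls(features), dim=1)  # disease class
        hidden = torch.tanh(self.hid(features))               # restored h-hat
        return validity, class_probs, hidden
```

The tanh head matches the [−1, 1] range of the input hidden variable, so the hidden variable loss can directly compare h with its restored counterpart.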
Images generated by traditional GANs often have a fuzzy texture structure and lack high-frequency information. In order to capture the image distribution at different scales and recover the lost high-frequency information, we propose a multi-scale parallel-learning discriminator architecture [41] that uses two discriminant scales: the original 128 × 128 image scale and a down-sampled 64 × 64 image scale. Each scale is input to the corresponding discriminator for feature extraction and subsequent operations, and the multi-scale data are integrated by feature fusion. The discriminator with the smaller input has a relatively larger receptive field and guides the generator to produce whole, continuous plant leaves, while the discriminator with the larger input guides the generation of smaller regions, such as lesion boundaries. Since images at different resolutions contain different levels of visual features, the proposed multi-scale discriminator increases the discriminator's receptive field and makes the generated images more detailed, with clearer texture.
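The two-scale arrangement can be sketched as follows, with tiny stand-in convolution stacks for the two branches (the real discriminator branches are deeper); the average-pooling downsample and fusion by concatenation are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_stack(in_ch=3, feat=16):
    # Tiny stand-in for one discriminator branch; the paper's branches
    # are six convolutional layers deep.
    return nn.Sequential(
        nn.Conv2d(in_ch, feat, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(feat, feat, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.d128 = conv_stack()  # sees the original 128 x 128 image
        self.d64 = conv_stack()   # sees the down-sampled 64 x 64 copy

    def forward(self, image):
        small = F.avg_pool2d(image, kernel_size=2)  # 128 -> 64
        fused = torch.cat([self.d128(image), self.d64(small)], dim=1)
        return fused  # fused multi-scale features for the output heads
```

Fusing the two feature vectors lets a single set of output heads use both the large-receptive-field (64 × 64) and fine-detail (128 × 128) views of the image.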

Loss Function
Adversarial Loss: In order to make the generator "cheat" the discriminator and generate indistinguishable images, the adversarial loss function of the original GAN is used, as shown in Equation (5), where x represents real data sampled from the real distribution p_data(x), and z represents the d-dimensional noise vector sampled from the Gaussian distribution p_z.
Classification Loss: For a given class label c, the generator needs to generate the corresponding class image X_fake. The real image pair (c, X_real) is given by the training data, and the classification loss is shown in Equation (6).
Hidden Variable Loss: Due to the instability of GAN training, mode collapse often occurs. Therefore, we add a hidden variable loss to enhance the diversity of the generated images and reduce the occurrence of mode collapse. The definition of the hidden variable loss is shown in Equation (7), where h = (h_0, ..., h_n) represents the input hidden variable and ĥ = (ĥ_0, ..., ĥ_n) represents the restored hidden variable recovered by the discriminator of RAHC_GAN.
Specifically, mode collapse means that different input vectors produce the same image through the generator. Two different hidden variables h_1 ≠ h_2 are each concatenated with a label vector and random noise to form the generator inputs z_1, z_2, which produce generated images. The discriminator then extracts the features of the generated images and tries to restore the hidden variables through a fully connected layer. If mode collapse occurs in the GAN, the different inputs z_1, z_2 yield the same image, i.e., G(z_1) = G(z_2). When this identical image is input to the discriminator, the discriminator extracts the same features, and the restored hidden variables satisfy ĥ_1 = ĥ_2 even though the input hidden variables h_1 ≠ h_2. The hidden variable loss calculated by Equation (7) is then very large. In order to reduce this loss, RAHC_GAN continuously optimizes the generated images and produces different images for different inputs, which improves the diversity of the generated images and reduces the occurrence of mode collapse.
Full Objective: Finally, the objective functions of the generator and discriminator are defined as Equations (8) and (9), respectively, where λ is a hyperparameter; λ = 1 is used in all experiments.
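Since Equations (5)–(9) are referenced by number above, the following is a plausible reconstruction of their standard forms; the L1 distance in the hidden variable loss and the sign conventions in the full objectives are assumptions, not taken verbatim from the paper.

```latex
% (5) Adversarial loss of the original GAN:
\mathcal{L}_{adv} = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big]
                  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

% (6) Auxiliary classification loss over real and generated image pairs:
\mathcal{L}_{cls} = \mathbb{E}\big[\log P(C = c \mid X_{real})\big]
                  + \mathbb{E}\big[\log P(C = c \mid X_{fake})\big]

% (7) Hidden variable loss between the input h and the restored \hat{h}
%     (shown here with an L1 penalty as one plausible choice):
\mathcal{L}_{hide} = \mathbb{E}\big[\lVert h - \hat{h} \rVert_{1}\big]

% (8), (9) Full objectives with weighting hyperparameter \lambda (= 1 here);
% the generator minimizes L_G while the discriminator maximizes its
% adversarial term:
\mathcal{L}_{G} = -\mathcal{L}_{adv} + \mathcal{L}_{cls} + \lambda\,\mathcal{L}_{hide}
\qquad
\mathcal{L}_{D} = \mathcal{L}_{adv} + \mathcal{L}_{cls} + \lambda\,\mathcal{L}_{hide}
```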

Recognition Model for Tomato Leaf Disease Identification
In this paper, a convolutional neural network is applied to tomato disease recognition mainly to explore whether the images generated by RAHC_GAN benefit recognition performance, so the performance of different recognition networks is not compared. Moreover, in order to explore the performance of the expanded data set on different recognition models, we conducted experiments on four classic recognition networks: AlexNet [42], VGGNet [43], GoogLeNet [34], and ResNet [39]. The main algorithm is shown in Table 1.

Table 1. The main algorithm.
INPUT: original training set X, test set X_c
1. Initialize the network parameters of RAHC_GAN;
2. // RAHC_GAN training
3. for epoch ← 1 to E_h do
4.   Sample b_h groups of generated data (x_z, y), where x_z ∼ (c, h, z), y ∼ p_g(y);
5.   Sample b_h groups of real data (x_r, y) ∼ p_r(X, y);
6.   Update the discriminator through gradient descent;
7.   Update the generator through gradient descent;
8. end
9. // RAHC_GAN completes training
10. Generate data (X_g, y) through the trained RAHC_GAN;
11. X_e = X ∪ X_g;
12. Initialize the network parameters of the recognition network;
13. // recognition network training
14. for epoch ← 1 to E_c do
15.   Sample b_c groups of data (x_e, y) ∼ p_e(X_e, y);
16.   Update the recognition network through gradient descent;
17.   Sample b_c groups of data (x_c, y) ∼ p_r(X_c, y) and use the recognition network to calculate the recognition accuracy;
18. end
OUTPUT: recognition accuracy on the test set X_c

Data Set
In this paper, tomato data from the Plant Village data set are used for experimentation. The Plant Village data set is currently the only public data set for plant leaf disease research. We selected six classes of tomato data, comprising five classes of diseased leaves and healthy leaves, as shown in Figure 5. In order to maintain data balance, we selected 1250 images for each class from the Plant Village data set and used an 80–20% train–test split [44] to divide them into a training set (Train (origin)) and a test set (Test). The specific information of the data set is shown in Table 2, where Train (origin + augment) represents the expanded data set, composed of Train (origin) and the data generated by the augmentation method: for each class, it includes the 1000 original images in Train (origin) and 1000 generated images.

Experimental Setup
All the models in the experiments run on the PyTorch framework under Ubuntu 18.04.2 LTS with Python 3.6.9.
For RAHC_GAN, Adam is used as the optimizer, the learning rate is set to 0.0002, the output image size is 128 × 128, and the training stops after 500 epochs. The training data set is Train (origin).
In all experiments, the recognition network is trained from scratch with randomly initialized weights. The output dimension of the final fully connected layer of the recognition network is set to 6, corresponding to the number of tomato disease classes, and training stops after 50 epochs. For all recognition results obtained with data augmentation methods, the training set is the expanded data set Train (origin + augment), and the test set (Test) remains unchanged.
The recognition accuracy Acc on the test set is used as the index to evaluate the classification effect; the calculation process is shown in Equation (10), where N represents the number of classes, t_i is the number of correct predictions for the i-th class, and n_i is the number of samples in the i-th class. Each class of the original data set contains 1000 original images, and each class of the expanded data set contains 2000 images (1000 original and 1000 generated). Throughout the evaluation, the test set is kept unchanged, and the generated data are used only to expand the training set.
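The accuracy of Equation (10) amounts to the total number of correct predictions divided by the total number of test samples over all N classes; a minimal helper (the function name is illustrative):

```python
def overall_accuracy(correct_per_class, total_per_class):
    """Acc = sum_i t_i / sum_i n_i over the N classes."""
    assert len(correct_per_class) == len(total_per_class)
    # Total correct predictions divided by total samples across classes.
    return sum(correct_per_class) / sum(total_per_class)
```

For example, with two classes of 250 test images each, 240 and 250 correct predictions give an overall accuracy of 490/500 = 0.98.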

Number of Residual Blocks for Plant Image Generation
In order to explore the influence of the number of residual blocks on tomato disease recognition results, we conducted experiments with different numbers of residual blocks based on the original ACGAN; the results are shown in Table 3. The first row of Table 3 represents the recognition results obtained after data expansion using ACGAN, and the second to fifth rows represent the results of adding 4, 8, 16, and 32 residual blocks to ACGAN, respectively. Table 3 shows that, compared with the original ACGAN, the networks with residual blocks perform better (except for Res32 on AlexNet and GoogLeNet), indicating that adding residual blocks is beneficial to deepening the generation network and improving the generation results. In addition, we noticed that ACGAN with 32 residual blocks outperforms ACGAN with 16 residual blocks on ResNet, but its performance on the other networks is worse than with 16 residual blocks. Considering both performance and the longer training time incurred by adding more residual blocks, 16 residual blocks are added to the generator in this paper.

Different Scale of Discriminators for Plant Image Generation
In order to explore the influence of the discriminator scale on tomato disease recognition results, we conducted experiments on discriminators of different scales based on the original ACGAN; the results are shown in Table 4. The generator outputs images at a scale of 128 × 128, which we down-sampled and up-sampled to obtain 64 × 64 and 256 × 256 images, respectively. The first two rows of Table 4 correspond to two-discriminator combinations whose inputs are 64 × 64 with 128 × 128 and 128 × 128 with 256 × 256, respectively. The results show that down-sampling performs better than up-sampling. On this basis, we introduced a third discriminator and down-sampled the image again to obtain a 32 × 32 image, combining three discriminators with inputs of 32 × 32, 64 × 64, and 128 × 128 and of 64 × 64, 128 × 128, and 256 × 256, respectively. The results show that among the three-discriminator combinations, down-sampling the generated image twice achieves better results, but these are still lower than down-sampling once; thus, we used a multi-scale discriminator with scales of 64 × 64 and 128 × 128 to enrich the detailed texture of the generated images.

The Recognition Performance on Different Expansion Numbers of Training Set
By using the images generated by RAHC_GAN to expand the training set, the recognition accuracy of the classifier can be improved. In order to explore the optimal number of expanded images, we used RAHC_GAN to generate images and evaluated the performance of the classifier in four cases: expanding the data by 0.5, 1, 2, and 3 times the size of the original training set. The experimental results are shown in Figure 6, which shows that expanding the original data set by any of these amounts can improve the performance of the classifier, with the best performance obtained after expanding the data by 1 time. However, when the expansion ratio is increased to 2 or 3 times, the performance of the classifier does not improve further, and the overall performance shows a downward trend. Notably, on ResNet, the performance after 3-times expansion is lower than on the original data set, which indicates that a greater expansion of the data is not always better: over-expansion may "contaminate" the data set and negatively affect accuracy [45]. Therefore, for all experiments in this paper, we expanded the data set by 1 time on the basis of the original data set for further experiments.

The Effect of Hidden Variables
In order to illustrate the control effect of the hidden variables on disease generation, we fixed the random noise z and class label c and explored their influence on the generated images by changing the value of the hidden variable. Each dimension of the hidden variable h ([h_0, ..., h_5]) controls one disease, and we control the generation of that disease by changing the value of the corresponding dimension. Since this work studies a single disease class per leaf image and does not consider two diseases in one image, we vary one dimension from −1 to 1 in steps of 0.25 and set the other dimensions to 0. The results are shown in Figure 7: when the hidden variable takes different values, the area and severity of leaf disease in the generated images differ. As the hidden variable increases, the diseased area of spot, septoria leaf spot, bacterial spot, and late blight gradually increases, indicating that the disease suffered by the leaves is becoming more serious [46]. Moreover, as the value of the hidden variable increases, the color of the septoria leaf spot lesions becomes deeper, and the leaf color difference of yellow leaf curl virus disease becomes larger, which indicates that the hidden variable captures the potential difference information within the class and can therefore control the severity of the disease. Thus, the proposed hidden variable can increase the intra-class differences of diseased leaves, thereby increasing the diversity of the generated images. We added the hidden variable loss to the full objective of RAHC_GAN to enhance the diversity of the generated images and reduce the occurrence of mode collapse.
In order to explore the importance of the newly added hidden variable loss relative to the overall loss, we varied the hyperparameter λ (Equation (8)) from 0 to 1 with a step size of 0.2. In addition, following the study of Jin et al. [47], we also set λ = 6 to make it equal to the number of classes. We trained RAHC_GAN with the different λ values, expanded the training set with the generated data, used it to train AlexNet, VGGNet, GoogLeNet, and ResNet, and then evaluated each network on the test set. The generation quality under different λ values was judged according to the recognition accuracy. The results are shown in Figure 8. It can be seen from Figure 8 that when λ = 1, the accuracy of each recognition network reaches its highest, which shows that when the hidden variable loss is weighted equally with the adversarial loss and the classification loss, the disease generation ability of the network is optimal.
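As a sketch of the weighting being swept (hypothetical function names; the individual loss terms follow Equation (8) and are not reproduced here):

```python
def full_objective(adv_loss, cls_loss, hidden_loss, lam=1.0):
    """Full objective: adversarial and classification losses plus the
    hidden-variable loss scaled by lambda; lam=1.0 (equal weighting)
    gave the best accuracy in the sweep above."""
    return adv_loss + cls_loss + lam * hidden_loss

def lambda_grid(step=0.2, extra=(6.0,)):
    """Lambda values tried: 0 to 1 in `step` increments, plus 6
    (the number of classes, following Jin et al. [47])."""
    n = int(round(1.0 / step))
    return [round(i * step, 10) for i in range(n + 1)] + list(extra)
```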

The Comparative Experiment
In order to evaluate the effect of the disease images generated by RAHC_GAN on recognition performance, we compared it with six other generation methods: conventional augmentation, random erasing [9], ACGAN [27], DCGAN [25], SAGAN [48], and MSGAN [49]. We used each generation method to generate images and added them to the original training set to obtain an expanded training set. Then, we performed training and testing on four mainstream recognition networks: AlexNet, VGGNet, GoogLeNet, and ResNet. The experimental results are shown in Table 5. In Table 5, the baseline indicates that no augmentation method was used, that is, only Train (origin) was used to train the model. It can be seen from Table 5 that most generation methods reduce the accuracy of the ResNet recognition network, which may be because we used a pre-trained model for training, leaving little room for improvement. Since conventional augmentation only changes the location and orientation of the image, less new information is learned, and the improvement in recognition accuracy is limited. Random erasing erases random areas in the image, but it may erase the diseased area, causing the recognition network to learn incomplete disease features; its accuracy is therefore lower than that of conventional augmentation. When ACGAN, DCGAN, and MSGAN were used as the generation network to expand the data set, the performance of some recognition networks improved. However, because the generated disease features are not obvious or there is overlap between diseases, the performance of other recognition networks declined. In contrast, RAHC_GAN adds a residual attention mechanism to the generator to enhance the transmission of disease information through the network, which is conducive to generating obvious disease features.
Compared with the baseline, the performance of RAHC_GAN on the four classification networks improved by 1.8%, 2.2%, 2.7%, and 0.4%, respectively. In addition, unlike the other data augmentation methods, the expanded data set generated by RAHC_GAN improved the accuracy of all four recognition networks, which is a clear advantage.
In order to intuitively compare the quality of the six classes of leaf disease images generated by the different generation methods, we visualized the generated disease images, as shown in Figure 9. In Figure 9, rows represent leaf disease images generated by different methods, and columns represent different leaf diseases. It can be seen that the images generated by DCGAN and MSGAN are blurry and of poor quality; the images generated by SAGAN are clearer, but the disease features are not obvious. Under the same training settings, due to the insufficient ability of its generator to learn disease features, ACGAN has difficulty generating obvious spot diseases, while the images generated by RAHC_GAN have high resolution and obvious disease features. In addition, combined with Table 5, we can see that MSGAN, which generates relatively blurry images, achieved better results in the recognition of tomato leaf diseases, while SAGAN, with clearer images, reduced the performance of the model, because the GAN cannot balance image generation and class classification well [50]. In contrast, RAHC_GAN adds a residual attention block to make the disease information of the generated images more obvious, which helps the discriminator classify correctly; the introduced multi-scale discriminators enrich the detailed texture of the generated images, so that the generated images are clear with obvious disease features. To further illustrate that the images generated by RAHC_GAN have obvious disease features, we used Train (Origin) as the training set and the generated images as the test set to conduct experiments on different network architectures. Since each class in the original test set (Test) contains 250 images, we also generated 250 images for each class and used them as the test set. The results are shown in Figure 10.
From Figure 10, we know that on the different classifiers, the recognition accuracy of leaf diseases was highest for the images generated by RAHC_GAN (except for the result on ResNet), indicating that the images generated by RAHC_GAN are more similar to the original images. We noticed that the accuracy of RAHC_GAN on the ResNet classifier is lower than that of SAGAN, which may be because the pre-trained model used is more sensitive to the leaves generated by SAGAN. In addition, the images generated by DCGAN and MSGAN are relatively blurry, and their disease features are not obvious, so they cannot be classified correctly. On the whole, compared with the other generation methods, the images generated by RAHC_GAN can be recognized more accurately, indicating its effectiveness in generating leaf disease features.
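The fidelity check in Figure 10 reduces to ordinary accuracy over the 250 generated images per class; a minimal sketch, with the classifier treated as an opaque callable (our abstraction, not the paper's code):

```python
def fidelity_accuracy(classifier, generated_images, labels):
    """Accuracy of a classifier (trained on real images) on generated
    images; higher accuracy suggests the generated disease features
    resemble the real ones closely enough to be recognized."""
    if not labels:
        return 0.0
    correct = sum(1 for x, y in zip(generated_images, labels)
                  if classifier(x) == y)
    return correct / len(labels)
```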

The Ablation Experiment
The data augmentation method RAHC_GAN proposed in this paper adopts three improvement strategies over the baseline model ACGAN: the addition of residual attention blocks, the introduction of hidden variables, and the use of multi-scale discriminators. In order to verify the effectiveness of these three improvement strategies, we trained a generative adversarial network under different network configurations, combined the generated images with the original training set to form an expanded training set, trained the four recognition networks AlexNet, VGGNet, GoogLeNet, and ResNet on it, and used the test set to evaluate the performance of the different networks. The results are shown in Table 6. It can be seen from Table 6 that after adding the residual attention block to ACGAN, the recognition accuracy of AlexNet, VGGNet, GoogLeNet, and ResNet improved, indicating that the residual attention block can improve the disease generation ability of the network. On this basis, adding the hidden variable input and the hidden variable loss function greatly improved the performance of the four recognition networks. This is because the hidden variable controls the generation of diseases and can supplement intra-class information in the generated images; in addition, the constraint of the hidden variable loss increases the diversity of the generated images, so that the recognition network can learn more disease characteristics. Finally, applying the multi-scale discriminator strategy to RAHC_GAN also improved performance on the four recognition networks, because the multi-scale discriminator enriches the detailed texture of the generated images and can generate complete local disease information. It can be seen that the three proposed improvement strategies are all beneficial to the generation of diseased images and can improve the recognition performance of the classifier.
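The four rows of the ablation can be enumerated as cumulative flags on the ACGAN baseline (a hypothetical configuration structure, for illustration only):

```python
def ablation_configs():
    """From the ACGAN baseline (all False) to full RAHC_GAN (all True),
    adding the three improvements one at a time, in the order used in
    the ablation experiment."""
    improvements = ["residual_attention", "hidden_variable",
                    "multiscale_discriminator"]
    return [{name: (i < k) for i, name in enumerate(improvements)}
            for k in range(len(improvements) + 1)]
```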

The Recognition Performance on Other Plants
RAHC_GAN can generate leaves with obvious disease features for tomato plants. At the same time, we hoped that it would also have good disease generation ability on other plants to improve the performance of the recognition network. In order to verify whether RAHC_GAN generalizes to other plants, we chose apple, grape, and corn leaves in the Plant Village data set. The apple data include four classes (black spot, black rot, cedar-apple rust, and healthy), the grape data include four classes (black rot, wheel spot, brown spot, and healthy), and the corn data include four classes (gray leaf spot, common rust, healthy, and northern leaf blight), as shown in Figure 11. We used the same method for image generation, data expansion, and disease identification on these three plants. The details of the data sets and the experimental results of RAHC_GAN on them are shown in Tables 7 and 8, respectively. From Table 8, we can see that after using RAHC_GAN for data augmentation, the performance of the three recognition models on the apple, grape, and corn data improved to varying degrees, proving that the RAHC_GAN method generalizes across data sets. It is worth noting that for the apple and corn data, using ACGAN for expansion can also improve the recognition performance of the classifier. However, for the grape data, on the AlexNet and GoogLeNet recognition networks, using ACGAN for expansion reduces the recognition performance. This is because the four classes of apple and corn disease data differ considerably in shape and lesion (Figure 11a,c), while the four classes of grape data (Figure 11b) are very similar in shape, with only small differences between classes: the four diseases differ only in their disease spots.
In this case, ACGAN cannot accurately focus on the lesion area, which leads to indistinct lesions in the generated images and even reduces the recognition performance of the model. In contrast, because RAHC_GAN adds the residual attention module, it can accurately attend to and learn the disease information in the image. Furthermore, the multi-scale discriminator can enrich the detailed texture of the disease spot area and generate images with obvious disease features. Therefore, on the grape data, RAHC_GAN can improve the recognition performance of the classifier.

Discussion
Deep learning training requires a large amount of data. When the data are insufficient, a variety of methods can be used for data augmentation, such as flipping, cropping, etc. Unlike these methods, a GAN does not require the manual selection of transformation operations; instead, through the adversarial interplay of the generator and discriminator, it models the distribution of the original data and generates new samples from it. Experimental results show that the expanded data generated by a GAN can improve the recognition performance of plant diseases. However, because the variability of the images generated by a GAN is limited, it cannot capture all possible changes in the images, so the performance improvement is bounded. At the same time, the images generated by a GAN have limitations: although adding data can improve recognition performance, adding too much generated data to the training set may corrupt the original data distribution and cause performance degradation. Therefore, in future work, we will continue to explore more effective data augmentation methods that generate higher-quality images while maintaining better recognition performance.
In addition, because a GAN is itself a neural network, its training also depends on data. When the amount of data in the training set is too small, images of a specified class cannot be generated accurately from noise. Therefore, we will also try to use image-to-image translation instead of noise-to-image generation for data augmentation.

Conclusions
In the training of a recognition network, there is a strong symmetric relationship between sufficient and insufficient training samples. In order to solve the problem of insufficient data, we proposed a generative adversarial network model for generating tomato disease leaves, called RAHC_GAN, and evaluated it on four disease identification networks: AlexNet, VGGNet, GoogLeNet, and ResNet. The experimental results show that the expanded data set can satisfy the large amount of data required for neural network training, so that the data set expanded by RAHC_GAN yields a clear improvement in recognition accuracy under different classification models. At the same time, we found that under the same experimental settings, VGGNet, the classification network with the most parameters, has the lowest recognition performance of the four networks without data augmentation and the most obvious performance improvement after using RAHC_GAN for data augmentation, which shows that for a neural network with many parameters, sufficient data can significantly improve recognition performance.
In comparison with other data augmentation methods, the data set expanded by RAHC_GAN performs better on the four classification networks, indicating that RAHC_GAN can generate tomato diseased leaf images with obvious disease features from random noise and can effectively alleviate the problem of insufficient training data. The results of the ablation experiments show that, in generating leaves with the same disease, the newly added hidden variable can learn latent intra-class variation and thus enrich the information within each disease class. The addition of residual attention blocks makes the generator attend to the disease area in the image and guides it to generate leaves with obvious disease features. The introduction of a multi-scale discriminator enriches the detailed texture of the generated images.
The application of RAHC_GAN to the apple, grape, and corn data from the PlantVillage data set shows that the proposed method has good generalization ability. In addition to solving the problem of insufficient data for tomato leaf identification, it can also be used for other plant research tasks lacking diseased leaf data.

Data Availability Statement:
The PlantVillage data set is available at the following link: https://github.com/spMohanty/PlantVillage-Dataset/tree/master/raw.

Conflicts of Interest:
The authors declare no conflict of interest.