1. Introduction
The Oriental melon (Cucumis melo L. var. makuwa) is a widely cultivated fruit in South and East Asia. According to the Korean Statistical Information Service (KOSIS), the total cultivation area in Korea increased steadily from 3581 hectares in 2017 to 4580 hectares in 2023, with more than 95% of the produce grown in greenhouses (https://kosis.kr, accessed on 15 May 2025). However, maintaining stable fruit yield and quality remains a primary challenge for farmers. One of the major issues in greenhouse cultivation is the prevalence of fungal diseases, particularly powdery mildew and downy mildew, which significantly impact fruit quality and yield [1], thrive under humid and controlled environments [2], and are difficult to detect in their early stages [3]. Traditional disease diagnosis relies on expert-based phenotyping, which is labor-intensive, time-consuming, and costly [4]. With advancements in computer vision and artificial intelligence (AI), deep learning-based plant disease detection has emerged as a promising alternative [5]. Deep convolutional neural networks (CNNs) have been extensively used to analyze plant leaf images and classify diseases with high accuracy [6]. However, CNN-based models require large and well-balanced datasets for effective training, which poses a significant challenge in real-world agricultural applications [7]. Publicly available datasets, such as PlantVillage, often suffer from class imbalance, where healthy leaf samples outnumber diseased samples, leading to biased model performance [8]. To mitigate this issue, data augmentation techniques are commonly employed, including traditional image transformations such as rotation, scaling, and translation [9]. However, these techniques have inherent limitations: they lack feature diversity, fail to generate realistic disease patterns, and cannot preserve critical morphological characteristics, such as vein-associated lesions or powdery textures, thereby increasing the risk of model overfitting [10].
To address these challenges, generative adversarial networks (GANs), introduced by Goodfellow et al. [11], have been increasingly used for data augmentation in plant disease classification [12]. GANs generate synthetic images by learning the data distribution through adversarial training between a generator and a discriminator. Existing GAN-based approaches can be categorized into image-to-image translation and noise-to-image generation models. Image-to-image GANs, such as CycleGAN and LeafGAN, modify healthy leaf images by embedding disease symptoms [13,14]. However, these methods depend on paired or domain-matching datasets, which are not always available, and often produce repetitive results [15]. In contrast, the deep convolutional GAN (DCGAN), introduced by Radford et al. [16], generates images directly from random noise, enabling the creation of novel and diverse disease features absent from the original dataset [17]. This noise-to-image approach enhances data diversity and generalization, leading to more robust deep learning models trained on the generated images [18]. Although DCGANs have shown promising results in generating synthetic images for data augmentation, they commonly suffer from limited feature expressiveness, mode collapse, and difficulty in capturing the complex structural details critical to modeling diverse and fine-grained disease symptoms [19].
To further improve model stability and feature representation, residual connections have proven effective in enhancing gradient flow and maintaining feature consistency [20,21]. Accordingly, this study applies a simplified residual connection to the DCGAN generator to enhance training stability and structural realism in synthesized disease images. In addition to architectural improvements, other studies have explored various activation functions to improve gradient flow and enhance the non-linear representational capacity of GANs. The rectified linear unit (ReLU) became the standard due to its computational simplicity, but it often suffers from the dead-neuron problem [22]. To address this, Maas et al. [23] introduced leaky ReLU, which allows a small gradient in the negative domain. Clevert et al. [24] later proposed the exponential linear unit (ELU), offering smoother optimization and faster convergence. More recently, Misra [25] proposed Mish, a self-regularized non-monotonic activation function that provides smoother and more continuous non-linearity, resulting in enhanced gradient propagation and expressive power. Despite these advances, few studies have systematically examined the combined effects of residual connections and advanced activation functions such as Mish in GAN architectures.
To overcome the limitations of existing GAN-based approaches trained on controlled-environment datasets, this study employs images collected from real agricultural environments, enabling the generative model to learn more realistic and diverse data distributions. The research was carried out in two phases: in the first phase, various data augmentation methods were explored to identify the best augmentation method; in the second phase, the augmented datasets were applied to classification models to quantify the performance improvement and to identify the best-performing classification model.
2. Materials and Methods
2.1. Overview of the Research Methodology
The flow chart in Figure 1 shows the research methodology. Initially, healthy and diseased plant leaf images were collected to prepare the dataset, which was then split into training and test sets. The naturally collected dataset has a limited number of images and a high degree of class imbalance. The training data was therefore augmented with different methods: traditional augmentation, DCGAN, and modified DCGAN. The augmented datasets were applied to different DL image classifiers for plant leaf disease classification. Finally, the performance of the classifiers was compared to determine both the best-performing augmentation method and the best-performing classification model.
Figure 2 illustrates the process of selecting the modified DCGAN architecture used to generate images for augmentation. The modified DCGAN was developed from the base DCGAN model by adding a residual connection combined with different activation functions (ReLU, ELU, and Mish). The performance of the augmented datasets generated by these methods was evaluated using quantitative and qualitative metrics. The quantitative metrics were the inception score (IS), Fréchet inception distance (FID), and improved precision and recall (IPR); the qualitative metric was t-distributed stochastic neighbor embedding (t-SNE). Together, these metrics provide an intuitive visual indication of how closely the generated images resemble real samples and whether the synthetic data effectively captures the variability present in real plant disease images. This methodology enables a rigorous evaluation of modified DCGAN data augmentation techniques for synthetic plant disease image generation.
2.2. Dataset
The oriental melon was cultivated in a greenhouse (Figure 3) located at 35°57′ N, 128°19′ E (Wolhang-myeon, Seongju-gun, Gyeongsangbuk-do, Republic of Korea). Inside the greenhouse, the average temperature was 29.1 °C and the average humidity was 78.0%. The oriental melon plants were grown without fungicide treatment from June to October 2022. To capture variation in the leaf disease images, photographs were taken under various natural lighting conditions in the morning, afternoon, and evening. Diseases were identified with the guidance of plant cultivation experts. Images were captured using an RGB camera (A32, Samsung Electronics, Suwon, Gyeonggi-do, Republic of Korea). A total of 12,000 oriental melon leaf images were collected, comprising 4000 healthy, 4000 downy mildew, and 4000 powdery mildew samples. The dataset was divided into training and test sets at a ratio of 80:20. Data augmentation was applied only to the training set, where an additional 1600 synthetic images per class were generated using the proposed GAN-based method, corresponding to half the number of real training samples per class.
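As a quick illustration of the split described above, the per-class counts work out as follows (a minimal sketch; class names are those used in this study):

```python
# Per-class counts implied by the text: 4000 images per class, an 80:20
# train/test split, and generated images equal to half the real training set.
images_per_class = 4000
train_per_class = int(images_per_class * 0.8)         # 3200
test_per_class = images_per_class - train_per_class   # 800
generated_per_class = train_per_class // 2            # 1600

for cls in ("healthy", "downy mildew", "powdery mildew"):
    print(f"{cls}: train={train_per_class}, test={test_per_class}, "
          f"generated={generated_per_class}")
```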
2.3. Data Augmentation
2.3.1. Traditional Data Augmentation
Traditional data augmentation applies various image transformations to increase the diversity of the healthy and diseased leaf images (Figure 4). The transformations applied to obtain the augmented data include crop and resize, horizontal and vertical flips, rotation, erasing, and noise injection using a Gaussian blur.
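For illustration, the listed transformations can be expressed as a torchvision pipeline; this is a minimal sketch, and the parameter values (crop size, rotation range, blur kernel, probabilities) are assumptions rather than the exact settings used in this study:

```python
import torchvision.transforms as T

# Crop/resize, flips, rotation, erasing, and Gaussian blur as listed above;
# all parameter values here are illustrative assumptions.
traditional_augment = T.Compose([
    T.RandomResizedCrop(224),          # crop and resize
    T.RandomHorizontalFlip(p=0.5),     # horizontal flip
    T.RandomVerticalFlip(p=0.5),       # vertical flip
    T.RandomRotation(degrees=30),      # rotation
    T.GaussianBlur(kernel_size=5),     # noise injection via Gaussian blur
    T.ToTensor(),
    T.RandomErasing(p=0.5),            # erasing (operates on tensors)
])
```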
2.3.2. Generative Image Augmentation
DCGAN
Generative adversarial networks (GANs) consist of two neural network blocks, a generator and a discriminator, that compete with each other (Figure 5). The generator creates synthetic images from random noise, while the discriminator attempts to distinguish between real and generated images. Through this adversarial training process, both networks improve simultaneously, allowing the generator to create increasingly realistic images. A major strength of GANs is their ability to model data distributions implicitly, without relying on explicit probability models.
A deep convolutional generative adversarial network (DCGAN) is an extension of the basic GAN architecture that leverages convolutional neural networks (CNNs) to significantly improve the quality and stability of image generation (Figure 6). In a DCGAN, both the generator and discriminator are constructed from CNN layers. The generator progressively upsamples a latent noise vector using transposed convolution layers to produce a full-resolution image, employing ReLU and tanh activation functions. The discriminator processes input images through convolutional layers with LeakyReLU activations and outputs a probability indicating whether the image is real or generated. Training follows the standard adversarial framework, alternating between updating the discriminator to correctly classify real and generated images and training the generator to produce realistic images that fool the discriminator. Owing to its convolutional architecture and carefully designed layer configurations, DCGAN has demonstrated success in generating diverse, high-quality images across a wide range of domains.
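For illustration, a compact PyTorch sketch of a DCGAN generator and discriminator for 64 × 64 RGB images follows; the latent dimension (100) and channel widths follow common DCGAN practice and are assumptions, not the exact configuration used here:

```python
import torch
import torch.nn as nn

# A compact DCGAN sketch for 64 x 64 RGB leaf images; the latent size (100)
# and channel widths follow common DCGAN practice, not the paper's exact
# configuration.
class Generator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),   # -> 4x4
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False),  # -> 8x8
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False),  # -> 16x16
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),      # -> 32x32
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),           # -> 64x64
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, z):  # z: (N, z_dim, 1, 1)
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 4, ch * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),
        )

    def forward(self, x):  # x: (N, 3, 64, 64)
        return self.net(x).view(-1)

G, D = Generator(), Discriminator()
fake = G(torch.randn(4, 100, 1, 1))
print(fake.shape, D(fake).shape)  # torch.Size([4, 3, 64, 64]) torch.Size([4])
```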
Modified DCGAN
The modified DCGAN was obtained from the base DCGAN model by adding a residual connection combined with various activation functions, aiming for higher-quality image generation with lower computational requirements and only simple architectural modifications.
Residual Connection in DCGAN
Residual connections, also known as skip or shortcut connections, are a fundamental architectural component introduced to address the vanishing gradient problem, which hinders the training of deep neural networks. First introduced in ResNet (residual network), this technique facilitates smoother gradient propagation by allowing information to bypass certain layers, making it easier to train deeper networks efficiently. In conventional DCGANs, training can become more challenging as the generator deepens, affecting the quality of generated images. To mitigate this issue, this study incorporates residual connections using 1 × 1 kernel convolutions in all layers of the generator (Figure 7). This design ensures smoother information flow, helping the model retain important features throughout the generation process; the aim is to prevent performance degradation in deeper networks and achieve more stable training.
This lightweight residual mechanism does not alter the overall depth or structure of the DCGAN but enables smoother training and improves the stability of the image generation process. Ultimately, this modification aims to enhance the quality and diversity of the synthesized plant disease images without increasing architectural complexity. Accordingly, Figure 8 shows the architecture of the discriminator network, which is identical to that of the conventional DCGAN.
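For illustration, one generator stage with the 1 × 1 convolution shortcut described above might look as follows in PyTorch; the upsampling scheme, channel counts, and activation placement are assumptions:

```python
import torch
import torch.nn as nn

# A sketch of one upsampling generator stage with the 1 x 1 convolution
# shortcut described above; the upsampling scheme, channel counts, and the
# activation choice are illustrative assumptions.
class ResidualUpBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # The 1 x 1 convolution matches channel counts on the shortcut path;
        # nearest-neighbor upsampling matches the spatial size so that the
        # two paths can be summed.
        self.shortcut = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        )
        self.act = nn.Mish()  # swapped for nn.ReLU() / nn.ELU() in the variants below

    def forward(self, x):
        return self.act(self.main(x) + self.shortcut(x))

x = torch.randn(2, 256, 8, 8)
print(ResidualUpBlock(256, 128)(x).shape)  # torch.Size([2, 128, 16, 16])
```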
Activation Functions
The choice of activation function in the generator significantly influences the model’s training efficiency and the quality of generated images. While ReLU is commonly used in DCGANs, this study compares three activation functions, namely ReLU, ELU, and Mish, to evaluate their impact on image generation.
ReLU is one of the most widely used activation functions; it sets negative values to zero while keeping positive values unchanged. It is computationally efficient and helps mitigate the vanishing gradient problem:

$$\mathrm{ReLU}(x) = \max(0, x)$$
ELU, unlike ReLU, allows small negative outputs, maintaining a non-zero gradient for negative values. This property helps improve gradient flow and stabilizes training. Additionally, ELU's outputs are centered around zero, which can enhance learning efficiency:

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases}$$
Mish is a recently introduced activation function that is smooth and non-saturating, allowing for better gradient propagation. Compared to ReLU and ELU, Mish provides a more continuous activation curve, which can improve overall training dynamics and enhance the quality of generated images:

$$\mathrm{Mish}(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right)$$
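The three activation functions can be compared numerically with a short PyTorch sketch; the closed forms in the comments follow the equations above:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)

relu = F.relu(x)                        # ReLU(x) = max(0, x)
elu = F.elu(x, alpha=1.0)               # ELU(x) = x if x > 0 else alpha*(exp(x)-1)
mish = x * torch.tanh(F.softplus(x))    # Mish(x) = x * tanh(ln(1 + exp(x)))

# The manual Mish matches PyTorch's built-in implementation.
print(torch.allclose(mish, F.mish(x)))  # True
```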
Three types of modified DCGAN were developed using a combination of residual connections with activation functions as follows:
- (a) DCGAN + residual connection (ReLU);
- (b) DCGAN + residual connection (ELU);
- (c) DCGAN + residual connection (Mish).
2.4. DL CNN Classification Models
To validate the performance of the proposed approach, we conducted a set of experiments using the real dataset combined with the various augmented datasets. The DL CNN leaf disease classification models used were AlexNet, VGGNet, and GoogLeNet.
AlexNet was introduced in 2012 at the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) and marked a breakthrough in deep learning-based computer vision. The model consists of eight layers: five convolutional layers and three fully connected layers. AlexNet used data augmentation techniques such as cropping and horizontal flipping to improve generalization, along with dropout as a regularization method to reduce overfitting. By leveraging powerful GPUs, AlexNet accelerated training and achieved significantly lower error rates than previous models, drawing widespread attention.
Developed by the Visual Geometry Group (VGG) in 2014, VGGNet was designed to explore the relationship between convolutional neural network (CNN) depth and performance [26]. VGGNet comes in 16-layer (VGG16) and 19-layer (VGG19) versions and consistently uses small 3 × 3 filters throughout its layers. By deepening the network while maintaining a manageable number of parameters, VGGNet achieved substantial improvements in accuracy and generalization. Its deep structure allows it to capture fine image details effectively, and it has since become a foundational architecture in CNN research.
Introduced in the 2014 ILSVRC, GoogLeNet is based on the Inception architecture, which uses Inception modules to build an efficient network structure [27]. The Inception module combines filters of different sizes, including 1 × 1, 3 × 3, and 5 × 5, within the same layer, allowing the network to capture features at various receptive fields simultaneously. Although it is a 22-layer network, GoogLeNet has a relatively small number of parameters, optimizing both computation and memory usage. This efficiency made GoogLeNet known for its improved speed and performance.
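For illustration, the three classifiers can be adapted to the three-class task (healthy, downy mildew, powdery mildew) via torchvision; the use of ImageNet pretrained weights here is an assumption, not a detail stated in the text:

```python
import torch.nn as nn
from torchvision import models

# A sketch of adapting the three ImageNet classifiers to the three-class task;
# the use of pretrained weights is an assumption, not a detail from the text.
num_classes = 3  # healthy, downy mildew, powdery mildew

alexnet = models.alexnet(weights="IMAGENET1K_V1")
alexnet.classifier[6] = nn.Linear(4096, num_classes)

vgg16 = models.vgg16(weights="IMAGENET1K_V1")
vgg16.classifier[6] = nn.Linear(4096, num_classes)

googlenet = models.googlenet(weights="IMAGENET1K_V1")
googlenet.fc = nn.Linear(1024, num_classes)
```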
2.5. Evaluation Metric
2.5.1. DCGAN Evaluation Metric
Quantitative Metrics
The performance of a GAN is primarily determined by the quality and diversity of the images it generates. In general, the closer the generated images are to real images in terms of visual fidelity and variation, the better the model’s performance. Generative performance of the GAN can be evaluated using two quantitative metrics: inception score (IS) and Fréchet inception distance (FID).
The inception score (IS), proposed by [28], combines two key criteria, fidelity and diversity, into a single metric. The IS captures the desirable properties of generated samples by measuring the average Kullback–Leibler (KL) divergence between the conditional label distribution and the marginal label distribution of generated images. A higher IS indicates a better model with sharper and more diverse samples.
The formula for IS is as follows:

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x \sim p_g}\left[D_{\mathrm{KL}}\left(p(y \mid x) \,\|\, p(y)\right)\right]\right)$$

Here, $p(y \mid x)$ is the predicted label distribution for a generated image $x$, and $p(y)$ is the marginal distribution of labels across the dataset. The KL divergence measures the difference between the classifier's predictions for individual images and the overall distribution, promoting models that produce images with high label confidence and a broad variety of generated samples.
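A minimal NumPy sketch of the IS computation follows; in practice the label distributions p(y|x) would come from a pretrained Inception network, which is assumed here:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Compute IS from predicted label distributions p(y|x) of shape
    (num_images, num_classes), e.g., softmax outputs of a pretrained
    Inception network (assumed)."""
    p_y = probs.mean(axis=0, keepdims=True)            # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))                    # exp(E[KL(p(y|x) || p(y))])
```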
FID embeds generated samples in a feature space [29]. The mean and covariance of the generated samples and of the real data are computed under the assumption that both follow a multivariate Gaussian distribution, and the FID between the two Gaussian distributions is then computed to assess the quality of the generated samples. A lower FID score indicates a smaller discrepancy between the synthetic and real data distributions.
The formula for FID is as follows:

$$\mathrm{FID} = \left\| \mu_r - \mu_g \right\|^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)$$

Here, $\mu_r$ and $\Sigma_r$ represent the mean and covariance of the real data distribution, while $\mu_g$ and $\Sigma_g$ correspond to those of the generated data distribution.
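A minimal sketch of the FID computation from feature embeddings follows; the use of Inception-v3 pooling features is the conventional choice and is assumed here:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feat_real, feat_gen):
    """Compute FID between two sets of feature embeddings of shape (N, D),
    e.g., Inception-v3 pooling features (assumed)."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    sigma_r = np.cov(feat_real, rowvar=False)
    sigma_g = np.cov(feat_gen, rowvar=False)
    covmean = sqrtm(sigma_r @ sigma_g)   # matrix square root of Sigma_r Sigma_g
    if np.iscomplexobj(covmean):         # discard tiny imaginary parts caused
        covmean = covmean.real           # by numerical error
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```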
Qualitative Metric
To qualitatively assess the distribution of the generated images, t-distributed stochastic neighbor embedding (t-SNE) is employed as a visualization technique [30]. t-SNE is a non-linear dimensionality reduction algorithm that projects high-dimensional data into a lower-dimensional space, typically two or three dimensions, while preserving the local relationships among data points. This allows an intuitive visual representation of how closely the generated images resemble real samples and whether the synthetic data effectively captures the variability present in real plant disease images. In this study, t-SNE is applied to the feature representations of real and augmented images, which are embedded into a shared two-dimensional space, enabling a qualitative comparison of their distributions.
First, the t-SNE algorithm computes the pairwise similarities among all data points in the high-dimensional space. This is performed using a Gaussian kernel with bandwidth $\sigma_i$. The similarity between two data points $x_i$ and $x_j$ is determined with the following formula:

$$p_{j|i} = \frac{\exp\left(-\left\|x_i - x_j\right\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\left\|x_i - x_k\right\|^2 / 2\sigma_i^2\right)}$$

where $p_{j|i}$ denotes the conditional probability of selecting point $x_j$, assuming point $x_i$ is the center of a Gaussian distribution in the high-dimensional space. The denominator ensures normalization so that the sum of conditional probabilities over all possible neighbors equals 1.
In the lower-dimensional space, the probability distribution is defined using a Student's t-distribution with one degree of freedom instead of a Gaussian distribution. This addresses the crowding problem, which occurs when mid-range distances in the high-dimensional space are overly compressed in the low-dimensional space. The similarity between embedded data points $y_i$ and $y_j$ in the lower-dimensional space is computed as follows:

$$q_{ij} = \frac{\left(1 + \left\|y_i - y_j\right\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \left\|y_k - y_l\right\|^2\right)^{-1}}$$

where $y_i$ and $y_j$ are the corresponding embeddings of the high-dimensional data points in the lower-dimensional space. The Student's t-distribution ensures that mid-range distances are expanded, improving cluster separation. The distribution $q_{ij}$ is normalized so that the sum of all probabilities equals 1.
The objective of t-SNE is to minimize the difference between the high-dimensional probability distribution $P$ and the low-dimensional probability distribution $Q$. This difference is quantified using the KL divergence, which measures how much information is lost when $Q$ is used to approximate $P$:

$$\mathrm{KL}(P \,\|\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
Since KL divergence is non-symmetric, it penalizes points that are close in high-dimensional space but far apart in low-dimensional space more than the reverse. This property ensures that local structures are preserved in the embedding.
To minimize the KL divergence, t-SNE employs gradient descent to iteratively update the positions of points in the low-dimensional space. The gradient of the KL divergence with respect to each low-dimensional point $y_i$ is given by the following:

$$\frac{\partial \mathrm{KL}}{\partial y_i} = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \left\|y_i - y_j\right\|^2\right)^{-1}$$

with the update rule $y_i^{(t+1)} = y_i^{(t)} - \eta \, \frac{\partial \mathrm{KL}}{\partial y_i}$, where $\eta$ is the learning rate and $t$ represents the iteration number. The attractive force, arising from the $p_{ij}$ term, moves similar points closer together, while the repulsive force, arising from the $q_{ij}$ term, pushes dissimilar points apart. The term $\left(1 + \left\|y_i - y_j\right\|^2\right)^{-1}$ ensures that larger distances contribute less to the gradient, helping stabilize the optimization process. This gradient descent step is repeated until the KL divergence reaches a minimum, resulting in an optimal low-dimensional representation.
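For illustration, the t-SNE projection used in this comparison can be computed with scikit-learn; the placeholder feature arrays stand in for embeddings from a pretrained feature extractor, which is an assumption of this sketch:

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder features standing in for embeddings of real and generated
# images from a pretrained feature extractor (an assumption for this sketch).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 512))
gen_feats = rng.normal(size=(200, 512))

feats = np.vstack([real_feats, gen_feats])
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats)
real_2d, gen_2d = emb[: len(real_feats)], emb[len(real_feats):]
print(real_2d.shape, gen_2d.shape)  # (200, 2) (200, 2)
```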
This qualitative evaluation helps visualize the extent to which the modified DCGAN improves over the conventional DCGAN augmentation. Additionally, it serves as a complementary evaluation to quantitative metrics such as IS and FID, providing deeper insights into the model’s ability to synthesize realistic plant disease images.
2.5.2. Classification Model Evaluation Metric
The performance of the classification models was evaluated using four standard metrics: accuracy, precision, recall, and F1-score, defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

For multi-class classification involving three categories, a confusion matrix was constructed to evaluate model performance. Each row of the matrix corresponds to an actual class, while each column represents a predicted class. For each class, true positives (TP) represent correctly identified samples of that class, false positives (FP) denote samples from other classes misclassified as that class, false negatives (FN) refer to samples of that class misclassified into other categories, and true negatives (TN) correspond to all remaining correctly classified samples not belonging to that class.
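These metrics and the confusion matrix can be computed, for illustration, with scikit-learn; y_true and y_pred are assumed label arrays for the three classes:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# y_true and y_pred are assumed integer label arrays for the three classes
# (0 = healthy, 1 = downy mildew, 2 = powdery mildew).
def evaluate(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro")          # class-wise metrics, averaged
    cm = confusion_matrix(y_true, y_pred)         # rows: actual, cols: predicted
    return acc, prec, rec, f1, cm
```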
3. Results
The results section compares the base DCGAN model with the modified DCGAN architectures. The performance evaluation is based on quantitative and qualitative metrics to identify the best modified DCGAN architecture among them.
After identifying the best modified DCGAN architecture, the output images from the dataset augmentation block of the traditional methods, the DCGAN, and the modified DCGAN were fed into various DL CNN disease classification models to differentiate healthy, downy mildew, and powdery mildew leaves. The performance of the classification models under the various augmentations was then evaluated and compared.
3.1. Quantitative Performance Comparison Between DCGAN and Modified DCGAN
To evaluate the impact of architectural modifications on generative models, we compared the conventional DCGAN with the modified DCGAN. Each model’s performance was quantified using IS and FID. These metrics comprehensively evaluate image clarity, similarity, and representational completeness.
As summarized in Table 1, the IS results demonstrated considerable differences among model configurations for generating the different disease images. The DCGAN achieved IS values of 1.56 for downy mildew, 1.77 for healthy leaves, and 2.31 for powdery mildew; with residual connections and ReLU added, the IS values were 1.60, 1.76, and 2.36, respectively, showing improvement for the downy mildew and powdery mildew classes. Further improvement was observed when residual connections and ELU were applied, with the highest gain for powdery mildew. Finally, the model combining residual connections and the Mish function showed the highest IS values for all classes, corresponding to improvements of approximately 7.9–11.5% over the conventional DCGAN. These results indicate that residual connections and Mish improved the clarity and diversity of the generated images across the disease classes.
The FID results supported the IS findings, indicating that the best-performing model was the DCGAN with residual connection and the Mish function. The DCGAN showed relatively high FID values of 204.04 for downy mildew, 224.94 for healthy, and 169.98 for powdery mildew, suggesting less similarity to real images. In comparison, the DCGAN with residual connection and Mish lowered the FID by 7.87% for downy mildew, 7.26% for healthy leaves, and 6.64% for powdery mildew, indicating a closer resemblance between the generated images and the original disease classes.
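As a worked check, the reported reductions follow directly from the DCGAN baselines above and the modified-DCGAN (Mish) FID values reported in the Discussion:

```python
# DCGAN baselines from the text and modified-DCGAN (Mish) FID values
# reported in the Discussion.
baseline = {"downy mildew": 204.04, "healthy": 224.94, "powdery mildew": 169.98}
mish_fid = {"downy mildew": 187.99, "healthy": 208.60, "powdery mildew": 158.69}

for cls in baseline:
    drop = 100 * (baseline[cls] - mish_fid[cls]) / baseline[cls]
    print(f"{cls}: FID lower by {drop:.2f}%")  # 7.87%, 7.26%, 6.64%
```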
Additionally, ranking the models by IS, which reflects the similarity and diversity of the generated disease images, for the downy mildew class the models rank as DCGAN + residual connection (Mish), DCGAN + residual connection (ReLU), and DCGAN + residual connection (ELU), whereas for the healthy and powdery mildew classes they rank as DCGAN + residual connection (Mish), DCGAN + residual connection (ELU), and DCGAN + residual connection (ReLU).
Similarly, ranking by FID, which reflects the similarity of image distributions, for the downy mildew and healthy classes the models rank as DCGAN + residual connection (Mish), DCGAN + residual connection (ELU), and DCGAN + residual connection (ReLU); however, for powdery mildew, DCGAN + residual connection (ELU) performed best.
To complement the quantitative analysis, Figure 9 presents visual examples of leaf images generated by the DCGAN and modified DCGAN for the three categories. The DCGAN with residual connection and the Mish function produced samples with greater textural variation and morphological detail for the different classes, including downy mildew, healthy, and powdery mildew. These generated patterns resemble real images and are consistent with the numerical results, reinforcing the benefit of combining residual learning with the Mish activation function.
In summary, the IS values, FID values, and visual inspection of the generated images for the healthy and disease classes all indicate that the architectural enhancements led to improvement. The model incorporating residual connections and the Mish activation function consistently outperformed the conventional DCGAN and all other modified variants. Therefore, the DCGAN with residual connection and Mish activation function was chosen as the modified DCGAN augmentation method.
3.2. Qualitative Performance Comparison Between Traditional, DCGAN, and Modified DCGAN Augmentation
To evaluate how the different data augmentation methods influence feature representation and pattern variation relative to real images, we conducted t-SNE visualization for each method. The three visualizations, corresponding to traditional augmentation, DCGAN, and modified DCGAN (DCGAN with residual connection and Mish activation function), are shown in Figure 10. Each panel illustrates the distribution of real and generated images in feature space, with yellow representing downy mildew, purple representing powdery mildew, and green representing healthy leaves. Real images are denoted by circles and generated images by crosses.
A consistent pattern was observed across all three t-SNE visualizations: the generated and real images did not form overlapping clusters in feature space. This separation is attributable to image resolution, since t-SNE is sensitive to differences in input resolution. The captured real images had very high resolutions, approximately 1000 × 1000 to 1300 × 1300 pixels, whereas the generated images were only 64 × 64 pixels because of the higher computational cost of generating high-resolution images. Furthermore, for traditional augmentation, images were resized to 224 × 224 pixels, following the classification model standard. Therefore, when analyzing distribution and representativeness, we focused on the internal clustering patterns among the generated image classes and the distribution of the real images in feature space.
In the traditional augmentation setting (Figure 10a), the augmented images of the various classes, including downy mildew, healthy leaves, and powdery mildew, overlap in feature space with little distinction from one another compared to the real images. In particular, the augmented powdery mildew images showed significant overlap with the healthy class. This indicates that traditional augmentation was not sufficient to preserve class-specific characteristics or to separate the feature spaces of the overlapping disease and healthy classes. The resulting clusters were less distinct, and class boundaries were poorly defined.
Although the overall distribution patterns of the DCGAN and modified DCGAN were similar, the modified DCGAN produced slightly more dispersed clusters while reducing class interference. As shown in Figure 10b,c, the feature spaces of the DCGAN and modified DCGAN exhibit clear and distinct separation among the three classes.
In summary, GAN-based augmentation methods outperformed traditional augmentation by generating images that better reflect the disease-specific feature distributions of real images. While both DCGAN and modified DCGAN improved the structure of the generated images and the distinction between class features, the modified DCGAN offered the additional advantage of enhanced intra-class diversity, which can lead to improved generalization in classification models.
3.3. Disease Classification Model Performance on Augmented Datasets
This section compares the effects of three data augmentation strategies on the classification performance of AlexNet, VGG16, and GoogLeNet. The augmentation methods applied include traditional augmentation, DCGAN, and modified DCGAN (DCGAN with residual connection and Mish function). Evaluation metrics used for the DL CNN classification model in this comparison include precision, recall, F1-score, and accuracy. To further validate the results, confusion matrices were analyzed for each combination of classification model and augmentation method.
Evaluation results for the AlexNet, VGG16, and GoogLeNet classification models are shown in Table 2; performance increased progressively when moving from traditional augmentation to DCGAN, and further when moving to the modified DCGAN. For the AlexNet model, the modified DCGAN-based augmentation achieved the best overall performance, surpassing all other augmentation methods with precision, recall, F1-score, and accuracy values of 0.975. For the VGG16 model, the DCGAN and modified DCGAN augmentations achieved similar performance, with overall precision, recall, F1-score, and accuracy values of 0.954, both significantly exceeding traditional augmentation. Finally, for GoogLeNet, the modified DCGAN outperformed all other augmentation methods, with a slight increment in the overall evaluation metrics over the DCGAN and traditional augmentation.
The classification performance was further analyzed using confusion matrices to determine class-wise prediction and misclassification patterns. The test set contained 800 images per class. According to Figure 11a, for the AlexNet model, traditional augmentation performed well for the downy mildew and healthy classes but showed the poorest performance for the powdery mildew class. In contrast, both the DCGAN and modified DCGAN methods achieved superior classification for powdery mildew, with fewer than three misclassifications. In Figure 11b, for the VGG16 model, the modified DCGAN augmentation showed the best class-wise performance, with the fewest misclassifications among all augmentations. In Figure 11c, for the GoogLeNet model, the modified DCGAN performed best overall for the downy mildew and healthy classes but slightly lagged behind traditional augmentation for powdery mildew.
Comparing the impact of modified DCGAN augmentation across the disease and healthy classes, the GoogLeNet classifier demonstrated the best performance for the downy mildew and healthy classes but relatively poor performance for the powdery mildew class.
These findings indicate that GAN-based data augmentation contributes meaningfully to the improvement of classification performance. In particular, modified DCGAN enhanced both the quality and diversity of synthetic images, leading to improved generalization overall for DL CNN-based classification models. The improvements observed for evaluation metrics and confusion matrix analysis validate the effectiveness of DCGAN and modified DCGAN generative augmentation approaches.
4. Discussion
The study evaluated different data augmentation strategies, traditional augmentation, DCGAN, and modified DCGAN, and their effects on generating images of healthy and diseased leaves. The two-phase approach involved first selecting the best modified DCGAN augmentation model and then analyzing and comparing the improvement brought by each type of augmentation across the classification models. The results confirm the best generative augmentation method to be the modified DCGAN with residual connection and Mish activation function, which achieved the highest IS (1.74 for downy mildew, 1.91 for healthy leaves, and 2.57 for powdery mildew) and the lowest FID values (187.99 for downy mildew, 208.6 for healthy leaves, and 158.69 for powdery mildew). The residual connection in the generator of the modified DCGAN mitigated the vanishing gradient problem, allowing the model to generate distinct class images, while the self-regularized, non-monotonic characteristics of the Mish activation function enhanced gradient propagation and expressive power, producing generated images with greater resemblance and diversity relative to the real image classes.
Furthermore, the t-SNE distribution for the modified DCGAN with residual connection and Mish function more faithfully represented disease-specific features and inter-class differences in feature space, confirming the separation between diseased and healthy generated images.
Regarding the classification models under the different augmentation scenarios, performance was compared using precision, recall, F1-score, and accuracy. The modified DCGAN yielded overall better performance for every classification model, and GoogLeNet achieved the highest classification results for all augmentation methods, making it the best classification model overall. Comparing the augmentation methods on the basis of classification performance, DCGAN and modified DCGAN performed almost identically, with a slight increment in the performance metrics for the modified DCGAN, whereas traditional augmentation yielded lower classification performance. GoogLeNet's strong performance is supported by its fast and efficient training and moderate parameter size [31]. By contrast, AlexNet trained quickly but has a small parameter size, whereas VGG16 trained slowly and has a large parameter size, requiring more computation time and memory [32,33].
Similarly, interpreting the confusion matrices shows that GoogLeNet performs best overall for disease and healthy classification but struggles with powdery mildew across all augmentation methods. The VGG16 model performs the worst across all classes and augmentation techniques, likely due to its larger parameter size and higher training data requirements, which may hinder effective feature learning in this context. Future work could extend this approach to various plant species, disease types, and growth stages to evaluate its broader applicability and robustness across diverse agricultural conditions.
The study demonstrated that structure-aware GAN-based data augmentation can effectively improve the generation of different plant disease classes using real-world data. The findings highlight the potential of this approach to support accurate disease detection in agricultural applications. The key contribution of this study lies not in the simple application of existing GAN frameworks, but in the theoretical and architectural enhancement of the DCGAN structure through the integration of residual connections and the Mish activation function. This combination enables a more stable training process and richer feature representation that is specifically optimized for plant disease imagery. Moreover, the effectiveness of the proposed model allows for the overall improvement of the classification model performance, suggesting practical applicability.
5. Conclusions
This study evaluated the effectiveness of different data augmentation strategies, including traditional augmentation, DCGAN, and a modified DCGAN, for leaf disease classification. In particular, various modified DCGAN architectures incorporating residual connections and different activation functions were analyzed. Among these, the DCGAN with residual connections and the Mish activation function demonstrated the best overall performance, improving the IS by 7.9 to 11.54% and reducing the FID by 6.6 to 7.8% compared to the conventional DCGAN.
For the classification task, GoogLeNet combined with the modified DCGAN augmentation achieved the highest performance across all models, with average precision, recall, and F1-score values of approximately 98%. These results indicate that generative model-based data augmentation effectively mitigates the challenges associated with limited datasets in plant disease classification. By generating balanced and diverse generative images, the approach enhances the robustness and generalization ability of deep learning classifiers.
Furthermore, the findings emphasize the critical role of activation function selection in optimizing generative model performance relative to traditional augmentation methods, with the Mish-based modified DCGAN performing best. Future research should explore alternative GAN architectures for higher-quality generated images and examine how different classification algorithms perform on different disease classes, working toward a generalized classification model. Finally, integrating these augmentation methods into real-world agricultural systems and validating their effectiveness under practical field conditions will be essential to assess their applicability.
Overall, this study demonstrates the potential of the modified DCGAN generative model to improve precision agriculture applications, particularly by accurately detecting and classifying plant diseases through enhanced data augmentation and deep learning strategies.