Engineering Proceedings
  • Proceeding Paper
  • Open Access

2 February 2026

Enhancing Imbalanced Data Classification Using Style-Based Generative Adversarial Network-Based Data Augmentation: A Case Study of Computed Tomography Images of Brain Stroke †

Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan
* Author to whom correspondence should be addressed.
† Presented at the 8th International Conference on Knowledge Innovation and Invention 2025 (ICKII 2025), Fukuoka, Japan, 22–24 August 2025.
This article belongs to the Proceedings 8th International Conference on Knowledge Innovation and Invention

Abstract

Stroke is a leading cause of death and disability. However, brain computed tomography image classification using machine learning and deep learning algorithms frequently suffers from class imbalance, which makes it difficult to extract detailed deep features from the minority class of stroke lesions. In this study, we systematically implement three style-based generative adversarial network (StyleGAN)-based data augmentation approaches, StyleGAN2, StyleGAN3, and conditional StyleGAN3, to address class imbalance in brain stroke classification. Furthermore, we deploy an ensemble learning-based deep neural network to enhance the effect of these data augmentation algorithms on downstream classification tasks. Experimental results show that StyleGAN3 outperforms the other two StyleGAN data augmentation approaches in terms of precision, recall, and F1-score on highly imbalanced brain stroke classification. Overall, this paper demonstrates the efficacy of StyleGAN-based data augmentation in addressing imbalanced brain stroke detection.

1. Introduction

Stroke is a leading cause of death and disability worldwide, particularly in elderly populations [1]. According to the World Health Organization, approximately 15 million people worldwide experience a stroke annually; about one-third of them die and another third are left with permanent disabilities [2]. Computed tomography (CT) scans are now widely used in clinical practice for brain stroke diagnosis because they are fast and provide real-time information on brain structure, which facilitates differentiation between hemorrhagic and ischemic strokes [3]. However, in actual medical applications, stroke examples are far fewer than normal examples, leading to an imbalanced class distribution. This causes traditional classification models to lean towards normal examples during training, thereby reducing stroke detection accuracy.
He and Garcia [4] showed that traditional classification models tend to overlook minority classes when class distributions are uneven. To solve the class imbalance problem, many studies have proposed sampling-level approaches, including under-sampling and over-sampling. Under-sampling randomly removes examples from the majority class to balance the data distribution. While this approach can reduce class size disparities, it may discard important information [5]. In contrast, over-sampling directly enlarges the training data by generating synthetic minority class samples.
In recent years, generative adversarial networks (GANs) [6] have achieved notable success in deep learning. GANs produce high-quality synthetic images through adversarial training between a generator and a discriminator. Frid-Adar et al. [7] showed that augmenting training data with GAN-generated synthetic images significantly improved the recognition rate of convolutional neural networks on imbalanced liver lesion data. Although GANs have been widely applied to various image generation tasks, Saad et al. [8] indicated that typical GANs often suffer from mode collapse during training. To address this issue, several improved GAN-based architectures have been proposed. Among them, Mirza and Osindero [9] proposed the conditional GAN (CGAN), which incorporates label conditions into the inputs so that the generator can produce images for specific label(s). However, due to the limitations of the classical GAN architecture, CGAN tends to generate examples with limited diversity. To capture complex and deep features, Karras et al. [10] proposed the StyleGAN architecture. This method introduces style control mechanisms into the generator and uses adaptive instance normalization (AdaIN) to adjust image style layer by layer, which increases the diversity of generated images. However, the original StyleGAN still suffers from problems such as feature mixing and localized artifacts.
To address these limitations, Karras et al. [11] proposed StyleGAN2, which redesigns the generator structure and adds path length regularization, thereby effectively enhancing generated image quality. Additionally, to address overfitting in small-sample cases, Karras et al. [12] proposed the StyleGAN2-based adaptive discriminator augmentation (ADA) strategy, which significantly improves generation quality for small datasets. More recently, StyleGAN3 introduced alias-free convolution to avoid aliasing in generated images [13], so that generated images remain consistent under rotation or translation. Furthermore, conditional StyleGAN3 can produce synthetic images for specific label(s).
In this study, we applied StyleGAN-based models to boost imbalanced stroke detection performance by balancing the class distribution. Furthermore, we adopted an ensemble learning predictive model that integrates the predictions of multiple deep learning models to improve overall classification performance. We used precision, recall, and F1-score to assess classification performance on the imbalanced data. Our experimental results demonstrated that, among the three StyleGAN-based approaches, StyleGAN3 achieved the best classification performance on imbalanced stroke data.

3. Methodology

The research method for generating synthetic stroke images is depicted in Figure 1.
Figure 1. Stroke data augmentation using StyleGAN.

3.1. StyleGAN Models for Stroke Image Generation

The developed method comprised two modules: a set of StyleGAN models, namely StyleGAN2 (SG2), StyleGAN3 (SG3), and conditional StyleGAN3 (CSG3), used to extend the training data, and an ensemble learning module used to boost detection accuracy for imbalanced stroke data. The parameter settings of the three StyleGAN models are provided in Table 1.
Table 1. StyleGAN model parameter configuration.

3.2. Ensemble Learning Classification Model for Stroke Detection

After data augmentation, we adopted an ensemble learning classification algorithm to improve the stability and generalization capability of binary classification of brain CT images. To reduce overfitting, the final classification outcome was obtained from three deep learning networks: EfficientNet-B0, Xception, and InceptionV3. In each sub-network, we added batch normalization, ReLU activation, and dropout layers to learn brain stroke characteristics from CT images of size 256 × 256. All sub-networks shared the same training hyperparameters: a batch size of 16, the Adam optimizer with a learning rate of $1 \times 10^{-4}$, and 40 epochs. Cross-entropy (“CrossEntropyLoss”) was used as the loss function. In the ensemble learning stage, we used soft voting to fuse the predictions of all sub-models. For each sub-model $M_i$, the predicted probability that input $x$ belongs to label $k$ is given by Equation (11). We averaged the predictive outputs across all $N$ sub-networks to obtain the average probability for each label, as in Equation (12), and selected the label with the highest average probability as the final prediction, as in Equation (13).
$$p_k^{(i)} = P(y = k \mid x; M_i) \tag{11}$$
$$\bar{p}_k = \frac{1}{N} \sum_{i=1}^{N} p_k^{(i)} \tag{12}$$
$$\hat{y} = \arg\max_k \bar{p}_k \tag{13}$$
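The soft-voting rule in Equations (11)–(13) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `soft_vote` and the example probabilities are hypothetical, and in practice the per-model probabilities would come from the softmax outputs of the three sub-networks.

```python
from typing import List, Sequence, Tuple


def soft_vote(probabilities: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Fuse per-model class probabilities by soft voting.

    probabilities[i][k] plays the role of p_k^(i) in Equation (11):
    the probability that sub-model i assigns to label k for one input.
    """
    n_models = len(probabilities)
    n_labels = len(probabilities[0])
    # Equation (12): average the probability of each label over the N models.
    mean_probs = [
        sum(model_probs[k] for model_probs in probabilities) / n_models
        for k in range(n_labels)
    ]
    # Equation (13): pick the label with the highest average probability.
    predicted = max(range(n_labels), key=lambda k: mean_probs[k])
    return predicted, mean_probs


# Hypothetical outputs of three sub-models for one CT image
# (label 0 = normal, label 1 = stroke); the numbers are made up.
outputs = [
    [0.60, 0.40],
    [0.30, 0.70],
    [0.45, 0.55],
]
label, mean_probs = soft_vote(outputs)
print(label)  # 1
```

Note that the first hypothetical model alone would predict "normal", yet the averaged probabilities favor "stroke"; this is the behavior that distinguishes soft voting from majority voting on hard labels.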

4. Experiment

All experiments were conducted on a computer equipped with an Intel(R) Core(TM) i7-14700KF CPU (27-processor) and an NVIDIA GeForce RTX 4090 GPU (24 GB of memory), running Ubuntu 24.04 LTS. The deep learning models were built using PyTorch. We used the PyTorch container version 25.03-py3 from NVIDIA NGC, which includes PyTorch 2.7.0a0+7c8ec84dab with CUDA (compute unified device architecture) 12.8.1 support.

4.1. Case Description

The brain stroke CT image dataset is publicly available at https://www.kaggle.com/datasets/iashiqul/brain-stroke-prediction-ct-scan-image-dataset (accessed on 28 January 2026). We modified it to establish an imbalanced scenario. To avoid data leakage, we split the original dataset into three disjoint sets: a training set with 1080 normal images and 108 stroke images (i.e., an imbalance ratio of $N_{\text{normal}} / N_{\text{stroke}} = 10$), a validation set with 307 normal images and 130 stroke images, and a testing set with 157 normal images and 64 stroke images.
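As a rough illustration of how such an imbalanced training set could be constructed, the sketch below randomly subsamples the stroke class to reach a target imbalance ratio. The text does not state how the subset was drawn, so random subsampling, the helper name `make_imbalanced_split`, the file names, and the pool size of 400 are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def make_imbalanced_split(
    normal: Sequence[str],
    stroke_pool: Sequence[str],
    imbalance_ratio: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Keep all normal items and subsample stroke items so that
    len(normal) / len(stroke) equals the requested imbalance ratio."""
    if imbalance_ratio <= 0:
        raise ValueError("imbalance_ratio must be positive")
    n_stroke = len(normal) // imbalance_ratio
    if n_stroke > len(stroke_pool):
        raise ValueError("not enough stroke items for this ratio")
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    return list(normal), rng.sample(list(stroke_pool), n_stroke)


# Hypothetical file names; only the counts 1080 and 108 come from the text.
normal_train = [f"normal_{i:04d}.png" for i in range(1080)]
stroke_pool = [f"stroke_{i:04d}.png" for i in range(400)]  # assumed pool size

normal_kept, stroke_kept = make_imbalanced_split(normal_train, stroke_pool, 10)
ir = len(normal_kept) / len(stroke_kept)
# Synthetic stroke images needed IF the goal were a 1:1 training set
# (an assumption; the target ratio is not stated in this section).
synthetic_needed = len(normal_kept) - len(stroke_kept)
print(len(stroke_kept), ir, synthetic_needed)  # 108 10.0 972
```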

4.2. Experiment Setting

In this study, we compared three StyleGAN data augmentation approaches for brain CT image classification under an imbalance ratio of IR = 10: “Geometric augmentation (GEO) + SG2”, “GEO + SG3”, and “GEO + CSG3”. We analyzed their effects on stroke detection. SG2 employed a generative adversarial network to augment minority class stroke images. SG3 used an alias-free convolution mechanism to enhance the stability of image generation. CSG3 further incorporated conditional label variables to control the label of the synthetic images. Defining the stroke class as the positive class and the normal class as the negative class, we counted true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Four evaluation metrics were used to assess stroke detection performance (Equations (14)–(17)).
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{14}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{15}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{16}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{17}$$
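Equations (14)–(17) translate directly into code. The sketch below is illustrative only: the function name is hypothetical and the confusion-matrix counts are invented (they merely sum to the 64 stroke and 157 normal images of the testing set) and are not results from this study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, recall, precision, and F1-score for the positive
    (stroke) class, returning 0.0 whenever a denominator is zero."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0          # Equation (14)
    recall = tp / (tp + fn) if (tp + fn) else 0.0           # Equation (15)
    precision = tp / (tp + fp) if (tp + fp) else 0.0        # Equation (16)
    f1 = (
        2 * precision * recall / (precision + recall)       # Equation (17)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts: 64 stroke images (TP + FN) and 157 normal images (FP + TN).
metrics = classification_metrics(tp=32, fp=2, fn=32, tn=155)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy stays above 0.84 even though only half of the stroke images are detected, which is why recall and F1-score for the minority class are reported alongside accuracy.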

4.3. Results

We averaged the predictive results over 30 independent experiments. The means and standard deviations (SD) are listed in Table 2.
Table 2. Results of average and standard deviation.
Among the three StyleGAN-based approaches, SG3 achieved the best classification accuracy of 0.852 ± 0.01, followed by CSG3 at 0.837 ± 0.01 and SG2 at 0.829 ± 0.02, as shown in Table 2 and Figure 2. For the stroke label, SG3 achieved a recall of 0.496 ± 0.05, compared with 0.414 ± 0.07 for SG2 and 0.441 ± 0.05 for CSG3. The corresponding F1-score improved from 0.580 ± 0.07 with SG2 to 0.659 ± 0.05 with SG3. This indicates that SG3’s alias-free architecture enhanced performance for the difficult-to-classify minority stroke class. For the normal class, precision and recall remained high across all three StyleGAN-based approaches, which shows that the data augmentation strategies did not compromise classification performance on the majority class. As shown in Table 2, SG3 also outperformed the other two approaches in terms of Macro_f1-score (0.782 ± 0.03) and Weighted_f1-score (0.834 ± 0.02), indicating its ability to deal with the imbalanced stroke class.
Figure 2. Classification accuracy using StyleGAN2, StyleGAN3, and conditional StyleGAN3.
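The macro and weighted F1-scores mentioned above differ only in how the per-class F1-scores are averaged. The following sketch shows both, under stated assumptions: the stroke F1-score (0.659) and the testing-set class sizes (157 normal, 64 stroke) are taken from the text, whereas the normal-class F1-score of 0.905 is a hypothetical value chosen for illustration and is not taken from Table 2. Because the reported figures are averages over 30 runs, this single-shot calculation is only indicative.

```python
from typing import Dict


def macro_f1(f1_by_class: Dict[str, float]) -> float:
    """Unweighted mean of per-class F1-scores: every class counts equally."""
    return sum(f1_by_class.values()) / len(f1_by_class)


def weighted_f1(f1_by_class: Dict[str, float], support: Dict[str, int]) -> float:
    """Mean of per-class F1-scores weighted by the number of true
    examples (support) of each class."""
    total = sum(support[name] for name in f1_by_class)
    return sum(f1_by_class[name] * support[name] for name in f1_by_class) / total


f1_by_class = {
    "normal": 0.905,  # HYPOTHETICAL value, for illustration only
    "stroke": 0.659,  # stroke F1-score reported for SG3 in the text
}
support = {"normal": 157, "stroke": 64}  # testing-set class sizes

macro = macro_f1(f1_by_class)
weighted = weighted_f1(f1_by_class, support)
print(f"macro F1: {macro:.3f}")        # macro F1: 0.782
print(f"weighted F1: {weighted:.3f}")  # weighted F1: 0.834
```

The weighted score is higher than the macro score because the majority class, which is easier to classify, contributes more than twice the weight of the stroke class.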

4.4. Comparison for FID Results

We used the Fréchet inception distance (FID) to measure image generation quality, as defined in Equation (18).
$$\text{FID} = \lVert \mu_r - \mu_g \rVert^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right) \tag{18}$$
where $\lVert \cdot \rVert$ represents the Euclidean norm, $\operatorname{Tr}(\cdot)$ denotes the matrix trace (sum of diagonal elements), and $(\Sigma_r \Sigma_g)^{1/2}$ represents the matrix square root of $\Sigma_r \Sigma_g$. $\mu_r$ and $\mu_g$ represent the mean vectors of real and generated images, respectively, and $\Sigma_r$ and $\Sigma_g$ are the corresponding covariance matrices. FID measures the distance between the distributions of generated and real images; lower FID scores indicate that generated images are closer to real images. As shown in Figure 3, CSG3 (the red line) achieved the lowest FID.
Figure 3. FID results for StyleGAN2, StyleGAN3, and conditional StyleGAN3.
SG2 achieved its lowest FID score of 65.91 at 400 k images, SG3 reached an FID score of 101.04 at 1000 k images, and CSG3 achieved the best FID of 63.62 at 1000 k images. These results suggest that incorporating majority class features can help improve the identification of minority class instances in brain structures.
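For readers who want to see how Equation (18) is evaluated numerically, the sketch below computes the Fréchet distance from means and covariances with NumPy. It is a simplified illustration, not the evaluation code used in this study: in practice the statistics are computed from Inception network features of real and generated images, whereas the random arrays here are hypothetical placeholders. The trace of the matrix square root is obtained from the eigenvalues of $\Sigma_r \Sigma_g$, which are real and non-negative when both covariances are positive semi-definite.

```python
import numpy as np


def feature_statistics(features: np.ndarray):
    """Return the mean vector and covariance matrix of a
    (n_samples, n_features) array of image features."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g) -> float:
    """Evaluate Equation (18) from the two sets of statistics."""
    diff = mu_r - mu_g
    # Tr((Sigma_r Sigma_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of Sigma_r Sigma_g; clip tiny negative values caused by
    # floating-point error before taking the square root.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)


# Placeholder "features" (assumption): real pipelines would extract them
# from real and generated CT images with a pretrained Inception network.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(500, 8))
fake_feats = rng.normal(0.5, 1.2, size=(500, 8))

fid = frechet_distance(*feature_statistics(real_feats), *feature_statistics(fake_feats))
print(f"FID on placeholder features: {fid:.3f}")
```

As a sanity check, two distributions with identity covariance whose means differ by the vector (3, 4) give a distance of 25, which is simply the squared Euclidean distance between the means.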

5. Conclusions

To address the class imbalance problem in stroke classification, we compared three StyleGAN-based over-sampling models, StyleGAN2, StyleGAN3, and conditional StyleGAN3, to identify the best generative model among them. Although the SG3 model achieved the best classification performance with the ensemble learning model, it showed signs of overfitting in terms of generation quality and model generalization, which reduced its effectiveness under certain parameter settings. We evaluated classification performance at a high imbalance ratio of 10 between the normal and stroke classes. Experimental results showed that StyleGAN3 outperformed both StyleGAN2 and conditional StyleGAN3 in terms of precision, recall, and F1-score. Future research directions include validating SG3 on other medical datasets and using Keras Tuner to optimize the hyperparameters of the StyleGAN models.

Author Contributions

Software, writing—original draft preparation, J.-S.L.; conceptualization, methodology, writing—review and editing, L.-S.L.; writing—original draft preparation, P.-C.C.; formal analysis, C.-E.X.; writing—original draft preparation, Y.-Y.C.; writing—original draft preparation, C.-H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council grant contract NSTC 113-2221-E-227-004-MY2.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The experimental dataset is openly available at the Kaggle repository.

Acknowledgments

This study was supported by the National Science and Technology Council, Taiwan, and the National Taipei University of Nursing and Health Sciences, Taiwan.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Santos, L.I.; Camargos, M.O.; D’Angelo, M.F.S.V.; Mendes, J.B.; De Medeiros, E.E.C.; Guimarães, A.L.S.; Palhares, R.M. Decision tree and artificial immune systems for stroke prediction in imbalanced data. Expert Syst. Appl. 2022, 191, 116221.
  2. World Health Organization. Stroke, Cerebrovascular Accident. Available online: https://www.emro.who.int/health-topics/stroke-cerebrovascular-accident/index.html (accessed on 13 July 2025).
  3. Chaudhari, A.; Rajadhyaksha, A.; Patil, S.; Pawar, H. CNN and GAN Based Stroke Detection Using CT Scan Images. Int. J. Image Graph. Signal Process. 2025, 17, 94–105.
  4. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
  5. Ding, H.; Chen, L.; Dong, L.; Fu, Z.; Cui, X. Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection. Future Gener. Comput. Syst. 2022, 131, 240–254.
  6. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 139–144.
  7. Frid-Adar, M.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. Synthetic data augmentation using GAN for improved liver lesion classification. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 289–293.
  8. Saad, M.M.; O’Reilly, R.; Rehmani, M.H. A survey on training challenges in generative adversarial networks for biomedical image analysis. Artif. Intell. Rev. 2024, 57, 19.
  9. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
  10. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
  11. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8110–8119.
  12. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst. 2020, 33, 12104–12114.
  13. Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-free generative adversarial networks. Adv. Neural Inf. Process. Syst. 2021, 34, 852–863.
  14. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. StyleGAN2-ADA-PyTorch. Available online: https://github.com/NVlabs/stylegan2-ada-pytorch (accessed on 13 July 2025).
  15. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. StyleGAN3: Official PyTorch Implementation. Available online: https://github.com/NVlabs/stylegan3/blob/main/docs/configs.md (accessed on 13 July 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
