Conditional Tabular Generative Adversarial Network Based Clinical Data Augmentation for Enhanced Predictive Modeling in Chronic Kidney Disease Diagnosis
Abstract
1. Introduction
2. Literature Review
| Study | Best Performing Model | Dataset | Accuracy | Limitations |
|---|---|---|---|---|
| [6] | XGBoost | UCI’s CKD dataset | 99.2% | Black box nature of tree-based models, Limited dataset |
| [8] | Linear Support Vector Machines | UCI’s CKD dataset, TCIA dataset | 98.86% | Small datasets, Limited exploration of different models |
| [9] | Soft Voting | UCI’s CKD dataset | 99% | Limited dataset |
| [10] | Ensemble of CNN-Adamax, LSTM-Adam, LSTM-BLSTM | Taiwan’s NHIRD database | 98% | Strenuous data collection process, Computational complexity because of numerous features |
| [11] | Random Forest, AdaBoost | UCI’s CKD dataset | 100% | Black box nature of tree-based models, Limited dataset, Perfect accuracy suggests overfitting |
| [12] | Random Forest, Decision Tree, Gradient Boost, XGBoost | UCI’s CKD dataset | 100% | Black box nature of tree-based models, Limited dataset, Perfect accuracy suggests overfitting |
| [13] | XGBoost | Tawam Hospital’s medical records | 93.29% | Limited dataset |
| [14] | XGBoost | Tawam Hospital’s medical records | 95% | Limited dataset |
| [22] | Random Forest Classifier | UCI’s CKD dataset | 96% | Black box nature of tree-based models, Limited dataset |
| [25] | Random Forest | UCI’s CKD dataset | 99.16% | Black box nature of tree-based models, Limited dataset |
| [26] | Multiclass Decision Forest | UCI’s CKD dataset | 99.1% | Black box nature of tree-based models, Limited dataset |
| [27] | Decision Tree | UCI’s CKD dataset | 91% | Black box nature of tree-based models, Limited dataset |
| [28] | Support Vector Machines | UCI’s CKD dataset | 99.3% | Limited dataset |
3. Methodology
3.1. Data Processing
3.2. Data Augmentation
- Generator: The generator network, typically composed of multiple neural network layers, is tasked with generating fake samples that resemble the original data as closely as possible. More formally, the generator learns a distribution $p_g$ over the training data $x$. It takes as input random noise $z$ sampled from a prior distribution $p_z(z)$ and learns to map this noise to samples that closely follow the distribution of the training data.
- Discriminator: The discriminator is a binary classifier that distinguishes between real and synthetic data. As training progresses, it becomes better at telling samples from the original dataset apart from those synthesized by the generator. Formally, $D(x)$ represents the probability that $x$ came from the training distribution rather than from $p_g$. $D$ is trained to assign low probabilities to samples from $p_g$ and high probabilities to samples from the real data.
Adversarial Process
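The interplay between the generator and discriminator described above can be illustrated numerically. The sketch below assumes the standard minimax value function of Goodfellow et al. [16], V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The function name `gan_value` and the mini-batch probabilities are hypothetical and chosen only for illustration.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real samples, d_fake: D(G(z)) for generated samples.
    Every probability must lie strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs: a discriminator that separates the two sources well ...
confident = gan_value(d_real=[0.9, 0.8, 0.95], d_fake=[0.1, 0.2, 0.05])
# ... and one that cannot tell them apart (it outputs 0.5 everywhere).
fooled = gan_value(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(f"V when D separates real from fake: {confident:.3f}")
print(f"V when D is fooled:                {fooled:.3f}")
```

When the discriminator outputs 0.5 for every sample, the value equals -2 log 2, which is the equilibrium value of this objective. A discriminator that separates the two sources scores higher.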
- Conditional Generation: CTGAN conditions generation on specific classes and their respective distributions. This enables CTGAN to model relationships across the distributions of the various classes and, consequently, to generate more realistic synthetic data.
- Mode-Specific Normalization: For each continuous-valued column, the modes are estimated along with their probability densities. Then, for a given row, the mode with the highest probability density for that numerical feature is selected, and the value is represented as the concatenation of a one-hot vector indicating the selected mode and a scalar obtained by normalizing the value with respect to that mode. Each row is therefore a concatenation of scalars and one-hot vectors, since categorical features are already one-hot encoded.
- Training-by-sampling: Owing to the imbalances present in the categorical columns, accurately representing the minority classes in the generator’s distribution is a challenge. To tackle this, CTGAN employs a method called training-by-sampling, which draws samples from both the real data and the conditional generator so that the discriminator can effectively estimate the distance between the two distributions.
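Mode-specific normalization, as described in the list above, can be sketched for a single value as follows. This is a minimal illustration under stated assumptions, not the CTGAN implementation. The mixture parameters (`weights`, `means`, `stds`) are hypothetical and assumed to be already fitted. The mode is chosen as the one with the highest weighted density, following the description above, whereas the CTGAN paper [5] samples the mode from these densities. The scaling by four standard deviations follows the CTGAN paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mode_specific_normalize(value, weights, means, stds):
    """Return (scalar, one_hot) for one value of a continuous column."""
    # Weighted density of each mixture mode at this value.
    densities = [w * gaussian_pdf(value, m, s)
                 for w, m, s in zip(weights, means, stds)]
    # Select the most likely mode (the CTGAN paper samples it instead).
    k = max(range(len(densities)), key=densities.__getitem__)
    one_hot = [1 if i == k else 0 for i in range(len(densities))]
    # Normalize the value relative to the selected mode.
    scalar = (value - means[k]) / (4.0 * stds[k])
    return scalar, one_hot

# Hypothetical two-mode mixture for a serum-creatinine-like column (mg/dL).
weights, means, stds = [0.7, 0.3], [1.0, 6.0], [0.4, 2.0]
scalar, one_hot = mode_specific_normalize(1.4, weights, means, stds)
print(scalar, one_hot)
```

A value of 1.4 lies one standard deviation above the first mode, so it is encoded as a scalar of about 0.25 together with the one-hot vector [1, 0].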
3.3. Model Training for Classification
- K-Nearest Neighbors (KNN): It works by calculating the distance between a data point and the labeled training points, and assigning the majority class among its k nearest neighbors to that data point, as in Equation (2).
- Decision Tree Classifier (DTC): The DTC constructs a hierarchical tree structure in which each internal node represents a decision based on a feature and each leaf node assigns a class label. Splits are evaluated with the criteria in Equations (3) and (4).
- $IG(S, A)$ is the information gain achieved by splitting set $S$ based on attribute $A$
- $H(S)$ is the entropy of set $S$ (measure of uncertainty)
- $S_v$ is the subset of $S$ containing data points with value $v$ for attribute $A$
- $Gini(S)$ is the Gini impurity of set $S$
- $p_i$ is the proportion of data points in class $i$ within set $S$
- Gradient Boosting: Gradient boosting is a powerful ensemble method that sequentially builds a series of decision trees to correct the errors of preceding trees. By minimizing a predefined loss function, gradient boosting iteratively adds shallow trees to the ensemble, optimizing performance. Gradient boosting relies on specific loss functions depending on the task (classification or regression) in Equation (5).
- $L(y, f(x))$ is the binary cross-entropy loss function
- $y_i$ is the true value for data point $i$
- $f(x_i)$ is the predicted value for data point $i$ by the ensemble model.
- Support Vector Machine (SVM): Support Vector Machine (SVM) works by finding the hyperplane that best separates classes in the feature space in Equations (6) and (7).
- $f(x)$ is the decision function that predicts the class of input $x$.
- $w$ is the weight vector.
- $x$ is the input vector.
- $b$ is the bias term.
- $\alpha_i$ are the Lagrange multipliers.
- $y_i$ are the class labels.
- $K(x_i, x)$ is the kernel function that computes the inner product of vectors $x_i$ and $x$ in the transformed feature space.
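The quantities defined in the list above can be sketched in a few lines. This is an illustrative sketch that assumes the textbook forms of entropy, Gini impurity, information gain, binary cross-entropy and the linear SVM decision rule. It is not the code used in the study, and the toy labels, attribute values and weights are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """H(S) minus the size-weighted entropy of the subsets S_v."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

def binary_cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy; predictions must lie strictly in (0, 1)."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred))
    return -total / len(y_true)

def linear_svm_decision(w, x, b):
    """Sign of w . x + b, returned as +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy data: class label and hypertension flag for six patients.
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
htn = ["yes", "yes", "yes", "no", "no", "yes"]

ig = information_gain(labels, htn)
bce = binary_cross_entropy([1, 0], [0.9, 0.1])
side = linear_svm_decision(w=[1.0, -1.0], x=[2.0, 1.0], b=-0.5)
print(entropy(labels), gini(labels), round(ig, 3), round(bce, 3), side)
```

With balanced classes the entropy is 1 bit and the Gini impurity is 0.5. Splitting on the hypothetical `htn` attribute removes part of that uncertainty, which is what the information gain measures.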
Accuracy Metrics
- TP = True Positives
- TN = True Negatives
- FN = False Negatives
- FP = False Positives
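The counts above combine into the usual evaluation metrics. The sketch below assumes the standard definitions of accuracy, precision, recall and F1-score, and assumes the confusion matrix layout [[TN, FP], [FN, TP]] (rows are true classes, columns are predicted classes). Which class counts as positive is also an assumption. The matrix is the one reported for Random Forest on the 600-sample dataset.

```python
def classification_metrics(confusion):
    """Compute metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]].

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    (tn, fp), (fn, tp) = confusion
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Random Forest, 600 samples, as reported in the results table.
metrics = classification_metrics([[123, 0], [4, 53]])
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

The computed accuracy of 97.78% agrees with the reported test accuracy. Precision and recall here refer to a single class, so they need not equal the rounded values in the results table if those were averaged over both classes.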
3.4. Exploratory Data Analysis
Synthetic Data Generation
- t-distributed Stochastic Neighbor Embedding (t-SNE): This is an unsupervised, non-linear method for dimensionality reduction. t-SNE is proficient at modeling non-linear relationships and is employed here to condense the high-dimensional feature space into two dimensions for display purposes. This is especially useful for mapping data clusters, offering insight into whether the synthetic data maintains the local structure of the original dataset.
- Principal Component Analysis (PCA): This technique models linear relationships in the data by converting highly correlated features into a reduced set of uncorrelated features, referred to as principal components. PCA, although less effective than t-SNE for intricate, non-linear structures, is proficient at detecting global outliers and verifying whether the synthetic dataset significantly diverges from the primary linear trends of the original data.
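The PCA check described above can be illustrated with a small dependency-free sketch. It estimates only the first principal component by power iteration on the covariance matrix and compares the direction found on real data with the direction found on synthetic data. The function name and both toy datasets are hypothetical. In practice a library implementation such as scikit-learn's `PCA` would normally be used instead.

```python
import math

def first_principal_component(rows, iterations=200):
    """Estimate the leading principal direction by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the component
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical two-feature samples: "real" data and "synthetic" data.
real = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
synthetic = [[1.1, 2.1], [2.0, 3.9], [2.9, 6.2], [4.2, 8.1]]

pc_real = first_principal_component(real)
pc_synth = first_principal_component(synthetic)
# Absolute cosine between the two directions; 1.0 means identical trends.
alignment = abs(sum(a * b for a, b in zip(pc_real, pc_synth)))
print(f"alignment of leading components: {alignment:.4f}")
```

An alignment close to 1 indicates that the synthetic data follows the same dominant linear trend as the real data, which is the property the PCA comparison is meant to verify.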
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Shrestha, N.; Gautam, S.; Mishra, S.R.; Virani, S.S.; Dhungana, R.R. Burden of chronic kidney disease in the general population and high-risk groups in South Asia: A systematic review and meta-analysis. PLoS ONE 2021, 16, e0258494.
2. Ma, X.; Liu, R.; Xi, X.; Zhuo, H.; Gu, Y. Global burden of chronic kidney disease due to diabetes mellitus, 1990–2021, and projections to 2050. Front. Endocrinol. 2025, 16, 1513008.
3. Rubini, L.S.P.; Eswaran, P.; UCI Machine Learning Repository. Chronic Kidney Disease. 2015. Available online: https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease (accessed on 27 November 2025).
4. Waskom, M.L. Seaborn: Statistical data visualization. J. Open-Source Softw. 2021, 6, 3021.
5. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. In Advances in Neural Information Processing Systems 32; Curran Associates: Red Hook, NY, USA, 2019; pp. 7335–7345.
6. Islam, M.A.; Majumder, M.Z.H.; Hussein, M.A. Chronic kidney disease prediction based on machine learning algorithms. J. Pathol. Inform. 2023, 14, 100189.
7. Chowdhury, M.N.H.; Reaz, M.B.I.; Ali, S.H.M.; Crespo, M.L.; Ahmad, S.; Salim, G.M.; Haque, F.; Ordóñez, L.G.G.; Islam, J.; Mahdee, T.M.; et al. Deep learning for early detection of chronic kidney disease stages in diabetes patients: A TabNet approach. Artif. Intell. Med. 2025, 166, 103153.
8. Chittora, P.; Chaurasia, S.; Chakrabarti, P.; Kumawat, G.; Chakrabarti, T.; Leonowicz, Z.; Jasinski, M.; Jasinski, L.; Gono, R.; Jasinska, E.; et al. Prediction of chronic kidney disease - A machine learning perspective. IEEE Access 2021, 9, 17312–17334.
9. Dritsas, E.; Trigka, M. Machine learning techniques for chronic kidney disease risk prediction. Big Data Cogn. Comput. 2022, 6, 98.
10. Saif, D.; Sarhan, A.M.; Elshennawy, N.M. Early prediction of chronic kidney disease based on an ensemble of deep learning models and optimizers. J. Electr. Syst. Inf. Technol. 2024, 11, 17.
11. Halder, R.K.; Uddin, M.N.; Uddin, A.M.; Aryal, S.; Saha, S.; Hossen, R.; Ahmed, S.; Rony, M.A.T.; Akter, F. ML-CKDP: Machine learning-based chronic kidney disease prediction with smart web application. J. Pathol. Inform. 2024, 15, 100371.
12. Hema, K.; Meena, K.; Pandian, R. Analyze the impact of feature selection techniques in the early prediction of CKD. Int. J. Cogn. Comput. Eng. 2024, 5, 66–77.
13. Ghosh, S.K.; Khandoker, A.H. Investigation on explainable machine learning models to predict chronic kidney diseases. Sci. Rep. 2024, 14, 3687.
14. Zheng, J.X.; Li, X.; Zhu, J.; Guan, S.Y.; Zhang, S.X.; Wang, W.M. Interpretable machine learning for predicting chronic kidney disease progression risk. Digit. Health 2024, 10, 20552076231224225.
15. Pedregosa, F.; Michel, V.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; VanderPlas, J.; Cournapeau, D.; Varoquaux, G.; Gramfort, A.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
16. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661.
17. Binu, S.K.; Devi, R. Adaptive synthetic sampling with generative adversarial networks (AS-GAN) for predicting chronic kidney disease on unbalanced data. In Proceedings of the 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC 2024), Tumkuru, India, 4–5 December 2024; pp. 1–6.
18. Cascella, M.; Scarpati, G.; Bignami, E.G.; Cuomo, A.; Vittori, A.; Di Gennaro, P.; Crispo, A.; Coluccia, S. Utilizing an artificial intelligence framework (conditional generative adversarial network) to enhance telemedicine strategies for cancer pain management. J. Anesth. Analg. Crit. Care 2023, 3, 19.
19. Rao, P.K.; Chatterjee, S. TabNet to identify risks in chronic kidney disease using GAN’s synthetic data. In Proceedings of the 2nd International Conference on Technological Advancements in Computational Sciences (ICTACS 2022), Tashkent, Uzbekistan, 10–12 October 2022; pp. 209–215.
20. Tian, G.; Rehman, A.; Xing, H.; Feng, L.; Gulzar, N.; Hussain, A. Automatic intelligent chronic kidney disease detection in Healthcare 5.0. In Proceedings of the IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2023), Exeter, UK, 1–3 November 2023; pp. 2134–2140.
21. Kannan, M.; Umamaheswari, D.; Manimekala, B.; Mary, I.P.S.; Savitha, P.M.; Rozario, J. An enhancement of machine learning model performance in disease prediction with synthetic data generation. Sci. Rep. 2025, 15, 33482.
22. Kaur, C.; Kumar, M.S.; Anjum, A.; Binda, M.B.; Mallu, M.R.; Ansari, M.S.A. Chronic kidney disease prediction using machine learning. J. Adv. Inf. Technol. 2023, 14, 384–391.
23. Kuo, N.I.; Gallego, B.; Jorm, L. Attention-based synthetic data generation for calibration-enhanced survival analysis: A case study for chronic kidney disease using electronic health records. arXiv 2025, arXiv:2503.06096.
24. Liu, K.; Altman, R.B. Conditional generative models for synthetic tabular data: Applications for precision medicine and diverse representations. Annu. Rev. Biomed. Data Sci. 2025, 8, 21–49.
25. Revathy, S.; Bharathi, B.; Jeyanthi, P.; Ramesh, M. Chronic kidney disease prediction using machine learning models. Int. J. Eng. Adv. Technol. 2019, 9, 6364–6367.
26. Gunarathne, W.H.S.D.; Perera, K.D.M.; Kahandawaarachchi, K.A.D.C.P. Performance evaluation on machine learning classification techniques for disease classification and forecasting through data analytics for chronic kidney disease (CKD). In Proceedings of the IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE 2017), Washington, DC, USA, 23–25 October 2017; pp. 291–296.
27. Anantha Padmanaban, K.R.; Parthiban, G. Applying machine learning techniques for predicting the risk of chronic kidney disease. Indian J. Sci. Technol. 2016, 9.
28. Swain, D.; Mehta, U.; Bhatt, A.; Patel, H.; Patel, K.; Mehta, D.; Acharya, B.; Gerogiannis, V.C.; Kanavos, A.; Manika, S. A robust chronic kidney disease classifier using machine learning. Electronics 2023, 12, 212.
29. Irianto, S.Y.; Karnila, S.; Hasibuan, M.S.; Dewi, D.A.; Kurniawan, T.B.; Kurniawan, H. Progressive massive fibrosis detection using generative adversarial networks and long short-term memory. J. Appl. Data Sci. 2025, 6, 2298–2311.
30. Zafar, R.; Rehman, I.U.; Shah, Y.; Ming, L.C.; Goh, K.W.; Suleiman, A.K.; Khan, T.M. Impact of pharmacist-led intervention for reducing drug-related problems and improving quality of life among chronic kidney disease patients: A randomized controlled trial. PLoS ONE 2025, 20, e0317734.
31. Towards Data Science. GANs for Tabular Data. Available online: https://towardsdatascience.com/review-of-gans-for-tabular-data-a30a2199342 (accessed on 31 March 2024).

| Attribute | Meaning | Category | Scale | Missing |
|---|---|---|---|---|
| age | Age | Numerical | Years | 9 |
| bp | Blood pressure | Numerical | mmHg | 12 |
| sg | Specific gravity | Nominal | 1.005 to 1.025 | 47 |
| al | Albumin | Nominal | 0 to 5 | 46 |
| su | Sugar | Nominal | 0 to 5 | 49 |
| rbc | Red blood cells | Nominal | Abnormal, Normal | 152 |
| pc | Pus cell | Nominal | Abnormal, Normal | 65 |
| pcc | Pus cell clumps | Nominal | Not present, Present | 4 |
| ba | Bacteria | Nominal | Not present, Present | 4 |
| bgr | Blood glucose random | Numerical | mg/dL | 44 |
| bu | Blood urea | Numerical | mg/dL | 19 |
| sc | Serum creatinine | Numerical | mg/dL | 17 |
| sod | Sodium | Numerical | mEq/L | 87 |
| pot | Potassium | Numerical | mEq/L | 88 |
| hemo | Hemoglobin | Numerical | gms | 52 |
| pcv | Packed cell volume | Numerical | P cv | 71 |
| wc | White blood cell count | Numerical | cells/cumm | 106 |
| rc | Red blood cell count | Numerical | millions/cmm | 131 |
| htn | Hypertension | Nominal | No, Yes | 2 |
| dm | Diabetes mellitus | Nominal | No, Yes | 2 |
| cad | Coronary artery disease | Nominal | No, Yes | 2 |
| appet | Appetite | Nominal | Poor, Good | 1 |
| pe | Pedal edema | Nominal | No, Yes | 1 |
| ane | Anemia | Nominal | No, Yes | 1 |
| Classification | Class | Nominal | Not CKD, CKD | 0 |
| Feature Type | Imputation Method | Specific Application |
|---|---|---|
| Numerical | Random Sampling | Null values were replaced with a randomly selected existing value from the same feature, preserving the feature’s distributional shape. |
| Categorical | Random Sampling | Applied to columns with a higher proportion of missing values (e.g., red_blood_cells, pus_cells) to avoid skewing the category distribution toward the mode. |
| Categorical | Mode Substitution | Applied to the remaining categorical features, null values were filled with the most frequent category. |
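The two imputation strategies in the table above can be sketched as follows. This is a minimal illustration that uses `None` as the missing-value marker and a fixed random seed. The function names and the toy columns are hypothetical, and the study's own preprocessing code may differ.

```python
import random
from collections import Counter

def impute_random_sample(column, rng):
    """Replace each missing value with a randomly drawn observed value."""
    observed = [v for v in column if v is not None]
    return [v if v is not None else rng.choice(observed) for v in column]

def impute_mode(column):
    """Replace each missing value with the most frequent observed category."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [v if v is not None else mode for v in column]

rng = random.Random(0)  # fixed seed, chosen here only for reproducibility

# Hypothetical columns with missing entries.
hemo = [15.4, None, 11.3, 9.6, None, 12.2]   # numerical feature
htn = ["yes", "no", None, "no", "no"]         # categorical feature

hemo_filled = impute_random_sample(hemo, rng)
htn_filled = impute_mode(htn)
print(hemo_filled)
print(htn_filled)
```

Random sampling draws replacements from the observed values, so frequent values are chosen more often and the column keeps its empirical distribution. Mode substitution is simpler, but it shifts the distribution toward the most frequent category when many values are missing.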
| Model | Dataset | Original Data (%) | Scaled Data (%) |
|---|---|---|---|
| K-Nearest Neighbors | Train | 100 (k = 1) | 99.64 (k = 2) |
| | Test | 79.16 (k = 1) | 99.167 (k = 2) |
| Decision Tree | Train | 100 | 100 |
| | Test | 96.67 | 96.67 |
| Decision Tree with Tuning | Train | 98.21 | 98.21 |
| | Test | 97.5 | 97.5 |
| Random Forest | Train | 100 | 100 |
| | Test | 98.33 | 98.33 |
| Ada Boost | Train | 100 | 98.93 |
| | Test | 100 | 99.167 |
| Gradient Boosting | Train | 100 | 100 |
| | Test | 97.5 | 97.5 |
| Stochastic Gradient Boosting | Train | 100 | 100 |
| | Test | 96.67 | 96.67 |
| XGBoost | Train | 100 | 100 |
| | Test | 99.167 | 96.67 |
| Categorical Boost | Train | 100 | 100 |
| | Test | 97.5 | 97.5 |
| Extra Trees Classifier | Train | 97.86 | 97.86 |
| | Test | 99.167 | 99.167 |
| LGBM | Train | 100 | 100 |
| | Test | 99.167 | 99.167 |
| SVM | Train | 96.07 | 98.57 |
| | Test | 96.67 | 99.167 |
| Synthetic Sample Size | Column Shapes | Column Pair Trends | Overall Score |
|---|---|---|---|
| 200 | 92.09% | 84.23% | 88.16% |
| 500 | 91.82% | 85.3% | 88.56% |
| 800 | 92.67% | 87.56% | 90.11% |
| 1000 | 92.33% | 87.49% | 89.91% |
| 2000 | 92.48% | 88.56% | 90.52% |
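The overall scores in the table above are consistent with a simple average of the two component scores. The sketch below checks that arithmetic for every row. The averaging rule is an assumption inferred from the reported numbers. The column names match the property names used by the SDMetrics quality report, but the tool used to compute them is not stated here.

```python
# (synthetic sample size, column shapes %, column pair trends %, overall %)
reported = [
    (200, 92.09, 84.23, 88.16),
    (500, 91.82, 85.30, 88.56),
    (800, 92.67, 87.56, 90.11),
    (1000, 92.33, 87.49, 89.91),
    (2000, 92.48, 88.56, 90.52),
]

max_gap = 0.0
for size, shapes, trends, overall in reported:
    mean_score = (shapes + trends) / 2  # assumed aggregation rule
    max_gap = max(max_gap, abs(mean_score - overall))
    print(f"{size:>5} samples: mean = {mean_score:.3f}, reported = {overall}")

print(f"largest difference: {max_gap:.3f} percentage points")
```

Every row agrees with the reported overall score to within rounding. This suggests that the overall score weights marginal fidelity (column shapes) and pairwise fidelity (column pair trends) equally.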
| Dataset | Model | Training Accuracy (%) | Test Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Support | Confusion Matrix |
|---|---|---|---|---|---|---|---|---|
| 600 Samples (67% dilution) | KNN | 86.9 | 85.55 | 86 | 86 | 85 | 180 | [[119, 4], [22, 35]] |
| | DTC | 100 | 94.44 | 94 | 94 | 94 | 180 | [[120, 3], [7, 50]] |
| | DTC (tuned) | 99.76 | 95.56 | 96 | 96 | 96 | 180 | [[120, 3], [5, 52]] |
| | Random Forest | 99.76 | 97.78 | 98 | 98 | 98 | 180 | [[123, 0], [4, 53]] |
| | Ada Boost | 100 | 96.67 | 97 | 97 | 97 | 180 | [[123, 0], [6, 51]] |
| | Gradient Boosting | 100 | 97.78 | 98 | 98 | 98 | 180 | [[123, 0], [4, 53]] |
| | Stochastic Gradient Boosting | 100 | 97.78 | 98 | 98 | 98 | 180 | [[123, 0], [4, 53]] |
| | XGBoost | 100 | 97.78 | 98 | 98 | 98 | 180 | [[123, 0], [4, 53]] |
| | Categorical Boosting | 100 | 97.78 | 98 | 98 | 98 | 180 | [[123, 0], [4, 53]] |
| | Extra Trees | 98.57 | 97.22 | 97 | 97 | 97 | 180 | [[122, 1], [4, 53]] |
| | LGBM | 100 | 97.78 | 98 | 98 | 98 | 180 | [[123, 0], [4, 53]] |
| | SVM | 94.52 | 92.22 | 92 | 92 | 92 | 180 | [[114, 9], [5, 52]] |
| 900 Samples (44% dilution) | KNN | 80.32 | 81.11 | 82 | 81 | 81 | 270 | [[154, 29], [22, 65]] |
| | DTC | 100 | 94.07 | 94 | 94 | 94 | 270 | [[176, 7], [9, 78]] |
| | DTC (tuned) | 96.35 | 92.22 | 93 | 92 | 92 | 270 | [[169, 14], [7, 80]] |
| | Random Forest | 100 | 96 | 96 | 96 | 96 | 270 | [[178, 5], [6, 81]] |
| | Ada Boost | 100 | 94.44 | 94 | 94 | 94 | 270 | [[175, 8], [7, 80]] |
| | Gradient Boosting | 100 | 96.29 | 96 | 96 | 96 | 270 | [[176, 7], [3, 84]] |
| | Stochastic Gradient Boosting | 100 | 95.55 | 96 | 96 | 96 | 270 | [[176, 7], [5, 82]] |
| | XGBoost | 100 | 97.04 | 97 | 97 | 97 | 270 | [[179, 4], [4, 83]] |
| | Categorical Boosting | 99.84 | 96.3 | 96 | 96 | 96 | 270 | [[177, 6], [4, 83]] |
| | Extra Trees | 97.14 | 95.2 | 95 | 95 | 95 | 270 | [[173, 10], [3, 84]] |
| | LGBM | 100 | 97.04 | 97 | 97 | 97 | 270 | [[177, 6], [2, 85]] |
| | SVM | 94.44 | 95.55 | 96 | 96 | 96 | 270 | [[173, 10], [2, 85]] |
| 1200 Samples (33% dilution) | KNN | 82.38 | 79.44 | 80 | 79 | 80 | 360 | [[197, 42], [32, 89]] |
| | DTC | 100 | 91.11 | 91 | 91 | 91 | 360 | [[218, 21], [11, 110]] |
| | DTC (tuned) | 94.3 | 93.33 | 93 | 93 | 93 | 360 | [[225, 14], [10, 111]] |
| | Random Forest | 99.76 | 96.94 | 97 | 97 | 97 | 360 | [[235, 4], [7, 114]] |
| | Ada Boost | 100 | 96.11 | 96 | 96 | 96 | 360 | [[230, 9], [5, 116]] |
| | Gradient Boosting | 100 | 96.94 | 97 | 97 | 97 | 360 | [[233, 6], [5, 116]] |
| | Stochastic Gradient Boosting | 100 | 96.67 | 97 | 97 | 97 | 360 | [[233, 6], [6, 115]] |
| | XGBoost | 100 | 96.39 | 96 | 96 | 96 | 360 | [[231, 8], [5, 116]] |
| | Categorical Boosting | 98.81 | 96.11 | 96 | 96 | 96 | 360 | [[231, 8], [6, 115]] |
| | Extra Trees | 95.83 | 95.56 | 96 | 96 | 96 | 360 | [[229, 10], [6, 115]] |
| | LGBM | 100 | 96.39 | 96 | 96 | 96 | 360 | [[230, 9], [4, 117]] |
| | SVM | 95.24 | 94.72 | 95 | 95 | 95 | 360 | [[227, 12], [7, 114]] |
| 1400 Samples (29% dilution) | KNN | 79.59 | 80.47 | 80 | 80 | 80 | 420 | [[273, 26], [56, 65]] |
| | DTC | 100 | 93.81 | 94 | 94 | 94 | 420 | [[284, 15], [11, 110]] |
| | DTC (tuned) | 96.22 | 94.29 | 94 | 94 | 94 | 420 | [[289, 10], [14, 107]] |
| | Random Forest | 99.89 | 95.48 | 95 | 95 | 95 | 420 | [[290, 9], [10, 111]] |
| | Ada Boost | 99.8 | 94.76 | 95 | 95 | 95 | 420 | [[286, 13], [9, 112]] |
| | Gradient Boosting | 100 | 95.47 | 95 | 95 | 95 | 420 | [[289, 10], [9, 112]] |
| | Stochastic Gradient Boosting | 100 | 95.24 | 95 | 95 | 95 | 420 | [[288, 11], [9, 112]] |
| | XGBoost | 100 | 95 | 95 | 95 | 95 | 420 | [[287, 12], [9, 112]] |
| | Categorical Boosting | 98.88 | 95.48 | 96 | 95 | 96 | 420 | [[287, 12], [7, 114]] |
| | Extra Trees | 96.22 | 94.52 | 95 | 95 | 95 | 420 | [[285, 14], [9, 112]] |
| | LGBM | 100 | 95.95 | 96 | 96 | 96 | 420 | [[290, 9], [8, 113]] |
| | SVM | 93.88 | 94.52 | 95 | 95 | 95 | 420 | [[286, 13], [10, 111]] |
| 2400 Samples (17% dilution) | KNN | 79.94 | 77.64 | 77 | 78 | 77 | 720 | [[449, 55], [106, 110]] |
| | DTC | 100 | 92.08 | 92 | 92 | 92 | 720 | [[475, 29], [28, 188]] |
| | DTC (tuned) | 94.64 | 93.47 | 94 | 93 | 93 | 720 | [[478, 26], [21, 195]] |
| | Random Forest | 98.57 | 95.42 | 95 | 95 | 95 | 720 | [[486, 18], [15, 201]] |
| | Ada Boost | 95.24 | 95.69 | 96 | 96 | 96 | 720 | [[485, 19], [12, 204]] |
| | Gradient Boosting | 97.26 | 95.14 | 95 | 95 | 95 | 720 | [[483, 21], [14, 202]] |
| | Stochastic Gradient Boosting | 100 | 95.27 | 95 | 95 | 95 | 720 | [[485, 19], [15, 201]] |
| | XGBoost | 100 | 94.72 | 95 | 95 | 95 | 720 | [[481, 23], [15, 201]] |
| | Categorical Boosting | 96.96 | 94.86 | 95 | 95 | 95 | 720 | [[482, 22], [15, 201]] |
| | Extra Trees | 94.17 | 95 | 95 | 95 | 95 | 720 | [[481, 23], [13, 203]] |
| | LGBM | 100 | 94.58 | 95 | 95 | 95 | 720 | [[487, 17], [22, 194]] |
| | SVM | 93.45 | 93.9 | 94 | 94 | 94 | 720 | [[479, 25], [19, 197]] |
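The dataset labels and Support values in the table above follow a simple pattern that can be checked with a few lines of arithmetic. The sketch assumes that the original dataset has 400 records (the size of the UCI CKD dataset) and that 30% of each combined dataset is held out for testing. Both values are inferred from the reported numbers and are not stated in the table itself.

```python
ORIGINAL_ROWS = 400   # assumption: size of the UCI CKD dataset
TEST_FRACTION = 0.30  # assumption: inferred from the Support column

# total samples -> (percentage in the dataset label, reported support)
reported = {
    600: (67, 180),
    900: (44, 270),
    1200: (33, 360),
    1400: (29, 420),
    2400: (17, 720),
}

checks = {}
for total, (label_pct, support) in reported.items():
    share_original = round(100 * ORIGINAL_ROWS / total)
    test_rows = round(TEST_FRACTION * total)
    checks[total] = (share_original == label_pct, test_rows == support)
    print(f"{total}: original share = {share_original}%, test rows = {test_rows}")
```

Under these assumptions, the percentage in each dataset label equals the share of original records in the combined dataset, and the Support column equals the size of the 30% test split.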
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Randhawa, P.; Jasthi, V.N.; Piyush, K.; Kaushik, G.K.; Batamulay, M.; Prasad, S.N.; Rawat, M.; Veernapu, K.; Naik, N. Conditional Tabular Generative Adversarial Network Based Clinical Data Augmentation for Enhanced Predictive Modeling in Chronic Kidney Disease Diagnosis. BioMedInformatics 2026, 6, 6. https://doi.org/10.3390/biomedinformatics6010006

