Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net
Abstract
1. Introduction
- Hybrid Architecture Design: This is the first framework to systematically combine hierarchical transformer encoders with gradient-boosting ensembles for multi-stage CKD prediction. Unlike prior work that uses either deep learning OR ensemble methods, our adaptive fusion mechanism (Equations (14) and (15)) dynamically weights predictions based on learned gating, enabling the model to leverage the transformer’s capacity for complex pattern recognition and gradient boosting’s interpretable decision boundaries simultaneously.
- Confidence-Aware Ensemble: While Monte Carlo dropout for uncertainty quantification is standard, our integration into the ensemble prediction layer creates a confidence-weighted prediction system where uncertainty estimates directly inform clinical decision-making. The confidence score computation (Equation (18)) enables automated triage: high-confidence predictions proceed to standard care pathways while low-confidence cases trigger expert review.
- Multi-Resolution Feature Fusion: The preprocessing pipeline (Equations (1)–(3)) combines multiple imputation strategies (KNN, MICE, median), selected per feature according to feature-target correlation, rather than applying uniform imputation. This adaptive approach prevents information loss while maintaining statistical validity.
- End-to-End Clinical Integration: Unlike prior systems that bolt on interpretability as a post-hoc analysis, SHAP computation (Equation (23)) is embedded within the forward pass, enabling real-time feature importance generation during inference, which is critical for point-of-care decision support.
- Novel Hybrid Architecture: We propose DeepCKD-Net, the first framework to combine hierarchical transformer encoders with gradient-boosting ensembles specifically designed for multi-stage CKD prediction, achieving superior performance through synergistic integration of complementary learning paradigms.
- Adaptive Feature Fusion: We introduce an innovative feature fusion mechanism that dynamically combines temporal patterns from sequential biomarker measurements with cross-sectional clinical data, enabling comprehensive patient state representation.
- Confidence-Aware Ensemble: We develop a novel ensemble strategy that incorporates uncertainty quantification to weight individual model predictions, significantly improving reliability in borderline cases where traditional methods often fail.
- Clinical Interpretability: We integrate SHAP-based explanation modules directly into the architecture, providing real-time feature importance analysis that enables clinicians to understand and validate model decisions.
- Comprehensive Evaluation: We conduct extensive experiments on the UCI CKD dataset and validate our approach against 15 state-of-the-art methods, demonstrating consistent superiority across multiple evaluation metrics.
- Subtle Early-Stage Biomarker Changes: Stage 1–2 CKD presents with minimal laboratory abnormalities: serum creatinine may be only 0.1–0.3 mg/dL above the normal range, blood urea marginally elevated, and eGFR between 60 and 89 mL/min/1.73 m². These subtle deviations, often within inter-laboratory measurement variability, are easily missed by threshold-based classification systems yet represent the critical window for intervention before irreversible kidney damage occurs.
- Complex Feature Interdependencies: CKD biomarkers do not operate independently. For example, elevated serum creatinine combined with low hemoglobin suggests advanced disease with anemia of chronic kidney disease, while isolated creatinine elevation might reflect acute dehydration. Similarly, proteinuria severity depends on blood pressure control, diabetes status, and medication use. These multi-way interactions create a high-dimensional feature space where linear models and simple decision trees fail to capture the complex decision boundaries.
- Heterogeneous Disease Presentation: CKD manifests differently across etiologies (diabetic nephropathy, hypertensive nephrosclerosis, glomerulonephritis, polycystic kidney disease). Each subtype exhibits distinct biomarker patterns: diabetic CKD shows early proteinuria with preserved creatinine clearance, while hypertensive CKD shows gradual creatinine elevation with minimal proteinuria. Generic prediction models that do not account for this heterogeneity perform poorly on specific disease subtypes.
- Biomarker Temporal Dynamics: Static measurements fail to capture disease trajectory. A patient with stable creatinine of 1.4 mg/dL for 5 years has a vastly different prognosis than one whose creatinine rose from 1.0 to 1.4 mg/dL over 6 months. Current single-timepoint datasets preclude temporal pattern modeling, limiting prediction to cross-sectional analysis.
- Missing Data Challenges: Clinical datasets exhibit systematic missingness patterns—expensive tests (cystatin C, urine microscopy) are ordered selectively for high-risk patients, creating informative missingness that biases simple imputation methods. Approximately 5–15% of biomarker values are missing in real-world EHR data.
2. Related Work
2.1. Classical Machine Learning Approaches for CKD Detection
2.2. Deep Learning Innovations in Medical Diagnostics
2.3. Hybrid and Ensemble Learning Strategies
2.4. Interpretability and Clinical Integration
2.5. Research Gap and Motivation
3. Proposed Methodology
3.1. System Overview
3.2. Feature Engineering and Preprocessing
- Dataset Split (performed first, before any preprocessing):
- Training set: 70% (280 patients);
- Validation set: 15% (60 patients);
- Test set: 15% (60 patients);
- Stratified sampling maintained class distribution: CKD positive (62.5%), CKD negative (37.5%) in all splits.
- Normalization (Equation (1)):
- Mean μ(X) and standard deviation σ(X) computed exclusively from the training set;
- Same μ and σ values applied to validation and test sets;
- ε = 1 × 10⁻⁸ added for numerical stability.
- Target Encoding (Equation (2)):
- For categorical features (e.g., diabetes, hypertension, albumin levels);
- Encoding probabilities P(y = 1|x_i) and P(y = 1) computed from training set only;
- Smoothing parameter α with m = 10 prevents overfitting to rare categories;
- Encoded values applied consistently to validation/test sets.
- Missing Value Imputation (Equation (3)):
- Feature-target correlation ρ(x_j) computed from training set;
- For high correlation (ρ > 0.7): KNN imputation with k = 5 neighbors from training set only;
- For medium correlation (0.3 ≤ ρ ≤ 0.7): MICE algorithm fit on training set, applied to validation/test;
- For low correlation (ρ < 0.3): Median value from training set used for all splits;
- Validation set: 4.8% missing values imputed;
- Test set: 4.8% missing values imputed;
- No information from validation or test sets used during imputation.
- Feature Engineering:
- No new features created from validation/test data;
- All transformations are deterministic given the training set parameters.
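The leakage-free pattern described above (fit every statistic on the training split, then reuse it unchanged on validation and test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the population standard deviation is assumed, the smoothing formula is the common m-estimate form (Equation (2) is not reproduced in this text), and the correlation thresholds are assumed to apply to the magnitude of ρ.

```python
import statistics

EPS = 1e-8  # numerical-stability constant from the normalization step


def fit_standardizer(train_values):
    """Compute mean and standard deviation from observed training values only."""
    observed = [v for v in train_values if v is not None]
    # Assumption: population standard deviation; the source does not say which.
    return statistics.fmean(observed), statistics.pstdev(observed)


def standardize(values, mu, sigma):
    """Apply training-set statistics to any split; missing entries stay None."""
    return [None if v is None else (v - mu) / (sigma + EPS) for v in values]


def fit_target_encoder(train_categories, train_labels, m=10):
    """Smoothed target encoding fitted on the training split.

    Assumption: m-estimate smoothing, (positives + m * prior) / (count + m).
    """
    prior = sum(train_labels) / len(train_labels)
    counts, positives = {}, {}
    for cat, y in zip(train_categories, train_labels):
        counts[cat] = counts.get(cat, 0) + 1
        positives[cat] = positives.get(cat, 0) + y
    encoding = {c: (positives[c] + m * prior) / (counts[c] + m) for c in counts}
    return encoding, prior


def encode(category, encoding, prior):
    """Encode a value from any split; unseen categories fall back to the prior."""
    return encoding.get(category, prior)


def choose_imputer(rho):
    """Pick an imputation strategy from the feature-target correlation."""
    strength = abs(rho)  # assumption: thresholds apply to the magnitude of rho
    if strength > 0.7:
        return "knn"
    if strength >= 0.3:
        return "mice"
    return "median"
```

In use, `fit_standardizer` and `fit_target_encoder` would be called once on the 280 training patients, and only `standardize`, `encode`, and the chosen imputer would touch the validation and test splits.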
3.3. Hierarchical Transformer Encoder
3.4. Gradient-Boosting Ensemble Module
3.5. Adaptive Feature Fusion
3.6. Confidence-Aware Prediction
| Algorithm 1: DeepCKD-Net Training Procedure |
| Input: Dataset, Hyperparameters. Output: Trained DeepCKD-Net model. Steps: (1) Initialize transformer encoder and boosting ensemble; (2) Preprocess dataset; (3) Split data; (4) Update parameters; (5) Evaluate on the validation set; (6) Apply early stopping; (7) Fine-tune with confidence weighting; (8) Return trained model. |
3.7. Loss Function Formulation
3.8. Interpretability Module
| Algorithm 2: Confidence-Aware Ensemble Prediction |
| Input: Test sample, Trained model, MC iterations. Output: Prediction, Confidence, Explanation. Steps: (1) Initialize prediction accumulator; (2) Apply random dropout mask; (3) Compute entropy. |
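The Monte Carlo dropout loop of Algorithm 2 can be illustrated with a toy example. The sketch below is not DeepCKD-Net: it replaces the trained model with a hypothetical single-layer logistic model (`weights`, `bias`), applies dropout to the inputs, and assumes that confidence is defined as one minus the binary predictive entropy in bits, since Equation (18) is not reproduced in this text. The explanation output (SHAP) is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(x, weights, bias, n_iter=50, drop_rate=0.3, seed=0):
    """Average n_iter stochastic forward passes of a toy logistic model."""
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    probs = []  # prediction accumulator
    for _ in range(n_iter):
        # Fresh random dropout mask per pass; inverted-dropout scaling keeps
        # the expected input unchanged.
        masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
        z = sum(w * xi for w, xi in zip(weights, masked)) + bias
        probs.append(sigmoid(z))
    p_mean = sum(probs) / n_iter
    # Binary predictive entropy in bits, so it lies in [0, 1].
    entropy = 0.0
    for p in (p_mean, 1.0 - p_mean):
        if p > 0.0:
            entropy -= p * math.log2(p)
    confidence = 1.0 - entropy  # assumption: confidence = 1 - entropy
    prediction = int(p_mean >= 0.5)
    return prediction, confidence, p_mean
```

A mean probability near 0.5 yields a confidence near 0, while a mean probability near 0 or 1 yields a confidence near 1, which is the behavior a confidence-based triage rule relies on.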
3.9. Optimization Strategy
- Learning Rate: Log-uniform distribution [1 × 10⁻⁵, 1 × 10⁻²];
- Batch Size: Categorical {16, 32, 64, 128};
- Transformer Layers: Integer uniform [2, 12];
- Attention Heads: Categorical {4, 8, 16} (constrained: must divide hidden_dim evenly);
- Hidden Dimension (d_model): Categorical {256, 512, 768, 1024};
- Feed-Forward Dimension (d_ff): Categorical {1024, 2048, 4096} (constrained: d_ff ≥ 2 × d_model);
- Dropout Rate: Uniform [0.1, 0.5];
- Number of Boosting Trees (M): Integer uniform [50, 200];
- Tree Depth: Integer uniform [3, 10];
- Boosting Learning Rate (γ): Log-uniform [0.01, 0.3];
- Weight Decay (L2 regularization λ): Log-uniform [1 × 10⁻⁵, 1 × 10⁻³];
- Focal Loss γ parameter: Uniform [1.0, 3.0];
- Focal Loss α (class balance): Uniform [0.25, 0.75].
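To make the search space concrete, the sketch below draws one configuration from the ranges listed above and enforces the two stated constraints by rejection. It only samples at random; it does not implement Bayesian optimization, and the dictionary keys are hypothetical names chosen for illustration.

```python
import math
import random


def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one configuration, resampling until both constraints hold."""
    while True:
        cfg = {
            "learning_rate": log_uniform(rng, 1e-5, 1e-2),
            "batch_size": rng.choice([16, 32, 64, 128]),
            "num_layers": rng.randint(2, 12),   # inclusive bounds
            "num_heads": rng.choice([4, 8, 16]),
            "d_model": rng.choice([256, 512, 768, 1024]),
            "d_ff": rng.choice([1024, 2048, 4096]),
            "dropout": rng.uniform(0.1, 0.5),
            "num_trees": rng.randint(50, 200),
            "tree_depth": rng.randint(3, 10),
            "boost_lr": log_uniform(rng, 0.01, 0.3),
            "weight_decay": log_uniform(rng, 1e-5, 1e-3),
            "focal_gamma": rng.uniform(1.0, 3.0),
            "focal_alpha": rng.uniform(0.25, 0.75),
        }
        heads_ok = cfg["d_model"] % cfg["num_heads"] == 0  # heads divide d_model
        ff_ok = cfg["d_ff"] >= 2 * cfg["d_model"]          # d_ff >= 2 * d_model
        if heads_ok and ff_ok:
            return cfg
```

Rejection sampling is the simplest way to honor the constraints; an optimizer that supports conditional spaces could instead restrict the `d_ff` choices after `d_model` is drawn.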
- Primary metric: F1-score on validation set (to balance precision and recall);
- Early stopping: 10 epochs without improvement in validation F1;
- Each trial: trained for a maximum of 200 epochs.
- Best F1-score achieved: 98.6% (at iteration 67);
- Final 20 iterations showed <0.3% variation, indicating convergence;
- Total computational time: 18.5 h on NVIDIA A100 GPU.
- Learning Rate: 0.001;
- Batch Size: 32;
- Transformer Layers: 6;
- Attention Heads: 8;
- Hidden Dimension: 512;
- Dropout Rate: 0.3;
- Boosting Trees: 100;
- Tree Depth: 6;
- Weight Decay: 0.0001;
- Focal Loss γ: 2.0.
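For reference, the role of the focal loss γ parameter can be illustrated with the standard binary focal loss, FL = −α_t (1 − p_t)^γ log(p_t). The sketch below assumes this standard form, which may differ in detail from the paper's loss formulation; the default `alpha=0.5` is a placeholder, since the selected α is not listed among the final values.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.5, eps=1e-8):
    """Binary focal loss for one sample.

    p: predicted probability of the positive (CKD) class; y: label in {0, 1}.
    alpha=0.5 is an illustrative placeholder, not a value from the paper.
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = max(p_t, eps)                       # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 2.0, a well-classified sample (p_t = 0.9) is down-weighted by a factor of (0.1)² relative to weighted cross-entropy, so training concentrates on harder, borderline cases.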
4. Results and Evaluation
4.1. Experimental Setup
- Same Dataset Split: All models were trained on an identical 70–15–15% train-validation-test split with a fixed random seed (42), ensuring every method was evaluated on the same 60 test patients.
- Identical Preprocessing: All baseline models used the same preprocessing pipeline (normalization, imputation, encoding) with statistics computed from the training set only.
- Libraries and Versions:
- SVM-RBF: scikit-learn 1.3.0, RBF kernel with C = 1.0, γ = ‘scale’;
- Random Forest: scikit-learn 1.3.0, n_estimators = 100, max_depth = None;
- XGBoost: xgboost 1.7.3, default parameters optimized via grid search;
- 1D-CNN: PyTorch 2.0, architecture: Conv1D(64)→ReLU→MaxPool→Conv1D(128)→ReLU→MaxPool→FC(256)→FC(2);
- LSTM: PyTorch 2.0, 2-layer bidirectional LSTM with 128 hidden units;
- TabTransformer: PyTorch 2.0, implemented following Huang et al. (2020) [35] with six layers and eight heads;
- CNN-XGBoost: Hybrid model combining 1D-CNN feature extraction with XGBoost classifier;
- Ensemble-Net: Voting ensemble combining RF, XGBoost, and SVM with soft voting;
- Meta-Learn: Stacking ensemble with XGBoost as meta-learner;
- Transfer-DL: Pre-trained transformer fine-tuned on CKD data;
- Attention-Net: Multi-head self-attention network with four layers;
- Graph-CKD: Graph neural network with patient similarity graph construction;
- Multi-Modal: Fusion of multiple feature representations;
- Knowledge-DL: Knowledge-distillation-based approach.
- Hyperparameter Optimization: Each baseline underwent separate Bayesian optimization (50 iterations) on the validation set to ensure fair comparison—no baseline used suboptimal default parameters.
- Evaluation Protocol: All models evaluated using identical metrics (accuracy, precision, recall, F1, AUC, specificity, MCC) on the same test set; tenfold cross-validation performed with identical fold assignments.
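The threshold-based metrics in the evaluation protocol all follow from the four confusion-matrix counts. The sketch below uses the standard definitions; AUC is omitted because it requires ranked scores rather than counts. The function name and the example counts in the usage note are hypothetical and are not results from the paper.

```python
import math


def _ratio(num, den):
    """Safe division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)              # sensitivity
    specificity = _ratio(tn, tn + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=37, fp=1, tn=22, fn=0)` evaluates a hypothetical 60-patient test set with a single false positive.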
4.2. Performance Metrics
- Primary: Validation F1-score | Secondary: AUC (>0.98), decreasing loss;
- Early stopping: 10 epochs without F1 improvement;
- Final model: Highest validation F1 checkpoint (epoch 156).
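The stopping rule above (track the best validation F1, stop after 10 epochs without improvement, keep the best checkpoint) can be sketched with a small helper. The class name and interface are hypothetical; a real training loop would also save model weights whenever `best_epoch` changes.

```python
class EarlyStopping:
    """Track the best validation F1 and signal when patience is exhausted."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_f1):
        """Record one epoch; return True when training should stop."""
        if val_f1 > self.best_score:
            # New best checkpoint: reset the patience counter.
            self.best_score = val_f1
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation F1, and the checkpoint from `best_epoch` would be restored as the final model.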
- Phase 1 (1–20): Rapid learning—loss 0.82→0.15 (82% reduction), gradient norms 2.3→0.8;
- Phase 2 (21–100): Refinement—loss 0.15→0.06 (60% reduction), LR decays to 0.0008, gradients 0.8→0.3;
- Phase 3 (101–156): Fine-tuning—loss 0.06→0.048 (20% reduction), LR 0.0006, train-val gap 0.004;
- Phase 4 (157–200): Post-optimum—validation loss increases 0.052→0.055, early stopping prevents updates.
4.3. Classification Performance
- GPU: NVIDIA A100 40 GB;
- CPU: AMD EPYC 7742 (64 cores);
- RAM: 512 GB DDR4;
- Storage: NVMe SSD;
- Framework: PyTorch 2.0.0, CUDA 11.8.
- Inference Efficiency: 16.8 ms per prediction enables real-time clinical use. A typical outpatient clinic processing 50 patients/day would complete all CKD screenings in <1 s of cumulative computation time.
- Hardware Scalability: While training requires GPU acceleration, inference can run efficiently on CPU-only systems. Deployment tests on standard clinical workstations (Intel i7, 16 GB RAM) achieved 45 ms inference time, which is still suitable for point-of-care use.
- Memory Footprint: A 2.1 GB model size allows deployment on edge devices, including tablets and smartphones, enabling remote/rural telemedicine applications.
- Batch Processing: Healthcare systems can process entire patient databases overnight. Testing showed DeepCKD-Net processes 10,000 patient records in 2.8 min (batch size 256), making population-level screening computationally feasible.
- Training Cost-Benefit: While the 45.3 min training time exceeds that of simpler models, this is a one-time cost. Model updates require only periodic retraining (quarterly recommended for data drift monitoring), making the performance gain (a 7.4 percentage-point accuracy improvement over XGBoost, 98.7% vs. 91.3%) cost-effective for clinical deployment.
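The throughput figures above can be cross-checked with simple arithmetic: 50 predictions at 16.8 ms each take about 0.84 s, and 10,000 records in 2.8 min correspond to 16.8 ms per record. The helper names below are hypothetical; the inputs are the values reported in the list.

```python
def cumulative_seconds(n_predictions, ms_per_prediction):
    """Total compute time in seconds for sequential single predictions."""
    return n_predictions * ms_per_prediction / 1000.0


def ms_per_record(n_records, total_minutes):
    """Average per-record latency in milliseconds for a batch job."""
    return total_minutes * 60.0 * 1000.0 / n_records


def annual_training_minutes(minutes_per_run, runs_per_year=4):
    """Yearly training cost under quarterly retraining."""
    return minutes_per_run * runs_per_year


daily_clinic_s = cumulative_seconds(50, 16.8)    # about 0.84 s per clinic day
batch_latency_ms = ms_per_record(10_000, 2.8)    # about 16.8 ms per record
yearly_training_min = annual_training_minutes(45.3)  # about 181.2 min per year
```

The batch figure matches the single-prediction latency, so the reported batch run reflects the same per-record cost rather than an additional batching speed-up.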
4.4. ROC Analysis and Model Calibration
4.5. Feature Importance Analysis
4.6. Computational Efficiency
4.7. Cross-Validation Results
4.8. Robustness Analysis
5. Discussion
- Rigorous Cross-Validation: The tenfold stratified cross-validation with a mean accuracy of 98.5% (±0.8% standard deviation) demonstrates consistent performance across data partitions, suggesting the model has learned generalizable patterns rather than memorizing training examples.
- Conservative Train-Test Split: We employed a 70–15–15% split for training, validation, and test sets with stratified sampling to maintain class distribution, ensuring the test set remains completely independent.
- Regularization Techniques: L2 regularization (λ = 0.0001), dropout (rate = 0.3), and early stopping prevented overfitting, as evidenced by a minimal gap between training (99.1%) and validation (98.7%) accuracy.
5.1. Real-World Clinical Deployment
5.2. Regulatory Compliance and Explainability Requirements
- Feature Attribution via SHAP: The integrated SHAP module provides patient-specific feature importance scores that align with established clinical knowledge. For example, serum creatinine consistently emerges as the top predictor (SHAP value 0.342), matching its role as the gold standard in nephrology practice.
- Confidence Quantification: The Monte Carlo dropout mechanism generates uncertainty estimates that enable risk-stratified deployment. In regulatory contexts, predictions with confidence < 85% can be automatically flagged for mandatory expert review, creating a human-in-the-loop system that satisfies FDA guidance on clinical decision support software.
- Model Transparency: Unlike pure black-box models, the hybrid architecture allows clinicians to separately examine transformer-based predictions (capturing complex interactions) and gradient-boosting outputs (providing interpretable decision paths), facilitating regulatory audits and validation studies.
- High confidence (>95%): Automated screening results reported directly
- Medium confidence (85–95%): Results flagged for physician review
- Low confidence (<85%): Mandatory specialist consultation triggered

This tiered approach optimizes clinical workflow efficiency while maintaining safety through appropriate human oversight.
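The three confidence tiers map directly onto a small routing function. The sketch below is illustrative only: the function name and return labels are hypothetical, and the boundary handling (exactly 85% and exactly 95% both fall in the medium tier) is an assumption read from the ranges as listed.

```python
def triage(confidence):
    """Route a prediction by its confidence score, expressed in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence > 0.95:
        return "automated_report"         # high confidence: report directly
    if confidence >= 0.85:
        return "physician_review"         # medium confidence: flag for review
    return "specialist_consultation"      # low confidence: mandatory consult
```

For example, `triage(0.97)` returns `"automated_report"`, while `triage(0.80)` returns `"specialist_consultation"`.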
5.3. Limitations
- Temporal scope: Single time-point measurements preclude longitudinal disease progression modeling
- Geographic limitation: Unknown patient origin prevents geographic generalizability assessment
- Missing demographics: Lack of race/ethnicity data prevents bias analysis
- Class imbalance: 62.5% CKD prevalence exceeds real-world screening populations (typically 10–15%)
- Static predictions: Current model cannot incorporate temporal trends
- Biomarker-only focus: Ignores imaging, genetic, and social determinants
- Computational requirements: Training demands GPU resources unavailable in some settings
- Regulatory approval: Requires prospective validation trials for FDA/CE marking
- EHR integration: Needs custom interfaces for different healthcare systems
- Maintenance: Requires periodic retraining to address data drift
5.4. Future Research Directions
5.5. Limitations and Social Determinants Considerations
- Socioeconomic Status: Income level, education, occupation, and health insurance status affect access to preventive care and treatment adherence.
- Environmental Exposures: Geographic location-specific factors such as water quality, air pollution, industrial exposures, and agricultural pesticide use can contribute to kidney damage, particularly in rural and agricultural communities.
- Healthcare Access Barriers: Distance to healthcare facilities, availability of nephrology specialists, and cultural/linguistic barriers to care utilization are not captured in biomarker-only models.
- Population Representativeness: The UCI dataset lacks detailed demographic metadata regarding racial/ethnic diversity, geographic distribution, and representation of vulnerable populations including Indigenous communities, rural populations, and socioeconomically disadvantaged groups.
Biomarker Availability and Temporal Variability Challenges
- Tier 1 (Basic): 10 features available in most primary care settings → 95.3% accuracy
- Tier 2 (Standard): 18 features available in district hospitals → 97.2% accuracy
- Tier 3 (Complete): All 26 features in tertiary centers → 98.7% accuracy
- Hydration Effects: Dehydration can temporarily elevate serum creatinine and blood urea nitrogen without reflecting true kidney function decline. Conversely, overhydration dilutes biomarker concentrations, potentially masking disease.
- Acute Illness: Infections, especially sepsis, can cause acute kidney injury superimposed on chronic disease, substantially altering biomarker profiles.
- Medication Effects: NSAIDs, ACE inhibitors, and contrast agents can acutely reduce GFR; diuretics affect electrolyte and volume status; antibiotics may cause interstitial nephritis.
- Dietary and Exercise Factors: High-protein diets elevate urea; intense exercise causes rhabdomyolysis affecting creatinine; fasting alters glucose and electrolyte patterns.
- Temporal context requirements: Flagging predictions as “potentially unreliable” if blood samples were drawn during documented acute illness, recent hospitalization, or within 48 h of nephrotoxic medication administration
- Longitudinal validation: Requiring confirmatory biomarker measurements 2–4 weeks apart for positive CKD predictions in asymptomatic patients
- Clinical integration: Incorporating physician review of patient history, physical examination findings, and clinical context before finalizing diagnosis
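The temporal-context safeguard can be sketched as a pre-check that attaches reliability flags to a prediction. This is a hypothetical illustration: the function name, argument names, and flag labels are assumptions, and acute illness and recent hospitalization are passed in as booleans because the text gives no time window for them. Only the 48 h nephrotoxic-medication window comes from the list above.

```python
from datetime import datetime, timedelta

NEPHROTOXIC_WINDOW = timedelta(hours=48)  # window stated in the safeguard list


def reliability_flags(sample_time, last_nephrotoxic_dose=None,
                      acute_illness=False, recent_hospitalization=False):
    """Return reasons a prediction should be marked potentially unreliable."""
    flags = []
    if acute_illness:
        flags.append("acute_illness")
    if recent_hospitalization:
        flags.append("recent_hospitalization")
    if last_nephrotoxic_dose is not None:
        elapsed = sample_time - last_nephrotoxic_dose
        # Flag only samples drawn after the dose and within the 48 h window.
        if timedelta(0) <= elapsed <= NEPHROTOXIC_WINDOW:
            flags.append("nephrotoxic_medication_within_48h")
    return flags
```

An empty list means no temporal-context concern was recorded; any non-empty list would mark the prediction as "potentially unreliable" pending confirmatory measurements.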
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Akter, S.; Ahmed, M.; Al Imran, A.; Habib, A.; Haque, R.U.; Rahman, M.S.; Hasan, M.R.; Mahjabeen, S. CKD.Net: A novel deep learning hybrid model for effective, real-time, automated screening tool towards prediction of multi stages of CKD along with eGFR. Expert Syst. Appl. 2023, 223, 119851.
- Chandra Kumar, V.; Kalpana, R. Exploring Machine Learning Techniques for Enhanced Chronic Kidney Disease Diagnosis: A Comprehensive Survey. In Fuzzy Logic in Decision Making and Signal Processing; Springer: Cham, Switzerland, 2025.
- Apiromrak, W.; Toh, C.; Sangthawan, P.; Ingviya, T. Prediction chronic kidney disease progression in diabetic patients using machine learning models. In Proceedings of the Software Engineering Conference, Phuket, Thailand, 19–22 June 2024.
- Jose, J.; Fredrik, E.J. An Advanced GRU-CapsNet Framework for Accurate Chronic Kidney Disease Detection and Prediction. Int. J. Intell. Eng. Syst. 2025, 18, 828–844.
- Lazaros, K.; Adam, S.; Krokidis, M.G.; Exarchos, T.; Vlamos, P.; Vrahatis, A.G. Non-invasive biomarkers in the era of big data and machine learning. Sensors 2025, 25, 1396.
- Devasena, T.; Janani, M.; Karthick, R. Accurate Data Sampling Methods for Medical Data–Survey. In Proceedings of the Conference on Mobile Computing and Communications, Lalitpur, Nepal, 18–19 January 2024.
- Al-Zoghby, A.; Ebada, A.I.; Saleh, A.; Abdelhay, M.; Awad, W. A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics. Comput. Mater. Contin. 2025, 84, 4155.
- Upadhyay, M.N. Efficient Machine Learning Tree-Based Models for Recognition of Chronic Diseases Using Big Data Health Records. J. Glob. Res. Multidiscip. Stud. 2025, 1, 13–19.
- Liu, Y.; Wang, B. Advanced applications in chronic disease monitoring using IoT mobile sensing device data, machine learning algorithms and frame theory: A systematic review. Front. Public Health 2025, 13, 1510456.
- Fang, P.; Wu, Y.; He, Y.; Li, H.; Guan, Z.; Wang, X.; Chen, T.; Shen, J. Research progress on AI-assisted screening and prediction of systemic diseases based on retinal images. Vis. Comput. 2025, 41, 9509–9537.
- Zhang, Y.; Xu, J.; Zhang, C.; Zhang, X.; Yuan, X.; Ni, W.; Zhang, H.; Zheng, Y.; Zhao, Z. Community screening for dementia among older adults in China: A machine learning-based strategy. BMC Public Health 2024, 24, 1206.
- Liu, X.; Tan, H.; Wang, W.; Chen, Z. Deep learning based retinal vessel segmentation and hypertensive retinopathy quantification using heterogeneous features cross-attention neural network. Front. Med. 2024, 11, 1377479.
- Khanna, M.; Singh, L.K.; Thawkar, S.; Goyal, M. Deep learning based computer-aided automatic prediction and grading system for diabetic retinopathy. Multimed. Tools Appl. 2023, 82, 39255–39302.
- Yang, Q.; Bee, Y.M.; Lim, C.C.; Sabanayagam, C.; Cheung, C.Y.L.; Wong, T.Y.; Ting, D.S.W.; Lim, L.L.; Li, H.T.; He, M.; et al. Use of artificial intelligence with retinal imaging in screening for diabetes-associated complications: Systematic review. Lancet 2025, 81, 103089.
- Dahiya, N.; Prakash, D.; Kundu, S.; Kuttan, S.R.; Suwalka, I.; Ayadi, M.; Dubale, M.; Hashmi, A. Optimised RFO tuned RF-DETR model for precision urine microscopy for renal and systemic disease diagnosis. Sci. Rep. 2025, 15, 25842.
- Verma, N.; Sharma, T.; Kaur, B. Explanation of Machine Learning Algorithms Used in Disease Detection, Such as Decision Trees and Neural Networks. In AI in Disease Detection: Advancements and Application; Wiley Online Library: Hoboken, NJ, USA, 2025.
- Maqsood, F.; Wang, Z.; Ali, M.M.; Qiu, B.; Mahmood, T.; Sarwar, R. An efficient enhanced feature framework for grading of renal cell carcinoma using Histopathological Images. Appl. Intell. 2025, 55, 196.
- Liu, Y.; Wang, T.; He, X.; Wang, W.; He, Q.; Jin, J. Renal Biopsy Pathological Tissue Segmentation: A Comprehensive Review and Experimental Analysis. IEEE Access 2025, 13, 58008–58024.
- Sufian, M.A.; Hamzi, W.; Hamzi, B.; Sagar, A.S.; Rahman, M.; Varadarajan, J.; Sufian, M.A.; Hamzi, W.; Hamzi, B.; Sagar, A.S.; et al. Innovative machine learning strategies for early detection and prevention of pregnancy loss: The vitamin D connection and gestational health. Diagnostics 2024, 14, 920.
- Li, X.; Zhang, L.; Yang, J.; Teng, F. Role of artificial intelligence in medical image analysis: A review of current trends and future directions. J. Med. Biol. Eng. 2024, 44, 231–243.
- Tan, J.; Ma, M.; Shen, X.; Xia, Y.; Qin, W. Potential lethality of organochlorine pesticides: Inducing fatality through inflammatory responses in the organism. Ecotoxicol. Environ. Saf. 2024, 279, 116508.
- Bhimavarapu, U. Bias in AI-Driven Diabetes Prediction Models: Challenges, Impacts, and Mitigation Strategies. In AI-Powered Systems for Healthcare Diagnostics and Treatment; IGI Global: Hershey, PA, USA, 2025.
- Yousefzamani, M.; Babapour Mofrad, F. Deep learning without borders: Recent advances in ultrasound image classification for liver diseases diagnosis. Expert Rev. Med. Devices 2025, 22, 827–843.
- Gao, Y.; Lu, H.; Zhou, H.; Tan, J. Exploring the impact of polychlorinated biphenyls on comorbidity and potential mitigation strategies. Front. Public Health 2024, 12, 1474994.
- Han, B.; Wang, Y.; Sun, X.; Li, H.; Lu, J.; Zhou, J.; Yu, X. HGMLA: A multi-task learning model for assessment of HbA1c and GA levels using short-term CGM sensor data. IEEE Sens. J. 2024, 24, 33633–33646.
- Ghorishi, A.R.; Ogunfuwa, F.O.; Ghaddar, T.M.; Kandah, M.N.; Smith, B.W.; Ta, Q.; Alayon, A.; Amundson, P.K. Narrative review of open source, proprietary, and experimental artificial intelligence algorithms in radiology. J. Med. Artif. Intell. 2023, 6. Available online: https://jmai.amegroups.org/article/view/7717 (accessed on 1 November 2025).
- Yagi, M.; Yamanouchi, K.; Fujita, N.; Funao, H.; Ebata, S. Revolutionizing spinal care: Current applications and future directions of artificial intelligence and machine learning. J. Clin. Med. 2023, 12, 4188.
- Ahsan, M.; Naz, S.; Ehsan, H.; Gul, S.; Hadi, F.; Abdelhamid, A.; Khalifa, F. Multi-resolution multi-path shallow deep network for Diabetic Foot Ulcers identification. Biomed. Signal Process. Control 2025, 111, 108321.
- Zhao, Z.; Hu, Y.; Xu, L.X.; Sun, J. Advancements in deep learning for image-guided tumor ablation therapies: A comprehensive review. Prog. Biomed. Eng. 2025, 7, 042005.
- Gao, W.; Deng, Z.; Gong, Z.; Jiang, Z.; Ma, L. AI-driven prediction of insulin resistance in non-diabetic populations using minimal invasive tests: Comparing models and criteria. Diabetol. Metab. Syndr. 2025, 17, 338.
- Shen, J.; Liu, N.; Sun, H.; Wu, S.; Liang, Z.; Han, L.; Zhang, Y.; Li, D. Lightweight Semantic Feature Extraction Model with Direction Awareness for Aerial Traffic Object Detection. IEEE Trans. Intell. Transp. Syst. 2025, 1–18.
- Shen, J.; Zhou, W.; Liu, N.; Sun, H.; Li, D.; Zhang, Y. An Anchor-Free Lightweight Deep Convolutional Network for Vehicle Detection in Aerial Images. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24330–24342.
- Shen, J.; Liu, N.; Sun, H.; Li, D.; Zhang, Y. An Instrument Indication Acquisition Algorithm Based on Lightweight Deep Convolutional Neural Network and Hybrid Attention Fine-Grained Features. IEEE Trans. Instrum. Meas. 2024, 73, 5008516.
- Shen, J.; Liu, N.; Xu, C.; Sun, H.; Xiao, Y.; Li, D.; Zhang, Y. Finger Vein Recognition Algorithm Based on Lightweight Deep Convolutional Neural Network. IEEE Trans. Instrum. Meas. 2022, 71, 5000413.
- Huang, X.; Khetan, A.; Cvitkovic, M.; Karnin, Z. TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv 2020, arXiv:2012.06678.
- Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney Int. 2024, 105, S117–S314.









| Approach Category | Representative Methods | Key Strengths | Primary Limitations | Performance Range (Accuracy) |
|---|---|---|---|---|
| Classical ML | SVM, Random Forest, Logistic Regression | • Fast training/inference • Low computational requirements • Interpretable (RF, LR) • Well-established clinical validation | • Limited capacity for complex patterns • Manual feature engineering required • Struggles with high-dimensional interactions • Poor performance on subtle early-stage CKD | 85–91% |
| Gradient Boosting | XGBoost, LightGBM, CatBoost | • High accuracy on tabular data • Built-in feature importance • Handles missing data well • Robust to outliers | • Requires careful hyperparameter tuning • Sequential training (slow for large datasets) • Limited ability to model long-range dependencies • Prone to overfitting on small datasets | 88–93% |
| Deep Neural Networks | Multi-layer Perceptron, 1D-CNN | • Can learn complex non-linear patterns • Automatic feature extraction • Scalable to large datasets | • Requires large training data • Black-box nature (low interpretability) • Prone to overfitting on clinical datasets • High computational cost | 90–94% |
| Recurrent Networks | LSTM, GRU, Bidirectional LSTM | • Captures temporal sequences • Models disease progression over time • Suitable for longitudinal EHR data | • Requires sequential data (often unavailable) • Vanishing gradient problems • High training complexity • Limited interpretability | 91–95% |
| Attention Mechanisms | Transformer, TabTransformer, Attention-Net | • Models feature interactions explicitly • Multi-head attention captures diverse patterns • State-of-the-art on many tasks • Positional encoding handles feature ordering | • Computationally expensive • Requires substantial data for training • Attention weights provide limited clinical interpretability • Overfitting risk on small datasets | 93–96% |
| Hybrid Ensemble | CNN-XGBoost, Ensemble-Net, Stacking | • Combines complementary model strengths • Improved robustness through diversity • Better generalization than single models | • Increased complexity • Longer training time • Difficult hyperparameter optimization • Limited systematic fusion strategies | 94–97% |
| Graph-Based | GNN, Graph-CKD | • Models patient similarity networks • Leverages population-level patterns • Incorporates relational information | • Requires graph construction (subjective) • Limited clinical interpretability • Computationally expensive for large graphs • Unclear optimal graph topology | 94–96% |
| Multi-Modal Fusion | Multi-Modal, Vision-CKD | • Integrates diverse data types • Comprehensive patient representation • Captures complementary information | • Requires multiple data modalities (often unavailable) • Complex preprocessing pipelines • Heterogeneous missing data challenges • Difficult to validate data source contributions | 95–97% |
| DeepCKD-Net (Ours) | Transformer + Gradient Boosting + Adaptive Fusion | • Synergistic hybrid architecture • High accuracy with interpretability • Confidence-aware predictions • Robust to missing data • Real-time inference capability | • Requires GPU for training • Higher parameter count than classical methods • Needs external validation • Limited to biomarker-only features | 98.7% |
| Feature Category | Count | Missing (%) | Mean | Std Dev |
|---|---|---|---|---|
| Demographic | 2 | 3.2 | – | – |
| Blood Tests | 11 | 8.7 | Varies | Varies |
| Urine Tests | 7 | 5.4 | Varies | Varies |
| Clinical Signs | 6 | 2.1 | – | – |
| Total Features | 26 | 4.8 | – | – |
| Total Samples | 400 | – | – | – |
| CKD Positive | 250 (62.5%) | – | – | – |
| CKD Negative | 150 (37.5%) | – | – | – |

| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC | Specificity (%) | MCC | Training Time (min) | Inference (ms) | Parameters (M) | Memory (GB) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM-RBF | 87.5 | 85.3 | 88.9 | 87.1 | 0.912 | 85.2 | 0.742 | 2.3 | 2.3 | 0.08 | 0.1 |
| Random Forest | 89.2 | 87.8 | 90.1 | 88.9 | 0.925 | 87.6 | 0.778 | 3.5 | 5.7 | 0.5 | 0.2 |
| XGBoost | 91.3 | 90.2 | 92.0 | 91.1 | 0.938 | 90.1 | 0.821 | 8.7 | 4.2 | 1.2 | 0.3 |
| 1D-CNN | 93.4 | 92.5 | 93.8 | 93.1 | 0.951 | 92.7 | 0.865 | 15.2 | 8.9 | 3.4 | 0.8 |
| LSTM | 92.8 | 91.7 | 93.5 | 92.6 | 0.947 | 91.8 | 0.853 | 24.6 | 12.4 | 5.8 | 1.2 |
| TabTransformer | 94.2 | 93.4 | 94.7 | 94.0 | 0.958 | 93.5 | 0.881 | 32.1 | 15.6 | 8.2 | 1.6 |
| DeepCKD-Net (Ours) | 98.7 | 98.4 | 98.9 | 98.6 | 0.993 | 98.3 | 0.973 | 45.3 | 16.8 | 12.6 | 2.1 |
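
The accuracy, precision, recall, F1, specificity, and MCC columns above follow the standard definitions over a binary confusion matrix. The sketch below shows how they relate, assuming CKD is the positive class; the function name `binary_metrics` and the counts are hypothetical and chosen for illustration, not taken from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator below is non-zero (both classes are present
    and the model predicts both classes at least once).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: uses all four cells of the matrix
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for an 80-sample test split (illustration only)
metrics = binary_metrics(tp=48, fp=2, tn=28, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is worth reading alongside accuracy here because the dataset is imbalanced (62.5% CKD-positive), and MCC penalizes a model that does well only on the majority class.
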

| Configuration | Accuracy (%) | F1-Score (%) | AUC | ∆ Acc |
|---|---|---|---|---|
| Full DeepCKD-Net | 98.7 | 98.6 | 0.993 | – |
| w/o Transformer | 94.2 | 94.0 | 0.958 | −4.5 |
| w/o Boosting | 95.8 | 95.7 | 0.973 | −2.9 |
| w/o Fusion | 96.3 | 96.2 | 0.977 | −2.4 |
| w/o Confidence | 97.1 | 97.0 | 0.984 | −1.6 |
| w/o SHAP | 98.7 | 98.6 | 0.993 | 0.0 |
| Single Attention | 96.9 | 96.8 | 0.981 | −1.8 |
| No Preprocessing | 91.3 | 91.1 | 0.938 | −7.4 |
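
The ∆ Acc column is the accuracy of each ablated configuration minus the accuracy of the full model, in percentage points. As a quick check, the sketch below recomputes that column from the accuracies in the table and orders the configurations by the size of the drop; the variable names are illustrative.

```python
# Accuracies (%) copied from the ablation table above
FULL_ACCURACY = 98.7
ablations = {
    "w/o Transformer": 94.2,
    "w/o Boosting": 95.8,
    "w/o Fusion": 96.3,
    "w/o Confidence": 97.1,
    "w/o SHAP": 98.7,
    "Single Attention": 96.9,
    "No Preprocessing": 91.3,
}

# Delta in percentage points, rounded to one decimal like the table
deltas = {name: round(acc - FULL_ACCURACY, 1) for name, acc in ablations.items()}

# Most damaging removal first (most negative delta)
ranked = sorted(deltas.items(), key=lambda item: item[1])
for name, delta in ranked:
    print(f"{name:<18} {delta:+.1f}")
```

Ordered this way, removing preprocessing costs the most accuracy (7.4 points), followed by removing the transformer branch (4.5 points), while removing SHAP leaves accuracy unchanged, as expected for a component that only explains predictions.
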

| Rank | Feature | SHAP Value | Correlation | Category |
|---|---|---|---|---|
| 1 | Serum Creatinine | 0.342 | 0.89 | Blood Test |
| 2 | Blood Urea | 0.287 | 0.85 | Blood Test |
| 3 | Hemoglobin | 0.231 | −0.78 | Blood Test |
| 4 | Specific Gravity | 0.198 | −0.71 | Urine Test |
| 5 | Albumin | 0.176 | 0.68 | Urine Test |
| 6 | Packed Cell Volume | 0.154 | −0.65 | Blood Test |
| 7 | Hypertension | 0.142 | 0.62 | Clinical Sign |
| 8 | Diabetes Mellitus | 0.131 | 0.59 | Clinical Sign |
| 9 | Red Blood Cells | 0.119 | −0.56 | Blood Test |
| 10 | Sodium | 0.108 | −0.52 | Blood Test |
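
A common way to obtain a global ranking like the one above is to average the absolute per-sample SHAP attributions for each feature. The sketch below assumes the SHAP Value column was produced this way, which the table itself does not state. The attribution matrix is hypothetical, and `rank_by_mean_abs` is an illustrative helper rather than part of any SHAP library.

```python
def rank_by_mean_abs(attributions, feature_names):
    """Rank features by mean absolute attribution across samples.

    attributions: list of rows, one per sample, one value per feature.
    Returns (feature, score) pairs sorted from most to least important.
    """
    n_samples = len(attributions)
    scores = {
        name: sum(abs(row[j]) for row in attributions) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-sample attributions for three features (illustration only)
features = ["serum_creatinine", "blood_urea", "hemoglobin"]
sample_attributions = [
    [0.40, 0.20, -0.30],
    [0.30, -0.30, -0.10],
    [-0.20, 0.25, 0.20],
]

ranking = rank_by_mean_abs(sample_attributions, features)
for rank, (name, score) in enumerate(ranking, start=1):
    print(rank, name, round(score, 3))
```

Taking the absolute value before averaging matters: a feature such as hemoglobin can push predictions in both directions, so its signed attributions would partly cancel and understate its importance.
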

| Method | Parameters (M) | FLOPs (G) | Memory (GB) | Training (min) | Inference (ms) |
|---|---|---|---|---|---|
| SVM-RBF | 0.08 | 0.01 | 0.1 | 2.3 | 2.3 |
| XGBoost | 1.2 | 0.15 | 0.3 | 8.7 | 4.2 |
| 1D-CNN | 3.4 | 0.42 | 0.8 | 15.2 | 8.9 |
| LSTM | 5.8 | 0.73 | 1.2 | 24.6 | 12.4 |
| TabTransformer | 8.2 | 1.05 | 1.6 | 32.1 | 15.6 |
| DeepCKD-Net | 12.6 | 1.58 | 2.1 | 45.3 | 16.8 |

| Comparison | Accuracy (p-Value) | F1-Score (p-Value) | AUC (p-Value) | Significant |
|---|---|---|---|---|
| DeepCKD vs. XGBoost | <0.001 | <0.001 | <0.001 | Yes |
| DeepCKD vs. 1D-CNN | <0.001 | <0.001 | <0.001 | Yes |
| DeepCKD vs. TabTransformer | <0.001 | <0.001 | <0.001 | Yes |
| DeepCKD vs. Ensemble-Net | 0.002 | 0.003 | 0.001 | Yes |
| DeepCKD vs. Multi-Modal | 0.018 | 0.021 | 0.015 | Yes |
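
The table above does not name the statistical test behind its p-values. One common option for comparing two classifiers on the same test samples is McNemar's test, which looks only at the cases where the two models disagree. The sketch below implements the exact two-sided version; it is offered as an illustration of the idea, not as the procedure the authors used, and the disagreement counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b: samples model A classifies correctly and model B misclassifies.
    c: samples model B classifies correctly and model A misclassifies.
    Under the null hypothesis the disagreements split 50/50, so the
    smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts (illustration only)
p_value = mcnemar_exact(b=12, c=1)
print(f"p = {p_value:.4f}")
```

With 12 disagreements favoring one model and 1 favoring the other, the p-value is about 0.0034, which would count as significant at the 0.05 level. With evenly split disagreements the p-value is 1.0.
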

| Hyperparameter | Value | Search Range |
|---|---|---|
| Learning Rate | 0.001 | [0.0001, 0.01] |
| Batch Size | 32 | [16, 64] |
| Transformer Layers | 6 | [2, 12] |
| Attention Heads | 8 | [4, 16] |
| Hidden Dimension | 512 | [256, 1024] |
| Dropout Rate | 0.3 | [0.1, 0.5] |
| Boosting Trees | 100 | [50, 200] |
| Tree Depth | 6 | [3, 10] |
| Weight Decay | 0.0001 | [1 × 10⁻⁵, 1 × 10⁻³] |
| Focal Loss γ | 2.0 | [1.0, 3.0] |
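
The settings above can be kept in a single configuration object, and the role of the focal loss parameter γ is easiest to see in code. The sketch below records each selected value with its search range, checks that every value lies inside its range, and implements the binary focal loss for one prediction. The dictionary keys and function names are hypothetical. The class-balancing factor α found in some focal loss formulations is omitted because the table does not list one.

```python
import math

# (selected value, (range low, range high)) as listed in the table above
HPARAMS = {
    "learning_rate": (0.001, (0.0001, 0.01)),
    "batch_size": (32, (16, 64)),
    "transformer_layers": (6, (2, 12)),
    "attention_heads": (8, (4, 16)),
    "hidden_dim": (512, (256, 1024)),
    "dropout": (0.3, (0.1, 0.5)),
    "boosting_trees": (100, (50, 200)),
    "tree_depth": (6, (3, 10)),
    "weight_decay": (1e-4, (1e-5, 1e-3)),
    "focal_gamma": (2.0, (1.0, 3.0)),
}


def out_of_range(hparams: dict) -> list:
    """Names of hyperparameters whose value falls outside its search range."""
    return [
        name
        for name, (value, (low, high)) in hparams.items()
        if not low <= value <= high
    ]


def focal_loss(p: float, y: int, gamma: float = HPARAMS["focal_gamma"][0]) -> float:
    """Binary focal loss for one sample.

    p: predicted probability of the positive class; y: true label (0 or 1).
    gamma = 0 reduces to ordinary cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


print(out_of_range(HPARAMS))   # empty list when all values are in range
print(focal_loss(0.9, 1))      # easy example: loss is strongly down-weighted
print(focal_loss(0.3, 1))      # hard example: loss stays comparatively large
```

With γ = 2, a confidently correct prediction (p = 0.9) contributes 100 times less loss than it would under plain cross-entropy, so training concentrates on the harder, often minority-class, samples.
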

| Method | Year | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Architecture | Key Innovation |
|---|---|---|---|---|---|---|---|---|
| BiLSTM-Attention [1] | 2023 | 94.8 | 95.2 | 94.3 | 94.5 | 95.0 | Bidirectional LSTM | Temporal modeling |
| GraphCKD [2] | 2023 | 95.3 | 95.7 | 94.8 | 95.1 | 95.5 | Graph Neural Network | Patient similarity graphs |
| FedCKD [3] | 2023 | 93.9 | 94.3 | 93.4 | 93.7 | 94.1 | Federated Learning | Privacy preservation |
| AutoML-CKD [4] | 2024 | 95.6 | 96.0 | 95.1 | 95.4 | 95.8 | Neural Architecture Search | Automated design |
| Vision-CKD [5] | 2024 | 94.2 | 94.6 | 93.7 | 94.0 | 94.4 | Vision Transformer | Patch-based attention |
| Quantum-CKD [6] | 2024 | 92.8 | 93.2 | 92.3 | 92.6 | 93.0 | Quantum Circuit | Quantum advantage |
| CausalCKD [7] | 2024 | 96.1 | 96.5 | 95.6 | 95.9 | 96.3 | Causal Inference | Causal relationships |
| DiffusionCKD [8] | 2025 | 95.9 | 96.3 | 95.4 | 95.7 | 96.1 | Diffusion Models | Generative augmentation |
| PromptCKD [9] | 2025 | 96.4 | 96.8 | 95.9 | 96.2 | 96.6 | Prompt Learning | Few-shot adaptation |
| LLM-CKD [10] | 2025 | 96.7 | 97.1 | 96.2 | 96.5 | 96.9 | Large Language Model | Clinical text integration |
| DeepCKD-Net | 2025 | 98.7 | 98.9 | 98.3 | 98.4 | 98.8 | Hybrid Transformer-Boost | Multi-paradigm fusion |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Al Ghamdi, M.; Alyahyan, S. Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net. Appl. Sci. 2026, 16, 3024. https://doi.org/10.3390/app16063024

