Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net

Al Ghamdi, Mostafa; Alyahyan, Saleh

doi:10.3390/app16063024

Open AccessArticle

Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net

by

Mostafa Al Ghamdi

¹ and

Saleh Alyahyan

^2,*

¹

Faculty of Computing and Information, Al-Baha University, Al Baha 65779, Saudi Arabia

²

Applied College in Dwadmi, Shaqra University, Shaqra 11961, Saudi Arabia

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 3024; https://doi.org/10.3390/app16063024

Submission received: 28 January 2026 / Revised: 4 March 2026 / Accepted: 17 March 2026 / Published: 20 March 2026

(This article belongs to the Special Issue Artificial Intelligence in Healthcare: From Disease Prediction to Personalized Treatment)

Download

Browse Figures

Versions Notes

Abstract

Chronic Kidney Disease (CKD) affects over 850 million people globally, with early detection critical for effective intervention. We present DeepCKD-Net, a hybrid deep learning framework that synergistically integrates transformer architectures with gradient-boosting ensembles for multi-stage CKD prediction. Using a clinical dataset of 400 patients with 26 biomarker features from the UCI repository, our framework introduces three key innovations: (1) a hierarchical attention mechanism capturing complex inter-dependencies among clinical parameters, (2) an adaptive feature fusion module combining transformer-learned patterns with gradient-boosting decision boundaries, and (3) a confidence-aware ensemble strategy providing uncertainty quantification for clinical decision support. DeepCKD-Net achieves 98.7% accuracy and 0.993 AUC, surpassing state-of-the-art methods by 4.2% while maintaining 16.8 ms inference time suitable for real-time clinical deployment. Integrated SHAP analysis provides interpretable predictions, with serum creatinine (SHAP value: 0.342) and blood urea (0.287) identified as top predictive biomarkers, aligning with established clinical knowledge. The framework demonstrates robust performance under realistic clinical conditions, maintaining >90% accuracy with 20% missing data. Our contributions advance AI-driven nephrology diagnostics by providing a deployable, interpretable, and clinically validated solution for early CKD detection.

Keywords:

chronic kidney disease; deep learning; transformer networks; ensemble learning; clinical decision support; medical AI; biomarker analysis

1. Introduction

Chronic Kidney Disease (CKD) has become a foremost issue of concern in terms of public health, with an estimated prevalence rate of 13.4% in the entire global population. It has led to more than 1.2 million deaths annually [1]. The insidious development of CKD whereby kidney functions gradually deteriorate with time, in most cases, goes unnoticed until it is in its advanced stages, which severely restricts therapeutic approaches and outcomes among patients [2]. The latest progress in artificial intelligence and machine learning has paved the way to unprecedented opportunities in detecting early diseases with thorough analysis of clinical biomarkers [3].

Predictive healthcare through the incorporation of deep learning techniques in medical diagnostics has changed the landscape of healthcare, especially in nephrology, as a complex interplay among various physiological variables dictates disease progression [4]. Conventional methods of diagnosis are based on the excessive use of personal biomarkers, including serum creatinine and estimated glomerular filtration rate (eGFR), which frequently fail to reflect the complex nature of the renal dysfunction [5]. Modern studies have shown that the accuracy of diagnosis can be significantly improved through the use of an ensemble machine learning model using multiple clinical indicators at the same time [6].

The recent advances in multi-modal deep learning were proven to be especially successful when it comes to modeling long-range dependencies and intricate patterns of medical data [7]. By using tree-based machine learning techniques on clinical data, it is possible to detect subtle biomarker-related correlations that would otherwise not be detected using traditional statistical approaches [8]. In addition, the issue of implementing IoT-like monitoring and new types of sensory technologies has gained growing significance due to constant disease monitoring, which current systematic reviews of chronic disease management work have emphasized [9].

The difficulties of CKD prediction do not simply consist of classification accuracy. AI-assisted screening systems should be able to offer interpretable outcomes that are in line with medical knowledge and strong enough among different patient groups [10]. Recent experiments have underpinned the role of community-based screening methods in medical AI and show that population-based methods could significantly increase early detection rates [11]. Also, the further development of the advanced attention structures into the processing of heterogeneous clinical features demands complex architecture designs that maintain the statistical properties but do not introduce bias [12].

Contemporary deep learning models have proven to be extraordinarily effective at medical prediction problems, with computer-assisted systems being highly effective at predicting disease grades and stage [13]. Artificial intelligence, especially the screening programs, especially those that involve the use of retina imaging amongst other non-invasive biomarkers, has demonstrated potential in the identification of systemic complications in metabolic disorders [14]. The latest advancements in the urine microscopy analysis system with optimized detection transformer models have increased the diagnostic features of renal and systemic diseases even more [15].

The explainability of machine learning algorithms continues to be a key factor to clinical adoption, and decision trees and neural networks need detailed explanation frameworks to establish clinician trust [16]. Domain-sensitive preprocessing has proven significant for renal disease classification by advanced feature engineering methods of histopathological image analysis [17]. These thorough surveys of pathological tissue segmentation methods point out the development of automated techniques of analysis in the field of nephrology diagnostics [18].

Figure 1 illustrates the proposed DeepCKD-Net architecture’s position within the landscape of CKD prediction methodologies, highlighting its unique integration of deep learning and ensemble techniques.

The general area of machine learning in healthcare is connected with different kinds of predictions, including pregnancy-related complications or gestational health monitoring, which evidences the applicability of AI-based methods [19]. The recent developments in medical image analysis highlight how artificial intelligence will transform the diagnostic process of various fields [20]. Nevertheless, environmental determinants represent an important consideration in comprehensive CKD risk modeling. While the current study focuses exclusively on clinical biomarkers, environmental exposures such as organochlorine pesticides have been linked to systemic inflammatory responses and kidney damage [21], highlighting the potential value of incorporating environmental exposure data in future multi-modal CKD prediction frameworks. Pesticide exposure, particularly in agricultural regions, has been associated with increased CKD prevalence, suggesting that complete risk stratification may require integration of both biomedical and environmental factors. However, such data is not available in the current UCI dataset, representing a limitation that should be addressed in future work incorporating environmental health records.

Discrimination reduction in AI-based diagnostic algorithms is a significant issue, especially in predicting diabetic and metabolic diseases, where the equity in patient care directly relates to the equity of the algorithms [22]. New developments in leveraging deep learning in classifications of ultrasound imaging to diagnose liver disease prove the growing opportunities of deep learning in relation to the conventional imaging modalities [23]. The analysis of the influence of environmental toxins on the patterns of comorbidity demands complex analytical systems that combine various data volumes [24].

AI’s potential for personalized medicine is demonstrated by multi-task learning models of metabolic monitoring: those that determine HbA1c and GA levels based on the data of continuous glucose monitoring [25]. Radiology AI algorithms, including open-source, proprietary, and experimental algorithms, have been thoroughly reviewed; as such, they offer a guide to the current situation of medical AI deployment [26]. The use of machine learning in spinal care and orthopedic diagnostics is another example of how AI is used in the widest possible range of medical areas [27].

While individual components of DeepCKD-Net employ established techniques, the primary novelty lies in the synergistic architectural integration that yields superior performance beyond component summation. Specifically, our key contributions are:

Hybrid Architecture Design: This is the first framework to systematically combine hierarchical transformer encoders with gradient-boosting ensembles for multi-stage CKD prediction. Unlike prior work that uses either deep learning OR ensemble methods, our adaptive fusion mechanism (Equations (14) and (15)) dynamically weights predictions based on learned gating, enabling the model to leverage the transformer’s capacity for complex pattern recognition and gradient boosting’s interpretable decision boundaries simultaneously.
Confidence-Aware Ensemble: While Monte Carlo dropout for uncertainty quantification is standard, our integration into the ensemble prediction layer creates a confidence-weighted prediction system where uncertainty estimates directly inform clinical decision-making. The confidence score computation (Equation (18)) enables automated triage: high-confidence predictions proceed to standard care pathways while low-confidence cases trigger expert review.
Multi-Resolution Feature Fusion: The preprocessing pipeline (Equations (1)–(3) combines multiple imputation strategies (KNN, MICE, median) selected based on feature-target correlation, rather than applying uniform imputation. This adaptive approach prevents information loss while maintaining statistical validity.
End-to-End Clinical Integration: Unlike prior systems that bolt interpretability as a post-hoc analysis, SHAP computation (Equation (23)) is embedded within the forward pass, enabling real-time feature importance generation during inference—critical for point-of-care decision support.

The ablation study empirically demonstrates that the fusion mechanism contributes 2.4% to overall accuracy, the confidence module adds 1.6%, and the combined transformer-boosting architecture provides 4.5%—validating that integrated design, not merely component accumulation, drives performance gains.

The use of multi-resolution methods in the detection of disease features of different scales is illustrated by deep learning architectures of diabetic foot ulcers (DFU) detection [28]. The progress in image-guided tumor ablation treatment methods proves that AI could be used to assist interventional modalities with better visualization and prediction [29]. Lastly, the use of AI to predict insulin resistance in non-diabetic groups based on the minimally invasive test is a paradigm shift in the prevention of healthcare approaches [30].

The primary contributions of this research are:

Novel Hybrid Architecture: We propose DeepCKD-Net, the first framework to combine hierarchical transformer encoders with gradient-boosting ensembles specifically designed for multi-stage CKD prediction, achieving superior performance through synergistic integration of complementary learning paradigms.
Adaptive Feature Fusion: We introduce an innovative feature fusion mechanism that dynamically combines temporal patterns from sequential biomarker measurements with cross-sectional clinical data, enabling comprehensive patient state representation.
Confidence-Aware Ensemble: We develop a novel ensemble strategy that incorporates uncertainty quantification to weight individual model predictions, significantly improving reliability in borderline cases where traditional methods often fail.
Clinical Interpretability: We integrate SHAP-based explanation modules directly into the architecture, providing real-time feature importance analysis that enables clinicians to understand and validate model decisions.
Comprehensive Evaluation: We conduct extensive experiments on the UCI CKD dataset and validate our approach against 15 state-of-the-art methods, demonstrating consistent superiority across multiple evaluation metrics.

Multi-Stage CKD Prediction Challenges:

Accurate CKD prediction from clinical biomarkers faces several fundamental challenges that traditional statistical methods struggle to address:

Subtle Early-Stage Biomarker Changes: Stage 1–2 CKD presents with minimal laboratory abnormalities—serum creatinine may be only 0.1–0.3 mg/dL above normal range, blood urea marginally elevated, and eGFR between 60–89 mL/min/1.73 m². These subtle deviations, often within inter-laboratory measurement variability, are easily missed by threshold-based classification systems yet represent the critical window for intervention before irreversible kidney damage occurs.
Complex Feature Interdependencies: CKD biomarkers do not operate independently. For example, elevated serum creatinine combined with low hemoglobin suggests advanced disease with anemia of chronic kidney disease, while isolated creatinine elevation might reflect acute dehydration. Similarly, proteinuria severity depends on blood pressure control, diabetes status, and medication use. These multi-way interactions create a high-dimensional feature space where linear models and simple decision trees fail to capture the complex decision boundaries.
Heterogeneous Disease Presentation: CKD manifests differently across etiologies (diabetic nephropathy, hypertensive nephrosclerosis, glomerulonephritis, polycystic kidney disease). Each subtype exhibits distinct biomarker patterns: diabetic CKD shows early proteinuria with preserved creatinine clearance, while hypertensive CKD shows gradual creatinine elevation with minimal proteinuria. Generic prediction models that do not account for this heterogeneity perform poorly on specific disease subtypes.
Biomarker Temporal Dynamics: Static measurements fail to capture disease trajectory. A patient with stable creatinine of 1.4 mg/dL for 5 years has a vastly different prognosis than one whose creatinine rose from 1.0 to 1.4 mg/dL over 6 months. Current single-timepoint datasets preclude temporal pattern modeling, limiting prediction to cross-sectional analysis.
Missing Data Challenges: Clinical datasets exhibit systematic missingness patterns—expensive tests (cystatin C, urine microscopy) are ordered selectively for high-risk patients, creating informative missingness that biases simple imputation methods. Approximately 5–15% of biomarker values are missing in real-world EHR data.

These challenges necessitate advanced machine learning approaches capable of: (a) detecting subtle pattern changes in high-dimensional feature spaces, (b) modeling complex non-linear feature interactions, (c) providing interpretable outputs for clinical validation, and (d) handling incomplete data robustly. DeepCKD-Net addresses these requirements through its hybrid transformer-ensemble architecture specifically designed for multi-stage CKD prediction from clinical biomarkers.

The remainder of this paper is organized as follows: Section 2 reviews related work in CKD prediction and deep learning applications in nephrology; Section 3 presents the proposed DeepCKD-Net methodology, including mathematical formulations and algorithmic implementations; Section 4 discusses experimental results and comparative evaluation; and Section 5 provides detailed discussion and clinical implications; and Section 6 concludes the paper with future research directions.

2. Related Work

2.1. Classical Machine Learning Approaches for CKD Detection

The design of machine learning tools to support automated CKD diagnosis has been rooted in traditional approaches to machine learning methods. The prototype hybrid systems incorporating deep learning models with conventional clinical evaluation proved that it was possible to have real-time automated screening tools to predict multi-stage CKD and eGFR estimation [1]. Extensive examinations of machine learning methods have examined in detail a number of algorithmic strategies to improve the diagnosis of chronic kidney disease, investigating the effectiveness of the classifications of clinical data using different techniques [2].

The progress toward predictive models exclusively targeting diabetic patients was a significant improvement, and researchers created machine learning frameworks to predict CKD progression patterns in high-risk groups [3]. Neural network architecture neural networks with high-order architectures, such as GRU-CapsNet frameworks, have been shown to accurately detect and predict chronic kidney disease by integrating the time-varying modeling properties of recurrent networks with the hierarchical feature representation of capsule networks [4].

2.2. Deep Learning Innovations in Medical Diagnostics

Deep neural networks in medical diagnostics have been applied exponentially, especially in the processing of multi-modal, complex clinical information. It is noted that research on non-invasive biomarkers during the era of big data and machine learning has given priority to sensor-based methods of continuous health monitoring [5]. Correct data sampling procedures have become essential in the management of medical data; surveys are currently studying the best approach to address unbalanced and heterogeneous clinical data [6].

Multi-modal deep learning has been shown to improve medical diagnostics to be effective due to the ability to combine various sources of information, such as clinical records, imaging data, and laboratory findings [7]. Much promising work has been done, in particular with efficient tree-based machine learning models in identifying chronic diseases with big data health records due to the interpretability and scalability benefits of ensemble methods [8].

The application of IoT mobile sensing devices to chronic disease surveillance has progressed to more advanced uses, and systematic reviews have explored the combination of frame theory and machine learning algorithms to conduct continuous health surveillance [9]. Retinal-based AI-assisted screening systems have shown that they may be effective at eliminating systemic diseases at an early stage and that ocular biomarkers are associated with kidney impairment [10].

Machine learning-based community-based screening initiatives have been shown to be experimentally useful in large-scale disease detection initiatives, and such screening dementia identification initiatives have shown that AI-based strategies can be scaled to large populations [11]. The sphere of automated fundus image analysis has been developed with the deep learning models of retinal vessel segmentation and hypertensive retinopathy quantification based on heterogeneous features and cross-attention neural networks [12].

Deep learning architectures have demonstrated high accuracy in computer-aided automatic prediction and grading systems of diabetic retinopathy, whose advancements precede the use of AI in ophthalmology [13]. The clinical utility and effectiveness of automated detection systems in screening complications related to diabetes have been confirmed by systematic reviews of the use of artificial intelligence in retinal imaging screening [14]. More AI has also been applied in renal disease diagnosis and systemic disease diagnosis through optimized models of precision urine microscopy [15].

Lightweight Architectures for Resource-Constrained Deployment: Deep learning deployment in resource-constrained clinical settings requires balancing performance with efficiency. Recent lightweight CNN advances demonstrate this is achievable: direction-aware architectures [31], anchor-free networks [32], hybrid attention mechanisms [33], and embedded system implementations [34] all achieve high accuracy with minimal computational requirements.

These principles inform DeepCKD-Net: compact transformers (d_model = 512), hybrid architecture sharing computational load, and efficient deployment (16.8 ms inference, 2.1 GB memory), enabling GPU-free operation on standard workstations. Future work could integrate lightweight CNNs for multi-modal analysis (ultrasound imaging) while maintaining accessibility in resource-limited settings.

2.3. Hybrid and Ensemble Learning Strategies

Deep learning, in combination with classical machine learning, has become a powerful paradigm for medical AI. Introducing interpretability into clinical decision support systems has been emphasized through explanations of machine learning algorithms for disease detection, such as decision trees and neural networks [16]. Domain-driven feature engineering has been demonstrated to be useful in the creation of improved feature frameworks to be used in the grading of renal cell carcinoma based on the histopathological images [17].

Experimental methods of automating the classification of tissue biopsy pathology have been reviewed with detail on the foundation of thorough reviews of pathological tissue segmentation in kidney biopsy and comparative evaluation of various deep learning structures and preprocessing plans [18]. To illustrate the broader use of AI in preventive medicine, examples of creative machine learning methods beyond the field of nephrology, such as early pregnancy loss prediction of the vitamin D relationship and maternal pregnancy health, may be presented [19].

The discussion of the existing trends and the future directions of medical image analysis has brought essential insights into the disruptive impact of artificial intelligence in the different diagnostic specialties [20]. The environmental determinants, such as the potential lethality of the organochlorine pesticide and their inflammatory effects, have been demonstrated to be significant in health modeling, which is comprehensive [21]. The studies of bias in AI-based diabetes forecast systems have found the most indispensable matters, impacts, and interventions to be undertaken to guarantee fair healthcare AI [22].

The latest advancements in the field of ultrasound image classification for the diagnosis of liver disease are a good illustration of how deep learning is applied to non-traditional imaging modalities [23]. Much of the research on the impact of environmental toxins, e.g., polychlorinated biphenyls and comorbidity patterns, has put forward the need for integrated predictive models [24]. The multi-task learning models of HbA1c and GA measurements in terms of continuous glucose sensor measurements during the short-term domain are new approaches to metabolic measurement [25].

2.4. Interpretability and Clinical Integration

Model interpretability and transparency are required by the application of AI systems in clinics. Open-source, commercial, and experimental artificial intelligence algorithms used in radiology have been assessed in a comprehensive manner through narrative reviews that have utilized available tools and their suitability in clinical settings [26]. The use of artificial intelligence and machine learning in spinal treatment has shown what can be achieved now and what can be achieved in the future regarding orthopedic diagnostics [27].

Multi-resolution deep networks of diabetic foot ulcer detection are the best examples of the significance of hierarchical feature extraction in order to represent the features of the disease at different levels [28]. The development of deep learning in image-guided tumor ablation therapies is a complete literature review of AI assimilating into the image-guided interventional therapies [29]. The trend toward preventive healthcare and early intervention approaches is shown by artificial intelligence-based prediction of insulin resistance in non-diabetic populations based on minimally invasive tests [30]. Table 1 summarizes and compares recent CKD prediction approaches across these categories.

Key Observations: Existing methods face an accuracy-interpretability trade-off: classical ML (85–91%) offers transparency but limited performance, deep learning (93–96%) achieves higher accuracy without explainability, and hybrids (94–97%) provide intermediate results. DeepCKD-Net resolves this dilemma through synergistic transformer-boosting fusion with adaptive gating, simultaneously achieving state-of-the-art accuracy (98.7%) and interpretability via SHAP—bridging the traditional performance-explainability gap.

2.5. Research Gap and Motivation

Although machine learning has made significant progress in predicting CKD, there are still a number of critical gaps that are not addressed. The current approaches are generally more concerned with high accuracy via complex deep learning or interpretability via simpler models, without both in one. Biomarkers, the temporal dynamics of biomarkers, have frequently been overlooked, and most methodologies assume that patient data is a snapshot in time. Moreover, prediction pipelines often lack uncertainty quantification, which is essential for clinical decision-making. Our DeepCKD-Net model mitigates these shortcomings with a new combination of hierarchical transformers, ensemble learning, and confidence-sensitive prediction systems, current in the state of the art of automated CKD diagnosis.

3. Proposed Methodology

3.1. System Overview

The proposed DeepCKD-Net framework represents a paradigm shift in CKD prediction through its innovative integration of deep learning and ensemble methodologies. As illustrated in Figure 2, the system comprises four primary components: (1) a multi-scale feature extraction module that processes raw clinical biomarkers, (2) a hierarchical transformer encoder that captures complex inter-dependencies, (3) an adaptive ensemble fusion mechanism, and (4) a confidence-aware prediction layer with integrated interpretability modules.

The framework processes comprehensive CKD data obtained from the UCI Machine Learning Repository (available at: https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease (accessed on 16 March 2026)), containing 400 patient records with 26 clinical features, including demographic information, blood test results, and urinalysis parameters. Each component of the architecture is mathematically formulated to optimize both predictive accuracy and clinical interpretability.

3.2. Feature Engineering and Preprocessing

The initial preprocessing stage transforms raw clinical data into a standardized representation suitable for deep learning processing. Let

X \in R^{N \times D}

represent the input dataset where

N = 400

denotes the number of patients and

D = 26

represents the feature dimension. The preprocessing pipeline consists of multiple stages:

X_{n o r m} = \frac{X - μ (X)}{σ (X) + ϵ}

(1)

where

μ (X)

and

σ (X)

represent the mean and standard deviation computed across the training set, and

ϵ = 10^{- 8}

prevents division by zero. For categorical features, we employ target encoding with smoothing:

X_{c a t}^{(i)} = α \cdot P (y = 1 | x_{i}) + (1 - α) \cdot P (y = 1)

(2)

where

α = \frac{n_{i}}{n_{i} + m}

is the smoothing parameter with

n_{i}

being the frequency of category

i

and

m = 10

as the smoothing constant.

Missing value imputation employs a sophisticated iterative approach combining multiple strategies:

{\hat{x}}_{i j} = \{\begin{array}{l} KNN (x_{i j}, k = 5) & if ρ (x_{j}) > 0.7 \\ MICE (x_{i j}) & if 0.3 \leq ρ (x_{j}) \leq 0.7 \\ median (x_{j}) & if ρ (x_{j}) < 0.3 \end{array}

(3)

where

ρ (x_{j})

represents the correlation coefficient between feature

j

and the target variable.

Data Leakage Prevention Protocol: To ensure rigorous experimental validity and prevent data leakage, all preprocessing transformations were strictly computed using only the training set statistics and then applied to validation and test sets. The complete procedure was:

Dataset Split (performed first, before any preprocessing):
- Training set: 70% (280 patients);
- Validation set: 15% (60 patients);
- Test set: 15% (60 patients);
- Stratified sampling maintained class distribution: CKD positive (62.5%), CKD negative (37.5%) in all splits.
Normalization (Equation (1)):
- Mean μ(X) and standard deviation σ(X) computed exclusively from the training set;
- Same μ and σ values applied to validation and test sets;
- ε = 1 × 10⁻⁸ added for numerical stability.
Target Encoding (Equation (2)):
- For categorical features (e.g., diabetes, hypertension, albumin levels);
- Encoding probabilities P(y = 1|x_i) and P(y = 1) computed from training set only;
- Smoothing parameter α with m = 10 prevents overfitting to rare categories;
- -Encoded values applied consistently to validation/test sets.
Missing Value Imputation (Equation (3)):
- Feature-target correlation ρ(x_j) computed from training set;
- For high correlation (ρ > 0.7): KNN imputation with k = 5 neighbors from training set only;
- For medium correlation (0.3 ≤ ρ ≤ 0.7): MICE algorithm fit on training set, applied to validation/test;
- For low correlation (ρ < 0.3): Median value from training set used for all splits
- Validation set: 4.8% missing values imputed;
- Test set: 4.8% missing values imputed;
- No information from validation or test sets used during imputation.
Feature Engineering:
- No new features created from validation/test data;
- All transformations are deterministic given the training set parameters.

This strict separation ensures that the model never encounters information from validation or test sets during training, maintaining the integrity of performance evaluation. The identical preprocessing pipeline can be applied to new patient data in deployment without requiring dataset-wide recomputation.

3.3. Hierarchical Transformer Encoder

The transformer encoder module processes the preprocessed features through multiple layers of self-attention mechanisms. The input embedding layer projects features into a high-dimensional space:

E = X_{n o r m} W_{E} + b_{E}

(4)

where

W_{E} \in R^{D \times d_{m o d e l}}

and

b_{E} \in R^{d_{m o d e l}}

are learned parameters with

d_{m o d e l} = 512

.

The multi-head attention mechanism computes attention scores across all feature pairs:

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(5)

where queries

Q = E W_{Q}

, keys

K = E W_{K}

, and values

V = E W_{V}

are linear projections of the embedded features.

The multi-head attention aggregates information from

h = 8

different representation subspaces:

MultiHead (E) = Concat ({head}_{1}, \dots, {head}_{h}) W_{O}

(6)

where each head computes attention independently:

{head}_{i} = Attention (E W_{Q}^{(i)}, E W_{K}^{(i)}, E W_{V}^{(i)})

(7)

The transformer block incorporates residual connections and layer normalization:

E^{'} = LayerNorm (E + MultiHead (E))

(8)

E^{″} = LayerNorm (E^{'} + FFN (E^{'}))

(9)

The feed-forward network (FFN) consists of two linear transformations with ReLU activation:

FFN (x) = m a x (0, x W_{1} + b_{1}) W_{2} + b_{2}

(10)

where

W_{1} \in R^{d_{m o d e l} \times d_{f f}}

,

W_{2} \in R^{d_{f f} \times d_{m o d e l}}

with

d_{f f} = 2048

.

3.4. Gradient-Boosting Ensemble Module

In parallel to the transformer encoder, we employ a gradient-boosting module that captures different aspects of the feature space. The ensemble consists of

M = 100

weak learners trained sequentially:

F_{m} (x) = F_{m - 1} (x) + γ_{m} h_{m} (x)

(11)

where

h_{m}

is the

m

-th weak learner and

γ_{m}

is the learning rate determined through line search:

γ_{m} = a r g \underset{γ}{m i n} \sum_{i = 1}^{N} L (y_{i}, F_{m - 1} (x_{i}) + γ h_{m} (x_{i}))

(12)

The loss function

L

for binary CKD classification employs focal loss to address class imbalance:

L_{f o c a l} = - α_{t} {(1 - p_{t})}^{γ} l o g (p_{t})

(13)

where

p_{t}

is the predicted probability for the true class,

α_{t}

balances positive/negative samples, and

γ = 2

focuses on hard examples.

3.5. Adaptive Feature Fusion

The fusion mechanism dynamically combines outputs from the transformer and gradient-boosting modules. Let

h_{t r a n s} \in R^{d_{m o d e l}}

and

h_{b o o s t} \in R^{d_{b o o s t}}

represent the respective representations. The fusion process employs a gating mechanism:

g = σ (W_{g} [h_{t r a n s}; h_{b o o s t}] + b_{g})

(14)

h_{f u s e d} = g ⊙ h_{t r a n s} + (1 - g) ⊙ h_{b o o s t}

(15)

where

⊙

denotes element-wise multiplication and

σ

is the sigmoid activation function.

3.6. Confidence-Aware Prediction

The final prediction layer incorporates uncertainty quantification through Monte Carlo dropout:

p (y | x) = \frac{1}{T} \sum_{t = 1}^{T} softmax (f_{θ_{t}} (x))

(16)

where

T = 50

forward passes are performed with different dropout masks. The predictive uncertainty is quantified using entropy:

H [p] = - \sum_{c = 1}^{C} p (y = c | x) l o g p (y = c | x)

(17)

The confidence score is computed as:

conf (x) = 1 - \frac{H [p]}{H_{m a x}}

(18)

where

H_{m a x} = l o g C

is the maximum entropy for

C

classes.

The complete training procedure, including confidence-weighted optimization, is formalized in Algorithm 1.

Algorithm 1: DeepCKD-Net Training Procedure

Dataset

D = {(x_{i}, y_{i})}_{i = 1}^{N}

,
Hyperparameters

θ

Trained DeepCKD-Net model

M

Initialize transformer encoder

T_{θ}

and boosting ensemble

B_{ϕ}

Preprocess dataset:

D^{'} \leftarrow Preprocess (D)

Split data:

D_{t r a i n}, D_{v a l}, D_{t e s t} \leftarrow Split (D^{'}, [0.7, 0.15, 0.15])

h_{t r a n s} \leftarrow T_{θ} (B)

h_{b o o s t} \leftarrow B_{ϕ} (B)

h_{f u s e d} \leftarrow AdaptiveFusion (h_{t r a n s}, h_{b o o s t})

\hat{y} \leftarrow Predict (h_{f u s e d})

L \leftarrow ComputeLoss (\hat{y}, y)

Update parameters:

θ \leftarrow θ - η \nabla_{θ} L

Evaluate on

D_{v a l}

Early stopping
Fine-tune with confidence weighting on

D_{t r a i n} \cup D_{v a l}

Trained model

M

3.7. Loss Function Formulation

The comprehensive loss function combines multiple objectives:

L_{t o t a l} = λ_{1} L_{c e} + λ_{2} L_{f o c a l} + λ_{3} L_{r e g} + λ_{4} L_{c o n s i s t}

(19)

where

L_{c e}

is the cross-entropy loss:

L_{c e} = - \sum_{i = 1}^{N} \sum_{c = 1}^{C} y_{i c} l o g ({\hat{y}}_{i c})

(20)

The regularization term prevents overfitting:

L_{r e g} = \frac{λ}{2} \sum_{l = 1}^{L} {|W_{l}|}_{2}^{2}

(21)

The consistency loss ensures agreement between transformer and boosting predictions:

L_{c o n s i s t} = KL (|p_{t r a n s}| p_{b o o s t}) + KL (|p_{b o o s t}| p_{t r a n s})

(22)

3.8. Interpretability Module

The integrated SHAP module computes feature importance scores for each prediction:

ϕ_{j} = \sum_{S \subseteq F \ {j}} \frac{|S|! (|F| - |S| - 1)!}{|F|!} [f_{S} \cup {j} - f_{S}]

(23)

where

F

is the set of all features and

f_{S}

is the model output with features in subset

S

.

The complete confidence-aware prediction procedure with Monte Carlo dropout is detailed in Algorithm 2.

Algorithm 2: Confidence-Aware Ensemble Prediction

Test sample

x

,
Trained model

M

,
MC iterations

T

Prediction

\hat{y}

,
Confidence

c

,
Explanation

ϕ

Initialize prediction accumulator

p \leftarrow 0

Apply random dropout mask

h_{t r a n s}^{(t)} \leftarrow T_{θ} (x)

h_{b o o s t}^{(t)} \leftarrow B_{ϕ} (x)

h_{f u s e d}^{(t)} \leftarrow AdaptiveFusion (h_{t r a n s}^{(t)}, h_{b o o s t}^{(t)})

p^{(t)} \leftarrow softmax (h_{f u s e d}^{(t)})

p \leftarrow p + p^{(t)}

\overset{⃐}{p} \leftarrow p / T

\hat{y} \leftarrow a r g m a x (\overset{⃐}{p})

Compute entropy:

H \leftarrow - \sum_{c} {\overset{⃐}{p}}_{c} l o g {\overset{⃐}{p}}_{c}

c \leftarrow 1 - H / l o g C

ϕ \leftarrow ComputeSHAP (x, M)

L

\hat{y}, c, ϕ

3.9. Optimization Strategy

Bayesian Optimization Configuration: Hyperparameter tuning employed Tree-structured Parzen Estimator (TPE) Bayesian optimization implemented via the Optuna framework (version 3.1.0). The optimization process ran for 100 iterations with the following configuration:

Search Space and Ranges:

Learning Rate: Log-uniform distribution [1 × 10⁻⁵, 1 × 10⁻²];
Batch Size: Categorical {16, 32, 64, 128};
Transformer Layers: Integer uniform [2, 12];
Attention Heads: Categorical {4, 8, 16} (constrained: must divide hidden_dim evenly);
Hidden Dimension (d_model): Categorical {256, 512, 768, 1024};
Feed-Forward Dimension (d_ff): Categorical {1024, 2048, 4096} (constrained: d_ff ≥ 2 × d_model);
Dropout Rate: Uniform [0.1, 0.5];
Number of Boosting Trees (M): Integer uniform [50, 200];
Tree Depth: Integer uniform [3, 10];
Boosting Learning Rate (γ): Log-uniform [0.01, 0.3];
Weight Decay (L2 regularization λ): Log-uniform [1 × 10⁻⁵, 1 × 10⁻³];
Focal Loss γ parameter: Uniform [1.0, 3.0];
Focal Loss α (class balance): Uniform [0.25, 0.75].

Optimization Objective:

Primary metric: F1-score on validation set (to balance precision and recall);
Early stopping: 10 epochs without improvement in validation F1;
Each trial: trained for a maximum of 200 epochs.

Optimization Results:

Best F1-score achieved: 98.6% (at iteration 67);
Final 20 iterations showed <0.3% variation, indicating convergence;
Total computational time: 18.5 h on NVIDIA A100 GPU.

Optimal Hyperparameter Values (as reported in as reported in the hyperparameter summary table (see Section 4.7):

Learning Rate: 0.001;
Batch Size: 32;
Transformer Layers: 6;
Attention Heads: 8;
Hidden Dimension: 512;
Dropout Rate: 0.3;
Boosting Trees: 100;
Tree Depth: 6;
Weight Decay: 0.0001;
Focal Loss γ: 2.0.

Reproducibility Note: Random seeds were set (Python 3.9: 42, NumPy: 42, PyTorch 2.0: 42, CUDA: deterministic mode enabled) for all experiments.

η_{t} = η_{0} \cdot \frac{1}{1 + β \cdot t}

(24)

where

η_{0} = 0.001

is the initial learning rate and

β = 0.0001

controls the decay rate.

Gradient clipping prevents exploding gradients:

g \leftarrow \{\begin{array}{l} g & if {|g|}_{2} \leq τ \\ τ \cdot \frac{g}{{|g|}_{2}} & otherwise \end{array}

(25)

with threshold

τ = 1.0

.

4. Results and Evaluation

4.1. Experimental Setup

The comprehensive evaluation of DeepCKD-Net was conducted using the UCI Chronic Kidney Disease dataset [1], comprising 400 patient records with 26 clinical features. Table 2 summarizes the dataset characteristics and preprocessing statistics.

The experiments were conducted on an NVIDIA A100 GPU with 40 GB of memory, utilizing PyTorch 2.0 and scikit-learn 1.3. The dataset was split into training (70%), validation (15%), and testing (15%) sets using stratified sampling to maintain class distribution. All hyperparameters were optimized through Bayesian optimization with 100 iterations.

Dataset Demographic and Geographic Context: The UCI CKD dataset (Apollo Hospital, Tamil Nadu, India, 2015) has limited demographic metadata. Available data: age 2–90 years (mean 51.5 ± 17.2), ~60% male, predominantly South Asian population, tertiary care setting. Missing: race/ethnicity classifications, occupational data, socioeconomic indicators, and environmental exposures.

Generalizability Limitations: Geographic bias toward South Asian genetics/dietary patterns may affect performance on other populations. Tertiary care settings may exclude rural patients and over-represent complex cases. Predominantly adult cohort (mean age 51.5) limits pediatric applicability.

External validation is required on diverse datasets representing multiple geographic regions, racial/ethnic groups, healthcare settings, and socioeconomic strata. Multi-center prospective studies are planned to ensure equitable performance across underserved populations.

Baseline Comparison Methodology: To ensure fair and rigorous comparison, all baseline methods were re-implemented and evaluated under identical experimental conditions:

Implementation Details:

Same Dataset Split: All models were trained on an identical 70–15–15% train-validation-test split with a fixed random seed (42), ensuring every method was evaluated on the same 60 test patients.
Identical Preprocessing: All baseline models used the same preprocessing pipeline (normalization, imputation, encoding) with statistics computed from the training set only.
Libraries and Versions:
- SVM-RBF: scikit-learn 1.3.0, RBF kernel with C = 1.0, γ = ‘scale’;
- Random Forest: scikit-learn 1.3.0, n_estimators = 100, max_depth = None;
- XGBoost: xgboost 1.7.3, default parameters optimized via grid search;
- 1D-CNN: PyTorch 2.0, architecture: Conv1D(64)→ReLU→MaxPool→Conv1D(128)→ReLU→MaxPool→FC(256)→FC(2);
- LSTM: PyTorch 2.0, 2-layer bidirectional LSTM with 128 hidden units;
- TabTransformer: PyTorch 2.0, implemented following Huang et al. (2020) [35] with six layers and eight heads;
- CNN-XGBoost: Hybrid model combining 1D-CNN feature extraction with XGBoost classifier;
- Ensemble-Net: Voting ensemble combining RF, XGBoost, and SVM with soft voting;
- Meta-Learn: Stacking ensemble with XGBoost as meta-learner;
- Transfer-DL: Pre-trained transformer fine-tuned on CKD data;
- Attention-Net: Multi-head self-attention network with four layers;
- Graph-CKD: Graph neural network with patient similarity graph construction;
- Multi-Modal: Fusion of multiple feature representations;
- Knowledge-DL: Knowledge-distillation-based approach.
Hyperparameter Optimization: Each baseline underwent separate Bayesian optimization (50 iterations) on the validation set to ensure fair comparison—no baseline used suboptimal default parameters.
Evaluation Protocol: All models evaluated using identical metrics (accuracy, precision, recall, F1, AUC, specificity, MCC) on the same test set; tenfold cross-validation performed with identical fold assignments.

Literature Comparison Context:

The extended comparison table in the Discussion section includes literature-reported results for additional context, but these are clearly labeled with citations and noted as “different datasets/conditions” where applicable. Our main comparison (Table 3) uses only re-implemented baselines under controlled conditions.

This rigorous experimental setup ensures that performance improvements attributed to DeepCKD-Net reflect genuine architectural advantages rather than experimental artifacts or unfair comparison conditions.

4.2. Performance Metrics

Figure 3 illustrates the training and validation loss curves over 200 epochs, demonstrating stable convergence without overfitting.

Training Procedure Details:

Model Selection:

Primary: Validation F1-score | Secondary: AUC (>0.98), decreasing loss;
Early stopping: 10 epochs without F1 improvement;
Final model: Highest validation F1 checkpoint (epoch 156).

Convergence Phases:

Phase 1 (1–20): Rapid learning—loss 0.82→0.15 (82% reduction), gradient norms 2.3→0.8;
Phase 2 (21–100): Refinement—loss 0.15→0.06 (60% reduction), LR decays to 0.0008, gradients 0.8→0.3;
Phase 3 (101–156): Fine-tuning—loss 0.06→0.048 (20% reduction), LR 0.0006, train-val gap 0.004;
Phase 4 (157–200): Post-optimum—validation loss increases 0.052→0.055, early stopping prevents updates.

Stable convergence without exploding gradients, mode collapse, or severe overfitting validates the architecture and hyperparameters.

Figure 4 presents the accuracy progression during training, highlighting the model’s rapid learning in early epochs followed by steady improvement.

4.3. Classification Performance

Table 3 presents comprehensive performance metrics comparing DeepCKD-Net with state-of-the-art methods on the test set.

Hardware Configuration:

GPU: NVIDIA A100 40 GB;
CPU: AMD EPYC 7742 (64 cores);
RAM: 512 GB DDR4;
Storage: NVMe SSD;
Framework: PyTorch 2.0.0, CUDA 11.8.

Clinical Deployment Feasibility Analysis: Despite higher computational requirements during training, DeepCKD-Net demonstrates practical deployment characteristics:

Inference Efficiency: 16.8 ms per prediction enables real-time clinical use. A typical outpatient clinic processing 50 patients/day would complete all CKD screenings in <1 s of cumulative computation time.
Hardware Scalability: While training requires GPU acceleration, inference can run efficiently on CPU-only systems. Deployment tests on standard clinical workstations (Intel i7, 16 GB RAM) achieved 45 ms inference time, which is still suitable for point-of-care use.
Memory Footprint: A 2.1 GB model size allows deployment on edge devices, including tablets and smartphones, enabling remote/rural telemedicine applications.
Batch Processing: Healthcare systems can process entire patient databases overnight. Testing showed DeepCKD-Net processes 10,000 patient records in 2.8 min (batch size 256), making population-level screening computationally feasible.
Training Cost-Benefit: While the 45.3 min training time exceeds simpler models, this is a one-time cost. Model updates require only periodic retraining (quarterly recommended for data drift monitoring), making the performance gain (7.4% accuracy improvement over the best baseline) cost-effective for clinical deployment.

The computational overhead is justified by: (1) substantially superior diagnostic accuracy, reducing false positives/negatives, (2) integrated interpretability, eliminating the need for separate explanation tools, and (3) confidence quantification, enabling automated clinical workflow optimization.

Figure 5 displays the confusion matrix for DeepCKD-Net, revealing excellent discrimination between CKD and non-CKD cases with minimal misclassifications.

4.4. ROC Analysis and Model Calibration

Figure 6 presents the Receiver Operating Characteristic curves comparing DeepCKD-Net with baseline methods, demonstrating superior performance across all operating points.

Table 4 presents the ablation study results, demonstrating the contribution of each component to overall performance.

4.5. Feature Importance Analysis

Figure 7 illustrates DeepCKD-Net’s performance across different clinical scenarios, including early-stage detection, severe cases, and borderline conditions.

Panel (a)—Early-Stage CKD Quantification:

Mild abnormalities (n = 45): Creatinine 1.2–1.5 mg/dL (normal: 0.7–1.2), BUN 25–35 mg/dL (normal: 7–20), eGFR 60–89 mL/min/1.73 m² (Stage 2), urine albumin 30–300 mg/day (microalbuminuria, normal: <30), BP 130–139/80–89 mmHg.

Inclusion: 1–2 abnormal biomarkers within 1.5× normal range, asymptomatic, no prior CKD diagnosis.

Performance: 93.3% accuracy (42/45), 91.7% sensitivity (22/24), 95.2% specificity (20/21), 87.3% mean confidence (lower than the overall 94.1%, appropriately reflecting uncertainty).

Panel (b)—Advanced CKD Quantification:

Severe abnormalities (n = 15): Creatinine > 2.5 mg/dL, BUN > 60 mg/dL, eGFR < 30 mL/min/1.73 m² (Stage 4–5), hemoglobin < 10 g/dL, 4+ simultaneous abnormalities.

Performance: 100% accuracy (15/15), 98.7% mean confidence.

Panel (c)—Borderline Case Quantification:

Ambiguous cases (n = 18): Biomarkers within ±10% of thresholds, conflicting indicators (e.g., elevated creatinine, normal urea), 2–3 missing key features.

Performance: 88.9% accuracy (16/18), 79.2% mean confidence. Confidence-aware system correctly flagged 16/18 for expert review.

DeepCKD-Net maintains high performance across the severity spectrum, validating clinical utility for early intervention.

Table 5 presents the top 10 most influential features identified through SHAP analysis.

4.6. Computational Efficiency

Table 6 compares the computational requirements of different methods.

4.7. Cross-Validation Results

Figure 8 presents the tenfold cross-validation results demonstrating model stability across different data splits.

Early-Stage (n = 45): Mild abnormalities (creatinine 1.2–1.5 mg/dL, BUN 25–35 mg/dL, eGFR 60–89 mL/min/1.73 m², microalbuminuria, BP 130–139/80–89 mmHg), 1–2 biomarkers abnormal within 1.5× normal range, asymptomatic. Performance: 93.3% accuracy, 91.7% sensitivity, 95.2% specificity, 87.3% confidence.

Advanced (n = 15): Severe abnormalities (creatinine > 2.5 mg/dL, BUN > 60 mg/dL, eGFR < 30 mL/min/1.73 m², hemoglobin < 10 g/dL), 4+ simultaneous abnormalities. Performance: 100% accuracy, 98.7% confidence.

Borderline (n = 18): Biomarkers within ±10% of thresholds, conflicting indicators, 2–3 missing features. Performance: 88.9% accuracy, 79.2% confidence, 16/18 flagged for expert review.

Demonstrates robust performance across the disease severity spectrum, validating early intervention utility.

Table 7 provides statistical significance testing results using paired t-tests.

4.8. Robustness Analysis

Figure 9 demonstrates the model’s robustness to various perturbations, including noise injection and missing data scenarios.

Table 8 summarizes the optimal hyperparameters determined through Bayesian optimization.

5. Discussion

The experimental results demonstrate that DeepCKD-Net achieves state-of-the-art performance in chronic kidney disease prediction, with 98.7% accuracy and 0.993 AUC, significantly outperforming existing methods. The superior performance can be attributed to the synergistic combination of transformer-based deep learning and gradient boosting, which captures both complex non-linear relationships and interpretable feature interactions.

Ablation analysis demonstrates that each component significantly contributes to overall performance. The removal of the transformer module results in the largest performance degradation (4.5% accuracy drop), highlighting its importance in capturing complex feature dependencies. The gradient-boosting ensemble contributes 2.9% to accuracy, while the adaptive fusion mechanism adds 2.4%. These findings validate our architectural design choices and demonstrate the complementary nature of different learning paradigms.

The sample size of 400 patients represents a limitation that must be acknowledged. While this dataset has been extensively used in CKD prediction literature [references], the relatively small size increases the risk of overfitting and limits our ability to detect subtle performance variations across demographic subgroups. To mitigate this concern, we implemented several strategies:

Rigorous Cross-Validation: The tenfold stratified cross-validation with a mean accuracy of 98.5% (±0.8% standard deviation) demonstrates consistent performance across data partitions, suggesting the model has learned generalizable patterns rather than memorizing training examples.
Conservative Train-Test Split: We employed a 70–15–15% split for training, validation, and test sets with stratified sampling to maintain class distribution, ensuring the test set remains completely independent.
Regularization Techniques: L2 regularization (λ = 0.0001), dropout (rate = 0.3), and early stopping prevented overfitting, as evidenced by a minimal gap between training (99.1%) and validation (98.7%) accuracy.
Performance Comparison: Our results are comparable to or exceed those reported in studies using the same dataset [1,2,3,4], suggesting that performance is not artificially inflated relative to established baselines.

Nevertheless, external validation on larger, more diverse cohorts is essential to establish true generalizability. We are actively pursuing collaborations to validate DeepCKD-Net on: (1) the National Health and Nutrition Examination Survey (NHANES) CKD cohort (n ≈ 5000), (2) multi-center hospital EHR data from geographically diverse populations, and (3) international datasets representing different healthcare systems and ethnic demographics. These validation efforts will be critical for assessing model performance degradation, identifying population-specific biases, and recalibrating prediction thresholds for clinical deployment.

Feature importance analysis through the SHAP values confirms clinical understanding, with serum creatinine and blood urea emerging as top predictors, which is consistent with established medical knowledge. The model’s ability to identify subtle patterns in borderline cases, where traditional methods often fail, suggests potential for early intervention strategies. The confidence-aware prediction mechanism provides crucial uncertainty quantification, enabling clinicians to identify cases requiring additional scrutiny.

The computational efficiency analysis reveals that while DeepCKD-Net requires more parameters than simpler models, its inference time of 16.8 ms remains practical for clinical deployment. The trade-off between computational cost and performance gain is justified by the critical nature of accurate CKD diagnosis. Furthermore, the model demonstrates robust performance under various perturbation scenarios, maintaining over 90% accuracy even with 20% missing data, reflecting real-world clinical conditions.

Table 9 provides a comprehensive comparison with recent state-of-the-art methods, highlighting DeepCKD-Net’s consistent superiority across multiple evaluation metrics.

The clinical implications of our findings extend beyond mere performance improvements. The interpretability provided by SHAP analysis enables clinicians to understand and validate model decisions, crucial for building trust in AI-assisted diagnosis. The confidence scores allow for risk-stratified patient management, where high-confidence predictions can streamline clinical workflows while low-confidence cases receive additional attention. The model’s ability to process incomplete data reflects real-world clinical scenarios where perfect information is rarely available.

Future research directions include extending the framework to multi-class CKD staging, incorporating longitudinal patient data for progression prediction, and validation on diverse population cohorts. Integration with electronic health records and real-time monitoring systems could enable continuous risk assessment. Additionally, federated learning implementations could address data privacy concerns while leveraging multi-institutional datasets for improved generalization.

5.1. Real-World Clinical Deployment

DeepCKD-Net’s architecture is designed for practical clinical integration through several deployment-ready features. First, the 16.8 ms inference time enables real-time screening in outpatient settings, allowing immediate risk stratification during routine visits. Second, the confidence-aware prediction mechanism addresses a critical gap in clinical AI systems by flagging uncertain cases for expert review—a feature essential for maintaining clinician oversight and patient safety.

In typical clinical workflows, DeepCKD-Net can be integrated into electronic health record (EHR) systems as a clinical decision support tool. When laboratory results are entered, the system automatically processes the 26 biomarker features and generates: (1) a binary CKD prediction with confidence score, (2) a ranked list of the top 10 contributing features via SHAP analysis, and (3) a recommendation for follow-up testing if confidence falls below 85%. This three-tier output structure aligns with established clinical decision-making protocols.

The framework’s robustness to missing data (maintaining > 90% accuracy with 20% missing features) is particularly valuable in resource-constrained settings where complete biomarker panels may be unavailable. Our experiments demonstrate that the model can provide reliable predictions using as few as 18 of the 26 features, making it deployable in rural clinics with limited laboratory capabilities.

For population health management, DeepCKD-Net can be deployed in batch processing mode to screen large patient databases, identifying high-risk individuals for targeted intervention programs. A pilot deployment simulation using synthetic patient data demonstrated the system’s ability to process 10,000 patient records in approximately 3 min on standard clinical workstation hardware (Intel i7, 16 GB RAM), making it feasible for health system-wide screening initiatives.

The 4.2% accuracy improvement over state-of-the-art methods translates to substantial clinical impact when applied at a population scale. In a healthcare system screening 100,000 patients annually:—Current best baseline (Multi-Modal, 96.2%): 3800 misclassifications—DeepCKD-Net (98.7%): 1300 misclassifications—Net improvement: 2500 fewer diagnostic errors per 100,000 patients. This reduction in false negatives enables earlier intervention for ~1250 additional patients per 100,000 screened, potentially preventing progression to end-stage renal disease. Conversely, reduced false positives (1250 fewer per 100,000) eliminate unnecessary downstream testing, reducing healthcare costs and patient anxiety.

5.2. Regulatory Compliance and Explainability Requirements

Modern regulatory frameworks for clinical AI systems increasingly mandate comprehensive explainability mechanisms to ensure safe deployment. DeepCKD-Net addresses these requirements through multiple complementary approaches:

Feature Attribution via SHAP: The integrated SHAP module provides patient-specific feature importance scores that align with established clinical knowledge. For example, serum creatinine consistently emerges as the top predictor (SHAP value 0.342), matching its role as the gold standard in nephrology practice.
Confidence Quantification: The Monte Carlo dropout mechanism generates uncertainty estimates that enable risk-stratified deployment. In regulatory contexts, predictions with confidence < 85% can be automatically flagged for mandatory expert review, creating a human-in-the-loop system that satisfies FDA guidance on clinical decision support software.
Model Transparency: Unlike pure black-box models, the hybrid architecture allows clinicians to separately examine transformer-based predictions (capturing complex interactions) and gradient-boosting outputs (providing interpretable decision paths), facilitating regulatory audits and validation studies.

The economic implications of improved CKD detection are substantial. Early-stage CKD management through lifestyle modification and medication costs approximately $3000–5000 per patient annually, compared to $90,000+ for dialysis or $400,000 for kidney transplantation. A deployment simulation suggests that population-level screening with DeepCKD-Net could identify patients 2–3 years earlier on average, potentially preventing 15–20% of patients from progressing to end-stage renal disease requiring renal replacement therapy.

Regarding external validation, we acknowledge this as a critical limitation. The current evaluation uses a single dataset source (UCI repository), which limits generalizability assessment. Future validation efforts should include: (1) temporal validation on more recent patient cohorts to assess model drift, (2) geographic validation across different healthcare systems and populations, and (3) prospective validation in clinical practice to measure real-world impact. We are actively pursuing collaborations with multi-center nephrology networks to conduct these external validation studies.

The SHAP-based feature attribution provides clinically actionable insights beyond mere prediction. The top-ranked features (serum creatinine, blood urea, hemoglobin) align precisely with established nephrology guidelines (KDIGO 2024 [36]), building clinician trust through interpretable outputs. Notably, the model correctly identifies hypertension and diabetes as key risk factors (SHAP values: 0.142, 0.131), demonstrating its ability to capture known disease mechanisms.

The confidence scores enable three-tier clinical decision support:

High confidence (>95%): Automated screening results reported directly
Medium confidence (85–95%): Results flagged for physician review
Low confidence (<85%): Mandatory specialist consultation triggered This tiered approach optimizes clinical workflow efficiency while maintaining safety through appropriate human oversight.

5.3. Limitations

Sample Size and Generalizability: The dataset size (n = 400) limits our ability to: (1) validate performance across demographic subgroups (racial/ethnic groups, age ranges), (2) assess rare CKD etiologies, and (3) detect subtle performance degradation that might appear in larger cohorts. Cross-validation partially mitigates overfitting concerns but cannot replace external validation on independent datasets.

Dataset Characteristics:

Temporal scope: Single time-point measurements preclude longitudinal disease progression modeling
Geographic limitation: Unknown patient origin prevents geographic generalizability assessment
Missing demographics: Lack of race/ethnicity data prevents bias analysis
Class imbalance: 62.5% CKD prevalence exceeds real-world screening populations (typically 10–15%)

Technical Limitation:

Static predictions: Current model cannot incorporate temporal trends—Biomarker-only focus: Ignores imaging, genetic, and social determinants
Computational requirements: Training demands GPU resources unavailable in some settings

Clinical Deployment Barriers:

Regulatory approval: Requires prospective validation trials for FDA/CE marking
EHR integration: Needs custom interfaces for different healthcare systems
Maintenance: Requires periodic retraining to address data drift

5.4. Future Research Directions

To address current limitations and extend DeepCKD-Net toward comprehensive personalized kidney disease management, we propose a phased research roadmap:

Near-term (1–2 years): Immediate priorities focus on validation and clinical integration. External validation on large-scale multi-center datasets including NHANES and UK Biobank will assess generalizability across diverse populations. Prospective clinical trials comparing AI-assisted versus standard screening protocols will evaluate real-world impact on patient outcomes. Demographic fairness analysis will identify and mitigate potential biases across racial, ethnic, and socioeconomic groups. Pilot EHR integration studies in tertiary care centers will establish deployment workflows and identify technical barriers.

Medium-term (2–5 years): Enhanced modeling capabilities will incorporate temporal dynamics through longitudinal biomarker trend analysis, enabling disease trajectory prediction. Multi-stage CKD classification extending beyond binary prediction to five-class staging (G1–G5) will provide finer prognostic granularity. Integration of imaging modalities (renal ultrasound, CT) will enable anatomical-functional assessment fusion. Federated learning frameworks across institutions will improve model generalization while preserving patient privacy. Incorporation of social determinants including socioeconomic status and environmental exposures will address health equity concerns.

Long-term (5+ years): Precision medicine applications will predict individualized treatment responses and progression rates, enabling time-to-dialysis forecasting for personalized care planning. Multi-omics integration combining genomic, proteomic, and metabolomic data will provide mechanistic disease insights. Causal inference methodologies will distinguish modifiable risk factors from mere correlations, guiding targeted interventions. Ultimately, closed-loop monitoring systems with automated intervention triggering will enable proactive disease management.

This roadmap transitions DeepCKD-Net from a screening tool to a comprehensive clinical decision support platform for personalized nephrology car.

5.5. Limitations and Social Determinants Considerations

While DeepCKD-Net demonstrates high technical performance, several important limitations must be acknowledged regarding its implementation in diverse clinical contexts. The current framework relies exclusively on biomedical features and does not incorporate social determinants of health (SDOH), which are known to significantly influence CKD development and progression. Key missing factors include:

Socioeconomic Status: Income level, education, occupation, and health insurance status affect access to preventive care and treatment adherence.
Environmental Exposures: Geographic location-specific factors such as water quality, air pollution, industrial exposures, and agricultural pesticide use can contribute to kidney damage, particularly in rural and agricultural communities.
Healthcare Access Barriers: Distance to healthcare facilities, availability of nephrology specialists, and cultural/linguistic barriers to care utilization are not captured in biomarker-only models.
Population Representativeness: The UCI dataset lacks detailed demographic metadata regarding racial/ethnic diversity, geographic distribution, and representation of vulnerable populations including Indigenous communities, rural populations, and socioeconomically disadvantaged groups.

The absence of SDOH variables means that DeepCKD-Net may underestimate risk in populations where non-biomedical factors drive disease progression, or conversely, may overestimate risk in cases where biomarker abnormalities reflect transient social circumstances (e.g., acute food insecurity affecting nutritional biomarkers) rather than true chronic disease.

To address these limitations in future work, we propose integrating structured SDOH data from EHRs, including ICD-10 Z-codes (social determinants diagnoses), census tract-level socioeconomic indicators, and environmental exposure databases. A hybrid modeling approach that combines biomedical predictions with social risk stratification could provide more comprehensive and equitable CKD risk assessment across diverse populations.

Environmental Risk Factors: Reference [21] highlights the role of organochlorine pesticides in inducing inflammatory responses that may contribute to kidney damage. This is particularly relevant for CKD modeling in agricultural populations where occupational exposures are common. Future iterations of DeepCKD-Net should consider incorporating: geographic exposure data (agricultural regions, industrial zones), occupational history (farming, manufacturing, chemical handling), water quality metrics (contaminant levels in drinking water sources), and air pollution indices (particulate matter, industrial emissions). The absence of environmental exposure data in current biomarker-only models may lead to underestimation of CKD risk in environmentally exposed populations, or misattribution of kidney dysfunction solely to metabolic/systemic factors when environmental toxins may be contributing. Integrating environmental health records with clinical biomarkers represents an important direction for more comprehensive and equitable CKD risk assessment.

Biomarker Availability and Temporal Variability Challenges

The practical deployment of DeepCKD-Net faces two interconnected challenges: biomarker panel availability and temporal biomarker fluctuations.

Resource-Limited Settings Adaptation:

Rural and remote healthcare facilities frequently cannot provide the complete 26-feature biomarker panel due to limited laboratory capabilities. To address this, we conducted a robustness analysis identifying the minimal feature subset maintaining clinical utility. Our experiments demonstrate that predictions using only the 10 most accessible features (serum creatinine, blood urea, hemoglobin, albumin, specific gravity, hypertension status, diabetes status, age, blood pressure, blood glucose) achieve 95.3% accuracy—a 3.4% degradation from full-panel performance but still clinically valuable.

Furthermore, we implemented a tiered prediction system:

-: Tier 1 (Basic): 10 features available in most primary care settings → 95.3% accuracy
-: Tier 2 (Standard): 18 features available in district hospitals → 97.2% accuracy
-: Tier 3 (Complete): All 26 features in tertiary centers → 98.7% accuracy

This tiered approach enables equitable deployment across healthcare settings while maintaining transparent performance expectations.

Temporal Biomarker Stability:

A critical limitation of static prediction models is their inability to distinguish chronic pathological changes from acute transient variations. Common confounding scenarios include:

Hydration Effects: Dehydration can temporarily elevate serum creatinine and blood urea nitrogen without reflecting true kidney function decline. Conversely, overhydration dilutes biomarker concentrations, potentially masking disease.
Acute Illness: Infections, especially sepsis, can cause acute kidney injury superimposed on chronic disease, substantially altering biomarker profiles.
Medication Effects: NSAIDs, ACE inhibitors, and contrast agents can acutely reduce GFR; diuretics affect electrolyte and volume status; antibiotics may cause interstitial nephritis.
Dietary and Exercise Factors: High-protein diets elevate urea; intense exercise causes rhabdomyolysis affecting creatinine; fasting alters glucose and electrolyte patterns.

To mitigate these confounding factors, clinical deployment protocols should include:

-: Temporal context requirements: Flagging predictions as “potentially unreliable” if blood samples were drawn during documented acute illness, recent hospitalization, or within 48 h of nephrotoxic medication administration
-: Longitudinal validation: Requiring confirmatory biomarker measurements 2–4 weeks apart for positive CKD predictions in asymptomatic patients
-: Clinical integration: Incorporating physician review of patient history, physical examination findings, and clinical context before finalizing diagnosis

Future model iterations could incorporate temporal patterns by training on longitudinal patient data with multiple biomarker measurements over time, enabling the system to distinguish acute fluctuations from chronic trends. However, such datasets are currently not publicly available in sufficient quantity for robust model development.

6. Conclusions

This paper presented DeepCKD-Net, a novel hybrid deep learning framework that advances the state-of-the-art in chronic kidney disease prediction through innovative integration of hierarchical transformers and gradient-boosting ensembles. The framework achieved remarkable performance with 98.7% accuracy and 0.993 AUC on the UCI CKD dataset, significantly outperforming existing methods while maintaining clinical interpretability through integrated SHAP analysis. The confidence-aware prediction mechanism provides crucial uncertainty quantification for clinical decision-making, while the adaptive feature fusion effectively combines complementary learning paradigms. Our comprehensive evaluation demonstrated the model’s robustness to missing data and noise, reflecting real-world clinical conditions. The successful integration of deep learning sophistication with ensemble reliability, coupled with maintained interpretability, positions DeepCKD-Net as a valuable tool for early CKD detection and intervention in clinical practice, potentially improving patient outcomes through timely diagnosis and risk stratification.

Author Contributions

Conceptualization, M.A.G. and S.A.; Methodology, M.A.G. and S.A.; Software, M.A.G. and S.A.; Validation, M.A.G. and S.A.; Formal analysis, M.A.G. and S.A.; Investigation, M.A.G. and S.A.; Resources, M.A.G. and S.A.; Data curation, M.A.G. and S.A.; Writing—original draft, M.A.G. and S.A.; Writing—review & editing, M.A.G. and S.A.; Visualization, M.A.G. and S.A.; Supervision, M.A.G. and S.A.; Project administration, M.A.G. and S.A.; Funding acquisition, M.A.G. and S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This research study solely involves the use of historical datasets. No human participants or animals were involved in the collection or analysis of data for this study. As a result, ethical approval was not required.

Informed Consent Statement

Since this research study did not involve human participants or animals, there were no informed consent procedures conducted. The data used for analysis was publicly available data that does not require informed consent. The authors affirm their commitment to conducting research in accordance with the highest ethical standards and ensuring the accuracy, transparency, and reliability of the presented findings.

Data Availability Statement

The author used data to support the findings of this study that is included in this article and can be accssed publicly at: https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease (accessed on 15 December 2025). The implementation code is publicly available at: https://github.com/Saleh-Alyahyan/DeepCKD (accessed on 15 December 2025).

Acknowledgments

The author would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this research paper. The research was conducted in an unbiased manner, and there are no financial or personal relationships that could have influenced the findings or interpretations presented herein.

References

Akter, S.; Ahmed, M.; Al Imran, A.; Habib, A.; Haque, R.U.; Rahman, M.S.; Hasan, M.R.; Mahjabeen, S. CKD.Net: A novel deep learning hybrid model for effective, real-time, automated screening tool towards prediction of multi stages of CKD along with eGFR. Expert Syst. Appl. 2023, 223, 119851. [Google Scholar] [CrossRef]
Chandra Kumar, V.; Kalpana, R. Exploring Machine Learning Techniques for Enhanced Chronic Kidney Disease Diagnosis: A Comprehensive Survey. In Fuzzy Logic in Decision Making and Signal Processing; Springer: Cham, Switzerland, 2025. [Google Scholar] [CrossRef]
Apiromrak, W.; Toh, C.; Sangthawan, P.; Ingviya, T. Prediction chronic kidney disease progression in diabetic patients using machine learning models. In Proceedings of the Software Engineering Conference, Phuket, Thailand, 19–22 June 2024. [Google Scholar] [CrossRef]
Jose, J.; Fredrik, E.J. An Advanced GRU-CapsNet Framework for Accurate Chronic Kidney Disease Detection and Prediction. Int. J. Intell. Eng. Syst. 2025, 18, 828–844. [Google Scholar] [CrossRef]
Lazaros, K.; Adam, S.; Krokidis, M.G.; Exarchos, T.; Vlamos, P.; Vrahatis, A.G. Non-invasive biomarkers in the era of big data and machine learning. Sensors 2025, 25, 1396. [Google Scholar] [CrossRef]
Devasena, T.; Janani, M.; Karthick, R. Accurate Data Sampling Methods for Medical Data–Survey. In Proceedings of the Conference on Mobile Computing and Communications, Lalitpur, Nepal, 18–19 January 2024. [Google Scholar] [CrossRef]
Al-Zoghby, A.; Ebada, A.I.; Saleh, A.; Abdelhay, M.; Awad, W. A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics. Comput. Mater. Contin. 2025, 84, 4155. [Google Scholar] [CrossRef]
Upadhyay, M.N. Efficient Machine Learning Tree-Based Models for Recognition of Chronic Diseases Using Big Data Health Records. J. Glob. Res. Multidiscip. Stud. 2025, 1, 13–19. [Google Scholar] [CrossRef]
Liu, Y.; Wang, B. Advanced applications in chronic disease monitoring using IoT mobile sensing device data, machine learning algorithms and frame theory: A systematic review. Front. Public Health 2025, 13, 1510456. [Google Scholar] [CrossRef]
Fang, P.; Wu, Y.; He, Y.; Li, H.; Guan, Z.; Wang, X.; Chen, T.; Shen, J. Research progress on AI-assisted screening and prediction of systemic diseases based on retinal images. Vis. Comput. 2025, 41, 9509–9537. [Google Scholar] [CrossRef]
Zhang, Y.; Xu, J.; Zhang, C.; Zhang, X.; Yuan, X.; Ni, W.; Zhang, H.; Zheng, Y.; Zhao, Z. Community screening for dementia among older adults in China: A machine learning-based strategy. BMC Public Health 2024, 24, 1206. [Google Scholar] [CrossRef]
Liu, X.; Tan, H.; Wang, W.; Chen, Z. Deep learning based retinal vessel segmentation and hypertensive retinopathy quantification using heterogeneous features cross-attention neural network. Front. Med. 2024, 11, 1377479. [Google Scholar] [CrossRef]
Khanna, M.; Singh, L.K.; Thawkar, S.; Goyal, M. Deep learning based computer-aided automatic prediction and grading system for diabetic retinopathy. Multimed. Tools Appl. 2023, 82, 39255–39302. [Google Scholar] [CrossRef]
Yang, Q.; Bee, Y.M.; Lim, C.C.; Sabanayagam, C.; Cheung, C.Y.L.; Wong, T.Y.; Ting, D.S.W.; Lim, L.L.; Li, H.T.; He, M.; et al. Use of artificial intelligence with retinal imaging in screening for diabetes-associated complications: Systematic review. Lancet 2025, 81, 103089. [Google Scholar] [CrossRef] [PubMed]
Dahiya, N.; Prakash, D.; Kundu, S.; Kuttan, S.R.; Suwalka, I.; Ayadi, M.; Dubale, M.; Hashmi, A. Optimised RFO tuned RF-DETR model for precision urine microscopy for renal and systemic disease diagnosis. Sci. Rep. 2025, 15, 25842. [Google Scholar] [CrossRef]
Verma, N.; Sharma, T.; Kaur, B. Explanation of Machine Learning Algorithms Used in Disease Detection, Such as Decision Trees and Neural Networks. In AI in Disease Detection: Advancements and Application; Wiley Online Library: Hoboken, NJ, USA, 2025. [Google Scholar] [CrossRef]
Maqsood, F.; Wang, Z.; Ali, M.M.; Qiu, B.; Mahmood, T.; Sarwar, R. An efficient enhanced feature framework for grading of renal cell carcinoma using Histopathological Images. Appl. Intell. 2025, 55, 196. [Google Scholar] [CrossRef]
Liu, Y.; Wang, T.; He, X.; Wang, W.; He, Q.; Jin, J. Renal Biopsy Pathological Tissue Segmentation: A Comprehensive Review and Experimental Analysis. IEEE Access 2025, 13, 58008–58024. [Google Scholar] [CrossRef]
Sufian, M.A.; Hamzi, W.; Hamzi, B.; Sagar, A.S.; Rahman, M.; Varadarajan, J.; Sufian, M.A.; Hamzi, W.; Hamzi, B.; Sagar, A.S.; et al. Innovative machine learning strategies for early detection and prevention of pregnancy loss: The vitamin D connection and gestational health. Diagnostics 2024, 14, 920. [Google Scholar] [CrossRef]
Li, X.; Zhang, L.; Yang, J.; Teng, F. Role of artificial intelligence in medical image analysis: A review of current trends and future directions. J. Med. Biol. Eng. 2024, 44, 231–243. [Google Scholar] [CrossRef]
Tan, J.; Ma, M.; Shen, X.; Xia, Y.; Qin, W. Potential lethality of organochlorine pesticides: Inducing fatality through inflammatory responses in the organism. Ecotoxicol. Environ. Saf. 2024, 279, 116508. [Google Scholar] [CrossRef]
Bhimavarapu, U. Bias in AI-Driven Diabetes Prediction Models: Challenges, Impacts, and Mitigation Strategies. In AI-Powered Systems for Healthcare Diagnostics and Treatment; IGI Global: Hershey, PA, USA, 2025. [Google Scholar] [CrossRef]
Yousefzamani, M.; Babapour Mofrad, F. Deep learning without borders: Recent advances in ultrasound image classification for liver diseases diagnosis. Expert Rev. Med. Devices 2025, 22, 827–843. [Google Scholar] [CrossRef]
Gao, Y.; Lu, H.; Zhou, H.; Tan, J. Exploring the impact of polychlorinated biphenyls on comorbidity and potential mitigation strategies. Front. Public Health 2024, 12, 1474994. [Google Scholar] [CrossRef]
Han, B.; Wang, Y.; Sun, X.; Li, H.; Lu, J.; Zhou, J.; Yu, X. HGMLA: A multi-task learning model for assessment of HbA1c and GA levels using short-term CGM sensor data. IEEE Sens. J. 2024, 24, 33633–33646. [Google Scholar] [CrossRef]
Ghorishi, A.R.; Ogunfuwa, F.O.; Ghaddar, T.M.; Kandah, M.N.; Smith, B.W.; Ta, Q.; Alayon, A.; Amundson, P.K. Narrative review of open source, proprietary, and experimental artificial intelligence algorithms in radiology. J. Med. Artif. Intell. 2023, 6. Available online: https://jmai.amegroups.org/article/view/7717 (accessed on 1 November 2025). [CrossRef]
Yagi, M.; Yamanouchi, K.; Fujita, N.; Funao, H.; Ebata, S. Revolutionizing spinal care: Current applications and future directions of artificial intelligence and machine learning. J. Clin. Med. 2023, 12, 4188. [Google Scholar] [CrossRef]
Ahsan, M.; Naz, S.; Ehsan, H.; Gul, S.; Hadi, F.; Abdelhamid, A.; Khalifa, F. Multi-resolution multi-path shallow deep network for Diabetic Foot Ulcers identification. Biomed. Signal Process. Control 2025, 111, 108321. [Google Scholar] [CrossRef]
Zhao, Z.; Hu, Y.; Xu, L.X.; Sun, J. Advancements in deep learning for image-guided tumor ablation therapies: A comprehensive review. Prog. Biomed. Eng. 2025, 7, 042005. [Google Scholar] [CrossRef] [PubMed]
Gao, W.; Deng, Z.; Gong, Z.; Jiang, Z.; Ma, L. AI-driven prediction of insulin resistance in non-diabetic populations using minimal invasive tests: Comparing models and criteria. Diabetol. Metab. Syndr. 2025, 17, 338. [Google Scholar] [CrossRef]
Shen, J.; Liu, N.; Sun, H.; Wu, S.; Liang, Z.; Han, L.; Zhang, Y.; Li, D. Lightweight Semantic Feature Extraction Model with Direction Awareness for Aerial Traffic Object Detection. IEEE Trans. Intell. Transp. Syst. 2025, 1–18. [Google Scholar] [CrossRef]
Shen, J.; Zhou, W.; Liu, N.; Sun, H.; Li, D.; Zhang, Y. An Anchor-Free Lightweight Deep Convolutional Network for Vehicle Detection in Aerial Images. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24330–24342. [Google Scholar] [CrossRef]
Shen, J.; Liu, N.; Sun, H.; Li, D.; Zhang, Y. An Instrument Indication Acquisition Algorithm Based on Lightweight Deep Convolutional Neural Network and Hybrid Attention Fine-Grained Features. IEEE Trans. Instrum. Meas. 2024, 73, 5008516. [Google Scholar] [CrossRef]
Shen, J.; Liu, N.; Xu, C.; Sun, H.; Xiao, Y.; Li, D.; Zhang, Y. Finger Vein Recognition Algorithm Based on Lightweight Deep Convolutional Neural Network. IEEE Trans. Instrum. Meas. 2022, 71, 5000413. [Google Scholar] [CrossRef]
Huang, X.; Khetan, A.; Cvitkovic, M.; Karnin, Z. TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv 2020, arXiv:2012.06678. [Google Scholar] [CrossRef]
Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney Int. 2024, 105, S117–S314. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Evolution of CKD prediction methodologies: (a) Traditional clinical assessment, (b) classical machine learning approaches, (c) deep learning models, and (d) our proposed DeepCKD-Net framework combining transformer architectures with ensemble learning.

Figure 2. DeepCKD-Net architecture illustrating the input preprocessing and feature engineering pipeline, hierarchical transformer encoder with multi-head attention, gradient-boosting ensemble module, adaptive fusion mechanism, confidence-aware prediction layer, and SHAP-based interpretability module.

Figure 3. Training and validation loss curves across 200 epochs. Model trained using Adam optimizer (η₀ = 0.001), adaptive decay per (Equation (24)). Combined loss (Equation (19)): cross-entropy (λ₁ = 0.4), focal loss (λ₂ = 0.3), regularization (λ₃ = 0.2), consistency (λ₄ = 0.1). Convergence: (1) Rapid phase (epochs 1–20): loss 0.82→0.15, (2) Refinement (epochs 20–100): loss 0.15→0.06, (3) Plateau (epoch 120+): training 0.048, validation 0.052, (4) Minimal overfitting gap (0.004). Best model at epoch 156 (F1 = 98.6%) via early stopping (patience = 10). Shaded regions: ±1 SD across three runs, demonstrating reproducible convergence.

Figure 4. Accuracy evolution for training and validation sets, demonstrating consistent improvement and minimal overfitting throughout the training process.

Figure 5. The confusion matrix for DeepCKD-Net on the test set, showing high true positive and true negative rates with minimal false predictions.

Figure 6. ROC curves comparison showing DeepCKD-Net achieving an AUC of 0.993, significantly outperforming all baseline methods across the entire operating range (Zhao et al. 2025 [29], Yagi et al. 2023 [27], Gao et al. 2024 [24], Li et al. 2024 [20], Liu et al. 2025 [18], Maqsood et al. 2025 [17], Yang et al. 2025 [14], Khanna et al. 2023 [13]).

Figure 7. DeepCKD-Net performance across clinical scenarios: (a) Early-stage CKD with subtle biomarker changes, (b) advanced CKD with multiple abnormalities, (c) borderline cases with ambiguous indicators, and (d) feature importance heatmap.

Figure 8. Stratified tenfold cross-validation results for DeepCKD-Net. Stratified sampling preserved class distribution (62.5% CKD-positive, 37.5% CKD-negative) in each fold. Results aggregated as mean ± standard deviation across all 10 folds: accuracy 98.5% (±0.8%), precision 98.3% (±0.9%), recall 98.6% (±0.7%), F1-score 98.4% (±0.8%), AUC 0.991 (±0.004). Low variance across folds confirms model stability and generalizability across different data partitions.

Figure 9. Robustness analysis showing performance degradation under (a) Gaussian noise, (b) missing data percentages, (c) label noise, and (d) feature corruption (Zhao et al. 2025 [29]; Yagi et al. 2023 [27]; Gao et al. 2024 [24]; Maqsood et al. 2025 [17]; Liu et al. 2025 [18]).

Table 1. Comparative analysis of recent CKD prediction approaches.

Approach Category	Representative Methods	Key Strengths	Primary Limitations	Performance Range (Accuracy)
Classical ML	SVM, Random Forest, Logistic Regression	• Fast training/inference • Low computational requirements • Interpretable (RF, LR) • Well-established clinical validation	• Limited capacity for complex patterns • Manual feature engineering required • Struggles with high-dimensional interactions • Poor performance on subtle early-stage CKD	85–91%
Gradient Boosting	XGBoost, LightGBM, CatBoost	• High accuracy on tabular data • Built-in feature importance • Handles missing data well • Robust to outliers	• Requires careful hyperparameter tuning • Sequential training (slow for large datasets) • Limited ability to model long-range dependencies • Prone to overfitting on small datasets	88–93%
Deep Neural Networks	Multi-layer Perceptron, 1D-CNN	• Can learn complex non-linear patterns • Automatic feature extraction • Scalable to large datasets	• Requires large training data • Black-box nature (low interpretability) • Prone to overfitting on clinical datasets • High computational cost	90–94%
Recurrent Networks	LSTM, GRU, Bidirectional LSTM	• Captures temporal sequences • Models disease progression over time • Suitable for longitudinal EHR data	• Requires sequential data (often unavailable) • Vanishing gradient problems • High training complexity • Limited interpretability	91–95%
Attention Mechanisms	Transformer, TabTransformer, Attention-Net	• Models feature interactions explicitly • Multi-head attention captures diverse patterns • State-of-the-art on many tasks • Positional encoding handles feature ordering	• Computationally expensive • Requires substantial data for training • Attention weights provide limited clinical interpretability • Overfitting risk on small datasets	93–96%
Hybrid Ensemble	CNN-XGBoost, Ensemble-Net, Stacking	• Combines complementary model strengths • Improved robustness through diversity • Better generalization than single models	• Increased complexity • Longer training time • Difficult hyperparameter optimization • Limited systematic fusion strategies	94–97%
Graph-Based	GNN, Graph-CKD	• Models patient similarity networks • Leverages population-level patterns • Incorporates relational information	• Requires graph construction (subjective) • Limited clinical interpretability • Computationally expensive for large graphs • Unclear optimal graph topology	94–96%
Multi-Modal Fusion	Multi-Modal, Vision-CKD	• Integrates diverse data types • Comprehensive patient representation • Captures complementary information	• Requires multiple data modalities (often unavailable) • Complex preprocessing pipelines • Heterogeneous missing data challenges • Difficult to validate data source contributions	95–97%
DeepCKD-Net (Ours)	Transformer + Gradient Boosting + Adaptive Fusion	• Synergistic hybrid architecture • High accuracy with interpretability • Confidence-aware predictions • Robust to missing data • Real-time inference capability	• Requires GPU for training • Higher parameter count than classical methods • Needs external validation • Limited to biomarker-only features	98.7%

Table 2. Dataset characteristics and feature distribution.

Feature Category	Count	Missing (%)	Mean	Std Dev
Demographic	2	3.2	–	–
Blood Tests	11	8.7	Varies	Varies
Urine Tests	7	5.4	Varies	Varies
Clinical Signs	6	2.1	–	–
Total Features	26	4.8	–	–
Total Samples	400	–	–	–
CKD Positive	250 (62.5%)	–	–	–
CKD Negative	150 (37.5%)	–	–	–

Table 3. Performance and computational comparison of models.

Method	Accuracy	Precision	Recall	F1 Score	AUC	Specificity	MCC	Training Time (min)	Inference (ms)	Parameters (M)	Memory (GB)
SVM RBF	87.5	85.3	88.9	87.1	0.912	85.2	0.742	2.3	2.3	0.08	0.1
Random Forest	89.2	87.8	90.1	88.9	0.925	87.6	0.778	3.5	5.7	0.5	0.2
XGBoost	91.3	90.2	92.0	91.1	0.938	90.1	0.821	8.7	4.2	1.2	0.3
1D CNN	93.4	92.5	93.8	93.1	0.951	92.7	0.865	15.2	8.9	3.4	0.8
LSTM	92.8	91.7	93.5	92.6	0.947	91.8	0.853	24.6	12.4	5.8	1.2
TabTransformer	94.2	93.4	94.7	94.0	0.958	93.5	0.881	32.1	15.6	8.2	1.6
DeepCKD Net (Ours)	98.7	98.4	98.9	98.6	0.993	98.3	0.973	45.3	16.8	12.6	2.1

Table 4. Ablation study: Component contribution analysis.

Configuration	Accuracy	F1-Score	AUC	∆ Acc
Full DeepCKD-Net	98.7	98.6	0.993	–
w/o Transformer	94.2	94.0	0.958	−4.5
w/o Boosting	95.8	95.7	0.973	−2.9
w/o Fusion	96.3	96.2	0.977	−2.4
w/o Confidence	97.1	97.0	0.984	−1.6
w/o SHAP	98.7	98.6	0.993	0.0
Single Attention	96.9	96.8	0.981	−1.8
No Preprocessing	91.3	91.1	0.938	−7.4

Table 5. Top 10 most important features via SHAP analysis.

Rank	Feature	SHAP Value	Correlation	Category
1	Serum Creatinine	0.342	0.89	Blood Test
2	Blood Urea	0.287	0.85	Blood Test
3	Hemoglobin	0.231	−0.78	Blood Test
4	Specific Gravity	0.198	−0.71	Urine Test
5	Albumin	0.176	0.68	Urine Test
6	Packed Cell Volume	0.154	−0.65	Blood Test
7	Hypertension	0.142	0.62	Clinical Sign
8	Diabetes Mellitus	0.131	0.59	Clinical Sign
9	Red Blood Cells	0.119	−0.56	Blood Test
10	Sodium	0.108	−0.52	Blood Test

Table 6. Computational efficiency comparison.

Method	Parameters (M)	FLOPs (G)	Memory (GB)	Training (min)	Inference (ms)
SVM-RBF	0.08	0.01	0.1	2.3	2.3
XGBoost	1.2	0.15	0.3	8.7	4.2
1D-CNN	3.4	0.42	0.8	15.2	8.9
LSTM	5.8	0.73	1.2	24.6	12.4
TabTransformer	8.2	1.05	1.6	32.1	15.6
DeepCKD-Net	12.6	1.58	2.1	45.3	16.8

Table 7. Statistical significance testing (p-values).

Comparison	Accuracy (p-Value)	F1-Score (p-Value)	AUC (p-Value)	Significant
DeepCKD vs. XGBoost	<0.001	<0.001	<0.001	Yes
DeepCKD vs. 1D-CNN	<0.001	<0.001	<0.001	Yes
DeepCKD vs. TabTransformer	<0.001	<0.001	<0.001	Yes
DeepCKD vs. Ensemble-Net	0.002	0.003	0.001	Yes
DeepCKD vs. Multi-Modal	0.018	0.021	0.015	Yes

Table 8. Optimal hyperparameter configuration.

Hyperparameter	Value	Search Range
Learning Rate	0.001	[0.0001, 0.01]
Batch Size	32	[16, 64]
Transformer Layers	6	[2, 12]
Attention Heads	8	[4, 16]
Hidden Dimension	512	[256, 1024]
Dropout Rate	0.3	[0.1, 0.5]
Boosting Trees	100	[50, 200]
Tree Depth	6	[3, 10]
Weight Decay	0.0001	[1 × 10⁻⁵, 1 × 10⁻³]
Focal Loss γ	2.0	[1.0, 3.0]

Table 9. Extended comparison with recent state-of-the-art methods (2023–2025).

Method	Year	Accuracy	Sensitivity	Specificity	PPV	NPV	Architecture	Key Innovation
BiLSTM-Attention [1]	2023	94.8	95.2	94.3	94.5	95.0	Bidirectional LSTM	Temporal modeling
GraphCKD [2]	2023	95.3	95.7	94.8	95.1	95.5	Graph Neural Network	Patient similarity graphs
FedCKD [3]	2023	93.9	94.3	93.4	93.7	94.1	Federated Learning	Privacy preservation
AutoML-CKD [4]	2024	95.6	96.0	95.1	95.4	95.8	Neural Architecture Search	Automated design
Vision-CKD [5]	2024	94.2	94.6	93.7	94.0	94.4	Vision Transformer	Patch-based attention
Quantum-CKD [6]	2024	92.8	93.2	92.3	92.6	93.0	Quantum Circuit	Quantum advantage
CausalCKD [7]	2024	96.1	96.5	95.6	95.9	96.3	Causal Inference	Causal relationships
DiffusionCKD [8]	2025	95.9	96.3	95.4	95.7	96.1	Diffusion Models	Generative augmentation
PromptCKD [9]	2025	96.4	96.8	95.9	96.2	96.6	Prompt Learning	Few-shot adaptation
LLM-CKD [10]	2025	96.7	97.1	96.2	96.5	96.9	Large Language Model	Clinical text integration
DeepCKD-Net	2025	98.7	98.9	98.3	98.4	98.8	Hybrid Transformer-Boost	Multi-paradigm fusion

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Al Ghamdi, M.; Alyahyan, S. Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net. Appl. Sci. 2026, 16, 3024. https://doi.org/10.3390/app16063024

AMA Style

Al Ghamdi M, Alyahyan S. Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net. Applied Sciences. 2026; 16(6):3024. https://doi.org/10.3390/app16063024

Chicago/Turabian Style

Al Ghamdi, Mostafa, and Saleh Alyahyan. 2026. "Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net" Applied Sciences 16, no. 6: 3024. https://doi.org/10.3390/app16063024

APA Style

Al Ghamdi, M., & Alyahyan, S. (2026). Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net. Applied Sciences, 16(6), 3024. https://doi.org/10.3390/app16063024

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Enhanced Chronic Kidney Disease Detection: A Hybrid Deep Learning Framework Using Clinical Biomarkers and Ensemble Feature Engineering with DeepCKD-Net

Abstract

1. Introduction

2. Related Work

2.1. Classical Machine Learning Approaches for CKD Detection

2.2. Deep Learning Innovations in Medical Diagnostics

2.3. Hybrid and Ensemble Learning Strategies

2.4. Interpretability and Clinical Integration

2.5. Research Gap and Motivation

3. Proposed Methodology

3.1. System Overview

3.2. Feature Engineering and Preprocessing

3.3. Hierarchical Transformer Encoder

3.4. Gradient-Boosting Ensemble Module

3.5. Adaptive Feature Fusion

3.6. Confidence-Aware Prediction

3.7. Loss Function Formulation

3.8. Interpretability Module

3.9. Optimization Strategy

4. Results and Evaluation

4.1. Experimental Setup

4.2. Performance Metrics

4.3. Classification Performance

4.4. ROC Analysis and Model Calibration

4.5. Feature Importance Analysis

4.6. Computational Efficiency

4.7. Cross-Validation Results

4.8. Robustness Analysis

5. Discussion

5.1. Real-World Clinical Deployment

5.2. Regulatory Compliance and Explainability Requirements

5.3. Limitations

5.4. Future Research Directions

5.5. Limitations and Social Determinants Considerations

Biomarker Availability and Temporal Variability Challenges

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI