Beyond Glycemic Control: Precision Medicine in Type 2 Diabetes Using Multi-Output Explainable Artificial Intelligence for Personalized SGLT2 and DPP-4 Therapy Selection

Ihalapathirana, Anusha; Lavikainen, Piia; Siirtola, Pekka; Tamminen, Satu; Chandra, Gunjan; Laatikainen, Tiina; Martikainen, Janne; Röning, Juha

doi:10.3390/ai7060183

by Anusha Ihalapathirana ¹,*, Piia Lavikainen ², Pekka Siirtola ¹, Satu Tamminen ¹, Gunjan Chandra ¹, Tiina Laatikainen ³, Janne Martikainen ² and Juha Röning ¹,⁴

¹ Biomimetics and Intelligent Systems Group, University of Oulu, 90014 Oulu, Finland

² School of Pharmacy, University of Eastern Finland, 70211 Kuopio, Finland

³ Institute of Public Health and Clinical Nutrition, School of Medicine, Faculty of Health Sciences, University of Eastern Finland, 70211 Kuopio, Finland

⁴ Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune 412115, India

* Author to whom correspondence should be addressed.

AI 2026, 7(6), 183; https://doi.org/10.3390/ai7060183

Submission received: 15 April 2026 / Revised: 12 May 2026 / Accepted: 16 May 2026 / Published: 22 May 2026

(This article belongs to the Special Issue Digital Health: AI-Driven Personalized Healthcare and Applications)

Abstract

Traditional treatment strategies for Type 2 diabetes (T2D) adopt a “one-size-fits-all” approach, limiting individual effectiveness. This study presents an explainable, data-driven framework for multi-treatment and single-treatment selection of SGLT2 inhibitors (SGLT2-i) and DPP-4 inhibitors (DPP4-i) based on patient-specific health characteristics. Our approach evaluates treatment effectiveness across four outcomes—glycosylated hemoglobin (HbA1c), low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and body mass index (BMI)—to enable individualized treatment recommendations. The multi-treatment model, based on multi-output regression, achieved an R² score of 0.44 and an RMSE of 5.58, identifying benefit subgroups for SGLT2-i and DPP4-i across all outcomes. Integrated with SHapley Additive exPlanations (SHAP) analysis, the model offers insights into the factors influencing treatment effects. The single-treatment selection algorithm achieved an accuracy of 0.47 and an F1 score of 0.46, showing a higher average treatment effect with SGLT2-i on all outcomes, notably in the reduction in HbA1c, LDL, and BMI and a modest increase in HDL. While DPP4-i demonstrated beneficial effects on HbA1c, LDL, and HDL, it was associated with an increase in BMI. These findings highlight the benefits of a multi-faceted, patient-centered precision medicine approach for T2D management, enabling treatment strategies that address individual health needs beyond HbA1c.

Keywords:

precision medicine; data-driven treatment optimization; explainable AI; multi-treatment selection; treat-to-benefit; type 2 diabetes

1. Introduction

Type 2 diabetes (T2D) is a prevalent chronic disease that affects millions of people worldwide and poses significant challenges to healthcare systems globally. Many guidelines [1,2,3] suggest the use of metformin as the first-line treatment for T2D given its availability, low cost, and safety profile [4,5]. Subsequently, sodium-glucose cotransporter 2 inhibitors (SGLT2-i) and dipeptidyl peptidase-4 inhibitors (DPP4-i) are two prominent second-line treatment options for individuals with T2D, with both exhibiting comparable efficacy in lowering glucose levels [6]. However, they offer distinct advantages. SGLT2-i are on average associated with weight loss, reduction in blood pressure, lower risk of hypoglycemia, and long-term cardiovascular benefits, whereas DPP4-i are weight-neutral, do not increase hypoglycemia risk, and are well tolerated and safe to use in patients with advanced renal disease [7]. Therefore, identifying individuals who are more likely to experience a higher relative benefit from one drug class over another is important.

The current practice of a “one-size-fits-all” approach considers an average patient, neglecting the heterogeneity among patients, and failing to benefit everyone. In contrast, precision medicine aims to optimize healthcare quality by customizing the healthcare process to consider the unique characteristics of each individual [8], including individual variability in genes, environment, and lifestyle. Recent studies have explored precision medicine approaches in T2D by modeling differential treatment responses [9,10,11]. For example, Dennis et al. [10] developed a treatment selection algorithm using routine clinical features to predict HbA1c response between SGLT2-i and DPP4-i therapies, while Venkatasubramaniam et al. [11] compared statistical and machine learning approaches for individualized treatment selection. However, these studies primarily focus on HbA1c as a single outcome and rely on limited feature sets, without considering broader metabolic indicators or multi-dimensional treatment effects.

T2D is associated with multiple metabolic factors beyond glycemic control, including body mass index (BMI), low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol [12]. Overweight and obesity are common risk factors for T2D [13], and T2D is also associated with changes in the amount of circulating lipids, including elevated triglycerides, increased LDL, and decreased HDL [14]. Integrating these health parameters into treatment selection expands the range to consider additional factors, which could lead to more personalized and efficient strategies for managing T2D. An optimal treatment selection should account for individual characteristics and multiple effectiveness indicators, such as HbA1c, LDL cholesterol, HDL cholesterol, and BMI.

This study aims to develop an explainable artificial intelligence (XAI) multi-output model-based treatment selection method to identify the optimal treatment approach between DPP4-i and SGLT2-i for patients with T2D. The multi-output regression model simultaneously predicts four health indicator responses—HbA1c, LDL, HDL, and BMI—for two therapies using a single set of predictors. These predicted outcomes are used to evaluate treatment effectiveness at the individual level and are subsequently aggregated to derive a single treatment recommendation tailored to each patient’s health profile. This framework supports individualized decision-making and aims to improve the precision of diabetes management, while its explainability provides transparency into the factors influencing treatment selection.

This study makes several key contributions. First, we introduce a multi-output modeling framework that captures correlated responses across multiple outcomes, enabling a more comprehensive evaluation of treatment effects beyond single-outcome prediction. Second, we propose a dynamic trade-off resolution strategy that integrates these predictions into a single personalized recommendation based on individual patient profiles. Third, by incorporating SHAP-based explainability, the framework provides interpretable insights into the drivers of treatment decisions, supporting clinically actionable decision-making.

2. Materials and Methods

2.1. Study Design and Dataset

Patients diagnosed with T2D by the end of 2021 were identified with ICD-10 code E11 from the regional electronic health records (EHR) of The Joint Municipal Authority for North Karelia Social and Health Services—Siun sote, Finland. The collected information included patient-level records from both primary and specialized healthcare, including diagnostic and laboratory data spanning from 2012 to 2022. Additionally, medication prescriptions from 2012 to 2022 were obtained.

Initiations of antidiabetic medications were identified from the medication prescription data, focusing on the initiation of DPP4-i and SGLT2-i therapies. The date of initiation of an antidiabetic medication was defined as a baseline. Patients who initiated an antidiabetic medication between 2013 and 2021 and did not have a prescription in 2012 were considered new users and included in the analysis.

To be eligible for this study, the prescription of antidiabetic medications had to be in effect for at least 365 days. Patients who initiated more than one antidiabetic medication simultaneously were excluded. In addition, included patients were not allowed to start another antidiabetic medication within 365 days of the initiation, and only prescriptions started at least 365 days after the previous antidiabetic medication prescription were considered. Lastly, medication prescription episodes where the patient died before reaching 365 days were excluded. One patient could contribute several treatment episodes with different antidiabetic medications, provided that each episode satisfied the rules above.

2.2. Outcomes

The main outcomes (prediction targets) were the values of HbA1c, LDL cholesterol, HDL cholesterol, and BMI achieved 12 months after drug initiation. In the dataset, these outcomes were defined as the values closest to 12 months after drug initiation (within the range of 3 weeks to 12 months). HbA1c was analyzed with the turbidimetric inhibition immunoassay method and LDL and HDL with the photometric direct enzymatic method. All samples were analyzed in the Eastern Finland Laboratory (ISLAB, Kuopio, Finland; https://www.islab.fi), which is an accredited laboratory and participates in external quality surveys. All values were standardized to International Federation of Clinical Chemistry (IFCC) units.

In evaluating the treatment selection model, favorable treatment outcomes are defined by the direction of the predicted values for each health outcome. Lower predicted values of HbA1c, LDL cholesterol, and BMI are desirable, as they indicate improved health conditions. In contrast, higher HDL cholesterol values are favorable, as they are associated with better cardiovascular health.

2.3. Potential Predictors

Several potential predictors were formed from the EHR data. These included clinical and treatment-related factors. Clinical factors included demographic variables, such as age, sex, and duration of T2D, and laboratory values on baseline HbA1c, fasting plasma glucose, BMI, LDL, HDL, total cholesterol, triglycerides, creatinine, and eGFR. In addition, clinical factors included the existence of comorbidities, such as hypertension, coronary artery disease, atrial fibrillation, heart failure, peripheral arterial diseases, stroke, chronic kidney failure, neuropathies, blindness, cancers, asthma, gout, glaucoma, depression, dementia, mental diseases, chronic obstructive pulmonary disease, rheumatoid and other arthritis, osteoporosis, neuromuscular diseases, and liver diseases, at the baseline.

Treatment-related factors included information on the prescriptions for other antidiabetic medications at baseline. Prescriptions for other than antidiabetic medications were also identified based on the third level of Anatomical Therapeutic Chemical (ATC) codes. In addition, information on smoking status at baseline was available. Detailed definitions of potential predictors are presented in Supplementary Table S1.

2.4. Treatment Selection Model Development

The treatment selection model architecture, illustrated in Figure 1, is structured into five phases. Phase 1 involves data preprocessing, including feature selection techniques to prepare the dataset. Phase 2 focuses on the development of a multi-output model, which is the core for predicting treatment options. Phase 3 contains the development of the multi-treatment selection algorithm. In Phase 4, the results from the multi-treatment selection model are aggregated to formulate a single-treatment selection approach. Finally, Phase 5 involves evaluating the model to measure the effectiveness of the selection strategy.

Figure 1. Proposed personalized treatment selection model architecture: Each outcome from the multi-output model is processed individually through the treatment selection model, and the results are evaluated separately for each outcome. The model evaluation is detailed in Figure 2.

Figure 2. Treatment selection evaluation framework, as introduced in [15].

2.4.1. Data Preprocessing

The data preprocessing can be divided into six main steps. The initial dataset contained 5480 samples (patients) and 128 variables. In the first step, the data were filtered to include only participants with a baseline HbA1c ranging from 53 mmol/mol to 120 mmol/mol and an eGFR of 45 mL/min/1.73 m² or higher [10]. Furthermore, variables with more than 40% missing values were eliminated. For variables with correlations above 0.7, one variable was retained based on relevance to the analysis, while the others were excluded to avoid multicollinearity and redundancy.
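The first preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the correlation threshold is assumed to apply to the absolute Pearson coefficient, and correlated pairs are only listed, since the text states that the retained variable was chosen by relevance rather than by an automatic rule.

```python
import numpy as np

def cohort_mask(hba1c, egfr, hba1c_range=(53.0, 120.0), egfr_min=45.0):
    """True for patients meeting the baseline inclusion criteria."""
    hba1c = np.asarray(hba1c, dtype=float)
    egfr = np.asarray(egfr, dtype=float)
    lo, hi = hba1c_range
    return (hba1c >= lo) & (hba1c <= hi) & (egfr >= egfr_min)

def drop_sparse_columns(X, names, max_missing=0.40):
    """Drop columns whose share of missing (NaN) values exceeds max_missing."""
    X = np.asarray(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_missing
    return X[:, keep], [n for n, k in zip(names, keep) if k]

def correlated_pairs(X, names, threshold=0.7):
    """List column pairs with |Pearson r| above threshold for manual review."""
    X = np.asarray(X, dtype=float)
    pairs = []
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if ok.sum() < 3 or X[ok, i].std() == 0 or X[ok, j].std() == 0:
                continue  # too few rows or a constant column: r is undefined
            r = np.corrcoef(X[ok, i], X[ok, j])[0, 1]
            if abs(r) > threshold:
                pairs.append((names[i], names[j], float(r)))
    return pairs
```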

During the second step of preprocessing, all categorical labels were converted to numerical labels using the LabelEncoder [16] and the dataset was randomly split into training and testing sets in a 3:1 ratio. SimpleImputer [17] was then applied to impute missing values in the independent variables by replacing them with their respective modes. This approach was adopted in this study due to the relatively low proportion of missing values across most independent variables. Following this, Min–Max scaling was used to standardize the input features. The dataset consisted of 637 samples from the SGLT2-i group and 440 samples from the DPP4-i group. To address this imbalance, random oversampling was applied to the training dataset by duplicating samples from the minority class until both classes were balanced, resulting in equal representation in the training set. This approach ensured adequate representation of both treatments and reduced potential bias toward the majority class during model training. The test dataset remained unmodified throughout the analysis to ensure unbiased evaluation and avoid data leakage.
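The imputation, scaling, and oversampling steps described above could look roughly like the sketch below. It uses plain NumPy rather than the scikit-learn classes cited in the text, and it assumes that imputation modes and scaling bounds are estimated on the training split only and that no column is entirely missing; the function names are illustrative.

```python
import numpy as np

def fit_modes(X_train):
    """Most frequent non-missing value of each column."""
    modes = []
    for col in X_train.T:
        vals, counts = np.unique(col[~np.isnan(col)], return_counts=True)
        modes.append(vals[np.argmax(counts)])
    return np.array(modes)

def impute(X, modes):
    """Replace NaN entries with the column mode."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = modes[cols]
    return X

def fit_minmax(X_train):
    """Column minimum and range; constant columns get a range of 1."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)

def oversample_minority(X, treatment, rng):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    labels, counts = np.unique(treatment, return_counts=True)
    minority = labels[np.argmin(counts)]
    pool = np.flatnonzero(treatment == minority)
    extra = rng.choice(pool, size=counts.max() - counts.min(), replace=True)
    idx = np.concatenate([np.arange(len(treatment)), extra])
    return X[idx], treatment[idx]
```

Only the training split would be passed to `oversample_minority`; the test split is imputed and scaled with the training statistics and otherwise left untouched.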

The outcomes contained missing values, and removing samples with these missing values would have resulted in significant data loss. In the third step, this issue was addressed by developing separate predictive models for each outcome using only the training data. These models were then used to impute missing outcome values within the training dataset. Samples with missing outcome values were excluded from the test dataset to prevent information leakage and ensure unbiased model evaluation. However, the performance of the model developed to predict missing values in LDL cholesterol outcome was unsatisfactory. Therefore, we decided to remove all samples in the training data that contained missing values for the LDL outcome. Table 1 presents the performance of prediction models used to impute missing values in the outcomes.

We used a residual-based outlier detection technique to identify extreme observations in the dataset (Step 4). We fitted ordinary least squares (OLS) regression models for each outcome in the training dataset and subsequently used these models to make predictions on both the training and testing datasets. Next, we computed the standardized residuals for each prediction and flagged observations with standardized residuals greater than 4 as potential outliers. The detected outliers were removed from the datasets to reduce the influence of extreme values that may arise from measurement errors or atypical data points. After these preprocessing steps, the training dataset contained 1256 samples and the test dataset contained 101 samples.
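A minimal sketch of this residual-based screen is given below. The text does not specify how residuals were standardized, so the sketch assumes that each residual is divided by the residual standard error of the training fit and that the cutoff of 4 applies to the absolute value; `residual_outliers` is a hypothetical helper name.

```python
import numpy as np

def _design(X):
    """Add an intercept column to the feature matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least squares coefficients (intercept first)."""
    beta, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return beta

def residual_outliers(X_train, y_train, X_other, y_other, cutoff=4.0):
    """Flag rows whose standardized residual exceeds the cutoff in magnitude.

    The model and the residual scale are estimated on the training data only
    and then applied to both datasets.
    """
    beta = fit_ols(X_train, y_train)
    res_train = y_train - _design(X_train) @ beta
    dof = len(y_train) - len(beta)
    sigma = np.sqrt(np.sum(res_train ** 2) / dof)
    res_other = y_other - _design(X_other) @ beta
    return np.abs(res_train / sigma) > cutoff, np.abs(res_other / sigma) > cutoff
```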

In step 5 of preprocessing, we implemented a custom feature selection procedure to identify the most relevant features for our multi-output regression model. The feature selection process was conducted using three different algorithms: SelectKBest (Kbest) [18], Recursive Feature Elimination (RFE) [19], and ReliefF [20]. We used MultiOutputRegressor [21] as the estimator for the RFE algorithm. Because the Kbest and ReliefF algorithms do not directly support multi-output feature selection, we iterated over each target variable and applied the respective method to select the features most strongly related to that target. Following the selection process for each target, we aggregated the selected features into a single list, removing duplicate entries. Furthermore, the drug class feature was retained in the selected feature set because it represents the treatment assignment variable within the treatment selection framework, enabling the estimation of potential outcomes under different therapies rather than functioning solely as a predictive feature.
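The per-target selection and aggregation can be illustrated with the sketch below. Absolute Pearson correlation is used here only as a simple stand-in for a univariate scorer; it is not the scoring used in the study, and the column name `drug_class` and the function names are assumptions.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    """Indices of the k features with the largest |Pearson r| to the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r))[:k]

def select_features(X, Y, names, k, always_keep=("drug_class",)):
    """Select k features per target, then merge into one duplicate-free list."""
    selected = []
    for t in range(Y.shape[1]):
        for idx in top_k_by_correlation(X, Y[:, t], k):
            if names[idx] not in selected:
                selected.append(names[idx])
    for name in always_keep:  # the treatment indicator is never dropped
        if name not in selected:
            selected.append(name)
    return selected
```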

In the final step, we implemented 3-fold cross-validation using the training dataset to assess the performance and generalizability of our model. For each iteration of the cross-validation loop, the model was trained on two folds and evaluated on the third fold. This process was repeated three times to ensure that each fold served as both training and testing data. After training and evaluating the model on each fold, we calculated the mean accuracy and variance in the model’s performance across all folds.
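The cross-validation loop can be sketched as below. The sketch assumes shuffled folds and RMSE as the per-fold score; `fit` and `predict` are placeholder callables standing in for whichever multi-output model is being assessed.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Yield (train_idx, val_idx) pairs for k shuffled, non-overlapping folds."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validate(fit, predict, X, Y, k=3, seed=0):
    """Mean and variance of the validation RMSE across k folds."""
    scores = []
    for train, val in kfold_indices(len(X), k, np.random.default_rng(seed)):
        model = fit(X[train], Y[train])
        err = Y[val] - predict(model, X[val])
        scores.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(scores)), float(np.var(scores))
```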

2.4.2. Multi-Treatment Strategy: Treatment Selection Based on Multi-Output Model Predictions

We experimented with several multi-output regression models, including the multi-layer perceptron (MLPR) [22], XGBoost [23], CatBoost [24], LightGBM [25], Random Forest [26], and linear regression [27]. Furthermore, to improve model performance, we used the voting regressor ensemble [28] method to combine predictions from best-performing individual regression models. Except for the MLPR model, the remaining models were assessed using MultiOutputRegressor and RegressorChain [29] wrappers to extend their support and flexibility to multi-output regression.
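The difference between the two wrappers can be illustrated with a small example. The data are synthetic and a shallow decision tree stands in for the gradient boosting models used in the study; only the wrapper usage is the point. `MultiOutputRegressor` fits one independent model per target, whereas `RegressorChain` feeds earlier targets in the chain to later models as additional inputs.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor, RegressorChain
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Synthetic, correlated targets standing in for HbA1c, LDL, HDL, and BMI.
y1 = X[:, 0] + 0.1 * rng.normal(size=200)
Y = np.column_stack([y1, 0.5 * y1 + X[:, 1], -X[:, 2], y1 + X[:, 3]])

base = DecisionTreeRegressor(max_depth=3, random_state=0)

# One independent regressor per target.
independent = MultiOutputRegressor(base).fit(X, Y)

# Each regressor also sees the targets that precede it in `order`.
chained = RegressorChain(base, order=[0, 1, 2, 3]).fit(X, Y)

pred_independent = independent.predict(X[:5])
pred_chained = chained.predict(X[:5])
```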

All models underwent cross-validation and were trained using the training dataset. The model performances for predicting health parameter outcomes were assessed using the test dataset, and the R² score and RMSE were used as evaluation metrics. Furthermore, we used SHapley Additive exPlanations [30] (SHAP-version 0.44.0) to interpret the predictions and understand the feature contributions of the multi-output model (Figure 1 Phase 2).
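For reference, the two metrics can be computed for a multi-output prediction as sketched below. The text reports a single R² and a single RMSE without stating how the four outcomes were combined, so the uniform average of per-outcome R² values and the pooled RMSE used here are assumptions.

```python
import numpy as np

def pooled_rmse(Y_true, Y_pred):
    """Root mean squared error over all samples and outcomes together."""
    return float(np.sqrt(np.mean((Y_true - Y_pred) ** 2)))

def mean_r2(Y_true, Y_pred):
    """Uniform average of the per-outcome coefficient of determination."""
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))
```

Under this definition, predicting each outcome's mean on the same data gives an R² of exactly 0, which is why a cohort-mean predictor is a natural reference point.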

In the multi-treatment selection method, for each health parameter, a patient is evaluated and assigned one of the two possible treatments based on the predicted outcome for that parameter (Figure 1 Phase 3). The model facilitates the prediction of each health parameter outcome on each therapy. This enables the prediction of individualized treatment effect on specific health parameters. Subsequently, for each individual, the therapy associated with the highest predicted effectiveness for each health parameter was selected as the treatment option for that parameter. Later, the differences between the predicted outcomes of health parameters for the two therapies and the baseline values were calculated for each individual to get individualized treatment effects.
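The per-outcome selection can be sketched as follows, assuming that the two potential outcomes are obtained by setting the drug class feature to each therapy in turn. The 0/1 encoding, the outcome column order, the tie rule, and the `predict` callable are illustrative assumptions rather than details taken from the study.

```python
import numpy as np

SGLT2, DPP4 = 1, 0  # hypothetical encoding of the drug class column
HIGHER_IS_BETTER = np.array([False, False, True, False])  # HbA1c, LDL, HDL, BMI

def counterfactual_predictions(predict, X, drug_col):
    """Predict all outcomes for every patient under each therapy."""
    X_sglt2, X_dpp4 = X.copy(), X.copy()
    X_sglt2[:, drug_col] = SGLT2
    X_dpp4[:, drug_col] = DPP4
    return predict(X_sglt2), predict(X_dpp4)

def select_per_outcome(pred_sglt2, pred_dpp4):
    """Pick, per patient and outcome, the therapy with the better prediction."""
    advantage = pred_dpp4 - pred_sglt2       # > 0 when SGLT2-i gives the lower value
    advantage[:, HIGHER_IS_BETTER] *= -1     # for HDL the higher value is preferred
    return np.where(advantage > 0, SGLT2, DPP4)  # ties fall to DPP4-i (assumption)

def treatment_effects(pred, baseline):
    """Individualized effect: predicted 12-month value minus baseline value."""
    return pred - baseline
```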

2.4.3. Single-Treatment Strategy: Treatment Selection Through Aggregation of Multi-Treatment Predictions

The multi-treatment approach outputs one of the two therapies for each health outcome per patient, allowing a patient to be assigned to different therapies based on the efficiency of each specific outcome. In the single-treatment strategy, we aggregate the results from the multi-treatment selection method and assign a single treatment to each individual (Figure 1 Phase 4). We experimented with two aggregation methods to combine these therapy options: majority voting and importance-weighted aggregation.

The majority vote approach determined the final therapy by selecting the most frequently assigned therapy. In the event of equivalence, therapy was prioritized based on the predicted therapy for the HbA1c outcome.
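With four outcomes and two therapies, a 2:2 split is the only possible tie. A minimal sketch of the rule, assuming a 0/1 therapy encoding and HbA1c in the first column (both illustrative choices):

```python
import numpy as np

def majority_vote(choices, hba1c_col=0, sglt2=1, dpp4=0):
    """Final therapy per patient; ties follow the HbA1c-based choice.

    `choices` has one row per patient and one column per outcome.
    """
    votes_sglt2 = (choices == sglt2).sum(axis=1)
    votes_dpp4 = (choices == dpp4).sum(axis=1)
    final = np.where(votes_sglt2 > votes_dpp4, sglt2, dpp4)
    tie = votes_sglt2 == votes_dpp4
    final[tie] = choices[tie, hba1c_col]
    return final
```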

The importance-weighted aggregation approach combines multiple treatment recommendations using feature importance values derived from the multi-output regression model, focusing on baseline features associated with each outcome (HbA1c, LDL, HDL, and BMI). Feature importance values were extracted for each outcome and normalized to ensure comparability, and were used to assign weights reflecting the relative contribution of each outcome to the final treatment decision. For each patient, a weighted score was calculated by multiplying the assigned therapy for each outcome by its corresponding weight and summing these values. The threshold for assigning the final treatment was defined as the mean of the weighted scores. Patients were then assigned to the final therapy based on whether their weighted score exceeds this threshold. Since aggregation was based on treatment assignments rather than raw outcome values, differences in outcome scales did not directly affect the aggregation process.
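One possible reading of this aggregation rule is sketched below. It assumes that therapies are encoded as SGLT2-i = 1 and DPP4-i = 0, so that the weighted score is the share of outcome weight voting for SGLT2-i, and that a score above the cohort mean assigns SGLT2-i. Neither the encoding nor the direction of the threshold rule is stated in the text, and the per-outcome weights are taken as given inputs here.

```python
import numpy as np

def importance_weighted(choices, outcome_weights):
    """Aggregate per-outcome therapy choices into one therapy per patient.

    choices: (n_patients, n_outcomes) array of 0/1 therapy codes.
    outcome_weights: non-negative weight per outcome; normalized to sum to 1.
    """
    w = np.asarray(outcome_weights, dtype=float)
    w = w / w.sum()
    scores = choices @ w
    threshold = scores.mean()  # cohort-level threshold, as described in the text
    return np.where(scores > threshold, 1, 0), scores, threshold
```

Because the threshold is the mean score of the cohort being scored, the same patient can receive a different assignment when evaluated within a different cohort.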

2.4.4. Treatment Selection Model Evaluation

In individual treatment selection, it is challenging to directly observe the difference in response between therapies for a given individual, as their responses to multiple treatments cannot be evaluated simultaneously. Consequently, the standard model performance metrics are insufficient for evaluating treatment selection models, as these metrics are primarily designed to assess the accuracy of predicting individual treatment outcomes, rather than evaluating the difference in effectiveness between therapies for each individual [15].

The performance of multi-treatment selection and single-treatment selection models was evaluated in the test dataset using the framework introduced in [15] (Figure 1 Phase 5). Figure 2 shows the evaluation approach of the treatment selection method. First, the multi-output model was used to predict the four outcomes for all individuals. Subsequently, predictions for each outcome were used independently to estimate the optimal therapy for individual patients using the multi-treatment selection method. In the next step, following the framework, we divided the population into two groups based on the predicted treatment. Then, we defined the concordant (therapy actually received is the therapy predicted by the method) and discordant (therapy actually received is not the therapy predicted by the method) subgroups on each predicted treatment group, based on the therapy actually received (observed) by each individual. Next, we evaluated the treatment selection model performance using the average health outcome improvement in the concordant compared to the discordant group within each predicted treatment group. This validation approach was applied separately for each outcome.
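The concordant versus discordant comparison can be sketched as below for a single outcome. The sketch assumes that the outcome is expressed as the observed 12-month change from baseline, so a more negative difference favors the concordant subgroup for HbA1c, LDL, and BMI; the function name and return format are illustrative.

```python
import numpy as np

def concordance_effect(predicted, received, change, treatment):
    """Compare observed change between concordant and discordant patients.

    Only patients for whom `treatment` was predicted are considered. Returns
    (mean difference, n_concordant, n_discordant), or None if either subgroup
    is empty.
    """
    in_group = predicted == treatment
    concordant = in_group & (received == treatment)
    discordant = in_group & (received != treatment)
    if not concordant.any() or not discordant.any():
        return None
    diff = change[concordant].mean() - change[discordant].mean()
    return float(diff), int(concordant.sum()), int(discordant.sum())
```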

The same evaluation method was applied to assess the single-treatment selection model. After determining the final treatment decision for each individual using the aggregation method, we identified concordant and discordant subgroups within each predicted treatment group. The performance of the model was then evaluated by comparing the improvement in average effectiveness in health outcomes between the concordant and discordant groups (Figure 2).

All implementations were conducted using Python (version 3.11.5). We have made the source code accessible on GitHub: https://github.com/anushaihalapathirana/individualize_treatment_selection_t2d (accessed on 7 April 2026).

3. Results

To contextualize predictive performance, we compared the trained machine learning models against a cohort mean prediction baseline and regularized linear regression models. The mean baseline achieved an R² of −0.007 and an RMSE of 6.59, while Elastic Net and Ridge regression achieved R² scores of 0.195 and 0.464, with RMSE values of 5.759 and 5.501, respectively. The selected LightGBM model achieved an R² score of 0.441 and an RMSE of 5.582. Although Ridge regression achieved slightly higher predictive performance than LightGBM, it assigned all patients to the SGLT2-i treatment group and demonstrated no ability to capture treatment heterogeneity. In contrast, the LightGBM model demonstrated the highest treatment effectiveness during validation, with the strongest ability to discriminate between concordant and discordant subgroups, and was therefore selected as the final model. Appendix A presents the performance of the other highest-performing multi-output models.

The selected LightGBM multi-output regressor was configured with a maximum tree depth of 6 and a learning rate of 0.1 (Figure 1 Phase 2). The model was trained using 13 features selected by the recursive feature elimination (RFE) algorithm: baseline HbA1c, baseline BMI, baseline HDL, baseline LDL, drug class, creatinine, eGFR, glucose, HbA1c (7–18 months) before drug initiation, age at drug initiation (years), obesity, duration of type 2 diabetes (years) and triglycerides (Figure 1 Phase 1). The observed and predicted values for the four outcomes are shown in Figure 3. Predicted BMI values agreed with the observed values more closely than the predictions for the other outcomes did.

Figure 4 illustrates the global feature importance plots generated using SHAP for each of the four outcomes. These plots provide insights into the individual contribution of features to the model predictions for each outcome.

We removed outliers from each predicted outcome using the residual-based outlier detection method to ensure the robustness and precision of our treatment selection model. Predictions with extreme residuals—indicating large discrepancies between predicted and observed values—were considered potentially implausible. Such extreme values may occur when the model makes predictions for patient profiles that are poorly represented in the data or due to noise in real-world EHR data, rather than reflecting reliable clinical responses. Excluding these predictions reduces the influence of unstable estimates and improves the robustness of treatment effect comparisons while limiting the impact of potentially implausible model outputs.

The preprocessed test data included 52 users of SGLT2-i and 49 users of DPP4-i. Following outlier removal, our treatment selection model predicted that 35 patients would benefit from DPP4-i in terms of lowering HbA1c levels, while 65 patients would benefit from SGLT2-i. Regarding LDL cholesterol levels, the model identified benefits for 61 patients with SGLT2-i and 40 patients with DPP4-i. Additionally, the model anticipated positive effects on HDL cholesterol levels for 28 patients with SGLT2-i and 73 patients with DPP4-i. In terms of lowering BMI, the model predicted advantages for 80 patients using SGLT2-i and 21 patients using DPP4-i.

Table 2 displays the evaluation of the multi-treatment selection model (Figure 1 Phases 3 and 5), including the treatment effects in the observed data and the treatment effects within the concordant and discordant subgroups.

The outcomes of the multi-treatment selection model were aggregated to assign a single treatment to each individual. Table 3 displays the performance metrics for both aggregation approaches (Figure 1 Phase 4). The results indicate that the majority vote approach outperforms the importance-weighted aggregation method in all performance metrics. Furthermore, we calculated Cohen’s kappa score to assess the level of agreement between the majority voting and importance-weighted aggregation approaches. The resulting Cohen’s kappa score of 0.58 indicates a moderate level of agreement between the two methods, suggesting some consistency in treatment selection outcomes. Using majority voting, the model predicted that 64 individuals would benefit from SGLT2-i, with 31 in the concordant subgroup and 33 in the discordant subgroup. For DPP4-i, the model identified 37 individuals as likely to benefit, comprising 16 in the concordant subgroup and 21 in the discordant subgroup. The importance-weighted aggregation approach identified 42 individuals as likely to benefit from SGLT2-i, including 17 in the concordant subgroup and 25 in the discordant subgroup. Additionally, this approach identified 59 individuals as likely to benefit from DPP4-i, with 24 in the concordant subgroup and 35 in the discordant subgroup.
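Cohen's kappa corrects the raw agreement rate between two sets of assignments for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). A minimal sketch for two label vectors follows; the example vectors in the usage note are made up and are not study data.

```python
import numpy as np

def cohens_kappa(a, b):
    """Chance-corrected agreement between two label vectors of equal length."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    p_observed = np.mean(a == b)
    # Expected agreement if the two raters assigned labels independently.
    p_expected = sum(np.mean(a == lab) * np.mean(b == lab) for lab in labels)
    return float((p_observed - p_expected) / (1.0 - p_expected))
```

For instance, two raters who agree on 8 of 10 patients while each assigning both therapies equally often have p_o = 0.8 and p_e = 0.5, giving κ = 0.6.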

Table 4 presents the validation results of the majority voting approach, while Table 5 provides the validation results of the importance-weighted aggregation approach (Figure 1 Phase 5).

We evaluated changes in four health parameters—HbA1c, LDL, HDL, and BMI—by computing the differences between baseline values and the values observed and predicted at 12 months within the concordant groups (Figure 5). In the analysis using the single-treatment majority vote aggregation approach, the following improvements were observed over a 12-month period: For HbA1c levels, 26 samples demonstrated improvement in predicted data, compared to 21 samples in the observed data. Similarly, for LDL levels, 25 samples showed improvement in the predicted data, whereas 22 samples showed improvement in the observed data. HDL levels improved in 24 samples according to predicted data and in 23 samples based on observed data. Notably, for BMI, 28 samples experienced improvement according to predicted data, while only 19 samples showed improvement in the observed data.

In a comparable analysis using the single-treatment importance-weighted aggregation approach, the following improvements were noted over the 12-month period: For HbA1c, 22 samples showed improvement in the predicted data compared to 19 samples in the observed data. For LDL levels, 20 samples showed improvement in the predicted data, while 21 samples showed improvement in the observed data. HDL improvements were seen in 23 samples for the predicted data and in 18 samples for the observed data. Finally, for BMI, 21 samples exhibited improvement in the predicted data compared to 20 samples in the observed data.

4. Discussion

Our study presents an explainable personalized treatment selection model designed to identify the optimal therapy for T2D based on individual patient characteristics and the effectiveness of multiple health parameter responses. This approach addresses the increasing need to move beyond HbA1c levels in the personalized treatment of T2D. Our model incorporates additional efficacy outcomes, including the patient’s LDL, HDL, and BMI along with HbA1c, to enhance the precision and effectiveness of treatment decisions. Furthermore, this approach supports a shift from the traditional “treat-to-target” approach to a more holistic “treat-to-benefit” paradigm in the management of T2D [31]. Importantly, beyond predictive performance, its value lies in translating model outputs into clinically actionable insights, enabling transparent treatment decisions in practice.

The SHAP analysis of the multi-output model provided detailed insights into the model’s predictions for each health parameter. For HbA1c, the most significant predictors identified were baseline HbA1c, HbA1c measurements taken 7–18 months prior, duration of T2D, creatinine levels, baseline LDL levels, and age. In contrast, LDL cholesterol levels were primarily influenced by baseline LDL values, age, creatinine, and triglyceride levels. For HDL cholesterol and BMI, baseline values of HDL and BMI, respectively, were the dominant predictors, with other factors contributing less significantly. These findings highlight the variability in feature impacts across different health parameters. Therefore, it is important to tailor interventions relevant to each health parameter to improve treatment effectiveness. This approach ensures that treatment plans are more aligned with individual patient profiles and their unique health outcomes.

Experiments with various multi-output regression models revealed that the LightGBM regression model significantly outperformed the other models in evaluating the treatment selection model, effectively identifying treatment benefit groups for SGLT2-i and DPP4-i across all four outcomes. In particular, the multi-treatment model demonstrated good performance in distinguishing treatment benefit strata for SGLT2-i compared to DPP4-i in all four outcomes. Furthermore, except for the DPP4-i concordant group in the HbA1c outcome, the predicted average treatment effect for concordant groups showed an improvement over the average treatment effect in the observed real-world data. However, it is important to highlight that our multi-treatment model identified relatively smaller groups benefiting from DPP4-i in terms of treatment effect on HbA1c, LDL, and BMI outcomes, compared to those benefiting from SGLT2-i.

The proposed single-treatment selection approaches fused the results of the multi-treatment model and evaluated the effectiveness of SGLT2-i and DPP4-i treatments across HbA1c, LDL, HDL, and BMI outcomes. Despite its relative simplicity, the majority voting approach performed well in evaluating treatment effectiveness. SGLT2-i consistently showed a greater average treatment effect across all measured outcomes, including significant reductions in HbA1c, LDL, and BMI, alongside a modest increase in HDL. In contrast, while DPP4-i also improved treatment effectiveness in HbA1c, LDL, and HDL, it was associated with an increase in BMI within the concordant subgroup, indicating a potential trade-off. Notably, this increase in BMI contrasts with the general perception of DPP4-i as weight-neutral in real-world data, highlighting its potential impact on the studied population [32,33]. The consistency between our model’s predictions and actual clinical observations underscores the model’s capability to identify both the benefits and trade-offs of DPP4-i therapy, highlighting the need for careful evaluation of BMI effects when applying these treatments. Additionally, considering LDL and HDL, the outcome effectiveness differences were comparable between the two drugs, with DPP4-i showing a slightly higher effectiveness than SGLT2-i. Furthermore, considering all efficacy outcomes, the majority-voting single-treatment model demonstrated a higher number of patients with improved outcomes in the predicted data compared to the observed real-world data. This approach has advantages over the traditional “one-size-fits-all” treatment strategy, highlighting the potential of personalized treatment plans.
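The majority voting idea can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the predicted values are invented, the handling of equal predictions is arbitrary, and the tie-breaking rule (falling back to the HbA1c recommendation when the vote is split 2–2) is an assumption, since this section does not describe how ties were resolved.

```python
from collections import Counter

# Direction of benefit per outcome, as stated in the table captions:
# lower is better for HbA1c, LDL and BMI; higher is better for HDL.
LOWER_IS_BETTER = {"HbA1c": True, "LDL": True, "HDL": False, "BMI": True}


def per_outcome_choice(pred_sglt2, pred_dpp4):
    """Pick, for each outcome, the drug with the more favourable predicted value."""
    choice = {}
    for outcome, lower in LOWER_IS_BETTER.items():
        a, b = pred_sglt2[outcome], pred_dpp4[outcome]
        sglt2_better = a < b if lower else a > b
        # Assumption: exactly equal predictions default to DPP4-i.
        choice[outcome] = "SGLT2-i" if sglt2_better else "DPP4-i"
    return choice


def majority_vote(choice, tie_breaker="HbA1c"):
    """Return the drug favoured by most outcomes; a tie falls back to one outcome."""
    counts = Counter(choice.values())
    (top, top_n), *rest = counts.most_common()
    if rest and rest[0][1] == top_n:
        return choice[tie_breaker]  # hypothetical tie-breaking rule
    return top


# Invented 12-month predictions for one patient under each drug.
pred_sglt2 = {"HbA1c": 52.0, "LDL": 2.4, "HDL": 1.25, "BMI": 29.8}
pred_dpp4 = {"HbA1c": 55.0, "LDL": 2.3, "HDL": 1.20, "BMI": 31.0}

choice = per_outcome_choice(pred_sglt2, pred_dpp4)
recommendation = majority_vote(choice)
print(choice, recommendation)
```

Here three of the four outcomes favour SGLT2-i, so the single-treatment recommendation is SGLT2-i even though LDL alone would favour DPP4-i.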

The importance-weighted aggregation method performed less well in the evaluation, with an accuracy of 0.41 compared with 0.47 for majority voting (Table 3). This approach identified SGLT2-i subgroups that demonstrated improved treatment effects in three outcomes (HbA1c, LDL, and BMI; Table 5). However, the performance of the DPP4-i treatment was less consistent. While the method identified subgroups with enhanced treatment effects for LDL cholesterol, it was less effective for HbA1c, HDL, and BMI. Overall, both aggregation approaches revealed consistent findings for SGLT2-i treatment, which showed improved efficacy across outcomes. However, both methods identified limitations with DPP4-i, particularly regarding the BMI outcome, which aligns with observed data. These findings highlight the need for further research on DPP4-i’s effects on BMI.
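One simple way to weight outcomes is a weighted vote, sketched below. The weights are hypothetical and the scheme itself is an assumption; the authors' exact importance-weighting formula is not given in this section. The example is chosen to show how weighting can overturn a majority decision.

```python
def weighted_vote(choice, weights):
    """Sum the weight of every outcome that favours each drug; return the heavier side."""
    score = {"SGLT2-i": 0.0, "DPP4-i": 0.0}
    for outcome, drug in choice.items():
        score[drug] += weights[outcome]
    # On an exact tie, max() returns the first key (SGLT2-i); this is arbitrary.
    best = max(score, key=score.get)
    return best, score


# Hypothetical clinical-importance weights (sum to 1); not taken from the paper.
weights = {"HbA1c": 0.55, "LDL": 0.20, "HDL": 0.10, "BMI": 0.15}

# Per-outcome recommendations for one invented patient.
choice = {"HbA1c": "DPP4-i", "LDL": "SGLT2-i", "HDL": "SGLT2-i", "BMI": "SGLT2-i"}

drug, score = weighted_vote(choice, weights)
print(drug, score)
```

With these weights the single HbA1c vote (0.55) outweighs the three votes for SGLT2-i (0.45 in total), so the weighted rule selects DPP4-i where a plain majority would select SGLT2-i. This sensitivity to the chosen weights is one plausible reason the two aggregation methods can disagree.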

Our findings indicate that personalized medication leads to better health outcomes, improving the individual’s quality of life. In addition, on a broader scale, personalized medications contribute to a reduction in the economic burden on the healthcare system. Studies showed that individual-level improvements in these health parameters will result in long-term cost savings at national levels [34,35,36]. Furthermore, since the number of patients with T2D is predicted to increase to 643 million by 2030 and 783 million by 2045 [37], the cumulative economic impact of improving HbA1c, LDL, HDL, and BMI levels even by modest amounts could have profound implications at the societal and global levels, including significant savings in healthcare costs and improved overall health outcomes.

Notable strengths of our study include the introduction of a comprehensive personalized treatment selection model that addresses the limitations of evaluating treatment effectiveness based solely on HbA1c levels in T2D treatments [11]. By incorporating multiple health parameters such as LDL, HDL, and BMI, this approach enhances the precision of treatment decisions and addresses the growing need for more individualized diabetes management outlined in both European [38] and US [39] treatment guidelines. Additionally, the use of SHAP analysis provides a detailed understanding of the model’s predictions for each health parameter. This insight into the most significant predictors for HbA1c, LDL, HDL, and BMI supports the development of tailored treatment strategies and clarifies how feature impacts vary across outcomes. Furthermore, our study explored different machine learning models to evaluate treatment effectiveness, demonstrating the effectiveness of the LightGBM regression model in distinguishing treatment benefit groups and improving the precision of treatment selection.

Our study has several limitations. The dataset represents a population from a specific region in Finland, which may limit the generalizability of our model to more diverse populations. We acknowledge that further validation of the model across different sub-populations and geographical regions is needed. Moreover, we were unable to validate individual-level treatment effects because the treatment outcomes were observed for only one treatment, which presents a fundamental challenge in causal inference. Furthermore, the complexity of treatment effects across different outcomes may not be fully captured by the aggregation methods we introduced, highlighting the necessity for more advanced analytical techniques. Additionally, analyzing the number of patients with improved outcomes does not account for those who already had their outcomes within the target range, for whom no further improvement was needed or possible. In this study, we only consider EHR data and medication prescription data, while other variables, such as physical activities, genetics, and adherence levels, which might influence treatment outcomes, were not included and could affect observed results. In addition, missing data in the independent variables were handled using single imputation. While more advanced imputation methods can better account for uncertainty in missing values, this uncertainty was not explicitly modeled in the current study, which may lead to underestimated variability. Missing outcome values were imputed using model-based methods. While this retained more training data, imputed values were treated as observed without accounting for prediction uncertainty, which may introduce bias. Finally, due to the limited size of the test dataset and the further reduction within concordant and discordant subgroups, some subgroup analyses are based on small sample sizes. This leads to increased variability and wide confidence intervals, thereby limiting the statistical power to detect reliable differences in treatment effects.

In future studies, we aim to explore more advanced methods for imputing missing values in independent variables, including multiple imputation and machine-learning-based techniques, and to develop and refine methods that more effectively integrate multi-treatment selection outcomes. We will also examine off-policy evaluation strategies to better assess the causal impact of treatment recommendations. Additionally, our future work will focus on externally validating the model using diverse population data and evaluating the economic outcomes of diabetes management based on the results of the developed treatment selection model.

5. Conclusions

Our study proposed an explainable personalized treatment selection model for T2D, emphasizing the importance of incorporating multiple health parameters beyond HbA1c. This model enhances the precision and effectiveness of treatment decisions by tailoring interventions to individual patient characteristics. The study focuses on SGLT2-i and DPP4-i; however, the methodology presented here serves as a framework that can be applied to other medications, including more recent therapies like GLP-1 analogs, as their data availability increases.

At the core of the treatment selection model, the multi-output model predicted each individual’s HbA1c, LDL, HDL, and BMI levels 12 months after drug initiation, achieving an R² score of 0.44. SHAP analysis of this model revealed key predictors for each health parameter, highlighting the importance of personalized treatment strategies. This multi-output model served as the foundation for developing the multi-treatment selection model, which allows clinicians to assign treatments tailored to specific health outcomes. By aggregating the results of the multi-treatment selection model, the proposed single-treatment selection algorithm achieved an accuracy of 0.47 and an F1 score of 0.46. It demonstrated a stronger treatment effect with SGLT2-i than with DPP4-i. However, the model identified a negative impact on BMI with DPP4-i, suggesting a need for further research to address this reduced treatment effect. Overall, our approach showed better health outcomes compared to the data observed in the real world, indicating the potential for personalized treatment strategies to improve quality of life and reduce healthcare costs.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ai7060183/s1, Table S1. Definitions of potential predictors.

Author Contributions

Conceptualization, A.I., P.L., P.S., S.T., G.C. and J.M.; methodology, A.I., P.L., P.S., S.T., G.C. and J.M.; software, A.I.; validation, A.I. and P.L.; formal analysis, A.I.; investigation, A.I.; resources, J.R.; data curation, A.I., T.L. and P.L.; writing—original draft preparation, A.I. and P.L.; writing—review and editing, A.I.; visualization, A.I.; supervision, P.L., P.S., S.T. and J.M.; project administration, J.R.; funding acquisition, P.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825162.

Institutional Review Board Statement

Approval from the Ethics Committee of the Northern Savo Hospital District was obtained on 13 November 2012 (diary number 81/2012).

Informed Consent Statement

Only register-based data were utilized and, thus, consent from the patients was not needed. This study complies with the Declaration of Helsinki.

Data Availability Statement

The European Union and Finnish laws regulate access to data; therefore, sharing sensitive data is not possible and data are not publicly available. Researchers who meet the criteria set by the European Union and Finnish laws for accessing confidential data can obtain an anonymized version of the dataset with a data permit from the appropriate authority. Contact aineistoneuvonta@siunsote.fi for data requests from the Siun sote—Joint Municipal Authority for North Karelia Social and Health Services and tietoaineistot@kela.fi for data requests from the Social Insurance Institute.

Acknowledgments

This study is part of the HTx project, which is a Horizon 2020 project supported by the European Union lasting for 5 years from January 2019.

Conflicts of Interest

J.M. is a founding partner of ESiOR Oy. This company was not involved in carrying out this research. The other authors declare no conflicts of interest.

Appendix A

This section presents the performance results of the other highest-performing multi-output models trained during the experiments.

Table A1. Performance of trained multi-output models.

Model (Feature Selection Method)	R² Score	RMSE	Features
MLPR (REF)	0.49	5.11	C10A lipid-modifying agents, MD_RCT, creatinine, BMI_b, vascular_comp, drug class, eGFR, glucose, pre-initiation HbA1c, HbA1c_b, HDL_b, age, insulin, LDL_b, met_oad, number of diseases, smoking, sum_drugs, T2D duration, triglycerides
XGB (Kbest)	0.42	5.87	Creatinine, BMI_b, drug class, eGFR, glucose, pre-initiation HbA1c, HbA1c_b, HDL_b, LDL_b, obese, T2D duration, triglycerides
CatB (REF)	0.45	5.41	Creatinine, BMI_b, drug class, eGFR, glucose, pre-initiation HbA1c, HbA1c_b, HDL_b, age, LDL_b, T2D duration, triglycerides
LGBM (REF)	0.44	5.58	HbA1c_b, BMI_b, HDL_b, LDL_b, drug class, eGFR, creatinine, obesity, glucose, age, triglycerides, T2D duration, pre-initiation HbA1c
Random Forest (REF)	0.45	5.57	MD_RCT, Creatinine, BMI_b, drug class, eGFR, glucose, pre-initiation HbA1c, HbA1c_b, HDL_b, age, LDL_b, number of diseases, smoking, sum_drugs, T2D duration, triglycerides
Linear Regression (Kbest)	0.43	5.83	BMI_b, drug class, eGFR, pre-initiation HbA1c, HbA1c_b, HDL_b, LDL_b, obese, triglycerides
VR (MLPR + CatB + LGBM) (ReliefF)	0.47	5.32	MD_RCT, BMI_b, vascular_comp, Concordant diseases, cvd_comp, drug class, eGFR, pre-initiation HbA1c, HbA1c_b, HDL_b, age, insulin, LDL_b, obese, sex, sum_drugs, T2D duration, triglycerides
VR (CatB + LGBM + Random Forest) (ReliefF)	0.46	5.48	BMI_b, vascular_comp, concordant diseases, cvd_comp, drug class, eGFR, pre-initiation HbA1c, HbA1c_b, HDL_b, age, insulin, LDL_b, obese, sex, sum_drugs, triglycerides

Note: MLPR: Multi-Layer perceptron regressor, XGB: XGBoost regressor, CatB: CatBoost regressor, LGBM: LightGBM regressor, VR: Voting regressor, REF: Recursive feature elimination, MD_RCT: Mean difference for change in HbA1c of the drug based on RCTs, variable_b: baseline measurements, vascular_comp: Microvascular and/or macrovascular complications before drug initiation, pre-initiation HbA1c: HbA1c (7–18 months) before drug initiation, age: age at drug initiation (y), met_oad: Availability of metformin and other oral antidiabetic drug, sum_drugs: Sum of different types of antidiabetic medications, T2D duration: duration of type 2 diabetes (y), cvd_comp: Cardiovascular disease complications.
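For readers who want to see how the two metrics in Table A1 are defined, the following minimal sketch computes R² and RMSE per outcome and then averages R² uniformly across outcomes. The observed and predicted values are invented, and the uniform averaging is an assumption: this section does not state how the reported scores were aggregated across the four outcomes.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the outcome."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented observed and predicted 12-month values for four patients.
observed = {"HbA1c": [48.0, 55.0, 62.0, 70.0], "BMI": [27.0, 31.0, 29.0, 35.0]}
predicted = {"HbA1c": [50.0, 54.0, 60.0, 66.0], "BMI": [28.0, 30.0, 30.0, 34.0]}

per_outcome = {
    name: (r2_score(observed[name], predicted[name]), rmse(observed[name], predicted[name]))
    for name in observed
}
# Assumption: R² is averaged with equal weight per outcome.
mean_r2 = sum(r2 for r2, _ in per_outcome.values()) / len(per_outcome)
print(per_outcome, mean_r2)
```

Because RMSE is expressed in the units of each outcome, a value pooled over HbA1c (mmol/mol), cholesterol (mmol/L), and BMI (kg/m²) would be dominated by the outcome with the largest numeric scale, whereas R² is unitless and can be averaged more naturally.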

References

1. Diabetes Canada Clinical Practice Guidelines Expert Committee; Lipscombe, L.; Butalia, S.; Dasgupta, K.; Eurich, D.T.; MacCallum, L.; Shah, B.R.; Simpson, S.; Senior, P.A. Pharmacologic glycemic management of type 2 diabetes in adults: 2020 update. Can. J. Diabetes 2020, 44, 575–591.
2. American Diabetes Association. Standards of medical care in diabetes-2022 abridged for primary care providers. Clin. Diabetes 2022, 40, 10–38.
3. Davies, M.J.; Aroda, V.R.; Collins, B.S.; Gabbay, R.A.; Green, J.; Maruthur, N.M.; Rosas, S.E.; Del Prato, S.; Mathieu, C.; Mingrone, G.; et al. Management of hyperglycemia in type 2 diabetes, 2022. A consensus report by the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetes Care 2022, 45, 2753–2786.
4. Rojas, L.B.A.; Gomes, M.B. Metformin: An old but still the best treatment for type 2 diabetes. Diabetol. Metab. Syndr. 2013, 5, 6.
5. Di Fusco, D.; Dinallo, V.; Monteleone, I.; Laudisi, F.; Marafini, I.; Franz, E.; Di Grazia, A.; Dwairi, R.; Colantoni, A.; Ortenzi, A.; et al. Metformin inhibits inflammatory signals in the gut by controlling AMPK and p38 MAP kinase activation. Clin. Sci. 2018, 132, 1155–1168.
6. Scheen, A.J. Efficacy/safety balance of DPP-4 inhibitors versus SGLT2 inhibitors in elderly patients with type 2 diabetes. Diabetes Metab. 2021, 47, 101275.
7. Avogaro, A.; Delgado, E.; Lingvay, I. When metformin is not enough: Pros and cons of SGLT2 and DPP-4 inhibitors as a second line therapy. Diabetes/Metab. Res. Rev. 2018, 34, e2981.
8. Kosorok, M.R.; Laber, E.B. Precision medicine. Annu. Rev. Stat. Its Appl. 2019, 6, 263–286.
9. Dennis, J.M.; Shields, B.M.; Hill, A.V.; Knight, B.A.; McDonald, T.J.; Rodgers, L.R.; Weedon, M.N.; Henley, W.E.; Sattar, N.; Holman, R.R.; et al. Precision medicine in type 2 diabetes: Clinical markers of insulin resistance are associated with altered short- and long-term glycemic response to DPP-4 inhibitor therapy. Diabetes Care 2018, 41, 705–712.
10. Dennis, J.M.; Young, K.G.; McGovern, A.P.; Mateen, B.A.; Vollmer, S.J.; Simpson, M.D.; Henley, W.E.; Holman, R.R.; Sattar, N.; Pearson, E.R.; et al. Development of a treatment selection algorithm for SGLT2 and DPP-4 inhibitor therapies in people with type 2 diabetes: A retrospective cohort study. Lancet Digit. Health 2022, 4, e873–e883.
11. Venkatasubramaniam, A.; Mateen, B.A.; Shields, B.M.; Hattersley, A.T.; Jones, A.G.; Vollmer, S.J.; Dennis, J.M. Comparison of causal forest and regression-based approaches to evaluate treatment effect heterogeneity: An application for type 2 diabetes precision medicine. BMC Med. Inform. Decis. Mak. 2023, 23, 110.
12. Mukherjee, S.; Ray, S.K.; Jadhav, A.A.; Wakode, S.L. Multi-level analysis of HbA1c in diagnosis and prognosis of diabetic patients. Curr. Diabetes Rev. 2024, 20, 85–92.
13. Yashi, K.; Daley, S.F. Obesity and Type 2 Diabetes; StatPearls Publishing: Treasure Island, FL, USA, 2024. Available online: http://www.ncbi.nlm.nih.gov/books/NBK592412/ (accessed on 12 May 2026).
14. Farbstein, D.; Levy, A.P. HDL dysfunction in diabetes: Causes and possible treatments. Expert Rev. Cardiovasc. Ther. 2012, 10, 353.
15. Dennis, J.M. Precision medicine in type 2 diabetes: Using individualized prediction models to optimize selection of treatment. Diabetes 2020, 69, 2075–2085.
16. Scikit-Learn Developers. LabelEncoder—Scikit-Learn Documentation. Available online: https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.LabelEncoder.html (accessed on 1 October 2024).
17. Scikit-Learn Developers. SimpleImputer—Scikit-Learn Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html (accessed on 20 September 2024).
18. Scikit-Learn Developers. SelectKBest—Scikit-Learn Documentation. 2024. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html (accessed on 7 April 2026).
19. Scikit-Learn Developers. RFE—Scikit-Learn Documentation. 2024. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html (accessed on 7 April 2026).
20. Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182.
21. Scikit-Learn Developers. MultiOutputRegressor—Scikit-Learn Documentation. 2024. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html (accessed on 7 April 2026).
22. Murtagh, F. Multilayer perceptrons for classification and regression. Neurocomputing 1991, 2, 183–197.
23. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
24. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649.
25. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
26. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
27. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009.
28. Zhou, Z. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012.
29. Scikit-Learn Developers. RegressorChain—Scikit-Learn Documentation. 2024. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.RegressorChain.html (accessed on 7 April 2026).
30. Lundberg, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
31. Morieri, M.L.; Longato, E.; Di Camillo, B.; Sparacino, G.; Avogaro, A.; Fadini, G.P. Management of type 2 diabetes with a treat-to-benefit approach improved long-term cardiovascular outcomes under routine care. Cardiovasc. Diabetol. 2022, 21, 274.
32. Ahren, B. DPP-4 inhibitors. Best Pract. Res. Clin. Endocrinol. Metab. 2007, 21, 517–533.
33. Gallwitz, B. Clinical use of DPP-4 inhibitors. Front. Endocrinol. 2019, 10, 389.
34. Lage, M.J.; Boye, K.S. The relationship between HbA1c reduction and healthcare costs among patients with type 2 diabetes: Evidence from a US claims database. Curr. Med. Res. Opin. 2020, 36, 1441–1447.
35. Fan, M.; Stephan, A.-J.; Emmert-Fees, K.; Peters, A.; Laxy, M. Health and economic impact of improved glucose, blood pressure and lipid control among German adults with type 2 diabetes: A modelling study. Diabetologia 2023, 66, 1693–1704.
36. Dilla, T.; Valladares, A.; Nicolay, C.; Salvador, J.; Reviriego, J.; Costi, M. Healthcare costs associated with change in body mass index in patients with type 2 diabetes mellitus in Spain: The ECOBIM study. Appl. Health Econ. Health Policy 2012, 10, 417–430.
37. Magliano, D.J.; Boyko, E.J. IDF Diabetes Atlas, 10th ed.; International Diabetes Federation: Brussels, Belgium, 2021; Available online: https://diabetesatlas.org (accessed on 14 September 2024).
38. Chung, W.K.; Erion, K.; Florez, J.C.; Hattersley, A.T.; Hivert, M.-F.; Lee, C.G.; McCarthy, M.I.; Nolan, J.J.; Norris, J.M.; Pearson, E.R.; et al. Precision medicine in diabetes: A consensus report from the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetologia 2020, 63, 1671–1693.
39. American Diabetes Association. American Diabetes Association Releases Standards of Care in Diabetes—2024. 2024. Available online: https://diabetes.org/newsroom/press-releases/american-diabetes-association-releases-standards-care-diabetes-2024 (accessed on 9 December 2024).

Figure 3. Scatter plot of observed outcome values vs predicted outcome values.

Figure 4. Global interpretations of the four outcomes from the multi-output prediction model, with features arranged by importance and color-coded to represent their values. (a) HbA1c outcome, (b) LDL outcome, (c) HDL outcome, (d) BMI outcome.

Figure 5. Comparison of the number of patients showing health parameter improvements in predicted and observed data using importance-weighted and majority-vote approaches within concordant groups.

Table 1. Performance of predictive imputation models on test data for individual outcomes.

Outcome	Model (Feature Selection Method)	R² Score	RMSE	Features
HbA1c	MLPR (ReliefF)	0.402	9.120	baseline HbA1c, use of insulin, pre-initiation HbA1c, sum_drugs, MD_RCT, T2D duration (y), glucose, concordant diseases, met_oad, triglycerides, hypertension, age
LDL	VotingR (Kbest)	0.213	0.733	baseline HbA1c, age (y), baseline LDL, insulin, sum_drugs, hypertension, coronary heart disease, cvd_comp, obese, C02A antiadrenergic drugs, C10A lipid-modifying agents
HDL	XGB (Kbest)	0.679	0.199	MD_RCT, pre-initiation HbA1c, baseline LDL, baseline HDL, glucose, met_oad
BMI	XGB (Kbest)	0.859	2.052	sex, age (y), T2D duration (y), classified creatinine, baseline BMI, use of DPP-4 inhibitors, baseline HDL, triglycerides, obese

Note: MLPR: multi-layer perceptron regressor; XGB: XGBoost regressor; VotingR: voting regressor combining XGBoost (XGB), random forest (RFR), and CatBoost (CatB) regressors, which gave the highest performance for the LDL imputation model; pre-initiation HbA1c: HbA1c (7–18 months) before drug initiation; sum_drugs: sum of different types of antidiabetic medications; MD_RCT: mean difference for change in HbA1c of the drug based on RCTs; met_oad: use of metformin and other oral antidiabetic medications; cvd_comp: cardiovascular disease complications.

Table 2. Validation in multi-treatment selection model: Treatment effects across subgroups. For HbA1c, LDL, and BMI, a negative treatment effect indicates a favorable outcome, while for HDL, a positive treatment effect is favorable. The outcome effectiveness represents the treatment effect difference between concordant and discordant subgroups, reported with 95% bootstrap confidence intervals in parentheses.

Treatment	Avg Treat. Effect (Obs.)	Subgroup	No. Patients	Avg Treat. Effect (Pred.)	Outcome Effectiveness (95% CI)
HbA1c (mmol/mol)
SGLT2-i	−9.44	concordant	31	−13.45	−10.92
SGLT2-i	−9.44	discordant	34	−2.53	(−17.9, −4.2)
DPP4-i	−4.10	concordant	14	−3.86	−0.34
DPP4-i	−4.10	discordant	21	−3.52	(−6.0, 5.2)
LDL cholesterol (mmol/L)
SGLT2-i	−0.27	concordant	27	−0.60	−0.46
SGLT2-i	−0.27	discordant	34	−0.14	(−0.9, 0.0)
DPP4-i	−0.16	concordant	15	−0.21	−0.29
DPP4-i	−0.16	discordant	25	0.08	(−0.7, 0.1)
HDL cholesterol (mmol/L)
SGLT2-i	0.01	concordant	10	0.02	0.03
SGLT2-i	0.01	discordant	10	−0.01	(0.0, 0.1)
DPP4-i	0.00	concordant	38	0.01	0.00
DPP4-i	0.00	discordant	42	0.01	(−0.1, 0.1)
BMI (kg/m²)
SGLT2-i	−1.27	concordant	44	−1.55	−1.78
SGLT2-i	−1.27	discordant	44	0.23	(−2.8, −0.7)
DPP4-i	0.22	concordant	5	0.13	−0.16
DPP4-i	0.22	discordant	8	0.29	(−3.9, 3.3)
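As a consistency check on Table 2, the outcome effectiveness values can be recomputed as the predicted average treatment effect in the concordant subgroup minus that in the discordant subgroup. The short sketch below does this for all eight rows; the dictionary layout and function name are illustrative choices, while the numbers are copied from the table.

```python
# (drug, outcome): (predicted effect in concordant group, predicted effect in discordant group)
TABLE_2 = {
    ("SGLT2-i", "HbA1c"): (-13.45, -2.53),
    ("DPP4-i", "HbA1c"): (-3.86, -3.52),
    ("SGLT2-i", "LDL"): (-0.60, -0.14),
    ("DPP4-i", "LDL"): (-0.21, 0.08),
    ("SGLT2-i", "HDL"): (0.02, -0.01),
    ("DPP4-i", "HDL"): (0.01, 0.01),
    ("SGLT2-i", "BMI"): (-1.55, 0.23),
    ("DPP4-i", "BMI"): (0.13, 0.29),
}


def outcome_effectiveness(concordant, discordant):
    """Difference in average treatment effect, rounded to the table's precision."""
    return round(concordant - discordant, 2)


effectiveness = {key: outcome_effectiveness(*pair) for key, pair in TABLE_2.items()}
for (drug, outcome), value in effectiveness.items():
    print(f"{drug:8s} {outcome:6s} {value:+.2f}")
```

All eight differences match the point estimates in the last column of Table 2. Negative values are favorable for HbA1c, LDL, and BMI, and positive values are favorable for HDL.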

Table 3. Performance of single-treatment selection models.

Method	Accuracy	F1 Score (Micro)	Precision	Recall
Majority voting	0.47	0.46	0.48	0.59
Importance-weighted	0.41	0.40	0.40	0.32

Table 4. Validation of the single-treatment selection model using a majority voting aggregation approach: The table displays the predicted average treatment effects and outcome effectiveness across subgroups. For HbA1c, LDL, and BMI, a negative treatment effect indicates a favorable outcome, while for HDL, a positive treatment effect is favorable. The outcome effectiveness represents the treatment effect difference between concordant and discordant subgroups, reported with 95% bootstrap confidence intervals in parentheses.

Treatment	Subgroup	Avg Treat. Effect (Predicted)	Outcome Effectiveness (95% CI)
HbA1c (mmol/mol)
SGLT2-i	concordant	−13.48	−11.12
SGLT2-i	discordant	−2.36	(−18.4, −4.0)
DPP4-i	concordant	−7.69	−4.21
DPP4-i	discordant	−3.48	(−13.6, 3.6)
LDL cholesterol (mmol/L)
SGLT2-i	concordant	−0.26	−0.25
SGLT2-i	discordant	−0.01	(−0.6, 0.0)
DPP4-i	concordant	−0.46	−0.17
DPP4-i	discordant	−0.29	(−0.7, 0.4)
HDL cholesterol (mmol/L)
SGLT2-i	concordant	0.01	0.03
SGLT2-i	discordant	−0.01	(0.0, 0.1)
DPP4-i	concordant	0.02	0.01
DPP4-i	discordant	0.01	(−0.1, 0.1)
BMI (kg/m²)
SGLT2-i	concordant	−1.84	−1.56
SGLT2-i	discordant	−0.28	(−2.6, −0.5)
DPP4-i	concordant	1.25	1.68
DPP4-i	discordant	−0.43	(−0.2, 3.7)
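The bootstrap confidence intervals reported in these tables can be illustrated with a percentile bootstrap on the difference in group means. This is a sketch under stated assumptions: the percentile variant, the number of resamples, the seed, and the patient-level HbA1c changes are all hypothetical, since this section only states that 95% bootstrap intervals were used.

```python
import random
from statistics import mean


def bootstrap_diff_ci(concordant, discordant, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(concordant) - mean(discordant)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each subgroup with replacement, keeping its original size.
        c = rng.choices(concordant, k=len(concordant))
        d = rng.choices(discordant, k=len(discordant))
        diffs.append(mean(c) - mean(d))
    diffs.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - lo_idx - 1
    point = mean(concordant) - mean(discordant)
    return point, (diffs[lo_idx], diffs[hi_idx])


# Invented 12-month HbA1c changes (mmol/mol) for two small subgroups.
concordant = [-15.0, -9.0, -20.0, -11.0, -14.0, -8.0, -17.0, -12.0]
discordant = [-3.0, 1.0, -6.0, -2.0, 0.0, -4.0, -1.0, -5.0]

point, (lo, hi) = bootstrap_diff_ci(concordant, discordant)
print(f"difference = {point:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

With subgroups this small, each resample can differ substantially from the original data, so the interval is wide. The same effect applies to the smallest subgroups in Table 2, such as the 5 concordant and 8 discordant patients for DPP4-i on BMI.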

Table 5. Validation of the single-treatment selection model using importance-weighted aggregation approach: The table displays the predicted average treatment effects and outcome effectiveness across subgroups. For HbA1c, LDL, and BMI, a negative treatment effect indicates a favorable outcome, while for HDL, a positive treatment effect is favorable. The outcome effectiveness represents the treatment effect difference between concordant and discordant subgroups, reported with 95% bootstrap confidence intervals in parentheses.

Treatment	Subgroup	Avg Treat. Effect (Predicted)	Outcome Effectiveness (95% CI)
HbA1c (mmol/mol)
SGLT2-i	concordant	−12.06	−9.66
SGLT2-i	discordant	−2.40	(−19.5, −0.6)
DPP4-i	concordant	−5.88	2.29
DPP4-i	discordant	−8.17	(−5.4, 9.2)
LDL cholesterol (mmol/L)
SGLT2-i	concordant	−0.37	−0.33
SGLT2-i	discordant	−0.04	(−0.8, 0.1)
DPP4-i	concordant	−0.28	−0.05
DPP4-i	discordant	−0.23	(−0.4, 0.3)
HDL cholesterol (mmol/L)
SGLT2-i	concordant	0.01	−0.03
SGLT2-i	discordant	0.04	(−0.1, 0.0)
DPP4-i	concordant	−0.04	−0.05
DPP4-i	discordant	0.01	(−0.1, 0.1)
BMI (kg/m²)
SGLT2-i	concordant	−1.44	−1.04
SGLT2-i	discordant	−0.40	(−2.3, 0.2)
DPP4-i	concordant	0.87	2.06
DPP4-i	discordant	−1.19	(0.6, 3.5)


Share and Cite

MDPI and ACS Style

Ihalapathirana, A.; Lavikainen, P.; Siirtola, P.; Tamminen, S.; Chandra, G.; Laatikainen, T.; Martikainen, J.; Röning, J. Beyond Glycemic Control: Precision Medicine in Type 2 Diabetes Using Multi-Output Explainable Artificial Intelligence for Personalized SGLT2 and DPP-4 Therapy Selection. AI 2026, 7, 183. https://doi.org/10.3390/ai7060183

