A Machine Learning Model to Predict Length of Stay and Mortality among Diabetes and Hypertension Inpatients

Background and Objectives: Taiwan is among the nations with the highest rates of Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN). As more cases are reported each year, hospital admissions rise, which burdens hospitals and affects their overall management and administration. Hence, this study aimed to develop a machine learning (ML) model to predict the Length of Stay (LoS) and mortality among T2DM and HTN inpatients. Materials and Methods: Using Taiwan’s National Health Insurance Research Database (NHIRD), this cohort study included 65,037 patients, of whom 25,868 had T2DM, 32,750 had HTN, and 6419 had both T2DM and HTN; the 58,618 patients with only one of the two conditions were used for prediction. We analyzed the data with different machine learning models for the prediction of LoS and mortality. The models were evaluated using descriptive statistical graphs, feature importance, precision-recall curves, accuracy plots, and the area under the curve (AUC). The data were split into training and testing sets at a ratio of 8:2 before applying the ML algorithms. Results: XGBoost showed the best performance in predicting LoS (R² 0.633; RMSE 0.386; MAE 0.123), and RF resulted in a slightly lower performance (R² 0.591; RMSE 0.401; MAE 0.027). Logistic Regression (LoR) performed the best in predicting mortality (CV Score 0.9779; Test Score 0.9728; Precision 0.9432; Recall 0.9786; AUC 0.97 and AUPR 0.93), closely followed by Ridge Classifier (CV Score 0.9736; Test Score 0.9692; Precision 0.9312; Recall 0.9463; AUC 0.94 and AUPR 0.89). Conclusions: We developed a robust prediction model for LoS and mortality of T2DM and HTN inpatients. XGBoost showed the best performance for LoS, and Logistic Regression performed the best in predicting mortality. The results showed that ML algorithms can not only help healthcare professionals in data-driven decision-making but can also facilitate early intervention and resource planning.


Introduction
Non-Communicable Diseases (NCDs), such as Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN), are a major public health problem and a leading cause of mortality worldwide. They pose great economic threats and burdens due to their treatment cost and complications [1]. Mainly triggered by obesity, fatty food, physical inactivity, and a sedentary lifestyle, T2DM is one of the most common NCDs [2]. The number of cases and prevalence of diabetes have continued to increase over the last few decades, as approximately 422 million people have diabetes worldwide, with the majority living in low and middle-income countries [3]. Each year, diabetes is directly responsible for 1.6 million deaths, and an estimated 193 million people are diabetic but are unaware of it [4]. T2DM and obesity are the leading factors for the global prevalence of Hypertension (HTN). HTN is one of the silent killer NCDs because sometimes, people with HTN do not manifest signs and symptoms [5]. It is a major risk factor for cardiovascular, brain, kidney, and other diseases. The prevalence of HTN increases with age, as an estimated 1.2 billion adults aged 30-79 have HTN worldwide, with a significant ratio in low and middle-income countries, and approximately 46% of people are unaware of having this condition [6].
In the literature, various studies have been conducted to assess the treatment and outcomes of T2DM and HTN patients. Hospitalized patients who sporadically exhibit acute HTN that is deemed worthy of clinical attention, mostly those with hypertensive emergencies or urgency, may also have chronic HTN [7]. HTN is common among patients with diabetes, with prevalence depending on the type and duration of diabetes, age, sex, race/ethnicity, BMI, history of glycemic control, and presence of kidney disease, among other factors [8]. HTN is also a major cause of morbidity and mortality for individuals with diabetes. More than 50% of patients with HTN also have DM [9]. T2DM increases the risks of heart failure and mortality in patients with HTN. Given their common risk factors, HTN and T2DM often coexist. In general, HTN is prevalent among 70% of T2DM patients, whereas patients with HTN are 2.5 times more likely to develop T2DM as a primary comorbidity [10][11][12].
The Length of Stay (LoS) is the amount of time a patient stays in the hospital after being admitted due to a medical condition and is regarded as one of the most important metrics for hospital administration and management [13]. Several studies have shown that LoS is associated with other clinical outcomes; for example, a patient who remains in the ICU for more than three days is more likely to die [14]. However, Lingsma et al. [15] indicated that there is a direct correlation between LoS and mortality during the index admission, and Sud et al. [16] showed that prolonged LoS is associated with higher rates of mortality and readmission.
Taiwan implemented the National Health Insurance (NHI) system in 1995, and it has a high coverage and utilization rate. However, the healthcare system in Taiwan is facing immense challenges due to rapid population aging [17]. Approximately 9996 people in Taiwan died from DM in 2019, with 2736 of the deaths being recorded to be among people 85 years of age and above [18]. The number of deaths due to HTN in 2019 was 6255, and the majority of the cases had an age of 85 years or more [18]. Taiwan is among the nations with the highest rates of Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN). As more cases are reported each year, there is a rise in hospital admissions for people seeking medical attention. This creates a burden on the hospitals and affects the overall management and administration of the hospitals. Accurate identification of patients enables early planning of treatment and provision of more intensive care to accelerate their recovery, intervention, and improvement of clinical outcomes, thereby reducing LoS as well as improving the planning and resource management [19].
Artificial Intelligence (AI) is an innovative field of computer science that has transformed the practice of medicine and reshaped the delivery of healthcare. With the latest surge of AI in healthcare, one of its powerful domains, Machine Learning (ML), has extensively been used for improving the accuracy, prediction, and quality of work in this domain [20]. An important application of ML algorithms used in hospitals is the precise prediction of mortality and LoS, which in turn can classify patients with different risk factors of outcome [21]. Accurate LoS prediction of inpatients is not only important in improving patient care but is also critical for resource management and planning in hospitals [13]. Therefore, the objective of this study was to utilize ML algorithms in order to predict the LoS and mortality of patients diagnosed with T2DM and HTN using the data from Taiwan's National Health Insurance Research Database (NHIRD). The National Health Insurance Research Database (NHIRD) of Taiwan is a unique and large national database that has been widely used as an excellent resource for scientific research in healthcare, along with its other benefits and purposes [22,23].

Data Source
Taiwan began its healthcare reforms in the 1980s, following two decades of rapid economic growth. The National Health Insurance (NHI) Act was passed on 19 July 1994, and the NHI model was adopted. The Bureau of NHI developed the NHIRD to support data-driven decision and policy making [24]. The NHIRD is a cohort of registry and claims data of all 23 million residents of Taiwan [19]. NHIRD data were made available to researchers for the period of 2000-2013. We used the four most recent years of data, i.e., 2010-2013, in our study. We queried the NHIRD for participant user files to conduct this retrospective cohort study. The data included sex, age, birthdates, discharge status, treatment, status change indicator, death, hospital cost, and LoS, among others.
Our study population consisted of 65,037 patients: those with T2DM (n = 25,868), HTN (n = 32,750), and both T2DM and HTN (n = 6419). In general, 70% of patients with T2DM had HTN, and patients with previous HTN were 2.5 times more likely to develop T2DM [3].

Inclusion and Exclusion Criteria
We included patients with T2DM or HTN. The inclusion criteria were as follows: (i) patients diagnosed with T2DM using the following International Classification of Diseases, Ninth Revision (ICD-9) codes: (25000, 25002 [...]). Patients with both T2DM and HTN were identified by querying the above two datasets. Finally, we selected patients with either T2DM or HTN as primary comorbidities for predicting LoS and mortality. We excluded patients with duplicate records, patients who died on discharge, those with missing or incomplete data, patients who died on the day of admission, and deaths due to injuries or suicide. The demographic characteristics of all patients with T2DM, HTN, and both HTN and T2DM are presented in the Results. However, we developed the prediction model after excluding patients who had both T2DM and HTN (see Figure 1).

Predictors and Outcomes
The outcomes of interest in our study were mortality and LoS. Mortality was modeled as a categorical variable with the values 1 = "alive" or 0 = "death", and LoS was predicted as a continuous variable. All the selected predictors were based on data obtained before discharge. A total of 67 predictor variables consisting of hospital cost, vital signs and symptoms, comorbidities, and demographic characteristics were extracted from the NHIRD.
The covariates of interest in our study included Gender, Age, Discharge status, HTN, T2DM, Number of comorbidities, Hospital cost, LoS, Days spent in acute bed, Days spent in chronic bed, Transfer code, Case classification, Pneumonia, Urinary tract infection (UTI), Cellulitis, Congestive heart failure, Inguinal hernia, Acute pancreatitis, Aneurysm, Hearing loss, Hypertrophy, Acute pyelonephritis, Cerebral artery hemorrhage, Intracerebral hemorrhage, Calculus of urethra, Obstructive chronic bronchitis, Displacement of the lumbar vertebral disc, and Malignant neoplasm of liver. All were analyzed using EDA plots and descriptive statistics.

Handling Missing Values
As with most clinical data, the NHIRD data contained a significant number of missing values. Initially, all the variables that were not selected for inclusion in the study were removed. Thereafter, we examined the proportion of missing values in each of the candidate variables. The overall missingness in each of the features was less than 10%, so we ultimately removed the records containing missing values from our study.
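The missing-value step described above can be illustrated with a minimal sketch. It assumes that "removing the missing values" means dropping incomplete records (complete-case analysis); the column names and toy records are hypothetical, and Python is used only for illustration, not as a statement about the software used in the study.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

MAX_MISSING = 0.10  # threshold from the text: missingness below 10% per feature


def missing_proportion(records: List[Record], columns: List[str]) -> Dict[str, float]:
    """Return, for each column, the fraction of records whose value is None."""
    n = len(records)
    return {c: sum(r.get(c) is None for r in records) / n for c in columns}


def drop_incomplete(records: List[Record], columns: List[str]) -> List[Record]:
    """Keep only the records that have a value for every listed column."""
    return [r for r in records if all(r.get(c) is not None for c in columns)]


# Hypothetical toy data: 20 records, two of which have one missing field each.
columns = ["age", "los", "cost"]
records: List[Record] = [
    {"age": 60.0 + i, "los": 1.0 + i % 7, "cost": 500.0 + 10.0 * i}
    for i in range(20)
]
records[3]["age"] = None
records[11]["cost"] = None

proportions = missing_proportion(records, columns)
all_below_threshold = all(p < MAX_MISSING for p in proportions.values())
complete = drop_incomplete(records, columns)

print(proportions)
print(all_below_threshold, len(complete))
```

Dropping records is only reasonable when missingness is low, as reported here; with higher missingness, imputation would usually be considered instead.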

Features Selection
In our study, we selected predictors based on literature review, expert opinion, and univariate and bivariate analysis [25]. First, we identified features through expert opinions and a literature review. Thereafter, we conducted univariate and bivariate analyses on the feature set using chi-square. Subsequently, 24 features were selected for the LoS prediction, and 27 features were considered for the mortality prediction (see Figure 2).
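As an illustration of the chi-square screening mentioned above, the following sketch computes the Pearson chi-square statistic for one binary feature against a binary outcome. The toy counts are hypothetical, no continuity correction is applied, and the 3.841 cut-off is the standard 5% critical value for one degree of freedom; the study's actual selection thresholds are not stated in the text.

```python
from typing import List, Sequence

CRITICAL_DF1_P05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def contingency_table(feature: Sequence[int], outcome: Sequence[int]) -> List[List[int]]:
    """Cross-tabulate feature levels (rows) against outcome levels (columns)."""
    f_levels = sorted(set(feature))
    o_levels = sorted(set(outcome))
    table = [[0] * len(o_levels) for _ in f_levels]
    for f, o in zip(feature, outcome):
        table[f_levels.index(f)][o_levels.index(o)] += 1
    return table


def chi_square_statistic(table: List[List[int]]) -> float:
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 100 patients, a binary comorbidity flag and a binary outcome.
feature = [0] * 50 + [1] * 50
outcome = [0] * 40 + [1] * 10 + [0] * 10 + [1] * 40

table = contingency_table(feature, outcome)
statistic = chi_square_statistic(table)
significant = statistic > CRITICAL_DF1_P05

print(table, statistic, significant)
```

A feature whose statistic exceeds the critical value is associated with the outcome in the sample and would be a candidate to keep, alongside the expert and literature input described in the text.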

Managing Class Imbalance
Accuracy is one of the most commonly used metrics to evaluate ML models. This measure is usually not sufficient when the data are highly imbalanced (as in our study, where the difference between the numbers of survivors and deaths was considerable). Moreover, the nature of our prediction problem required a high rate of correct detection of patient mortality. The most commonly used methods to address class imbalance are oversampling the minority class [26], under-sampling the majority class [27], or a combination of both [24]. However, under-sampling may cause the loss of vital information by removing significant patterns, and similarly, oversampling may cause overfitting and introduce additional computational cost. To address this problem, Chawla et al. [28] introduced the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic examples rather than oversampling with replacement. Our study used a combination of oversampling by SMOTE and under-sampling by Random Under Sampler to address the class imbalance, and this combination gave us very good results in predicting mortality.
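The idea behind combining SMOTE with random under-sampling can be sketched as follows. This is a simplified illustration in plain Python, not the implementation used in the study: the function names, the toy points, the neighbourhood size k = 3, and the target class sizes are all assumptions made for the example.

```python
import random
from typing import List

Point = List[float]


def smote(minority: List[Point], n_new: int, k: int, rng: random.Random) -> List[Point]:
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def random_under_sample(majority: List[Point], n_keep: int, rng: random.Random) -> List[Point]:
    """Keep a random subset of the majority class, drawn without replacement."""
    return rng.sample(majority, n_keep)


# Hypothetical toy data: 5 minority points and 50 majority points in two dimensions.
rng = random.Random(0)
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [3.0, 3.0]]
majority = [[float(i % 10), float(i // 10)] for i in range(50)]

synthetic = smote(minority, n_new=15, k=3, rng=rng)
balanced_minority = minority + synthetic
balanced_majority = random_under_sample(majority, n_keep=20, rng=rng)

print(len(balanced_minority), len(balanced_majority))
```

Because every synthetic point lies on a segment between two real minority points, the minority class is enlarged without exact replication, while under-sampling trims the majority class to the same size.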

Predictive Model Development and Evaluation
Once the data were preprocessed, they were split into train and test datasets, and prediction algorithms were applied. We tested and evaluated various ML algorithms before fine-tuning the model hyperparameters. For the classification problem, we tested a Decision Tree Classifier, Random Forest Classifier, Logistic Regression, AdaBoost Classifier, Bagging Classifier, Gradient Boosting Classifier, XGBoost Classifier, Support Vector Machines, K-Neighbors Classifier, and Naïve Bayes. After evaluation, we shortlisted a set of algorithms for hyperparameter tuning, namely Logistic Regression (LoR), Ridge Classifier (RC), Gradient Boosting Classifier (GBC), Bagging Classifier (BC), K-Neighbors Classifier (KNN), Random Forest Classifier (RFC), and Support Vector Machine (SVM), to predict mortality. For the regression problem, we used Linear Regression (LR), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Random Forest (RF) to predict LoS based on the patient characteristics described in the Predictors and Outcomes section. Besides predicting the response from a set of predictors, another important step in data-driven modeling is to identify which predictors are most relevant to the prediction task and the contribution of each feature in predicting the outcome. A grid search was set up over the combinations of hyperparameters, and the best combination was selected by comparing scores from nested and non-nested 10-fold cross-validation procedures. The optimal tuning parameters were then used to train and evaluate the algorithms through nested 10-fold cross-validation. Ten percent of the training portion of each cross-validation split was set aside for selecting the optimal classification threshold and the rest for final evaluation.
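The structure of nested cross-validation with a grid search can be sketched as below: an inner loop chooses the hyperparameter using only the outer training portion, and the outer loop scores that choice on data the search never saw. The one-dimensional k-nearest-neighbour classifier, the grid [1, 3, 5], the toy data, and the 5 outer / 4 inner folds (instead of the 10 folds in the text) are assumptions chosen to keep the example small.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split positions 0..n-1 into k contiguous folds; return (train, held_out) pairs."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def knn_predict(train_x: Sequence[float], train_y: Sequence[int], x: float, k: int) -> int:
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(xs, ys, train_idx, eval_idx, k_neighbours) -> float:
    """Fit on train_idx (k-NN just stores the points) and score on eval_idx."""
    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    hits = sum(knn_predict(train_x, train_y, xs[i], k_neighbours) == ys[i] for i in eval_idx)
    return hits / len(eval_idx)


def inner_cv_score(xs, ys, idx, k_neighbours, n_folds) -> float:
    """Mean validation accuracy over folds built only from the records in idx."""
    scores = []
    for train_pos, val_pos in k_fold_indices(len(idx), n_folds):
        train = [idx[p] for p in train_pos]
        val = [idx[p] for p in val_pos]
        scores.append(accuracy(xs, ys, train, val, k_neighbours))
    return sum(scores) / len(scores)


def nested_cv(xs, ys, grid, outer_folds, inner_folds):
    """For each outer fold: grid-search on the training part, test on the held-out part."""
    results = []
    for train_idx, test_idx in k_fold_indices(len(xs), outer_folds):
        best = max(grid, key=lambda g: inner_cv_score(xs, ys, train_idx, g, inner_folds))
        results.append((best, accuracy(xs, ys, train_idx, test_idx, best)))
    return results


# Hypothetical toy data: two well-separated classes, shuffled so folds are mixed.
rng = random.Random(0)
data = [(float(v), 0) for v in range(20)] + [(float(v), 1) for v in range(100, 120)]
rng.shuffle(data)
xs = [x for x, _ in data]
ys = [y for _, y in data]

results = nested_cv(xs, ys, grid=[1, 3, 5], outer_folds=5, inner_folds=4)
print(results)
```

A non-nested procedure would instead report the best inner score directly, which tends to be optimistic because the same data are used both to choose the hyperparameter and to score it.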
The Receiver Operating Characteristic (ROC) curve, Area Under the Curve (AUC), F-beta, Precision, Recall, Cross-Validation score, Accuracy score, Balanced Accuracy score, Test score, and Area Under the Precision-Recall curve (AUPR) were used to evaluate the models. Both the mortality and LoS datasets were partitioned into training and testing sets in an 8:2 ratio. The training set was used to fit the models, and the testing set was used to determine the performance of the models after learning.
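For reference, several of the classification metrics listed above can be written out directly from their definitions. The sketch below uses hypothetical labels and scores, treats the class coded 1 as the positive class, and uses a 0.5 decision threshold; all of these are assumptions made for the example.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives, with 1 as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["fn" if t == 1 else "tn"] += 1
    return counts


def precision(c: Dict[str, int]) -> float:
    return c["tp"] / (c["tp"] + c["fp"])


def recall(c: Dict[str, int]) -> float:
    return c["tp"] / (c["tp"] + c["fn"])


def f_beta(c: Dict[str, int], beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    p, r, b2 = precision(c), recall(c), beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def balanced_accuracy(c: Dict[str, int]) -> float:
    """Mean of recall on the positive class and recall on the negative class."""
    specificity = c["tn"] / (c["tn"] + c["fp"])
    return (recall(c) + specificity) / 2


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(precision(counts), recall(counts), f_beta(counts, 1.0))
print(balanced_accuracy(counts), roc_auc(y_true, scores))
```

Unlike plain accuracy, balanced accuracy and AUC do not reward a model for simply predicting the majority class, which is why they are informative on imbalanced data.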

Software and Libraries
We used string R version 4.

Patient Characteristics
A total of 65,037 patients were descriptively analyzed, as shown in Table 1. For the predictive analysis, we excluded the patients with both T2DM and HTN, leaving a cohort of 58,618 patients with either T2DM only or HTN only. The mean age of the cohort was 75.12 ± 13.65 years. Over half of the included patients were male (51.16%), and half were between the ages of 58 and 80 (50.26%). Patients with only HTN (55.87%) made up a larger proportion than patients with only T2DM (44.13%). The average LoS for patients with only T2DM (8.46 days) was higher than that for patients with only HTN (6.56 days). Patients with T2DM had a higher mortality rate (2.26%) than those with HTN (0.91%).
Table 1 note: LoS, length of stay; IQR, interquartile range; max, maximum; min, minimum. Continuous values were recorded as median (1st-3rd quartile), and categorical values were recorded as absolute numbers and percentages; * the difference is significant for p value < 0.05.

Features Selection
The following 24 variables were selected for LoS prediction: Gender, Closed fracture of unspecified part of neck of femur, Age, Age categorical, Pneumonia, Diabetes, Hypertension, Cerebral artery occlusion, UTI, Cellulitis, Intracerebral hemorrhage, Congestive heart failure, Hearing loss, Acute pyelonephritis, Acute pancreatitis, Aneurysm, Osteoarthrosis, Calculus of ureter, Inguinal hernia, Obstructive chronic bronchitis, Hypertrophy, Malignant neoplasm of the liver, and Displacement of a lumbar intervertebral disc.
The following 27 variables were selected for mortality prediction: Days in acute bed, Days in chronic bed, Transfer code, Case classification, Gender, LoS, Age, Age group, Pneumonia, Diabetes, Hypertension, Cerebral artery occlusion, UTI, Cellulitis, Intracerebral hemorrhage, Congestive heart failure, Hearing loss, Acute pyelonephritis, Acute pancreatitis, Aneurysm, Osteoarthrosis, Calculus of ureter, Inguinal hernia, Obstructive chronic bronchitis, Hypertrophy, Malignant neoplasm of the liver, Displacement of a lumbar intervertebral disc, and Closed fracture of unspecified part of neck of femur.

Comorbidities of T2DM and HTN
The most common comorbidities among patients with T2DM and HTN in our data are shown in Table 2. Metabolic disorders were the most common comorbidity among inpatients with T2DM and HTN.
Table 2. Most common comorbidities with T2DM and HTN.

Length of Stay (LoS)
We evaluated various metrics for LoS prediction. The performance of all the models is presented in Table 3. The best-performing model was XGBoost, with an R² of 0.633, RMSE of 0.386, MAE of 0.123, and MSE of 0.312.
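For readers less familiar with the regression metrics reported above, the following sketch computes them from their standard definitions on a small hypothetical example. The numbers are illustrative only and are unrelated to the study's results.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, RMSE, MAE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),                     # same units as the outcome
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,                  # share of variance explained
    }


# Hypothetical toy example: observed and predicted LoS values for four stays.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

R² compares the model with a baseline that always predicts the mean, while RMSE and MAE express the typical error in the units of the (possibly transformed) outcome.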

Feature Importance
The top 15 features by importance in LoS prediction are shown in Figure 3. The figure shows that age was the most important feature in LoS prediction using chi-square. The other most important features influencing the LoS prediction were gender and diabetes as a comorbidity. Table 4 presents the mortality prediction performance. The best results were obtained using RF, with an AUROC of 0.996. Note: Models not included in the above tables were either overfit or less accurate in predicting LoS and mortality. This can be further understood from Table 5.

Feature Importance
The top 15 features by importance score in mortality prediction are shown in Figure 4. We selected all the features to predict mortality. The figure shows that the displacement of a lumbar intervertebral disc was the most important feature in mortality prediction using the Random Forest Classifier. This comorbidity could be the major factor affecting mortality. The other most important features influencing mortality prediction were cerebral artery occlusion and age.

Figures 5 and 6 show the accuracy and loss plots for mortality prediction using a neural network model. We developed a neural network to produce the accuracy and loss plots on a training dataset. We obtained both accuracy and loss values to evaluate the performance of the classification of subjects at each iteration. However, accuracy alone may be biased by the data. Therefore, to confirm that the classification of subjects was statistically significant, we also estimated other metrics. The loss plot in Figure 5 shows that the model is relatively good given that the dataset is unbalanced; however, the model needs to learn more.

The AUC and Precision-Recall Curves
Figures 7-12 show the AUC and precision-recall curves for the different models used for the prediction of LoS and mortality.

Calibration
Figures 13-16 show the calibration of the different models used for the prediction of LoS and mortality.
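As background for reading calibration plots, a calibration (reliability) curve groups predicted probabilities into bins and compares the mean predicted probability in each bin with the observed fraction of positive outcomes. The sketch below assumes equal-width bins and uses hypothetical predictions; it is not the procedure used to produce the figures.

```python
from typing import List, Sequence, Tuple


def calibration_curve(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int
) -> List[Tuple[float, float]]:
    """Return (mean predicted probability, observed positive fraction) per non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, frac_pos))
    return curve


# Hypothetical toy predictions: four low-probability and four high-probability cases.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]

curve = calibration_curve(y_true, probs, n_bins=2)
print(curve)
```

A well-calibrated model has points close to the diagonal; in this toy case the model is overconfident at both ends (predicting 0.1 where 25% are positive and 0.9 where 75% are positive).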

Cross-Validation
The results from nested and non-nested cross-validation on the training dataset are compared in the figures below. We conducted five trials and compared the nested and non-nested cross-validation scores as well as the average difference in scores from each experiment. The x-axis and y-axis represent the individual trial number and the score, respectively. This was then applied to all the algorithms. We observed that the average difference between nested and non-nested cross-validation scores was small across trials.

Discussion
Earlier, we conducted a similar study using several machine learning techniques to predict LoS and mortality for patients diagnosed with T2DM and HTN in Indonesia [29]. Our previous study had the same objectives as the current one, but it was conducted using an Indonesian insurance claim-based dataset called Indonesia Case-Based Groups (INA-CBGs) from a state-owned type B regional public hospital in Tasikmalaya, the Dr. Soekardjo Regional Public Hospital (RSUD Dr. Soekardjo), with a sample size of 4376 patients. Our current study was conducted using Taiwan's NHIRD data, with a larger sample size of 65,037 patients. The advantage of the current study is that the NHIRD data are drawn from multiple hospitals and healthcare service clinics and are highly representative of the national population, as they cover more than 99% of the resident population of Taiwan. In our previous study, LR and GBM models best predicted LoS and MLP best predicted mortality; in the current study, XGBoost had the best performance in predicting the patients' LoS, with RF performing similarly, while LoR performed the best in predicting mortality, closely followed by Ridge Classifier. The ML models in both studies achieved good prediction of LoS and mortality among T2DM and HTN patients and hence support their utility in medical decision-making, patient safety, and hospital resource management.
In addition to our previous study, there is an abundance of other studies in the literature that utilize ML models for the prediction of diseases. For example, two studies used ML approaches for the prediction of LoS or mortality in diabetic patients [30,31], but neither predicted both LoS and mortality. In contrast, our study predicted both LoS and mortality in order to enhance healthcare quality. The findings from our study revealed that the majority of the patients diagnosed with T2DM and HTN were male. This differs from another study done in Taiwan, which showed that HTN was more strongly associated with women [30]. Our results showed that the majority of T2DM and HTN patients fell in the age group between 58 and 80, with the youngest patient being 35 years old. A population-based cross-sectional survey also found that the majority of the population aged 60 years and above were diagnosed with HTN in Taiwan [32]. Another study forecasted that the number of cases of diabetes in people aged ≥65 years will increase from 9.2 million in 2014 to 21 million in 2030 [33]. Although an increasing number of individuals with T1DM are living into old age [34], this discussion of pathophysiology concerns T2DM, which is overwhelmingly the most common incident and prevalent type in older age groups, as older adults are at high risk for the development of T2DM due to the combined effects of increasing insulin resistance and impaired pancreatic islet function with aging.
Our study revealed that the discharge status of a large number of patients with HTN and T2DM was at the end of transfer in outpatient treatment. Comorbidity was also one of the factors affecting the outcome of a patient's medical condition. Our findings revealed that the majority of patients with T2DM have at least three or more comorbidities, while patients with HTN have at least two comorbidities. The most common comorbidities in our study included metabolic disorders, coronary artery disease, myocardial infarction, stroke, and congestive heart failure. Another study also indicated that ischemic stroke is one of the major vascular complications of diabetes mellitus [35]. Atherosclerotic cardiovascular disease, including coronary heart disease, cerebrovascular disease, and peripheral arterial disease, is the major cause of death and disability in patients with T2DM [36]. Furthermore, T2DM is associated with an increased risk of multiple coexisting medical conditions in older adults, such as cardiovascular and microvascular diseases [37,38]. A group of conditions termed geriatric syndromes also occurs at higher frequency in older adults with T2DM and may affect self-care abilities and health outcomes, including quality of life [39].
Our current findings (see Table 1) indicated that the inpatient cost of patients with T2DM exceeds that of patients with HTN. The findings are consistent with a study by Mutsa P. Mutowo, which also showed a higher median cost and interquartile range (IQR) for DM patients compared with HTN patients [40]. A study in Taiwan of the risk of hospitalization and healthcare costs associated with the diabetes complication severity index, based on the NHIRD, showed that inpatient costs constituted a large part of the total medical costs of DM and its complications [41]. In addition, it was found that the greater the number and severity level of T2DM complications, the higher the risk of mortality and hospitalization [42]. Furthermore, previous estimates of the costs associated with T2DM and its related problems in Taiwan have been based on adapted Diabetes Complications Severity Index (DCSI) scores rather than individual complications. The average inpatient LoS for T2DM patients was eight days; for HTN patients, it was approximately one week. Our results differ from a study done in Japan, where the mean LoS of DM patients ranged from 10.9 days to 15.1 days, depending on the patient's age [43].
Evaluation metrics are an integral aspect of ML, as they are used as indicators to assess the performance of ML models. The most commonly used metrics are accuracy and error rate [24]; however, these metrics are not the best measures when the data are highly imbalanced, as the overall accuracy will be biased toward the majority class regardless of the minority class, which consequently leads to poor performance on the minority class.
From the literature, the majority of researchers have used oversampling since this method is capable of balancing class distributions without removing potentially critical majority examples [44]. One of the most common errors is applying oversampling to the entire original dataset, then conducting cross-validation, and finally evaluating the model [45]. This error usually leads to biased models and overoptimistic error estimates. One of the strengths of our study is that we performed a combination of oversampling (SMOTE) and under-sampling methods. This procedure was applied during nested and non-nested cross-validation: the dataset was first divided into k stratified partitions, and only the training set was oversampled. In this procedure, the observations included in the test set are never oversampled or seen by the model during the training stage, thus allowing a proper evaluation of the model's capability to generalize.
The top feature predictor was the displacement of a lumbar intervertebral disc, which was associated with higher LoS and mortality. Several studies have been conducted in this area; for example, Sakellaridis et al. reported that patients operated on for lumbar disk disease have a statistically significant increased incidence of diabetes mellitus compared to similar patients operated on for other reasons [46].
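The fold-wise resampling procedure described in this section (stratified partitions first, then oversampling of the training portion only) can be sketched as follows. To keep the example short, plain random duplication of minority records stands in for SMOTE, which is a deliberate simplification; the labels follow the coding given earlier (0 = death, 1 = alive), while the class sizes and k = 5 are assumptions.

```python
import random
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Assign record indices to k folds so each class is spread evenly across folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def oversample_training(train_idx: List[int], labels: Sequence[int], rng: random.Random) -> List[int]:
    """Balance classes by duplicating minority TRAINING indices at random.
    (Stand-in for SMOTE: real records are repeated rather than synthesised.)"""
    by_class: Dict[int, List[int]] = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    resampled: List[int] = []
    for members in by_class.values():
        resampled.extend(members)
        resampled.extend(rng.choice(members) for _ in range(target - len(members)))
    return resampled


# Hypothetical toy labels: 10 deaths (0) and 90 survivors (1).
rng = random.Random(42)
labels = [0] * 10 + [1] * 90
k = 5
folds = stratified_folds(labels, k, rng)

summaries = []
for held_out in range(k):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    balanced = oversample_training(train_idx, labels, rng)  # test fold is never touched
    n_dead = sum(labels[i] == 0 for i in balanced)
    n_alive = sum(labels[i] == 1 for i in balanced)
    n_leaked = len(set(balanced) & set(test_idx))
    summaries.append((n_dead, n_alive, n_leaked))

print(summaries)
```

Resampling before splitting would place copies (or near-copies) of the same minority record on both sides of the split, which is the source of the overoptimistic estimates mentioned above; resampling inside each fold keeps the leak count at zero.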

Conclusions
In this study, we used ML algorithms to predict the LoS and mortality among T2DM and HTN patients. The results showed that XGBoost was the best model for LoS, and LoR provided good results in mortality prediction. The low R² score of the LoS algorithms is a concern for practical use; however, we used LoS as a feature for mortality prediction and obtained good balanced accuracy. Therefore, we suggest that this model could be a possible prediction tool for medical decision-making. An accurate forecast of hospital stay and mortality enables early planning and treatment to improve patients' clinical outcomes. It can also help with better resource allocation and availability of hospital beds. Our results lay the foundation for future work in developing rapid and robust classification and regression algorithms that can leverage the minimal amount of available data. Moreover, the combination of oversampling and under-sampling can be applied to unbalanced datasets.

Institutional Review Board Statement: The Institutional Review Board of Taipei Medical University waived ethical review and approval for this study since only anonymized data were used.

Informed Consent Statement: Not applicable.
Data Availability Statement: Although anonymized data were used, the data that support the findings of this study are not publicly available. However, aggregated data are available from authors upon reasonable request and with the permission of NHIRD.