Developing a Prediction Model for 7-Year and 10-Year All-Cause Mortality Risk in Type 2 Diabetes Using a Hospital-Based Prospective Cohort Study

Using readily available hospital data to identify patients at high risk of death, so that clinical diabetes care can be adjusted, is a practical route toward precision healthcare. We aimed to develop risk prediction models for all-cause mortality in type 2 diabetes based on 7-year and 10-year follow-ups. A total of 18,202 Taiwanese subjects aged ≥18 years with outpatient data were ascertained during 2007–2013 in a hospital-based prospective cohort and followed up to the end of 2016. Both traditional stepwise selection and the LASSO method were used to select and compare parsimonious models. Multivariable Cox regression was performed on the selected variables, and time-dependent ROC curves with an integrated AUC, together with cumulative mortality by risk score level, were used to evaluate time-related predictive performance. The prediction model comprised eight influential variables (age, sex, history of cancer, history of hypertension, antihyperlipidemic drug use, HbA1c level, creatinine level, and the LDL/HDL ratio), which were the same for the 7-year and 10-year models. Harrell's C-statistic was 0.7955 and 0.7775, and the integrated AUCs were 0.8136 and 0.8045, for the 7-year and 10-year models, respectively. The AUCs were consistent over time. Our study developed and validated all-cause mortality prediction models with 7-year and 10-year follow-ups that comprised the same contributing factors, although the 10-year model had slightly larger risk coefficients. Both prediction models performed consistently over time.


Introduction
Type 2 diabetes mellitus (T2DM) is a common chronic disease that imposes a significant financial burden on the health system. Because T2DM is a systemic disease that often results in multiple complications, persons with T2DM are likely to incur substantial direct and indirect medical expenditures for routine care and complications throughout their lifetime [1]. Therefore, estimating the prospective economic cost from a population perspective is crucial to developing health policies. However, cost estimation is highly dependent on ethnicity, health care systems, culture, disease prevalence and progression, and mortality. Recently, Yang et al. organized a collaborative effort involving 22 prospective cohort studies in Asian countries with more than 1 million individuals to evaluate the risk of all-cause mortality in persons with T2DM. This study showed that the all-cause mortality risk in persons with T2DM was significantly higher (1.89-fold) than that in nondiabetic individuals; moreover, the risk varied by country and region and might well be influenced by socioeconomic status, health system, culture, etc. [2]. In a systematic review of prognostic indices for older adults, the explanatory variables differed by setting (e.g., community-dwelling patients, nursing home residents). Therefore, prognostic indices should be tested for accuracy in heterogeneous populations [3]. One cohort used for prediction modeling, the Translating Research Into Action for Diabetes (TRIAD) study, started collecting data in 2000 and reported follow-up data at 4 years and 8 years: the significant factors for predicting all-cause mortality among T2DM patients were similar at both time points but had different coefficients [4,5]. However, more studies or cohorts are needed to examine whether this finding holds in other countries.
One would therefore conjecture that causes of death for persons with T2DM are multifaceted and that an all-cause mortality risk prediction model for persons with T2DM is crucial in assessing the economic impact of DM.
Taiwan launched the well-lauded National Health Insurance (NHI) program in 1995, and this program currently covers more than 99% of the population [6,7]. The Taiwan NHI provides Taiwan's population of 23 million with comprehensive benefit coverage, which includes prescription drugs, ambulatory visits to Western and Chinese medicine doctors and dentists, hospital emergency and inpatient services, home care, and hospice care. To enhance care quality for individuals with diabetes mellitus, the Bureau of NHI (now NHI Administration, NHIA) introduced a pay-for-performance (P4P) scheme for diabetes care in 2001 [8]. According to a report based on NHI claims data, the number of individuals with T2DM increased dramatically from 1.3 to 2.2 million between 2005 and 2014 [9] and reached 2.3 million in 2019 (11% prevalence rate) due to population aging, which may jeopardize the capacity of the health care system. Therefore, it is imperative to classify persons with T2DM into different risk levels to enhance clinical management and aid health policy makers in impact assessment. This study was designed to develop models suited to clinical application in hospitals (Supplementary Figure S1). The aim of this study was to develop, validate, and compare 7-year and 10-year risk prediction models of all-cause mortality in T2DM subjects based on a prospective cohort follow-up design.

Study Design, Population, and Data Source
We used a database from one sizable regional hospital with 1089 beds, Chang Gung Memorial Hospital in Keelung (CGMH-K), located in Keelung City, northern Taiwan, which was founded by the Chang Gung Medical Foundation in 1985. The CGMH-K has provided an annual average of 175,000 outpatient visits and a fully engaged P4P program for diabetes care since 2007. Outpatient records from 1 Jan 2007 to 31 Dec 2013 were systematically retrieved from the hospital-based information management system, which was established in 1995 for hospital administrative management and NHI reimbursement. Patients aged 18 or over who had at least one hospital admission or ≥3 outpatient visits within one year recorded with International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code 250 [10] were defined as having diabetes; those with type 1 DM (codes 250.x1, 250.x3) were excluded. A total of 18,202 T2DM subjects were recruited as our study population (Supplementary Figure S2).
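The case definition above can be illustrated with a minimal sketch. The record layout (tuples of visit date, setting, and dotted ICD-9-CM code), the function names, and the choice to exclude any subject who ever carries a type 1 code are assumptions made for illustration; they are not taken from the hospital system or from the authors' SAS programs.

```python
from datetime import date


def is_type1_code(code: str) -> bool:
    """True for ICD-9-CM 250.x1 or 250.x3 (fifth digit 1 or 3)."""
    return len(code) == 6 and code.startswith("250.") and code[5] in "13"


def meets_diabetes_definition(visits, window_days=365, min_outpatient=3):
    """visits: iterable of (visit_date, setting, icd9_code) tuples, where
    setting is "inpatient" or "outpatient" (hypothetical layout)."""
    dm = [(d, s) for d, s, c in visits if c.startswith("250")]
    if any(s == "inpatient" for _, s in dm):
        return True  # at least one admission coded 250
    dates = sorted(d for d, s in dm if s == "outpatient")
    # Compare each visit with the one (min_outpatient - 1) positions later.
    for first, last in zip(dates, dates[min_outpatient - 1:]):
        if (last - first).days <= window_days:
            return True
    return False


def is_t2dm_subject(age, visits):
    """Adult, meets the diabetes definition, and has no type 1 code
    (assumed exclusion rule)."""
    visits = list(visits)
    has_type1 = any(is_type1_code(c) for _, _, c in visits)
    return age >= 18 and meets_diabetes_definition(visits) and not has_type1
```

The sliding comparison checks whether any three consecutive outpatient visits fall within 365 days, which is one reasonable reading of "≥3 outpatient visits within one year".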

Definitions for Comorbidity and Biomarkers
We also retrieved information on biochemical examinations (levels of HbA1c, cholesterol, HDL, creatinine, etc.), comorbidity history (hypertension, cancers, etc.), and drug treatments (antihypertensive, antihyperlipidemic, etc.) from the hospital management system to generate individual factors/variables. Subjects who had three or more outpatient visits within one year with ICD-9-CM codes for hypertension or hyperlipidemia were defined as having a history of these diseases. Those with at least one visit within one year coded as cancer or peripheral vascular disease (PVD) (ICD-9-CM = 440, 441, 442, 443.1, 443.8, 443.9, 447.1, 785.4) were classified as having a history of cancer or PVD, respectively. The candidate predictors and their definitions are described in Supplementary Table S1.
All biomarkers were assessed by the hospital's centralized medical laboratory according to the standards of the College of American Pathologists (CAP) and recorded in the hospital electronic management system, which was approved by the official central laboratory. In light of clinical laboratory criteria, patients whose biomarker results showed HbA1c < 7%, total cholesterol (TC) level < 200 mg/dL, triglyceride (TG) level < 150 mg/dL, low-density lipoprotein cholesterol (LDL) level < 100 mg/dL, high-density lipoprotein cholesterol (HDL) level > 40 mg/dL for males or > 50 mg/dL for females, LDL/HDL ratio < 3.55 for males and < 3.22 for females, and creatinine level 0.64-1.27 mg/dL for males and 0.44-1.13 mg/dL for females were defined as normal; otherwise, they were classified as abnormal. For subjects with missing values for any biomarker variable, we used the missing-indicator method [11], which codes missingness as its own category so that these subjects remain in all analyses.
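As an illustration of the three-level coding (normal, abnormal, missing), the sketch below applies the sex-specific cutoffs stated above to three of the biomarkers. It assumes concentrations in mg/dL and HbA1c in percent, treats the LDL/HDL ratio as missing when either component is missing, and uses hypothetical function and column names throughout.

```python
def categorize(value, is_normal):
    """Return "missing", "normal", or "abnormal" for one biomarker value."""
    if value is None:
        return "missing"
    return "normal" if is_normal(value) else "abnormal"


def classify_biomarkers(sex, hba1c=None, ldl=None, hdl=None, creatinine=None):
    """sex is "M" or "F"; lipid and creatinine values assumed to be in mg/dL."""
    male = sex == "M"
    # Assumption: the ratio is missing if either LDL or HDL is missing.
    ratio = ldl / hdl if ldl is not None and hdl is not None else None
    ratio_cut = 3.55 if male else 3.22
    cr_low, cr_high = (0.64, 1.27) if male else (0.44, 1.13)
    return {
        "hba1c": categorize(hba1c, lambda v: v < 7.0),
        "ldl_hdl_ratio": categorize(ratio, lambda v: v < ratio_cut),
        "creatinine": categorize(creatinine, lambda v: cr_low <= v <= cr_high),
    }


def indicator_columns(categories):
    """Missing-indicator coding: "normal" is the reference level, and both
    "abnormal" and "missing" get their own 0/1 column."""
    cols = {}
    for name, cat in categories.items():
        cols[f"{name}_abnormal"] = int(cat == "abnormal")
        cols[f"{name}_missing"] = int(cat == "missing")
    return cols
```

With this coding no subject is dropped for incomplete laboratory data; the missing column simply enters the regression as another covariate.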

Study Observational End Points
We linked our data with the Taiwan National Mortality Registry System to ascertain mortality information, including causes and date of death, using a unique number from the Health and Welfare Data Science Center (HWDC), which maintains a nationwide official database and is governed by the Ministry of Health and Welfare. Follow-up ended at the end of 2013 and 2016 for the 7-year and 10-year risk prediction models, respectively. This study protocol was reviewed and approved by the Institutional Review Board (IRB) of Chang Gung Memorial Hospital (issued numbers 103-3101B and 106-2459C).

Statistical Analysis
Individual follow-up person-years were calculated from the first date on which a subject had a T2DM diagnosis and clinic visit between Jan. 2007 and Dec. 2013 to the date of death, which was treated as the event; surviving patients were treated as censored. The censoring time points of Dec. 2013 and Dec. 2016 were applied for the 7-year and 10-year all-cause mortality model analyses, respectively. Time-to-event (death) analysis was employed to investigate the potential factors that affected all-cause mortality in persons with T2DM in Taiwan. All statistical analyses were performed with SAS software, version 9.4 (SAS Institute Inc., Cary, North Carolina, USA). We also used the Cloud Analytic Services (CAS) library of SAS® Viya® 3.5 (SAS Visual Analytics) to perform the LASSO (least absolute shrinkage and selection operator) method for model selection; the best model was chosen according to the SBC (Schwarz Bayesian criterion).
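A small sketch may make the censoring rule concrete. The function name, the use of the last day of December as the cutoff, and the 365.25-day year are assumptions for illustration only.

```python
from datetime import date

CUTOFF_7Y = date(2013, 12, 31)   # assumed exact censoring date for the 7-year model
CUTOFF_10Y = date(2016, 12, 31)  # assumed exact censoring date for the 10-year model


def follow_up(entry, death, cutoff):
    """Return (person_years, event) for one subject.

    entry  : first qualifying T2DM visit date
    death  : date of death from the registry, or None if alive
    cutoff : administrative censoring date
    """
    if death is not None and death <= cutoff:
        return (death - entry).days / 365.25, 1  # death observed: event
    return (cutoff - entry).days / 365.25, 0      # alive at cutoff: censored


# The same subject can be censored in one analysis and an event in the other.
entry, death = date(2010, 1, 1), date(2015, 6, 30)
years_7, event_7 = follow_up(entry, death, CUTOFF_7Y)
years_10, event_10 = follow_up(entry, death, CUTOFF_10Y)
```

Here the death in mid-2015 falls after the 2013 cutoff, so the subject contributes censored time to the 7-year analysis and an event to the 10-year analysis.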

Model Selection and Development
Graphical methods, plotting Schoenfeld residuals against time, were used to check the proportional hazards assumption (Supplementary Figure S3A,B). A multivariable Cox proportional hazards model, built by the stepwise approach with a p-value threshold of <0.05, was used to identify the factors that played a significant role in all-cause mortality for T2DM subjects and to estimate their adjusted hazard ratios (aHRs). Considering the number of variables included, the Akaike information criterion (AIC) was also applied for parsimonious model selection, and a lower AIC value was preferred. Besides the traditional stepwise approach, we also applied the LASSO method developed by Tibshirani [12] and compared the resulting model selection. Plots of the standardized coefficients and the SBC at each selection step were generated to show the sequence in which variables entered the model. A model with a smaller SBC was preferred.
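For reference, the quantities named in this subsection can be written in their standard textbook forms. The notation is an assumption introduced here and does not appear in the source: $x_i$ is the covariate vector of subject $i$, $\delta_i$ the event indicator, $R(t_i)$ the risk set at time $t_i$, $p$ the number of estimated (for LASSO, nonzero) coefficients, and $n$ the sample size. Software packages differ on whether $n$ in the SBC counts subjects or events, so the exact convention used in the SAS procedures should be checked against their documentation.

```latex
% Cox proportional hazards model and its partial log-likelihood
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta^{\top} x_i\right),
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ \beta^{\top} x_i
         - \log \sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right) \right]

% LASSO-penalized estimate with tuning parameter \lambda \ge 0
\hat{\beta}_{\mathrm{LASSO}}
  = \arg\max_{\beta} \left\{ \ell(\beta) - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert \right\}

% Information criteria (smaller is better)
\mathrm{AIC} = -2\,\ell(\hat{\beta}) + 2p,
\qquad
\mathrm{SBC} = -2\,\ell(\hat{\beta}) + p \log n
```

Because $\log n$ exceeds 2 for any realistic cohort size, the SBC penalizes each additional variable more heavily than the AIC and tends to favor smaller models.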

Model Performance
Because the prediction model generates a continuous risk score, the receiver operating characteristic (ROC) curve was constructed from the sensitivity and specificity at different cutoff points. To evaluate the accuracy of our prediction models over long-term follow-up, we applied Harrell's C-statistic for time-to-event analysis and used the time-dependent area under the ROC curve (AUC) to check predictive accuracy and consistency at different time points. The 95% confidence interval (CI) of the AUC was generated from the standard error (SE) computed by the inverse probability of censoring weighting (IPCW) method with 500 iterations. The integrated AUC over all time points was also adopted for evaluation [13][14][15].
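To show what Harrell's C-statistic measures, the following is a minimal pairwise sketch. It is not the SAS procedure used in the study: it runs in quadratic time, skips pairs with tied follow-up times for simplicity, and the function name is hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance for right-censored data.

    A pair (i, j) is usable when subject i has the shorter follow-up time
    and experienced the event. It is concordant when subject i also has
    the higher risk score; tied scores count as half-concordant.
    Pairs with tied times are skipped (a simplifying assumption).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk scores against survival times, which places the reported values near 0.78-0.80 in context.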

Model Validation
The full sample was used to construct the risk prediction model based on multivariable Cox regression. First, subjects were categorized by tertiles of the individual risk score into low- (<33.3%), intermediate- (33.3-66.6%), and high-risk (>66.6%) groups, and the cumulative mortality curves of these groups were compared by simultaneous multiple comparisons with the Šidák correction [16]. For internal validation, the sample was randomly divided into two groups of equal size. One half, the training data, was used as the estimation sample to obtain a set of parameter estimates for the variables selected in the full sample. The other half, the validation data, was then used for validation: the predicted mortality was compared with the observed mortality using time-dependent ROC curves, AUCs, and cumulative mortality curves (Supplementary Figure S6). For the LASSO approach, we likewise used random 50% training and validation datasets to validate the models with the selected parameters. The selection sequences under the SBC were plotted and compared between the training and validation datasets.
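The three mechanical steps in this subsection (tertile grouping, a random 50/50 split, and the Šidák-adjusted significance level) can be sketched as follows. Function names, the seed, and the handling of scores that fall exactly on a cut point are assumptions for illustration.

```python
import random
import statistics


def tertile_groups(scores):
    """Label each risk score as low, intermediate, or high by tertile."""
    low_cut, high_cut = statistics.quantiles(scores, n=3)
    return [
        "low" if s < low_cut else "intermediate" if s <= high_cut else "high"
        for s in scores
    ]


def split_half(ids, seed=2021):
    """Randomly split subject ids into training and validation halves."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)


# 18,202 subjects split into two halves of 9101 each.
train_ids, valid_ids = split_half(range(18202))
# Three pairwise comparisons among the low, intermediate, and high groups.
per_test_alpha = sidak_alpha(0.05, 3)
```

With three pairwise comparisons at an overall level of 0.05, the per-comparison threshold is roughly 0.017.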

Characteristics of Study Subjects
The median follow-up time and number of deaths were 4.81 years (2779 deaths) and 6.75 years (4561 deaths) for the 7- and 10-year follow-ups, respectively (Supplementary Figure S2). A total of 18,202 T2DM subjects aged ≥18 years (mean age = 61.51, SD = 13.27) were recruited for this study, including 9065 females (49.8%) and 9137 males (50.2%). The distributions of age, year of study entry, and prevalence of diseases were similar between females and males; only total cholesterol levels, HDL levels, and the use of antihyperlipidemic drugs were slightly higher in females than in males (Supplementary Table S2). The all-cause mortality rates among individuals with T2DM were 3.50 and 3.71 per 100 for the 7-year and 10-year follow-ups, respectively. Higher mortality rates were observed for subjects with a history of cancer, PVD, or hypertension, abnormal creatinine levels, or missing values on lipid profiles/biomarkers than for normal subjects or those with no history. Similar patterns and trends were also observed at the 10-year follow-up (Table 1). The distribution of causes of death did not differ significantly between the 7-year and 10-year follow-ups. The major cause of death was cancer (23-24%) (Supplementary Table S3).

Factors and Coefficients of Prediction Models for All-Cause Mortality
Before the Cox regression analysis, the graphical method with Schoenfeld residuals over time showed that our data did not violate the proportional hazards assumption. Taking HbA1c as an example, the curves for the three categories (normal, abnormal, and missing) were parallel to each other and independent of time (Supplementary Figure S3A for 7 years and S3B for 10 years). First, parsimonious multivariable Cox regression models were developed by stepwise selection and the AIC (Supplementary Table S4). The second and fourth columns in Table 2 present the adjusted HRs of the prediction models for the 7-year and 10-year follow-up data. The variables that reached statistical significance included male sex, history of cancer, history of hypertension, abnormal HbA1c, high creatinine levels, and LDL/HDL ratio, with adjusted HRs of 1.21, 1.40, 1.30, 1.28, 2.50, and 1.29, respectively. Compared with patients aged <50 years, the adjusted HRs were 1.48, 2.69, and 5.64 for those aged 50-59, 60-69, and ≥70 years, respectively. In contrast, the use of antihyperlipidemic drugs showed a protective effect on all-cause mortality, with an adjusted HR of 0.58. A similar pattern of adjusted HRs was present at the 10-year follow-up, with slightly larger values (Table 2). In addition to age, abnormal creatinine levels, as a parameter of kidney function, were associated with a higher risk of all-cause mortality for persons with T2DM. Furthermore, with the LASSO method and the SBC for model selection, the variables selected for the final 7-year and 10-year models were the same as those from the stepwise approach (Figure 1, (A) 7-year, (B) 10-year). Both the standardized coefficients and the SBC show the sequence in which variables entered the all-cause mortality model. The best criterion value among the selection steps is marked with an asterisk (*). The same variables were selected for both the 7-year and 10-year models, but the order in which they entered was slightly different (Supplementary Table S5).
The LASSO results showed a similar pattern for the selected variables (Supplementary Table S6). The final prediction model for all-cause mortality was developed based on the model selection for the 7-year and 10-year follow-ups. In addition to the adjusted HRs presenting the risk of mortality, the coefficients of the final parsimonious models are also provided in Table 2 for individual all-cause mortality risk prediction. The all-cause mortality risk score can be calculated as the sum of the Table 2 coefficients multiplied by the corresponding indicator variables.
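Because a Cox coefficient is the natural logarithm of the corresponding hazard ratio, a risk score of this kind can be sketched from the adjusted HRs quoted in the text. The sketch below is an approximation under stated assumptions: it uses the rounded 7-year HRs from the Results rather than the Table 2 coefficients, it omits the coefficients for the missing-value categories (which are not reported in the text), and all variable names are hypothetical.

```python
import math

# Rounded 7-year adjusted HRs as quoted in the text; Table 2 remains the
# authoritative source, and missing-category terms are not included here.
AHR_7Y = {
    "male": 1.21,
    "age_50_59": 1.48,
    "age_60_69": 2.69,
    "age_70_plus": 5.64,
    "history_cancer": 1.40,
    "history_hypertension": 1.30,
    "hba1c_abnormal": 1.28,
    "creatinine_abnormal": 2.50,
    "ldl_hdl_ratio_abnormal": 1.29,
    "antihyperlipidemic_use": 0.58,
}

# Cox coefficient = ln(hazard ratio).
COEF_7Y = {name: math.log(hr) for name, hr in AHR_7Y.items()}


def risk_score(patient, coefficients=COEF_7Y):
    """Linear predictor: sum of coefficient * 0/1 indicator.

    patient maps variable names to 0/1; absent names are treated as 0,
    i.e. the reference level (female, age <50, no history, normal labs).
    """
    return sum(beta * patient.get(name, 0) for name, beta in coefficients.items())


# Example: a man aged 72 with hypertension who takes a lipid-lowering drug.
example = {"male": 1, "age_70_plus": 1, "history_hypertension": 1,
           "antihyperlipidemic_use": 1}
relative_hazard = math.exp(risk_score(example))
```

Exponentiating the score gives the hazard relative to a reference patient; ranking subjects by the score is what underlies the tertile risk groups.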

Performance of Prediction Models for All-Cause Mortality
The individual risk score was generated from the coefficients of the final Cox regression and used to define the low-, intermediate-, and high-risk groups. Cumulative all-cause mortality was clearly discriminated among the risk groups at 7 years and 10 years (Figure 2A,B), and both p-values of the log-rank test were <0.0001. To evaluate the concordance of the final prediction models for the time to death, taking the time-dependent nature of the event into account, we computed Harrell's C-statistic, which was 0.7955 (95% CI: 0.7873, 0.8037) and 0.7775 (95% CI: 0.7708, 0.7842) for the 7-year and 10-year models, respectively (Table 2). The time-varying AUCs at the 2nd, 4th, and 6th years were 0.8053, 0.7954, and 0.7934 for the 7-year follow-up and 0.7958, 0.7854, and 0.7890 for the 10-year follow-up, with an additional 0.7897 at the 8th year for the 10-year follow-up. These AUCs did not show significant differences at different follow-up times (Figure 3A-D). Furthermore, considering the AUC with its 95% CI over continuous time, the IPCW method with 500 iterations showed a slightly higher AUC within the first year, after which the AUCs remained consistent over the follow-up time. The same pattern was shown in the 7-year and 10-year models (Figure 4A,B).

Validation of Prediction Models for All-Cause Mortality
First, when model selection based on the SBC and best criterion value was repeated on the random half of the dataset (training), the selected variables were the same as in our final model. The SBC values for both the 7-year and 10-year models are shown in Supplementary Table S7. Second, for the 8 parameters of the selected models, the random 50% cross-validation showed similar patterns of standardized coefficients and coefficient progression across selection steps (Figure S4 (A) for 7-year and S5 (A) for 10-year). The log-likelihoods of the training and validation datasets were close to each other at each selection step (Figure S4 (B) for 7-year and Figure S5 (B) for 10-year). In addition, the time-dependent AUC based on cross-validation with 9101 subjects each in the training and validation datasets was employed to validate the predictive performance, and the schema is shown in Supplementary Figure S6. The distributions of variables between the training and validation data were not significantly different (Supplementary Table S8). Third, the cumulative all-cause mortality curves showed that the predicted and observed data were very close regardless of whether the 7-year or 10-year follow-up data were assessed (Supplementary Figure S7 (A,B)). For the performance validation of the prediction model, the ROC curves and AUCs for the 2nd-, 4th-, 6th-, and 10th-year time points also demonstrated no significant difference (Supplementary Figure S8

Discussion
The CGMH-K is the largest hospital in Keelung, northern Taiwan, and cares for one-third of the people with T2DM in Keelung City, according to NHI statistics. Some studies of all-cause mortality prediction from Western countries have been reported, but few have been based on Taiwan, in which the national health insurance covers more than 99% of the population. Therefore, our study described the development of a prediction model for all-cause mortality based on data from individuals with T2DM. The C-statistic was 0.7955 and 0.7775, and the integrated time-dependent AUC reached 0.8136 and 0.8045 for the 7-year and 10-year follow-up, respectively. The performance was also consistent at different time points; moreover, the cross-validation demonstrated a good fit for different risk levels. For comparison with our prediction models for all-cause mortality, the C-statistic was 0.80 in a multiethnic study in New Zealand [17], 0.77 (male) and 0.78 (female) in a Chinese study in Hong Kong [18], and 0.81 in a cohort study in Italy [19,20]. Regardless of ethnicity, these results were similar, and the prediction performance was slightly higher for females than for males. Our C-statistics for performance by sex are presented in Supplementary Table S9.
Epidemiological studies and the biological mechanism of inflammation in diabetes have demonstrated that diabetes is an independent risk factor for the incidence of specific cancers and increases the risk of all-cause mortality and poor prognosis [21]. In addition, according to the vital statistics reported by the Taiwan Ministry of Health and Welfare, cancer has been the leading cause of mortality in Taiwan for over three decades. Therefore, because the impact of cancer on health is well recognized, a risk prediction model for diabetes could not omit cancer status from the estimation. As shown in our results, a history of cancer was associated with a 1.47-fold (95% CI: 1.38, 1.57) increased risk of all-cause mortality. In a previous all-cause mortality prediction model for T2DM that was constructed based on the Hong Kong Diabetes Registry, a history of cancer presented the highest risk as a significant prediction factor [22]. However, that prediction model was based on a Hong Kong Chinese population excluding subjects who were diagnosed with cardiovascular disease (CVD) or cancers at baseline; more importantly, there was a high prevalence rate of cancers in our study (23.4%, Supplementary Table S2), and cancer and CVD were the top two leading causes of death in Taiwan (Supplementary Table S3) and other countries. Such an exclusion would underestimate the impact of diabetes on the outcome spectrum, especially on all-cause mortality. Though Hong Kong and Taiwan have similar ethnic Chinese populations (but different cultural and health care systems), the overall mortality rate in Hong Kong was 4.67% (male: 5.81%, female: 3.68%) [17], which was higher than that in the Taiwanese study (overall: 3.50%, male: 3.66%, female: 3.34%). Further study is needed to explore the factors/reasons contributing to the difference in mortality.

CVD is ranked as the leading cause of death and an important health care issue worldwide, and a high blood cholesterol level is a major determinant of CVD. Cholesterol-lowering drugs, such as statins, were developed in the 1990s and have been used in clinical care and covered by National Health Insurance in Taiwan since 2003. Our results showed that after adjustment for other significant factors, the use of antihyperlipidemic drugs significantly reduced all-cause mortality compared with no use. In 2013, a meta-analysis based on several trials demonstrated a significant 14% reduction in all-cause mortality [23], and a meta-analysis based on statin trials with long-term (posttrial) follow-up found a 10% all-cause mortality reduction [24]. In 2017, a study with a 5-year follow-up based on people from Hong Kong with T2DM reported that statin use significantly reduced CVD risk and all-cause mortality (adjusted HR = 0.487) [25]. In 2018, Chen et al. conducted a retrospective cohort study based on hospital outpatients with T2DM in central Taiwan to evaluate the effect of statin use on all-cause mortality, and the results also demonstrated a significant reduction [26]. This evidence indicates that the use of antihyperlipidemic drugs can make a significant contribution to reducing all-cause mortality, and we could not omit this factor from the prediction model for all-cause mortality, especially for subjects from recent healthcare databases.
In 2019, Li et al. reported the annual all-cause mortality in persons with T2DM between 2005 and 2014 using the Taiwan NHI nationwide database and the same criteria as our study, calculated as the annual deaths divided by the prevalence of T2DM among individuals who were alive on 1 January of each year. The annual mortality rates were 3.24% for all persons with T2DM, 2.93% for females, and 3.54% for males [9]. Our study demonstrated that the all-cause mortality rates were 3.50%, 3.34%, and 3.66% for all individuals, females, and males with T2DM, respectively, based on a 7-year follow-up. Compared with Li et al. [9], who employed a one-year follow-up, the slightly higher mortality rate in our study can be attributed to the longer-term follow-up. Moreover, the NHI database, which is constructed from administrative claims data using ICD diagnosis codes, does not include important biomarkers, such as levels of TG, HDL, HbA1c, creatinine, etc. In contrast, our study exploits a hospital-based prospective cohort with rich laboratory biomarker information, which is crucial to complement the development of a risk prediction model.
The study by Li et al., which linked the Taiwan National Diabetes Care Management Program (NDCMP) with the Health Insurance Research Database using the same criteria as our study to identify T2DM subjects between 2001 and 2004 and calculated in-hospital mortality by follow-up until the end of 2011, found results similar to ours [27]. An abnormal creatinine level was identified as a highly significant risk predictor for mortality, and the prediction AUCs for in-hospital mortality at 5 and 8 years were 0.770 and 0.756, respectively. Compared with our study, the model reported by Li et al. [27] restricted the outcome to in-hospital deaths only; consequently, patients who died outside the hospital were not included, whereas our study linked the individual data with a nationwide death registry to identify all deaths. In addition, cancer history was not included in Li et al.'s model development. The performance of the prediction model might be enhanced if these two issues were addressed.
Missing values for important variables, such as HbA1c, LDL, and HDL levels, which may indicate that persons with T2DM have low compliance or miss regular follow-up visits (Table 1), are also an issue in our study. We therefore adopted the missing-indicator method to include those participants in the analysis, as the missing category may capture health awareness or compliance.
Some limitations of our study bear mentioning. The first is data limitation. Although our study sample comprised persons with T2DM from only one sizable regional hospital, CGMH-K, this hospital cares for more than one-third of individuals with T2DM in Keelung, a city in northern Taiwan. Hence, our study sample is still representative of the population. Our data also lack health-related behavioral factors, such as exercise, alcohol consumption, and cigarette smoking, which are usually unavailable in hospital-based datasets. Second, model validation using an external population, such as persons with T2DM at hospitals of the same level in different counties, would be ideal but was unfortunately not obtainable at the time of the study due to time and resource constraints. However, the internal validation results seem satisfactory.
Third, quality of care, sociodemographic characteristics, and individuals' levels of health awareness might vary by region in Taiwan; therefore, the prediction model might be slightly different among cities and counties. However, the rigorous approaches adopted in the development of our risk prediction model, including variable ascertainment and internal validation, can still provide good references for other hospitals interested in building risk prediction models for clinical and research applications.

Conclusions
Our study developed and validated all-cause mortality prediction models based on a Taiwanese hospital-based diabetes cohort with 7-year and 10-year follow-ups. The methods and risk prediction parameters can be applied to identify patients at high risk of mortality in hospital clinical care and to further assess the net value of treatment options in economic evaluation.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/jcm10204779/s1, Figure S1: Proposed study scenario and application; Figure S2: Flow diagram for study subjects; Figure S3 Table S1: Description of variables, candidate predictors and definition in this study; Table S2: Distribution of patient characteristics and risk factors by sex; Table S3: Numbers and causes of deaths by the 7-year and 10-year follow-ups; Table S4: AIC for model selection; Table S5: Model selection using SBC with best criterion value for 7-year and 10-year model; Table S6: The results of regression coefficient for 7-year and 10-year using Lasso method; Table S7: Model selection based on training data using SBC with best criterion value for 7-year and 10-year model; Table S8: Distribution of patient characteristics and risk factors by training and validation data; Table S9: Harrell's C statistic by sex based on 7-year and 10-year follow-ups.
Informed Consent Statement: Patient consent was waived due to the retrospective nature of the study, and the analysis used anonymous data.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to committee process.