Abstract
Background: Identifying patients with diabetes mellitus (DM) is often performed in epidemiological studies using electronic health records (EHR), but currently available algorithms have features that limit their generalizability. Methods: We developed a rule-based algorithm to determine DM status using the nationally aggregated EHR database. The algorithm was validated on two chart-reviewed samples (n = 2813) of (a) patients with atrial fibrillation (AF, n = 1194) and (b) randomly sampled hospitalized patients (n = 1619). Results: DM diagnosis codes alone resulted in sensitivities of 77.0% and 83.4% in the AF and random hospitalized samples, respectively. The proposed algorithm combines blood glucose values and DM medication usage with diagnostic codes and exhibits sensitivities between 96.9% and 98.0%, with positive predictive values (PPV) between 61.1% and 75.6%. Performance was comparable across sexes, but a lower specificity was observed in younger patients (below 65 versus 65 and above) in both validation samples (75.8% vs. 90.8% and 60.6% vs. 88.8%). The algorithm was robust to missing laboratory data but not to missing medication data. Conclusions: In this nationwide EHR database analysis, an algorithm for identifying patients with DM has been developed and validated. The algorithm supports quantitative bias analyses in future EHR-based DM studies.
1. Introduction
Identifying patients with diabetes mellitus (DM) is a routine requirement in many electronic health record (EHR)-based analyses, both for defining and applying inclusion/exclusion criteria and for addressing possible confounding and effect modification.
Many algorithms for detecting DM in EHR data already exist, but they are predominantly derived from a single health system, or from health systems with common data conventions, in which critical data elements used to define DM status (e.g., coded diagnoses and specific medication exposures) are presumed to be recorded identically throughout the data [,,,,]. These features undermine the applicability of existing algorithms in external settings such as ours in Singapore, where data from various medical record systems across the country are aggregated with minimal processing. These aggregated medical records are primarily intended for care provision, in a setting where patients routinely consult multiple providers across different health systems over time. Challenges, however, arise when this relatively unharmonized database is analyzed for insights. While the upfront conversion of all data contributors to a common data model is a viable strategy for circumventing disparate data schemas in multi-center analyses [,,,], regularly migrating source data into the data model can be considerably burdensome [,].
In this validation study, we sought to develop an algorithm flexible enough to identify patients with DM in an aggregated database of diverse EHR sources. The algorithm combines several EHR data elements, and its accuracy and consistency are assessed on two datasets comprising 2813 chart-reviewed patients. The first dataset is a group of 1194 patients who were hospitalized and newly diagnosed with atrial fibrillation (AF) and who had initiated oral anticoagulation therapy in 2019 or 2020. The second is a random sample of patients admitted to any public healthcare institution in 2019 or 2020 who had the data elements required for gold standard labelling of diabetes status through chart review (n = 1619).
2. Results
There were 608 and 586 patients in the 2019 and 2020 AF cohorts, respectively, and 808 and 811 patients in the 2019 and 2020 random hospitalized samples, respectively. Sex distributions were similar across both samples (Table 1). As expected, patients in the random sample were younger (mean age 47.5 and 45.8 years in 2019 and 2020, respectively) than those in the AF cohort (mean age 72.2 and 72.4 years, respectively). The AF cohorts also had a larger proportion of patients with DM (37.5% and 39.1%) than the random sample (24.5% and 20.8%; Table 1).

Table 1.
Demographic profile of patients in both study samples.
| Characteristic | Category | AF Cohort (n = 1194): 2019 (n = 608) | AF Cohort (n = 1194): 2020 (n = 586) | Random Hospitalized Sample (n = 1619): 2019 (n = 808) | Random Hospitalized Sample (n = 1619): 2020 (n = 811) |
|---|---|---|---|---|---|
| Sex, n (%) | Male | 305 (50.2%) | 310 (52.9%) | 380 (47.0%) | 401 (49.4%) |
| | Female | 303 (49.8%) | 276 (47.1%) | 428 (53.0%) | 410 (50.6%) |
| Race, n (%) | Chinese | 451 (74.2%) | 458 (78.2%) | 514 (63.6%) | 489 (60.3%) |
| | Malay | 92 (15.1%) | 81 (13.8%) | 139 (17.2%) | 137 (16.8%) |
| | Indian | 29 (4.8%) | 25 (4.3%) | 84 (10.4%) | 99 (12.3%) |
| | Others | 36 (5.9%) | 22 (3.8%) | 71 (8.8%) | 86 (10.6%) |
| Age (years) | Mean | 72.2 | 72.4 | 47.5 | 45.8 |
| | Standard deviation | 11.8 | 12.0 | 28.8 | 27.5 |
| Diabetes, n (%) | Yes | 228 (37.5%) | 229 (39.1%) | 198 (24.5%) | 169 (20.8%) |
| | No | 380 (62.5%) | 357 (60.9%) | 610 (75.5%) | 642 (79.2%) |
Collectively, 50.0% (n = 597) and 36.1% (n = 584) of patients were predicted to have DM in the AF and random samples, respectively, by an algorithm designed to classify each record through a series of checkpoints that screen for the presence of DM-related diagnosis codes, abnormal laboratory tests and diabetic medications. Figure 1 illustrates the number of patients identified at each stage of the algorithm for the combined AF cohort.
The sensitivity and positive predictive value (PPV) ranged from 96.9 to 98.0% and from 61.1 to 75.6%, respectively, across all groups (Table 2). The PPV was notably lower in the random hospitalized sample by approximately 12 to 15 percentage points compared to that of the AF cohort. False-negatives were, however, uncommon, as illustrated by the high negative predictive values (NPV) ranging between 97.5 and 99.3%.

Table 2.
Performance of the algorithm on the AF cohort and random sample of hospitalized patients.
With diagnosis codes alone, modest sensitivities of 77.0% and 83.4% were achieved in the AF cohort and the random sample, respectively (Table 3). When the laboratory test and medication criteria were added, sensitivity rose to 97.4% for the AF cohort and 97.8% for the random sample. Most DM patients were identified at the diagnosis and laboratory test checkpoints, likely because the checkpoints are applied sequentially, although applying the laboratory test criteria also markedly increased the number of false-positives (Table 3).
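The cumulative sensitivities in Table 3 follow from applying the checkpoints one after another and counting a patient as positive once any checkpoint so far has fired. The sketch below shows that bookkeeping under stated assumptions: the per-patient boolean flags are a hypothetical input format, and the toy data are invented for illustration and do not reproduce the study's counts.

```python
def cumulative_performance(flags, gold, order=("diagnosis", "laboratory", "medication")):
    """Sensitivity and false-positive count after each checkpoint is added.

    flags: one dict per patient mapping checkpoint name -> bool (hypothetical format)
    gold:  chart-review DM status per patient (True = DM)
    """
    n_dm = sum(gold)
    flagged = [False] * len(gold)
    rows = []
    for name in order:
        # A patient stays flagged once any earlier checkpoint has fired (logical OR).
        flagged = [f or patient[name] for f, patient in zip(flagged, flags)]
        tp = sum(f and g for f, g in zip(flagged, gold))
        fp = sum(f and not g for f, g in zip(flagged, gold))
        rows.append((name, tp / n_dm, fp))
    return rows


# Toy example: four DM patients followed by two non-DM patients.
gold = [True, True, True, True, False, False]
flags = [
    {"diagnosis": True, "laboratory": True, "medication": True},
    {"diagnosis": False, "laboratory": True, "medication": True},
    {"diagnosis": False, "laboratory": False, "medication": True},
    {"diagnosis": False, "laboratory": False, "medication": False},  # DM case missed by all checkpoints
    {"diagnosis": False, "laboratory": True, "medication": False},   # e.g. non-DM inpatient hyperglycemia
    {"diagnosis": False, "laboratory": False, "medication": False},
]

for name, sens, fp in cumulative_performance(flags, gold):
    print(f"{name:<10} sensitivity={sens:.2f} false_positives={fp}")
```

Because the checkpoints are combined with a logical OR, adding a checkpoint can only keep or raise sensitivity and can only keep or raise the false-positive count; it cannot create new false-negatives.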

Table 3.
Cumulative sensitivity at each data element checkpoint for the AF cohort and the random hospitalized sample.
The algorithm performed consistently across age and sex subgroups, with high sensitivity and NPV but lower specificity and PPV across all strata. In both samples, specificity was higher in the younger age group than in those aged 65 and above (90.8% vs. 75.8% and 88.8% vs. 60.6%) (Table 4).

Table 4.
Stratified performance of the algorithm in different age and sex subgroups.
A total of 152 false-positives and 12 false-negatives were found in the AF cohort, and 225 false-positives and 8 false-negatives were found in the random hospitalized sample. While the majority of the misclassifications occurred because DM was often stated in the hospital discharge summary but not captured in the structured data elements (diagnosis, laboratory tests or medication records data), there were also other reasons for misclassification, such as the patient having impaired fasting glucose, pre-diabetes or hyperglycemia due to other reasons (Table 5).
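As a cross-check, pooled (2019 and 2020 combined) 2×2 counts can be reconstructed from the sample sizes, the numbers of patients predicted to have DM (597 and 584) and the misclassification counts above, assuming those false-positive and false-negative counts cover every patient in each sample. The sketch below derives sensitivity, specificity, PPV and NPV from them; the function names are illustrative, and these pooled figures complement rather than replace the per-year values in Table 2.

```python
def confusion_counts(n_total, n_predicted_dm, false_pos, false_neg):
    """Rebuild (TP, FP, FN, TN) from totals and misclassification counts."""
    true_pos = n_predicted_dm - false_pos
    true_neg = n_total - n_predicted_dm - false_neg
    return true_pos, false_pos, false_neg, true_neg


def performance(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


samples = {
    "AF cohort": confusion_counts(1194, 597, 152, 12),
    "Random hospitalized sample": confusion_counts(1619, 584, 225, 8),
}

for name, counts in samples.items():
    pct = {k: round(100 * v, 1) for k, v in performance(*counts).items()}
    print(name, counts, pct)
```

For the AF cohort this yields 445 true-positives and 585 true-negatives (pooled sensitivity 97.4%, PPV 74.5%, NPV 98.0%); for the random sample, 359 and 1027 (97.8%, 61.5% and 99.2%). These fall within the ranges reported for the individual groups.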

Table 5.
Reasons for algorithmic misclassification in both cohorts.
When the algorithm was modified to simulate missing medication or laboratory test data (i.e., diagnosis codes combined with either laboratory tests or medication data, but not both), sensitivity fell in both samples, more so in the combined AF cohort (Table 6). Having laboratory test data (but missing medications) led to a smaller loss in sensitivity than having medication data (but missing laboratory tests), whereas considerably higher PPV and specificity were observed when medication data were available but laboratory tests were missing. This suggests that elevated glucose tests are more sensitive, whereas DM medication use is more specific.
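The missing-data scenarios amount to re-running the rule with one data element switched off. The sketch below illustrates this under stated assumptions: the per-patient checkpoint flags, the scenario names and the toy data are all hypothetical and do not reproduce the study's counts. It shows the direction of the trade-off described above, where removing laboratory data removes glucose-driven false-positives, raising PPV and specificity at some cost in sensitivity.

```python
SCENARIOS = {
    "all data elements": ("diagnosis", "laboratory", "medication"),
    "medication data missing": ("diagnosis", "laboratory"),
    "laboratory data missing": ("diagnosis", "medication"),
}


def evaluate(flags, gold, available):
    """Predict DM if any available checkpoint fires, then score against chart review."""
    pred = [any(patient[k] for k in available) for patient in flags]
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    tn = sum(not p and not g for p, g in zip(pred, gold))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


def flag(dx=False, lab=False, med=False):
    """Hypothetical helper building one patient's checkpoint flags."""
    return {"diagnosis": dx, "laboratory": lab, "medication": med}


# Toy data: four DM patients, then four non-DM patients (two with raised inpatient glucose).
gold = [True] * 4 + [False] * 4
flags = [flag(dx=True), flag(lab=True, med=True), flag(lab=True), flag(med=True),
         flag(lab=True), flag(lab=True), flag(), flag()]

for label, available in SCENARIOS.items():
    print(label, evaluate(flags, gold, available))
```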

Table 6.
Algorithm performance in the absence of laboratory tests or medication data in both cohorts.
In terms of demographics, the DM cohorts identified by all three algorithm variants had age and sex distributions similar to those of the actual DM patients. However, in the DM cohorts identified using (i) all three criteria and (ii) diagnosis codes and laboratory tests only (i.e., excluding medications), the proportion of Chinese patients was slightly higher than in the actual DM group (Table A7).
3. Methodology
3.1. Study Setting and Algorithm Development
The database includes patients with visits to all public healthcare facilities and captures approximately 85% of all nationwide acute hospital admissions and over 40% of all chronic disease outpatient visits []. An exploratory exercise was undertaken to identify potentially useful data elements for identifying patients with DM in this database. All patients who fulfilled at least one of the following criteria between 2018 and 2021 were first identified: (a) presence of a Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) or International Classification of Diseases, Ninth or Tenth Revision (ICD-9 or ICD-10) code related to DM; (b) an abnormal blood glucose or glycated hemoglobin (HbA1c) laboratory test result; or (c) a prescription for any DM-related medication. Commonly used DM diagnosis codes, medications, and laboratory tests for measuring blood glucose levels, along with their upper bound thresholds and observed test frequencies, were shortlisted and used to derive an algorithm for identifying DM patients (Figure 2). The full lists of shortlisted diagnosis codes (Tables A1 and A2), laboratory tests (Tables A3 and A4) and medications (Table A5) are provided in Appendix A [,,].

Figure 2.
Flowchart used to phenotype patients with diabetes mellitus.
Patients are categorized as diabetic if any one of the following criteria is fulfilled: a DM-related diagnosis code is present; at least two glucose or HbA1c laboratory test results above the upper limit of normal are recorded at least 30 days apart; or any DM-related medication has been prescribed. For ease of deployment, the algorithm was designed modularly so that one data element can be assessed at a time. Because our database includes records from different healthcare institutions using a variety of laboratory assay equipment, and facilities have slightly varying reference ranges, a fixed upper bound of normal could not be defined for all relevant blood glucose tests. Setting-specific reference ranges are therefore used to identify abnormally high test results.
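The decision rule above can be sketched in Python as follows. This is a minimal illustration, not the production implementation: the record layout (`PatientRecord`, `LabResult`), the abbreviated code and medication sets, and the helper names are hypothetical. Each laboratory result is assumed to carry its own setting-specific upper limit of normal, and records are assumed to be already restricted to data recorded on or before the discharge date. The checkpoints are evaluated in the order diagnosis, laboratory, medication, and the function reports which checkpoint fired first.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical, abbreviated code sets; the full lists are in Tables A1, A2 and A5.
DM_DIAGNOSIS_CODES = {"73211009", "44054006", "46635009", "25000"}
DM_MEDICATIONS = {"metformin", "gliclazide", "sitagliptin", "insulin glargine"}


@dataclass
class LabResult:
    test: str            # e.g. "HbA1c" or "Random glucose"
    value: float
    upper_normal: float  # setting-specific upper limit of normal for this result
    taken_on: date


@dataclass
class PatientRecord:
    diagnosis_codes: set = field(default_factory=set)
    labs: list = field(default_factory=list)
    medications: set = field(default_factory=set)  # active ingredient names


def has_dm_diagnosis(rec: PatientRecord) -> bool:
    return bool(rec.diagnosis_codes & DM_DIAGNOSIS_CODES)


def has_persistent_hyperglycemia(rec: PatientRecord, min_gap_days: int = 30) -> bool:
    # Dates of results above their own site-specific upper limit of normal.
    abnormal = sorted(lab.taken_on for lab in rec.labs if lab.value > lab.upper_normal)
    # At least two abnormal results, with the earliest and latest >= min_gap_days apart.
    return len(abnormal) >= 2 and (abnormal[-1] - abnormal[0]).days >= min_gap_days


def has_dm_medication(rec: PatientRecord) -> bool:
    return any(m.lower() in DM_MEDICATIONS for m in rec.medications)


# Sequential, modular checkpoints; each one assesses a single data element.
CHECKPOINTS = (
    ("diagnosis", has_dm_diagnosis),
    ("laboratory", has_persistent_hyperglycemia),
    ("medication", has_dm_medication),
)


def classify_dm(rec: PatientRecord) -> Optional[str]:
    """Return the first checkpoint that flags the record as DM, or None."""
    for name, check in CHECKPOINTS:
        if check(rec):
            return name
    return None
```

Because each checkpoint is a separate function, simulating a missing data element amounts to removing the corresponding entry from `CHECKPOINTS`.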
3.2. Validation Population and Chart Review
We validated the algorithm on two distinct patient samples, each with data from 2019 and 2020. The first was a pre-selected group of 1194 patients who were hospitalized and newly diagnosed with AF and who had initiated oral anticoagulation therapy in 2019 or 2020. Diabetes is an important risk factor that potentially influences complication risks in patients with AF, so accurately identifying DM status among AF patients is of particular interest [,]. The second was a random sample of patients admitted to any public healthcare institution in 2019 or 2020 who, as with the two AF cohorts, had the data elements required for gold standard labelling of diabetes status through chart review (n = 1619). Only data recorded on or before the discharge date of each patient’s inpatient admission episode were used. Stratified analyses were performed by age and sex, and the reasons for misclassification were reviewed for a sample of false-positives and false-negatives. The performance of the algorithm with missing laboratory or medication data was additionally evaluated. Lastly, the DM cohorts identified by each algorithm variant were compared to analyze how the choice of algorithm affects the final DM cohort selected.
Chart reviews were performed on all cases used for validation (n = 2813, from both samples) by 15 clinically trained pharmacovigilance officers who had previously annotated a common set of 200 patient records (not included in this paper), achieving near-perfect agreement of 98.1% against the collectively derived gold standard label and good pairwise inter-annotator agreement of 0.88–1 (Table A6) for the presence of DM.
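Table A6 does not state which agreement statistic was used. The sketch below, which assumes each annotator assigned a binary DM label to every record in the common 200-record set, computes two common options for every annotator pair: raw percent agreement and Cohen's kappa. The data layout and function names are illustrative only.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Share of records on which two annotators gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two raters assigning binary labels to the same records."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    p_a, p_b = sum(a) / n, sum(b) / n
    # Chance agreement: both independently say "DM" or both say "no DM".
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch simply reports perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def pairwise_agreement(labels, metric=cohens_kappa):
    """labels: dict annotator_id -> list of bool over the same records."""
    return {(i, j): metric(labels[i], labels[j])
            for i, j in combinations(sorted(labels), 2)}


labels = {1: [True, True, False, False], 2: [True, False, False, False]}
print(pairwise_agreement(labels))
print(pairwise_agreement(labels, percent_agreement))
```

Raw agreement tends to look high when one label dominates, which is why kappa is often reported alongside it.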
4. Discussion
DM poses a significant public health burden worldwide. A ‘War on Diabetes’ has been officially declared by the health ministry in Singapore, and diabetes has been made a key research focus area by national research funding agencies, with the aim of identifying effective strategies for minimizing the impact of DM on citizens and the health system []. With initiatives to make EHR data more readily available for secondary analysis, several forthcoming EHR-based epidemiological analyses of DM can be expected []. The proposed algorithm was therefore developed in anticipation of such analyses.
A unique feature of this study is its relatively large validation sample, which includes both narrowly and broadly defined patient populations. Previously proposed DM algorithms have often been developed from single institutions and validated on pre-selected rather than random samples [,]. Chart reviews were performed after an initial run-in annotation phase to confirm inter-annotator agreement. The sensitivity analyses in different subgroups and in scenarios of missing data support subsequent studies that apply the algorithm, in which adjustments can be performed to quantitatively correct for misclassification bias when DM is studied as an exposure or outcome [,]. Nonetheless, the following limitations should be considered. First, as our database captures unstructured notes only from the inpatient setting (and not from outpatient clinic visits), comprehensive chart reviews of patients who were not hospitalized were not possible, and the algorithm therefore could not be validated on outpatients. Although the database captures the necessary data elements from outpatient visits, the algorithm’s performance remains unassessed in healthier populations that have not required hospital admission.
Second, the proposed algorithm was designed to maximize sensitivity and therefore generates a substantial number of false-positive predictions. The main data element responsible for this is the laboratory test criterion of consistently elevated blood glucose levels. Glucose test results taken in the inpatient setting have been shown to be less specific, as they also capture patients who may not have DM but rather other conditions manifesting in abnormal glucose metabolism []. If PPV is deemed more important in future studies, the algorithm can be simplified by dropping the laboratory test checkpoint altogether and using only diagnosis codes and medication records to detect DM cases. The algorithm is fairly robust to missing laboratory test values: the loss in sensitivity incurred is relatively small, while substantial improvements in PPV and specificity are observed. Overall, in terms of sensitivity and specificity, the algorithm performs comparably to previously published algorithms, although data source differences may limit some of these comparisons [,,,].
While DM medication use currently serves as a useful discriminatory factor for identifying DM patients, some classes of DM medications (such as GLP-1 agonists and SGLT2 inhibitors) are increasingly prescribed for non-DM indications such as obesity and heart failure. Although these conditions may overlap considerably with DM, performance drift of the algorithm over time is possible. Such drift is less likely with algorithms based primarily on diagnosis codes and laboratory test values. Lastly, the current algorithm does not distinguish between the main subtypes of DM. Further work is necessary to identify patients with Type 1 DM, a substantial proportion of whom may initially have been misdiagnosed as having Type 2 DM, only to have their diagnosis revised through subsequent testing [,]. Likewise, identifying patients with gestational diabetes requires a preceding algorithm to detect pregnancy status. The current algorithm nonetheless provides a starting point for developing DM subtype-specific algorithms.
5. Conclusions
Identifying DM using diagnosis codes alone in EHR studies can generate inaccurate estimates of disease prevalence and measures of association relating to DM. An algorithm for detecting DM patients in this database has been developed and validated in two distinct chart-reviewed samples. The algorithm can be calibrated to prioritize PPV over sensitivity, if needed. The data presented in this paper support quantitative bias analyses by future investigators performing DM-related studies.
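One simple form of the quantitative bias analysis mentioned above is to correct an observed DM prevalence for misclassification using validation estimates of sensitivity and specificity. The sketch below uses the Rogan–Gladen estimator, a standard technique chosen here for illustration rather than one prescribed by this study. Applied to pooled AF cohort figures derived from the Results (597 of 1194 patients predicted to have DM; 445 of 457 chart-confirmed DM cases detected; 585 of 737 non-DM patients correctly classified), it recovers the chart-review prevalence of 457/1194, as it must when sensitivity and specificity are estimated on the same sample. In a new study, the sensitivity and specificity would instead be taken from the subgroup and data-availability scenario that best matches the target population.

```python
def rogan_gladen(observed_prevalence, sensitivity, specificity):
    """Misclassification-corrected prevalence: (p_obs + Sp - 1) / (Se + Sp - 1)."""
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be defined")
    corrected = (observed_prevalence + specificity - 1) / youden
    # Sampling error can push the estimate outside [0, 1]; clip to a valid proportion.
    return min(max(corrected, 0.0), 1.0)


# Pooled AF cohort figures reconstructed from the Results section.
se = 445 / 457       # true-positives / chart-confirmed DM
sp = 585 / 737       # true-negatives / chart-confirmed non-DM
p_obs = 597 / 1194   # proportion flagged by the algorithm

print(round(rogan_gladen(p_obs, se, sp), 4), round(457 / 1194, 4))
```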
Author Contributions
Design conceptualization, H.X.T.; Data analysis, H.X.T., R.L.T.L., D.C.H.T. and S.R.D.; Manuscript writing, R.L.T.L., P.S.A., B.P.Q.F., Y.L.K., J.W.N., A.J.J.N., S.H.T., D.C.H.T., M.Y.T., A.J.Y.Y., N.K.M.N., C.W.P.L., L.F.P., H.H. and S.R.D.; Data collection, P.S.A., B.P.Q.F., Y.L.K., A.J.J.N., S.H.T., M.Y.T., A.J.Y.Y., N.K.M.N., C.W.P.L., L.F.P. and H.H.; Supervision, P.S.A. and S.R.D. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable because this analysis was conducted as part of activities to facilitate public health surveillance by a public health authority and does not constitute ‘research’.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are not available in the public domain. The analysis was conducted as part of public health surveillance (not research), and the data used for this analysis therefore cannot be considered ‘research data’.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A

Table A1.
SNOMED-CT codes related to Type 1 and Type 2 DM that were used as criteria for DM patients.
Diagnosis Code | Description of Code |
---|---|
200687002 | Cellulitis in diabetic foot |
73211009 | Diabetes mellitus (DM) |
280137006 | Diabetic foot |
371087003 | Diabetic foot ulcer |
310505005 | Diabetic hyperosmolar non-ketotic state |
312912001 | Diabetic macular oedema |
399864000 | Diabetic macular oedema not clinically significant |
232020009 | Diabetic maculopathy |
25093002 | Diabetic oculopathy (eye disease) |
49455004 | Diabetic polyneuropathy |
268519009 | Diabetic—poor control |
127014009 | Diabetic peripheral vascular disease (angiopathy) |
127013003 | Diabetic renal disease |
4855003 | Diabetic retinopathy |
420789003 | Diabetic retinopathy associated with OR due to Type 1 DM |
232023006 | Diabetic traction retinal detachment |
312910009 | Diabetic vitreous hemorrhage |
402864004 | Diabetic wet gangrene of the foot |
441656006 | Hyperglycemic crisis due to OR in DM |
237633009 | Hypoglycemia due to DM |
421750000 | Ketoacidosis due to Type 2 DM |
420422005 | Ketoacidosis in DM |
426875007 | Latent autoimmune DM in adults (LADA) |
236499007 | Microalbuminuric diabetic nephropathy |
312903003 | Mild non-proliferative diabetic retinopathy |
312904009 | Moderate non-proliferative diabetic retinopathy |
230572002 | Neuropathy due to DM |
405749004 | Newly diagnosed diabetes |
390834004 | Non-proliferative diabetic retinopathy (NPDR)/Background diabetic retinopathy (BDR) |
59276001 | Proliferative diabetic retinopathy (PDR) |
236500003 | Proteinuric diabetic nephropathy |
312905005 | Severe non-proliferative diabetic retinopathy |
46635009 | Type 1 DM Insulin-Dependent Diabetes Mellitus (IDDM) |
44054006 | Type 2 DM Non-Insulin-Dependent Diabetes Mellitus (NIDDM) |
443694000 | Type 2 DM uncontrolled |
190331003 | Type 2 DM with hyperosmolar coma |

Table A2.
ICD-9 code used as criterion for DM patients.
Diagnosis Code | Description of Code |
---|---|
25000 | DM without mention of complication, T2 or unspecified type, not stated as uncontrolled |

Table A3.
Glucose laboratory threshold values that were used as criteria for DM patients.
| Laboratory Test | Components of Blood | Threshold Value $ (mmol/L) | Threshold Value $ (mg/dL) |
|---|---|---|---|
| Fasting glucose | Plasma/Serum/Venous | ≥7.0 | ≥126 |
| Glucose Tolerance Test (GTT), fasting | - | ≥7.0 | ≥126 |
| Random glucose | Plasma/Serum/Venous | ≥11.1 | ≥200 |
| Oral Glucose Tolerance Test (OGTT), 1 h | - | ≥10.0 | ≥180 |
| Glucose 1 h post-prandial | - | ≥10.0 | ≥180 |
| Glucose (60 min) | Plasma/Serum | ≥10.0 | ≥180 |
| Oral Glucose Tolerance Test (OGTT), 2 h | - | ≥11.1 | ≥200 |
| Glucose 2 h post-prandial | - | ≥11.1 | ≥200 |
| Glucose (120 min) | Plasma/Serum | ≥11.1 | ≥200 |
$ not used in final algorithm.

Table A4.
HbA1c laboratory threshold value applied when phenotyping patients with DM [].
| Laboratory Test | Threshold Value (%) | Threshold Value (mmol/mol) |
|---|---|---|
| HbA1c | ≥6.5 | ≥48 |
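The paired units in Tables A3 and A4 can be checked with standard conversions: plasma glucose in mg/dL is the mmol/L value multiplied by about 18.016 (from the molar mass of glucose, 180.16 g/mol), and HbA1c in IFCC units (mmol/mol) follows from the NGSP percentage via the master equation (NGSP - 2.15) × 10.929. The short sketch below applies both; the function names are illustrative.

```python
GLUCOSE_MG_DL_PER_MMOL_L = 18.016  # 180.16 g/mol glucose, converted to mg/dL per mmol/L


def glucose_mmol_l_to_mg_dl(mmol_l):
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return mmol_l * GLUCOSE_MG_DL_PER_MMOL_L


def hba1c_ngsp_to_ifcc(percent):
    """Convert NGSP/DCCT HbA1c (%) to IFCC HbA1c (mmol/mol) via the master equation."""
    return (percent - 2.15) * 10.929


for mmol in (7.0, 10.0, 11.1):
    print(f"glucose {mmol} mmol/L ~ {round(glucose_mmol_l_to_mg_dl(mmol))} mg/dL")
print(f"HbA1c 6.5% ~ {round(hba1c_ngsp_to_ifcc(6.5))} mmol/mol")
```

Rounded to whole numbers, these reproduce the 126, 180 and 200 mg/dL and 48 mmol/mol thresholds listed in the tables.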

Table A5.
List of DM-related medications, categorized according to their functions and drug classes, that were used as criteria for those with DM [,].
| Drug Class | Active Ingredient(s) | Brand Name(s) |
|---|---|---|
| Biguanide | Metformin | Adimet, Diabetmin, Diabetmin XR, Diamet, Formet, Glucient, Meijumet |
| Thiazolidinedione | Pioglitazone | Actos |
| Sulfonylureas | Glipizide | Beapizide, Diacon, Diactin, Dibizide, Glynase, Melizide, Minidiab, Sunglucon |
| | Gliclazide | Diamicron, Diamicron MR, Dianorm, Diapro, Gliavis, Gliclada, Glimicron, Glizide, Glynade, Medoclazide, Melicron, Mexan, Sun-gliclazide, Sun-glizide |
| | Glimepiride | Amaryl, Dialosa, Diapride |
| | Glibenclamide | Benil, Clamide, Daonil, Glyboral |
| | Tolbutamide | Tobumide, Tolmide |
| Meglitinide | Repaglinide | Novonorm |
| Dipeptidyl peptidase-4 (DPP-4) inhibitors | Linagliptin | Trajenta |
| | Saxagliptin | Onglyza |
| | Sitagliptin | Januvia |
| | Vildagliptin | Galvus |
| GLP-1 Agonists (Incretin mimetics) | Dulaglutide | Trulicity |
| | Liraglutide | Saxenda, Victoza |
| | Semaglutide | Ozempic, Rybelsus |
| α-Glucosidase inhibitors | Acarbose | Garbose, Glucobay |
| Sodium-glucose co-transporter-2 (SGLT-2) inhibitor | Canagliflozin | Invokana |
| | Ertugliflozin | Steglatro |
| Short-acting insulins (Bolus insulins) | Insulin aspart | Fiasp, Novorapid |
| | Insulin glulisine | Apidra Solostar |
| | Insulin lispro | Humalog |
| | Regular (soluble/neutral) insulin | Actrapid, Humulin R |
| Long-acting insulins (Basal insulins) | Insulin degludec | Ryzodeg, Tresiba |
| | Insulin detemir | Levemir |
| | Insulin glargine | Basalog one, Lantus Solostar, Semglee, Toujeo Solostar |
| | Neutral Protamine Hagedorn (NPH)/isophane insulin | Humulin N, Insulatard |
| Mixed insulins | Insulin aspart and insulin aspart protamine crystals | Novomix |
| | Insulin lispro and lispro protamine | Humalog mix |
| | Regular insulin and insulin isophane | Humulin 30/70 |
| | Regular insulin and isophane insulin | Mixtard |
| Combination Medications | Vildagliptin + Metformin | Galvus Met |
| | Empagliflozin + Linagliptin | Glyxambi |
| | Glibenclamide + Metformin HCL | Glucovance |
| | Metformin/Metformin XR + Sitagliptin | Janumet/Janumet XR |
| | Metformin XR + Saxagliptin | Kombiglyze |
| | Linagliptin + Metformin HCL | Trajenta Duo |
| | Insulin glargine + Lixisenatide | Soliqua |
| | Sitagliptin + Ertugliflozin | Steglujan |
| | Dapagliflozin + Metformin/Metformin XR | Xigduo/Xigduo XR |

Table A6.
Inter-annotator agreement between 15 adjudicators for establishing DM status.
Annotator ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 | |
2 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 | |
3 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 | |
4 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.95 | 0.99 | 0.92 | 0.98 | 0.98 | 0.94 | 0.96 | 0.99 | 0.98 | |
5 | 1 | 1 | 1 | 0.99 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 | |
6 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.93 | 0.99 | 0.92 | 0.98 | 0.98 | 0.96 | 0.94 | 0.99 | 0.98 | |
7 | 0.94 | 0.94 | 0.94 | 0.95 | 0.94 | 0.93 | 0.94 | 0.92 | 0.93 | 0.93 | 0.89 | 0.96 | 0.94 | 0.93 | |
8 | 1 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 | |
9 | 0.93 | 0.93 | 0.93 | 0.92 | 0.93 | 0.92 | 0.92 | 0.93 | 0.94 | 0.92 | 0.88 | 0.95 | 0.93 | 0.92 | |
10 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.93 | 0.99 | 0.94 | 0.98 | 0.94 | 0.96 | 0.99 | 0.98 | |
11 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.93 | 0.99 | 0.92 | 0.98 | 0.94 | 0.94 | 0.99 | 0.98 | |
12 | 0.95 | 0.95 | 0.95 | 0.94 | 0.95 | 0.96 | 0.89 | 0.95 | 0.88 | 0.94 | 0.94 | 0.9 | 0.95 | 0.94 | |
13 | 0.95 | 0.95 | 0.95 | 0.96 | 0.95 | 0.94 | 0.96 | 0.95 | 0.95 | 0.96 | 0.94 | 0.9 | 0.95 | 0.94 | |
14 | 1 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 0.99 | |
15 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.93 | 0.99 | 0.92 | 0.98 | 0.98 | 0.94 | 0.94 | 0.99 |

Table A7.
Demographic profile of DM patients identified by each algorithm.
Atrial Fibrillation Cohort (n = 1194)

| Characteristic | Category | Actual DM Group (n = 457) | Diagnosis Codes and/or Laboratory Tests and/or Medications (n = 597) | Diagnosis Codes and/or Laboratory Tests (n = 574) | Diagnosis Codes and/or Medications (n = 456) |
|---|---|---|---|---|---|
| Sex, n (%) | Male | 247 (54.0%) | 314 (52.6%) | 303 (52.8%) | 247 (54.2%) |
| | Female | 210 (46.0%) | 283 (47.4%) | 271 (47.2%) | 209 (45.8%) |
| Race, n (%) | Chinese | 328 (71.8%) | 436 (73.0%) | 424 (73.9%) | 333 (73.0%) |
| | Malay | 83 (18.2%) | 105 (17.6%) | 96 (16.7%) | 79 (17.3%) |
| | Indian | 26 (5.7%) | 34 (5.7%) | 33 (5.7%) | 25 (5.5%) |
| | Others | 20 (4.4%) | 22 (3.7%) | 21 (3.7%) | 19 (4.2%) |
| Age (years) | Mean | 72.3 | 73.6 | 73.7 | 72.3 |
| | Standard deviation | 11.2 | 11.1 | 11.2 | 11.2 |
| | Median | 73.0 | 74.0 | 75.0 | 73.0 |
| | Interquartile range | 16.0 | 16.0 | 15.0 | 16.0 |

Random Hospitalized Sample (n = 1619)

| Characteristic | Category | Actual DM Group (n = 367) | Diagnosis Codes and/or Laboratory Tests and/or Medications (n = 584) | Diagnosis Codes and/or Laboratory Tests (n = 573) | Diagnosis Codes and/or Medications (n = 382) |
|---|---|---|---|---|---|
| Sex, n (%) | Male | 197 (53.7%) | 319 (54.6%) | 315 (55.0%) | 198 (51.8%) |
| | Female | 170 (46.3%) | 265 (45.4%) | 258 (45.0%) | 184 (48.2%) |
| Race, n (%) | Chinese | 237 (64.6%) | 404 (69.2%) | 398 (69.5%) | 251 (65.7%) |
| | Malay | 55 (15.0%) | 71 (12.2%) | 71 (12.4%) | 56 (14.7%) |
| | Indian | 50 (13.6%) | 73 (12.5%) | 68 (11.9%) | 54 (14.1%) |
| | Others | 25 (6.8%) | 36 (6.2%) | 36 (6.3%) | 21 (5.5%) |
| Age (years) | Mean | 67.7 | 66.2 | 66.6 | 67.1 |
| | Standard deviation | 13.8 | 17.1 | 16.9 | 15.0 |
| | Median | 69.0 | 68.0 | 68.0 | 68.5 |
| | Interquartile range | 17.0 | 20.0 | 20.0 | 17.0 |
References
- Upadhyaya, S.G.; Murphree, D.H.; Ngufor, C.G.; Knight, A.M.; Cronk, D.J.; Cima, R.R.; Curry, T.B.; Pathak, J.; Carter, R.E.; Kor, D.J. Automated Diabetes Case Identification Using Electronic Health Record Data at a Tertiary Care Facility. Mayo Clin. Proc. Innov. Qual. Outcomes 2017, 1, 100–110.
- Kagawa, R.; Kawazoe, Y.; Ida, Y.; Shinohara, E.; Tanaka, K.; Imai, T.; Ohe, K. Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach. J. Diabetes Sci. Technol. 2017, 11, 791–799.
- Weerahandi, H.M.; Horwitz, L.I.; Blecker, S.B. Diabetes Phenotyping Using the Electronic Health Record. J. Gen. Intern. Med. 2020, 35, 3716–3718.
- Spratt, S.E.; Pereira, K.; Granger, B.B.; Batch, B.C.; Phelan, M.; Pencina, M.; Miranda, M.L.; Boulware, E.; Lucas, J.E.; Nelson, C.L.; et al. Assessing electronic health record phenotypes against gold-standard diagnostic criteria for diabetes mellitus. J. Am. Med. Inform. Assoc. 2017, 24, e121–e128.
- Richesson, R.L.; Rusincovitch, S.A.; Wixted, D.; Batch, B.C.; Feinglos, M.N.; Miranda, M.L.; Hammond, W.E.; Califf, R.M.; Spratt, S.E. A comparison of phenotype definitions for diabetes mellitus. J. Am. Med. Inform. Assoc. 2013, 20, e319–e326.
- Psaty, B.M.; Breckenridge, A.M. Mini-Sentinel and regulatory science--big data rendered fit and functional. N. Engl. J. Med. 2014, 370, 2165–2167.
- Voss, E.A.; Makadia, R.; Matcho, A.; Martijn, S.; Knoll, C.; Schuemie, M.; DeFalco, F.J.; Londhe, A.; Zhu, V.; Ryan, P.B. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Am. Med. Inform. Assoc. 2015, 22, 553–564.
- Klann, J.G.; Abend, A.; Raghavan, V.A.; Mandl, K.D.; Murphy, S.N. Data interchange using i2b2. J. Am. Med. Inform. Assoc. 2016, 23, 909–915.
- Fleurence, R.L.; Curtis, L.H.; Califf, R.M.; Platt, R.; Selby, J.V.; Brown, J.S. Launching PCORnet, a national patient-centered clinical research network. J. Am. Med. Inform. Assoc. 2014, 21, 578–582.
- Schneeweiss, S. Learning from big health care data. N. Engl. J. Med. 2014, 370, 2161–2163.
- Bourke, A.; Bate, A.; Sauer, B.C.; Brown, J.S.; Hall, G.C. Evidence generation from healthcare databases: Recommendations for managing change. Pharmacoepidemiol. Drug. Saf. 2016, 25, 749–754.
- Tan, C.C.; Lam, C.S.P.; Matchar, D.B.; Zee, Y.K.; Wong, J.E.L. Singapore’s health-care system: Key features, challenges, and shifts. Lancet 2021, 398, 1091–1104.
- Christiansen, C.B.; Gerds, T.A.; Olesen, J.B.; Kristensen, S.L.; Lamberts, M.; Lip, G.Y.; Gislason, G.H.; Køber, L.; Torp-Pedersen, C. Atrial fibrillation and risk of stroke: A nationwide cohort study. Europace 2016, 18, 1689–1697.
- Chao, T.F.; Lip, G.Y.; Liu, C.J.; Tuan, T.C.; Chen, S.J.; Wang, K.L.; Lin, Y.J.; Chang, S.L.; Lo, L.W.; Hu, Y.F.; et al. Validation of a Modified CHA2DS2-VASc Score for Stroke Risk Stratification in Asian Patients with Atrial Fibrillation: A Nationwide Cohort Study. Stroke 2016, 47, 2462–2469.
- TRUST. Improving Health Outcomes through Trusted Data Exchange. Available online: https://trustplatform.sg/ (accessed on 2 February 2023).
- Lash, T.L.; Olshan, A.F. EPIDEMIOLOGY Announces the “Validation Study” Submission Category. Epidemiology 2016, 27, 613–614.
- Marshall, R.J. Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J. Clin. Epidemiol. 1990, 43, 941–947.
- Lo-Ciganic, W.; Zgibor, J.C.; Ruppert, K.; Arena, V.C.; Stone, R.A. Identifying type 1 and type 2 diabetic cases using administrative data: A tree-structured model. J. Diabetes Sci. Technol. 2011, 5, 486–493.
- Lipscombe, L.L.; Hwee, J.; Webster, L.; Shah, B.R.; Booth, G.L.; Tu, K. Identifying diabetes cases from administrative data: A population-based validation study. BMC Health Serv. Res. 2018, 18, 316.
- Bao, Y.K.; Ma, J.; Ganesan, V.C.; McGill, J.B. Mistaken Identity: Missed Diagnosis of Type 1 Diabetes in an Older Adult. Med. Res. Arch. 2019, 7, 1962.
- Thomas, N.J.; Lynam, A.L.; Hill, A.V.; Weedon, M.N.; Shields, B.M.; Oram, R.A.; McDonald, T.J.; Hattersley, A.T.; Jones, A.G. Type 1 diabetes defined by severe insulin deficiency occurs after 30 years of age and is commonly treated as type 2 diabetes. Diabetologia 2019, 62, 1167–1172.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).