Article

Phenotyping Diabetes Mellitus on Aggregated Electronic Health Records from Disparate Health Systems

Vigilance & Compliance Branch, Health Products Regulation Group, Health Sciences Authority, Singapore 138667, Singapore
*
Author to whom correspondence should be addressed.
Pharmacoepidemiology 2023, 2(3), 223-235; https://doi.org/10.3390/pharma2030019
Submission received: 20 April 2023 / Revised: 2 June 2023 / Accepted: 28 June 2023 / Published: 3 July 2023

Abstract

Background: Identifying patients with diabetes mellitus (DM) is often performed in epidemiological studies using electronic health records (EHR), but currently available algorithms have features that limit their generalizability. Methods: We developed a rule-based algorithm to determine DM status using a nationally aggregated EHR database. The algorithm was validated on two chart-reviewed samples (n = 2813) of (a) patients with atrial fibrillation (AF, n = 1194) and (b) randomly sampled hospitalized patients (n = 1619). Results: DM diagnosis codes alone resulted in a sensitivity of 77.0% and 83.4% in the AF and random hospitalized samples, respectively. The proposed algorithm combines blood glucose values and DM medication usage with diagnostic codes and exhibits sensitivities between 96.9% and 98.0%, while positive predictive values (PPV) ranged between 61.1% and 75.6%. Performances were comparable across sexes, but a lower specificity was observed in older patients (65 and above versus below 65) in both validation samples (75.8% vs. 90.8% and 60.6% vs. 88.8%). The algorithm was robust for missing laboratory data but not for missing medication data. Conclusions: In this nationwide EHR database analysis, an algorithm for identifying patients with DM has been developed and validated. The algorithm supports quantitative bias analyses in future EHR-based DM studies.

1. Introduction

Identifying patients with diabetes mellitus (DM) is often an obligatory feature in many electronic health record (EHR)-based analyses for defining and applying inclusion/exclusion criteria and addressing possible confounding and effect modifications.
Many algorithms for detecting DM in EHR data already exist, but they are predominantly derived from data arising from a single/common health system(s) in which critical data elements used to define DM status (e.g., coded diagnoses and specific medication exposures) are presumably identical throughout the data [1,2,3,4,5]. However, these features of existing algorithms undermine their applicability in external settings such as ours in Singapore, where data from various medical record systems across the country are aggregated with minimal processing. These aggregated medical records are primarily intended for care provision, in a setting where patients routinely consult multiple providers across different health systems over time. Challenges, however, arise when analyzing this relatively unharmonized database for insights. While the upfront conversion of all data contributors to a common data model is a viable strategy for circumventing issues of disparate data schemas when conducting multi-center analyses [6,7,8,9], the migration of source data into the data model on a regular basis can be considerably burdensome [10,11].
In this validation study, we sought to develop an algorithm that is adequately accommodative for identifying patients with DM in an aggregated database of diverse EHR sources. A combination of EHR data elements is used, and the algorithm’s accuracy and consistency are assessed on two datasets comprising over 2000 chart-reviewed patients. The first dataset is a group of 1194 patients who were hospitalized and newly diagnosed with Atrial Fibrillation (AF) and who had initiated oral anticoagulation therapy in 2019 or 2020. The second was a randomly sampled set of patients admitted to any public healthcare institution in 2019 or 2020 who had the required data elements for the gold standard labelling of diabetes status through chart reviews (n = 1619).

2. Results

There were a total of 608 and 586 patients in the 2019 and 2020 AF cohorts and 808 and 811 patients in the 2019 and 2020 random hospitalized samples, respectively. Sex distributions were similar across both samples (Table 1). As expected, patients in the random sample were younger (mean age 47.5 and 45.8 years in 2019 and 2020, respectively) than those in the AF cohort (mean age 72.2 and 72.4 years in 2019 and 2020, respectively). There was also a larger proportion of DM patients in the AF cohorts (37.5% and 39.1%) than in the random sample (24.5% and 20.8%; Table 1).
Table 1. Demographic profile of patients in both study samples. Atrial fibrillation (AF) cohort: n = 1194; random hospitalized sample: n = 1619.
| Characteristic | Category | AF cohort 2019 (n = 608) | AF cohort 2020 (n = 586) | Random sample 2019 (n = 808) | Random sample 2020 (n = 811) |
| --- | --- | --- | --- | --- | --- |
| Sex, n (%) | Male | 305 (50.2%) | 310 (52.9%) | 380 (47.0%) | 401 (49.4%) |
| Sex, n (%) | Female | 303 (49.8%) | 276 (47.1%) | 428 (53.0%) | 410 (50.6%) |
| Race, n (%) | Chinese | 451 (74.2%) | 458 (78.2%) | 514 (63.6%) | 489 (60.3%) |
| Race, n (%) | Malay | 92 (15.1%) | 81 (13.8%) | 139 (17.2%) | 137 (16.8%) |
| Race, n (%) | Indian | 29 (4.8%) | 25 (4.3%) | 84 (10.4%) | 99 (12.3%) |
| Race, n (%) | Others | 36 (5.9%) | 22 (3.8%) | 71 (8.8%) | 86 (10.6%) |
| Age, years | Mean | 72.2 | 72.4 | 47.5 | 45.8 |
| Age, years | Standard deviation | 11.8 | 12.0 | 28.8 | 27.5 |
| Diabetes, n (%) | Yes | 228 (37.5%) | 229 (39.1%) | 198 (24.5%) | 169 (20.8%) |
| Diabetes, n (%) | No | 380 (62.5%) | 357 (60.9%) | 610 (75.5%) | 642 (79.2%) |
Collectively, 50.0% (n = 597) and 36.1% (n = 584) of patients were predicted to have DM in the AF and random samples, respectively, using an algorithm designed to classify each record through a series of checkpoints that screen for the presence of DM-related diagnosis codes, abnormal laboratory tests, and diabetic medications. Figure 1 illustrates the number of patients identified at each stage of the algorithm for the combined AF cohort.
The sensitivity and positive predictive value (PPV) ranged from 96.9 to 98.0% and from 61.1 to 75.6%, respectively, across all groups (Table 2). The PPV was notably lower in the random hospitalized sample by approximately 12 to 15 percentage points compared to that of the AF cohort. False-negatives were, however, uncommon, as illustrated by the high negative predictive values (NPV) ranging between 97.5 and 99.3%.
With diagnosis codes alone, modest sensitivity values of 77.0% and 83.4% are achieved (Table 3). When the laboratory test and medication criteria are added, the sensitivity rises to 97.4% for the AF cohort and 97.8% for the random sample. The majority of the DM patients were identified by the diagnosis and laboratory test checkpoints, likely due to their sequential application, although there were marked increases in false-positives on applying the laboratory test criteria (Table 3).
The algorithm performed consistently across the age and sex subgroups, with high sensitivity and NPV but lower specificity and PPV across all strata. In both validation samples, the specificity was higher in the younger age group compared to those aged 65 and above (90.8% vs. 75.8% and 88.8% vs. 60.6%) (Table 4).
A total of 152 false-positives and 12 false-negatives were found in the AF cohort, and 225 false-positives and 8 false-negatives were found in the random hospitalized sample. While the majority of the misclassifications occurred because DM was often stated in the hospital discharge summary but not captured in the structured data elements (diagnosis, laboratory tests or medication records data), there were also other reasons for misclassification, such as the patient having impaired fasting glucose, pre-diabetes or hyperglycemia due to other reasons (Table 5).
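The performance measures reported above follow directly from a two-by-two table of predicted versus chart-reviewed DM status. The sketch below recomputes them for the combined AF cohort from the counts in Table 3 (445 true positives, 152 false-positives, 12 false-negatives); the true-negative count of 585 is not stated in the text and is derived here as the remainder of the 1194 patients. The function name `diagnostic_metrics` is illustrative only.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Combined AF cohort (2019 and 2020), counts taken from Table 3.
n_total = 1194
tp, fp, fn = 445, 152, 12
tn = n_total - tp - fp - fn  # derived, not reported in the text: 585

af = diagnostic_metrics(tp, fp, fn, tn)
for name, value in af.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 97.4%
# specificity: 79.4%
# ppv: 74.5%
# npv: 98.0%
```

These combined-cohort values sit between the per-year figures in Table 2, as would be expected when the 2019 and 2020 AF cohorts are pooled.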
When the algorithm was modified to simulate scenarios of missing medication or laboratory test data (i.e., only diagnosis codes with either laboratory tests or medication data but not both), there were reductions in sensitivity in both samples, but to a larger degree in the combined AF cohort (Table 6). While the availability of laboratory test data (but missing medications) led to a smaller loss in sensitivity compared to having medication data (but missing laboratory tests), considerably higher PPV and specificity were observed when medication data were available but laboratory tests were missing, suggesting that elevated glucose tests are more sensitive but DM medication use is more specific.
In terms of demographics, the DM cohorts identified by all three algorithms had similar age and sex distributions as compared with the actual DM patients. However, in the DM cohorts identified using (i) all three criteria and (ii) excluding medications, the proportion of Chinese patients identified was slightly higher as compared to the actual DM group (Table A7).

3. Methodology

3.1. Study Setting and Algorithm Development

The database includes patients with visits to all public healthcare facilities and captures approximately 85% of all nationwide acute hospital admissions and over 40% of all chronic disease outpatient visits [12]. An exploratory exercise was undertaken to identify potentially useful data elements that could help identify patients with DM in this database. All patients who fulfilled at least one of the following criteria (between 2018 and 2021) were first identified: (a) presence of a Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) or International Classification of Diseases, Ninth or Tenth Revision (ICD 9 or ICD 10) code related to DM, (b) an abnormal blood glucose or glycated hemoglobin (HbA1c) laboratory test result or (c) a prescription for any DM-related medication. Commonly used DM diagnosis codes, medications, and laboratory tests for measuring blood glucose levels, along with their upper bound thresholds and observed test frequencies, were shortlisted and used to derive an algorithm for identifying DM patients (Figure 2). The full lists of shortlisted diagnosis codes, laboratory tests and medications are found in Table A1, Table A2, Table A3, Table A4 and Table A5 of Appendix A [1,2,3].
Patients are categorized as diabetic if any one of the following is fulfilled: presence of a DM-related diagnosis code, presence of at least two glucose or HbA1c laboratory tests above the upper limit of normal taken at least 30 days apart, or a prescription for any DM-related medication. For ease of deployment, the algorithm was modularly designed to allow for assessing one data element at a time. As our database includes records from different healthcare institutions using a variety of laboratory assay equipment, defining a fixed threshold for the upper bound of normal values on all relevant blood glucose tests was not possible, as different facilities have slightly varying reference ranges. Setting-specific reference ranges are therefore used to identify abnormally high test results.
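The three checkpoints described above can be sketched as a small rule-based classifier. This is a minimal illustration under stated assumptions, not the deployed implementation: the record layout (`PatientRecord`, `LabResult`), the function names, and the short code and medication sets are hypothetical, and the full lists are those in Tables A1, A2 and A5. Each laboratory result is assumed to carry its own facility-specific upper limit of normal.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class LabResult:
    taken_on: date
    value: float
    upper_limit_of_normal: float  # facility-specific reference range (assumed field)


@dataclass
class PatientRecord:
    diagnosis_codes: set[str] = field(default_factory=set)
    lab_results: list[LabResult] = field(default_factory=list)
    medications: set[str] = field(default_factory=set)


# Illustrative subsets only; see Tables A1, A2 and A5 for the full lists.
DM_DIAGNOSIS_CODES = {"73211009", "44054006", "46635009", "25000"}
DM_MEDICATIONS = {"metformin", "gliclazide", "insulin glargine"}


def has_dm_diagnosis(record: PatientRecord) -> bool:
    return bool(record.diagnosis_codes & DM_DIAGNOSIS_CODES)


def has_repeated_abnormal_labs(record: PatientRecord, min_gap_days: int = 30) -> bool:
    """True if at least two abnormal results lie at least `min_gap_days` apart."""
    abnormal_dates = sorted(
        r.taken_on for r in record.lab_results if r.value > r.upper_limit_of_normal
    )
    if len(abnormal_dates) < 2:
        return False
    # If any qualifying pair exists, the earliest and latest abnormal dates qualify.
    return abnormal_dates[-1] - abnormal_dates[0] >= timedelta(days=min_gap_days)


def has_dm_medication(record: PatientRecord) -> bool:
    return bool({m.lower() for m in record.medications} & DM_MEDICATIONS)


def classify_dm(
    record: PatientRecord, use_labs: bool = True, use_medications: bool = True
) -> bool:
    """Apply the checkpoints in sequence; any single positive checkpoint suffices."""
    if has_dm_diagnosis(record):
        return True
    if use_labs and has_repeated_abnormal_labs(record):
        return True
    if use_medications and has_dm_medication(record):
        return True
    return False
```

Keeping each checkpoint in its own function mirrors the modular design described in the text, and the `use_labs` and `use_medications` switches give one way to mimic data sources in which laboratory or medication records are unavailable.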

3.2. Validation Population and Chart Review

We validated the algorithm on two distinct patient samples, each with data from 2019 and 2020. The first dataset was a pre-selected group of 1194 patients who were hospitalized and newly diagnosed with AF and who had initiated oral anticoagulation therapy in 2019 or 2020. Diabetes is an important risk factor that potentially influences complication risks in patients with AF, and it would therefore be of interest to accurately identify DM status amongst AF patients [13,14]. The second group was a randomly sampled set of patients admitted to any public healthcare institution in 2019 or 2020 who had the required data elements for the gold standard labelling of diabetes status through chart reviews (n = 1619), as with the two AF cohorts. Only data that were recorded before or on the discharge date of the patient’s inpatient admission episode were used. Stratified analyses were performed by age and sex, and the reasons for misclassification were reviewed for a sample of false-positives and false-negatives. The performance of the algorithm in instances of missing laboratory and medication data was additionally evaluated. Lastly, a comparison of the DM cohorts identified by each algorithm was performed to analyze the impact of the choice of algorithm on the final DM cohort selected.
Chart reviews were performed on all cases used for validation (n = 2813, from both samples) by 15 clinically trained pharmacovigilance officers. These officers had previously annotated a common set of 200 patient records (not included in this paper) for the presence of DM, achieving a near-perfect agreement of 98.1% against the collectively derived gold standard label and good inter-annotator agreements of 0.88–1 (Table A6).
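The agreement statistic behind the 0.88 to 1 range is not named in the text. Assuming it is a chance-corrected measure such as Cohen's kappa for binary DM labels (an assumption, not something the paper confirms), a pairwise value for two annotators could be computed as in the sketch below; the function name and the toy labels are hypothetical.

```python
def cohen_kappa(labels_a: list[int], labels_b: list[int]) -> float:
    """Cohen's kappa for two annotators assigning binary (0/1) labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of records on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's marginal DM rate.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example with four records (not study data).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Applying such a function to every pair of the 15 annotators would yield a symmetric matrix of the kind shown in Table A6.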

4. Discussion

DM poses a significant public health burden worldwide. A ‘War on Diabetes’ has been officially declared by the health ministry in Singapore, and diabetes has been made a key research focus area by national research funding agencies, with the aim to identify effective strategies for minimizing the impact of DM on its citizens and the health system [12]. With initiatives to make EHR data available for secondary analysis more readily, several forthcoming EHR-based epidemiological analyses on DM may be expected [15]. The proposed algorithm is therefore developed in anticipation of its use over time.
A unique feature of this study is its inclusion of a relatively large validation sample. These samples include narrowly and broadly defined patient populations on which the algorithm was validated. Previously proposed DM algorithms have often been developed from single institutions and validated on pre-selected rather than random samples [1,3]. Chart reviews were performed by reviewers after an initial run-in annotation phase to confirm inter-annotator agreement. Sensitivity analyses in different subgroups and in scenarios of missing data facilitate subsequent studies that apply the algorithm, where adjustments can be performed to quantitatively correct for misclassification bias when DM is studied as an exposure or outcome [16,17]. Nonetheless, the following limitations should be considered. First, as our database captures only unstructured notes from the inpatient setting (but not outpatient clinic visit notes), it was not possible to conduct comprehensive chart reviews of patients who were not hospitalized and consequently not possible to validate the algorithm on outpatients. Although the database captures the necessary data elements from outpatient visits, the algorithm’s performance remains unassessed in a healthier population that has not required hospital admission.
Second, the proposed algorithm has been designed to maximize sensitivity and therefore generates a substantial number of false-positive predictions. The main data element responsible for this is the laboratory test criterion of repeatedly elevated blood glucose levels. Leveraging glucose test results taken in the inpatient setting has been shown to be less specific, as these capture patients who may not have DM but rather other conditions manifesting in abnormal glucose metabolism [4]. If PPV is deemed more important in future studies, it is possible to simplify the algorithm by dropping the laboratory test requirement altogether, using only diagnosis codes and medication records to detect DM cases; the algorithm is fairly robust for missing laboratory test values, where the loss in sensitivity incurred is relatively small, but substantial improvements in PPV and specificity are observed. Overall, in terms of sensitivity and specificity, the algorithm performs comparably against previously published algorithms, although data source differences may limit some of these comparisons [1,2,18,19].
While DM medication use serves as a useful discriminatory factor for identifying DM patients at present, it is noteworthy that some classes of DM medications (such as GLP-1 agonists and SGLT2 inhibitors) are increasingly prescribed for non-DM indications, such as obesity and heart failure. While there may be considerable overlap of these conditions with DM, performance drift of the algorithm is possible over time. Drifts are, however, less likely to occur with algorithms primarily based on diagnosis codes and laboratory test values. Lastly, the current algorithm does not distinguish between the main subtypes of DM. Further work is necessary to identify patients with Type 1 DM of whom a substantial proportion may have been misdiagnosed as having Type 2 DM initially, only to have their diagnosis revised through subsequent testing [20,21]. Likewise, identifying patients with gestational diabetes requires a preceding algorithm to detect pregnancy status. The current algorithm nonetheless provides a starting point for developing subsequent DM subtype-specific algorithms.

5. Conclusions

Identifying DM using diagnosis codes alone in EHR studies can generate inaccurate estimates of disease prevalence and measures of association relating to DM. An algorithm for detecting DM patients in this database has been developed and validated in two distinct chart-reviewed samples. The algorithm can be calibrated to prioritize PPV over sensitivity, if needed. The data presented in this paper support quantitative bias analyses by future investigators performing DM-related studies.
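As one illustration of how the reported validation data could feed a quantitative bias analysis, the sketch below adjusts an algorithm-derived DM prevalence for misclassification using the Rogan-Gladen estimator. This estimator is a standard textbook correction chosen here for illustration; the paper does not prescribe a specific method. The inputs are the combined AF cohort counts from Table 3, with the 585 true negatives derived as 1194 - 445 - 152 - 12, and the function name is hypothetical.

```python
def corrected_prevalence(
    observed_prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Rogan-Gladen estimate of true prevalence from a misclassified prevalence."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_prevalence + specificity - 1.0) / denominator
    # Clamp to the valid range in case sampling error pushes the estimate outside it.
    return min(1.0, max(0.0, estimate))


sensitivity = 445 / 457   # true positives / chart-confirmed DM patients
specificity = 585 / 737   # derived true negatives / chart-confirmed non-DM patients
observed = 597 / 1194     # share of the AF cohort flagged by the algorithm (50.0%)

adjusted = corrected_prevalence(observed, sensitivity, specificity)
print(round(adjusted, 3))  # 0.383
```

Because the sensitivity and specificity here come from the same cohort, the adjusted value simply recovers the chart-reviewed prevalence of 457/1194. In a new study population the same correction would rely on the assumption that the validated sensitivity and specificity transfer to that population.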

Author Contributions

Design conceptualization, H.X.T.; Data analysis, H.X.T., R.L.T.L., D.C.H.T. and S.R.D.; Manuscript writing, R.L.T.L., P.S.A., B.P.Q.F., Y.L.K., J.W.N., A.J.J.N., S.H.T., D.C.H.T., M.Y.T., A.J.Y.Y., N.K.M.N., C.W.P.L., L.F.P., H.H. and S.R.D.; Data collection, P.S.A., B.P.Q.F., Y.L.K., A.J.J.N., S.H.T., M.Y.T., A.J.Y.Y., N.K.M.N., C.W.P.L., L.F.P. and H.H.; Supervision, P.S.A. and S.R.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable because this analysis was conducted as part of activities to facilitate public health surveillance by a public health authority and does not constitute ‘research’.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are not available in the public domain. The analysis was conducted as part of public health surveillance (not research), and therefore the data used for this analysis cannot be considered ‘research data’.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. SNOMED-CT codes related to Type 1 and Type 2 DM that were used as criteria for DM patients.
| Diagnosis Code | Description of Code |
| --- | --- |
| 200687002 | Cellulitis in diabetic foot |
| 73211009 | Diabetes mellitus (DM) |
| 280137006 | Diabetic foot |
| 371087003 | Diabetic foot ulcer |
| 310505005 | Diabetic hyperosmolar non-ketotic state |
| 312912001 | Diabetic macular oedema |
| 399864000 | Diabetic macular oedema not clinically significant |
| 232020009 | Diabetic maculopathy |
| 25093002 | Diabetic oculopathy (eye disease) |
| 49455004 | Diabetic polyneuropathy |
| 268519009 | Diabetic - poor control |
| 127014009 | Diabetic peripheral vascular disease (angiopathy) |
| 127013003 | Diabetic renal disease |
| 4855003 | Diabetic retinopathy |
| 420789003 | Diabetic retinopathy associated with OR due to Type 1 DM |
| 232023006 | Diabetic traction retinal detachment |
| 312910009 | Diabetic vitreous hemorrhage |
| 402864004 | Diabetic wet gangrene of the foot |
| 441656006 | Hyperglycemic crisis due to OR in DM |
| 237633009 | Hypoglycemia due to DM |
| 421750000 | Ketoacidosis due to Type 2 DM |
| 420422005 | Ketoacidosis in DM |
| 426875007 | Latent autoimmune DM in adults (LADA) |
| 236499007 | Microalbuminuric diabetic nephropathy |
| 312903003 | Mild non-proliferative diabetic retinopathy |
| 312904009 | Moderate non-proliferative diabetic retinopathy |
| 230572002 | Neuropathy due to DM |
| 405749004 | Newly diagnosed diabetes |
| 390834004 | Non-proliferative diabetic retinopathy (NPDR)/Background diabetic retinopathy (BDR) |
| 59276001 | Proliferative diabetic retinopathy (PDR) |
| 236500003 | Proteinuric diabetic nephropathy |
| 312905005 | Severe non-proliferative diabetic retinopathy |
| 46635009 | Type 1 DM Insulin-Dependent Diabetes Mellitus (IDDM) |
| 44054006 | Type 2 DM Non-Insulin-Dependent Diabetes Mellitus (NIDDM) |
| 443694000 | Type 2 DM uncontrolled |
| 190331003 | Type 2 DM with hyperosmolar coma |
Table A2. ICD-9 code used as criterion for DM patients.
| Diagnosis Code | Description of Code |
| --- | --- |
| 25000 | DM without mention of complication, T2 or unspecified type, not stated as uncontrolled |
Table A3. Glucose laboratory threshold values that were used as criteria for DM patients.
| Laboratory Test | Components of Blood | Threshold Value $ (mmol/L) | Threshold Value $ (mg/dL) |
| --- | --- | --- | --- |
| Fasting glucose | Plasma/Serum/Venous | ≥7.0 | ≥126 |
| Glucose Tolerance Test (GTT) - Fasting | - | ≥7.0 | ≥126 |
| Random glucose | Plasma/Serum/Venous | ≥11.1 | ≥200 |
| Oral Glucose Tolerance Test (OGTT) - 1 h | - | ≥10.0 | ≥180 |
| Glucose 1 h post-prandial | - | ≥10.0 | ≥180 |
| Glucose (60 min) | Plasma/Serum | ≥10.0 | ≥180 |
| Oral Glucose Tolerance Test (OGTT) - 2 h | - | ≥11.1 | ≥200 |
| Glucose 2 h post-prandial | - | ≥11.1 | ≥200 |
| Glucose (120 min) | Plasma/Serum | ≥11.1 | ≥200 |
$ not used in final algorithm.
Table A4. HbA1c laboratory threshold value applied when phenotyping patients with DM [1].
| Laboratory Test | Threshold Value (%) | Threshold Value (mmol/mol) |
| --- | --- | --- |
| HbA1c | ≥6.5 | ≥48 |
Table A5. List of DM-related medications, categorized according to their functions and drug classes, that were used as criteria for those with DM [2,3].
| Drug Class | Active Ingredient | Brand Name |
| --- | --- | --- |
| Biguanide | Metformin | Adimet, Diabetmin, Diabetmin XR, Diamet, Formet, Glucient, Meijumet |
| Thiazolidinedione | Pioglitazone | Actos |
| Sulfonylureas | Glipizide | Beapizide, Diacon, Diactin, Dibizide, Glynase, Melizide, Minidiab, Sunglucon |
| Sulfonylureas | Gliclazide | Diamicron, Diamicron MR, Dianorm, Diapro, Gliavis, Gliclada, Glimicron, Glizide, Glynade, Medoclazide, Melicron, Mexan, Sun-gliclazide, Sun-glizide |
| Sulfonylureas | Glimepiride | Amaryl, Dialosa, Diapride |
| Sulfonylureas | Glibenclamide | Benil, Clamide, Daonil, Glyboral |
| Sulfonylureas | Tolbutamide | Tobumide, Tolmide |
| Meglitinide | Repaglinide | Novonorm |
| Dipeptidyl peptidase-4 (DPP-4) inhibitors | Linagliptin | Trajenta |
| Dipeptidyl peptidase-4 (DPP-4) inhibitors | Saxagliptin | Onglyza |
| Dipeptidyl peptidase-4 (DPP-4) inhibitors | Sitagliptin | Januvia |
| Dipeptidyl peptidase-4 (DPP-4) inhibitors | Vildagliptin | Galvus |
| GLP-1 Agonists (Incretin mimetics) | Dulaglutide | Trulicity |
| GLP-1 Agonists (Incretin mimetics) | Liraglutide | Saxenda, Victoza |
| GLP-1 Agonists (Incretin mimetics) | Semaglutide | Ozempic, Rybelsus |
| α-Glucosidase inhibitors | Acarbose | Garbose, Glucobay |
| Sodium-glucose co-transporter-2 (SGLT-2) inhibitor | Canagliflozin | Invokana |
| Sodium-glucose co-transporter-2 (SGLT-2) inhibitor | Ertugliflozin | Steglatro |
| Short-acting insulins (Bolus insulins) | Insulin aspart | Fiasp, Novorapid |
| Short-acting insulins (Bolus insulins) | Insulin glulisine | Apidra Solostar |
| Short-acting insulins (Bolus insulins) | Insulin lispro | Humalog |
| Short-acting insulins (Bolus insulins) | Regular (soluble/neutral) insulin | Actrapid, Humulin R |
| Long-acting insulins (Basal insulins) | Insulin degludec | Ryzodeg, Tresiba |
| Long-acting insulins (Basal insulins) | Insulin detemir | Levemir |
| Long-acting insulins (Basal insulins) | Insulin glargine | Basalog one, Lantus Solostar, Semglee, Toujeo Solostar |
| Long-acting insulins (Basal insulins) | Neutral Protamine Hagedorn (NPH)/isophane insulin | Humulin N, Insulatard |
| Mixed insulins | Insulin aspart and insulin aspart protamine crystals | Novomix |
| Mixed insulins | Insulin lispro and lispro protamine | Humalog mix |
| Mixed insulins | Regular insulin and insulin isophane | Humulin 30/70 |
| Mixed insulins | Regular insulin and isophane insulin | Mixtard |
| Combination Medications | Vildagliptin + Metformin | Galvus Met |
| Combination Medications | Empagliflozin + Linagliptin | Glyxambi |
| Combination Medications | Glibenclamide + Metformin HCL | Glucovance |
| Combination Medications | Metformin/Metformin XR + Sitagliptin | Janumet/Janumet XR |
| Combination Medications | Metformin XR + Saxagliptin | Kombiglyze |
| Combination Medications | Linagliptin + Metformin HCL | Trajenta Duo |
| Combination Medications | Insulin glargine + Lixisenatide | Soliqua |
| Combination Medications | Sitagliptin + Ertugliflozin | Steglujan |
| Combination Medications | Dapagliflozin + Metformin/Metformin XR | Xigduo/Xigduo XR |
Table A6. Inter-annotator agreement between 15 adjudicators for establishing DM status.
| Annotator ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | - | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 |
| 2 | 1 | - | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 |
| 3 | 1 | 1 | - | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 |
| 4 | 0.99 | 0.99 | 0.99 | - | 0.99 | 0.98 | 0.95 | 0.99 | 0.92 | 0.98 | 0.98 | 0.94 | 0.96 | 0.99 | 0.98 |
| 5 | 1 | 1 | 1 | 0.99 | - | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 |
| 6 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | - | 0.93 | 0.99 | 0.92 | 0.98 | 0.98 | 0.96 | 0.94 | 0.99 | 0.98 |
| 7 | 0.94 | 0.94 | 0.94 | 0.95 | 0.94 | 0.93 | - | 0.94 | 0.92 | 0.93 | 0.93 | 0.89 | 0.96 | 0.94 | 0.93 |
| 8 | 1 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | - | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | 1 | 0.99 |
| 9 | 0.93 | 0.93 | 0.93 | 0.92 | 0.93 | 0.92 | 0.92 | 0.93 | - | 0.94 | 0.92 | 0.88 | 0.95 | 0.93 | 0.92 |
| 10 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.93 | 0.99 | 0.94 | - | 0.98 | 0.94 | 0.96 | 0.99 | 0.98 |
| 11 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.93 | 0.99 | 0.92 | 0.98 | - | 0.94 | 0.94 | 0.99 | 0.98 |
| 12 | 0.95 | 0.95 | 0.95 | 0.94 | 0.95 | 0.96 | 0.89 | 0.95 | 0.88 | 0.94 | 0.94 | - | 0.9 | 0.95 | 0.94 |
| 13 | 0.95 | 0.95 | 0.95 | 0.96 | 0.95 | 0.94 | 0.96 | 0.95 | 0.95 | 0.96 | 0.94 | 0.9 | - | 0.95 | 0.94 |
| 14 | 1 | 1 | 1 | 0.99 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.99 | 0.95 | 0.95 | - | 0.99 |
| 15 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.93 | 0.99 | 0.92 | 0.98 | 0.98 | 0.94 | 0.94 | 0.99 | - |
Table A7. Demographic profile of DM patients identified by each algorithm.
Atrial Fibrillation Cohort (n = 1194)
| Characteristic | Category | Actual DM group (n = 457) | Diagnosis codes and/or laboratory tests and/or medications (n = 597) | Diagnosis codes and/or laboratory tests (n = 574) | Diagnosis codes and/or medications (n = 456) |
| --- | --- | --- | --- | --- | --- |
| Sex, n (%) | Male | 247 (54.0%) | 314 (52.6%) | 303 (52.8%) | 247 (54.2%) |
| Sex, n (%) | Female | 210 (46.0%) | 283 (47.4%) | 271 (47.2%) | 209 (45.8%) |
| Race, n (%) | Chinese | 328 (71.8%) | 436 (73.0%) | 424 (73.9%) | 333 (73.0%) |
| Race, n (%) | Malay | 83 (18.2%) | 105 (17.6%) | 96 (16.7%) | 79 (17.3%) |
| Race, n (%) | Indian | 26 (5.7%) | 34 (5.7%) | 33 (5.7%) | 25 (5.5%) |
| Race, n (%) | Others | 20 (4.4%) | 22 (3.7%) | 21 (3.7%) | 19 (4.2%) |
| Age, years | Mean | 72.3 | 73.6 | 73.7 | 72.3 |
| Age, years | Standard deviation | 11.2 | 11.1 | 11.2 | 11.2 |
| Age, years | Median | 73.0 | 74.0 | 75.0 | 73.0 |
| Age, years | Interquartile range | 16.0 | 16.0 | 15.0 | 16.0 |
Random Hospitalized Sample (n = 1619)
| Characteristic | Category | Actual DM group (n = 367) | Diagnosis codes and/or laboratory tests and/or medications (n = 584) | Diagnosis codes and/or laboratory tests (n = 573) | Diagnosis codes and/or medications (n = 382) |
| --- | --- | --- | --- | --- | --- |
| Sex, n (%) | Male | 197 (53.7%) | 319 (54.6%) | 315 (55.0%) | 198 (51.8%) |
| Sex, n (%) | Female | 170 (46.3%) | 265 (45.4%) | 258 (45.0%) | 184 (48.2%) |
| Race, n (%) | Chinese | 237 (64.6%) | 404 (69.2%) | 398 (69.5%) | 251 (65.7%) |
| Race, n (%) | Malay | 55 (15.0%) | 71 (12.2%) | 71 (12.4%) | 56 (14.7%) |
| Race, n (%) | Indian | 50 (13.6%) | 73 (12.5%) | 68 (11.9%) | 54 (14.1%) |
| Race, n (%) | Others | 25 (6.8%) | 36 (6.2%) | 36 (6.3%) | 21 (5.5%) |
| Age, years | Mean | 67.7 | 66.2 | 66.6 | 67.1 |
| Age, years | Standard deviation | 13.8 | 17.1 | 16.9 | 15.0 |
| Age, years | Median | 69.0 | 68.0 | 68.0 | 68.5 |
| Age, years | Interquartile range | 17.0 | 20.0 | 20.0 | 17.0 |

References

  1. Upadhyaya, S.G.; Murphree, D.H.; Ngufor, C.G.; Knight, A.M.; Cronk, D.J.; Cima, R.R.; Curry, T.B.; Pathak, J.; Carter, R.E.; Kor, D.J. Automated Diabetes Case Identification Using Electronic Health Record Data at a Tertiary Care Facility. Mayo Clin. Proc. Innov. Qual. Outcomes 2017, 1, 100–110.
  2. Kagawa, R.; Kawazoe, Y.; Ida, Y.; Shinohara, E.; Tanaka, K.; Imai, T.; Ohe, K. Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach. J. Diabetes Sci. Technol. 2017, 11, 791–799.
  3. Weerahandi, H.M.; Horwitz, L.I.; Blecker, S.B. Diabetes Phenotyping Using the Electronic Health Record. J. Gen. Intern. Med. 2020, 35, 3716–3718.
  4. Spratt, S.E.; Pereira, K.; Granger, B.B.; Batch, B.C.; Phelan, M.; Pencina, M.; Miranda, M.L.; Boulware, E.; Lucas, J.E.; Nelson, C.L.; et al. Assessing electronic health record phenotypes against gold-standard diagnostic criteria for diabetes mellitus. J. Am. Med. Inform. Assoc. 2017, 24, e121–e128.
  5. Richesson, R.L.; Rusincovitch, S.A.; Wixted, D.; Batch, B.C.; Feinglos, M.N.; Miranda, M.L.; Hammond, W.E.; Califf, R.M.; Spratt, S.E. A comparison of phenotype definitions for diabetes mellitus. J. Am. Med. Inform. Assoc. 2013, 20, e319–e326.
  6. Psaty, B.M.; Breckenridge, A.M. Mini-Sentinel and regulatory science--big data rendered fit and functional. N. Engl. J. Med. 2014, 370, 2165–2167.
  7. Voss, E.A.; Makadia, R.; Matcho, A.; Martijn, S.; Knoll, C.; Schuemie, M.; DeFalco, F.J.; Londhe, A.; Zhu, V.; Ryan, P.B. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Am. Med. Inform. Assoc. 2015, 22, 553–564.
  8. Klann, J.G.; Abend, A.; Raghavan, V.A.; Mandl, K.D.; Murphy, S.N. Data interchange using i2b2. J. Am. Med. Inform. Assoc. 2016, 23, 909–915.
  9. Fleurence, R.L.; Curtis, L.H.; Califf, R.M.; Platt, R.; Selby, J.V.; Brown, J.S. Launching PCORnet, a national patient-centered clinical research network. J. Am. Med. Inform. Assoc. 2014, 21, 578–582.
  10. Schneeweiss, S. Learning from big health care data. N. Engl. J. Med. 2014, 370, 2161–2163.
  11. Bourke, A.; Bate, A.; Sauer, B.C.; Brown, J.S.; Hall, G.C. Evidence generation from healthcare databases: Recommendations for managing change. Pharmacoepidemiol. Drug Saf. 2016, 25, 749–754.
  12. Tan, C.C.; Lam, C.S.P.; Matchar, D.B.; Zee, Y.K.; Wong, J.E.L. Singapore’s health-care system: Key features, challenges, and shifts. Lancet 2021, 398, 1091–1104.
  13. Christiansen, C.B.; Gerds, T.A.; Olesen, J.B.; Kristensen, S.L.; Lamberts, M.; Lip, G.Y.; Gislason, G.H.; Køber, L.; Torp-Pedersen, C. Atrial fibrillation and risk of stroke: A nationwide cohort study. Europace 2016, 18, 1689–1697.
  14. Chao, T.F.; Lip, G.Y.; Liu, C.J.; Tuan, T.C.; Chen, S.J.; Wang, K.L.; Lin, Y.J.; Chang, S.L.; Lo, L.W.; Hu, Y.F.; et al. Validation of a Modified CHA2DS2-VASc Score for Stroke Risk Stratification in Asian Patients with Atrial Fibrillation: A Nationwide Cohort Study. Stroke 2016, 47, 2462–2469.
  15. TRUST. Improving Health Outcomes through Trusted Data Exchange. Available online: https://trustplatform.sg/ (accessed on 2 February 2023).
  16. Lash, T.L.; Olshan, A.F. EPIDEMIOLOGY Announces the “Validation Study” Submission Category. Epidemiology 2016, 27, 613–614.
  17. Marshall, R.J. Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J. Clin. Epidemiol. 1990, 43, 941–947.
  18. Lo-Ciganic, W.; Zgibor, J.C.; Ruppert, K.; Arena, V.C.; Stone, R.A. Identifying type 1 and type 2 diabetic cases using administrative data: A tree-structured model. J. Diabetes Sci. Technol. 2011, 5, 486–493.
  19. Lipscombe, L.L.; Hwee, J.; Webster, L.; Shah, B.R.; Booth, G.L.; Tu, K. Identifying diabetes cases from administrative data: A population-based validation study. BMC Health Serv. Res. 2018, 18, 316.
  20. Bao, Y.K.; Ma, J.; Ganesan, V.C.; McGill, J.B. Mistaken Identity: Missed Diagnosis of Type 1 Diabetes in an Older Adult. Med. Res. Arch. 2019, 7, 1962.
  21. Thomas, N.J.; Lynam, A.L.; Hill, A.V.; Weedon, M.N.; Shields, B.M.; Oram, R.A.; McDonald, T.J.; Hattersley, A.T.; Jones, A.G. Type 1 diabetes defined by severe insulin deficiency occurs after 30 years of age and is commonly treated as type 2 diabetes. Diabetologia 2019, 62, 1167–1172.
Figure 1. Flowchart with values indicated at each data element checkpoint, representing the total number of patients that were identified at that particular checkpoint, with respect to the 2019 and 2020 combined AF cohort of 1194 patients.
Figure 2. Flowchart used to phenotype patients with diabetes mellitus.
Table 2. Performance of the algorithm on the AF cohort and random sample of hospitalized patients.

| Measure | Atrial fibrillation cohort, 2019 (n = 608) | Atrial fibrillation cohort, 2020 (n = 586) | Random hospitalized sample, 2019 (n = 808) | Random hospitalized sample, 2020 (n = 811) |
| --- | --- | --- | --- | --- |
| Diabetes, yes, n | 228 | 229 | 198 | 169 |
| Sensitivity, % | 97.8 | 96.9 | 98.0 | 97.6 |
| Specificity, % | 81.1 | 77.6 | 80.3 | 83.6 |
| Positive predictive value, % | 75.6 | 73.5 | 61.8 | 61.1 |
| Negative predictive value, % | 98.4 | 97.5 | 99.2 | 99.3 |

Atrial fibrillation cohort: n = 1194 in total; random hospitalized sample: n = 1619 in total.
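As an arithmetic illustration (not part of the published analysis), the positive predictive values in Table 2 can be approximately recovered from the reported sensitivity, specificity, and DM prevalence using Bayes' theorem. The function name `ppv_from_prevalence` is hypothetical, and the inputs are the rounded values from the 2019 columns, so small rounding differences are expected.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem (all inputs as proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rounded 2019 values taken from Table 2; prevalence = "Diabetes, yes, n" / cohort size.
af_2019 = ppv_from_prevalence(0.978, 0.811, 228 / 608)
random_2019 = ppv_from_prevalence(0.980, 0.803, 198 / 808)

print(f"AF cohort 2019 PPV: {100 * af_2019:.1f}%")
print(f"Random hospitalized sample 2019 PPV: {100 * random_2019:.1f}%")
```

With sensitivity and specificity roughly equal across the two samples, the lower DM prevalence in the random hospitalized sample is sufficient, arithmetically, to produce a lower PPV.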
Table 3. Cumulative sensitivity with respect to the respective data element checkpoints for the AF cohort and random cohorts.

| Data element checkpoint | Predicted to have DM | Gold standard: DM (of those predicted to have DM) | Gold standard: no DM (false-positive) | Cumulative sensitivity (%) |
| --- | --- | --- | --- | --- |
| Combined atrial fibrillation cohort (n = 1194; with DM, n = 457) | | | | |
| Diagnosis codes | 385 | 352 | 33 | 77.0 |
| Diagnosis codes and/or laboratory tests | 574 | 422 | 152 | 92.3 |
| Diagnosis codes and/or laboratory tests and/or medications | 597 | 445 | 152 | 97.4 |
| Combined random hospitalized sample (n = 1619; with DM, n = 367) | | | | |
| Diagnosis codes | 329 | 306 | 23 | 83.4 |
| Diagnosis codes and/or laboratory tests | 573 | 355 | 218 | 96.7 |
| Diagnosis codes and/or laboratory tests and/or medications | 584 | 359 | 225 | 97.8 |

DM: Diabetes mellitus.
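The cumulative sensitivities in Table 3 follow from dividing the gold-standard DM cases captured at each checkpoint by the total number of gold-standard DM cases in the cohort. A minimal sketch of that arithmetic is given below; the function name `cumulative_sensitivity` is hypothetical and the counts are copied from the table.

```python
def cumulative_sensitivity(captured_dm_cases, total_dm_cases):
    """Percentage of gold-standard DM cases captured at each successive checkpoint."""
    return [round(100 * captured / total_dm_cases, 1) for captured in captured_dm_cases]


# Gold-standard DM cases captured after: diagnosis codes; plus laboratory tests; plus medications.
af_sensitivity = cumulative_sensitivity([352, 422, 445], total_dm_cases=457)
random_sensitivity = cumulative_sensitivity([306, 355, 359], total_dm_cases=367)

print("AF cohort:", af_sensitivity)
print("Random hospitalized sample:", random_sensitivity)
```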
Table 4. Stratified performance of the algorithm in different age and sex subgroups.

| Subgroup | AF cohort: Sensitivity (%) | AF cohort: Specificity (%) | AF cohort: PPV (%) | AF cohort: NPV (%) | Random sample: Sensitivity (%) | Random sample: Specificity (%) | Random sample: PPV (%) | Random sample: NPV (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sex | | | | | | | | |
| Female | 97.1 | 78.6 | 72.1 | 98.0 | 97.6 | 85.0 | 62.6 | 99.3 |
| Male | 97.6 | 80.2 | 76.8 | 98.0 | 98.0 | 78.8 | 60.5 | 99.2 |
| Age group | | | | | | | | |
| 64 years and below | 96.5 | 90.8 | 87.3 | 97.5 | 98.0 | 88.8 | 57.6 | 99.6 |
| 65 years and above | 97.7 | 75.8 | 71.1 | 98.2 | 97.7 | 60.6 | 64.4 | 97.3 |

AF cohort: combined atrial fibrillation cohort (n = 1194); random sample: combined random hospitalized sample (n = 1619). PPV: Positive predictive value; NPV: Negative predictive value.
Table 5. Reasons for algorithmic misclassification in both cohorts.

| Reason | Number in combined atrial fibrillation cohort (FP = 152, FN = 12) | Number in combined random hospitalized sample (FP = 225, FN = 8) |
| --- | --- | --- |
| Reasons for false-positive classification | | |
| DM not mentioned in unstructured clinical notes (e.g., discharge summary), but diagnosis, laboratory tests or medications fit the DM criteria | 38 | 50 |
| Impaired fasting glucose or HbA1c in prediabetic range | 18 | 0 |
| Hyperglycemia (due to other reasons) | 0 | 3 |
| Impaired glucose tolerance | 0 | 3 |
| Gestational diabetes | 0 | 2 |
| DM on diet control | 1 | 2 |
| Total FP sampled for review | 57 | 60 |
| Reason for false-negative classification | | |
| DM mentioned in discharge summary, but no diagnosis, labs or medications fit the DM criteria | 5 | 4 |
| Total FN sampled for review | 5 | 4 |

DM: Diabetes mellitus; FP: False-positive; FN: False-negative.
Table 6. Algorithm performance in the absence of laboratory tests or medication data in both cohorts.

| Data elements used | TP | FP | TN | FN | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Combined atrial fibrillation cohort (n = 1194) | | | | | | | | |
| Diagnosis codes and/or laboratory tests and/or medications | 445 | 152 | 585 | 12 | 97.4 | 79.4 | 74.5 | 98.0 |
| Diagnosis codes and/or laboratory tests | 422 | 152 | 585 | 35 | 92.3 | 79.4 | 73.5 | 94.4 |
| Diagnosis codes and/or medications | 420 | 36 | 701 | 37 | 91.9 | 95.1 | 92.1 | 95.0 |
| Combined random hospitalized sample (n = 1619) | | | | | | | | |
| Diagnosis codes and/or laboratory tests and/or medications | 359 | 225 | 1027 | 8 | 97.8 | 82.0 | 61.5 | 99.2 |
| Diagnosis codes and/or laboratory tests | 355 | 218 | 1034 | 12 | 96.7 | 82.6 | 62.0 | 98.9 |
| Diagnosis codes and/or medications | 345 | 37 | 1215 | 22 | 94.0 | 97.0 | 90.3 | 98.2 |

TP: True-positive, FP: False-positive, TN: True-negative, FN: False-negative.
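The four performance measures in Table 6 are standard functions of the TP, FP, TN, and FN counts. The sketch below recomputes them for two rows of the table as a consistency check; the `Confusion` class and its `metrics` method are hypothetical names introduced for illustration and are not part of the published algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a 2x2 table of algorithm prediction versus chart-review gold standard."""
    tp: int
    fp: int
    tn: int
    fn: int

    def metrics(self):
        """Return sensitivity, specificity, PPV and NPV as percentages (1 decimal place)."""
        return {
            "sensitivity": round(100 * self.tp / (self.tp + self.fn), 1),
            "specificity": round(100 * self.tn / (self.tn + self.fp), 1),
            "ppv": round(100 * self.tp / (self.tp + self.fp), 1),
            "npv": round(100 * self.tn / (self.tn + self.fn), 1),
        }


# Counts copied from Table 6.
af_all_elements = Confusion(tp=445, fp=152, tn=585, fn=12)
random_without_labs = Confusion(tp=345, fp=37, tn=1215, fn=22)

print(af_all_elements.metrics())
print(random_without_labs.metrics())
```

Recomputing from counts in this way also confirms that each row sums to the cohort size (1194 or 1619).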
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

APA Style

Tan, H. X., Lim, R. L. T., Ang, P. S., Foo, B. P. Q., Koon, Y. L., Neo, J. W., Ng, A. J. J., Tan, S. H., Teo, D. C. H., Tham, M. Y., Yap, A. J. Y., Ng, N. K. M., Loke, C. W. P., Peck, L. F., Huang, H., & Dorajoo, S. R. (2023). Phenotyping Diabetes Mellitus on Aggregated Electronic Health Records from Disparate Health Systems. Pharmacoepidemiology, 2(3), 223-235. https://doi.org/10.3390/pharma2030019
