Article

Pioneering Arterial Hypertension Phenotyping on Nationally Aggregated Electronic Health Records

Vigilance & Compliance Branch, Health Products Regulation Group, Health Sciences Authority, Singapore 138667, Singapore
* Author to whom correspondence should be addressed.
Pharmacoepidemiology 2024, 3(1), 169-182; https://doi.org/10.3390/pharma3010010
Submission received: 19 December 2023 / Revised: 1 March 2024 / Accepted: 4 March 2024 / Published: 12 March 2024

Abstract

Background: Hypertension is frequently studied in epidemiological studies that have been conducted using retrospective observational data, either as an outcome or a variable. However, there are few validation studies investigating the accuracy of hypertension phenotyping algorithms in aggregated electronic health record (EHR) data. Methods: Utilizing a centralized repository of inpatient EHR data from Singapore for the period of 2019–2020, a new algorithm that incorporates both diagnostic codes and medication details (Diag+Med) was devised. This algorithm was intended to supplement and improve the diagnostic code-only model (Diag-Only) for the classification of hypertension. We computed various metrics (sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV)) to assess the algorithm’s effectiveness in identifying hypertension on 2813 chart-reviewed records. This pool was composed of two patient cohorts: a random sampling of all inpatient admissions (Random Cohort) and a targeted group with atrial fibrillation diagnoses (AF Cohort). Results: The Diag+Med algorithm was more sensitive at detecting hypertension patients in both cohorts compared to the Diag-Only algorithm (83.8 and 87.6% vs. 68.2 and 66.5% in the Random and AF Cohorts, respectively). These improvements in sensitivity came at minimal costs in terms of PPV reductions (88.2 and 90.3% vs. 91.4 and 94.2%, respectively). Conclusion: The combined use of diagnosis codes and specific antihypertension medication exposure patterns facilitates a more accurate capture of patients with hypertension in a database of aggregated EHRs from diverse healthcare institutions in Singapore. The results presented here allow for the bias correction of risk estimates derived from observational studies involving hypertension.

1. Introduction

Hypertension remains a leading risk factor for cardiovascular disease and premature death worldwide [1]. Hypertension is, thus, an important primary outcome and covariate in epidemiological studies, which are increasingly being conducted on electronic health records (EHRs). In such studies, accurately identifying patients with hypertension is a necessary first step.
However, repurposing EHR data for secondary analyses presents key challenges [2,3]. Evidence suggests tendencies towards under-coding diagnoses related to cardiovascular risk factors, such as hypertension in an individual’s EHRs [4]. Using diagnosis codes alone to phenotype hypertension has been shown to result in a significant underestimation of the true disease prevalence [5].
Previous work in phenotyping hypertension has ranged from developing simple rule-based algorithms that use only hypertension-related diagnosis codes and/or antihypertensive medication exposures [6] to more complex machine-learning algorithms that require both structured and unstructured EHR data [7]. These models have yielded acceptable sensitivity and positive predictive value (PPV) statistics on validation. However, validation has been mostly limited to data arising from the same setting as those used to develop these algorithms. The performance of any hypertension phenotyping model would expectedly vary based on prevailing setting-specific practices such as the completeness of chronic disease coding and documentation as well as the extent of capture of the prescription records for chronic medications and blood pressure measurements.
Phenotyping hypertension on a nationally aggregated EHR database presents a further challenge, because such a database draws data from different healthcare settings (from primary to tertiary care) that use different EHR systems. As consolidated repositories, these aggregated databases capture individual health statuses more comprehensively. Solutions to overcome the lack of standardization upon aggregation exist, such as the conversion of EHRs to a common data model (CDM) [2], but this requires significant effort, which may not be practical. Therefore, there is still a need for a broadly generalizable model that can be applied to raw aggregated EHR databases.
The primary objective of this study is to develop and validate an algorithm for predicting arterial hypertension in patients using aggregated EHR data, particularly when direct blood pressure (BP) measurements are unavailable. Such an algorithm should allow for the prevalence estimation and bias correction of risk estimates in observational studies involving hypertension. Recognizing the constraints posed by the aggregated nature of consolidated electronic health record (EHR) databases (which amalgamate data in various formats from multiple hospitals where data completeness may not be consistent), our algorithm strategically utilizes diagnostic codes and medication data to estimate hypertension status. This approach is tailored to function effectively within the limitations of the available data. Furthermore, we aim to demonstrate the feasibility of creating a robust and generalizable phenotyping algorithm that can adapt to the diverse and large datasets often encountered in EHR settings, where ideal data may not always be accessible.

2. Results

The Random Cohort was composed of 1619 inpatient admissions, with 808 patients admitted in 2019 and 811 in 2020. The mean age for this cohort was 47.5 years in 2019 and 45.8 years in 2020, reflecting a broad age distribution among the general inpatient population.
In contrast, the AF Cohort, which included patients with atrial fibrillation, consisted of 608 patients in 2019 and 586 patients in 2020. Compared to the Random Cohort, the AF Cohort had an older mean age of 72.2 years and 72.4 years, respectively, in 2019 and 2020. Additionally, there was a higher proportion of Chinese patients and a slightly greater ratio of males to females in the AF Cohort compared to the Random Cohort, as detailed in Table 1. Table A1 provides a more detailed breakdown of age by gender and race.
The two validation cohorts differed in the underlying prevalence of hypertension (Table 1). As expected, the AF Cohort had a higher prevalence (75.8 and 79.2% in 2019 and 2020, respectively) than the Random Cohort (41.5 and 37.1%, respectively; Table A1).
The Diag+Med hypertension algorithm was applied to both validation cohorts (Figure A1), and the results were validated via chart review. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for the two validation cohorts. The overall performance metrics of the Diag+Med hypertension algorithm were compared with the Diag-Only control algorithm in Table 2.
Table 2. Comparison of Diag+Med algorithm performance with Diag-Only algorithm performance.
| Metric | Diag-Only (%), Random Cohort (n = 1619) | Diag-Only (%), AF Cohort (n = 1194) | Diag+Med (%), Random Cohort (n = 1619) | Diag+Med (%), AF Cohort (n = 1194) |
|---|---|---|---|---|
| Sensitivity | 68.2 | 66.5 | 83.8 | 87.6 |
| Specificity | 95.8 | 85.9 | 92.8 | 67.7 |
| PPV | 91.4 | 94.2 | 88.2 | 90.3 |
| NPV | 82.3 | 42.7 | 89.9 | 61.3 |
Diag-Only: diagnosis code-only algorithm (from diagnosis and Patient Problem List). Diag+Med: diagnosis code and medication algorithm (Figure 1). AF: atrial fibrillation; PPV: positive predictive value; NPV: negative predictive value.
Figure 1. Diagnosis- and medication-based hypertension phenotyping algorithm flow chart (Diag+Med algorithm). PPL: Patient Problem List; ACEi: angiotensin-converting enzyme inhibitor; ARB: angiotensin receptor blocker; BB: beta blocker; DHP-CCB: dihydropyridine-calcium channel blocker.
In both the Random and AF Cohorts, the Diag+Med algorithm was more sensitive than the Diag-Only algorithm (83.8–87.6% vs. 66.5–68.2%). It also achieved a higher NPV while maintaining broadly similar PPVs in both cohorts. These gains came with lower specificity than the Diag-Only algorithm, most notably in the AF Cohort (67.7% vs. 85.9%).
The year-wise performance metrics by cohort (2019 and 2020) of the Diag+Med algorithm are shown in Table 3. In the Random Cohort, the Diag+Med algorithm had a sensitivity of 82.45–85.4% (2019 and 2020, respectively), specificity of 92.0–93.5%, PPV of 87.95–88.6%, and NPV of 88.1–91.6%. In the AF Cohort, the algorithm had a sensitivity of 85.9–89.2%, specificity of 66.7–68.9%, PPV of 89.0–91.6%, and NPV of 60.1–62.7%. Performance metrics were also calculated across years of admission, gender, and race (Table A2).

3. Discussion

This retrospective validation study, conducted on Singaporean hospital inpatients, showed that the Diag+Med algorithm performed well overall in phenotyping patients with arterial hypertension. Its performance was comparable to that of hypertension phenotyping algorithms developed in other countries. These other studies similarly highlighted that diagnosis codes alone were usually inadequate for capturing hypertension [5,7]. One example is an American study by Teixeira et al., who developed multiple algorithms using a combination of diagnosis codes, medications, and BP measurements, achieving sensitivity and PPV of above 80–90% [7].
However, the crux of our Diag+Med algorithm lies in its usability on heterogeneous aggregated EHR databases without requiring BP measurements, which is the primary objective of our study. Teixeira’s phenotyping algorithms were developed using data from only one hospital cluster [7], whereas our Diag+Med algorithm was developed and tested on aggregated data from multiple hospital clusters in Singapore.
Another key advantage of our Diag+Med algorithm is that it operates directly on raw EHR data without requiring the laborious process of conversion to a CDM, unlike other rule-based hypertension phenotyping algorithms [8].
This study is a pioneering effort in the use of Singapore’s nationwide EHRs to phenotype patients. Its strength lies in the novelty and scale of the EHR data accessed (covering approximately 85% of all hospital admissions in Singapore [9]), as well as the large and varied cohorts used to evaluate the generalizability of the proposed algorithm. The algorithm validation involved a relatively large sample of different patient profiles from multiple contributing healthcare institutions, showing that the algorithm is fit for use on diverse patient populations from different healthcare clusters. This facilitates the identification of patients with hypertension from aggregated data sources in Singapore without the need for additional harmonization or processing (such as conversion to a CDM).
Hypertension is a chronic condition that is usually managed on an outpatient basis [10]; therefore, physicians may not input hypertension as a diagnosis for an inpatient admission. This illustrates the importance of including patient medication data in the phenotyping of hypertension, as it considerably improves sensitivity without excessively sacrificing specificity.
The Diag+Med algorithm was better at identifying negative cases in the Random Cohort than in the AF Cohort (specificity of 92.8% vs. 67.7%; NPV of 89.9% vs. 61.3%). This was likely due to the difference in patient profiles between the two cohorts, with the AF Cohort having a much higher underlying prevalence of hypertension. Higher hypertension prevalence in the AF Cohort was associated with lower specificity [11]. Evidence also suggests that sensitivity and specificity may vary with prevalence, even though these measures are theoretically independent of it, possibly due to other mechanisms [12].
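The dependence of PPV and NPV on prevalence follows from Bayes' theorem, and a short calculation can make the cohort differences concrete. The sketch below is illustrative only and is not part of the study's analysis; the function name `predictive_values` is hypothetical, and the inputs are the Diag+Med figures from Table 2 together with the pooled prevalence implied by Table A1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled prevalence from Table A1: (335 + 301) / 1619 and (461 + 464) / 1194.
random_ppv, random_npv = predictive_values(0.838, 0.928, 636 / 1619)
af_ppv, af_npv = predictive_values(0.876, 0.677, 925 / 1194)
print(f"Random Cohort: PPV={random_ppv:.3f}, NPV={random_npv:.3f}")
print(f"AF Cohort:     PPV={af_ppv:.3f}, NPV={af_npv:.3f}")
```

With these inputs, the implied values fall within rounding of the reported PPV and NPV in both cohorts, which illustrates how the higher prevalence in the AF Cohort, combined with its lower specificity, goes along with a much lower NPV.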
The Diag+Med algorithm performed consistently over consecutive years (2019 and 2020, shown in Table 3), with all performance metrics (sensitivity, specificity, PPV, and NPV) varying by less than 5 percentage points between the two years in both validation cohorts. The data indicate that the algorithm remained stable throughout 2020, suggesting resilience to any potential impact of the COVID-19 pandemic. This is critical as it suggests that the algorithm can function reliably under varying conditions, a feature that is essential for real-world applications.
A sub-group analysis of the Diag+Med algorithm across gender and race was also conducted (Table A2). The algorithm performed consistently across genders and most races in Singapore. However, caution should be taken in interpreting the results of the sub-group analysis. The findings may be attributable to chance, especially since the validation study was not specifically designed to focus on these sub-group analyses, and certain sub-groups are relatively small (such as the Others ethnicity as listed in Table 1).
There are some limitations to the EHR database available to researchers. Due to the nature of the EHR database, which lacked vital sign readings such as BP readings, it was crucial to develop an algorithm that was able to phenotype hypertension without such data. There remains a need for an algorithm that can estimate the prevalence of hypertension and adjust risk estimates in epidemiological studies using the available data. Our algorithm is designed to work within these constraints, leveraging diagnostic codes and medication data to provide the best possible estimation of hypertension status in the absence of direct BP measurements. It is notable that in this study, BP readings were not needed to produce a hypertension algorithm with a good performance.
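One common way to use validated sensitivity and specificity for prevalence estimation is the Rogan-Gladen estimator, which adjusts the proportion of patients flagged by an imperfect classifier. The sketch below is a general illustration rather than a method used in this study; the function name `corrected_prevalence` is hypothetical, and the calculation assumes that the sensitivity and specificity measured in validation carry over to the population being studied.

```python
def corrected_prevalence(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the algorithm-positive rate."""
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1) / youden
    # Sampling noise can push the raw estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Example: 37.84% of a cohort is flagged by an algorithm with the Diag+Med
# Random Cohort sensitivity and specificity from Table 2.
print(round(corrected_prevalence(0.3784, 0.838, 0.928), 3))
```

In this constructed example, the flagged proportion of 37.84% corresponds to a corrected prevalence of 40%.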
Our study was developed and validated on hospital inpatients. Due to the nature of our database, which contained unstructured notes from inpatient settings but not outpatient settings, our ability to carry out comprehensive chart reviews on non-hospitalized patients was limited. Inpatient cases would have their past medical history extensively documented in the discharge summary, but not outpatients; hence, it was not possible to carry out a robust algorithm validation on an outpatient study cohort.
Furthermore, the algorithm may not accurately predict patients who are followed up in private settings, such as by general practitioners (GPs) or in private hospitals, as their medications and outpatient visits are not available in the database. This likely contributed to the false negative cases in the algorithm’s validation as there is no visibility of the patient data outside of their inpatient admissions.
In Singapore, some medications for hypertension are commonly prescribed for other conditions, such as heart failure and coronary heart syndrome (e.g., angiotensin II receptor blockers and ACE inhibitors) [13]. This potentially contributes to the false positive rate in the algorithm, and further improvements to the algorithm should consider the presence of such comorbidities in addition to patients’ medication lists.
The performance of the algorithm may vary over time as hypertension prescribing guidelines, coding practices, and EHR systems in public hospitals may change in the future. Caution must be taken to ensure that these underlying trends are stable before applying the hypertension algorithm to cohorts.

4. Materials and Methods

4.1. Data Sources

All available historical records were extracted from a database that contains aggregated, de-identified clinical data from all public healthcare institutions in Singapore. This database covers approximately 85% of all hospital admissions and over 40% of all chronic outpatient visits [8]. The database did not undergo prior harmonization or processing (e.g., conversion to a CDM). Structured clinical data include patient demographics, diagnosis codes in SNOMED (Systematized Nomenclature of Medicine) and ICD-10 (International Classification of Diseases, 10th Revision) formats, and dispensed medication records from both outpatient and inpatient settings. Unstructured clinical data include hospital discharge summaries and emergency department visit notes.
Diagnosis codes were extracted from two data tables: (1) diagnosis and (2) Patient Problem List (PPL). A Patient Problem List includes active issues with current management, background chronic conditions, and resolved past medical issues. Medications dispensed at the outpatient or inpatient discharge pharmacies from all contributing data centers were extracted from the medications table.
Data elements from structured and unstructured clinical data (such as discharge summaries and laboratory tests) were available for a chart review; however, vital sign readings such as blood pressure (BP) were not accessible.

4.2. Algorithm Development and Validation

The primary outcome of interest was defined as chronic arterial hypertension, with the exclusion of pulmonary hypertension, pre-eclampsia/gestational hypertension, ocular hypertension, peripheral venous hypertension, and portal hypertension. Diagnosis codes from the diagnosis and PPL tables, and medications from the medications table, were used to develop a combined diagnosis and medication data-based hypertension phenotyping algorithm, as shown in Figure 1 (Diag+Med algorithm). A diagnosis code-only algorithm, which exclusively relied on the presence of diagnosis codes, was used as a control (Diag-Only algorithm), as shown in Figure 2.
A broad list of candidate SNOMED and ICD diagnosis codes for hypertension was first identified from ICD-9 AM, ICD-10, and SNOMED CT browsers. A frequency of use assessment was conducted to identify commonly used codes; diagnosis codes with fewer than 10 patients found in the database between 2018 and 2021 were removed. A hospital physician vetted the remaining diagnosis codes and descriptions to ensure their appropriateness for identifying chronic hypertension (Table A3). First- and second-line medications used to manage hypertension were shortlisted based on the American Heart Association’s 2017 [14] and the Ministry of Health (Singapore)’s 2017 guidelines [9] (Table A4). Patients treated with beta blockers alone were not included. The diagnosis codes and medications listed in Table A3 and Table A4 were used in the phenotyping algorithms.
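The frequency-of-use filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `frequently_used_codes` and the layout of the input as (patient identifier, diagnosis code) pairs are assumptions made for the example.

```python
from collections import defaultdict


def frequently_used_codes(records, candidate_codes, min_patients=10):
    """Keep candidate codes recorded for at least `min_patients` distinct patients.

    `records` is an iterable of (patient_id, diagnosis_code) pairs; this layout
    is assumed for illustration and is not the schema of the study database.
    """
    patients_per_code = defaultdict(set)
    for patient_id, code in records:
        if code in candidate_codes:
            patients_per_code[code].add(patient_id)
    return {
        code
        for code, patients in patients_per_code.items()
        if len(patients) >= min_patients
    }


# Toy data: "I10" is recorded for 12 patients, "59621000" for only 3.
toy_records = [(pid, "I10") for pid in range(12)]
toy_records += [(pid, "59621000") for pid in range(3)]
print(frequently_used_codes(toy_records, {"I10", "59621000"}))
```

Counting distinct patients rather than rows prevents a code that is recorded many times for a handful of patients from passing the threshold.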
For the Diag+Med algorithm (Figure 1), patients were categorized as hypertensive if they had any PPL or diagnosis table records with a diagnosis code found in Table A3. If the patient did not have any diagnosis codes in Table A3, the algorithm would look at the medications prescribed to the patient. Patients were classified as hypertensive if they were prescribed one medication listed in Table A4, specifically an angiotensin-converting enzyme inhibitor (ACEi), angiotensin receptor blocker (ARB), dihydropyridine-calcium channel blocker (DHP-CCB), or thiazide diuretic. Alternatively, patients were also classified as hypertensive if they were prescribed a combination of any two of the following medications from Table A4: (a) ACEi or ARB with a beta blocker (BB), (b) ACEi or ARB with a DHP-CCB, (c) ACEi or ARB with a thiazide diuretic, (d) BB with a DHP-CCB, or (e) DHP-CCB with a thiazide diuretic.
For the Diag-Only algorithm (Figure 2), patients were categorized as hypertensive if they had any PPL or diagnosis table records with a diagnosis code found in Table A3.
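The classification rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the drug class labels, and the premise that each dispensed drug has already been mapped to a drug class (for example, via the ATC codes in Table A4) are all hypothetical.

```python
# Drug classes named in the text; mapping individual drugs to these labels
# is assumed to have been done beforehand.
SINGLE_AGENT_CLASSES = {"ACEi", "ARB", "DHP-CCB", "thiazide"}
QUALIFYING_PAIRS = [
    ({"ACEi", "ARB"}, {"BB"}),        # (a) ACEi or ARB with a beta blocker
    ({"ACEi", "ARB"}, {"DHP-CCB"}),   # (b) ACEi or ARB with a DHP-CCB
    ({"ACEi", "ARB"}, {"thiazide"}),  # (c) ACEi or ARB with a thiazide
    ({"BB"}, {"DHP-CCB"}),            # (d) beta blocker with a DHP-CCB
    ({"DHP-CCB"}, {"thiazide"}),      # (e) DHP-CCB with a thiazide
]


def diag_only(patient_codes, hypertension_codes):
    """Diag-Only rule: any diagnosis or PPL code from the hypertension list."""
    return bool(set(patient_codes) & set(hypertension_codes))


def diag_med(patient_codes, patient_drug_classes, hypertension_codes):
    """Diag+Med rule: diagnosis codes first, then medication patterns."""
    if diag_only(patient_codes, hypertension_codes):
        return True
    classes = set(patient_drug_classes)
    if classes & SINGLE_AGENT_CLASSES:
        return True
    return any(classes & first and classes & second
               for first, second in QUALIFYING_PAIRS)


codes = {"I10", "38341003"}
print(diag_med([], ["BB"], codes))             # beta blocker alone
print(diag_med([], ["BB", "DHP-CCB"], codes))  # combination (d)
print(diag_med(["I10"], [], codes))            # diagnosis code only
```

Under this literal reading of the text, every listed combination already contains a class that qualifies on its own, so the combination check does not change the outcome; it is kept here only to mirror the description. Figure 1 may carry additional conditions that the prose does not capture.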
The hypertension algorithms (Diag+Med and Diag-Only) were applied on two validation cohorts (Random Cohort and AF Cohort). These validation cohorts were constructed by sampling patients admitted to any public health institution between 2019 and 2020, as shown in Figure 3. The Random Cohort consists of a random sample of inpatient admissions in 2019 or 2020 from any public health institution. Random sampling of the dataset was necessitated by the constraints of our available computational resources. The AF Cohort consists of inpatient admissions with a new diagnosis of atrial fibrillation (AF) in 2019 and 2020. The inclusion criteria for the AF Cohort were defined as a new onset of primary or secondary diagnoses of AF (an ICD-10 or SNOMED diagnosis code of atrial fibrillation) and the initiation of one of the drugs of interest (apixaban, rivaroxaban, or warfarin) within 2 days before the date of discharge.
The Random Cohort was used to assess the algorithm’s generalizability and performance in a diverse patient population. Additionally, the AF Cohort was chosen due to the higher prevalence of hypertension within this group compared to the general inpatient population, providing a robust test for the algorithm’s sensitivity in a group with higher prevalence.
Trained annotators from the Health Sciences Authority independently assessed all sampled admissions from both validation cohorts via a chart review of the aggregated database. Only data that were recorded before or on the discharge date of the patient’s inpatient admission episode were used.
A trial annotation run-in phase was conducted for annotators to practice annotating for hypertension on a common set of 200 patient charts (not included in this study) to assess potential variability in annotation accuracy. An excellent inter-annotator agreement of 0.89–0.99 was achieved on the 200 practice set records (pairwise Cohen’s Kappa, Table A5). Thereafter, independent (non-overlapping) annotations were carried out on all records in both AF and Random validation cohorts to develop a gold-standard label for each patient in the validation cohorts.
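Pairwise Cohen's Kappa of the kind reported in Table A5 can be computed as in the sketch below. It is an illustration written in plain Python, not the authors' code; the function names and the dictionary layout (annotator identifier mapped to a list of labels over the same charts) are assumptions.

```python
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(annotations):
    """Return {(annotator_1, annotator_2): kappa} for every annotator pair."""
    return {
        (a, b): cohens_kappa(annotations[a], annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    }


toy = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 1],
    "C": [1, 1, 0, 0],
}
for pair, kappa in pairwise_kappa(toy).items():
    print(pair, round(kappa, 2))
```

With fifteen annotators, this yields 105 pairwise scores, which correspond to the off-diagonal cells of a symmetric matrix such as Table A5.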

4.3. Statistical Analysis

Algorithm performance metrics (sensitivity, specificity, PPV, and NPV) were calculated by comparing the hypertension predictions of the algorithm with the gold-standard labels reviewed in the patient charts during annotation. Sensitivity was calculated by taking the proportion of confirmed hypertension cases that were predicted to be positive (true positives) out of all confirmed hypertension cases (true positives and false negatives). Specificity was calculated by taking the proportion of confirmed non-hypertension cases that were predicted to be negative (true negatives) out of all confirmed non-hypertension cases (true negatives and false positives). PPV was calculated by taking the proportion of confirmed hypertension cases that were predicted to be positive (true positives) out of all cases that were predicted positive (true positives and false positives). NPV was calculated by taking the proportion of confirmed non-hypertension cases that were predicted to be negative (true negatives) out of all cases that were predicted negative (true negatives and false negatives). The algorithm’s performance metrics and Cohen’s Kappa were calculated using Spyder (Python 3.8).
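The four definitions above translate directly into code. The following is a minimal sketch rather than the authors' implementation; the function name `performance_metrics` and the representation of predictions and gold-standard labels as parallel lists of booleans are assumptions.

```python
def performance_metrics(predicted, gold):
    """Compute sensitivity, specificity, PPV, and NPV from paired labels.

    `predicted` and `gold` are equal-length sequences of booleans, where True
    means hypertension according to the algorithm or the chart review.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fn = sum(1 for p, g in pairs if not p and g)

    def ratio(numerator, denominator):
        # Undefined when a denominator is empty (e.g., no positive cases).
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


predicted = [True, True, False, False, True]
gold = [True, False, False, True, True]
print(performance_metrics(predicted, gold))
```

In this toy example there are two true positives, one false positive, one true negative, and one false negative, so sensitivity and PPV are both 2/3 and specificity and NPV are both 1/2.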

5. Conclusions

The development and validation of a hypertension phenotyping algorithm with high sensitivity and specificity are beneficial for identifying hypertensive patients in clinical and pharmacoepidemiology studies, and this study demonstrates the algorithm’s effectiveness in the context of Singapore. The algorithm is particularly noteworthy because it can be applied to national aggregated data sourced from diverse healthcare institutions across the country, without requiring harmonization or conversion to a CDM. This makes it a versatile and robust tool for identifying patients with hypertension within the healthcare landscape in Singapore, facilitating risk estimate adjustments or quantitative bias analysis in epidemiological studies.

Author Contributions

J.W.N., Q.X. and S.R.D. designed the study. J.W.N. and Q.X. analyzed the data. J.W.N., P.S.A., H.X.T., B.F., Y.L.K., A.N., S.H.T., D.T., M.Y.T., A.Y., N.N., C.W.P.L., L.F.P., H.H. and S.R.D. provided the domain expertise for the manual annotation of discharge summaries. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The information used for this study is not available in the public domain. The code is not available as it has been written for specific fields in the dataset. Interested parties may refer to the algorithm’s logic, which is included in this paper.

Acknowledgments

Yan Tong Loo vetted diagnosis codes for use in the hypertension phenotyping algorithm.

Conflicts of Interest

The authors have no conflicts of interest that are directly relevant to the content of this article. The views expressed in this article may not be understood or quoted as being made on behalf of or reflecting the position of HSA.

Appendix A

Table A1. Demographic Profile of Atrial Fibrillation Cohort and Random Cohort, with breakdown by gender and race.
| Characteristic | Category | Random Cohort 2019 (n = 808) | Random Cohort 2020 (n = 811) | AF Cohort 2019 (n = 608) | AF Cohort 2020 (n = 586) |
|---|---|---|---|---|---|
| Hypertension | Yes | 335 (41.5%) | 301 (37.1%) | 461 (75.8%) | 464 (79.2%) |
| Hypertension | No | 473 (58.5%) | 510 (62.9%) | 147 (24.2%) | 122 (20.8%) |
| Gender | Male | 380 (47.0%) | 401 (49.4%) | 305 (50.2%) | 310 (52.9%) |
| Gender | Female | 428 (53.0%) | 410 (50.6%) | 303 (49.8%) | 276 (47.1%) |
| Race | Chinese | 514 (63.6%) | 489 (60.3%) | 451 (74.2%) | 458 (78.2%) |
| Race | Malay | 139 (17.2%) | 137 (16.8%) | 92 (15.1%) | 81 (13.8%) |
| Race | Indian | 84 (10.4%) | 99 (12.3%) | 29 (4.8%) | 25 (4.3%) |
| Race | Others | 71 (8.8%) | 86 (10.6%) | 36 (5.9%) | 22 (3.8%) |
| Age (Mean, SD) | Overall | 47.5 (28.8) | 45.8 (27.5) | 72.2 (11.8) | 72.4 (12.0) |
| Age (Mean, SD) | Male | 47.5 (29.9) | 47.0 (28.4) | 69.1 (11.6) | 69.7 (12.0) |
| Age (Mean, SD) | Female | 47.5 (27.9) | 44.6 (26.4) | 75.3 (11.2) | 75.3 (11.4) |
| Age (Mean, SD) | Chinese | 52.8 (28.1) | 52.9 (27.8) | 73.8 (11.0) | 73.7 (10.9) |
| Age (Mean, SD) | Malay | 33.2 (26.7) | 31.6 (24.8) | 67.6 (15.6) | 69.6 (13.7) |
| Age (Mean, SD) | Indian | 45.8 (27.4) | 39.3 (22.9) | 69.0 (11.0) | 66.7 (13.6) |
| Age (Mean, SD) | Others | 39.1 (28.7) | 35.2 (19.5) | 63.9 (14.6) | 67.8 (18.3) |
| Total | | 808 (100.0%) | 811 (100.0%) | 608 (100.0%) | 586 (100.0%) |
AF: atrial fibrillation.
Figure A1. Illustrative flow chart of Diag+Med algorithm applied to 2019 Random Cohort. PPL: Patient Problem List; ACEi: angiotensin-converting enzyme inhibitor; ARB: angiotensin receptor blocker; BB: beta blocker; DHP-CCB: dihydropyridine-calcium channel blocker.
Table A2. Breakdown of Diag+Med algorithm performance compared with Diag-Only algorithm performance by year of admission, gender, and race.
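| Metric | Subgroup | Diag-Only (%), Random Cohort (n = 1619) | Diag-Only (%), AF Cohort (n = 1194) | Diag+Med (%), Random Cohort (n = 1619) | Diag+Med (%), AF Cohort (n = 1194) |
|---|---|---|---|---|---|
| Sensitivity | Overall | 68.2 | 66.5 | 83.8 | 87.6 |
| Sensitivity | 2019 | 65.1 | 63.1 | 82.4 | 85.9 |
| Sensitivity | 2020 | 71.8 | 69.8 | 85.4 | 89.2 |
| Sensitivity | Male | 67.1 | 68.3 | 82.4 | 87.7 |
| Sensitivity | Female | 69.7 | 64.6 | 85.5 | 87.4 |
| Sensitivity | Chinese | 68.7 | 68.1 | 84.3 | 88.6 |
| Sensitivity | Malay | 71.2 | 58.4 | 86.4 | 84.7 |
| Sensitivity | Indian | 76.5 | 68.6 | 88.2 | 88.6 |
| Sensitivity | Others | 45.2 | 63.4 | 66.7 | 78.0 |
| Specificity | Overall | 95.8 | 85.9 | 92.8 | 67.7 |
| Specificity | 2019 | 95.3 | 85.7 | 92.0 | 66.7 |
| Specificity | 2020 | 96.3 | 86.1 | 93.5 | 68.9 |
| Specificity | Male | 95.0 | 84.1 | 91.9 | 65.6 |
| Specificity | Female | 96.5 | 88.1 | 93.5 | 70.3 |
| Specificity | Chinese | 93.9 | 85.3 | 90.2 | 67.0 |
| Specificity | Malay | 98.1 | 86.1 | 95.7 | 75.0 |
| Specificity | Indian | 98.3 | 84.2 | 95.7 | 57.9 |
| Specificity | Others | 98.3 | 94.1 | 96.5 | 70.6 |
| PPV | Overall | 91.4 | 94.2 | 88.2 | 90.3 |
| PPV | 2019 | 90.8 | 93.3 | 87.9 | 89.0 |
| PPV | 2020 | 91.9 | 95.0 | 88.6 | 91.6 |
| PPV | Male | 91.3 | 93.0 | 88.8 | 88.7 |
| PPV | Female | 91.4 | 95.5 | 87.6 | 92.0 |
| PPV | Chinese | 90.5 | 94.4 | 88.0 | 90.7 |
| PPV | Malay | 92.2 | 94.1 | 86.4 | 92.8 |
| PPV | Indian | 96.3 | 88.9 | 92.3 | 79.5 |
| PPV | Others | 90.5 | 96.3 | 87.5 | 86.5 |
| NPV | Overall | 82.3 | 42.7 | 89.9 | 61.3 |
| NPV | 2019 | 79.4 | 42.6 | 88.1 | 60.1 |
| NPV | 2020 | 85.2 | 42.9 | 91.6 | 62.7 |
| NPV | Male | 78.7 | 46.4 | 87.0 | 63.5 |
| NPV | Female | 85.5 | 39.0 | 92.3 | 58.9 |
| NPV | Chinese | 78.0 | 42.5 | 87.2 | 62.0 |
| NPV | Malay | 91.6 | 35.2 | 95.7 | 56.2 |
| NPV | Indian | 87.6 | 59.3 | 93.2 | 73.3 |
| NPV | Others | 83.1 | 51.6 | 88.8 | 57.1 |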
AF: atrial fibrillation.
Table A3. Diagnosis codes used in hypertension phenotyping algorithm.
| No. | Diagnosis Code | Diagnosis Description | Format |
|---|---|---|---|
| 1 | 38341003 | Hypertensive disorder | SNOMED |
| 2 | 59621000 | Essential hypertension | SNOMED |
| 3 | 10725009 | Benign hypertension | SNOMED |
| 4 | 38481006 | Hypertensive renal disease | SNOMED |
| 5 | 1201005 | Benign essential hypertension | SNOMED |
| 6 | 6962006 | Hypertensive retinopathy | SNOMED |
| 7 | 64715009 | Hypertensive heart disease | SNOMED |
| 8 | 56218007 | Systolic hypertension | SNOMED |
| 9 | 170578008 | Poor hypertension control | SNOMED |
| 10 | I10 | Essential (primary) hypertension | ICD-10 |
| 11 | 86041002 | Pre-existing hypertension in obstetric context | SNOMED |
| 12 | 86234004 | Hypertensive heart AND renal disease | SNOMED |
| 13 | 473392002 | Hypertensive nephrosclerosis | SNOMED |
| 14 | 266287006 | (Hypertensive disease) or (hypertension) | SNOMED |
| 15 | 8762007 | Chronic hypertension in obstetric context | SNOMED |
| 16 | 712832005 | Supine hypertension | SNOMED |
| 17 | 5148006 | Hypertensive heart disease with congestive heart failure | SNOMED |
| 18 | 65402008 | Pre-existing hypertension complicating AND/OR reason for care during pregnancy | SNOMED |
| 19 | 78975002 | Malignant essential hypertension | SNOMED |
| 20 | 194779001 | Hypertensive heart and renal disease with (congestive) heart failure | SNOMED |
| 21 | 46113002 | Hypertensive heart failure | SNOMED |
| 22 | 48146000 | Diastolic hypertension | SNOMED |
| 23 | 194767001 | Benign hypertensive heart disease with congestive cardiac failure | SNOMED |
| 24 | 397748008 | Hypertension with albuminuria | SNOMED |
| 25 | 49220004 | Hypertensive renal failure | SNOMED |
| 26 | 443482000 | Hypertensive urgency | SNOMED |
| 27 | 62275004 | Hypertensive episode | SNOMED |
| 28 | 50490005 | Hypertensive encephalopathy | SNOMED |
| 29 | 706882009 | Hypertensive crisis | SNOMED |
| 30 | 70272006 | Malignant hypertension | SNOMED |
| 31 | 31992008 | Secondary hypertension | SNOMED |
| 32 | 161501007 | H/O: hypertension * | SNOMED |
| 33 | 52698002 | Transient hypertension | SNOMED |
| 34 | 123799005 | Renovascular hypertension | SNOMED |
| 35 | 28119000 | Renal hypertension | SNOMED |
| 36 | 193003 | Benign hypertensive renal disease (disorder) | SNOMED |
| 37 | 194785008 | Benign secondary hypertension | SNOMED |
| 38 | 449759005 | Hypertensive complication | SNOMED |
| 39 | 428163005 | Hypertensive left ventricular hypertrophy | SNOMED |
| 40 | 89242004 | Malignant secondary hypertension | SNOMED |
| 41 | 37618003 | Chronic hypertension complicating AND/OR reason for care during pregnancy | SNOMED |
* History of hypertension.
Table A4. Medications used in hypertension phenotyping algorithm.
| No. | Class of Medicine | Included ATC L4 Code | Included Drugs (Not Exclusive) / Excluded Drugs |
|---|---|---|---|
| 1 | Dihydropyridine derivatives | C08CA, C08GA | Amlodipine, Nifedipine, Felodipine, Lacidipine, Cilnidipine, Nimodipine |
| 2 | Angiotensin II antagonists, plain | C09CA | Losartan, Valsartan, Telmisartan, Irbesartan, Candesartan, Olmesartan medoxomil |
| 3 | ACE inhibitors, plain | C09AA | Enalapril, Lisinopril, Perindopril, Captopril, Ramipril, Imidapril |
| 4 | Beta blocking agents, selective | C07AB | Atenolol, Bisoprolol, Metoprolol, Nebivolol, Sotalol, Timolol, Betaxolol, Esmolol |
| 5 | Angiotensin II antagonists and calcium channel blockers | C09DB, C09DX | Valsartan and amlodipine; Telmisartan and amlodipine; Olmesartan medoxomil and amlodipine |
| 6 | Thiazides, plain | C03AA | Hydrochlorothiazide |
| 7 | Sulfonamides, plain | C03BA, C03CA | Furosemide, Indapamide, Metolazone, Bumetanide, Verapamil |
| 8 | Alpha and beta blocking agents | C07AG | Carvedilol, Labetalol |
| 9 | Organic nitrates | C01DA, C01DB, C01DX | Isosorbide dinitrate, Isosorbide mononitrate, Glyceryl trinitrate |
| 10 | Beta blocking agents, non-selective | C07AA | Propranolol, Nadolol |
| 11 | Angiotensin II antagonists and diuretics | C09DA | Valsartan and diuretics; Losartan and diuretics; Irbesartan and diuretics |
| 12 | Benzothiazepine derivatives | C08DB | Diltiazem |
| 13 | Aldosterone antagonists | C03DA | Spironolactone, Eplerenone |
| 14 | Beta blocking agents, selective, and other antihypertensives | C07FB, C07FX | Atenolol and other antihypertensives |
| 15 | ACE inhibitors and calcium channel blockers | C09BB | Perindopril and amlodipine |
| 16 | Low-ceiling diuretics and potassium-sparing agents | C03EA | Hydrochlorothiazide and potassium-sparing agents |
| 17 | ACE inhibitors, other combinations | C09BX | Perindopril, amlodipine and indapamide; Cosyrel |
| 18 | Angiotensin II antagonists, other combinations | C09DX | Sacubitril-valsartan |
| NA | Other excluded medicines | C03XA, C01DX | Tolvaptan, Nicorandil |
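Table A4 keys the medication arm of the algorithm on ATC level-4 codes, which are the first five characters of a full seven-character ATC code (for example, amlodipine, C08CA01, falls under C08CA). The following is a minimal sketch of such a lookup, not the authors' implementation. The function name and the input layout are hypothetical, and only the two drugs in the final "Other excluded medicines" row are treated as exclusions; any per-class exclusions are deliberately not modelled.

```python
# Illustrative sketch only: a level-4 ATC prefix lookup based on Table A4.
# Names and record layout are assumptions, not the study's actual code.

# ATC level-4 codes listed for classes 1 to 18 in Table A4.
INCLUDED_ATC_L4 = {
    "C08CA", "C08GA", "C09CA", "C09AA", "C07AB", "C09DB", "C09DX",
    "C03AA", "C03BA", "C03CA", "C07AG", "C01DA", "C01DB", "C01DX",
    "C07AA", "C09DA", "C08DB", "C03DA", "C07FB", "C07FX", "C09BB",
    "C03EA", "C09BX",
}

# Assumption: only the "Other excluded medicines" row is applied here.
EXCLUDED_DRUGS = {"tolvaptan", "nicorandil"}


def is_candidate_antihypertensive(atc_code: str, drug_name: str) -> bool:
    """Return True if the ATC level-4 prefix is listed and the drug is not excluded."""
    if drug_name.strip().lower() in EXCLUDED_DRUGS:
        return False
    # ATC level 4 (chemical subgroup) is the first five characters of the code.
    return atc_code.strip().upper()[:5] in INCLUDED_ATC_L4


if __name__ == "__main__":
    print(is_candidate_antihypertensive("C08CA01", "Amlodipine"))
    print(is_candidate_antihypertensive("C01DX16", "Nicorandil"))
```

Checking the drug name before the prefix matters because nicorandil shares the C01DX prefix with the organic nitrates row, so a prefix match alone would admit it.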
Table A5. Inter-annotator agreement (Cohen’s Kappa). Pairwise Cohen’s Kappa scores between the fifteen annotators on a training dataset of 200 discharge summaries.
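| Annotator | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | - | 0.99 | 0.99 | 0.96 | 0.97 | 0.95 | 0.97 | 0.97 | 0.92 | 0.96 | 0.98 | 0.93 | 0.96 | 0.99 | 0.96 |
| 2 | 0.99 | - | 0.98 | 0.95 | 0.96 | 0.96 | 0.98 | 0.96 | 0.93 | 0.95 | 0.99 | 0.94 | 0.95 | 0.98 | 0.97 |
| 3 | 0.99 | 0.98 | - | 0.97 | 0.98 | 0.94 | 0.96 | 0.98 | 0.93 | 0.95 | 0.97 | 0.94 | 0.97 | 0.98 | 0.95 |
| 4 | 0.96 | 0.95 | 0.97 | - | 0.95 | 0.91 | 0.93 | 0.95 | 0.90 | 0.92 | 0.96 | 0.95 | 0.96 | 0.95 | 0.92 |
| 5 | 0.97 | 0.96 | 0.98 | 0.95 | - | 0.92 | 0.94 | 0.96 | 0.91 | 0.95 | 0.95 | 0.92 | 0.95 | 0.96 | 0.93 |
| 6 | 0.95 | 0.96 | 0.94 | 0.91 | 0.92 | - | 0.94 | 0.92 | 0.91 | 0.91 | 0.95 | 0.92 | 0.95 | 0.96 | 0.93 |
| 7 | 0.97 | 0.98 | 0.96 | 0.93 | 0.94 | 0.94 | - | 0.94 | 0.91 | 0.93 | 0.97 | 0.92 | 0.93 | 0.96 | 0.95 |
| 8 | 0.97 | 0.96 | 0.98 | 0.95 | 0.96 | 0.92 | 0.94 | - | 0.91 | 0.93 | 0.95 | 0.92 | 0.95 | 0.96 | 0.93 |
| 9 | 0.92 | 0.93 | 0.93 | 0.90 | 0.91 | 0.91 | 0.91 | 0.91 | - | 0.90 | 0.92 | 0.89 | 0.92 | 0.91 | 0.90 |
| 10 | 0.96 | 0.95 | 0.95 | 0.92 | 0.95 | 0.91 | 0.93 | 0.93 | 0.90 | - | 0.94 | 0.89 | 0.92 | 0.95 | 0.92 |
| 11 | 0.98 | 0.99 | 0.97 | 0.96 | 0.95 | 0.95 | 0.97 | 0.95 | 0.92 | 0.94 | - | 0.95 | 0.94 | 0.97 | 0.96 |
| 12 | 0.93 | 0.94 | 0.94 | 0.95 | 0.92 | 0.92 | 0.92 | 0.92 | 0.89 | 0.89 | 0.95 | - | 0.91 | 0.94 | 0.91 |
| 13 | 0.96 | 0.95 | 0.97 | 0.96 | 0.95 | 0.95 | 0.93 | 0.95 | 0.92 | 0.92 | 0.94 | 0.91 | - | 0.95 | 0.92 |
| 14 | 0.99 | 0.98 | 0.98 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.91 | 0.95 | 0.97 | 0.94 | 0.95 | - | 0.95 |
| 15 | 0.96 | 0.97 | 0.95 | 0.92 | 0.93 | 0.93 | 0.95 | 0.93 | 0.90 | 0.92 | 0.96 | 0.91 | 0.92 | 0.95 | - |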
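Each cell of Table A5 is a pairwise Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below shows the standard calculation on made-up labels; the function name and the toy data are illustrative assumptions and do not reproduce the study's annotations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical labels (1 = hypertension, 0 = no hypertension) for ten summaries.
    annotator_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    annotator_b = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    print(round(cohens_kappa(annotator_a, annotator_b), 2))
```

In this toy case the annotators agree on 9 of 10 items (p_o = 0.9) while chance agreement is 0.5, giving a kappa of 0.8.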

References

  1. Brouwers, S.; Sudano, I.; Kokubo, Y.; Sulaica, E.M. Arterial hypertension. Lancet 2021, 398, 249–261.
  2. Ta, C.N.; Weng, C. Detecting Systemic Data Quality Issues in Electronic Health Records. Stud. Health Technol. Inform. 2019, 264, 383–387.
  3. D’Amore, J. Electronic Health Record Data Governance and Data Quality in the Real World. Healthcare Information and Management Systems Society. 2023. Available online: https://www.himss.org/resources/electronic-health-record-data-governance-and-data-quality-real-world (accessed on 16 February 2023).
  4. Angelow, A.; Reber, K.C.; Schmidt, C.O.; Baumeister, S.E.; Chenot, J.-F. Prevalence of Cardiovascular Risk Factors at The Population Level: A Comparison of Ambulatory Physician-Coded Claims Data with Clinical Data from A Population-Based Study. Gesundheitswesen 2019, 81, 791–800.
  5. Peng, M.; Chen, G.; Kaplan, G.G.; Lix, L.M.; Drummond, N.; Lucyk, K.; Garies, S.; Lowerison, M.; Weibe, S.; Quan, H. Methods of defining hypertension in electronic medical records: Validation against national survey data. J. Public Health 2016, 38, e392–e399.
  6. Nadkarni, G.N.; Gottesman, O.; Linneman, J.G.; Chase, H.; Berg, R.L.; Farouk, S.; Nadukuru, R.; Lotay, V.; Ellis, S.; Hripcsak, G.; et al. Development and validation of an electronic phenotyping algorithm for chronic kidney disease. AMIA Annu. Symp. Proc. 2014, 2014, 907–916.
  7. Teixeira, P.L.; Wei, W.-Q.; Cronin, R.M.; Mo, H.; VanHouten, J.P.; Carroll, R.J.; LaRose, E.; Bastarache, L.A.; Rosenbloom, S.T.; Edwards, T.L.; et al. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J. Am. Med. Inform. Assoc. 2016, 24, 162–171.
  8. McDonough, C.W.; Babcock, K.; Chucri, K.; Crawford, D.C.; Bian, J.; Modave, F.; Cooper-DeHoff, R.M.; Hogan, W.R. Optimizing identification of resistant hypertension: Computable phenotype development and validation. Pharmacoepidemiol. Drug Saf. 2020, 29, 1393–1401.
  9. Tan, C.C.; Lam, C.S.P.; Matchar, D.B.; Zee, Y.K.; Wong, J.E.L. Singapore’s health-care system: Key features, challenges, and shifts. Lancet 2021, 398, 1091–1104.
  10. MOH Clinical Practice Guidelines on Hypertension. Ministry of Health, Singapore. 2023. Available online: https://www.moh.gov.sg/hpp/doctors/guidelines/GuidelineDetails/cpgmed_hypertension (accessed on 16 February 2023).
  11. Parikh, R.; Mathai, A.; Parikh, S.; Chandra Sekhar, G.; Thomas, R. Understanding and using sensitivity, specificity and predictive values. Indian J. Ophthalmol. 2008, 56, 45–50.
  12. Leeflang, M.M.; Rutjes, A.W.; Reitsma, J.B.; Hooft, L.; Bossuyt, P.M. Variation of a test’s sensitivity and specificity with disease prevalence. CMAJ 2013, 185, E537–E544.
  13. Huang, W.; Lee, S.G.S.; How, C.H. Management of the heart failure patient in the primary care setting. Singapore Med. J. 2020, 61, 225–229.
  14. Whelton, P.K.; Carey, R.M.; Aronow, W.S.; Casey, D.E., Jr.; Collins, K.J.; Himmelfarb, C.D.; DePalma, S.M.; Gidding, S.; Jamerson, K.A.; Jones, D.W.; et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Hypertension 2018, 71, e13–e115.
Figure 2. Diagnosis code-based hypertension phenotyping algorithm flow chart (Diag-Only algorithm). PPL: Patient Problem List.
Figure 3. Construction of Random Cohort and Atrial Fibrillation (AF) Cohort.
Table 1. Demographic profile of Atrial Fibrillation Cohort and Random Cohort.
| | | Random Cohort (n = 1619), 2019 (n = 808) | Random Cohort (n = 1619), 2020 (n = 811) | AF Cohort (n = 1194), 2019 (n = 608) | AF Cohort (n = 1194), 2020 (n = 586) |
|---|---|---|---|---|---|
| Hypertension | Yes | 335 (41.5%) | 301 (37.1%) | 461 (75.8%) | 464 (79.2%) |
| | No | 473 (58.5%) | 510 (62.9%) | 147 (24.2%) | 122 (20.8%) |
| Gender | Male | 380 (47.0%) | 401 (49.4%) | 305 (50.2%) | 310 (52.9%) |
| | Female | 428 (53.0%) | 410 (50.6%) | 303 (49.8%) | 276 (47.1%) |
| Race | Chinese | 514 (63.6%) | 489 (60.3%) | 451 (74.2%) | 458 (78.2%) |
| | Malay | 139 (17.2%) | 137 (16.8%) | 92 (15.1%) | 81 (13.8%) |
| | Indian | 84 (10.4%) | 99 (12.3%) | 29 (4.8%) | 25 (4.3%) |
| | Others | 71 (8.8%) | 86 (10.6%) | 36 (5.9%) | 22 (3.8%) |
| Age | Mean | 47.5 | 45.8 | 72.2 | 72.4 |
| | Standard deviation | 28.8 | 27.5 | 11.8 | 12.0 |
| Total | | 808 (100.0%) | 811 (100.0%) | 608 (100.0%) | 586 (100.0%) |
AF: atrial fibrillation.
Table 3. Performance of the Diag+Med algorithm on the AF Cohort and the Random Cohort.
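| Statistics | Random Cohort (n = 1619), 2019 (%) | Random Cohort (n = 1619), 2020 (%) | Random Cohort (n = 1619), Overall (%) | AF Cohort (n = 1194), 2019 (%) | AF Cohort (n = 1194), 2020 (%) | AF Cohort (n = 1194), Overall (%) |
|---|---|---|---|---|---|---|
| Sensitivity | 82.4 | 85.4 | 83.8 | 85.9 | 89.2 | 87.6 |
| Specificity | 92.0 | 93.5 | 92.8 | 66.7 | 68.9 | 67.7 |
| PPV | 87.9 | 88.6 | 88.2 | 89.0 | 91.6 | 90.3 |
| NPV | 88.1 | 91.6 | 89.9 | 60.1 | 62.7 | 61.3 |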
AF: atrial fibrillation; PPV: positive predictive value; NPV: negative predictive value.
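The predictive values in Table 3 depend on how common hypertension is in each cohort, which is why the AF Cohort has a much lower NPV than the Random Cohort despite similar sensitivity. As a hedged illustration, the sketch below recomputes PPV and NPV from the overall sensitivity and specificity in Table 3 and the chart-review hypertension counts in Table 1 using Bayes' rule. The function name is hypothetical, and because the inputs are rounded percentages, the results agree with the published values only to within rounding.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence                  # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1.0 - prevalence)          # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Hypertension prevalence from Table 1 (2019 and 2020 combined).
random_prev = (335 + 301) / 1619
af_prev = (461 + 464) / 1194

# Overall sensitivity and specificity of the Diag+Med algorithm from Table 3.
random_ppv, random_npv = predictive_values(0.838, 0.928, random_prev)
af_ppv, af_npv = predictive_values(0.876, 0.677, af_prev)

if __name__ == "__main__":
    print(f"Random Cohort: PPV {random_ppv:.1%}, NPV {random_npv:.1%}")
    print(f"AF Cohort:     PPV {af_ppv:.1%}, NPV {af_npv:.1%}")
```

The same function can be used to project PPV and NPV onto a target population with a different hypertension prevalence, under the assumption that sensitivity and specificity carry over unchanged.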

Neo, J.W.; Xie, Q.; Ang, P.S.; Tan, H.X.; Foo, B.; Koon, Y.L.; Ng, A.; Tan, S.H.; Teo, D.; Tham, M.Y.; et al. Pioneering Arterial Hypertension Phenotyping on Nationally Aggregated Electronic Health Records. Pharmacoepidemiology 2024, 3, 169-182. https://doi.org/10.3390/pharma3010010