Pioneering Arterial Hypertension Phenotyping on Nationally Aggregated Electronic Health Records

Background: Hypertension is frequently studied, either as an outcome or as a variable, in epidemiological studies conducted using retrospective observational data. However, there are few validation studies investigating the accuracy of hypertension phenotyping algorithms in aggregated electronic health record (EHR) data. Methods: Utilizing a centralized repository of inpatient EHR data from Singapore for the period of 2019–2020, a new algorithm that incorporates both diagnostic codes and medication details (Diag+Med) was devised. This algorithm was intended to supplement and improve the diagnostic code-only model (Diag-Only) for the classification of hypertension. We computed sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) to assess the algorithm's effectiveness in identifying hypertension on 2813 chart-reviewed records. This pool was composed of two patient cohorts: a random sampling of all inpatient admissions (Random Cohort) and a targeted group with atrial fibrillation diagnoses (AF Cohort). Results: The Diag+Med algorithm was more sensitive at detecting hypertension patients in both cohorts compared to the Diag-Only algorithm (83.8 and 87.6% vs. 68.2 and 66.5% in the Random and AF Cohorts, respectively). These improvements in sensitivity came at minimal costs in terms of PPV reductions (88.2 and 90.3% vs. 91.4 and 94.2%, respectively). Conclusion: The combined use of diagnosis codes and specific antihypertension medication exposure patterns facilitates a more accurate capture of patients with hypertension in a database of aggregated EHRs from diverse healthcare institutions in Singapore. The results presented here allow for the bias correction of risk estimates derived from observational studies involving hypertension.


Introduction
Hypertension remains a leading risk factor for cardiovascular disease and premature death worldwide [1]. Hypertension is, thus, an important primary outcome and covariate in epidemiological studies, which are increasingly being conducted on electronic health records (EHRs). In such studies, accurately identifying patients with hypertension is a necessary first step.
However, repurposing EHR data for secondary analyses presents key challenges [2,3]. Evidence suggests a tendency towards under-coding diagnoses related to cardiovascular risk factors, such as hypertension, in an individual's EHRs [4]. Using diagnosis codes alone to phenotype hypertension has been shown to result in a significant underestimation of the true disease prevalence [5].
Previous work in phenotyping hypertension has ranged from developing simple rule-based algorithms that use only hypertension-related diagnosis codes and/or antihypertensive medication exposures [6] to more complex machine-learning algorithms that require both structured and unstructured EHR data [7]. These models have yielded acceptable sensitivity and positive predictive value (PPV) statistics on validation. However, validation has been mostly limited to data arising from the same setting as those used to develop these algorithms. The performance of any hypertension phenotyping model is expected to vary with prevailing setting-specific practices, such as the completeness of chronic disease coding and documentation, as well as the extent of capture of the prescription records for chronic medications and blood pressure measurements.
Attempting to phenotype hypertension on a nationally aggregated EHR database that draws data from different healthcare settings (from primary to tertiary care) using different EHR systems also presents a unique challenge. Serving as a consolidated repository, these aggregated databases capture individual health statuses more comprehensively. Solutions to overcome the lack of standardization upon aggregation exist, such as the conversion of EHRs to a common data model (CDM) [2], but this requires significant effort, which may not be practical. Therefore, there is still a need for the development of a broadly generalizable model that can be applied to raw aggregated EHR databases.
The primary objective of this study is to develop and validate an algorithm for predicting arterial hypertension in patients using aggregated EHR data, particularly when direct blood pressure (BP) measurements are unavailable. Such an algorithm should allow for the prevalence estimation and bias correction of risk estimates in observational studies involving hypertension. Recognizing the constraints posed by the aggregated nature of consolidated EHR databases (which amalgamate data in various formats from multiple hospitals where data completeness may not be consistent), our algorithm strategically utilizes diagnostic codes and medication data to estimate hypertension status. This approach is tailored to function effectively within the limitations of the available data. Furthermore, we aim to demonstrate the feasibility of creating a robust and generalizable phenotyping algorithm that can adapt to the diverse and large datasets often encountered in EHR settings, where ideal data may not always be accessible.

Results
The Random Cohort was composed of 1619 inpatient admissions, with 808 patients admitted in 2019 and 811 in 2020. The mean age for this cohort was 47.5 years in 2019 and 45.8 years in 2020, reflecting a broad age distribution among the general inpatient population.
In contrast, the AF Cohort, which included patients with atrial fibrillation, consisted of 608 patients in 2019 and 586 patients in 2020. Compared to the Random Cohort, the AF Cohort had an older mean age of 72.2 years and 72.4 years in 2019 and 2020, respectively. Additionally, there was a higher proportion of Chinese patients and a slightly greater ratio of males to females in the AF Cohort compared to the Random Cohort, as detailed in Table 1. Table A1 provides a more detailed breakdown of age by gender and race.
The two validation cohorts differed in the underlying prevalence of hypertension (Table 1). As expected, the AF Cohort had a higher prevalence (75.8 and 79.2% in 2019 and 2020, respectively) than the Random Cohort (37.1 and 41.5%).
The Diag+Med hypertension algorithm was applied to both validation cohorts (Figure A1), and the results were validated via chart review. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for the two validation cohorts. The overall performance metrics of the Diag+Med hypertension algorithm were compared with the Diag-Only control algorithm in Table 2.
For both the Random and AF Cohorts, the Diag+Med algorithm outperformed the Diag-Only algorithm in sensitivity (83.8-87.6% for Diag+Med vs. 66.5-68.2% for Diag-Only). The Diag+Med algorithm also outperformed the Diag-Only algorithm in NPV, while maintaining relatively similar PPVs in both cohorts. The Diag+Med algorithm displayed lower specificity compared to the Diag-Only algorithm.
The year-wise performance metrics by cohort (2019 and 2020) of the Diag+Med algorithm are shown in Table 3. In the Random Cohort, the Diag+Med algorithm had a sensitivity of 82.45-85.4% (2019 and 2020, respectively), specificity of 92.0-93.5%, PPV of 87.95-88.6%, and NPV of 88.1-91.6%. In the AF Cohort, the algorithm had a sensitivity of 85.9-89.2%, specificity of 66.7-68.9%, PPV of 89.0-91.6%, and NPV of 60.1-62.7%. Performance metrics were also calculated across years of admission, gender, and race (Table A2).

Discussion
This retrospective validation study, conducted on Singaporean hospital inpatients, demonstrated that the Diag+Med algorithm performed well overall in phenotyping patients with arterial hypertension. Its performance was comparable to that of hypertension phenotyping algorithms developed in other countries. These other studies similarly highlighted that diagnosis codes alone were usually inadequate for capturing hypertension [5,7]. An example is an American study by Teixeira et al., in which multiple algorithms were developed using a combination of diagnosis codes, medications, and BP measurements, achieving sensitivities and PPVs above 80-90% [7].
However, the crux of our Diag+Med algorithm lies in its usability on heterogeneous aggregated EHR databases without requiring BP measurements, which is the primary objective of our study. Teixeira's phenotyping algorithms were developed using data from only one hospital cluster [7], in contrast to our Diag+Med algorithm, which was developed and tested on aggregated data from multiple hospital clusters in Singapore and did not require BP measurements.
Another key advantage of our Diag+Med algorithm is that it operates directly on raw EHR data without requiring the laborious process of conversion to a CDM, unlike other rule-based hypertension phenotyping algorithms [8].
This is a pioneering study in the use of Singapore's nationwide EHRs to phenotype patients. The strength of this study lies in the novelty and scale of the EHR data accessed (covering approximately 85% of all hospital admissions in Singapore [9]), as well as the large and varied cohorts used to evaluate the generalizability of the proposed algorithm. The algorithm validation involved a relatively large sample of different patient profiles from multiple contributing healthcare institutions, showing that the algorithm is fit for use on diverse patient populations from different healthcare clusters. This facilitates the identification of patients with hypertension from aggregated data sources in Singapore without the need for additional harmonization or processing (such as conversion to a CDM).
Hypertension is a chronic condition that is usually managed on an outpatient basis [10]; therefore, physicians may not input hypertension as a diagnosis for an inpatient admission. This illustrates the importance of including patient medication data in the phenotyping of hypertension, as it considerably improves sensitivity without excessively sacrificing specificity.
The Diag+Med algorithm was better at distinguishing negative cases in the Random Cohort than in the AF Cohort. This was likely due to the difference in patient profiles between the two cohorts, with the AF Cohort having a much higher underlying prevalence of hypertension. Higher hypertension prevalence in the AF Cohort resulted in lower specificity [11]. Evidence also suggests that increased prevalence may result in a variance in sensitivity and specificity even though these measures are theoretically independent of prevalence, possibly due to other mechanisms [12].
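The contrast between the two cohorts can be made concrete with Bayes' theorem: even when sensitivity and specificity are held fixed, PPV and NPV shift with the underlying prevalence. The following is a minimal sketch for illustration only and is not part of the study's analysis; the function name `predictive_values` and the operating point (sensitivity 0.85, specificity 0.90) are hypothetical, and the two prevalence values merely approximate the ranges reported for the Random and AF Cohorts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by a fixed sensitivity/specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating point evaluated at a lower and a higher prevalence.
for prevalence in (0.40, 0.78):
    ppv, npv = predictive_values(0.85, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumed inputs, PPV rises and NPV falls as prevalence increases, which is the same direction as the NPV gap between the cohorts reported in Table 3.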
The Diag+Med algorithm performed consistently over consecutive years (2019 and 2020, shown in Table 3), with all performance metrics (sensitivity, specificity, PPV, and NPV) varying less than 5% across the years for all validation cohorts. The data demonstrate the algorithm's stability throughout the 2020 period, indicating resilience to any potential impact caused by the COVID-19 pandemic. This is critical as it suggests that the algorithm can reliably function under varying conditions, a feature that is essential for real-world applications.
A sub-group analysis of the Diag+Med algorithm across gender and race was also conducted (Table A2). The algorithm performed consistently across genders and most races in Singapore. However, caution should be taken in interpreting the results of the sub-group analysis. The findings may be attributable to chance, especially since the validation study was not specifically designed to focus on these sub-group analyses, and certain sub-groups are relatively small (such as the Others ethnicity as listed in Table 1).
There are some limitations to the EHR database available to researchers. Because the EHR database lacked vital sign readings such as BP readings, it was crucial to develop an algorithm that was able to phenotype hypertension without such data. There remains a need for an algorithm that can estimate the prevalence of hypertension and adjust risk estimates in epidemiological studies using the available data. Our algorithm is designed to work within these constraints, leveraging diagnostic codes and medication data to provide the best possible estimation of hypertension status in the absence of direct BP measurements. It is notable that in this study, BP readings were not needed to produce a hypertension algorithm with a good performance.
Our algorithm was developed and validated on hospital inpatients. Because our database contained unstructured notes from inpatient settings but not outpatient settings, our ability to carry out comprehensive chart reviews on non-hospitalized patients was limited. Inpatients would have their past medical history extensively documented in the discharge summary, but outpatients would not; hence, it was not possible to carry out a robust algorithm validation on an outpatient study cohort.
Furthermore, the algorithm may not accurately classify patients who are followed up in private settings, such as by general practitioners (GPs) or in private hospitals, as their medications and outpatient visits are not available in the database. This likely contributed to the false negative cases in the algorithm's validation, as there is no visibility of the patient data outside of their inpatient admissions.
In Singapore, some medications for hypertension (e.g., angiotensin II receptor blockers and ACE inhibitors) are commonly prescribed for other conditions, such as heart failure and coronary heart syndrome [13]. This potentially contributes to the false positive rate of the algorithm, and further improvements to the algorithm should consider the presence of such comorbidities in addition to patients' medication lists.
The performance of the algorithm may vary over time as hypertension prescribing guidelines, coding practices, and EHR systems in public hospitals may change in the future. Caution must be taken to ensure that these underlying trends are stable before applying the hypertension algorithm to cohorts.

Data Sources
All available historical records were extracted from a database that contains aggregated, de-identified clinical data from all public healthcare institutions in Singapore. This database covers approximately 85% of all hospital admissions and over 40% of all chronic outpatient visits [8]. The database did not undergo prior harmonization or processing (e.g., conversion to a CDM). Structured clinical data include patient demographics, diagnosis codes in SNOMED (Systematized Nomenclature of Medicine) and ICD-10 (International Classification of Diseases, 10th Revision) formats, and dispensed medication records from both outpatient and inpatient settings. Unstructured clinical data include hospital discharge summaries and emergency department visit notes.
Diagnosis codes were extracted from two data tables: (1) diagnosis and (2) Patient Problem List (PPL). A Patient Problem List includes active issues with current management, background chronic conditions, and resolved past medical issues. Medications dispensed at the outpatient or inpatient discharge pharmacies from all contributing data centers were extracted from the medications table.
Data elements from structured and unstructured clinical data (such as discharge summaries and laboratory tests) were available for a chart review; however, vital sign readings such as blood pressure (BP) were not accessible.

Algorithm Development and Validation
The primary outcome of interest was defined as chronic arterial hypertension, with the exclusion of pulmonary hypertension, pre-eclampsia/gestational hypertension, ocular hypertension, peripheral venous hypertension, and portal hypertension. Diagnosis codes from the diagnosis and PPL tables, and medications from the medications table, were used to develop a combined diagnosis and medication data-based hypertension phenotyping algorithm, as shown in Figure 1 (Diag+Med algorithm). A diagnosis code-only algorithm, which exclusively relied on the presence of diagnosis codes, was used as a control (Diag-Only algorithm), as shown in Figure 2.
A broad list of candidate SNOMED and ICD diagnosis codes for hypertension was first identified from ICD-9 AM, ICD-10, and SNOMED CT browsers. A frequency-of-use assessment was conducted to identify commonly used codes; diagnosis codes with fewer than 10 patients found in the database between 2018 and 2021 were removed. A hospital physician vetted the remaining diagnosis codes and descriptions to ensure their appropriateness for identifying chronic hypertension (Table A3). First- and second-line medications used to manage hypertension were shortlisted based on the American Heart Association's 2017 guidelines [14] and the Ministry of Health (Singapore)'s 2017 guidelines [9] (Table A4). Patients treated with beta blockers alone were not included. The diagnosis codes and medications listed in Tables A3 and A4 were used in the phenotyping algorithms.
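The frequency-of-use assessment described above can be sketched as a count of distinct patients per candidate code. This is an illustrative sketch rather than the study's implementation: the function name `filter_codes_by_patient_count`, the `(patient_id, code)` record layout, and the example data are assumptions, and the restriction to records dated between 2018 and 2021 is assumed to have been applied upstream.

```python
from collections import defaultdict


def filter_codes_by_patient_count(records, min_patients=10):
    """Keep diagnosis codes recorded for at least `min_patients` distinct patients.

    `records` is an iterable of (patient_id, diagnosis_code) pairs
    (a hypothetical layout, not the database's actual schema).
    """
    patients_per_code = defaultdict(set)
    for patient_id, code in records:
        patients_per_code[code].add(patient_id)
    return {
        code
        for code, patients in patients_per_code.items()
        if len(patients) >= min_patients
    }


# Synthetic example: one frequently used code and one rarely used placeholder code.
example_records = [(f"patient_{i}", "I10") for i in range(12)]
example_records += [(f"patient_{i}", "RARE_CODE") for i in range(3)]
print(filter_codes_by_patient_count(example_records))
```

Counting distinct patients rather than rows avoids retaining a code that is repeated many times for only a handful of individuals. The surviving codes would still need the physician review described in the text.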
For the Diag+Med algorithm (Figure 1), patients were categorized as hypertensive if they had any PPL or diagnosis table records with a diagnosis code found in Table A3. If the patient did not have any diagnosis codes in Table A3, the algorithm would look at the medications prescribed to the patient. Patients were classified as hypertensive if they were prescribed one medication listed in Table A4, specifically an angiotensin-converting enzyme inhibitor (ACEi), angiotensin receptor blocker (ARB), dihydropyridine-calcium channel blocker (DHP-CCB), or thiazide diuretic. Alternatively, patients were also classified as hypertensive if they were prescribed a combination of any two of the following medications from Table A4: (a) ACEi or ARB with a beta blocker (BB), (b) ACEi or ARB with a DHP-CCB, (c) ACEi or ARB with a thiazide diuretic, (d) BB with a DHP-CCB, or (e) DHP-CCB with a thiazide diuretic.
For the Diag-Only algorithm (Figure 2), patients were categorized as hypertensive if they had any PPL or diagnosis table records with a diagnosis code found in Table A3.
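The two rule sets described above can be sketched as follows. This is a simplified reading of the prose only: Figures 1 and 2 may contain details that are not captured here, the contents of Tables A3 and A4 are not reproduced, and the function names, the drug-class labels, and the assumption that each dispensed drug has already been mapped to a class are all hypothetical. The code `I10` in the usage lines is only an example entry for the code list.

```python
ACEI, ARB, BB, DHP_CCB, THIAZIDE = "ACEi", "ARB", "BB", "DHP-CCB", "thiazide"

# Classes that qualify on their own in the text; a beta blocker alone does not.
SINGLE_AGENT_CLASSES = {ACEI, ARB, DHP_CCB, THIAZIDE}


def has_qualifying_pair(classes):
    """Check combinations (a) to (e) from the text, implemented literally."""
    acei_or_arb = bool(classes & {ACEI, ARB})
    return (
        (acei_or_arb and BB in classes)                   # (a)
        or (acei_or_arb and DHP_CCB in classes)           # (b)
        or (acei_or_arb and THIAZIDE in classes)          # (c)
        or (BB in classes and DHP_CCB in classes)         # (d)
        or (DHP_CCB in classes and THIAZIDE in classes)   # (e)
    )


def diag_only(diagnosis_codes, hypertension_codes):
    """Diag-Only: positive if any PPL/diagnosis code is in the hypertension code list."""
    return bool(set(diagnosis_codes) & set(hypertension_codes))


def diag_med(diagnosis_codes, drug_classes, hypertension_codes):
    """Diag+Med: diagnosis codes first, then fall back to medication patterns."""
    if diag_only(diagnosis_codes, hypertension_codes):
        return True
    classes = set(drug_classes)
    return bool(classes & SINGLE_AGENT_CLASSES) or has_qualifying_pair(classes)


code_list = {"I10"}  # example entry only; the study's list is in Table A3
print(diag_med([], [BB], code_list))           # beta blocker alone
print(diag_med([], [BB, DHP_CCB], code_list))  # combination (d)
```

Keeping the diagnosis check as a shared function makes the Diag-Only control a strict subset of the Diag+Med rule, which matches the description of Diag+Med as a supplement to the code-only model.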
The hypertension algorithms (Diag+Med and Diag-Only) were applied on two validation cohorts (Random Cohort and AF Cohort). These validation cohorts were constructed by sampling patients admitted to any public health institution between 2019 and 2020, as shown in Figure 3. The Random Cohort consists of a random sample of inpatient admissions in 2019 or 2020 from any public health institution. Random sampling of the dataset was necessitated by the constraints of our available computational resources. The AF Cohort consists of inpatient admissions with a new diagnosis of atrial fibrillation (AF) in 2019 and 2020. The inclusion criteria for the AF Cohort were defined as a new onset of primary or secondary diagnoses of AF (an ICD-10 or SNOMED diagnosis code of atrial fibrillation) and the initiation of one of the drugs of interest (apixaban, rivaroxaban, or warfarin) within 2 days before the date of discharge.
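The AF Cohort inclusion criteria can be sketched as a simple predicate. This is an illustration under stated assumptions, not the study's code: the function name `meets_af_inclusion` and the `(drug_name, start_date)` layout are hypothetical, the determination of a new-onset AF diagnosis and of drug initiation is assumed to happen upstream, and treating the 2-day window as inclusive of the discharge date is an assumption.

```python
from datetime import date

DRUGS_OF_INTEREST = {"apixaban", "rivaroxaban", "warfarin"}


def meets_af_inclusion(has_new_af_diagnosis, initiations, discharge_date, window_days=2):
    """Return True if the admission satisfies both AF Cohort criteria.

    `initiations` is an iterable of (drug_name, start_date) pairs
    (hypothetical layout). The window is assumed to include the
    discharge date itself.
    """
    if not has_new_af_diagnosis:
        return False
    for drug_name, start_date in initiations:
        days_before_discharge = (discharge_date - start_date).days
        if (
            drug_name.lower() in DRUGS_OF_INTEREST
            and 0 <= days_before_discharge <= window_days
        ):
            return True
    return False


discharge = date(2020, 3, 10)
print(meets_af_inclusion(True, [("Warfarin", date(2020, 3, 9))], discharge))
print(meets_af_inclusion(True, [("warfarin", date(2020, 3, 1))], discharge))
```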
The Random Cohort was used to assess the algorithm's generalizability and performance in a diverse patient population. Additionally, the AF Cohort was chosen due to the higher prevalence of hypertension within this group compared to the general inpatient population, providing a robust test for the algorithm's sensitivity in a group with higher prevalence.
Trained annotators from the Health Sciences Authority independently assessed all sampled admissions from both validation cohorts via a chart review of the aggregated database. Only data that were recorded before or on the discharge date of the patient's inpatient admission episode were used.
A trial annotation run-in phase was conducted for annotators to practice annotating for hypertension on a common set of 200 patient charts (not included in this study) to assess potential variability in annotation accuracy. An excellent inter-annotator agreement of 0.89-0.99 was achieved on the 200 practice set records (pairwise Cohen's Kappa, Table A5). Thereafter, independent (non-overlapping) annotations were carried out on all records in both AF and Random validation cohorts to develop a gold-standard label for each patient in the validation cohorts.
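Pairwise Cohen's Kappa on a shared practice set can be computed as sketched below. This is a plain-Python illustration of the standard formula, kappa = (p_o - p_e) / (1 - p_e), and is not the study's script; the function names and the annotator labels in the example are hypothetical.

```python
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same records."""
    labels_a, labels_b = list(labels_a), list(labels_b)
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("Both annotators must label the same non-empty set of records")
    n = len(labels_a)
    # Observed agreement: share of records given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (observed - expected) / (1 - expected)


def pairwise_kappa(annotations):
    """Kappa for every pair of annotators; `annotations` maps name -> labels."""
    return {
        (first, second): cohens_kappa(annotations[first], annotations[second])
        for first, second in combinations(sorted(annotations), 2)
    }


# Hypothetical labels (1 = hypertension, 0 = no hypertension) on four shared charts.
practice = {"annotator_1": [1, 1, 1, 0], "annotator_2": [1, 1, 0, 0], "annotator_3": [1, 1, 1, 0]}
print(pairwise_kappa(practice))
```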

Statistical Analysis
Algorithm performance metrics (sensitivity, specificity, PPV, and NPV) were calculated by comparing the hypertension predictions of the algorithm with the gold-standard labels reviewed in the patient charts during annotation. Sensitivity was calculated by taking the proportion of confirmed hypertension cases that were predicted to be positive (true positives) out of all confirmed hypertension cases (true positives and false negatives). Specificity was calculated by taking the proportion of confirmed non-hypertension cases that were predicted to be negative (true negatives) out of all confirmed non-hypertension cases (true negatives and false positives). PPV was calculated by taking the proportion of confirmed hypertension cases that were predicted to be positive (true positives) out of all cases that were predicted positive (true positives and false positives). NPV was calculated by taking the proportion of confirmed non-hypertension cases that were predicted to be negative (true negatives) out of all cases that were predicted negative (true negatives and false negatives). The algorithm's performance metrics and Cohen's Kappa were calculated using Spyder (Python 3.8).
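The four definitions above translate directly into counts from a 2x2 confusion table. The sketch below is a minimal illustration of those definitions and not the study's script; the function names and the example labels are hypothetical.

```python
def confusion_counts(predicted, gold):
    """Count TP, FP, TN, FN from paired boolean-like predictions and gold labels."""
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, gold):
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return the proportion, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance_metrics(predicted, gold):
    """Sensitivity, specificity, PPV, and NPV as defined in the text."""
    tp, fp, tn, fn = confusion_counts(predicted, gold)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical predictions and chart-review labels for eight admissions.
algorithm_output = [1, 1, 1, 0, 0, 0, 0, 0]
chart_review = [1, 1, 0, 1, 0, 0, 0, 0]
print(performance_metrics(algorithm_output, chart_review))
```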

Conclusions
We developed and validated a hypertension phenotyping algorithm with high sensitivity and specificity, which is beneficial for identifying hypertensive patients in various clinical and pharmacoepidemiology studies and which proved effective in the context of Singapore. This algorithm is particularly noteworthy as it can be successfully applied to national aggregated data sourced from diverse healthcare institutions across the country, without requiring harmonization or conversion to a CDM. This makes it a versatile and robust tool for the identification of patients with hypertension within the healthcare landscape in Singapore to facilitate risk estimate adjustments or quantitative bias analysis in epidemiological studies.
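As one illustration of how validation metrics can feed a simple quantitative bias analysis, the Rogan-Gladen estimator corrects an algorithm-derived (apparent) prevalence using known sensitivity and specificity. The study does not state which correction method is intended, so the choice of this estimator, the function name `rogan_gladen`, and the input figures are assumptions made for illustration.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent prevalence for misclassification.

    corrected = (apparent + specificity - 1) / (sensitivity + specificity - 1),
    clipped to the interval [0, 1].
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("Sensitivity + specificity must exceed 1 for the correction to apply")
    corrected = (apparent_prevalence + specificity - 1) / denominator
    return min(1.0, max(0.0, corrected))


# Hypothetical inputs: 45% of a cohort flagged by an algorithm with
# sensitivity 0.80 and specificity 0.90.
print(rogan_gladen(0.45, 0.80, 0.90))
```

The correction assumes that the sensitivity and specificity measured in the validation cohort carry over to the cohort being analyzed, which may not hold when patient profiles differ substantially.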

Table 1. Demographic profile of Atrial Fibrillation Cohort and Random Cohort.

Table 2. Comparison of Diag+Med algorithm performance with Diag-Only algorithm performance.

Table 3. Performance of the Diag+Med algorithm on the AF Cohort and the Random Cohort.

Table A1. Demographic profile of Atrial Fibrillation Cohort and Random Cohort, with breakdown by gender and race.

Table A2. Breakdown of Diag+Med algorithm performance compared with Diag-Only algorithm performance by year of admission, gender, and race.

Table A3. Diagnosis codes used in hypertension phenotyping algorithm.
* History of hypertension.

Table A4. Medications used in hypertension phenotyping algorithm.