Correlations between Resting Electrocardiogram Findings and Disease Profiles: Insights from the Qatar Biobank Cohort

Background: Resting electrocardiogram (ECG) is a valuable non-invasive diagnostic tool used in clinical medicine to assess the electrical activity of the heart while the patient is at rest. ECG abnormalities may be associated with clinical biomarkers and can predict early stages of disease. In this study, we evaluated the association between ECG traits, clinical biomarkers, and diseases, and developed risk scores to predict coronary artery disease (CAD) in the Qatar Biobank.
Methods: This study used 12-lead ECG data from 13,827 participants. The ECG traits used for association analysis were RR, PR, QRS, QTc, PW, and JT. Association analysis using regression models was conducted between ECG variables and serum electrolytes, sugars, lipids, blood pressure (BP), blood and inflammatory biomarkers, and diseases (e.g., type 2 diabetes (T2D), CAD, and stroke). ECG-based and clinical risk scores were developed, and their performance in predicting CAD was assessed. Classical regression and machine-learning models were used for risk score development.
Results: Significant associations were observed with ECG traits. RR showed the largest number of associations: e.g., positive associations with bicarbonate, chloride, HDL-C, and monocytes, and negative associations with glucose, insulin, neutrophils, calcium, and risk of T2D. QRS was positively associated with phosphorus, bicarbonate, and risk of CAD. Elevated QTc was observed in CAD patients, whereas QTc was negatively associated with calcium and potassium levels. Risk scores developed using regression models were outperformed by machine-learning models. The area under the receiver operating characteristic curve (AUC) reached 0.84 using a machine-learning model that contained ECG traits, sugars, lipids, serum electrolytes, and cardiovascular disease risk factors. The odds ratio for the top decile of the CAD risk score compared to the remaining deciles was 13.99.
Conclusions: ECG abnormalities were associated with serum electrolytes, sugars, lipids, and blood and inflammatory biomarkers. These abnormalities were also observed in T2D and CAD patients. Risk scores performed well in predicting CAD.


Introduction
Resting electrocardiogram (ECG) is a valuable diagnostic tool used in clinical medicine to assess the electrical activity of the heart while the patient is at rest [1][2][3]. ECG abnormalities directly indicate certain diseases, such as atrial fibrillation (AF) [4] and heart arrhythmias (ARs) [5]. They tend to occur more often in patients with certain diseases (e.g., diabetes [6,7], coronary artery disease (CAD) [8], and hypertension (HTN) [9]) than in healthy people.
ECG abnormalities have been shown to be associated with metabolic syndrome and its components, with evidence varying by gender [10]. They have also been associated with insulin-induced hypoglycemia [11,12] and abnormal serum electrolyte levels. Deviations in the concentrations of extracellular potassium, calcium, and magnesium have the potential to disrupt the myocyte membrane potential gradients and modify the cardiac action potential [13]. Many of these results were obtained in disease cohorts and not in general populations.
Risk scores using ECG parameters have been proposed to predict various outcomes, including mortality [14], sudden cardiac death in the general population [15], and cardiovascular disease and its subclinical phenotypes [16][17][18]. These risk scores can be combined with traditional clinical risk scores that are used to predict cardiovascular diseases (CVD) (e.g., QRISK3) [19] and type 2 diabetes (T2D) (e.g., FINDRISC) [20]. Accurate disease prediction and early detection are important for better prevention and management of CAD and T2D.
Most findings about the association of ECG with various diseases and the performance of risk scores were obtained in cohorts of individuals of European descent. To the best of our knowledge, ECG has not previously been studied in the Middle Eastern and North African (MENA) region. In this study, we evaluated the association between ECG traits and (1) diseases such as T2D and CAD, (2) serum electrolytes, (3) blood and inflammatory biomarkers (e.g., red blood cells, white blood cells, and C-reactive protein), and (4) CVD/T2D risk factors (e.g., LDL-C and insulin) in the Qatar Biobank (QBB) dataset. We used the 12-lead resting ECG data of 13,827 subjects from QBB, a self-reported questionnaire, biochemical markers (serum electrolytes and CVD/T2D risk factors (sugars/lipids/BP)), and available electronic medical records (EMRs) of 8308 subjects to extract disease information (e.g., CAD and AF).

Study Cohort
The dataset consisted of 13,827 Qatari individuals. QBB collected all samples and generated the phenotypic data [21]. Participants were recruited through personal referrals from family and friends, social media, and the QBB website. Deep phenotyping was performed at QBB facilities. Participants filled out a standardized questionnaire covering lifestyle, nutrition, and medical history. Blood, saliva, and urine were collected and kept in liquid nitrogen at −80 °C. To ensure informed consent from all participants, ethical approval of the study protocol was obtained from the Hamad Medical Corporation Ethics Committee (Protocol No. MRC-03-20-097) and the QBB Institutional Review Board (IRB) (Protocol No. E-2019-QF-QBB_RES-ACC-0153-0103) in 2020 and renewed on an annual basis.

ECG Data
Resting 12-lead ECG was performed using a Mortara Eli 350 or 380 automated system (Welch Allyn, Skaneateles Falls, NY, USA) on participants who were required to rest for 2 min. Three 10 s ECG recordings were collected, with an interval of 1 min between recordings [21]. Measures were automatically recorded, including the RR interval, PR interval, QRS duration, and corrected QT interval (QTc). These measures were averaged over the three recordings and used for analysis. Two additional variables were calculated: P wave (PW) duration was calculated as max(Offset − Onset) over the three recordings, and the JT interval as (average QTc − average QRS). ECG data were available for 13,827 participants.
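The derivation of the six analysis traits from the three recordings can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the dictionary keys (rr, pr, qrs, qtc, p_onset, p_offset) and the example values are hypothetical, and all measurements are assumed to be in milliseconds.

```python
from statistics import mean

def summarize_ecg(recordings):
    """Combine three 10 s recordings into the six traits used for analysis.

    Each recording is a dict of measurements in milliseconds.
    The key names are hypothetical and chosen for illustration only.
    """
    # RR, PR, QRS, and QTc are averaged over the recordings.
    traits = {k: mean(r[k] for r in recordings) for k in ("rr", "pr", "qrs", "qtc")}
    # P wave duration: largest (offset - onset) across the recordings.
    traits["pw"] = max(r["p_offset"] - r["p_onset"] for r in recordings)
    # JT interval: average QTc minus average QRS.
    traits["jt"] = traits["qtc"] - traits["qrs"]
    return traits

# Made-up example values for one participant.
recordings = [
    {"rr": 900, "pr": 160, "qrs": 92, "qtc": 410, "p_onset": 100, "p_offset": 204},
    {"rr": 930, "pr": 162, "qrs": 94, "qtc": 414, "p_onset": 100, "p_offset": 210},
    {"rr": 870, "pr": 158, "qrs": 90, "qtc": 406, "p_onset": 100, "p_offset": 206},
]
traits = summarize_ecg(recordings)
```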

Statistical Analysis
Association analysis was performed between the six ECG traits (RR, PR, QRS, QTc, PW, and JT) and each of the demographic, serum electrolyte, sugars/lipids, blood and inflammatory, and clinical/disease traits. Linear regression was applied, adjusting for sex, age, and BMI, except when testing the association with demographic variables. Risk scores were developed to predict CAD using a variety of models that included the following as predictors: demographics only, ECG traits only, serum electrolytes only, sugars/lipids only, and clinical/disease traits that are known to be risk factors for CAD. A global model was also developed, including all traits from the previous categories. The risk scores were created by splitting the full data into training and testing datasets (70% vs. 30%, respectively). Multivariate logistic regression and xgboost machine-learning models were fitted. In the multivariate regression model (RS_mult), the regression effect sizes were recorded and used to build the risk score as a weighted sum (effect size × predictors). Only predictors with p < 0.05 were included in the risk score calculation. The xgboost risk score (RS_xgboost) included all features/predictors and used the following parameters: max_depth = 2, gamma = 0, max_delta_step = 0, lambda = 1, eta = 0.001, nthread = 8, and nrounds = 4000 in the xgboost R package (https://cran.r-project.org/web/packages/xgboost/index.html, accessed on 1 September 2023). The performance of each score was evaluated using logistic regression between the disease and the risk score. The OR per 1 SD increase, the area under the receiver operating characteristic curve (AUC), and the OR for the top decile (OR_decile) vs. the remaining deciles were reported as performance metrics in the testing dataset.
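The weighted-sum risk score and its AUC can be illustrated with a short sketch. It is written in Python rather than the R used in the study, and the effect sizes, predictor names, and test rows below are invented for illustration; they are not the study's estimates. The AUC is computed with the rank-based definition (the probability that a randomly chosen case scores higher than a randomly chosen control), which is an assumption about how one might evaluate it, not a description of the authors' code.

```python
def risk_score(row, weights):
    """Weighted sum of predictors: sum(effect size x predictor value)."""
    return sum(beta * row[name] for name, beta in weights.items())

def auc(scores, labels):
    """Rank-based AUC: chance that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical effect sizes (log-odds per unit) for predictors with p < 0.05
# in a training set; these numbers are made up.
weights = {"age": 0.06, "qrs": 0.02, "qtc": 0.01}

# Hypothetical held-out test rows; "cad" is 1 for cases and 0 for controls.
test_rows = [
    {"age": 35, "qrs": 88, "qtc": 400, "cad": 0},
    {"age": 42, "qrs": 90, "qtc": 405, "cad": 0},
    {"age": 58, "qrs": 98, "qtc": 430, "cad": 1},
    {"age": 50, "qrs": 92, "qtc": 410, "cad": 0},
    {"age": 63, "qrs": 104, "qtc": 440, "cad": 1},
    {"age": 40, "qrs": 100, "qtc": 425, "cad": 1},
]
scores = [risk_score(r, weights) for r in test_rows]
labels = [r["cad"] for r in test_rows]
test_auc = auc(scores, labels)
```

In this toy example, one case scores below one control, so eight of the nine case-control pairs are ranked correctly. The OR per 1 SD increase would additionally require fitting a logistic regression on the standardized score, which is omitted here.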

Results
The cohort characteristics and summary of the ECG traits before normalization are shown in Table 1. The total number of subjects was 13,827, of which 44.46% were male. The cohort was relatively young, with an average age of 40.12 ± 13.11 years. The average BMI was 29.6 ± 6.16 kg/m². RR, PR, QRS, and PW were significantly elevated in females, with QRS being the most significant ECG trait. QTc and JT were significantly decreased in females (Table 2). All ECG traits except for RR were positively correlated with age and BMI (Table 2). Age showed the most significant association with QTc (p = 3.4 × 10^−218) and BMI with PW (p = 9.1 × 10^−222) (Table 2). Ancestry, as inferred using genetic data, was associated with ECG traits. Only JT and RR did not show significant associations at the Bonferroni significance level (α = 0.05/6 = 0.008) (Figure 1). PR was the ECG trait with the largest differences between ancestral groups (p = 7.99 × 10^−19; Figure 1). PR was the longest in individuals with African origins (average: 167 ms) and the shortest in individuals with South Asian origins (average: 159 ms). QRS was the longest in South-Asian-origin individuals (average: 94 ms) and the shortest in African-origin individuals (average: 91 ms) (Figure 1). QTc was also the highest in South-Asian-origin individuals. Although differences were observed between ancestral groups, the magnitude of these differences may not be of clinical relevance.
The sex variable was coded as 1 for females and 0 for males. A negative effect size means a decrease in ECG traits in females.

Serum Electrolytes
With Bonferroni significance (0.05/(7 × 6) = 1.1 × 10^−3), RR was significantly associated with all tested electrolytes except potassium (Figure 2). Electrolytes were associated with an increase in RR, except for calcium (Figure 2). PR was only associated with phosphorus and bicarbonate, showing a positive correlation (Figure 2). QRS was associated with potassium (negative correlation) and with phosphorus and bicarbonate (positive correlations) (Figure 2). QTc was significantly decreased with potassium and calcium but increased with phosphorus and magnesium (Figure 2). PW did not show significant associations at Bonferroni levels, while JT was significantly elevated with phosphorus and decreased with potassium and calcium (Figure 2).
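The Bonferroni threshold used here is simple arithmetic and can be checked with a small sketch. The function and variable names are illustrative only. Note that 0.05/42 evaluates to roughly 1.19 × 10^−3, which the text reports truncated as 1.1 × 10^−3.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

n_ecg_traits = 6     # RR, PR, QRS, QTc, PW, JT
n_electrolytes = 7   # chloride, magnesium, potassium, sodium, calcium, phosphorus, bicarbonate

threshold = bonferroni_threshold(0.05, n_electrolytes * n_ecg_traits)

def is_significant(p_value):
    """True if the p-value passes the Bonferroni-corrected threshold."""
    return p_value < threshold
```

The same function reproduces the threshold used for the ancestry analysis when called with six tests (0.05/6 ≈ 0.008).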

Sugars/Lipids
With Bonferroni significance (0.05/(7 × 6) = 1.1 × 10^−3), all sugars/lipids traits except LDL-C were associated with RR (Figure 3). Only HDL-C showed a positive correlation.

Risk Score Performance to Predict CAD
Risk score results are presented in Table 3. The RS_mult for the demographic variables was fitted twice. In the first model, only sex and age contributed to the risk score, while BMI and ancestry were not significant. The OR was 3.84 (95% CI [3.21, 4.59], p = 3.55 × 10^−49) and the AUC was 0.84. However, this performance is due to the data collection process and the unbalanced distribution of age and sex in CAD vs. control subjects (mean age of 54.46 in CAD patients vs. 39.57 in controls; 39% of CAD patients were females, whereas 56% of controls were females). In the second RS_mult model, [...]. Most importantly, the OR_decile for xgboost was 13.99, which was substantially higher than that of the global RS_mult (i.e., 9.57). The effect sizes for the multivariate regression and the variable importance for xgboost are shown in Supplementary Table S1. Finally, we performed a risk score analysis excluding AF patients from the dataset, but the obtained performance did not change (data not shown).
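The top-decile odds ratio (OR_decile) can be illustrated with a 2 × 2 table. The sketch below is an assumption about one way to compute it: for a single binary indicator with no covariates, the cross-product ratio equals the OR that a logistic regression would return. The scores and labels are synthetic and do not reproduce the study's data.

```python
def top_decile_odds_ratio(scores, labels):
    """Odds ratio of disease for the top 10% of scores vs. the remaining 90%."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top, rest = ranked[:n_top], ranked[n_top:]
    a = sum(y for _, y in top)    # cases in the top decile
    b = len(top) - a              # controls in the top decile
    c = sum(y for _, y in rest)   # cases in the remaining deciles
    d = len(rest) - c             # controls in the remaining deciles
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2 x 2 table")
    return (a * d) / (b * c)

# Synthetic example: 100 subjects with distinct scores 0..99 and 9 cases,
# 4 of which fall in the top decile (scores 90..99).
scores = list(range(100))
case_scores = {99, 97, 95, 93, 10, 30, 50, 70, 80}
labels = [1 if s in case_scores else 0 for s in scores]
or_decile = top_decile_odds_ratio(scores, labels)
```

Here the table is 4 cases and 6 controls in the top decile vs. 5 cases and 85 controls elsewhere, giving (4 × 85)/(6 × 5) ≈ 11.3.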

Discussion
In this study, we performed association analysis between ECG traits (RR, PR, QRS, QTc, PW, and JT) and several clinical biomarkers and diseases. We used the QBB dataset of 13,827 participants with available ECG data. We tested three types of biomarkers: serum electrolytes (chloride, magnesium, potassium, sodium, calcium, phosphorus, and bicarbonate), sugars/lipids (glucose, HbA1C, insulin, LDL-C, HDL-C, total cholesterol, and TG), and blood and inflammatory biomarkers (eosinophils, basophils, lymphocytes, monocytes, neutrophils, red blood cells, white blood cells, and C-reactive protein). The clinical and disease traits that were tested with ECG traits were AF, AR, CM, T2D, CAD, smoking, hyperthyroidism, stroke, SBP, and DBP. This is the first and largest such study in the Middle East and North Africa region. Participants in the QBB cohort had Arab, African, and South Asian origins. The summary of significant associations and their directions with ECG traits is shown in Figure 6.
Serum electrolyte imbalances have been reported to be associated with ECG abnormalities and cardiac arrhythmias [22,23]. Mild hyperkalemia was associated with a shorter QTc interval [23]. This is concordant with what was observed in our study, where QTc was inversely and significantly associated with potassium levels. However, potassium levels were not associated with PR, as previously discussed [13,23]. Consistent with the literature, lower calcium levels (hypocalcemia) were associated with prolongation of QTc [22,24]. Lower levels of calcium were also associated with elevated JT and RR. Heart rate, which is inversely proportional to RR, was previously shown to correlate with lower levels of calcium [25]. In our study, an increase in RR (i.e., a decrease in heart rate) was associated with a rise in serum electrolytes (except for calcium), with bicarbonate and chloride being the most statistically significant. A rise in phosphorus was associated with prolonged QTc. In the Third National Health and Nutrition Survey (NHANES III) and the Atherosclerosis Risk in Communities (ARIC) study, phosphorus was positively associated with longer QTc, which is consistent with our data [24]. However, in patients undergoing hemodialysis, longer QTc was associated with lower serum phosphorus levels [26]. This suggests that the associations we identified may be valid for relatively healthy participants, and different relationships may be observed depending on the presence of certain diseases.
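Because the discussion moves between RR and heart rate, the conversion between the two is worth spelling out. The sketch below uses the standard relation (heart rate in beats per minute equals 60,000 divided by the RR interval in milliseconds); the function name is illustrative.

```python
def heart_rate_bpm(rr_ms):
    """Convert an RR interval in milliseconds to heart rate in beats per minute."""
    if rr_ms <= 0:
        raise ValueError("RR interval must be positive")
    return 60_000 / rr_ms

# A longer RR interval corresponds to a lower heart rate, and vice versa.
slow = heart_rate_bpm(1000)  # 60 bpm
fast = heart_rate_bpm(750)   # 80 bpm
```

Any trait that is positively associated with RR is therefore negatively associated with heart rate.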
Sugars, lipids, and BP traits are known risk factors for CVD and T2D. CVD and T2D can lead to cardiac arrhythmias [27]. AF, which is the most common form of arrhythmia, is associated with a range of CVD [28]. In our study, all sugars/lipids/BP traits were associated with heart rate except LDL-C. An increase in these traits was associated with an increase in heart rate, except for HDL-C, which was negatively correlated with heart rate. In a recent study with a relatively small sample size, a positive correlation was observed between heart rate and HDL-C, LDL-C, BMI, and TG [29]. Our results are likely to be more robust because of our larger sample size (40× higher) and thus higher statistical power. The good cholesterol HDL-C is considered a protective factor against CVD, and it is expected to contribute to a lower heart rate. QTc was only associated with BP traits, where the correlation was positive (higher blood pressure was associated with longer QTc). Elevations in SBP and DBP can disrupt ventricular repolarization, leading to the prolongation of the QT interval [30]. Prolonged PR intervals increase susceptibility to AF [31,32]. It was previously shown that enhanced PI3K activation reduced PR intervals in cross-bred transgenic mice [33]. Therefore, PI3K activation by insulin may avert AF and improve cardiac rhythm [31]. Our results add evidence to this hypothesis, as we showed that an increase in insulin and HbA1C levels was associated with reduced PR intervals.
PR interval was negatively associated with red blood cells, neutrophils, and C-reactive protein. A decrease in any of these biomarkers may lead to a prolonged PR. The decrease in RR associated with an increase in red blood cells, neutrophils, and C-reactive protein was accompanied only by a decrease in the PR interval, while QRS, QTc, PW, and JT remained relatively unchanged. The other types of white blood cells were not associated with any ECG trait. An increase in red blood cells is expected to be associated with an increase in heart rate (decrease in RR). When tissues receive insufficient oxygen, the body may attempt to compensate by increasing the heart rate to pump more oxygenated blood to the tissues.
ECG alterations have been previously observed in T2D, CVD, CM, and other diseases [34]. For example, long QTc, QT dispersion, and left ventricular hypertrophy may be observed in T2D patients [35]. In our study, PR, RR, and PW intervals were the only ECG variables that decreased in T2D patients. In CAD patients, QRS and QTc were the only ECG traits that showed significant associations, and both were increased. Like CAD patients, CM patients showed higher QRS and QTc levels. As expected, the increase in QRS and QTc was greater in CM patients than in CAD patients. AF patients, despite their small numbers, showed a significant decrease in PW intervals. Long QRS and QTc were found to be among the strongest predictors of CAD events in postmenopausal women [36], concordant with what was observed in our study, which extends this finding to the general population (not only postmenopausal women). These two variables were also identified as the dominant mortality predictors [36]. They can be used clinically to improve the prognosis of CAD patients.
The risk prediction of CAD using various well-established risk factors is important for early detection and prevention. One commonly used risk score is QRISK3 [19]. Additional risk factors can improve the performance of risk scores for CAD. ECG traits were previously used to predict the level of coronary artery calcium, and they provided good performance [16]. ECG abnormality risk scores were also shown to predict mortality risk in the elderly [37]. Recently, a deep learning model developed using 12-lead ECGs predicted 5-year atherosclerotic disease with an AUC of 0.67 [38]. Our risk score based on ECG traits only achieved a comparable predictive power for CAD (AUC = 0.66). The OR for the top decile compared to the remaining deciles was 3.76 for the model that includes ECG traits only, which corresponds to 3.76-fold higher odds of CAD in people with the highest risk score values. The risk scores developed using serum electrolytes, sugars/lipids, or clinical/disease traits all outperformed the ECG-based risk score. The global risk score, which showed the greatest predictive performance among the regression-based scores (AUC = 0.81 and OR_decile = 9.57), contained T2D, stroke, SBP, DBP, RR, PR, QRS, QTc, magnesium, potassium, smoking, and HbA1C. The machine-learning model, xgboost, outperformed the multivariate logistic regression (AUC = 0.84 and OR_decile = 13.99). The model included serum electrolytes, sugars/lipids, demographics, and clinical/disease risk factors. Both multivariate and xgboost models can be easily used in clinical settings. However, it is important to validate our developed risk scores in independent cohorts from the same region. Finally, the validation and utility of integrating ECG risk scores, genetic risk scores, and clinical scores remain to be seen in future longitudinal studies for CAD and other diseases.
This study has a few limitations. The sample size for diseases (AF, CM, AR, CAD, and stroke) is small. In the risk score analysis, the cases and controls should ideally be age- and sex-matched, which was not the case in our study. Selecting a subset of controls to match our cases would have reduced the sample size drastically, especially for the disease categories. Since this is a retrospective study, the validation of the risk scores needs further investigation using longitudinal datasets.

Conclusions
Our study is the first and largest study to investigate ECG trait associations with sugars, lipids, BP, blood and inflammatory biomarkers, CAD, T2D, and arrhythmias in a Middle Eastern cohort. Significant associations were identified with different ECG traits. RR was the ECG trait that showed significant associations with the highest number of variables. T2D, HbA1C, and triglycerides showed the largest negative effect sizes with RR. Importantly, QTc was shown to be longer in CAD patients but showed a negative correlation with calcium levels. Interestingly, the JT interval, part of the QTc interval, was negatively associated with calcium levels and smoking. Risk scores for CAD showed strong predictive power. They included ECG traits, demographics, serum electrolytes, sugars, lipids, and clinical and disease traits (T2D, HbA1C, TC, QRS, SBP, QTc, smoking, glucose, RR, potassium, LDL-C, PW, TG, BMI, DBP, insulin, stroke, HDL-C, and PR). Implementation of these scores in clinical practice should help in setting tailored prevention and treatment plans for individuals.
[...] manuscript and approved it. M.S. supervised the study. All authors have read and agreed to the published version of the manuscript.

Figure 1. Variability in ECG traits by ancestry. Q-AFR: Qatari citizens with African origins; Q-SAS: Qatari citizens with South Asian origins (Iran and India); Arab: Qatari citizens with Arab origins spanning the Gulf and Middle East region.

J. Clin. Med. 2024, 12, x FOR PEER REVIEW

Figure 2. Associations between ECG traits and serum electrolytes. Colors are proportional to the effect size of each regression model. Red colors represent a negative correlation, while green colors represent positive ones. The numbers in each cell are the p-values. The underlined p-values are significant with Bonferroni threshold.

Figure 3. Associations between ECG traits and sugars/lipids. Colors are proportional to the effect size of each regression model. Red colors represent a negative correlation, while green colors represent positive ones. The numbers in each cell are the p-values. HDL-C: high-density lipoprotein cholesterol; LDL-C: low-density lipoprotein cholesterol; TC: total cholesterol; TG: triglyceride. The underlined p-values are significant with Bonferroni threshold.

Figure 5. Associations between ECG traits and clinical/disease traits. Colors are proportional to the effect size of each regression model. Red colors represent a negative association, while green colors represent positive ones. The numbers in each cell are the p-values. Cases were coded as 1 and controls as 0. AF: atrial fibrillation; AR: arrhythmia; CAD: coronary artery disease; T2D: type 2 diabetes; HyperThyroid: hyperthyroidism. The underlined p-values are significant with Bonferroni threshold.

Figure 6. Summary of all Bonferroni-significant associations. Circle size is proportional to the effect size. The value of effect sizes is shown above the circles.

Table 1. Cohort characteristics and variable distributions.

Table 2. Associations between ECG traits and demographic data.

Table 3. Multivariate regression and machine-learning risk score models to predict CAD using several combinations of risk factors.