Electrocardiographic Predictors of Mortality: Data from a Primary Care Tele-Electrocardiography Cohort of Brazilian Patients

Computerized electrocardiography (ECG) has been widely used and allows linkage to electronic medical records. The present study describes the development and clinical applications of the Clinical Outcomes in Digital Electrocardiography (CODE) dataset, an electronic cohort derived from a digital ECG database obtained by the Telehealth Network of Minas Gerais, Brazil, for the period 2010–2017 and linked to mortality data from the national information system. From 2,470,424 ECGs, 1,773,689 patients were identified. A total of 1,666,778 (94%) underwent a valid ECG recording for the period 2010 to 2017, with 1,558,421 patients over 16 years old; 40.2% were men, with a mean age of 51.7 [SD 17.6] years. During a mean follow-up of 3.7 years, the mortality rate was 3.3%. The ECG abnormalities assessed were atrial fibrillation (AF), right bundle branch block (RBBB), left bundle branch block (LBBB), atrioventricular block (AVB) and ventricular pre-excitation. Most ECG abnormalities were predictors of overall mortality (AF: hazard ratio [HR] 2.10, 95% CI 2.03–2.17; RBBB: HR 1.32, 95% CI 1.27–1.36; LBBB: HR 1.69, 95% CI 1.62–1.76; first-degree AVB: relative survival [RS] 0.76, 95% CI 0.71–0.81; 2:1 AVB: RS 0.21, 95% CI 0.09–0.52; third-degree AVB: RS 0.36, 95% CI 0.26–0.49), except for ventricular pre-excitation (HR 1.41, 95% CI 0.56–3.57) and Mobitz I AVB (RS 0.65, 95% CI 0.34–1.24). In conclusion, a large ECG database established by a telehealth network can be a useful tool for facilitating new advances in the fields of digital electrocardiography, clinical cardiology and cardiovascular epidemiology.


Introduction
Cardiovascular diseases are the main cause of mortality both worldwide and in Brazil, and are responsible for 31.2% of total deaths, with an age-standardized mortality rate of 256.0 per 100,000 inhabitants [1]. The electrocardiogram (ECG) is a low-cost, easily accessible and non-invasive exam used for cardiovascular assessment, with both diagnostic and prognostic value. Epidemiological studies using the ECG began in the 1940s with the first cardiovascular cohorts [2]. However, ECG reports were very heterogeneous due to the lack of an established coding system suited to epidemiological and population-based studies [3]. The Minnesota Code [4] was created in 1960 to standardize ECG classification and enable comparison between different populations. In the following decades, many papers were published on the use of the ECG in population-based studies, showing the prognostic value of different electrocardiographic abnormalities [5][6][7][8][9][10][11].
Simultaneously, the evolution of computerized ECG and automated interpretation had a great impact on cardiovascular epidemiological studies [12,13]. Systems capable of transmitting electrocardiographic tracings over the Internet, together with software that automatically analyzes and codes tracings, have revolutionized the use of electrocardiography in population-based studies, broadening its applications and facilitating the study of large populations [14][15][16][17].
The identification of new electrocardiographic predictors of cardiovascular events is an important research objective for electronic cohorts, especially because ECG screening of the general population remains controversial [18,19] and the benefit of adding traditional ECG markers to cardiovascular risk scores for discrimination and reclassification is questionable [18,20]. New technologies such as artificial intelligence (AI) are a promising tool in this field for recognizing potential non-traditional electrocardiographic risk factors.
Although many studies on ECG abnormalities and their prognostic value have been published, their data usually come from cohorts of hundreds or thousands of patients, or from secondary care or inpatient settings, resulting in very specific populations. Datasets with over one million patients are relatively new, especially in the outpatient setting, and can provide more precise estimates of the risk associated with each ECG abnormality in the community. This information should be useful for physicians in primary care and may help support clinical decisions. Thus, the present study aims to describe the development and clinical applications of an electronic cohort, the Clinical Outcomes in Digital Electrocardiography (CODE) study [21]. This cohort of more than 1.5 million patients is derived from a digital ECG database obtained by the Telehealth Network of Minas Gerais (TNMG), Brazil [22], from 2010 to 2017, and linked to mortality data from the national information system.

Study Design
This study is based on a retrospective cohort of primary care patients from Minas Gerais, Brazil, whose ECGs were analyzed by the Telehealth Network of Minas Gerais (TNMG) cardiologists between 2010 and 2017. TNMG currently covers 817 of the 853 counties in Minas Gerais and nearly 400 in other Brazilian states. It has already acquired more than five million ECGs since its implementation [23].

Inclusion Criteria
Patients older than 16 years with 12-lead ECGs performed at TNMG between 2010 and 2017 were included in the study. For the specific analysis of ventricular pre-excitation, all age groups were included.

Exclusion Criteria
Isoelectric recordings and those with interference, reversal or poor positioning of electrodes, which compromised the analysis, were excluded (6.03%). For the analysis of electrocardiographic changes, patients who underwent more than one ECG had only the first exam analyzed; subsequent recordings were excluded (28.20%).
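The inclusion and exclusion rules above can be sketched as a small selection step. This is a minimal illustration, not the study's actual pipeline: the `EcgRecord` type, its field names and the `select_index_ecgs` helper are hypothetical, and the sketch assumes that invalid tracings are discarded before each patient's first exam is chosen, an ordering the text does not state explicitly.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EcgRecord:
    patient_id: str   # hypothetical patient identifier
    exam_date: date
    age: int          # age in years at the time of the exam
    valid: bool       # False for isoelectric, noisy or electrode-misplaced tracings


def select_index_ecgs(records, include_all_ages=False):
    """Return each patient's earliest valid ECG, ordered by exam date.

    Invalid tracings are skipped before choosing the first exam (an assumed
    ordering). Unless include_all_ages is True (as for the pre-excitation
    analysis), only patients older than 16 years at the index ECG are kept.
    """
    index = {}
    for rec in sorted(records, key=lambda r: r.exam_date):
        if rec.valid and rec.patient_id not in index:
            index[rec.patient_id] = rec
    selected = list(index.values())
    if not include_all_ages:
        selected = [r for r in selected if r.age > 16]
    return selected


records = [
    EcgRecord("A", date(2011, 3, 1), 40, True),
    EcgRecord("A", date(2012, 5, 2), 41, True),   # repeat exam: dropped
    EcgRecord("B", date(2010, 1, 5), 15, True),   # adolescent
    EcgRecord("C", date(2013, 7, 9), 60, False),  # invalid tracing: skipped
    EcgRecord("C", date(2014, 2, 1), 61, True),
]
print([r.patient_id for r in select_index_ecgs(records)])
print([r.patient_id for r in select_index_ecgs(records, include_all_ages=True)])
```

The default call keeps patients A and C; passing `include_all_ages=True`, as for the pre-excitation analysis, also keeps the 15-year-old patient B.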

Data Collection
ECGs were performed by local primary care professionals using digital electrocardiographs manufactured by Tecnologia Eletrônica Brasileira (model ECGPC; São Paulo, Brazil) or Micromed Biotecnologia (model ErgoPC 13; Brasilia, Brazil).
Clinical data (age, sex and comorbidities) were collected using a standardized questionnaire. Clinical conditions included self-reported smoking, hypertension, diabetes, dyslipidemia, Chagas disease, previous myocardial infarction and chronic obstructive pulmonary disease.
Specific software developed in-house captured the ECG tracing, uploaded it together with the patient's clinical history, and transmitted the data to the TNMG analysis center via the internet. The clinical information, ECG tracings and reports were stored in a dedicated database. All data management and transfer followed the national law on data security and protection. For the purpose of the present study, the Glasgow 12-lead ECG analysis program (license 28.4.1, approved for use on 16 June 2009) was used to automatically interpret all ECGs available in the database, exporting the diagnoses as both Glasgow and Minnesota codes.

Data Analysis: Major Electrocardiographic Abnormalities
The major electrocardiographic abnormalities included were atrial fibrillation (AF), right bundle branch block (RBBB), left bundle branch block (LBBB), first, second and third degree atrioventricular blocks (AVB) and ventricular pre-excitation [24].
ECGs were analyzed by a team of fourteen trained cardiologists using standardized criteria [24]. Each ECG was interpreted by only one cardiologist.
The ECG report was recorded as unstructured free text. To recognize ECG abnormalities among these millions of reports, a computational linguistics program was used. First, the cardiologist's text was preprocessed by removing "stop-words" (such as the, is, at, which and on) and generating n-grams, defined as contiguous sequences of n items from a given sample of text or speech. Then, we used a self-supervised learning classification model based on artificial intelligence, with a recurrent neural network as the classifier [25,26], built with a 2800-sample dictionary manually created by specialists from the text of real diagnoses. The final report of ECG abnormalities was obtained by imputing the classifier results for each abnormality. The classification model was tested on 4557 medical reports manually labeled by two cardiologists, with the following positive predictive value, sensitivity and F1 score: AF, 80.7%, 94.3% and 87.0%; RBBB, 86.1%, 95.4% and 90.9%; LBBB, 91.4%, 86.0% and 88.6%; AVB, 75.6%, 93.5% and 83.6%; and ventricular pre-excitation, 96.7%, 96.7% and 96.7% [27]. The F1 score summarizes the model's accuracy as the harmonic mean of the positive predictive value and the sensitivity.
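A minimal sketch of the two text-level operations described above follows, assuming simple word tokenization and only the five example stop-words (the authors' full stop-word list, the report language and the n-gram orders are not given). The helper names `preprocess` and `f1_score` are hypothetical, and the sketch does not reproduce the recurrent neural network classifier itself.

```python
import re

# Illustrative stop-word list taken from the examples in the text; the real
# list (and the language of the reports) is not specified here.
STOP_WORDS = {"the", "is", "at", "which", "on"}


def preprocess(report, n_max=2):
    """Lower-case, tokenize, drop stop-words and return all 1..n_max-grams."""
    tokens = [t for t in re.findall(r"\w+", report.lower()) if t not in STOP_WORDS]
    ngrams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


def f1_score(ppv, sensitivity):
    """F1 = harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


print(preprocess("Atrial fibrillation is present"))
print(round(f1_score(0.807, 0.943), 3))  # AF figures from the text
```

With the AF figures (positive predictive value 80.7%, sensitivity 94.3%) the function returns about 0.870, matching the reported F1 score of 87.0%.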
The diagnosis of an electrocardiographic abnormality was accepted without manual review when the cardiologist's report agreed with at least one of the automatic systems (Minnesota or Glasgow). ECGs in which the abnormality was reported only by the cardiologist, or only by the two automatic systems, were manually reviewed by trained staff (Figure 1). In total, 17,903 ECGs were reviewed for LBBB and RBBB, 9038 for AVB, 4343 for AF and 1090 for ventricular pre-excitation. This represents 1.3% of the approximately 2.4 million ECGs processed.
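The concordance rule can be written as a small decision function. This sketch assumes one boolean flag per source for a given abnormality; `classify` and `Decision` are hypothetical names, and treating an abnormality flagged by only one automatic system as not diagnosed is an assumption, since the text does not cover that case. The last lines recompute the reviewed share from the reported counts.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accepted without manual review"
    MANUAL_REVIEW = "sent to manual review"
    NOT_DIAGNOSED = "not diagnosed"


def classify(cardiologist, glasgow, minnesota):
    """Apply the concordance rule for one abnormality on one ECG.

    Each argument is True if that source reported the abnormality.
    """
    if cardiologist and (glasgow or minnesota):
        return Decision.ACCEPT           # cardiologist agrees with >= 1 automatic system
    if cardiologist or (glasgow and minnesota):
        return Decision.MANUAL_REVIEW    # cardiologist alone, or both automatic systems alone
    return Decision.NOT_DIAGNOSED        # assumption: one automatic system alone is ignored


# Share of ECGs reviewed manually, using the counts reported in the text.
reviewed = 17_903 + 9_038 + 4_343 + 1_090
total_ecgs = 2_470_424
print(reviewed, round(100 * reviewed / total_ecgs, 1))
```

This gives 32,374 reviewed tracings, about 1.3% of the 2,470,424 ECGs, consistent with the more than 30,000 manual reviews mentioned among the limitations.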
Hearts 2021, 2, FOR PEER REVIEW 4
Figure 1. Diagram for ECG abnormality diagnosis. Concordance between the cardiologist's report and one of the automatic systems (Glasgow or Minnesota) was required for a diagnosis to be accepted without manual revision.

Probabilistic Linkage
The electronic cohort was obtained by linking data from the ECG exams (name, sex, date of birth and city of residence) with data from the national mortality information system, using standard probabilistic linkage methods (FRIL: Fine-grained record linkage software, v.2.1.5, Atlanta, GA, USA) [21,28].
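A common approach to probabilistic linkage is the Fellegi-Sunter model, which sums per-field log-likelihood-ratio weights for each candidate record pair. The sketch below illustrates that general technique on the four linkage fields named above; it is not FRIL's implementation or configuration. The m- and u-probabilities, the 0.9 name-similarity threshold, the 0-100 rescaling and the example records are all hypothetical; only the fields and the cut-off of 94 (mentioned among the limitations) come from the text.

```python
import math
from difflib import SequenceMatcher

# Hypothetical m- and u-probabilities for each linkage field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
# In practice these are estimated from the data; the values here are illustrative.
FIELDS = {
    "name": (0.95, 0.001),
    "sex": (0.98, 0.5),
    "birth_date": (0.97, 0.0001),
    "city": (0.90, 0.05),
}


def agrees(field, a, b):
    """Fuzzy comparison for names, exact comparison for the other fields."""
    if field == "name":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= 0.9
    return a == b


def match_weight(rec_a, rec_b):
    """Fellegi-Sunter log2 likelihood-ratio weight summed over fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agrees(field, rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def normalized_score(rec_a, rec_b):
    """Rescale the weight to 0-100 between 'all fields disagree' and 'all agree'."""
    lowest = sum(math.log2((1 - m) / (1 - u)) for m, u in FIELDS.values())
    highest = sum(math.log2(m / u) for m, u in FIELDS.values())
    return 100 * (match_weight(rec_a, rec_b) - lowest) / (highest - lowest)


# Illustrative records (not real data).
ecg = {"name": "Maria da Silva Souza", "sex": "F", "birth_date": "1950-04-12", "city": "Belo Horizonte"}
death = {"name": "Maria da Silva Sousa", "sex": "F", "birth_date": "1950-04-12", "city": "Belo Horizonte"}
other = dict(death, birth_date="1951-04-12")

CUTOFF = 94  # cut-off reported in the text, applied here to this illustrative scale
print(normalized_score(ecg, death) >= CUTOFF)   # spelling variant of the name still links
print(normalized_score(ecg, other) >= CUTOFF)   # birth date disagrees: below the cut-off
```

A fuzzy name comparison tolerates spelling variants that are common across registries, while the large weight on date of birth makes a birth-date disagreement push the pair well below a high cut-off.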

Statistical Analysis
Qualitative variables were described by frequency distribution. Data obtained from continuous quantitative variables were expressed as mean and standard deviation or median with interquartile range.
For the analysis of the electrocardiographic abnormalities, the time elapsed between the date of the electrocardiogram (index event) and the event of interest (date of death) was the dependent variable. The presence of the electrocardiographic abnormality and the clinical characteristics of the population were the independent variables. The comparison group comprised patients without major electrocardiographic abnormalities, including both those with a normal ECG and those with any other abnormality. Patients who did not experience the event of interest by the end of follow-up were censored and contributed follow-up time until the study end date (September 2017).
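The time-to-event outcome and censoring rule can be made concrete with a small helper. This is a minimal sketch: `follow_up` is a hypothetical name, and because the text gives only "September 2017" as the end of follow-up, the last day of that month is assumed.

```python
from datetime import date

# Assumption: the text gives only "September 2017" as the end of follow-up,
# so the last day of that month is used here for illustration.
STUDY_END = date(2017, 9, 30)


def follow_up(ecg_date, death_date=None, study_end=STUDY_END):
    """Return (follow-up time in days, event indicator) for one patient.

    Follow-up starts at the index ECG. Deaths on or before the study end are
    events (1); everyone else is censored (0) at the study end date.
    """
    if death_date is not None and death_date <= study_end:
        return (death_date - ecg_date).days, 1
    return (study_end - ecg_date).days, 0


print(follow_up(date(2017, 9, 1), date(2017, 9, 11)))  # (10, 1)
print(follow_up(date(2017, 9, 1)))                      # (29, 0)
```

Each patient thus contributes a (time, event) pair, which is the input format expected by the Kaplan-Meier and regression analyses.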
The non-parametric Kaplan-Meier method was used to estimate survival, and survival curves were compared with the log-rank test; p values less than 0.05 were considered statistically significant. Multivariable Cox proportional hazards regression was used for all analyses except for AVB, for which a log-normal model was used because the assumptions of the Cox model were not met. Hazard ratios (HR) with 95% confidence intervals were reported for the ECG abnormalities, except in the AVB survival analysis, in which relative survival (RS) was used. An RS below 1 indicates lower survival, and an RS above 1 indicates higher survival. Analyses were adjusted for age, sex and comorbidities. The R statistical program (version 3.4.3, Vienna, Austria) was used for all analyses.
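For readers unfamiliar with the estimator, a minimal pure-Python sketch of the Kaplan-Meier product-limit estimate and the two-group log-rank test follows. It is illustrative only: the study used R, the Cox and log-normal models are not shown, the function names are hypothetical, censored subjects are treated as at risk at their censoring time, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, S(time)) at each distinct event time.

    Subjects censored at time t are still counted as at risk at t.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # deaths + censorings at each time
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    n = [len(times_a), len(times_b)]    # numbers at risk per group
    o_minus_e, var = 0.0, 0.0
    for t in sorted({row[0] for row in data}):
        d = [sum(1 for tt, e, g in data if tt == t and e and g == k) for k in (0, 1)]
        c = [sum(1 for tt, _, g in data if tt == t and g == k) for k in (0, 1)]
        n_tot, d_tot = n[0] + n[1], d[0] + d[1]
        if d_tot:
            o_minus_e += d[0] - d_tot * n[0] / n_tot
            if n_tot > 1:
                var += d_tot * n[0] * n[1] * (n_tot - d_tot) / (n_tot ** 2 * (n_tot - 1))
        n = [n[0] - c[0], n[1] - c[1]]  # remove deaths and censorings at t
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])   # [(1, 0.8), (2, 0.6), (3, 0.3)]

chi2, p = log_rank(list(range(1, 11)), [1] * 10, list(range(11, 21)), [0] * 10)
print(p < 0.05)   # early deaths in group A versus none in group B
```

The nested scans in `log_rank` are quadratic in sample size, which is acceptable for illustration; a cohort of 1.5 million patients calls for a dedicated survival-analysis library.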

CODE Cohort
From 2,470,424 ECGs, 1,773,689 patients were identified. A total of 1,666,778 (94%) underwent a valid ECG recording from 2010 to 2017, with 1,558,421 patients over 16 years old. Most patients were women (60.8%), and the mean age was 51.6 (SD ±17.6) years. The overall mortality rate was 3.31% over a mean follow-up of 3.7 years. The clinical conditions of all adult patients and the prevalence of the studied abnormalities are described in Table 1.
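The exclusion percentages given under Exclusion Criteria can be reproduced from these counts, assuming that the 6.03% is expressed relative to identified patients and the 28.20% relative to all ECGs (the denominators are not stated explicitly). The `pct` helper is a hypothetical name used only for this check.

```python
# Counts reported in the text.
TOTAL_ECGS = 2_470_424
PATIENTS = 1_773_689
VALID = 1_666_778


def pct(part, whole):
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


repeat_share = pct(TOTAL_ECGS - PATIENTS, TOTAL_ECGS)   # repeat ECGs of the same patient
invalid_share = pct(PATIENTS - VALID, PATIENTS)         # patients without a valid tracing
valid_share = pct(VALID, PATIENTS)
print(repeat_share, invalid_share, valid_share)         # 28.2 6.03 93.97
```

The 93.97% valid share rounds to the 94% reported above, and the two exclusion shares match the 28.20% and 6.03% figures.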

Survival Analysis: ECG Abnormalities
All ECG abnormalities, with the exception of ventricular pre-excitation and Mobitz I second-degree AVB, were associated with higher all-cause mortality. Patients with AF and LBBB were also at higher risk of cardiovascular mortality (Table 2, Figure 2).

Discussion
The resulting dataset has several potential applications, for both technical and clinical-epidemiological studies. Previous studies from our group showed that ECG abnormalities that are considered important, such as pre-excitation syndrome, have no prognostic impact in a community setting. On the other hand, the risk of death for a person with RBBB is almost as high as for one with LBBB, although the latter is considered a much stronger marker of risk in general cardiology practice [28][29][30]. Patients with AF were at higher risk of mortality than those with the other abnormalities. First-degree AVB was a more severe ECG abnormality than Mobitz I, which had a benign prognosis in this population. A 2:1 AVB on the 12-lead ECG was associated with a 79% reduction in relative survival, probably indicating an infranodal block.
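If, as assumed here, RS denotes the exponentiated AVB coefficient of the log-normal (accelerated failure time) model described in the Statistical Analysis, it can be read as a ratio of median survival times. A sketch of that interpretation, with $x$ the AVB indicator, $\mathbf{z}$ the adjustment covariates (age, sex and comorbidities) and $\sigma$ a scale parameter:

$$
\log T = \beta_0 + \beta_1 x + \boldsymbol{\gamma}^{\top}\mathbf{z} + \sigma\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, 1)
$$

$$
\mathrm{RS} = \frac{\operatorname{median}(T \mid x = 1, \mathbf{z})}{\operatorname{median}(T \mid x = 0, \mathbf{z})} = e^{\beta_1}, \qquad 1 - 0.21 = 0.79
$$

Under this reading, the RS of 0.21 for 2:1 AVB corresponds to the 79% reduction stated above.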
According to the World Health Organization, primary health care is an integral part of a country's health system, with a main focus on the social and economic development of the community [31]. Its essence is to treat people, not specific diseases and conditions. Actions related to health promotion and both primary and secondary prevention of cardiovascular diseases are necessary to improve collective health. In this context, the search for new features that are capable of predicting individual cardiovascular risk and, therefore, stimulating development of cost-effective preventive actions, is a matter of great importance.
Several tests, such as the coronary artery calcium score, carotid and vertebral Doppler ultrasound, and serum high-sensitivity C-reactive protein, have already been recommended for re-stratification of cardiovascular risk [32], although their cost-effectiveness is questionable [32], especially in the context of public health. On the other hand, an inexpensive and widely available exam such as the ECG can diagnose abnormalities such as AF, RBBB, LBBB and AVB that imply a higher risk of mortality regardless of age, sex or previous comorbidities.
Stratification of cardiovascular risk by ECG could be a potentially useful tool for clinical practice, especially in primary health care. Identifying the patient who will benefit most from tighter control of blood pressure, diabetes, and cholesterol levels may prevent cardiac events in the future. Electrocardiographic abnormalities draw attention to the potential severity of the patient's condition and the importance of more intensive treatment. In addition, they may help to rationalize and prioritize referrals to secondary or tertiary referral centers.
Electronic cohorts with large amounts of data are powerful sources for population-based studies and can therefore provide stronger evidence for use in healthcare. Information on ECG parameters or abnormalities from big datasets [33] may have a major impact by distinguishing between benign and potentially life-threatening cardiac conditions. Each population has specific features, such as social, racial and lifestyle characteristics, that have an impact on its health [34]. Chagas disease, for example, is prevalent in Brazil and is associated with major ECG abnormalities [35], while it is very rare in the United States and Europe.
AI in healthcare is the future pathway to managing big data from electronic cohorts. The development of machine learning (ML) models for disease prediction and diagnosis is growing exponentially. In electrocardiography, AI algorithms have been extensively studied both for the automatic diagnosis of electrocardiographic alterations [36] and for the prediction of cardiovascular events and identification of new cardiovascular risk factors [37]. Estimation of age and sex from the electrocardiographic tracing alone has also been demonstrated [38]. Furthermore, analysis of the 12-lead ECG alone can predict one-year mortality with good accuracy, even in tracings reported as normal [39]. AI can extract information from the electrocardiogram that is undervalued or unrecognized by conventional methods of analysis, adding diagnostic and prognostic value.
The CODE study is now also working with ML techniques. We found good performance of a deep neural network in the recognition of six ECG abnormalities [36]. In the field of prognosis and health promotion, the concept of an electrocardiographic age via AI, compared with the patient's biological age, is promising [40]. This new promising cardiac biomarker can summarize the individual electrocardiographic characteristics simply and intuitively. It has the potential to provide patients with accessible and understandable information about their cardiovascular risk. More of our results will soon be available and should highlight the importance of ECG epidemiological studies with both traditional and AI methods.
Our study has limitations. Data on comorbidities were self-reported and thus might have been under-reported. The clinical data came from a predetermined questionnaire not tailored for this study. Therefore, some important variables affecting cardiovascular prognosis, such as heart failure, were unavailable and were not included as comorbidities in the multivariate analysis. The AI classifier used for ECG report classification had good accuracy, sensitivity and positive predictive value, but it can make errors. To minimize this problem, we included the automatic Glasgow and Minnesota classifications in the diagnostic algorithm. Furthermore, more than 30,000 ECGs were manually reviewed to confirm the presence of the ECG abnormality. The quality of data from the national mortality information system varies by region within the state of Minas Gerais, so misclassification of the underlying cause of death can occur. Probabilistic linkage also has limitations, such as imperfect sensitivity and the possibility of false pairs. We defined a high cut-off point (94 of 100) for true pairs and manually reviewed doubtful cases.

Conclusions
Electrocardiographic markers are predictors of mortality in the TNMG population. AF, LBBB, RBBB and AVB are associated with a higher risk of death from all causes, regardless of age, sex and associated comorbidities. AF and LBBB are independent predictors of higher cardiovascular mortality. Ventricular pre-excitation and Mobitz I second-degree AVB are not associated with higher overall mortality. An electronic cohort with a large amount of ECG data can be a useful prognostic tool and provide a stimulus for future developments in the fields of digital electrocardiography, clinical cardiology and cardiovascular epidemiology.

Institutional Review Board Statement:
This study complied with all relevant ethical regulations. The CODE Study was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais, protocol 49368496317.7.0000.5149. Since this is a secondary analysis of anonymized data stored in the TNMG, informed consent was not required by the Research Ethics Committee for the present study. All researchers who deal with datasets signed terms of confidentiality and data utilization.

Informed Consent Statement: Not applicable.
Data Availability Statement: Researchers affiliated with educational or research institutions can make requests to access the datasets. Requests should be made to the corresponding author of this paper. They will be forwarded and considered on an individual basis by the Telehealth Network of Minas Gerais. The estimated time needed for data access requests to be evaluated is three months. If approved, any data use will be restricted to non-commercial research purposes. The data will only be made available on the execution of appropriate data use agreements.