Deep Learning Algorithms for Estimation of Demographic and Anthropometric Features from Electrocardiograms

The electrocardiogram (ECG) is known to be affected by demographic and anthropometric factors. This study aimed to develop deep learning models to predict a subject's age, sex, ABO blood type, and body mass index (BMI) from ECGs. This retrospective study included individuals aged 18 years or older who visited a tertiary referral center and had ECGs acquired from October 2010 to February 2020. Using convolutional neural networks (CNNs) with three convolutional layers, a convolution kernel size of five, and a pooling size of two, we developed both classification and regression models. Classification models were developed and validated for age (<40 years vs. ≥40 years), sex (male vs. female), BMI (<25 kg/m² vs. ≥25 kg/m²), and ABO blood type. Regression models were also developed and validated for age and BMI estimation. A total of 124,415 ECGs (1 ECG per subject) were included. The dataset was divided into training, validation, and test sets at a ratio of 4:3:3. In the classification tasks, the area under the receiver operating characteristic curve (AUROC), which summarizes discrimination across all decision thresholds, was used as the primary outcome. In the regression tasks, the mean absolute error (MAE), which is the mean absolute difference between the observed and estimated values, was used. For age estimation, the CNN achieved an AUROC of 0.923 with an accuracy of 82.97%, and a MAE of 8.410 years. For sex estimation, the AUROC was 0.947 with an accuracy of 86.82%. For BMI estimation, the AUROC was 0.765 with an accuracy of 69.89%, and the MAE was 2.332 kg/m². For ABO blood type estimation, the CNN showed an inferior performance, with a top-1 accuracy of 31.98% (95% CI, 31.98–31.98%).
Our model could be adapted to estimate individuals’ demographic and anthropometric features from their ECGs; this would enable the development of physiologic biomarkers that can better reflect their health status than chronological age.


Introduction
The electrocardiogram (ECG) is a noninvasive diagnostic test that records the electrical activity of the heart over time. It has become the standard tool for diagnosing arrhythmia and ischemic heart disease. Furthermore, ECG analysis can aid in the detection of electrolyte abnormalities such as hyperkalemia [1]. ECGs can be collected at home [2,3], from portable devices [4,5], and in hospitals, which expands their applicability to various situations.
With recent developments in technology and computing power, deep learning algorithms are being used in medicine for disease diagnosis and prognostic stratification [6]. Deep learning algorithms are a type of artificial intelligence that can recognize complex patterns and features in large datasets during the learning process. Because of this capability, these algorithms can be applied to various fields to facilitate data analysis and decision-making processes. They are primarily applied in imaging modalities, such as magnetic resonance imaging, computed tomography, plain radiography, ultrasonography [7], pathology imaging [8], and clinical imaging of dermatologic disorders. However, deep learning can be applied not only to high-dimensional images but also to low-dimensional data. It is also utilized in the analysis of biosignals that measure microcurrents within the body, such as ECGs [9,10], electroencephalograms (EEGs), and electromyograms (EMGs). EEGs and EMGs provide important information about the activity of the brain and muscles, respectively. The application of deep learning algorithms to ECGs has been extensively attempted in the past few years [11,12]. Most studies have focused on processing ECG signals, such as feature extraction or noise reduction, or on classifying abnormalities or types of single heartbeats [13][14][15]. Recently, beyond these analyses, various attempts have been made to determine whether deep learning algorithms can predict diseases using ECGs [16][17][18].
Such an approach enables the utilization of various ECG features that have not previously been identified through the human eye or the traditional rule-based approach for interpreting ECGs, leading to more accurate predictions in a broader array of diseases [19]. We postulated that being able to predict subjects' demographic and anthropometric features from their ECGs would enable the development of physiologic biomarkers that can better reflect their health statuses than chronological age [20], as well as reduce various types of human errors that can arise while entering subject information during ECG acquisition [21,22]. Some studies have reported significant differences in QRS duration depending on age [23]. QT interval characteristics, ST segment, and others can contribute to gender identification [24]. A pathological basis can be seen to characterize an individual through ECG changes. Previous studies have utilized artificial intelligence to predict demographic characteristics such as age and sex based on ECG data [25]. While there have been studies that have identified an association between BMI and ECG components, none have employed regression and classification techniques [26,27]. To our knowledge, there are currently no studies on predicting the ABO blood type using ECG data. In light of these gaps in the literature, we designed an artificial intelligence prediction model to address these issues and provide a novel approach to ECG-based prediction. Our study aims to provide new insights and references for predicting demographic and anthropometric features using ECG data.
This study aimed to develop and validate a deep learning model that uses ECGs to predict various demographic and anthropometric features, namely age, sex, ABO blood type, and body mass index (BMI). To achieve this goal, the study employed ECG big data from Yonsei University Wonju Severance Christian Hospital and deep learning algorithms. The findings of this study confirm our assumptions and demonstrate the potential for evaluating individuals in greater detail, which can be a valuable approach in the medical field because it takes an individual's overall health status into account. Additionally, precise predictions of demographic and anthropometric features from an ECG analysis could help prevent long-term or potential heart disease and reduce errors.

Dataset
For the age, sex, and ABO blood type prediction models, subjects with at least one previous instance of ABO blood typing were included in the dataset (Figure 1). When multiple ECGs were available for a single subject, only the initial ECG was used to train and validate the models. Exceptional cases, such as changes in sex or blood type due to hematopoietic stem cell transplantation during the study period, were not considered for data curation. Unlike the other variables, the BMI was only available for some subjects, as the information was only collected when performing cardiac ultrasonography. Therefore, in most cases, the BMI and ECG measurements were within one month (31 days). When multiple ECGs were available for a single subject, the ECG closest to the BMI measurement was used to train and validate the BMI prediction model.
For all tasks, the dataset was partitioned into a training set (40%), validation set (30%), and test set (30%). Partitioning was performed so that the class balance was kept consistent for the respective task of each deep learning model. Partitioning was performed on an individual subject basis, and there was no overlap between the sets. The training and validation sets were used to determine and tune the parameters while training the model. Each model with the highest performance in the validation set was evaluated for its final performance using a test set.
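The partitioning described above can be sketched as a stratified, subject-level split. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the random seed, and the toy subject IDs are all hypothetical, and the study's actual splitting procedure is not specified beyond the 40/30/30 ratio, the preserved class balance, and the absence of subject overlap.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.4, 0.3, 0.3), seed=0):
    """Split subject IDs into train/validation/test sets per class.

    labels: dict mapping subject ID -> class label (one ECG per subject),
    so splitting by subject ID guarantees no subject appears in two sets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for subject_id, label in labels.items():
        by_class[label].append(subject_id)

    train, val, test = [], [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy example: 1000 hypothetical subjects, 30% in the positive class.
labels = {f"S{i:04d}": int(i % 10 < 3) for i in range(1000)}
train, val, test = stratified_split(labels)
```

Because each class is split separately, the positive-class fraction stays at 30% in all three sets of the toy example.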

Classification and Regression Task
The following four variables were to be predicted by the models (Figures 2 and 3): age, sex, ABO blood type, and BMI. Two independent types of model were used: classification and regression. Binary classification was conducted to verify that a model could separate subjects into groups according to each variable, and the regression models were used to predict the values directly. For the binary classification of age, a cut-off of 40 years, a value that generally divides young and older subjects, was used; for the binary classification of BMI, a cut-off of 25 kg/m², a criterion for obesity, was used. The age prediction model comprised two models: a classification model that classified the subjects into those aged below 40 years and those aged 40 years or above, and a regression model that directly predicted the subjects' ages. The sex prediction model was a classification model that classified the subjects into male and female. The ABO blood type prediction model was a multiclass classification model that classified the subjects into A, B, AB, and O blood types. The BMI prediction model also comprised two models: a classification model that classified the subjects into those with BMIs below 25 kg/m² and those with BMIs of 25 kg/m² or above according to the Korean diagnostic criteria for obesity [30], and a regression model that directly predicted the subjects' BMI.
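The target definitions above can be written down as a few small labeling functions. This is a sketch under stated assumptions: the function names and the order of the ABO class indices are hypothetical, and only the cut-offs (40 years, 25 kg/m²) and the four blood types come from the text.

```python
# Class order for the multiclass ABO task (the index order is an assumption).
ABO_CLASSES = ("A", "B", "AB", "O")


def age_label(age_years):
    """Binary age class: 0 for <40 years, 1 for >=40 years."""
    return int(age_years >= 40)


def bmi_label(bmi):
    """Binary BMI class: 0 for <25 kg/m^2, 1 for >=25 kg/m^2."""
    return int(bmi >= 25.0)


def abo_label(blood_type):
    """Multiclass ABO label as an index into ABO_CLASSES."""
    return ABO_CLASSES.index(blood_type)


# The regression targets are simply the raw age (years) and BMI (kg/m^2).
example = {"age": 40, "bmi": 24.9, "abo": "AB"}
targets = (age_label(example["age"]), bmi_label(example["bmi"]), abo_label(example["abo"]))
```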

Deep Learning Method
The deep learning model was based on the convolutional neural network (CNN) architecture widely used in imaging, with the layers modified so that ECGs could be learned. Because the ECG used as the input was a one-dimensional (1D) signal, it carried relatively little pattern information compared with an image. Therefore, the model was designed to be simpler. The convolutional and pooling layers of the model were used to extract features from the ECG data. The fully connected layer was responsible for generating the final feature by calculating the representative values for each dimension at the end of the process. The model was constructed using three primary blocks and a fully connected layer. Each primary block contained four layers: a 1D convolution layer (with a kernel size of 5), a batch normalization layer, a ReLU activation function, and a 1D max pooling layer. The 12-lead models took 12 (leads) × 10 (seconds) × 500 (Hz) ECG data as input and extracted features through the primary blocks. In contrast, the 1-lead models took lead II (excerpted from the standard 12-lead supine ECGs) as a single input lead × 10 (seconds) × 500 (Hz). The extracted features were then processed through the pooling and flatten layers to form a feature map of size 192 × 1. The features then passed through a fully connected part consisting of three layers with 192, 64, and 32 nodes, respectively. There were two output nodes in the classification models and one output node in the regression models. A softmax layer, which outputs the probability corresponding to each class, was used as the final output layer of the classification models. Accordingly, of the two nodes, the second node, corresponding to the positive class, was used as the final output.
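To make the block structure concrete, the following sketch traces how the temporal length of a 10-second, 500 Hz input shrinks across the three primary blocks. The stride of 1, the absence of padding, and a pooling stride equal to the pooling kernel size (2) are assumptions, since the text does not state them; channel counts are not modeled, and the helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution (standard formula)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def maxpool1d_out_len(length, kernel_size):
    """Output length of 1D max pooling with stride == kernel_size, no padding."""
    return length // kernel_size


length = 10 * 500  # 10 seconds sampled at 500 Hz
trace = [length]
for _ in range(3):  # three primary blocks
    length = conv1d_out_len(length, kernel_size=5)   # conv (batch norm and ReLU keep the length)
    length = maxpool1d_out_len(length, kernel_size=2)
    trace.append(length)

print(trace)
```

Under these assumptions the signal is still several hundred samples long after the last block, which is consistent with a further pooling step being needed before flattening to the 192 × 1 feature map described in the text.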
As the final output, the classification models provided the probabilities of the input belonging to each class, and the regression models provided the directly predicted values. The cross-entropy and mean absolute error (MAE) loss were respectively used as the loss function for each type of model.
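The two output types and their loss functions can be illustrated with plain Python. This is a didactic sketch rather than the training code: the logits and ages below are made-up numbers, and in practice the framework's built-in losses would be used.

```python
import math


def softmax(logits):
    """Convert raw output-node values into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample with integer class label `target`."""
    return -math.log(softmax(logits)[target])


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted)) / len(observed)


# Classification: two output nodes; the second node is the positive class.
probs = softmax([0.2, 1.4])
positive_score = probs[1]
loss = cross_entropy([0.2, 1.4], target=1)

# Regression: one output node, e.g., predicted ages in years (toy values).
age_mae = mae([34, 61, 47], [40, 55, 50])
```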

Scaling a Model
The hyperparameters were determined through repeated experiments and a grid search. In particular, the batch size, learning rate, and kernel size of the max pooling layer were considered as candidate parameters. Batch sizes from 64 to 512 and learning rates between 0.0001 and 0.1 were considered. Within the deep learning model, the kernel size of the convolution layer was fixed, together with the parameters determined through the grid search. Subsequently, max pooling kernel sizes between 2 and 5 were considered, and the configuration with the highest AUROC on the validation set was selected as optimal.
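A grid search of this kind can be sketched as follows. The specific grid values inside the stated ranges and the scoring function `toy_validation_auroc` are hypothetical stand-ins: the stand-in does not reproduce any result of the study and only takes the place of "train a model and measure its validation AUROC".

```python
import math
from itertools import product


def grid_search(evaluate, grid):
    """Return the configuration with the highest score over all grid combinations."""
    names = list(grid)
    best_score, best_config = float("-inf"), None
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score


# Candidate values within the ranges given in the text (exact grid is assumed).
grid = {
    "batch_size": [64, 128, 256, 512],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "pool_kernel_size": [2, 3, 4, 5],
}


def toy_validation_auroc(config):
    """Placeholder for training a model and returning its validation AUROC."""
    return (0.9
            - 0.01 * abs(math.log10(config["learning_rate"]) + 3)
            - 0.005 * (config["pool_kernel_size"] - 2)
            - 0.00001 * (512 - config["batch_size"]))


best_config, best_score = grid_search(toy_validation_auroc, grid)
```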

Statistical Metrics
The performance of the classification models was evaluated with the area under the receiver operating characteristic curve (AUROC). By comparing the predictions of the deep learning model with the actual labels, each ECG was classified as a true positive (TP), false negative (FN), true negative (TN), or false positive (FP), and these counts were used to calculate the metrics. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were measured at the optimal cut-off on the receiver operating characteristic (ROC) curve, defined as the point that maximized the Youden index (J). The metrics were calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{PPV} = \frac{TP}{TP + FP}, \quad \mathrm{NPV} = \frac{TN}{TN + FN}$$
$$J = \max_{C}\left[\mathrm{Sensitivity}(C) + \mathrm{Specificity}(C) - 1\right]$$
where C is the cut-off point in the ROC curve.
The MAE, Pearson correlation coefficient (Pearson R), and intraclass correlation coefficient (ICC) were measured for the regression models to determine the difference between the observed and estimated values. The unit of the MAE was years in the age task and kg/m² in the BMI task. Potential bias was evaluated using a Bland-Altman plot, in which the x-axis was the mean of the observed and predicted values, and the y-axis was the difference between them. The metrics were calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$R = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}}$$
where $y_i$ is the observed value for the ith ECG, $\hat{y}_i$ is the estimated value, $\bar{y}$ and $\bar{\hat{y}}$ are their respective means, and n is the number of ECGs.
Because our dataset consisted of standard 12-lead supine ECGs, we set the performance of the 12-lead ECG-based models as the main outcome of our study. However, the performance of the 1-lead models was also reported to investigate the impact of the multilead information. All outcomes were measured for the validation and test datasets. All statistics were reported as point estimates with 95% confidence intervals (CIs). Data were analyzed and visualized using Python 3.8.5 (Python Software Foundation).
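The cut-off selection by the Youden index and the derived classification metrics can be sketched in a few lines. The scores and labels are toy values, the function names are hypothetical, and treating a score at or above the cut-off as positive is an assumed convention.

```python
def confusion(scores, labels, cutoff):
    """Count TP, FP, FN, TN when scores >= cutoff are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    return tp, fp, fn, tn


def youden_cutoff(scores, labels):
    """Return the cut-off C maximizing J = sensitivity + specificity - 1.

    Requires both classes to be present in `labels`.
    """
    best_j, best_c = -1.0, None
    for c in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, labels, c)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:  # ties keep the lowest cut-off
            best_j, best_c = j, c
    return best_c, best_j


def summary(scores, labels, cutoff):
    """Metrics at a fixed cut-off (assumes no denominator is zero)."""
    tp, fp, fn, tn = confusion(scores, labels, cutoff)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy positive-class probabilities and true labels.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(scores, labels)
metrics = summary(scores, labels, cutoff)
```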

Visualization for Model Explanation
Gradient-weighted class activation mapping (Grad-CAM) [31] is a visual explanation method for interpreting the decisions of a trained CNN model. Each channel of the input 12-lead ECG was converted into an image by plotting it on a two-dimensional plane, and all channels were then combined into a single image for the Grad-CAM. Because our model was a 1D CNN, the extracted heat map was a one-dimensional array. Therefore, the heat map was resized to the image size through interpolation and then combined with the previously created image to complete the Grad-CAM. Regions of the heat map shown in red to yellow had a strong impact on the CNN's decision, whereas regions closer to blue had less impact. In this study, we analyzed the model using the Grad-CAM method for three classification tasks.
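The resizing step for a one-dimensional heat map can be sketched as below. Linear interpolation with aligned endpoints and min-max normalization are assumptions, as the text only states that interpolation was used; the coarse activation values are made up, and the function names are hypothetical.

```python
def resize_linear(values, new_length):
    """Linearly interpolate a 1D array to `new_length` samples (endpoints aligned)."""
    if new_length == 1 or len(values) == 1:
        return [values[0]] * new_length
    scale = (len(values) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * scale
        left = min(int(pos), len(values) - 2)
        frac = pos - left
        out.append(values[left] * (1 - frac) + values[left + 1] * frac)
    return out


def normalize(values):
    """Scale values to [0, 1] so they can be mapped to a blue-to-red color scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


coarse_cam = [0.0, 2.0, 1.0]  # hypothetical 1D Grad-CAM output
heat = normalize(resize_linear(coarse_cam, 5))
```

In practice the target length would be the width of the plotted ECG image, so that each heat value lines up with a time position on the trace.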

Data and Code Availability
The data used in this study cannot be disclosed without the permission of the Ethics Committee (irb@yonsei.ac.kr, 82-033-741-1715). The data contain potentially sensitive information such as the subject's date of birth and gender. Thus, the data are not publicly available because of privacy or ethical restrictions. All code used for model development, analysis related to the current submission, and future updates will be available in the following repository: https://github.com/RyuJiSSSS/ECG-Classification-for-YMJ-task.git (accessed on 6 July 2022).

Study Population
A total of 124,415 subjects were included throughout the study period (Figure 1). The age range of the subjects included in our study was 18 to 108 years, with a median age of 55 years. Among all subjects, 60,835 (48.90%) were female, and the mean age was 55.2 (SD, 17.3) years. The ECGs of 124,415 subjects were utilized in the training and validation of the age, sex, and ABO blood type prediction models (Figure 3 and Table 1). Among the ECGs, 49,762 (40%) were used as the training set, 37,324 (30%) as the validation set, and the remaining 37,329 (30%) as the test set. Meanwhile, the ECGs of 48,488 subjects were used in the training and validation of the BMI prediction model. These subjects had a mean BMI of 24.51 kg/m² (SD, 14.17 kg/m²), with 20,402 (42.07%) having a high BMI (BMI over 25 kg/m²). Among the ECGs, 19,393 (40%), 14,546 (30%), and 14,549 (30%) were used as the training, validation, and test sets, respectively.

Evaluation Protocol
When experiments were conducted on ECGs with signal processing such as normalization or noise reduction, there was no significant difference between the results of each model. Therefore, raw ECGs were used for all tasks in this study. The optimal batch size was 512, and the initial learning rate was 0.001. The max pooling kernel size of the deep learning model was set to 2. However, the difference in performance was minimal across the hyperparameter settings. The model was trained for 50 epochs using the Adam optimizer. During the learning process, the classification model was stored as the best-performing model when the AUROC for the validation set was the highest, and the regression model was stored when the MAE for the validation set was the lowest. Subsequently, the stored best model was loaded, and final verification was performed using the test set. PyTorch version 1.10 was used as the deep learning framework.
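The model selection rule described above amounts to keeping the epoch with the best validation metric, where "best" means highest for AUROC and lowest for MAE. A minimal sketch follows; the per-epoch values are hypothetical, and in a real training loop the model weights would be saved whenever a new best epoch is found.

```python
def select_best_epoch(history, mode):
    """Return the index of the best epoch in a list of validation metrics.

    mode="max" for metrics where higher is better (e.g., AUROC),
    mode="min" for metrics where lower is better (e.g., MAE).
    Ties keep the earliest epoch.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    best_epoch = 0
    for epoch, value in enumerate(history):
        if mode == "max":
            better = value > history[best_epoch]
        else:
            better = value < history[best_epoch]
        if better:
            best_epoch = epoch
    return best_epoch


# Hypothetical validation curves over four epochs.
val_auroc = [0.81, 0.88, 0.91, 0.90]
val_mae = [9.7, 8.9, 8.6, 8.8]

best_cls_epoch = select_best_epoch(val_auroc, mode="max")
best_reg_epoch = select_best_epoch(val_mae, mode="min")
```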

Age Estimation
A performance summary of the classification model for age (<40 years vs. ≥40 years) is provided in Table 2 (Table 3). A performance summary of the regression model for the direct prediction of age is provided in Table 2 and Figure 4b,

Sex Estimation
A performance summary of the classification model for sex is provided in Table 2 and

ABO Blood Type Estimation
A performance summary of the classification model for ABO blood type is provided in Table 2, showing poor predictive power overall. The experiments were not repeated for the 1-lead models since even the 12-lead main model did not reveal any predictive power.

BMI Estimation
A performance summary of the classification model for BMI (<25 kg/m² vs. ≥25 kg/m²) is provided in Table 2.
Figure 8 shows the Grad-CAM for the age, sex, and BMI classification models. Grad-CAMs toward the old age class for ECGs acquired from young (Figure 8a) and old individuals (Figure 8b) are presented. Activation of the P wave, PR interval, and T wave was prominent only in the ECGs from old individuals. In Grad-CAMs toward the female class for ECGs acquired from male (Figure 8c) and female individuals (Figure 8d), activation of the T wave was prominent only in ECGs from female individuals. In Grad-CAMs toward the high BMI class for ECGs acquired from low BMI (Figure 8e) and high BMI individuals (Figure 8f), diffuse activation across diverse segments was present only in ECGs from high BMI individuals, whereas activation was limited to the QRS complex in ECGs from low BMI individuals.

Discussion
Our study confirmed that the analysis of ECGs using deep learning algorithms could predict anthropometric information, such as BMI, as well as demographic information, such as age and sex, with high accuracy. However, it showed that ECGs were not as useful in predicting ABO blood types. In addition, our study showed that the deep learning algorithm could make reliable predictions from 1-lead ECGs, although the accuracy was significantly lower than that obtained with 12-lead ECGs.
Many previous studies have reported that age and sex affect ECGs [32][33][34]. For an individual, ECGs taken at a later age had more prolonged PR and QT intervals, shorter QRS durations, and diverse T wave amplitudes than ECGs taken at a younger age [35]. In addition, many studies have reported that ECGs differ by sex among healthy individuals [36]. Cases of a shorter PR interval and QRS duration, lower ECG voltage, longer QT interval, and more prevalent ST-segment changes following ischemic heart disease were more commonly observed in females than in males [37]. In our study, a Grad-CAM suggested that the model focused on P wave, PR interval, T wave, and diverse ECG segments in the prediction of demographic features from ECGs. Although we could not confirm that the model utilized the aforementioned features for prediction solely with the Grad-CAMs, the activation pattern suggests that the model localized the amplitude and position of the major ECG components.
A study on predicting age and sex using 12-lead ECGs was conducted prior to our study (Table 4) [25,38,39]. The deep learning model used in the study achieved an AUROC of 0.923 in age classification and an AUROC of 0.947 in sex classification, showing a much higher performance than classifications based on parameters traditionally extracted from ECGs [25]. As the study used a dataset and setting different from ours, the model's performance could not be directly compared to ours. Notably, racial bias is also one of the primary considerations in analyzing ECGs with deep learning [40]; thus, differences in the racial compositions of the study population could also affect the results. The fact that most subjects in our study originated from an Asian population suggests that the prediction of age and sex using deep learning algorithms could be applied to subjects from various ethnic backgrounds.
BMI was another variable that we could effectively predict from the ECGs. It has previously been reported that obesity affects ECGs as well [26]. An increased P wave duration, an increased P wave, a leftward shift of the heart axis, and a low QRS voltage are commonly observed in a subject with a high BMI [27]. Furthermore, obesity is the most significant risk factor for left atrial enlargement [41], which is significantly related to an increased prevalence of atrial fibrillation and the occurrence of cardiovascular events and death. Although no clear mechanism has been found, metabolic and inflammatory functions arising from increased epicardial and pericardial fats may contribute to this.
The performance of our algorithm in predicting BMI, with an AUROC of 0.765 and a MAE of 2.332 kg/m², was lower than that in predicting age and sex. However, as there have been no previous studies on BMI prediction using a machine learning analysis of ECGs, it was impossible to compare the performance of our algorithm with that of existing algorithms. Although measuring BMI is simpler and easier in most circumstances than obtaining ECGs, predicting BMI from ECGs may be helpful for patients in whom height and weight measurements are difficult to obtain, such as patients in the intensive care unit or those with unclear consciousness. Our findings highlight the potential of ECG-based BMI prediction as a practical approach in such scenarios.
ABO blood type was another variable that we aimed to predict in our study. However, we observed that ABO blood type was nearly unpredictable based on a deep learning analysis of ECGs. Although some studies have reported that cardiovascular risk [42], susceptibility to certain diseases [43], and personality traits [44] differ according to ABO blood type, the mechanism and causality behind these differences were unclear. The results of our study suggest that the associations between specific ABO blood types and the cardiovascular disease risk reported in previous studies are not explained by or attributable to changes in ECGs. This negative result, derived from a large population comprising 124,406 individuals, could provide robust evidence that the two domains do not seem to be directly associated with each other, even though some previous studies have reported their possible association. Finally, this negative result may also serve as indirect evidence that our models were unlikely to be inappropriately overfitted to the dataset.
Recently, wearable devices based on 1-lead or 2-lead ECGs have been increasingly used. Therefore, we attempted to determine the predictive power of the models developed with 1-lead ECGs. Although the data used in the study were simple excerpts of the standard 12-lead supine ECGs, the potential value could be investigated in this manner. Even if the overall performance was lower than when 12-lead ECGs were used, the models with 1-lead ECGs still showed a fair performance in the prediction of age, sex, and BMI.
Several studies have reported ECG analysis models using deep learning [45]. While most studies have focused on arrhythmia, some have proposed models that predict heart failure [46], valvular heart disease [47], and cardiomyopathy [48], which cannot be diagnosed solely based on an ECG. Furthermore, attempts are being made in extracardiac domains, such as anemia [17] or hyperkalemia [19]. Our model predicted the subjects' demographic and anthropometric features, providing two potential advantages. First, the age predicted by our model can be used as a functional biomarker that better reflects overall cardiovascular health [20]. For instance, by comparing the biological age predicted from the ECG with the chronological age, an individual's overall health status could be quantified. Second, the model may help in correctly identifying subjects during ECG acquisition. Since subject mismatches can potentially lead to errors in diagnosis or treatment [21,22], our algorithm could provide an opportunity to rectify such errors when the features predicted from the ECG deviate considerably from the entered subject information.
Our study had a few limitations. First, no external validation was performed because we used only a single source of data. Many artificial intelligence models fail to exhibit the same performance on external datasets from different environments. Although ECG data are not likely to carry such a risk as they are gathered using various types of equipment by many operators, further validation using additional datasets is required to generalize the results. Second, all of the data used were acquired in tertiary hospitals. Taking ECG scans in tertiary hospitals may mean that there have been prior indications of particular diseases or conditions, which may result in a deviation of the dataset from the ECG abnormalities and the prevalence rate of accompanying diseases found in the general population. In particular, because BMI was only measured in subjects who underwent cardiac ultrasonography, this problem may be more marked in the BMI prediction model than in the other models. Due to restrictions on sensitive information and data accessibility of the study subjects, we could not confirm the presence or absence of any disease, including heart disease. In future studies, we will include a patient history, including previous heart disease, to enhance the significance of the research. Thirdly, while the ECG is increasingly being utilized outside hospitals through wearable devices with 1-lead ECGs [2][3][4][5], our model was developed using the standard 12-lead supine ECG. It is difficult to apply the algorithm to wearable devices because there are considerable differences in the nature and characteristics of ECGs acquired from wearable devices and conventional monitoring systems [49,50]. Further research on ambulatory ECG monitoring is required to expand our model for general use. Finally, the explainability of the model is limited. The decision making of the model and the reliability and explanation of the rationale for that decision are critical. 
Therefore, we tried to secure the explainability of the model through a Grad-CAM analysis. As a result, it was found that our model focused on the P wave, QRS complex, T wave, and other ECG segments during the prediction process. However, this provided only indirect evidence that our model utilized the major components of the ECG. It could not reveal exactly which ECG features the model used for predicting each class. Further studies are required to develop better explainable AI.

Funding:
This research was supported by a National Research Foundation of Korea grant provided by the Korean government (Ministry of Science and ICT) (number NRF-2022R1A2C2091160). The funder had no role in the study design, including data collection, analysis and interpretation, or manuscript writing. The corresponding author had full access to all of the data in the study and had the final responsibility for the decision to submit for publication.

Institutional Review Board Statement:
The study was approved by the Institutional Review Board of Yonsei University Wonju Severance Christian Hospital and conformed to the ethical guidelines of the Declaration of Helsinki (approved on 18 January 2020, approval number CR319173).

Informed Consent Statement:
The need for informed consent was waived, given the impracticality and minimal harm.

Data Availability Statement:
The data used in this study cannot be disclosed without the permission of the Ethics Committee (irb@yonsei.ac.kr, 82-033-741-1715). The data contain potentially sensitive information such as the subject's date of birth and gender. Thus, the data are not publicly available because of privacy or ethical restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.