Lead Analysis for the Classification of Multi-Label Cardiovascular Diseases and Neural Network Architecture Design

Yang, Tao; Xie, Chao-Xin; Huang, Hui-Ming; Wang, Yu; Fan, Ming-Hui; Kuo, I-Chun; Chen, Tsung-Yi; Chen, Shih-Lun; Chen, Chiung-An; Abu, Patricia Angela R.; Wang, Liang-Hung

doi:10.3390/electronics14163211

by Tao Yang ^1,†, Chao-Xin Xie ^1,†, Hui-Ming Huang ^1,†, Yu Wang ^1, Ming-Hui Fan ^1,*, I-Chun Kuo ^2,*, Tsung-Yi Chen ^3, Shih-Lun Chen ^4, Chiung-An Chen ^5, Patricia Angela R. Abu ^6 and Liang-Hung Wang ^1,*

^1 The Department of Microelectronics, College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China

^2 College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, China

^3 Department of Electronic Engineering, Feng Chia University, Taichung 40724, Taiwan

^4 The Department of Electronic Engineering, Chung Yuan Christian University, Taoyuan City 320317, Taiwan

^5 Department of Electrical Engineering, Ming Chi University of Technology, New Taipei City 243303, Taiwan

^6 The Department of Information Systems and Computer Science, Ateneo de Manila University, Quezon City 1108, Philippines

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Electronics 2025, 14(16), 3211; https://doi.org/10.3390/electronics14163211

Submission received: 31 May 2025 / Revised: 23 July 2025 / Accepted: 8 August 2025 / Published: 13 August 2025

Abstract

The electrocardiogram (ECG), which records variations in surface electrical potential over time, is widely used in the diagnosis of cardiovascular diseases. In recent years, the artificial intelligence (AI) + ECG paradigm has attracted considerable interest, but two intrinsic characteristics of the ECG, namely, inter-lead correlations and multi-label classification, are often overlooked. Because this oversight may keep AI models from reaching their full diagnostic potential, this study investigates methods for fusing information from a 12-lead ECG. A series of comprehensive experiments was conducted to evaluate the performance of various lead configurations, that is, 1-, 3-, 6-, 9-, and 12-lead combinations, with different fusion strategies. Drawing on medical theory, we propose a five-lead-grouping strategy and develop a neural network architecture named Lead-5-Group Net (L5G-Net). After ranking the 12 leads by AUC, we found that the aVR, V5, and V6 leads are particularly informative for single-lead ECG diagnosis. Furthermore, in multi-lead ECG classification, an orthogonal lead-selection strategy, which is based on the hypothesis of spatial interdependence among ECG leads, was shown to enhance performance by ensuring that the information provided by each lead is complementary. Finally, the proposed L5G-Net achieves a macro-AUC of 0.9357 on the PTB-XL multi-label dataset without data augmentation, attention mechanisms, or other enhancement strategies. Considerable performance gains were also observed after the five-lead-grouping strategy was applied to DenseNet and ResNet. These results suggest that the proposed strategy can be readily integrated into various network architectures and improve their performance.

Keywords:

12-lead ECG; multi-label classification; lead grouping; arrhythmia detection

1. Introduction

Cardiovascular diseases remain the leading cause of mortality worldwide [1]. Consequently, effective diagnosis and early detection of cardiovascular diseases are of paramount importance. The electrocardiogram (ECG) has become a routine test for diagnosing cardiovascular diseases because of its convenience and effectiveness [2]. An ECG records the electrical activity of the heart over time through sensors placed at specific points on the body’s surface. A standard ECG monitoring system consists of 12 leads: six limb leads (I, II, III, aVL, aVR, and aVF) and six precordial leads (V1, V2, V3, V4, V5, and V6). Each lead is designed to reflect the conduction of electrophysiological signals in a specific area of the heart, but the diagnostic value of the leads varies [3]. Typically, diagnoses are made by combining information about the patient’s condition with key ECG leads. For example, the V3 and V4 leads are essential for identifying abnormalities in the anterior part of the heart when assessing myocardial infarction [4]. In the diagnosis of left bundle branch block, the deep and broad S-waves in lead V1 and the broad, notched R-waves in V6 are usually used [5]. These examples underscore that different leads play distinct roles in the diagnosis of diseases.

In general, ECG morphology is considerably diverse, and different leads have complex associations with diseases. Given that diagnosis based on ECG is difficult and time-consuming [6], the classification of cardiac abnormalities with the ECG has emerged as a prominent area of research in the field of artificial intelligence (AI) [7]. These algorithms can be classified into traditional machine learning algorithms [8,9,10,11,12] and deep learning methods [13,14,15,16].

A deep learning method consists of a neural network and gradient descent mechanism, relies on big data, and is efficient for feature learning and classification. Acharya et al. [13] developed a nine-layer 1D-CNN to identify five types of beat level abnormalities in the MIT-BIH Arrhythmia database [17]. Yao et al. [14] constructed Time-Incremental ResNet18 with the 1D-ResNet18 and LSTM network and regarded the China Physiological Signal Challenge 2018 database as a multi-class arrhythmia classification task. Prabhakararao et al. [15] proposed an ensemble method using multiple deep CNN classifiers. Specifically, they designed a multiple-scale-dependent DCNN with different receptive fields and retrieved ECG records with single-label annotations from the multi-label PTB-XL database [18] for analysis. Hu et al. [16] proposed a Transformer-based deep neural network named ECG DETR, which performed arrhythmia detection on single-lead continuous ECG segments.

In the application of deep learning to ECG analysis, most research has focused on enhancing performance by modifying network architectures and optimization methods. However, studies across various domains demonstrate that neural networks specifically designed for the inherent characteristics of the data better address domain-specific problems [19,20]. In the context of ECG, comparatively little attention has been given to the intrinsic spatial characteristics and multi-label nature of ECG data.

Therefore, this study mainly focused on the intrinsic spatial characteristics of 12-lead ECG to investigate multi-label ECG classification. Through comprehensive experiments, the optimal approach for lead fusion was investigated, with the aim to maximize the utilization of inter-lead information to improve performance. Furthermore, a CNN structure suitable for lead information fusion and multi-label classification was proposed. The main contributions of this paper are as follows:

A series of thorough experiments was conducted to evaluate the fusion performance of 1-, 3-, 6-, 9- and 12-lead configurations with different fusion strategies. The diagnostic importance of each lead was evaluated, and an efficient lead selection approach was proposed on the basis of experimental results.
Based on anatomical knowledge, we innovatively proposed a novel five-lead-grouping strategy and a neural network architecture named Lead-5-Group Net (L5G-Net) for multi-label ECG classification.
No enhancement strategies, such as oversampling, model ensemble, and attention mechanisms, were used in this study. The approach demonstrates that a simple feed forward architecture can effectively learn ECG features and results close to SOTA methods can be obtained by promoting lead information fusion with appropriate methods.

The paper is organized as follows. Section 2 provides a survey of the literature related to our work. In Section 3, we present the methodology, including the model architecture, dataset, preprocessing steps, and the details of the proposed method. The experimental results are reported and analyzed in Section 4 and the conclusion is provided in Section 5.

2. Background and Related Works

As shown in Figure 1, a standard 12-lead ECG provides multiple perspectives as diagnostic evidence and is an essential tool used by physicians. The 12 leads can be divided into two planes according to spatial orientation: the frontal plane (limb leads) and the horizontal plane (precordial leads). ECG diagnostic reports available in hospitals are mostly in the form of one or more labels [21], that is, each record often shows signs of multiple diseases, rather than carrying one label per heartbeat. Thus, a model with robust feature extraction and abstraction capabilities is necessary. The publication of open-source multi-label ECG data, such as CPSC2018 [22] and PTB-XL, has established a foundation for the implementation of multi-label models. In addition, researchers have proposed various schemes [14,23,24].

Notably, ECG signals exhibit distinct morphological and rhythmic features in different leads. Hence, two critical questions arise: which lead is the most informative, and which combination of leads best informs the results? Krasteva et al. [25] conducted a comparative evaluation of single- and multi-lead scenarios for human verification, using a standard 12-lead ECG. They found that the frontal plane sector defined by leads I, aVR, and II represents the most effective projection of the cardiac vector and that multi-lead approaches improved the verification rate. Matyschik et al. [26] investigated the use of variational autoencoders to assess the representation of leads in a standard 12-lead system. They concluded that the precordial leads V2, V3, and V4 contain the most information. Zhang et al. [24] developed a deep neural network based on 1D CNNs for automatic multi-label classification on the CPSC2018 dataset and achieved a good performance. They also conducted experiments on single-lead ECGs. The F1 scores ranged from 4.4% to 11.8%, which were lower than those obtained using all 12 leads, and the top-performing single leads were leads I, aVR, and V5. Mousa et al. [27] recommended a grouping policy for real-time ECG signal classification and assessed performance on 1, 3, 6, and 12 leads. A lead-wise grouping method using up to six leads was selected, with the aim of reducing time and resources without compromising accuracy.

The diagnostic importance of leads varies according to the specific application and diagnostic target. In multi-label cardiovascular disease classification, which involves numerous classes and practical diagnostic requirements, the effective extraction and fusion of lead information has become a focus in neural network design. Reddy et al. [28] proposed the IMLE-Net model, which leverages multiple-channel information and learns patterns at the beat, rhythm, and channel levels. Their model achieved a macro-averaged area under the receiver operating characteristic curve (ROC-AUC) of 0.9216 on the PTB-XL dataset. Xie et al. [21] proposed a leadwise grouping multibranch network and evaluated it on the CINC2020 [29] and SPH [30] datasets. The 12 ECG leads were randomly divided into several groups according to relationships between leads. Hadj Azzem et al. [31] proposed an explainable multi-branch CNN model (X-ECGNet) for the multi-label classification of the PTB-XL dataset. The proposed model accounts for the diversity and integrity of a multi-lead ECG by incorporating separate branches for each lead and combining them to form comprehensive 12-lead features. They achieved a macro-AUC score of 0.936. Tao et al. [32] proposed the flexible DKR-block to extract inter- and intra-lead features. A new DNN based on the DKR-block (2D-ECGNet) was designed, which achieved a macro-AUC of 0.929 on the PTB-XL dataset. Zhou et al. [33] introduced a leadwise clustering multibranch network, in which the 12 leads were grouped and fed into corresponding branches of the network. Additionally, the model incorporated multi-scale convolution blocks and coordinate attention modules. Their proposed network outperformed common deep learning–based models on two commonly used ECG datasets.

However, the current literature provides limited discussion on the role of individual leads. In this study, we further investigated the performance of different leads with the CNN architecture, using data from the PTB-XL database. We then examined the effects of varying number of leads and lead-fusing methods. Inspired by the concept of lead reconstruction [34,35], which is finding an appropriate lead set to represent ECG information, we proposed a five-lead-grouping strategy.

3. Materials and Methods

This section first describes the dataset and preprocessing method, followed by the baseline model. A single-lead comparison experiment based on the baseline model is then presented. In the subsequent multi-lead experiments, the number of leads is gradually increased and different lead combination methods are explored to examine how well a CNN fuses lead information. Guided by biomedical theory, a lead grouping scheme is proposed, and an optimized model, L5G-Net, is designed. Finally, the evaluation metrics are described.

3.1. Dataset and Preprocessing Method

The PTB-XL dataset, which contains 21,837 ECG records from 18,885 patients, was used. It is the largest openly available dataset that provides clinical 12-channel ECG waveforms. Each ECG was recorded with a standard 12-lead placement, with the reference electrode positioned on the right arm. Each record lasts 10 s and is available at either 100 or 500 Hz sampling rates. The data were annotated by two cardiologists, who assigned potentially multiple ECG statements to each record for multi-label diagnosis. The cardiac abnormalities of the PTB-XL dataset are described by 71 different statements (All): 44 Diagnostic, 19 Form, and 12 Rhythm statements. These three counts sum to more than 71 because some statements belong to more than one category. The 44 Diagnostic statements can be further divided into 5 Superclasses and 24 Subclasses. The results in this paper are reported for three label sets: the 5 Superclasses, the 44 Diagnostic statements, and all 71 statements (All). The Superclasses, for example, comprise five major categories: Normal ECG (NORM), Myocardial Infarction (MI), ST/T Change (STTC), Conduction Disturbance (CD), and Hypertrophy (HYP), with sample sizes of 9514, 5469, 5235, 4898, and 2649, respectively. The number of multiple labels for the Superclasses and All categories is summarized in Table 1. The maximum number of labels assigned to a single ECG record was four for the Superclasses and nine for the All categories. We used the predefined 10-fold train–test splits provided by the PTB-XL dataset. The first eight folds were used for training, the 9th fold was used as the validation set, and the 10th fold was used as the test set.
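The fold assignment and the multi-label targets described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function names, the label order, and the toy records are assumptions made for the example.

```python
# Label order is an illustrative choice; any fixed order works.
SUPERCLASSES = ("NORM", "MI", "STTC", "CD", "HYP")


def assign_split(fold):
    """Map a PTB-XL fold index (1-10) to the split described in the text."""
    if not 1 <= fold <= 10:
        raise ValueError(f"fold must be in 1..10, got {fold}")
    if fold <= 8:
        return "train"
    return "val" if fold == 9 else "test"


def multi_hot(labels, classes=SUPERCLASSES):
    """Encode a set of class names as a 0/1 vector with one slot per class."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in classes]


# Toy (fold, labels) records; the values are made up for illustration.
records = [(1, {"NORM"}), (9, {"MI", "CD"}), (10, {"STTC", "HYP", "CD"})]
splits = [assign_split(fold) for fold, _ in records]
targets = [multi_hot(labels) for _, labels in records]
```

A record with several diagnoses simply has several ones in its target vector, which is what distinguishes this multi-label setting from a multi-class one.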

We used the 100 Hz version of the recordings to reduce computational overhead. To keep the results comparable, we followed the benchmark given in [36]. Raw data were normalized before being fed into the neural network; no additional filtering, baseline removal, or exclusion of abnormal data was performed.

3.2. Baseline Model

Previous studies [21,24,25,26,27,28,29,30,31,32,33] proposed different neural network architectures which focus on different aspects of ECG data, showing a promising classification performance. A CNN is efficient in capturing local information (i.e., ECG morphology) through its stacking structure. This study focuses on the fusion of information across leads and emphasizes the need for model interpretability. Specifically, the meaning of each convolution block within a neural network should be clear. Methods, such as a residual connection, multi-scale, and dense block that will cause feature mixing were not employed, and a feed-forward CNN structure was selected. The network is shown in Figure 2, and relevant layers and parameters are shown in Table 2.

The baseline model is based on a CNN architecture. Each record was 10 s in length, corresponding to 1000 sampling points at 100 Hz. The model had five blocks. Each block was composed of two Conv1D + batch normalization (BN) + activation layers followed by one max-pooling layer; the receptive field doubled with each block, and the number of channels was increased accordingly. Two further Conv1D + BN + activation layers were then added to promote information fusion. High-level information was compressed by global average pooling, and classification was completed by a 1 × 1 convolution. The activation function was ReLU, and the number of nodes in the output layer depended on the number of disease classes. The sigmoid was integrated into the binary cross-entropy computation to obtain the loss.
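Two details of this description lend themselves to a short sketch: how the temporal length shrinks through the five blocks, and what it means to integrate the sigmoid into the binary cross-entropy. The pooling size of 2 and the length-preserving convolutions are assumptions (the actual layer parameters are in Table 2), and the function names are hypothetical.

```python
import math


def temporal_lengths(n_samples=1000, n_blocks=5, pool_size=2):
    """Sequence length after each block, assuming length-preserving convs."""
    lengths = [n_samples]
    for _ in range(n_blocks):
        # Assumed: unpadded max pooling with stride equal to pool_size.
        lengths.append(lengths[-1] // pool_size)
    return lengths


def bce_with_logits(logit, target):
    """Binary cross-entropy of sigmoid(logit) against target, stable form."""
    # Algebraically equal to -[t*log(s) + (1-t)*log(1-s)], s = sigmoid(logit),
    # but it never evaluates log(0) or overflows for large |logit|.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multilabel_loss(logits, targets):
    """Mean of the per-label losses for one record."""
    pairs = list(zip(logits, targets))
    return sum(bce_with_logits(z, t) for z, t in pairs) / len(pairs)
```

Under these assumptions a 1000-sample record is reduced to 31 time steps before the final convolutions and global average pooling. Each output node is treated as an independent binary problem, so several classes can be active at once.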

3.3. Single-Lead Experiment

Different leads provide different information perspectives for the diagnosis of various diseases. In addition, portable and out-of-hospital monitoring scenarios have some single-lead diagnostic requirements. In the single-lead experiment, each lead was fed into the baseline model separately to obtain a single classification result. The model was designed to focus on the specific characteristics of each lead, allowing for the ranking of leads according to predefined evaluation metrics. Leads with high rankings were considered indicative of a disease and prioritized in subsequent experiments.

3.4. Three-Lead to Twelve-Lead Experiment

Increasing the number of leads might enhance the recognition performance of a network. In clinical practice, a cardiologist makes a diagnosis on the basis of a standard 12-channel ECG recording [37]. Finding the appropriate number of leads and lead selection method is a common problem in multi-lead ECG diagnosis.

In this study, ECGs with 3, 6, 9, and 12 leads were selected to investigate the effect of lead count on classification performance. Given that lead information is insufficient in the experiment using three and six leads and results can be affected by the lead selection method employed, we adopted three methods: selection by ranking, complementary selection, and selection by region, which rely on the ranking of single-lead experiments, spatial mapping relationship, and regions represented, respectively. In the experiment using nine leads, ECG information was sufficient and the strategy of random selection was adopted.
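Of the three selection methods, selection by ranking is the simplest to express in code. The sketch below picks the k best or k worst leads from a table of single-lead scores; the function name and the toy scores are assumptions for illustration and are not values from Table 4.

```python
def select_by_ranking(lead_scores, k, best=True):
    """Return the k leads with the highest (or lowest) single-lead score."""
    if not 0 < k <= len(lead_scores):
        raise ValueError("k must be between 1 and the number of leads")
    ranked = sorted(lead_scores, key=lead_scores.get, reverse=best)
    return ranked[:k]


# Made-up scores for four placeholder leads, for illustration only.
toy_scores = {"lead_a": 0.86, "lead_b": 0.80, "lead_c": 0.85, "lead_d": 0.83}
top_two = select_by_ranking(toy_scores, 2)
bottom_two = select_by_ranking(toy_scores, 2, best=False)
```

Ranking alone ignores how much the chosen leads overlap in the information they carry, which is why the complementary and region-based selections are evaluated alongside it.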

3.5. Lead Grouping and Model Structure Optimization

The concept of lead grouping was introduced to improve network interpretability and to explore the diversity of ECG lead features [21,28,31,32,33]. Specifically, each lead is first assigned to an independent branch for the extraction of lead-specific features, and then a BLSTM, Concatenate, or Attention module is used to fuse diverse lead features.

Hence, the features of a baseline model can be described in terms of length, lead, and hierarchy (layer). In the proposed optimized model, on the one hand, we retained the practice of maintaining lead features on each branch, and on the other hand, similar to the baseline model, we retained the feed-forward architectures to maintain the consistency of the receptive field of each point in the same layer. The key considerations are how and in which layer respective lead features are combined. Accordingly, we conducted a comparative experiment using three leads.

Different from the baseline model (one lead is regarded as a channel, and different lead signals are mixed at the input layer), three leads (I, II, V2) first passed through their respective branches, and the features of each branch were merged in the subsequent layers. Then, mixed features were extracted and classified through the remaining network layer. The network structure is illustrated in Figure 2.

With the increase to 12 leads, the ECG data provide comprehensive information about the heart’s condition. Referring to the basis of the lead reconstruction [34,35], the 12 leads are not independent. For example, the limb leads exhibit a linear relationship (Table 3) and the mapping relationship between precordial and limb leads is influenced by disease type. Therefore, dividing the 12 leads into different groups should facilitate learning of ECG features with a particular pattern.
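The linear dependence among the limb leads can be made concrete with the standard Einthoven and Goldberger relations, under which leads III, aVR, aVL, and aVF are all determined by leads I and II. The sketch below applies these textbook relations sample by sample; whether Table 3 lists them in exactly this form is an assumption, and the function name is hypothetical.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Derive III, aVR, aVL and aVF from leads I and II, sample by sample."""
    if len(lead_i) != len(lead_ii):
        raise ValueError("leads I and II must have the same length")
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [b - a for a, b in pairs],         # Einthoven: III = II - I
        "aVR": [-(a + b) / 2 for a, b in pairs],  # aVR = -(I + II) / 2
        "aVL": [a - b / 2 for a, b in pairs],     # aVL = I - II / 2
        "aVF": [b - a / 2 for a, b in pairs],     # aVF = II - I / 2
    }


# Two toy samples in arbitrary units.
derived = derive_limb_leads([1.0, 0.5], [2.0, 1.0])
```

Because four of the six limb leads are linear combinations of the other two, feeding all six to a network adds no independent measurements, which motivates grouping related leads instead of treating every lead as a separate source.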

Given the complementarity of the lead information, a five-group strategy was proposed. The grouping strategy and the regions represented are summarized in Table 3. Leads with similar projection directions (and thus similar information content) naturally cluster into the same group, while maximizing uncorrelatedness between groups. By doing so, the network spends less capacity on fusing redundant features within a group and can focus its representational power on integrating the most complementary signals across groups. The five-group strategy facilitated the extraction and aggregation of multi-lead ECG features.

Furthermore, the network structure L5G-Net using the grouping strategy was proposed. The specific structure is shown in Figure 2, and the overall structure is similar to the baseline model. The difference is that the corresponding leads are merged by 1 × 1 convolution for intra-group information at the input layer and concatenated for inter-group information after several blocks, followed by subsequent blocks, global pooling, and classification output layers.
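The intra-group merge can be illustrated with a 1 × 1 convolution reduced to its essentials: at every time step, the leads of one group are combined by a weighted sum. The sketch below uses a single output channel per group, fixed weights, and a toy two-group layout; all of these are assumptions for illustration. The actual group membership is given in Table 3 and is not reproduced here, and in the trained network the weights are learned.

```python
def pointwise_merge(signals, weights, bias=0.0):
    """1x1 convolution with one output channel: weighted sum per time step."""
    if not signals or len(signals) != len(weights):
        raise ValueError("need one weight per input lead")
    n = len(signals[0])
    if any(len(s) != n for s in signals):
        raise ValueError("all leads must have the same length")
    return [sum(w * s[t] for w, s in zip(weights, signals)) + bias
            for t in range(n)]


def merge_groups(record, groups, group_weights):
    """Merge each group of leads separately; one output per group (branch)."""
    return {
        name: pointwise_merge([record[lead] for lead in leads],
                              group_weights[name])
        for name, leads in groups.items()
    }


# Toy record with placeholder lead names and a hypothetical two-group layout.
record = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
groups = {"g1": ["a", "b"], "g2": ["c"]}
weights = {"g1": [0.5, 0.5], "g2": [2.0]}
merged = merge_groups(record, groups, weights)
```

Each merged signal would then pass through its own branch, and the branch outputs would be concatenated after several blocks, as described above.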

3.6. Evaluation Metrics

In multi-label classification, each record may belong to multiple labels, and various performance evaluation methods have been introduced for such classifiers. In this study, we employed accuracy, the macro-averaged F1-score, and the macro-AUC as metrics to assess the performance of our model. The AUC was selected because it is evaluated on soft classifier outputs, before any thresholding is applied. This approach is believed to provide a comprehensive understanding of the discriminative power of a given classification algorithm and to enable the model to optimize global metrics and reduce the impact of class imbalance issues [36]. In addition, the accuracy and F1-score are provided for comparison with previous studies. The equations for accuracy, F1-score, and AUC are presented as follows:

\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad (1)

\mathrm{precision} = \frac{TP}{TP + FP}, \qquad (2)

\mathrm{recall} = \frac{TP}{TP + FN}, \qquad (3)

F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}, \qquad (4)

where TP, FP, TN, and FN are the true positives, false positives, true negatives, and false negatives, respectively.
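Equations (1) to (4) translate directly into code once the four counts are known for a label. The sketch below is illustrative: the function names are hypothetical, and returning 0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def accuracy(tp, fp, tn, fn):
    return safe_div(tp + tn, tp + fp + tn + fn)   # Equation (1)


def precision(tp, fp):
    return safe_div(tp, tp + fp)                  # Equation (2)


def recall(tp, fn):
    return safe_div(tp, tp + fn)                  # Equation (3)


def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return safe_div(2 * p * r, p + r)             # Equation (4)


def macro_f1(counts_per_label):
    """Unweighted mean of per-label F1; each item is (tp, fp, tn, fn)."""
    scores = [f1_score(tp, fp, fn) for tp, fp, _tn, fn in counts_per_label]
    return sum(scores) / len(scores)
```

For example, a label with TP = 8, FP = 2, TN = 85, and FN = 5 has an accuracy of 0.93 but an F1-score of only 16/23 ≈ 0.70, which shows how accuracy can look favorable on a label dominated by negatives.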

The receiver operating characteristic (ROC) curve is constructed by plotting the true positive rate (TPR) against the false positive rate (FPR) at various classification thresholds, and the AUC is the area under this curve. The macro-average, which weights the contribution of each label equally, was used.

\mathrm{FPR} = \frac{FP}{TN + FP}, \qquad (5)

\mathrm{TPR} = \mathrm{recall} = \frac{TP}{TP + FN}. \qquad (6)
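The area under the ROC curve equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise formulation instead of integrating the curve explicitly, which gives the same value without any threshold sweep. The function names are hypothetical, and the quadratic pair loop is chosen for clarity, not speed.

```python
def binary_auc(y_true, scores):
    """ROC-AUC for one label via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def macro_auc(y_true_rows, score_rows):
    """Unweighted mean of per-label AUCs; rows are records, columns labels."""
    n_labels = len(y_true_rows[0])
    aucs = [
        binary_auc([row[j] for row in y_true_rows],
                   [row[j] for row in score_rows])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


# Toy example with four records and two labels (made-up scores).
y = [[0, 1], [0, 0], [1, 1], [1, 0]]
s = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
toy_macro_auc = macro_auc(y, s)
```

Because every label contributes equally to the mean, rare classes influence the macro-AUC as much as common ones.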

4. Results

In this section, we comprehensively describe our design and corresponding experimental results. First, the experiment setups and comparison network are introduced, and then the simulation results of lead selection, lead grouping method, and the corresponding optimized network are reported.

4.1. Experiment Setups

The mean binary cross-entropy was minimized using the Adam (adaptive moment estimation) optimizer. A cosine decay learning-rate schedule was used, with the initial learning rate set to 1 × 10⁻³. L2 regularization was used to prevent overfitting, with a regularization coefficient of 0.001. The maximum number of iterations was set to 140, and the batch size was 16. To ensure the reliability of the results, the three best results of each experiment were recorded, and their mean and best values were retained. All experiments were conducted on the cloud-based platform Google Colab Pro, and the proposed model was developed in Keras/TensorFlow-GPU v2.15.0.
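A cosine decay schedule lowers the learning rate smoothly from its initial value along half a cosine period. The sketch below shows the common form of this schedule; the decay horizon, its unit (epochs or optimizer steps), and the final-rate fraction `alpha` are not stated in the text and are treated as assumptions, as is the function name.

```python
import math


def cosine_decay_lr(step, decay_steps, initial_lr=1e-3, alpha=0.0):
    """Learning rate at `step` for a cosine decay over `decay_steps`."""
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    step = min(max(step, 0), decay_steps)          # hold the final value
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    # alpha is the fraction of initial_lr kept at the end (assumed 0 here).
    return initial_lr * ((1.0 - alpha) * cosine + alpha)


# Assumed horizon of 140, matching the stated maximum number of iterations.
schedule = [cosine_decay_lr(t, 140) for t in (0, 70, 140)]
```

With these assumptions the rate starts at 1 × 10⁻³, reaches half that value at the midpoint, and decays to zero at the end of the horizon.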

4.2. Result of the Single-Lead Experiment

Table 4 shows the performance of the model using different leads on the Superclasses category. The ranking of leads by the average AUC was provided, and the results of different leads vary. The average AUC ranges from 0.8011 to 0.8645, the F1-score from 0.5712 to 0.6556, and the ACC ranges from 0.7912 to 0.8399. In addition, aVR, V6, V5, and II rank among the highest, whereas III, V2, and aVL rank among the lowest. This result suggests that aVR, V6, and V5 contribute more critical information than the other leads when only a single ECG lead is available for diagnosing cardiovascular diseases.

4.3. Result of 3–12 Lead Experiment

The influence of lead number and lead selection strategy on the performance of the baseline model is summarized in Table 5.

Model performance improves as the number of ECG leads increases. However, the baseline model performance for the 12-lead ECG is comparable to that for the 9-lead ECG, indicating that the 9-lead ECG already provides sufficient information for the baseline model. The model needs to be optimized to effectively incorporate 12-lead ECG information.

When the number of leads is limited, the lead selection method considerably affects model performance. Using the three-lead configuration as an example, three combination methods were employed: the combinations [aVR, V5, V6] and [III, aVL, V2] were selected according to average AUC performance, [I, II, V2] and [I, II, V1] were complementary combinations in terms of spatial orientation, and [II, III, aVF] and [V1, V4, V6] were drawn from the limb and precordial leads, respectively.

Among the six three-lead combinations shown in Table 5, the difference in average AUC was 0.0281. Lead combinations derived from a single plane, such as [II, III, aVF] or [V1, V4, V6], yielded suboptimal performance. The leads of the [aVR, V5, V6] combination ranked high in the single-lead experiment, yet this combination exhibited the poorest performance in the three-lead model. By contrast, the [III, aVL, V2] combination exhibited the best performance in the three-lead experiment. This result can be explained by the complementarity of leads. As shown in Figure 1, aVR, V5, and V6 demonstrate directional similarity in spatial projection. The directions represented by V5 and V6 are close, and aVR can be decomposed into a component opposite to V6 and an upward component. Lead III and aVL exhibit mutually perpendicular projection vectors, with V2 contributing a third direction approximately orthogonal to this pair. This near-orthogonal triad maximizes spatial information diversity, thereby achieving optimal AUC performance. The [I, II, V2] and [I, II, V1] combinations were also selected for the complementarity of their spatial information, and their performance is close to that of [III, aVL, V2].

The results of the six three-lead combinations show that the complementarity of lead information is an important factor affecting the performance of the neural network. The complementary lead information enables the model to obtain optimal experimental results. Similar conclusions can be verified again in the six-lead experiments. The performance of the complementary lead combination is better than that of a single-plane combination.

In the nine-lead experiment, the area covered by the leads is sufficiently wide, and the spatial information is abundant. The differences between the combinations are small, the model achieves optimal results, and the average AUC difference is only 0.0045.

Figure 3 summarizes the experimental results of the 1-, 3-, 6-, 9-, and 12-lead configurations. The average AUC increases as the number of leads increases, and the fluctuation is reduced. However, the baseline model’s performance on the 12-lead data is only comparable to that achieved with the 9-lead configuration; the classification results saturate as the number of leads increases. The baseline model therefore needs to be improved to accommodate the 12-lead signal. Specifically, the model should focus on the fusion of lead information to further improve the AUC results.

4.4. Result of Lead Grouping and Structure Optimization

In general, an increase in the number of layers in a CNN indicates an expansion of the receptive field and a higher level of abstraction. In ECG signal processing, low-level outputs can be interpreted as local morphological features, whereas high-level outputs represent the beat or global features of an ECG. The effects of the three-lead fusion at different levels in the feed-forward structures are summarized in Table 6. The comparison of two lead fusion methods (i.e., Concat and Add) is also summarized in the table. The numbers (0, 1, 2, 3, 4) denote the specific block after which the fusion is performed. To ensure comparability of the results, the number of channels on each branch was set to 12, 24, 48, 72, or 96 to render the model size close to that of the baseline model. The results suggest the following:

Concat exhibits better performance than Add;
Higher performance was obtained when fusing at the lower levels of the model.

The observed results may be attributed to the fact that low-level features exhibit greater variability than high-level features. Therefore, separating leads at the lower level forces the network to focus on maximizing the extraction of differences among leads. After the combination of lead features, the subsequent modules realize the integration of features among leads.
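The two fusion operators compared in Table 6 differ in what happens to the channel dimension. The sketch below represents each branch output as a list of channels, each a list of time steps. It is a minimal illustration with hypothetical function names and toy values, not the layers used in the experiments.

```python
def fuse_concat(branches):
    """Concat: stack the channels of all branches along the channel axis."""
    return [channel for branch in branches for channel in branch]


def fuse_add(branches):
    """Add: element-wise sum of branches that share the same shape."""
    first = branches[0]
    for branch in branches[1:]:
        same_channels = len(branch) == len(first)
        if not same_channels or any(len(c) != len(f)
                                    for c, f in zip(branch, first)):
            raise ValueError("all branches must have the same shape")
    return [
        [sum(branch[c][t] for branch in branches)
         for t in range(len(first[c]))]
        for c in range(len(first))
    ]


# Three toy branches, each with 2 channels and 2 time steps.
b1 = [[1, 2], [3, 4]]
b2 = [[10, 20], [30, 40]]
b3 = [[100, 200], [300, 400]]
concat_out = fuse_concat([b1, b2, b3])   # 6 channels, branches kept apart
add_out = fuse_add([b1, b2, b3])         # 2 channels, branches summed
```

Concat keeps the features of each lead in separate channels and leaves the mixing to the following convolution, whereas Add collapses them before any learned mixing takes place. This difference is one plausible reading of why Concat performs better in Table 6.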

When the number of leads was increased to 12, the grouping strategy was applied to construct L5G-Net. The results (Table 7 and Figure 4) suggest the following:

Fusion at any of the tested levels improves classification performance compared with the 12-lead baseline model.
Compared with the three-branch experiment (Table 6), L5G-Net is highly consistent: the level at which fusion occurs makes little difference, and the difference in average AUC is only 0.0019.

The results are directly related to the grouping strategy. The three leads (I, II, and V2) of the three-branch model are nearly orthogonal, and the original features in the three-lead configuration vary considerably. The low-level fusion method can effectively utilize differences between the features of the leads. In L5G-Net, by contrast, the features of adjacent regions are integrated within groups, so intra-group features are richer and inter-group information at the same level is more closely related; discrepancies caused by the fusion level are thus considerably reduced.

To illustrate the consistency of the effect of the grouping strategy, ResNet and DenseNet were evaluated with the grouping strategy, and the results are presented in Table 8. With the grouping strategy, the average AUC of DenseNet and ResNet increases by 0.0025 and 0.0064, respectively.

In addition, we evaluated the Cnn_12_branch model, which is similar to the three-lead baseline model. The key difference lies in the use of 12 branches, which independently processed the 12 leads of the input. The branch features were fused after block4. The result in Table 8 shows that the average AUC is reduced by 0.0032–0.0047 compared with the L5G-Net model.

5. Discussion

5.1. Comparison of Results

To validate the performance of the proposed model, we compared it with several typical multi-label ECG classification methods, using the PTB-XL dataset. The results are presented in Table 9, which supplements the test results of the L5G-Net model under the All and Diagnostic categories.

ResNet101 is a widely used CNN backbone for ECG diagnosis models because of its robust recognition and adaptation capability. The xresnet1d101 model is a ResNet adaptation for multi-label 12-lead ECG analysis. The inception1d model is an Inception-based CNN. The lstm_bidir model is a bidirectional LSTM, a typical recurrent neural network. These four models are reported in [36] and were selected in this study because they are representative.

MLBF-Net [38] is a multi-lead-branch fusion network architecture for arrhythmia classification and integrates multi-loss optimization to learn diversity and the integrity of a multi-lead ECG. IMLE-Net [28] leverages the multichannel information available in standard 12-channel ECG recordings and learns patterns at the beat, rhythm, and channel levels. IM-ECG [32] uses a flexible block DKR-block to extract inter- and intra-lead features. X-ECGNET [31] accounts for the diversity and integrity of a multi-lead ECG by incorporating separate branches for each lead and combining them to form comprehensive 12-lead features.

For the Superclasses category, the proposed L5G-Net achieves a maximum AUC of 0.9357, which is extremely close to the AUC result (0.936) of X-ECGNET. The maximum F1-score and accuracy values can rank among the highest in these comparison models.

The accuracy and F1-score for the All and Diagnostic categories have not been recorded in the literature, and thus they were not included in Table 9. The optimal AUC values of the L5G-Net in these categories are 0.9341 and 0.9395, respectively, which are better than those of the other models.
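For readers unfamiliar with the metric, the macro-averaged AUC used in these comparisons can be sketched as the unweighted mean of per-label AUCs. The example below is a minimal pure-Python illustration on invented labels and scores; it computes each AUC by pairwise comparison rather than with any particular library, and the function names are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(label_matrix, score_matrix):
    """Return (mean of per-label AUCs, per-label AUCs); columns are labels."""
    n_labels = len(label_matrix[0])
    per_label = [
        binary_auc([row[j] for row in label_matrix],
                   [row[j] for row in score_matrix])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels, per_label


# Invented multi-label example: 4 records, 2 labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.7, 0.1]]

macro, per_label = macro_auc(y_true, y_score)
print(per_label, macro)
```

Because every label contributes equally to the mean, rare classes weigh as much as common ones, which is why class imbalance in the dataset matters for this metric.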

In terms of metrics, the L5G-Net with grouping strategy exhibits good performance and can achieve results comparable to or better than that of the optimal model. The relatively simple feed-forward neural network structure remains comparable to other complex models in terms of results. The strategy of lead grouping is easy to implement and has been validated using different architectures. In addition, we did not use other enhancement strategies, such as oversampling, model ensemble, and attention mechanisms. Nevertheless, the test results of the L5G-Net are comparable to those in the literature.

5.2. Limitations and Future Directions

This study provided a detailed analysis of the diagnostic role of ECG leads and proposed a lead-grouping model named L5G-Net. However, the current analysis relies primarily on model test results and does not fully explain the internal behavior of the neural networks. Future work should incorporate explainability techniques and visualization methods to verify the hypotheses made here (e.g., that the grouping strategy reduces differences among feature levels) and thereby assess the contribution of this paper more comprehensively.

According to the results in Table 8 and Table 9, an upper performance limit appears to exist for neural networks applied to PTB-XL classification tasks. This limitation is partly due to the PTB-XL dataset itself. As shown in Table 1, the validation and test sets are small. In addition, some ECG records in the PTB-XL database contain noise, the uneven distribution of diseases biases the models, and correlations among diseases affect model performance. Therefore, an in-depth study of the PTB-XL dataset is an important direction for future research.

The limited number of records for certain diseases within the All category of the PTB-XL dataset affects model results. Future work will consider incorporating data from other databases to make the findings more convincing. In addition, because the primary focus of this study is to investigate effective lead-grouping strategies, the design of the network architecture was not extensively explored. As a result, the temporal properties of ECG signals were not fully exploited through dedicated time-series modeling, which may have led to the loss of some important features. In future research, we plan to incorporate models that excel at capturing temporal dependencies, such as RNNs and Transformers, to enhance the model's capability to extract relevant features.

6. Conclusions

This study investigated the effects of individual leads and fusion methods on 12-lead multi-label classification. Building upon the baseline model, we ran experiments using 1-, 3-, 6-, 9-, and 12-lead ECG data. The results vary considerably with the lead inputs, and using more leads together with a complementary lead-selection strategy substantially improves the baseline model. For 12-lead ECG data, lead information fusion is crucial to model performance. Based on the biomedical mechanism of the ECG, we proposed the L5G-Net, which divides the 12 leads into five groups according to their spatial relationships. The low-level layers of the network fuse the features of the leads within each group: intra-group information is aggregated through a Concat operation, after which inter-group information is extracted. The L5G-Net achieves the best performance among the configurations we tested, showing that even a simple neural network can achieve results comparable to or better than those of many more complex models. The lead-grouping strategy also improves multi-label classification performance across different network architectures, and the improvement is stable.

Author Contributions

Conceptualization, T.Y. and Y.W.; methodology, T.Y., C.-X.X., H.-M.H., Y.W. and L.-H.W.; software, T.Y., C.-X.X. and H.-M.H.; validation, T.Y., C.-X.X. and L.-H.W.; formal analysis, T.Y., C.-X.X., H.-M.H., Y.W., M.-H.F., I.-C.K., T.-Y.C., S.-L.C., C.-A.C., P.A.R.A. and L.-H.W.; investigation, T.Y., C.-X.X. and L.-H.W.; resources, T.Y. and L.-H.W.; data curation, T.Y., Y.W. and C.-X.X.; writing—original draft preparation, T.Y.; writing—review and editing, T.Y., C.-X.X., T.-Y.C., S.-L.C., C.-A.C., P.A.R.A. and L.-H.W.; visualization, T.Y. and C.-X.X.; supervision, T.Y., M.-H.F., I.-C.K. and L.-H.W.; project administration, T.Y. and L.-H.W.; funding acquisition, T.Y. and L.-H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China and the Major Project and Innovation Platform of the Science and Technology Agency of Fujian Province under Grant Nos. 61971140, 2020IM010200, 2021H6003, 2021D036, 2022J01549, and 2023J01258.

Data Availability Statement

The PTB-XL database is available at https://www.physionet.org/content/ptb-xl/1.0.3/ (accessed on 9 November 2022).

Acknowledgments

The authors are grateful to Fuzhou University and to the Intelligence Health System and Biologic Integrated Circuits Development International (Hong Kong, Macao and Taiwan) Joint Laboratory at Fuzhou University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Roopa, C.K.; Harish, B.S. A Survey on various Machine Learning Approaches for ECG Analysis. Int. J. Comput. Appl. 2017, 163, 25–33.
2. Ayano, Y.M.; Schwenker, F.; Dufera, B.D.; Debelee, T.G. Interpretable Machine Learning Techniques in ECG-Based Heart Disease Classification: A Systematic Review. Diagnostics 2023, 13, 111.
3. Krishnan, R.; Rajpurkar, P.; Topol, E.J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 2022, 6, 1346–1352.
4. Thygesen, K.; Alpert, J.S.; Jaffe, A.S.; Chaitman, B.R.; Bax, J.J.; Morrow, D.A.; White, H.D.; Executive Group on behalf of the Joint European Society of Cardiology (ESC)/American College of Cardiology (ACC)/American Heart Association (AHA)/World Heart Federation (WHF). Fourth universal definition of myocardial infarction (2018). Circulation 2018, 138, 618–651.
5. Chen, T.-M.; Huang, C.-H.; Shih, E.S.C.; Hu, Y.-F.; Hwang, M.-J. Detection and Classification of Cardiac Arrhythmias by a Challenge-Best Deep Learning Neural Network Model. iScience 2020, 23, 100886.
6. Hannun, A.Y.; Rajpurkar, P.; Haghpanahi, M.; Tison, G.H.; Bourn, C.; Turakhia, M.P.; Ng, A.Y. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Med. 2019, 25, 65–69.
7. Feeny, A.K.; Chung, M.K.; Madabhushi, A.; Attia, Z.I.; Cikes, M.; Firouznia, M.; Friedman, P.A.; Kalscheur, M.M.; Kapa, S.; Narayan, S.M.; et al. Artificial Intelligence and Machine Learning in Arrhythmias and Cardiac Electrophysiology. Circ. Arrhythmia Electrophysiol. 2020, 13, e007952.
8. Georgieva-Tsaneva, G.; Gospodinova, E. Heart rate variability analysis of healthy individuals and patients with ischemia and arrhythmia. Diagnostics 2023, 13, 2549.
9. Wang, L.-H.; Yan, Z.-H.; Yang, Y.-T.; Chen, J.-Y.; Yang, T.; Kuo, I.-C.; Abu, P.A.R.; Huang, P.-C.; Chen, C.-A.; Chen, S.-L. A Classification and Prediction Hybrid Model Construction with the IQPSO-SVM Algorithm for Atrial Fibrillation Arrhythmia. Sensors 2021, 21, 5222.
10. Raj, S.; Ray, K.C. ECG Signal Analysis Using DCT-Based DOST and PSO Optimized SVM. IEEE Trans. Instrum. Meas. 2017, 66, 470–478.
11. Kung, B.-H.; Hu, P.-Y.; Huang, C.-C.; Lee, C.-C.; Yao, C.-Y.; Kuan, C.-H. An Efficient ECG Classification System using Resource-Saving Architecture and Random Forest. IEEE J. Biomed. Health Inform. 2021, 25, 1904–1914.
12. Xie, C.-X.; Wang, L.-H.; Yu, Y.-T.; Ding, L.-J.; Yang, T.; Kuo, I.-C.; Wang, X.-K.; Gao, J.; Abu, P.A.R. Clinical Sudden Cardiac Death Risk Prediction: A Grid Search Support Vector Machine Multimodel Base on Ventricular Fibrillation Visualization Features. Comput. Electr. Eng. 2025, 123, 110022.
13. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adam, M.; Gertych, A.; San Tan, R. A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. 2017, 89, 389–396.
14. Yao, Q.H.; Wang, R.X.; Fan, X.M.; Liu, J.; Li, Y. Multi-class Arrhythmia detection from 12-lead varied-length ECG using Attention-based Time-Incremental Convolutional Neural Network. Inf. Fusion 2020, 53, 174–182.
15. Prabhakararao, E.; Dandapat, S. Multi-Scale Convolutional Neural Network Ensemble for Multi-Class Arrhythmia Classification. IEEE J. Biomed. Health Inform. 2022, 26, 3802–3812.
16. Hu, R.; Chen, J.; Zhou, L. A transformer-based deep neural network for arrhythmia detection using continuous ECG signals. Comput. Biol. Med. 2022, 144, 105325.
17. Moody, G.B.; Mark, R.G. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng. Med. Biol. Mag. 2001, 20, 45–50.
18. Wagner, P.; Strodthoff, N.; Bousseljot, R.-D.; Kreiseler, D.; Lunze, F.I.; Samek, W.; Schaeffter, T. PTB-XL, a large publicly available electrocardiography dataset. Sci. Data 2020, 7, 154.
19. Xing, Z.Z.; Ma, G.L.; Wang, L.Y.; Yang, L.; Guo, X.; Chen, S. Toward Visual Interaction: Hand Segmentation by Combining 3-d Graph Deep Learning and Laser Point Cloud for Intelligent Rehabilitation. IEEE Internet Things J. 2025, 12, 21328–21338.
20. Zhou, P.C.; Fang, Z.Y.; Yang, Z.L.; Zhou, Z.; Zhou, L. Efficient Streaming Voice Steganalysis in Challenging Detection Scenarios. IEEE Trans. Inf. Forensics Secur. 2025, 20, 5966–5977.
21. Xie, X.Y.; Liu, H.; Chen, D.; Shu, M.; Wang, Y. Multilabel 12-Lead ECG Classification Based on Leadwise Grouping Multibranch Network. IEEE Trans. Instrum. Meas. 2022, 71, 4004111.
22. Liu, F.F.; Liu, C.Y.; Zhao, L.; Zhang, X.; Wu, X.; Xu, X.; Liu, Y.; Ma, C.; Wei, S.; He, Z.; et al. An Open Access Database for Evaluating the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection. J. Med. Imaging Health Inform. 2018, 8, 1368–1373.
23. He, R.N.; Liu, Y.; Wang, K.Q.; Zhao, N.; Yuan, Y.; Li, Q.; Zhang, H. Automatic Cardiac Arrhythmia Classification Using Combination of Deep Residual Network and Bidirectional LSTM. IEEE Access 2019, 7, 102119–102135.
24. Zhang, D.D.; Yang, S.; Yuan, X.H.; Zhang, P. Interpretable deep learning for automatic diagnosis of 12-lead electrocardiogram. iScience 2021, 24, 102373.
25. Krasteva, V.; Jekova, I.; Abächerli, R. Biometric verification by cross-correlation analysis of 12-lead ECG patterns: Ranking of the most reliable peripheral and chest leads. J. Electrocardiol. 2017, 50, 847–854.
26. Matyschik, M.; Mauranen, H.; Karel, J.; Bonizzi, P. Feasibility of ECG Reconstruction From Minimal Lead Sets Using Convolutional Neural Networks. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020; pp. 1–4.
27. Mousa, A.; Elgazzar, K. Six Leads Are All You Need for Efficient Cardiac Analysis. In Proceedings of the 2023 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT), Dubai, United Arab Emirates, 10–11 December 2023; pp. 153–160.
28. Reddy, L.; Talwar, V.; Alle, S.; Bapi, R.S.; Priyakumar, U.D. IMLE-Net: An Interpretable Multi-level Multi-channel Model for ECG Classification. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 1068–1074.
29. Perez Alday, E.A.; Gu, A.; Shah, A.J.; Robichaux, C.; Wong, A.-K.I.; Liu, C.; Liu, F.; Rad, A.B.; Elola, A.; Seyedi, S.; et al. Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol. Meas. 2020, 41, 124003.
30. Liu, H.; Chen, D.; Zhang, X.; Li, H.; Bian, L.; Shu, M.; Wang, Y. A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements. Sci. Data 2022, 9, 272.
31. Azzem, Y.C.H.; Harrag, F. Explainable Deep Learning Based-System for Multilabel Classification of 12-Lead ECG. In Proceedings of the 2023 International Conference on Networking and Advanced Systems (ICNAS), Algiers, Algeria, 21–23 October 2023; pp. 1–6.
32. Tao, R.; Wang, L.; Xiong, Y.N.; Zeng, Y.-R. IM-ECG: An interpretable framework for arrhythmia detection using multi-lead ECG. Expert Syst. Appl. 2024, 237, 121497.
33. Zhou, F.Y.; Chen, L.Z. Leadwise clustering multi-branch network for multi-label ECG classification. Med. Eng. Phys. 2024, 130, 104196.
34. Kapfo, A.; Datta, S.; Dandapat, S.; Bora, P.K. LSTM based Synthesis of 12-lead ECG Signal from a Reduced Lead Set. In Proceedings of the 2022 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), Trivandrum, India, 10–12 March 2022; pp. 296–301.
35. Wang, L.-H.; Zou, Y.-Y.; Xie, C.-X.; Yang, T.; Abu, P.A.R. Feasibility and validity of using deep learning to reconstruct 12-lead ECG from three-lead signals. J. Electrocardiol. 2024, 84, 27–31.
36. Strodthoff, N.; Wagner, P.; Schaeffter, T.; Samek, W. Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL. IEEE J. Biomed. Health Inform. 2021, 25, 1519–1528.
37. Drew, B.J.; Finlay, D.D. Standardization of reduced and optimal lead sets for continuous electrocardiogram monitoring: Where do we stand? J. Electrocardiol. 2008, 41, 458–465.
38. Zhang, J.; Liang, D.; Liu, A.P.; Gao, M.; Chen, X.; Zhang, X.; Chen, X. MLBF-Net: A Multi-Lead-Branch Fusion Network for Multi-Class Arrhythmia Classification Using 12-Lead ECG. IEEE J. Transl. Eng. Health Med. 2021, 9, 1900211.

Figure 1. Spatial mapping diagram of the 12 leads, showing limb leads I–III and chest leads V1–V6.

Figure 2. Network structures used in this paper. (a) Baseline model; (b) 3–12-lead branch model; (c) proposed lead-grouping model. Note: fusion can be performed after any block; for convenience, the diagram shows the network structure with fusion after block 2.

Figure 3. Average value and fluctuation of the AUC in the 1- to 12-lead experiments.

Figure 4. Comparison of Concat and Add methods in different layers for three-branch and L5G-Net models.

Table 1. Number of records by number of labels per record (columns 1–9) for the Superclasses and All categories.

Category	Split	1	2	3	4	5	6	7	8	9
Superclasses	Train	12,978	3279	730	124	-	-	-	-	-
	Val	1642	400	95	19	-	-	-	-	-
	Test	1652	400	95	16	-	-	-	-	-
	Total	16,272	4079	920	159	-	-	-	-	-
All	Train	550	9005	4014	2138	985	485	208	50	6
	Val	63	1114	571	230	136	52	20	6	1
	Test	92	1128	529	229	133	60	25	7	-
	Total	705	11,247	5114	2597	1254	597	253	63	7
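As a reading aid for Table 1, the sketch below shows how such a tally can be produced and checks that the split rows add up to the totals. The toy annotations are invented (the label names are PTB-XL superclasses), the helper names are hypothetical, and the "-" entry in the last Test column is treated as zero, which is an assumption.

```python
from collections import Counter


def tally_label_counts(records):
    """Map 'number of labels on a record' to 'number of such records'."""
    return Counter(len(labels) for labels in records)


# Invented annotations; each set holds the labels attached to one ECG record.
toy_records = [{"NORM"}, {"MI", "STTC"}, {"CD"}, {"MI", "STTC", "HYP"}, {"NORM"}]
toy_tally = tally_label_counts(toy_records)

# Rows of Table 1 (columns = 1, 2, 3, ... labels per record).
# Trailing "-" columns are omitted; the "-" in All/Test is assumed to mean 0.
TABLE_1 = {
    "Superclasses": {
        "Train": [12978, 3279, 730, 124],
        "Val": [1642, 400, 95, 19],
        "Test": [1652, 400, 95, 16],
        "Total": [16272, 4079, 920, 159],
    },
    "All": {
        "Train": [550, 9005, 4014, 2138, 985, 485, 208, 50, 6],
        "Val": [63, 1114, 571, 230, 136, 52, 20, 6, 1],
        "Test": [92, 1128, 529, 229, 133, 60, 25, 7, 0],
        "Total": [705, 11247, 5114, 2597, 1254, 597, 253, 63, 7],
    },
}


def totals_are_consistent(rows):
    """Check that Train + Val + Test equals Total in every column."""
    summed = [sum(col) for col in zip(rows["Train"], rows["Val"], rows["Test"])]
    return summed == rows["Total"]


for category, rows in TABLE_1.items():
    print(category, totals_are_consistent(rows), sum(rows["Total"]))
```

The same tally, applied to the real annotation file instead of the toy list, would give one row of the table per data split.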

Table 2. Configuration of the baseline model for 12 leads.

Layer	Input Shape	Output Shape	Kernel Size
(Conv1D + BN + Act) × 2	1000,2	1000,36	5, 3
Max_Pool	1000,36	500,36	-
(Conv1D + BN + Act) × 2	500,36	500,72	3, 3
Max_Pool	500,72	250,72	-
(Conv1D + BN + Act) × 2	250,72	250,144	3, 3
Max_Pool	250,144	125,144	-
(Conv1D + BN + Act) × 2	125,144	125,216	3, 3
Max_Pool	125,216	62,216	-
(Conv1D + BN + Act) × 2	62,216	62,288	3, 3
Max_Pool	62,288	31,288	-
(Conv1D + BN + Act) × 2	31,288	31,288	3, 3
Global Average Pool + Reshape	31,288	1,288	-
Conv1D + BN + Act	1,288	1,72	1
Conv1D + BN + Sigmoid	1,72	1,5	1
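The sequence lengths in Table 2 follow from simple arithmetic, which the sketch below traces. It assumes length-preserving ("same"-padded) convolutions and non-overlapping max pooling of size 2 with floor division; neither setting is stated in the table, so both are assumptions, and the function name is hypothetical.

```python
# Output channels of the six double-convolution blocks in Table 2.
BLOCK_CHANNELS = [36, 72, 144, 216, 288, 288]


def trace_shapes(length=1000, block_channels=BLOCK_CHANNELS, pool=2):
    """Return (name, length, channels) after each block and pooling step.

    Assumes 'same'-padded convolutions (length preserved) and max pooling
    with stride equal to the pool size, rounding down.
    """
    shapes = []
    for i, channels in enumerate(block_channels):
        shapes.append(("block%d" % (i + 1), length, channels))
        if i < len(block_channels) - 1:  # the last block feeds global pooling
            length //= pool
            shapes.append(("pool%d" % (i + 1), length, channels))
    shapes.append(("global_avg_pool", 1, block_channels[-1]))
    return shapes


for name, length, channels in trace_shapes():
    print("%-16s %5d,%d" % (name, length, channels))
```

Under these assumptions the trace passes through lengths 1000, 500, 250, 125, 62, and 31, matching the shapes listed in the table.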

Table 3. Linear vector relationship of the limb leads and recommended grouping strategy.

Input Leads	Reconstructed Limb Leads	Linear Regression Reconstruction Model
I + II	III	$\mathrm{III} = \mathrm{II} - \mathrm{I}$
	aVR	$\mathrm{aVR} = -\frac{1}{2}(\mathrm{I} + \mathrm{II})$
	aVL	$\mathrm{aVL} = \mathrm{I} - \frac{1}{2}\mathrm{II}$
	aVF	$\mathrm{aVF} = \mathrm{II} - \frac{1}{2}\mathrm{I}$
Recommended Grouping Strategy
[V1, V2]: Septal wall
[V3, V4]: Anterior wall
[V5, V6, I, aVL]: Lateral and high-lateral wall
[II, III, aVF]: Inferior wall
[aVR]: Posterior wall
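The linear relationships in Table 3 can be applied sample by sample, as in the minimal sketch below. The input values are invented millivolt samples and the function name is hypothetical; only the four formulas come from the table.

```python
def reconstruct_limb_leads(lead_i, lead_ii):
    """Derive III, aVR, aVL and aVF sample by sample from leads I and II."""
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],        # III = II - I
        "aVR": [-(i + ii) / 2 for i, ii in pairs],  # aVR = -(I + II) / 2
        "aVL": [i - ii / 2 for i, ii in pairs],     # aVL = I - II / 2
        "aVF": [ii - i / 2 for i, ii in pairs],     # aVF = II - I / 2
    }


# Invented samples in millivolts.
lead_i = [0.5, 1.0, -0.25]
lead_ii = [1.0, 1.5, 0.5]

derived = reconstruct_limb_leads(lead_i, lead_ii)
for name, values in derived.items():
    print(name, values)

# With these formulas the three augmented leads sum to zero at every sample.
augmented_sum = [
    r + l + f for r, l, f in zip(derived["aVR"], derived["aVL"], derived["aVF"])
]
print(augmented_sum)
```

Because four of the six limb leads are linear combinations of the other two, they carry no independent information, which is one motivation for grouping leads rather than treating all 12 as separate inputs.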

Table 4. Performance of individual leads on the Superclasses category.

Lead	AUC Average (Max)	F1-Score Average (Max)	ACC Average (Max)	Rank by AUC
I	0.8357 (0.8396)	0.6151 (0.6179)	0.8184 (0.8202)	6
II	0.8525 (0.8562)	0.6301 (0.6357)	0.8343 (0.8393)	4
III	0.8011 (0.8044)	0.5712 (0.5784)	0.7912 (0.7953)	12
aVR	0.8645 (0.8650)	0.6479 (0.6498)	0.8367 (0.8380)	1
aVL	0.8129 (0.8167)	0.5806 (0.5853)	0.8024 (0.8040)	10
aVF	0.8287 (0.8306)	0.6004 (0.6040)	0.8121 (0.8154)	7
V1	0.8223 (0.8228)	0.6032 (0.6058)	0.8015 (0.8047)	9
V2	0.8061 (0.8071)	0.5875 (0.5890)	0.8048 (0.8079)	11
V3	0.8224 (0.8233)	0.6003 (0.6019)	0.8139 (0.8171)	8
V4	0.8429 (0.8461)	0.6205 (0.6225)	0.8219 (0.8246)	5
V5	0.8624 (0.8630)	0.6556 (0.6574)	0.8399 (0.8415)	3
V6	0.8629 (0.8637)	0.6543 (0.6571)	0.8398 (0.8401)	2
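The "Rank by AUC" column of Table 4 can be reproduced by sorting the average AUC values, as sketched below. The numbers are copied from the table; the variable names are illustrative.

```python
# Average AUC per lead, copied from Table 4.
AUC_AVERAGE = {
    "I": 0.8357, "II": 0.8525, "III": 0.8011,
    "aVR": 0.8645, "aVL": 0.8129, "aVF": 0.8287,
    "V1": 0.8223, "V2": 0.8061, "V3": 0.8224,
    "V4": 0.8429, "V5": 0.8624, "V6": 0.8629,
}

# Sort leads from highest to lowest average AUC and assign ranks from 1.
ranking = sorted(AUC_AVERAGE, key=AUC_AVERAGE.get, reverse=True)
rank_of = {lead: position + 1 for position, lead in enumerate(ranking)}

for lead in ranking:
    print("%2d  %-3s  %.4f" % (rank_of[lead], lead, AUC_AVERAGE[lead]))
```

The three leads at the top of this ordering are aVR, V6, and V5, and the gap between them (0.0021) is far smaller than the gap to the weakest lead, III (0.0634 below aVR).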