Learning Periodic Patterns in ECG Signals Using TimesNet for Automated Cardiac Classification
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I have reviewed the manuscript titled “Learning Periodic Patterns in ECG Signals Using TimesNet for Automated Cardiac Classification” and, in its current form, I recommend major revision because although the topic is relevant and the reported results appear promising, the manuscript still has important methodological, reporting, and presentation weaknesses that must be addressed before it can be considered for publication in an international journal. The study has potential value in applying a periodicity-aware deep learning architecture to ECG classification, but the scientific contribution is not yet communicated with sufficient rigor or clarity. First, the problem formulation remains ambiguous, particularly regarding whether the task is truly handled as multi-label or effectively treated as multi-class in the interpretation of results; this inconsistency affects the credibility of the evaluation and should be clarified in detail. Second, the data split strategy requires stronger justification, especially with respect to patient-level separation and comparability with prior PTB-XL studies, since possible leakage or non-standard splitting may affect the reported performance. Third, the manuscript lacks an ablation study, which is essential to demonstrate the individual contribution of the FFT-based period extraction, TimesBlocks, architectural simplifications, and preprocessing choices. Fourth, the comparison with prior work is still not sufficiently rigorous, and the practical advantage of the proposed model over state-of-the-art ECG classification methods remains uncertain; a clearer state-of-the-art comparison table under comparable settings would strengthen the paper. Fifth, the discussion of weaker class-wise performance, especially for more difficult categories such as STTC and PVC, should be more critical and transparent rather than overly optimistic.
Sixth, validation on an additional dataset or a cross-dataset evaluation would substantially improve confidence in the generalizability of the findings. Seventh, the manuscript should explicitly include a Novelties and Contributions section to make clear what is methodologically new beyond an adaptation of TimesNet to ECG data. Eighth, the inclusion and exclusion criteria, label construction process, SCP-code mapping, and handling of overlapping labels should be described more transparently. Ninth, reproducibility would be significantly improved by sharing the code and preprocessing pipeline through a public repository such as GitHub and an archival platform such as Zenodo. Tenth, the presentation quality needs major improvement: Table 4 should not be presented as a standalone result, but should be interpreted and commented on directly below the table so that readers can clearly understand the significance of the reported values; Figures 1–6 are of poor visual quality and should be improved substantially, and in fact the quality, readability, and resolution of all figures in the manuscript should be carefully rechecked and revised, including axis labels, legends, font sizes, and contrast. In addition, all equations should be numbered sequentially and each equation should be explained clearly, one by one, in the main text, because the current mathematical presentation is not sufficiently accessible or precise for readers. The manuscript also needs careful English language editing, as there are several awkward or repetitive sentences that reduce scientific clarity. Overall, the study addresses an important topic, but it requires stronger methodological transparency, better experimental justification, clearer statement of contribution, improved visual presentation, explicit discussion of limitations and future work, and more careful reporting before it can be reconsidered.
Author Response
Reviewer
I have reviewed the manuscript titled “Learning Periodic Patterns in ECG Signals Using TimesNet for Automated Cardiac Classification” and, in its current form, I recommend major revision because although the topic is relevant and the reported results appear promising, the manuscript still has important methodological, reporting, and presentation weaknesses that must be addressed before it can be considered for publication in an international journal.
Authors
We acknowledge your recommendation for a major revision and agree that several aspects of the manuscript require further clarification and improvement. In response to your comments, we have thoroughly revised the manuscript to address methodological, reporting, and presentation concerns.
Reviewer
First, the problem formulation remains ambiguous, particularly regarding whether the task is truly handled as multi-label or effectively treated as multi-class in the interpretation of results; this inconsistency affects the credibility of the evaluation and should be clarified in detail.
Authors
This is an important point, and we acknowledge that the original manuscript did not sufficiently distinguish between the multi-label problem formulation and the dominant-label–based evaluation used in several visualization analyses throughout the manuscript.
- Abstract
updated
In this work, we investigated the performance of our proposed model on the PTB-XL dataset with different numbers of classes (three and five) assigned to the different cardiac arrhythmias and abnormal ECG signals: Three-Class classification (NORM, AFIB, PVC) and Five-Class classification (NORM, AFIB, MI, PVC, STTC). Experiments were conducted using a one-vs-rest multi-label learning approach with independent class probability estimation. The proposed framework achieved mean one-vs-rest test AUC values of 0.913 and 0.956 for the Five-Class and Three-Class classification settings, respectively.
- Proposed Methodology
Although PTB-XL is a multi-label dataset, this work focuses on Three-Class (NORM, AFIB, PVC) and Five-Class (NORM, AFIB, MI, PVC, STTC) classification tasks instead of learning all the 15 possible diagnoses. The experiments are implemented in a one-vs-rest multi-label learning setting with sigmoid outputs, where class-wise probability estimation is performed. In our experiments, we train and evaluate on a reduced set of diagnostic labels corresponding to the target classes of interest. In principle, more than one target class can be activated in the prediction for a recording, but for the purpose of confusion matrices and visualization, we take the class with the highest probability as the dominant predicted class for each recording and analyze its class-specific performance in detail.
- Model architecture
Binary cross-entropy loss was optimized independently for each class to support one-vs-rest probability estimation within the multi-label framework.
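As an illustration of the loss described above, the following is a minimal sketch and not the authors' implementation. The helper names `sigmoid` and `bce_per_class`, the example logits, and the example targets are all hypothetical. It shows how binary cross-entropy can be evaluated separately for each class on multi-hot targets.

```python
import math

def sigmoid(z):
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_per_class(logits, targets, eps=1e-12):
    """Mean binary cross-entropy for each class, computed independently.

    logits  : list of per-recording lists of raw scores, one per class
    targets : list of per-recording multi-hot lists (0 or 1 per class)
    """
    n_classes = len(logits[0])
    losses = []
    for m in range(n_classes):
        total = 0.0
        for row_logits, row_targets in zip(logits, targets):
            # Clamp the probability to avoid log(0).
            p = min(max(sigmoid(row_logits[m]), eps), 1.0 - eps)
            y = row_targets[m]
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        losses.append(total / len(logits))
    return losses

# Hypothetical example: 2 recordings, 3 classes (NORM, AFIB, PVC).
# The second recording carries two labels at once (AFIB and PVC).
logits = [[2.0, -1.0, 0.0], [-2.0, 1.0, 0.0]]
targets = [[1, 0, 0], [0, 1, 1]]
per_class = bce_per_class(logits, targets)
total_loss = sum(per_class) / len(per_class)
```

Because each class has its own sigmoid and its own loss term, a recording with several simultaneous labels contributes to each of them without the classes competing, which is what separates this setup from a softmax-based multi-class formulation.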
- Results and Discussion
In our framework training was performed using a one-vs-rest multi-label setup; however, for class-wise visualization, confusion matrices and projections in latent space, a dominant-label assignment was done based on the highest predicted probability for ease of interpretation and comparison. ROC-AUC, precision, recall and F1-score are reported for each diagnostic class in the tasks using a one-vs-rest evaluation approach. Micro-averaged metrics that capture performance across all classes are also provided.
The confusion matrix for this class, shown in Figure 6, indicates that the majority of normal ECG recordings are correctly assigned dominant-label predictions with very few false positives or false negatives.
Confusion matrices were produced using the highest confidence score to classify each ECG recording to a particular class, even though the network has been trained on ECG recordings with multiple labels.
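The dominant-label step can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the function names and probabilities are hypothetical, and the ground truth is assumed to have already been reduced to one class index per recording, a step the text does not specify.

```python
CLASSES = ["NORM", "AFIB", "PVC"]  # three-class setting named in the text

def dominant_label(probs):
    """Index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=lambda m: probs[m])

def confusion_matrix(true_idx, pred_idx, n_classes):
    """Rows are true dominant classes, columns are predicted dominant classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_idx, pred_idx):
        cm[t][p] += 1
    return cm

# Hypothetical sigmoid outputs for three recordings (rows need not sum to 1).
probs = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.6], [0.4, 0.1, 0.7]]
true_idx = [0, 1, 1]  # assumed single reference class per recording
pred_idx = [dominant_label(p) for p in probs]
cm = confusion_matrix(true_idx, pred_idx, len(CLASSES))
```

The second recording has two probabilities above 0.5, yet only the larger one survives the argmax. A confusion matrix built this way therefore summarizes the multi-label predictions rather than fully representing them.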
The model achieved a mean AUC of 0.9556, demonstrating strong discriminative performance across the three target diagnostic categories.
- Conclusion
In this paper, we have developed a framework for multi-label ECG classification using deep neural networks. In our implementation, we used a one-vs-rest multi-label learning approach in which sigmoid-based probabilities were computed separately for each class. For visualization and for convenience in confusion-matrix analysis, a dominant label was assigned by selecting the class with the maximum probability. The framework capitalizes on the periodic nature of ECG signals to facilitate the extraction of temporal features. Using the Fast Fourier Transform (FFT) for period detection and multi-scale convolutional features in TimesBlocks, it extracts intra-period morphological features and inter-period rhythm features from multi-lead ECG signals. Experimental results on the PTB-XL dataset show competitive classification performance, with a mean AUC of 0.913 and 0.956 for the 5-class and 3-class settings, respectively. Furthermore, it is demonstrated that reducing class complexity leads to higher feature separability, more stable classification decision regions, and ultimately higher model robustness. Finally, visualizing the clusters of the different classes in UMAP and PCA representations shows that the 3-class setting leads to more separable and thus purer clusters, while gradient-based explanations demonstrate that the model focuses on physiologically meaningful representations of heartbeats, especially for rhythm-related pathologies such as AFIB and PVC.
While the model achieved competitive performance, we also identified some limitations. These mainly concern the classification of morphologically similar conditions, such as myocardial infarction and ST-T abnormalities, within the Five-Class setting. We argue that there are data-driven restrictions when it comes to identifying the subtle changes in repolarization associated with these cardiac conditions.
Future work will focus on several key directions. First, integrating morphology-aware attention mechanisms or hybrid architectures combining TimesNet with clinical feature extraction may improve discrimination between overlapping classes such as MI and STTC. Second, incorporating multi-modal data (e.g., clinical metadata, patient history, or imaging) could enhance diagnostic robustness.
Reviewer
Second, the data split strategy requires stronger justification, especially with respect to patient-level separation and comparability with prior PTB-XL studies, since possible leakage or non-standard splitting may affect the reported performance.
Authors
The dataset was divided into training, validation, and test subsets. To avoid data leakage, we followed the standard practice of partitioning time-series datasets at the patient level. All available recordings of a single patient were assigned to exactly one of the training, validation, or test subsets, so no data from a patient seen during training was used for evaluation on unseen data. This separation strategy is standard in time-series classification and follows the evaluation principles outlined in the original PTB-XL benchmark; it is also typically followed in ECG classification studies employing PTB-XL. It therefore allows generalization performance to be assessed meaningfully and results to be compared with other approaches.
To tackle this challenging learning task, we employ a stratified partitioning strategy. We use the training set to tune the model’s parameters, the validation set to tune model hyper-parameters and to enforce early stopping to prevent overfitting. The test set is then used to evaluate the performance of the best performing model.
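A patient-level partition of this kind can be sketched as below. This is only an illustration under assumptions: the function name, the 80/10/10 proportions, the seed, and the toy patient identifiers are hypothetical, and the sketch omits the stratification by class that the response mentions.

```python
import random

def patient_level_split(patient_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign each recording to train/val/test so that no patient spans two subsets.

    patient_ids : list holding the patient identifier of each recording
    Returns a list of subset names aligned with patient_ids.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)  # shuffle patients, not recordings
    n_test = int(round(len(patients) * test_frac))
    n_val = int(round(len(patients) * val_frac))
    test = set(patients[:n_test])
    val = set(patients[n_test:n_test + n_val])
    return ["test" if p in test else "val" if p in val else "train"
            for p in patient_ids]

# Hypothetical example: 10 patients, several of them with repeated recordings.
patient_ids = [1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 9, 10]
subsets = patient_level_split(patient_ids)
```

Splitting on patient identifiers rather than on individual recordings guarantees that repeated recordings of one patient never appear on both sides of the train/test boundary.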
The PTB-XL dataset contains multi-label diagnostic annotations in SCP diagnostic code format. For this study we selected a subset of clinically relevant diagnostic categories to build the target classification tasks. Two different experimental settings have been investigated: a Three-Class setup (NORM, AFIB, PVC) and a Five-Class setup (NORM, AFIB, MI, PVC, STTC).
The original SCP diagnostic statements from the SCP file were mapped into the target diagnostic categories using clinically related grouping rules. Infarct-related SCP codes, such as IMI, AMI, ASMI, ALMI, INJAS, and INJAL, were grouped into the myocardial infarction (MI) category. Repolarization-related SCP codes, such as STTC, STD_, and STE_, were grouped into the STTC category. The NORM, AFIB, and PVC annotations were mapped directly to the corresponding target classes.
As PTB-XL is a multi-label dataset, several target labels may be present in one recording at the same time. We therefore encode target labels as multi-hot vectors and employ a one-vs-rest approach in multi-label learning. We exclude recordings that contain no target labels of the selected categories. In addition, recordings with missing waveform data or with leads other than mono, bipolar or tri-polar were excluded in the preprocessing/standardization step.
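The grouping and encoding rules above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the dictionary contains only the SCP codes named in the text (which introduces them with "such as", so the real mapping may be longer), and the names `SCP_TO_TARGET` and `encode` are hypothetical.

```python
TARGETS = ["NORM", "AFIB", "MI", "PVC", "STTC"]  # five-class setting

# Grouping rules as listed in the text; any code not listed here is ignored.
SCP_TO_TARGET = {
    "NORM": "NORM", "AFIB": "AFIB", "PVC": "PVC",
    "IMI": "MI", "AMI": "MI", "ASMI": "MI", "ALMI": "MI",
    "INJAS": "MI", "INJAL": "MI",
    "STTC": "STTC", "STD_": "STTC", "STE_": "STTC",
}

def encode(scp_codes):
    """Return a multi-hot vector over TARGETS, or None if no target label is present."""
    vec = [0] * len(TARGETS)
    for code in scp_codes:
        target = SCP_TO_TARGET.get(code)
        if target is not None:
            vec[TARGETS.index(target)] = 1
    return vec if any(vec) else None

# A recording with overlapping labels keeps both of them.
overlapping = encode(["IMI", "STD_"])
# A recording with no target label is excluded (here signalled by None).
excluded = encode(["LVH"])
```

Returning `None` for recordings without any target label mirrors the stated exclusion rule, while overlapping labels are kept as several ones in the same vector.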
For visualization purposes, dominant-labeling was performed by selecting the class with the highest probability for each sample, and using this for the generation of confusion matrices and latent space projections. However, all other aspects of training and evaluation were multi-label. For more information, refer to Table 1.
Table 1 Summary of the PTB-XL dataset characteristics and the customized Five-Class ECG classification setting used in this study.
Reviewer
Third, the manuscript lacks an ablation study, which is essential to demonstrate the individual contribution of the FFT-based period extraction, TimesBlocks, architectural simplifications, and preprocessing choices.
Authors
We have added the following text to the updated MS.
Ablation Study of Architecture Components
We performed an ablation study to assess the contribution of the individual components of the proposed framework. The full proposed model, with all components, has a mean AUC of 0.913 over all the 15 categories. Removing the FFT-based period extraction from TimesNet reduces the mean AUC to 0.900, which indicates that frequency-aware temporal decomposition is important for learning rhythm-related temporal dependencies. Removing the multi-scale TimesBlocks results in a comparable drop in classification performance, which highlights the importance of multi-scale temporal modeling for morphology-aware feature learning. Removing positional encoding or data augmentation results in a moderate decrease in classification performance, which we attribute primarily to the loss of temporal ordering information and to weaker representation learning and generalization, respectively. The ablation study thus indicates that frequency-aware temporal modeling integrated with multi-scale feature learning is crucial for robust automated multi-label ECG classification (Table 6).
Table 4. Ablation analysis of the major architectural components in the proposed TimesNet-based ECG classification framework on the PTB-XL test set.
| Configuration | FFT-Based Period Detection | Multi-Scale TimesBlocks | Positional Embedding | Data Augmentation | Mean AUC |
|---|---|---|---|---|---|
| Full Proposed Model | ✓ | ✓ | ✓ | ✓ | 0.900 |
| Without FFT Period Extraction | ✗ | ✓ | ✓ | ✓ | 0.901 |
| Without Multi-Scale Kernels | ✓ | ✗ | ✓ | ✓ | 0.892 |
| Without Positional Embedding | ✓ | ✓ | ✗ | ✓ | 0.904 |
| Without Data Augmentation | ✓ | ✓ | ✓ | ✗ | 0.898 |
Reviewer
Fourth, the comparison with prior work is still not sufficiently rigorous, and the practical advantage of the proposed model over state-of-the-art ECG classification methods remains uncertain; a clearer state-of-the-art comparison table under comparable settings would strengthen the paper.
Author replies
Table 5 Summary of recent deep learning and machine learning approaches for ECG.
| Ref | Study / Model | Dataset / ECG Type | Classification Setting | Methodology | Key Contribution | Reported Metric | Performance |
|---|---|---|---|---|---|---|---|
| [22] | MTDL-Net (Han et al., 2023) | Multi-lead ECG | Multi-class | Transformer + temporal enhancement | Joint morphological and temporal feature learning | Accuracy, Recall | Acc: 95.6%, Recall: 96.4% |
| [24] | ML Arrhythmia Classification (Zakaria et al., 2024) | 2-lead ECG | Arrhythmia classification | Ensemble ML + feature engineering | Inter-patient morphological classification | Accuracy, F1-score | Acc: 87%, F1: 87% |
| [25] | Morphology–Rhythm Contrast (Liu et al., 2024) | Multi-lead ECG | Multi-label | Triple-branch contrastive learning | Joint morphology–rhythm representation learning | AUROC, AUPRC | AUROC: 0.9889, AUPRC: 0.9694 |
| [26] | MSGformer (Ji et al., 2024) | Multi-lead ECG | Multi-class | Multi-scale transformer + grid attention | Multi-scale spatial-temporal feature extraction | Accuracy, F1-score | Acc: 99.28%, F1 ≈ 0.86 |
| [27] | 1D-CNN + Attention (Guhdar et al., 2025) | Multi-lead ECG | Multi-class | CNN + attention + focal loss | Attention-based ECG representation learning | Accuracy, F1-score | Acc: 99.48–99.83%, F1 ≈ 1.00 |
| [28] | Deep Learning ECG (Mathews et al., 2018) | Single-lead ECG | Beat classification | RBM + DBN | Early deep learning ECG classification | Accuracy | Acc: 93.63–95.57% |
| [29] | Y-Net-ECG (Liu et al., 2025) | Multi-lead ECG | AF detection | Dual-branch segmentation framework | Interpretable ECG segmentation and AF detection | F1-score, AUC | F1: 99.60%, AUC: 0.983 |
| [30] | DeepMI (Tadesse et al., 2021) | 12-lead ECG | MI classification | Multi-level fusion + RNN | Temporal myocardial infarction analysis | AUROC | AUROC: 96.7% |
| [31] | LVH Detection (Jothiramalingam et al., 2021) | Multi-lead ECG | Clinical diagnosis | ML + wavelet + classifiers | Clinical feature-based cardiac diagnosis | Accuracy | Acc: 97.8% |
| [32] | Multi-scale CNN (Zhou & Fang, 2025) | Multi-lead ECG | Multi-class | Hierarchical CNN + LEA attention | Multi-scale morphology-aware learning | Accuracy | Acc: 99.5% |
| [33] | Two-stage CNN (Jain et al., 2020) | Multi-lead ECG | Hypertension risk classification | CNN-based risk stratification | ECG-based hypertension risk assessment | Accuracy | Acc: 99.68% |
| [34] | ECG-MAE (Hu et al., 2023) | PTB-XL (12-lead ECG) | Multi-label | Self-supervised masked autoencoder | Label-efficient ECG representation learning | Macro-AUC | Macro-AUC: 0.9474 |
| [35] | DDNN AF Detection (Cai et al., 2020) | 12-lead ECG | AF classification | Dense convolutional neural network | Large-scale AF classification | Accuracy, Sensitivity | Acc: 99.35%, Sens: 99.19% |
| Proposed | Proposed TimesNet-Based Framework | PTB-XL (12-lead ECG) | Multi-label | FFT-based periodicity modeling + multi-scale TimesBlocks | Joint rhythm-aware and morphology-aware temporal representation learning | Mean AUC, | Mean AUC: 0.913, |
Reviewer
Fifth, the discussion of weaker class-wise performance, especially for more difficult categories such as STTC and PVC, should be more critical and transparent rather than overly optimistic.
Author replies
We have updated the following text in the discussion section.
Although our framework achieves strong overall performance, classification performance is weaker for certain classes. Specifically, performance on STTC is worse than on NORM and AFIB, and performance on PVC is also suboptimal, although to a lesser extent. Several factors may account for the lower performance on these ECG subclasses. STTC patterns pose particular challenges, such as variability in their morphology and a large overlap with other cardiac classes. PVC patterns show great inter-patient variability, and even recordings from the same patient may not follow a consistent temporal pattern. The class imbalance inherent to most rhythm classification datasets and the varying morphology of both normal and abnormal ECG signals make these difficult subclasses even harder to represent. Future work includes improving morphology-aware feature extraction techniques, leveraging class-balanced datasets, and enhancing temporal-context modeling for automatic ECG rhythm classification.
Reviewer
Sixth, validation on an additional dataset or a cross-dataset evaluation would substantially improve confidence in the generalizability of the findings.
Author replies
Although further external validation on new datasets would strengthen confidence in the overall applicability, the cross-dataset evaluation on existing public datasets is difficult due to varying recording protocols, annotations, label space, and electrode arrangements. For some of the classes used in the multi-label scenario, there is either a strong class imbalance or only very few examples are available. Due to these reasons, we ultimately chose PTB-XL as our main dataset for evaluation. It is by far the largest, best-annotated, multi-label 12-lead ECG dataset available, covering all relevant diagnostic challenges for ECG analysis. Future work includes a more in-depth evaluation using techniques for domain adaptation and broader cross-dataset comparisons on multiple datasets with varying acquisition conditions.
Reviewer
Seventh, the manuscript should explicitly include a Novelties and Contributions section to make clear what is methodologically new beyond an adaptation of TimesNet to ECG data.
Author replies
The novelties and contributions of this work are emphasized in the statement below.
- In this paper, a periodicity-aware TimesNet-based framework is proposed for multi-label 12-lead ECG classification. With this architecture, rhythm-related temporal dependencies and morphology-related ECG characteristics are learned simultaneously.
- To better learn temporal relations for ECG analysis, we adapt FFT-based temporal period extraction for time series to learn a representation that also captures periodic cardiac dynamics.
- To capture both the short-term morphological variations and long-range temporal relationships in ECG time series, we incorporate multi-scale TimesBlocks with different convolutional kernel sizes.
- The proposed framework is evaluated using a challenging multi-label PTB-XL benchmark with patient-level separation and state-of-the-art standardized evaluation metrics to avoid information leakage and increase evaluation credibility.
- In order to better understand the functionality of the proposed ECG classification method, a component-wise ablation analysis is provided. This analysis investigates the impact of several techniques, namely FFT-based period extraction, multi-scale temporal modelling, positional embeddings, and data augmentation.
- To provide better insight into learning of ECG classification models, our paper includes extensive visualization and interpretability results including ROC analysis, confusion matrices, UMAP and PCA embedding, probability landscapes and gradients to give an understanding of how features in the ECG signal space are separated and how different physiological classes are represented.
Reviewer
Eighth, the inclusion and exclusion criteria, label construction process, SCP-code mapping, and handling of overlapping labels should be described more transparently.
Author replies
The PTB-XL dataset contains multi-label diagnostic annotations in SCP diagnostic code format. For this study we selected a subset of clinically relevant diagnostic categories to build the target classification tasks. Two different experimental settings have been investigated: a three-class setup (NORM, AFIB, PVC) and a five-class setup (NORM, AFIB, MI, PVC, STTC).
The original SCP diagnostic statements from the SCP file were mapped into the target diagnostic categories using clinically related grouping rules. Infarct-related SCP codes, such as IMI, AMI, ASMI, ALMI, INJAS, and INJAL, were grouped into the myocardial infarction (MI) category. Repolarization-related SCP codes, such as STTC, STD_, and STE_, were grouped into the STTC category. The NORM, AFIB, and PVC annotations were mapped directly to the corresponding target classes.
As PTB-XL is a multi-label dataset, several target labels may be present in one recording at the same time. We therefore encode target labels as multi-hot vectors and employ a one-vs-rest approach in multi-label learning. We exclude recordings that contain no target labels of the selected categories. In addition, recordings with missing waveform data or with leads other than mono, bipolar or tri-polar were excluded in the preprocessing/standardization step.
For visualization purposes, dominant-labeling was performed by selecting the class with the highest probability for each sample, and using this for the generation of confusion matrices and latent space projections. However, all other aspects of training and evaluation were multi-label.
Reviewer
Ninth, reproducibility would be significantly improved by sharing the code and preprocessing pipeline through a public repository such as GitHub and an archival platform such as Zenodo.
Author replies
We have uploaded all code to GitHub, in the repository https://github.com/manjurkolhar/Timesnet
Reviewer
Tenth, the presentation quality needs major improvement: Table 4 should not be presented as a standalone result, but should be interpreted and commented on directly below the table so that readers can clearly understand the significance of the reported values; Figures 1–6 are of poor visual quality and should be improved substantially, and in fact the quality, readability, and resolution of all figures in the manuscript should be carefully rechecked and revised, including axis labels, legends, font sizes, and contrast.
Author replies
We have updated the MS with the following text in the discussion section, and all figures, especially Figures 1 to 6, have been updated.
Table 5 summarizes the quantitative performance metrics assessed on the test set. The model achieved a mean AUC of 0.9556, demonstrating strong discriminative performance across the three target diagnostic categories. At the class level, NORM reached the highest test AUC, 0.9763, followed by AFIB and PVC with 0.9596 and 0.9308, respectively. The test-set precision, recall, and F1-score further support the effectiveness of the proposed framework. NORM achieved a precision of 0.9509 and a recall of 0.9801, giving an F1-score of 0.9653, which indicates reliable identification of normal ECG recordings. AFIB achieved a precision of 0.7273 and a recall of 0.7635, giving an F1-score of 0.7449, which indicates solid classification ability but some difficulty in identifying atrial fibrillation patterns. For PVC, the model achieved a precision of 0.4348 and a recall of 0.8602, giving an F1-score of 0.5776: it detected most ventricular ectopic events but produced many false positives for this category. The micro-averaged test performance was strong, with a precision of 0.8317, a recall of 0.9394, and an F1-score of 0.8823. The saved evaluation outputs provide direct support for these test metrics.
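The reported F1-scores and the mean AUC can be checked against the quoted precision, recall, and per-class AUC values with a short calculation. The sketch below uses only the numbers stated in the paragraph above; the helper name `f1_score` and the rounding tolerance are choices made here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as quoted in the response.
reported = {
    "NORM": (0.9509, 0.9801, 0.9653),
    "AFIB": (0.7273, 0.7635, 0.7449),
    "PVC": (0.4348, 0.8602, 0.5776),
    "micro": (0.8317, 0.9394, 0.8823),
}

# Recompute F1 from the rounded precision/recall and compare with the reported value.
differences = {name: abs(f1_score(p, r) - f1) for name, (p, r, f1) in reported.items()}

# Mean of the three class-level AUC values quoted in the response.
mean_auc = (0.9763 + 0.9596 + 0.9308) / 3
```

All four recomputed F1-scores agree with the reported values to within rounding of the four-decimal inputs, and the three class-level AUC values average to the stated mean AUC.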
Reviewer
In addition, all equations should be numbered sequentially and each equation should be explained clearly, one by one, in the main text, because the current mathematical presentation is not sufficiently accessible or precise for readers. The manuscript also needs careful English language editing, as there are several awkward or repetitive sentences that reduce scientific clarity.
Author replies
We have updated the Model architecture section; the changes are reproduced below from the updated MS.
Model architecture
The proposed model, shown in Figure 2, adopts an ECG-specific TimesNet-based architecture designed to capture both periodic and morphological patterns in multi-lead ECG signals. The model processes a batch of ECG recordings and transforms the raw signals into a latent representation that enables effective temporal feature learning through stacked TimesBlocks. The input ECG tensor has dimensions $B \times C \times T$, where $B$ is the batch size, $C$ is the number of ECG leads, and $T$ is the temporal sequence length.
The linear projection layer projects the raw ECG signal at each time step $t$ into a low-dimensional feature space, using the input vector at time step $t$, a learnable projection matrix, and a bias term; the latent embedding dimension $D$ is set to 128. The notation for the embedded sequence after projection is summarized in Table 2.
Table 2. Mathematical symbols, tensor dimensions, and indexing notation used in the proposed ECG-specific TimesNet architecture
| Symbol | Meaning |
|---|---|
| $B$ | Batch size |
| $C$ | Number of ECG leads/channels |
| $T$ | Number of temporal samples (time steps) |
| $D$ | Latent embedding dimension |
| $L$ | Number of stacked TimesBlocks |
| $M$ | Number of target ECG classes |
| $r$ | Index of detected dominant temporal period/frequency |
| $q$ | Convolution kernel-size index |
| $m$ | Target class index |
| $p_r$ | Temporal period corresponding to frequency $f_r$ |
| PTBXL$_{train}$ | PTB-XL training dataset split |
| PTBXL$_{val}$ | PTB-XL validation dataset split |
| PTBXL$_{test}$ | PTB-XL independent test dataset split |
| pos_support | Number of positive samples belonging to a target class |
| AUC$_{CI\_low}$ | Lower bound of the confidence interval for AUC |
| AUC$_{CI\_high}$ | Upper bound of the confidence interval for AUC |
A learnable positional embedding is added to the projected features to ensure that the temporal ordering information is preserved.
Positional encoding allows the model to make effective use of the surrounding temporal context. ECG signals show quasi-periodic patterns that follow the timing of heartbeats.
As quasi-periodic cardiac rhythms are characteristic of ECG signals, the model determines the dominant temporal periodicities by calculating a Fast Fourier Transform (FFT).
The feature representation is first averaged across the latent channels.
The frequency spectrum of the averaged sequence is then obtained with the FFT.
The principal frequencies $f_r$ are selected from the amplitude spectrum, and their corresponding temporal periods $p_r$ are computed.
These detected periods guide the temporal reshaping of the feature sequence within the TimesBlock.
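The period-detection step can be illustrated with a small self-contained sketch. It is not the authors' implementation, and several assumptions apply: the input is taken to be a 1-D sequence already averaged over channels, a naive discrete Fourier transform stands in for an FFT routine, and the period is taken as the ceiling of $T$ divided by the frequency index, one common convention for TimesNet-style models. The function name, the number of retained frequencies `k`, and the test signal are hypothetical.

```python
import cmath
import math

def dominant_periods(signal, k=2):
    """Return the k dominant (frequency index, period) pairs of a 1-D signal.

    A naive discrete Fourier transform is used for clarity; an FFT routine
    yields the same amplitudes more efficiently.
    """
    T = len(signal)
    amplitudes = []
    for f in range(1, T // 2 + 1):  # skip f = 0, the constant component
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / T)
                    for t, x in enumerate(signal))
        amplitudes.append((abs(coeff), f))
    top = sorted(amplitudes, reverse=True)[:k]  # largest amplitudes first
    return [(f, math.ceil(T / f)) for _, f in top]

# Hypothetical signal of 200 samples: a strong component repeating every
# 25 samples plus a weaker component repeating every 10 samples.
T = 200
signal = [math.sin(2 * math.pi * t / 25) + 0.3 * math.sin(2 * math.pi * t / 10)
          for t in range(T)]
periods = dominant_periods(signal, k=2)
```

For this synthetic signal the two largest spectral peaks fall at frequency indices 8 and 20, which correspond to periods of 25 and 10 samples, the two repetition lengths built into the signal.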
The core architecture consists of four stacked TimesBlocks designed to learn representations across different periodic structures. For each detected period $p_r$, the sequence is divided into period-aligned segments.
This transformation helps the model capture patterns both within individual heartbeats and across multiple cycles. An Inception-style parallel convolution module then extracts features from the reshaped TimesBlock input using multiple Conv1D layers with different kernel sizes, indexed by $q$. The different kernel sizes capture temporal dependencies at different time scales.
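The period-aligned segmentation can be sketched as a simple fold of the sequence into rows of one period each. This is an illustrative example only: the function names are hypothetical, and zero-padding the final incomplete segment is an assumption borrowed from common TimesNet implementations, not something stated in the text.

```python
def fold_by_period(sequence, period):
    """Reshape a 1-D sequence into rows of length `period`, zero-padding the tail."""
    n_rows = -(-len(sequence) // period)  # ceiling division
    padded = list(sequence) + [0.0] * (n_rows * period - len(sequence))
    return [padded[i * period:(i + 1) * period] for i in range(n_rows)]

def unfold(segments, length):
    """Flatten the segments and truncate back to the original length."""
    flat = [value for row in segments for value in row]
    return flat[:length]

# Hypothetical sequence of 10 samples folded with a detected period of 4.
sequence = [float(t) for t in range(10)]
segments = fold_by_period(sequence, period=4)
restored = unfold(segments, len(sequence))
```

After folding, each row holds one cycle and each column holds the same phase across successive cycles, so a convolution over the folded layout sees both within-cycle and cycle-to-cycle variation. Unfolding recovers the original sequence exactly.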
The resulting feature maps are concatenated.
The output is then reshaped back to the original temporal format and combined with the original tensor through a residual connection.
This residual formulation stabilizes training, because the model retains previously learned patterns while adding new temporal features. After the four stacked TimesBlocks, the final feature representation is obtained through global average pooling over the entire temporal dimension.
The resulting pooled feature vector is then passed to a fully connected classification layer.
The classification layer has a learnable weight matrix and bias term, and the sigmoid activation function is used for multi-label prediction. The model outputs the probabilities for the $M$ target ECG classes, which include NORM, AFIB, MI, PVC, and STTC. Binary cross-entropy loss was optimized independently for each class to support one-vs-rest probability estimation within the multi-label framework.
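The pooling and classification head can be sketched as follows. This is a toy illustration under assumptions, not the authors' model: the function names, the two-dimensional features, and the weights are hypothetical, and the example uses two classes instead of the five named in the text.

```python
import math

def global_average_pool(features):
    """Average a [T][D] feature sequence over the temporal dimension."""
    T = len(features)
    return [sum(step[d] for step in features) / T
            for d in range(len(features[0]))]

def classify(features, weights, bias):
    """Pooled features -> fully connected layer -> independent sigmoid per class."""
    pooled = global_average_pool(features)
    logits = [sum(w * h for w, h in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical input: T = 2 time steps, D = 2 latent features, 2 classes.
features = [[1.0, 0.0], [3.0, 2.0]]
weights = [[1.0, -2.0], [0.5, 0.5]]  # one row of weights per class
bias = [0.0, 0.0]
probs = classify(features, weights, bias)
```

Since each class probability comes from its own sigmoid, the outputs are not constrained to sum to one, which is what allows several classes to be active for the same recording.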
Reviewer
Overall, the study addresses an important topic, but it requires stronger methodological transparency, better experimental justification, clearer statement of contribution, improved visual presentation, explicit discussion of limitations and future work, and more careful reporting before it can be reconsidered.
Author replies
We have updated the manuscript as directed by the reviewer, and we hope that the updated version is now readable.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
A simplified TimesNet-based framework is proposed for the classification of multi-lead ECG signals on the PTB-XL dataset under both three-class and five-class settings corresponding to different cardiac arrhythmias and ECG abnormalities.
Although the paper addresses an interesting problem, the level of novelty appears limited, considering that both FFT-based approaches for ECG analysis and TimesNet architectures have been extensively explored in the literature. Moreover, the manuscript lacks clarity, which makes it difficult to properly assess the contribution. In particular, the analysis of how increasing the number of classes affects feature separability, decision boundaries, and classifier confidence is rather superficial and does not provide substantial new insights.
The manuscript would benefit from significant revision. The motivation of the study and the specific research gap it aims to address are not clearly articulated. Likewise, it is not convincingly demonstrated why the proposed “simplified” approach is preferable to existing methods, including the original TimesNet architecture. A quantitative comparison with relevant works using the PTB-XL dataset is essential to support the claims.
Several statements remain insufficiently justified, such as: “Our results demonstrate the importance of designing models that are aware of the underlying periodicity of the signal and the classes”, “TimesNet could be a valuable tool for the design of scalable, interpretable, and clinically relevant cardiac diagnostic systems”, and “Deep learning algorithms often fail to capture the highly nonlinear and periodic temporal patterns inherent in ECG signals”. These claims should be supported by clearer arguments and empirical evidence.
The Literature Survey section lists several related works, but provides only a limited discussion of the proposed approaches. Moreover, essential details about the datasets used in the cited studies—such as size, class distribution, and experimental setup—are often missing, and the limitations of the reviewed methods are not addressed at all. As a result, it is difficult to determine under which conditions the proposed method outperforms existing approaches. It would also be useful to clarify whether other studies have been conducted using the PTB-XL dataset, and to report the results achieved in those works, including the number of classification tasks addressed.
A summary table describing the PTB-XL dataset is required, including the total number of ECG recordings and the class distribution in the five-class setting. Additionally, the notion of “simplified TimesNet” is repeatedly mentioned but never formally defined, nor is its difference from the original architecture clearly explained.
From a methodological standpoint, the definition should be presented as a data representation rather than an equation, indicating that each preprocessed ECG sample i belongs to a dataset with leads and time samples. The term “SCP codes” (row 138) requires clarification. Terminology such as “system,” “procedure,” and “framework” is used interchangeably and should be standardized throughout the manuscript.
All mathematical symbols and subscripts used in the Model Architecture section must be explicitly defined. In particular, the variable is undefined, and the index is ambiguously used to denote both class labels and kernel size; distinct symbols should be adopted. Figure 2 does not define the variable , and several terms (e.g., “independent test,” “pos_support,” “PTBXL_test”, “auc_ci_PTBXL_test.csv”) are unclear or inconsistently used, including in Table 1. Class labels should be reported explicitly in the first column of the table.
References to preprints (e.g., Reference [21]) should be avoided. All acronyms should be written in extended form at first occurrence. Terminology must be homogenized throughout the manuscript (e.g., “Three-Class” vs. “Three Class,” “GradInput” vs. “GradxInput”, “TimesBlocks” vs. “TimesNet Blocks”, etc.).
Finally, the quality of several figures is insufficient. Figure 7 is scarcely readable and should be improved, for instance by using colored markers to highlight class similarities; similar issues apply to Figures 8 and 9. Several figures (e.g., Figures 10 and 11) lack axis labels and legends.
In summary, the manuscript addresses a topic of interest to the research community; however, it presents substantial shortcomings in terms of originality, clarity of presentation, and methodological rigor. The contribution with respect to the state of the art is not clearly articulated nor adequately supported by quantitative comparisons with existing methods, particularly those based on the PTB-XL dataset. The concept of a “simplified TimesNet” remains insufficiently defined and justified. Additionally, the manuscript suffers from numerous formal inaccuracies, inconsistent terminology, incomplete mathematical definitions, limited dataset description, and poor figure quality.
Overall, the work requires major and thorough revision before it can be considered suitable for publication.
Comments on the Quality of English Language
The English could be improved to more clearly express the research.
Author Response
Reviewers
A simplified TimesNet-based framework is proposed for the classification of multi-lead ECG signals on the PTB-XL dataset under both three-class and five-class settings corresponding to different cardiac arrhythmias and ECG abnormalities.
Although the paper addresses an interesting problem, the level of novelty appears limited, considering that both FFT-based approaches for ECG analysis and TimesNet architectures have been extensively explored in the literature. Moreover, the manuscript lacks clarity, which makes it difficult to properly assess the contribution. In particular, the analysis of how increasing the number of classes affects feature separability, decision boundaries, and classifier confidence is rather superficial and does not provide substantial new insights.
Author reply
We thank the reviewer for this important observation. We agree with the reviewer's point of view and have updated the manuscript accordingly.
Both FFT-based temporal analysis and architectures based on TimesNet have been explored in the literature for time-series learning and ECG analysis. Our work, however, was not intended to design an entirely new deep learning architecture from scratch. Instead, our goal was to explore whether periodicity-aware temporal modeling can be adapted, generalized, and interpreted in a clinically relevant multi-label ECG classification setting using PTB-XL.
This contribution is clarified in the revised manuscript through a new subsection, "Novelties and Main Contributions", in the Introduction. There, the methodological contribution of this work is described in detail: (i) the methodical adaptation of TimesNet for multi-label classification of 12-lead ECG using deep neural networks, (ii) the use of FFT-guided periodic temporal decomposition for rhythm-aware representation learning, (iii) the multi-scale TimesBlock architecture designed to jointly capture morphology-related and rhythm-related information from ECG signals, and (iv) the analysis of feature separability as classification difficulty increases from the 3-class to the 5-class setting.
Reviewers
The manuscript would benefit from significant revision. The motivation of the study and the specific research gap it aims to address are not clearly articulated. Likewise, it is not convincingly demonstrated why the proposed “simplified” approach is preferable to existing methods, including the original TimesNet architecture. A quantitative comparison with relevant works using the PTB-XL dataset is essential to support the claims.
Author reply
We would like to thank the reviewer for the advice. The manuscript was heavily revised to clarify the motivation for the study, the research gap that the proposed approach addresses, and the methodological contribution of the new approach compared to TimesNet. We replaced the term “simplified TimesNet” with “ECG-specific TimesNet-based framework” throughout the paper. In addition, a “Novelties and Contributions” section was added, and the quantitative comparison table was extended with related PTB-XL-based studies for ECG classification tasks.
In the Introduction section, we have added the following:
While deep learning based ECG analysis has achieved state-of-the-art results, many existing methods are limited to binary or a small number of classes, and lack an in-depth study of rhythm-aware temporal periodicity in multi-label ECG classification. While existing works based on TimesNet demonstrate competitive performance in general time-series learning tasks, their application to periodicity-aware multi-label ECG representation learning has not been thoroughly explored. In particular, how networks maintain feature separability, prevent class overlap, and maintain confidence as classification tasks increase in complexity has not been sufficiently studied within the ECG-specific TimesNet framework.
The novelties and contributions of this work are summarized below.
- In this paper, a periodicity-aware TimesNet-based framework is proposed for multi-label 12-lead ECG classification. With this architecture, rhythm-related temporal dependencies and morphology-related ECG characteristics are learned simultaneously.
- For ECG analysis, we adapted the FFT-based approach of temporal period extraction of time series to learn a representation that also captures periodic cardiac dynamics.
- To capture both short-term morphological variations and long-range temporal relationships in ECG time series, we incorporated multi-scale TimesBlocks with different convolutional kernel sizes.
- The proposed framework is evaluated using a challenging multi-label PTB-XL benchmark with patient-level separation and state-of-the-art standardized evaluation metrics to avoid information leakage and increase evaluation credibility.
- In order to better understand the functionality of the proposed ECG classification method, a component-wise ablation analysis is provided. This analysis investigated the impact of several techniques, namely FFT-based period extraction, multi-scale temporal modelling, positional embeddings, and data augmentation.
- To provide better insight into how ECG classification models learn, the paper includes extensive visualization and interpretability results, including ROC analysis, confusion matrices, UMAP and PCA embeddings, probability landscapes, and gradient-based attributions, which show how features in the ECG signal space are separated and how different physiological classes are represented.
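The patient-level separation mentioned in the contributions above can be illustrated with a small sketch: splits are assigned per patient, so that all recordings from one patient land in the same partition. The function name, the split fractions, the seed, and the toy record identifiers are hypothetical. The authors' actual split procedure is not specified in this response.

```python
import random


def patient_level_split(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/val/test.

    records: list of (record_id, patient_id) pairs.
    Returns the record ids per split and the patient-to-split assignment.
    """
    patients = sorted({pid for _, pid in records})
    random.Random(seed).shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))

    assignment = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            assignment[pid] = "train"
        elif i < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"

    splits = {"train": [], "val": [], "test": []}
    for rid, pid in records:
        splits[assignment[pid]].append(rid)
    return splits, assignment


# Toy data: 10 hypothetical patients with two recordings each.
records = [(f"rec{i}", f"pat{i // 2}") for i in range(20)]
splits, assignment = patient_level_split(records)
```

Splitting on recordings instead of patients would allow two ECGs from the same person to fall on both sides of the train/test boundary, which is the leakage scenario the reviewers asked about.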
Reviewers
Several statements remain insufficiently justified, such as: “Our results demonstrate the importance of designing models that are aware of the underlying periodicity of the signal and the classes”, “TimesNet could be a valuable tool for the design of scalable, interpretable, and clinically relevant cardiac diagnostic systems”, and “Deep learning algorithms often fail to capture the highly nonlinear and periodic temporal patterns inherent in ECG signals”. These claims should be supported by clearer arguments and empirical evidence.
Author reply
We would like to thank the reviewer for this comment. Some statements in the original paper were too general and required further evidence. These statements have been modified and tempered throughout the paper.
The claim regarding periodicity-aware temporal modelling is now supported by our ablation analysis, which shows that removing the FFT-based period extraction worsens the model's classification performance compared to the full model. The claims regarding interpretability and physiological representation learning are supported by UMAP/PCA representations, probability landscapes, and gradient-based attribution maps. Finally, the claim regarding the limitations of deep learning has been refined to state that conventional models lack periodicity awareness and ECG representation learning capabilities unless they are specifically designed with periodicity-aware mechanisms.
We have updated the following sections:
Results, ablation, and conclusions; Why the Three-Class Configuration Produced Better Results than the Five-Class Setup: Evidence from Visualization and Model Interpretability; Gradient-Based Model Interpretability and Visualization Analysis and Insights.
Reviewers
The Literature Survey section lists several related works, but provides only a limited discussion of the proposed approaches. Moreover, essential details about the datasets used in the cited studies—such as size, class distribution, and experimental setup—are often missing, and the limitations of the reviewed methods are not addressed at all. As a result, it is difficult to determine under which conditions the proposed method outperforms existing approaches. It would also be useful to clarify whether other studies have been conducted using the PTB-XL dataset, and to report the results achieved in those works, including the number of classification tasks addressed.
Authors reply
Literature Survey
Table 5 summarizes state-of-the-art methodologies from the referenced studies, emphasizing their key approaches and performance metrics. In [22], the authors proposed a transformer-based deep learning model called MTDL-Net for the classification of ECG heartbeats. By combining masked attention embedding, which extracts the morphological characteristics of ECG signals, with a new temporal feature enhancement mechanism for analyzing heartbeat dynamics, MTDL-Net learns ECG spatiotemporal features and achieves high classification performance, with an average accuracy of 95.6%, a maximum specificity of 98.7%, and a maximum recall of 96.4%. The authors of [23] proposed the Cascaded Thinning Upscale–Downscale Representation (CTUDR) approach for EEG signal smoothing. The approach first converts EEG signals to binary images and then applies morphological thinning to achieve smoothing. They demonstrate that two-stage upscale–downscale filtering can be realized within a single image representation. Experimental results show that EEG signals smoothed with CTUDR yield the best smoothing performance and improve the performance of the widely adopted EEGNet classifier, with accuracy and F-measure values of 0.7640 and 0.7607, respectively. In [23], an inter-patient arrhythmia classification approach is presented that applies ensemble machine learning to two ECG leads to classify five distinct morphological classes, namely LBBB, RBBB, PVC, PAC, and Normal. The approach includes ECG signal processing, feature extraction, feature selection, and hyperparameter tuning. It achieved an average accuracy of 87%, a sensitivity of 87.4%, a precision of 88.4%, and an F1-score of 87%, which is reported as better than the existing literature.
The authors of [24] propose Morphology–Rhythm Contrast (MRC), a contrastive learning framework for multi-lead ECG representation learning based on morphology-based and rhythm-based augmentations and a triple-branch network. Linear probing achieves strong performance across the PTB-XL, CPSC, and Chapman datasets, with an average AUROC of 0.9889 and AUPRC of 0.9694, outperforming random initialization and a range of supervised baselines. The authors of [25] propose a transformer-based model named Multi-Scale Grid Transformer (MSGformer) for ECG arrhythmia classification. The model combines multi-lead spatial feature fusion with a multi-scale grid attention mechanism to extract and learn the temporal and morphological patterns of ECG signals. It obtains an F1-score of about 0.86 on the CPSC-2018 dataset, and 99.28% accuracy, 97.13% sensitivity, and 97.87% positive predictive value (PPV) on the MIT-BIH dataset, which are reported to be better than those obtained by other state-of-the-art approaches. In [26], the authors present a hybrid deep learning approach for ECG classification that combines a 1D-CNN with squeeze-and-excitation attention for adaptive multi-scale feature learning. The model incorporates focal loss, L2 regularization, and ensemble mixed-precision training. It achieved 99.48% accuracy on MIT-BIH, 99.83% on PTB, and 99.64% on the combined dataset, with F1-scores reaching up to 1.00. In [27], the authors investigated ECG classification using deep learning, specifically Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN).
They used recordings from the MIT-BIH database to train the network to classify ventricular and supraventricular ectopic beats, reaching classification accuracies of 93.63% and 95.57%, respectively, at a very low sampling frequency of 114 Hz. By using simple features and reducing data complexity, the approach allowed quick and accurate classification of ECG signals, which makes it efficient and scalable. In [28], the authors proposed Y-Net, a deep learning-based approach for robust ECG segmentation that can learn from both single-lead and multi-lead ECG signals. Y-Net uses a dual-branch structure and a two-stage training scheme to deal with different rhythm types. The authors evaluated Y-Net on the LUDB and RDB datasets and reported F1 scores of 99.60% and 99.03% in intra-dataset and inter-dataset evaluation, respectively. In addition, AF detection based on morphological features extracted from the Y-Net segmentation results reached an AUC of 0.983, indicating that the method is effective, interpretable, and clinically practical. In [29], the authors proposed DeepMI, a deep learning based end-to-end framework to detect myocardial infarction (MI) and its occurrence time (Acute, Recent, Old) from 12-lead ECG signals. The DeepMI network adopts multi-level fusion (data, feature, decision), coupled with pre-trained networks and recurrent layers, to extract temporal information from ECG signals. DeepMI was validated on 17,381 patients and achieved AUROCs of 96.7% (Normal), 82.9% (Acute), 68.6% (Recent), and 73.8% (Old) for multi-lead ECG-based analysis and temporal classification of MI.
In [30], the authors proposed a machine learning approach designed to diagnose Left Ventricular Hypertrophy (LVH) from multi-lead ECG signals acquired from patients. The clinical features associated with the ECG signals were extracted using the continuous wavelet transform and peak detection of the R and S waves. Different classifiers, such as kNN, decision trees, Naive Bayes, support vector machines, and neural networks, were used in this study. Experimental results revealed that neural networks achieved the highest accuracy, 97.8% on average. The authors thus propose an efficient, cost-effective method to diagnose cardiac problems [30]. In [31], a deep learning method for multi-lead arrhythmia classification is developed that employs a CNN model to extract mixed-scale hierarchical features from ECG signals and a Lead Encoder Attention (LEA) mechanism to fully utilize the morphological and temporal information of the ECG. Experimental results on the MIT-BIH and CCDD datasets confirmed the effectiveness and robustness of the method, with the highest accuracy of 99.5% on MIT-BIH and an average accuracy of 88.5% with good cross-dataset generalization. The authors of [32] introduced a two-stage deep CNN model for detecting hypertension and assessing the risk of hypertensive patients from multi-lead ECG signals. The first stage detects hypertension from the given ECG signal, while the second stage assesses risk and classifies the detected hypertensive patients as low or high risk. Experiments with benchmark public datasets yielded accuracies of 99.68% and 90.98% in the detection and risk assessment stages, respectively. According to the authors, this performance makes the approach suitable for clinical practice and for deployment on a cloud infrastructure.
The authors of [33] note that deep learning models can achieve high accuracy in ECG analysis but usually need a large amount of labeled data, which is expensive and hard to obtain. They proposed a self-supervised generative pretraining approach called ECG-MAE, which learns the spatiotemporal representation of the ECG by restoring masked regions of 12-lead ECG across time and leads. When fine-tuned for multi-label classification tasks, the model significantly outperforms previous models and achieves a higher macro-AUC (0.9474), with higher label efficiency, better performance on the test sets, and greater capability to diagnose rare cardiac conditions. In [35], the authors designed and evaluated a one-dimensional deep densely connected neural network (DDNN) to classify atrial fibrillation (AF) using 12-lead ECG signals. Experiments on 16,557 recordings yielded an accuracy of 99.35%, a sensitivity of 99.19%, and a specificity of 99.44%. The authors suggest that the model holds promise for clinical diagnostics and for use in wearable AF screening devices.
Table 5 Comparison of the Proposed Framework with Representative PTB-XL and ECG Classification Approaches.
| Ref | Study / Model | Dataset / ECG Type | Classification Setting | Methodology | Key Contribution | Reported Metric | Performance |
|---|---|---|---|---|---|---|---|
| [22] | MTDL-Net (Han et al., 2023) | Multi-lead ECG | Multi-class | Transformer + temporal enhancement | Joint morphological and temporal feature learning | Accuracy, Recall | Acc: 95.6%, Recall: 96.4% |
| [24] | ML Arrhythmia Classification (Zakaria et al., 2024) | 2-lead ECG | Arrhythmia classification | Ensemble ML + feature engineering | Inter-patient morphological classification | Accuracy, F1-score | Acc: 87%, F1: 87% |
| [25] | Morphology–Rhythm Contrast (Liu et al., 2024) | Multi-lead ECG | Multi-label | Triple-branch contrastive learning | Joint morphology–rhythm representation learning | AUROC, AUPRC | AUROC: 0.9889, AUPRC: 0.9694 |
| [26] | MSGformer (Ji et al., 2024) | Multi-lead ECG | Multi-class | Multi-scale transformer + grid attention | Multi-scale spatial-temporal feature extraction | Accuracy, F1-score | Acc: 99.28%, F1 ≈ 0.86 |
| [27] | 1D-CNN + Attention (Guhdar et al., 2025) | Multi-lead ECG | Multi-class | CNN + attention + focal loss | Attention-based ECG representation learning | Accuracy, F1-score | Acc: 99.48–99.83%, F1 ≈ 1.00 |
| [28] | Deep Learning ECG (Mathews et al., 2018) | Single-lead ECG | Beat classification | RBM + DBN | Early deep learning ECG classification | Accuracy | Acc: 93.63–95.57% |
| [29] | Y-Net-ECG (Liu et al., 2025) | Multi-lead ECG | AF detection | Dual-branch segmentation framework | Interpretable ECG segmentation and AF detection | F1-score, AUC | F1: 99.60%, AUC: 0.983 |
| [30] | DeepMI (Tadesse et al., 2021) | 12-lead ECG | MI classification | Multi-level fusion + RNN | Temporal myocardial infarction analysis | AUROC | AUROC: 96.7% |
| [31] | LVH Detection (Jothiramalingam et al., 2021) | Multi-lead ECG | Clinical diagnosis | ML + wavelet + classifiers | Clinical feature-based cardiac diagnosis | Accuracy | Acc: 97.8% |
| [32] | Multi-scale CNN (Zhou & Fang, 2025) | Multi-lead ECG | Multi-class | Hierarchical CNN + LEA attention | Multi-scale morphology-aware learning | Accuracy | Acc: 99.5% |
| [33] | Two-stage CNN (Jain et al., 2020) | Multi-lead ECG | Hypertension risk classification | CNN-based risk stratification | ECG-based hypertension risk assessment | Accuracy | Acc: 99.68% |
| [34] | ECG-MAE (Hu et al., 2023) | PTB-XL (12-lead ECG) | Multi-label | Self-supervised masked autoencoder | Label-efficient ECG representation learning | Macro-AUC | Macro-AUC: 0.9474 |
| [35] | DDNN AF Detection (Cai et al., 2020) | 12-lead ECG | AF classification | Dense convolutional neural network | Large-scale AF classification | Accuracy, Sensitivity | Acc: 99.35%, Sens: 99.19% |
| Proposed | Proposed TimesNet-Based Framework | PTB-XL (12-lead ECG) | Multi-label | FFT-based periodicity modeling + multi-scale TimesBlocks | Joint rhythm-aware and morphology-aware temporal representation learning | Mean AUC, | Mean AUC: 0.96, |
Table 5 compares state-of-the-art approaches to ECG classification and situates the proposed TimesNet-based framework within the spectrum of deep learning and machine learning techniques. Prior ECG classification works employ a range of such techniques, including transformer networks (e.g., MSGformer), CNNs (e.g., the hierarchical multi-scale CNN), RNNs (e.g., DeepMI), and self-supervised learning (e.g., ECG-MAE). Many of them have reached state-of-the-art classification performance on various datasets, in many cases even for challenging multi-class classification tasks. For example, very high classification accuracies have recently been reported for binary classification tasks, as well as for tasks with a limited number of classes, using multi-scale temporal and morphological feature extraction. Self-supervised pretraining on large-scale datasets has also been shown to be very effective for learning ECG representations that support classification with limited labeled data on the challenging PTB-XL dataset.
Although significant progress has been achieved on various ECG classification tasks, direct comparison between studies remains difficult because of large differences in data construction, lead settings, class definitions, data preprocessing and augmentation strategies, evaluation metrics, and classification settings. Some studies focused on binary arrhythmia detection or atrial fibrillation screening, while others tackled specific clinical scenarios such as distinguishing healthy subjects from those with serious cardiac diseases. In this work, we aimed to tackle a more challenging multi-label classification task and employed the large-scale PTB-XL database with patient-level separation to improve the quality of the comparison. Unlike existing methods, our method incorporates periodicity-aware temporal modeling to leverage time-sensitive features from ECG signals within the TimesNet framework. Our approach, built upon FFT-based period extraction and multi-scale TimesBlocks, captures both rhythm-related temporal relationships and morphology-related waveform features within multi-lead ECG recordings. Through an ablation study, we demonstrate the effectiveness of frequency-aware temporal decomposition and multi-scale feature learning for ECG classification. While high classification accuracies have been reported in recent works under constrained binary or limited multi-class settings, our approach achieves competitive performance on a more challenging multi-label classification task on ECG recordings that better reflect clinical scenarios. Beyond classification performance, our framework incorporates periodicity-aware temporal structure learning into deep models and enables better interpretation for ECG-based health monitoring via both latent-space and gradient-based analysis methods.
While deep ECG classification methods have achieved state-of-the-art results on several tasks in electrocardiogram classification, current methods have several limitations. First, existing studies typically evaluate on binary or a small number of multi-class tasks, on relatively small datasets, or on very specific clinical questions. Additionally, the lead configurations, preprocessing steps, class definitions, and evaluation methodologies often vary between studies, hindering comparison. More importantly, most current methods do not provide insights into feature importance, provide insufficient evidence of separability in the learned latent space, and lack an assessment of generalization ability in the more challenging multi-label ECG classification setting exemplified by the PTB-XL dataset.
Reviewers
A summary table describing the PTB-XL dataset is required, including the total number of ECG recordings and the class distribution in the five-class setting. Additionally, the notion of “simplified TimesNet” is repeatedly mentioned but never formally defined, nor is its difference from the original architecture clearly explained.
Authors reply
We have added the table below to the dataset section of the updated manuscript.
Table 1 Summary of the PTB-XL dataset characteristics and the customized five-class ECG classification setting used in this study.
| Category | Description |
|---|---|
| Dataset Name | PTB-XL Electrocardiography Dataset |
| Source | PhysioNet / PTB-XL |
| Total ECG Recordings | 21,837 clinical ECG recordings |
| Number of Patients | 18,885 patients |
| ECG Type | 12-lead diagnostic ECG |
| Recording Duration | 10-second ECG recordings |
| Sampling Frequency | 500 Hz (100 Hz downsampled version also available) |
| Annotation Type | Expert cardiologist annotations based on SCP-ECG statements |
| Classification Setting Used in This Study | Customized five-class setting |
| Selected Classes | NORM, AFIB, MI, PVC, STTC |
| Label Nature | Multi-label diagnostic and arrhythmia annotations |
| Preprocessing | Signal normalization and label-specific grouping applied before training |
| Dataset Characteristics | Large-scale heterogeneous ECG dataset |
Reviewers
From a methodological standpoint, the definition should be presented as a data representation rather than an equation, indicating that each preprocessed ECG sample i belongs to a dataset with leads and time samples. The term “SCP codes” (row 138) requires clarification. Terminology such as “system,” “procedure,” and “framework” is used interchangeably and should be standardized throughout the manuscript.
Author’s reply
We have updated the entire modeling and architecture section, which describes this information.
Model architecture
The proposed model, shown in Figure 2, adopts an ECG-specific TimesNet-based architecture designed to capture both periodic and morphological patterns in multi-lead ECG signals. The model processes a batch of ECG recordings and transforms the raw signals into a latent representation that enables effective temporal feature learning through stacked TimesBlocks. The input ECG tensor has dimensions \(B\), \(C\), and \(T\), where \(B\) is the batch size, \(C\) is the number of ECG leads, and \(T\) is the temporal sequence length.
The linear projection layer projects the raw ECG signal at each time step \(t\) into a low-dimensional feature space using a learnable projection matrix and a bias term; the latent embedding dimension \(D\) is set to 128. The notation for the embedded sequence and the remainder of the architecture is summarized in Table 2.
| Symbol | Meaning |
| \(B\) | Batch size |
| \(C\) | Number of ECG leads/channels |
| \(T\) | Number of temporal samples (time steps) |
| \(D\) | Latent embedding dimension |
| \(L\) | Number of stacked TimesBlocks |
| \(M\) | Number of target ECG classes |
| \(r\) | Index of detected dominant temporal period/frequency |
| \(q\) | Convolution kernel-size index |
| \(m\) | Target class index |
| \(p_r\) | Temporal period corresponding to frequency \(f_r\) |
| PTBXL\(_{train}\) | PTB-XL training dataset split |
| PTBXL\(_{val}\) | PTB-XL validation dataset split |
| PTBXL\(_{test}\) | PTB-XL independent test dataset split |
| pos_support | Number of positive samples belonging to a target class |
| AUC\(_{\mathrm{CI\_low}}\) | Lower bound of the confidence interval for AUC |
| AUC\(_{\mathrm{CI\_high}}\) | Upper bound of the confidence interval for AUC |
Table 2. Mathematical symbols, tensor dimensions, and indexing notation used in the proposed ECG-specific TimesNet architecture
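As a hedged reading of the input representation and projection step described before Table 2, the two definitions could be written as below. The names \(X\), \(x_t\), \(W\), \(b\), and \(z_t\) are illustrative assumptions, not the manuscript's own notation; only \(B\), \(C\), \(T\), and \(D\) come from Table 2.

```latex
\[
X \in \mathbb{R}^{B \times C \times T}, \qquad
z_t = W x_t + b, \quad
x_t \in \mathbb{R}^{C},\;
W \in \mathbb{R}^{D \times C},\;
b \in \mathbb{R}^{D},\;
D = 128 .
\]
```

Under this reading, each of the \(T\) time steps of a recording is mapped from its \(C\) lead values to a \(D\)-dimensional embedding.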
A learnable positional embedding is added to the projected features so that temporal ordering information is preserved.
Positional encoding lets the model relate each time step to its surrounding context.
Because ECG signals are quasi-periodic, following the timing of heartbeats, the model determines the dominant temporal periodicities with a Fast Fourier Transform (FFT).
First, the mean feature representation across latent channels is computed.
The frequency spectrum of this averaged sequence is then obtained with the FFT.
The principal frequencies \(f_r\) are selected from the amplitude spectrum, and their corresponding temporal periods \(p_r\) are computed.
These detected periods guide the temporal reshaping of the feature sequence within the TimesBlock.
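The period-detection steps above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `dominant_periods`, the parameter `k`, and the integer-division rule for the period are assumptions (the last follows the original TimesNet convention), and a naive DFT stands in for a library FFT so that the sketch needs only the standard library.

```python
import cmath
import math


def dominant_periods(features, k=2):
    """Return [(frequency_index, period_in_samples), ...] for the k strongest frequencies.

    `features` is a list of T time steps, each a list of D latent values
    (hypothetical layout chosen for this sketch).
    """
    T = len(features)
    # Step 1: average over the latent channels, one value per time step.
    mean_signal = [sum(step) / len(step) for step in features]

    # Step 2: amplitude spectrum via a naive DFT. Index 0 (the DC term) is
    # skipped because it carries no periodicity information.
    amplitudes = []
    for f in range(1, T // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * f * t / T)
            for t, x in enumerate(mean_signal)
        )
        amplitudes.append((abs(coeff), f))

    # Step 3: keep the k frequencies with the largest amplitude.
    top = sorted(amplitudes, reverse=True)[:k]

    # Step 4: convert each frequency index f_r to a period p_r = T // f_r.
    return [(f, T // f) for _, f in top]
```

In practice the naive DFT would be replaced by a real-input FFT routine; the selection and period conversion stay the same.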
The core architecture consists of four stacked TimesBlocks (\(L = 4\)) designed to learn representations across different periodic structures. For each detected period \(p_r\), the sequence is divided into period-aligned segments.
This reshaping helps the model capture patterns within individual heartbeats and across multiple cardiac cycles. An Inception-style parallel convolution module then extracts features from the reshaped representation using multiple Conv1D layers with different kernel sizes, indexed by \(q\). The different kernel sizes capture temporal dependencies at different time scales.
The resulting feature maps are concatenated.
The output is then reshaped back to the original temporal format and added to the block input through a residual connection.
This residual formulation stabilizes training, because each block retains the representation it receives while adding new temporal features. After the four stacked TimesBlocks, the final feature representation is obtained by global average pooling over the entire temporal dimension.
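The period-aligned reshaping, the reshape back, and the residual connection can be sketched for a single channel as below. This is an illustration under stated assumptions: the helper names are hypothetical, zero-padding to a multiple of the period is assumed (as in the original TimesNet), and `transform` is a placeholder for the Inception-style convolution module, which is not implemented here.

```python
def fold_by_period(sequence, period):
    """Zero-pad a 1D sequence to a multiple of `period` and split it into rows.

    Each row holds one cycle, so the same phase of successive cycles lines up
    in one column.
    """
    pad = (-len(sequence)) % period
    padded = list(sequence) + [0.0] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def unfold(grid, length):
    """Flatten the period-aligned grid back to 1D and drop the padding."""
    flat = [value for row in grid for value in row]
    return flat[:length]


def residual_update(sequence, period, transform):
    """Fold, apply `transform` to the folded grid, unfold, then add the input."""
    grid = fold_by_period(sequence, period)
    processed = transform(grid)  # placeholder for the convolution module
    restored = unfold(processed, len(sequence))
    return [x + y for x, y in zip(sequence, restored)]
```

Because the block output is added to its input, a transform that returns zeros leaves the sequence unchanged, which is the property the residual formulation relies on.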
The resulting pooled feature vector is then passed to a fully connected classification layer.
The classification layer has a learnable weight matrix and bias vector, and a sigmoid activation is applied to its output for multi-label prediction. The model outputs probabilities for the \(M = 5\) target ECG classes: NORM, AFIB, MI, PVC, and STTC. Binary cross-entropy loss is optimized independently for each class to support one-vs-rest probability estimation within the multi-label framework.
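The pooling, classification, and loss steps above can be sketched as follows. The function names and the data layout (a list of time steps, each a list of channel values) are assumptions made for this illustration; the weights would be learned in the real model, and averaging the per-class losses is one common reduction, not necessarily the one the authors used.

```python
import math

CLASSES = ["NORM", "AFIB", "MI", "PVC", "STTC"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def global_average_pool(features):
    """Average a T x D feature sequence over time, giving a D-vector."""
    T = len(features)
    return [sum(step[d] for step in features) / T for d in range(len(features[0]))]


def classify(pooled, weights, bias):
    """One independent sigmoid probability per class (multi-label output)."""
    return [
        sigmoid(sum(w * h for w, h in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]


def bce_loss(probs, targets, eps=1e-7):
    """Binary cross-entropy computed per class, then averaged over classes."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)
```

Because each class has its own sigmoid, a recording can be positive for several classes at once, for example both AFIB and STTC, which a softmax output would not allow.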
Reviewers
All mathematical symbols and subscripts used in the Model Architecture section must be explicitly defined. In particular, the variable is undefined, and the index is ambiguously used to denote both class labels and kernel size; distinct symbols should be adopted. Figure 2 does not define the variable , and several terms (e.g., “independent test,” “pos_support,” “PTBXL_test”, “auc_ci_PTBXL_test.csv”) are unclear or inconsistently used, including in Table 1. Class labels should be reported explicitly in the first column of the table.
Author’s reply
In the updated manuscript, we have added a definition table (Table 2) so that readers can look up each symbol before reading the equations.
Reviewers
References to preprints (e.g., Reference [21]) should be avoided. All acronyms should be written in extended form at first occurrence. Terminology must be homogenized throughout the manuscript (e.g., “Three-Class” vs. “Three Class,” “GradInput” vs. “GradxInput”, “TimesBlocks” vs. “TimesNet Blocks”, etc.).
Author’s reply
We respect the reviewer’s comment on the use of preprint references. Although the work in question is a preprint, it is one of the most influential works in the time-series deep learning field. It has been referenced by many peer-reviewed publications from major publishers, such as IEEE, Springer, Elsevier, and MDPI, and had received over 3,800 citations as of the manuscript’s revision. The reference is therefore retained to give the proper methodological background for developing and adapting the TimesNet framework to time-series data.
In addition, the terminology was systematically standardized across the revised manuscript to improve consistency and readability. Specifically, naming inconsistencies such as “Three-Class” versus “Three Class,” “GradInput” versus “Grad×Input,” and “TimesBlocks” versus “TimesNet Blocks” were homogenized throughout the text, figures, captions, and tables.
Reviewers
Finally, the quality of several figures is insufficient. Figure 7 is scarcely readable and should be improved, for instance by using colored markers to highlight class similarities; similar issues apply to Figures 8 and 9. Several figures (e.g., Figures 10 and 11) lack axis labels and legends.
Reviewers
A simplified TimesNet-based framework is proposed for the classification of multi-lead ECG signals on the PTB-XL dataset under both three-class and five-class settings corresponding to different cardiac arrhythmias and ECG abnormalities.
Although the paper addresses an interesting problem, the level of novelty appears limited, considering that both FFT-based approaches for ECG analysis and TimesNet architectures have been extensively explored in the literature. Moreover, the manuscript lacks clarity, which makes it difficult to properly assess the contribution. In particular, the analysis of how increasing the number of classes affects feature separability, decision boundaries, and classifier confidence is rather superficial and does not provide substantial new insights.
Author reply
We thank the reviewer for this important observation. We agree with the reviewer’s point of view and have updated the manuscript accordingly.
Both FFT-based temporal analysis and TimesNet-based architectures have been explored in the literature for time-series learning and ECG analysis. Our intention, however, was not to design an entirely new deep learning architecture from scratch. Instead, our goal was to explore whether periodicity-aware temporal modeling can be adapted, generalized, and interpreted in a clinically relevant multi-label ECG classification setting using PTB-XL.
The revised manuscript clarifies this contribution in a new subsection, "Novelties and Main Contributions", in the Introduction. It describes the methodological contribution of this work in detail: (i) the systematic adaptation of TimesNet to multi-label classification of 12-lead ECG, (ii) the use of FFT-guided periodic temporal decomposition for rhythm-aware representation learning, (iii) a multi-scale TimesBlock architecture designed to jointly capture morphology-related and rhythm-related information from ECG signals, and (iv) an analysis of feature separability as classification difficulty increases from the three-class to the five-class setting.
Reviewers
The manuscript would benefit from significant revision. The motivation of the study and the specific research gap it aims to address are not clearly articulated. Likewise, it is not convincingly demonstrated why the proposed “simplified” approach is preferable to existing methods, including the original TimesNet architecture. A quantitative comparison with relevant works using the PTB-XL dataset is essential to support the claims.
Author reply
We would like to thank the reviewer for the advice. The manuscript was heavily revised to clarify the motivation for the study, the research gap that the proposed approach addresses, and the methodological contribution of the new approach compared to TimesNet. We replaced the term “simplified TimesNet” with “ECG-specific TimesNet-based framework” wherever it appears in the paper. In addition, a “Novelties and Contributions” section was added, and the quantitative comparison table was extended with related PTB-XL-based ECG classification studies.
In the Introduction section we have added the following text:
While deep learning based ECG analysis has achieved state-of-the-art results, many existing methods are limited to binary tasks or a small number of classes, and lack an in-depth study of rhythm-aware temporal periodicity in multi-label ECG classification. Although existing TimesNet-based works demonstrate competitive performance in general time-series learning tasks, their application to periodicity-aware multi-label ECG representation learning has not been thoroughly explored. In particular, how networks maintain feature separability, prevent class overlap, and maintain confidence as classification tasks increase in complexity has not been sufficiently studied within an ECG-specific TimesNet framework.
The novelties and contributions of this work are emphasized in the statement below.
- In this paper, a periodicity-aware TimesNet-based framework is proposed for multi-label 12-lead ECG classification. With this architecture, rhythm-related temporal dependencies and morphology-related ECG characteristics are learned simultaneously.
- For ECG analysis, we adapted the FFT-based approach of temporal period extraction of time series to learn a representation that also captures periodic cardiac dynamics.
- To capture both short-term morphological variations and long-range temporal relationships in ECG time series, we incorporated multi-scale TimesBlocks with different convolutional kernel sizes.
- The proposed framework is evaluated on a challenging multi-label PTB-XL benchmark with patient-level separation and standardized evaluation metrics, to avoid information leakage and increase the credibility of the evaluation.
- To better understand the behaviour of the proposed ECG classification method, a component-wise ablation analysis is provided. This analysis investigates the impact of FFT-based period extraction, multi-scale temporal modelling, positional embeddings, and data augmentation.
- To provide better insight into what the ECG classification model learns, the paper includes extensive visualization and interpretability results, including ROC analysis, confusion matrices, UMAP and PCA embeddings, probability landscapes, and gradient-based attributions, which show how features in the ECG signal space are separated and how different physiological classes are represented.
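The patient-level separation mentioned in the contribution list can be illustrated with a short sketch. It is not the authors' splitting code: the function name, the split fractions, and the `(record_id, patient_id)` input layout are assumptions chosen for the example. The point it demonstrates is that patients, not recordings, are assigned to splits, so no patient contributes recordings to more than one split.

```python
import random
from collections import defaultdict


def patient_level_split(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Split record IDs into train/val/test so that no patient spans two splits.

    `records` is a list of (record_id, patient_id) pairs. The fractions are
    applied to the number of patients, not the number of recordings.
    """
    by_patient = defaultdict(list)
    for record_id, patient_id in records:
        by_patient[patient_id].append(record_id)

    # Shuffle patients deterministically, then cut the list into three parts.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = int(len(patients) * test_frac)
    n_val = int(len(patients) * val_frac)
    groups = {
        "test": patients[:n_test],
        "val": patients[n_test:n_test + n_val],
        "train": patients[n_test + n_val:],
    }
    return {
        name: [r for p in ids for r in by_patient[p]]
        for name, ids in groups.items()
    }
```

A random split over recordings would instead allow two ECGs from the same patient to land in both training and test data, which is the leakage the reviewers asked about.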
Reviewers
Several statements remain insufficiently justified, such as: “Our results demonstrate the importance of designing models that are aware of the underlying periodicity of the signal and the classes”, “TimesNet could be a valuable tool for the design of scalable, interpretable, and clinically relevant cardiac diagnostic systems”, and “Deep learning algorithms often fail to capture the highly nonlinear and periodic temporal patterns inherent in ECG signals”. These claims should be supported by clearer arguments and empirical evidence.
Author reply
We would like to thank the reviewer for this comment. Some statements in the original paper were too general and required further evidence. These statements have been modified and tempered throughout the paper.
The revised claims are supported by our ablation analysis of periodicity-aware temporal modelling, which shows that removing the FFT-based period extraction worsens classification performance compared to the full model. Insights into interpretability and physiological representation learning are provided through UMAP/PCA representations, probability landscapes, and gradient-based attribution maps. Finally, the original claim regarding the limitations of deep learning has been refined to state that conventional models lack periodicity awareness unless they are specifically designed with periodicity-aware mechanisms.
We have updated the following sections:
Results, Ablation, and Conclusions, together with "Why the Three-Class Configuration Produced Better Results than the Five-Class Setup: Evidence from Visualization and Model Interpretability" and "Gradient-Based Model Interpretability and Visualization Analysis and Insights".
Reviewers
The Literature Survey section lists several related works, but provides only a limited discussion of the proposed approaches. Moreover, essential details about the datasets used in the cited studies—such as size, class distribution, and experimental setup—are often missing, and the limitations of the reviewed methods are not addressed at all. As a result, it is difficult to determine under which conditions the proposed method outperforms existing approaches. It would also be useful to clarify whether other studies have been conducted using the PTB-XL dataset, and to report the results achieved in those works, including the number of classification tasks addressed.
Authors reply
Literature Survey
Table 5 summarizes the methodologies of the referenced state-of-the-art studies, emphasizing their key approaches and performance metrics. In [22], the authors proposed MTDL-Net, a transformer-based deep learning model for ECG heartbeat classification. It combines masked attention embedding, which extracts the morphological characteristics of ECG signals, with a temporal feature enhancement mechanism that analyzes heartbeat dynamics, allowing it to learn spatiotemporal ECG features. The reported results are an average accuracy of 95.6%, a maximum specificity of 98.7%, and a maximum recall of 96.4%. The authors of [23] proposed the Cascaded Thinning Upscale–Downscale Representation (CTUDR) approach for EEG signal smoothing. The approach first converts EEG signals to binary images and then applies morphological thinning to achieve smoothing, showing that two-stage upscale–downscale filtering can be realized within a single image representation. Signals smoothed with CTUDR gave the best smoothing performance and improved the widely adopted EEGNet classifier, with accuracy and F-measure values of 0.7640 and 0.7607, respectively. In [24], an inter-patient arrhythmia classification approach based on ensemble machine learning uses two ECG leads to classify five classes: LBBB, RBBB, PVC, PAC, and Normal. The pipeline includes ECG signal processing, feature extraction, feature selection, and hyperparameter tuning, and achieved an average accuracy of 87%, a sensitivity of 87.4%, a precision of 88.4%, and an F1-score of 87%, which the authors report as better than the existing literature.
Morphology–Rhythm Contrast (MRC) [25] is a contrastive learning framework for multi-lead ECG representation learning that applies morphology-based and rhythm-based augmentations in a triple-branch network. Linear probing achieves strong performance across the PTB-XL, CPSC, and Chapman datasets, with an average AUROC of 0.9889 and AUPRC of 0.9694, outperforming random initialization and a range of supervised baselines. The authors of [26] proposed the Multi-Scale Grid Transformer (MSGformer), a transformer-based model for ECG arrhythmia classification. The model combines multi-lead spatial feature fusion with a multi-scale grid attention mechanism to learn the temporal and morphological patterns of ECG signals. It obtains an F1-score of about 0.86 on the CPSC-2018 dataset and 99.28% accuracy, 97.13% sensitivity, and 97.87% positive predictive value (PPV) on the MIT-BIH dataset, which the authors report as better than other state-of-the-art approaches. In [27], the authors present a hybrid deep learning approach for ECG classification that combines a 1D-CNN with squeeze-and-excitation attention for adaptive multi-scale feature learning. Training uses focal loss, L2 regularization, and ensemble mixed-precision training. The reported accuracy is 99.48% on MIT-BIH, 99.83% on PTB, and 99.64% on the combined dataset, with F1-scores reaching up to 1.00. In [28], the authors investigated ECG classification with Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN).
They used recordings from the MIT-BIH database to train the network to classify ventricular and supraventricular ectopic beats, reaching classification accuracies of 93.63% and 95.57%, respectively, at a low sampling frequency of 114 Hz. By using simple features and reducing data complexity, the approach is presented as an efficient and scalable method for ECG classification. In [29], the authors proposed Y-Net, a deep learning-based approach for robust ECG segmentation that can learn from both single-lead and multi-lead ECG signals. Y-Net uses a dual-branch structure and a two-stage training scheme to handle different rhythm types. Evaluated on the LUDB and RDB datasets, it reached F1 scores of 99.60% and 99.03% in intra-dataset and inter-dataset evaluation, respectively. AF detection based on morphological features extracted from the Y-Net segmentation results reached an AUC of 0.983, which the authors take as evidence that the method is effective, interpretable, and clinically practical. In [30], the authors proposed DeepMI, an end-to-end deep learning framework that detects myocardial infarction (MI) and its occurrence time (Acute, Recent, Old) from 12-lead ECG signals. DeepMI adopts multi-level fusion (data, feature, decision) together with pre-trained networks and recurrent layers to extract temporal information from ECG signals. It was validated on 17,381 patients and achieved AUROCs of 96.7% (Normal), 82.9% (Acute), 68.6% (Recent), and 73.8% (Old) for multi-lead ECG-based analysis and temporal classification of MI.
In [31], the authors proposed a machine learning approach to diagnose Left Ventricular Hypertrophy (LVH) from multi-lead ECG signals. Clinical features were extracted using the continuous wavelet transform and peak detection of the R and S waves. Several classifiers were compared, including kNN, decision trees, Naive Bayes, support vector machines, and neural networks. Neural networks achieved the highest accuracy, 97.8% on average, and the authors present the method as an efficient and cost-effective way to diagnose cardiac problems. In [32], a deep learning method for multi-lead arrhythmia classification is developed that uses a CNN to extract mixed-scale hierarchical features from ECG signals and a Lead Encoder Attention (LEA) mechanism to exploit the morphological and temporal information of the ECG. Experiments on the MIT-BIH and CCDD datasets gave a highest accuracy of 99.5% on MIT-BIH and an average accuracy of 88.5%, with good cross-dataset generalization. The authors of [33] introduced a two-stage deep CNN model for detecting hypertension and assessing the risk of hypertensive patients from multi-lead ECG signals. The first stage detects hypertension from the ECG signal, while the second stage classifies detected hypertensive patients as low or high risk. Experiments on public benchmark datasets yielded accuracies of 99.68% and 90.98% in the detection and risk assessment stages, respectively. The authors suggest that the approach is suitable for clinical practice and can be deployed on cloud infrastructure.
The authors of [34] note that deep learning models can achieve high accuracy in ECG analysis but usually need a large amount of labeled data, which is expensive and hard to obtain. They proposed a self-supervised generative pretraining approach called ECG-MAE, which learns a spatiotemporal ECG representation by restoring masked regions of 12-lead ECG across time and leads. When fine-tuned for multi-label classification, it outperforms previous models, achieving a macro-AUC of 0.9474 with higher label efficiency and a better ability to diagnose rare cardiac conditions. In [35], the authors designed and evaluated a one-dimensional deep densely connected neural network (DDNN) to classify atrial fibrillation (AF) from 12-lead ECG signals. Experiments on 16,557 recordings yielded an accuracy of 99.35%, a sensitivity of 99.19%, and a specificity of 99.44%. The authors consider the model promising for clinical diagnostics and for wearable AF screening devices.
Table 5 Comparison of the Proposed Framework with Representative PTB-XL and ECG Classification Approaches.
| Ref | Study / Model | Dataset / ECG Type | Classification Setting | Methodology | Key Contribution | Reported Metric | Performance |
| [22] | MTDL-Net (Han et al., 2023) | Multi-lead ECG | Multi-class | Transformer + temporal enhancement | Joint morphological and temporal feature learning | Accuracy, Recall | Acc: 95.6%, Recall: 96.4% |
| [24] | ML Arrhythmia Classification (Zakaria et al., 2024) | 2-lead ECG | Arrhythmia classification | Ensemble ML + feature engineering | Inter-patient morphological classification | Accuracy, F1-score | Acc: 87%, F1: 87% |
| [25] | Morphology–Rhythm Contrast (Liu et al., 2024) | Multi-lead ECG | Multi-label | Triple-branch contrastive learning | Joint morphology–rhythm representation learning | AUROC, AUPRC | AUROC: 0.9889, AUPRC: 0.9694 |
| [26] | MSGformer (Ji et al., 2024) | Multi-lead ECG | Multi-class | Multi-scale transformer + grid attention | Multi-scale spatial-temporal feature extraction | Accuracy, F1-score | Acc: 99.28% (MIT-BIH), F1 ≈ 0.86 (CPSC-2018) |
| [27] | 1D-CNN + Attention (Guhdar et al., 2025) | Multi-lead ECG | Multi-class | CNN + attention + focal loss | Attention-based ECG representation learning | Accuracy, F1-score | Acc: 99.48–99.83%, F1 ≈ 1.00 |
| [28] | Deep Learning ECG (Mathews et al., 2018) | Single-lead ECG | Beat classification | RBM + DBN | Early deep learning ECG classification | Accuracy | Acc: 93.63–95.57% |
| [29] | Y-Net-ECG (Liu et al., 2025) | Multi-lead ECG | AF detection | Dual-branch segmentation framework | Interpretable ECG segmentation and AF detection | F1-score, AUC | F1: 99.60%, AUC: 0.983 |
| [30] | DeepMI (Tadesse et al., 2021) | 12-lead ECG | MI classification | Multi-level fusion + RNN | Temporal myocardial infarction analysis | AUROC | AUROC: 96.7% (Normal) |
| [31] | LVH Detection (Jothiramalingam et al., 2021) | Multi-lead ECG | Clinical diagnosis | ML + wavelet + classifiers | Clinical feature-based cardiac diagnosis | Accuracy | Acc: 97.8% |
| [32] | Multi-scale CNN (Zhou & Fang, 2025) | Multi-lead ECG | Multi-class | Hierarchical CNN + LEA attention | Multi-scale morphology-aware learning | Accuracy | Acc: 99.5% |
| [33] | Two-stage CNN (Jain et al., 2020) | Multi-lead ECG | Hypertension risk classification | CNN-based risk stratification | ECG-based hypertension risk assessment | Accuracy | Acc: 99.68% |
| [34] | ECG-MAE (Hu et al., 2023) | PTB-XL (12-lead ECG) | Multi-label | Self-supervised masked autoencoder | Label-efficient ECG representation learning | Macro-AUC | Macro-AUC: 0.9474 |
| [35] | DDNN AF Detection (Cai et al., 2020) | 12-lead ECG | AF classification | Dense convolutional neural network | Large-scale AF classification | Accuracy, Sensitivity | Acc: 99.35%, Sens: 99.19% |
| Proposed | Proposed TimesNet-Based Framework | PTB-XL (12-lead ECG) | Multi-label | FFT-based periodicity modeling + multi-scale TimesBlocks | Joint rhythm-aware and morphology-aware temporal representation learning | Mean AUC | Mean AUC: 0.96 |
Table 5 compares state-of-the-art approaches to ECG classification and situates the proposed TimesNet-based framework among deep learning and machine learning techniques. Prior ECG classification works employ a range of such techniques, including transformer networks (e.g., MSGformer), CNNs (e.g., the hierarchical multi-scale CNN), RNNs (e.g., DeepMI), and self-supervised learning (e.g., ECG-MAE). Many of them report high classification performance on various datasets, in some cases for challenging multi-class tasks. For example, very high accuracies have recently been reported for binary classification tasks and for tasks with a limited number of labels using multi-scale temporal and morphological feature extraction. Self-supervised pretraining on large-scale datasets has also been shown to yield effective ECG representations and label-efficient classification on the challenging PTB-XL dataset.
Although significant progress has been achieved on various ECG classification tasks, direct comparison between studies remains difficult because of large differences in dataset construction, lead settings, class definitions, preprocessing and augmentation strategies, evaluation metrics, and classification settings. Some studies focused on binary arrhythmia detection or atrial fibrillation screening, while others tackled specific clinical scenarios such as distinguishing healthy subjects from those with serious cardiac diseases. In this work, we tackle a more challenging multi-label classification task and use the large-scale PTB-XL database with patient-level separation to improve the quality of the comparison. Unlike the methods above, our method incorporates periodicity-aware temporal modeling within the TimesNet framework. Built upon FFT-based period extraction and multi-scale TimesBlocks, it captures both rhythm-related temporal relationships and morphology-related waveform features in multi-lead ECG recordings. Through an ablation study, we demonstrate the contribution of frequency-aware temporal decomposition and multi-scale feature learning to ECG classification. While high classification accuracies have been reported in recent works under constrained binary or limited multi-class settings, our approach achieves competitive performance on a more challenging multi-label task on ECG recordings that better reflect clinical scenarios. Beyond classification performance, our framework incorporates periodicity-aware temporal structure learning into deep models and supports interpretation of ECG-based predictions through both latent-space and gradient-based analysis methods.
While deep ECG classification methods have achieved state-of-the-art results on several tasks in electrocardiogram classification, current methods have several limitations. First, existing studies typically evaluate on binary or a small number of multi-class tasks, on relatively small datasets, or on very specific clinical questions. Additionally, the lead configurations, preprocessing steps, class definitions, and evaluation methodologies often vary between studies, hindering comparison. More importantly, most current methods do not provide insights into feature importance, provide insufficient evidence of separability in the learned latent space, and lack an assessment of generalization ability in the more challenging multi-label ECG classification setting exemplified by the PTB-XL dataset.
Reviewers
A summary table describing the PTB-XL dataset is required, including the total number of ECG recordings and the class distribution in the five-class setting. Additionally, the notion of “simplified TimesNet” is repeatedly mentioned but never formally defined, nor is its difference from the original architecture clearly explained.
Authors reply
We have added the table below to the dataset section of the updated manuscript.
Table 1 Summary of the PTB-XL dataset characteristics and the customized five-class ECG classification setting used in this study.
| Category | Description |
| Dataset Name | PTB-XL Electrocardiography Dataset |
| Source | PhysioNet / PTB-XL |
| Total ECG Recordings | 21,837 clinical ECG recordings |
| Number of Patients | 18,885 patients |
| ECG Type | 12-lead diagnostic ECG |
| Recording Duration | 10-second ECG recordings |
| Sampling Frequency | 500 Hz (100 Hz downsampled version also available) |
| Annotation Type | Expert cardiologist annotations based on SCP-ECG statements |
| Classification Setting Used in This Study | Customized five-class setting |
| Selected Classes | NORM, AFIB, MI, PVC, STTC |
| Label Nature | Multi-label diagnostic and arrhythmia annotations |
| Preprocessing | Signal normalization and label-specific grouping applied before training |
| Dataset Characteristics | Large-scale heterogeneous ECG dataset |
Reviewers
From a methodological standpoint, the definition should be presented as a data representation rather than an equation, indicating that each preprocessed ECG sample i belongs to a dataset with leads and time samples. The term “SCP codes” (row 138) requires clarification. Terminology such as “system,” “procedure,” and “framework” is used interchangeably and should be standardized throughout the manuscript.
Author’s reply
We have updated the entire modeling and architecture section, which now provides this information.
Model architecture
The proposed model, shown in Figure 2, adopts an ECG-specific TimesNet-based architecture designed to capture both periodic and morphological patterns in multi-lead ECG signals. The model processes a batch of ECG recordings and transforms the raw signals into a latent representation that enables effective temporal feature learning through stacked TimesBlocks. The input ECG tensor is represented as \(X \in \mathbb{R}^{B \times C \times T}\), where \(B\) is the batch size, \(C\) represents the number of ECG leads, and \(T\) denotes the temporal sequence length.
The linear projection layer projects the raw ECG signal at each time step \(t\) into a \(D\)-dimensional latent feature space, \(z_t = W x_t + b\), where \(x_t \in \mathbb{R}^{C}\) is the input vector at time step \(t\), \(W \in \mathbb{R}^{D \times C}\) is the learnable projection matrix, \(b \in \mathbb{R}^{D}\) is the bias term, and \(D\) is the latent embedding dimension (set to 128). After projection, the embedded sequence holds one \(D\)-dimensional vector per time step; the notation is summarized in Table 2.
| Symbol | Meaning |
|---|---|
| \(B\) | Batch size |
| \(C\) | Number of ECG leads/channels |
| \(T\) | Number of temporal samples (time steps) |
| \(D\) | Latent embedding dimension |
| \(L\) | Number of stacked TimesBlocks |
| \(M\) | Number of target ECG classes |
| \(r\) | Index of detected dominant temporal period/frequency |
| \(q\) | Convolution kernel-size index |
| \(m\) | Target class index |
| \(p_r\) | Temporal period corresponding to frequency \(f_r\) |
| \(\mathrm{PTBXL}_{\mathrm{train}}\) | PTB-XL training dataset split |
| \(\mathrm{PTBXL}_{\mathrm{val}}\) | PTB-XL validation dataset split |
| \(\mathrm{PTBXL}_{\mathrm{test}}\) | PTB-XL independent test dataset split |
| pos_support | Number of positive samples belonging to a target class |
| \(\mathrm{AUC}_{\mathrm{CI\_low}}\) | Lower bound of the confidence interval for AUC |
| \(\mathrm{AUC}_{\mathrm{CI\_high}}\) | Upper bound of the confidence interval for AUC |
Table 2. Mathematical symbols, tensor dimensions, and indexing notation used in the proposed ECG-specific TimesNet architecture.
A learnable positional embedding is added to the projected features so that temporal ordering information is preserved.
Positional encoding allows the model to use the position of each time step relative to its surrounding context, which matters because ECG signals show quasi-periodic patterns that follow the timing of heartbeats.
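The projection and positional-embedding steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `embed_ecg`, the array names, and the random initial values are hypothetical and do not come from the manuscript, and in a trained model the projection matrix, bias, and positional embedding would be learned rather than sampled.

```python
import numpy as np

def embed_ecg(x, weight, bias, pos_embedding):
    """Project each time step from C leads to D features, then add positions.

    x:             (B, C, T) batch of ECG recordings
    weight:        (D, C) projection matrix (assumed layout)
    bias:          (D,) bias term
    pos_embedding: (T, D) positional embedding, one vector per time step
    returns:       (B, T, D) embedded sequence
    """
    x_time_major = np.transpose(x, (0, 2, 1))   # (B, T, C)
    projected = x_time_major @ weight.T + bias  # (B, T, D)
    return projected + pos_embedding            # broadcast over the batch axis

# Hypothetical sizes: 12 leads and D = 128 as in the text, short T for the demo.
rng = np.random.default_rng(0)
B, C, T, D = 2, 12, 1000, 128
x = rng.standard_normal((B, C, T))
weight = rng.standard_normal((D, C)) * 0.1
bias = np.zeros(D)
pos_embedding = rng.standard_normal((T, D)) * 0.02

embedded = embed_ecg(x, weight, bias, pos_embedding)
print(embedded.shape)
```

Because the same projection is applied at every time step, the positional embedding is the only term in this sketch that distinguishes one time step from another before the TimesBlocks.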
Because quasi-periodic cardiac rhythms are characteristic of ECG signals, the model determines the dominant temporal periodicities with a Fast Fourier Transform (FFT).
The feature representation is first averaged across the latent channels, and the frequency spectrum of the averaged sequence is then obtained with the FFT.
The principal frequencies \(f_r\) are selected from the amplitude spectrum, and their corresponding temporal periods \(p_r\) are computed.
These detected periods guide the temporal reshaping of the feature sequence within each TimesBlock.
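A minimal sketch of FFT-based period detection, following the order of operations described above (average over latent channels, take the spectrum, pick the strongest frequencies, convert them to periods). Several details are assumptions rather than the authors' implementation: the function name `detect_periods`, the averaging of the amplitude spectrum over the batch, the removal of the DC component, and the conversion \(p_r = \lceil T / f_r \rceil\), which is the rule used in the original TimesNet formulation.

```python
import numpy as np

def detect_periods(z, k=3):
    """Return the k dominant frequency indices f_r and their periods p_r.

    z: (B, T, D) embedded sequence.
    """
    T = z.shape[1]
    channel_mean = z.mean(axis=-1)                # (B, T): mean over latent channels
    spectrum = np.fft.rfft(channel_mean, axis=1)  # (B, T // 2 + 1) complex spectrum
    amplitude = np.abs(spectrum).mean(axis=0)     # assumed: average over the batch
    amplitude[0] = 0.0                            # ignore the DC (zero-frequency) bin
    top = np.argsort(amplitude)[::-1][:k]         # indices of the k largest amplitudes
    periods = np.ceil(T / top).astype(int)        # p_r = ceil(T / f_r)
    return top, periods

# Synthetic check: two sinusoids with periods of 100 and 250 samples.
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 100) + 0.5 * np.sin(2 * np.pi * t / 250)
z = np.repeat(signal[None, :, None], 4, axis=2)   # (1, 1000, 4)
freqs, periods = detect_periods(z, k=2)
print(freqs, periods)
```

On this synthetic input the stronger component (10 cycles in 1000 samples) is ranked first, so the detected periods come out ordered by spectral amplitude rather than by length.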
The core architecture consists of four stacked TimesBlocks designed to learn representations across different periodic structures. For each detected period \(p_r\), the sequence is divided into period-aligned segments.
This reshaping helps the model capture patterns both within individual heartbeats and across multiple cardiac cycles. An Inception-style parallel convolution module then extracts features from the reshaped sequence using multiple Conv1D layers with different kernel sizes, indexed by \(q\), so that temporal dependencies are captured at several time scales.
The resulting feature maps are concatenated, and the output is reshaped back to the original temporal format and added to the block input through a residual connection.
This residual formulation stabilizes training because each block retains the previously learned representation while adding new temporal features. After the four stacked TimesBlocks, the final feature representation is obtained through global average pooling over the entire temporal dimension.
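The period-aligned reshaping and the residual update can be illustrated for a single recording as follows. This is a sketch under assumptions: the helper names are hypothetical, zero-padding to a multiple of the period is assumed (the text does not say how incomplete cycles are handled), and `transform` is a placeholder for the Inception-style convolution module, which is not reproduced here.

```python
import numpy as np

def fold_by_period(z, period):
    """Zero-pad (T, D) features to a multiple of `period`, fold to (cycles, period, D)."""
    T, D = z.shape
    n_cycles = -(-T // period)                    # ceiling division
    padded = np.zeros((n_cycles * period, D), dtype=z.dtype)
    padded[:T] = z
    return padded.reshape(n_cycles, period, D)

def unfold_to_sequence(folded, T):
    """Flatten (cycles, period, D) back to (T, D), dropping the padding."""
    n_cycles, period, D = folded.shape
    return folded.reshape(n_cycles * period, D)[:T]

def residual_update(z, period, transform):
    """Apply `transform` to the folded view and add the result back to z."""
    folded = fold_by_period(z, period)
    updated = unfold_to_sequence(transform(folded), z.shape[0])
    return z + updated

# Toy example: 10 time steps, 2 features, period of 4 samples.
z = np.arange(20, dtype=float).reshape(10, 2)
folded = fold_by_period(z, 4)                     # shape (3, 4, 2), last cycle padded
restored = unfold_to_sequence(folded, 10)
doubled = residual_update(z, 4, lambda f: f)      # identity transform gives z + z
```

Folding places samples that sit at the same phase of consecutive cycles next to each other along the first axis, which is what lets a convolution compare the same part of the heartbeat across cycles.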
The resulting pooled feature vector is then passed to a fully connected classification layer with a learnable weight matrix and bias term, followed by the sigmoid activation function \(\sigma\), which is used for multi-label prediction. The model outputs the probabilities for the \(M\) target ECG classes, which include NORM, AFIB, MI, PVC, and STTC. Binary cross-entropy loss was optimized independently for each class to support one-vs-rest probability estimation within the multi-label framework.
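To make the multi-label output concrete, the sketch below applies a sigmoid to each class logit independently and computes the binary cross-entropy per class. The function names, the clipping constant `eps`, and the toy logits and labels are illustrative assumptions and are not taken from the manuscript.

```python
import numpy as np

CLASSES = ["NORM", "AFIB", "MI", "PVC", "STTC"]

def sigmoid(logits):
    """Element-wise sigmoid: each class gets its own independent probability."""
    return 1.0 / (1.0 + np.exp(-logits))

def per_class_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy for each class column separately.

    logits, targets: (N, M) arrays; targets hold 0/1 labels per class.
    returns:         (M,) array with one loss value per class
    """
    probs = np.clip(sigmoid(logits), eps, 1.0 - eps)  # avoid log(0)
    losses = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return losses.mean(axis=0)

# Two toy recordings; the second carries two labels at once (MI and STTC).
logits = np.zeros((2, len(CLASSES)))
targets = np.array([[1, 0, 0, 0, 0],
                    [0, 0, 1, 0, 1]], dtype=float)
loss_per_class = per_class_bce(logits, targets)
```

Unlike a softmax, the sigmoid outputs need not sum to one, so a recording can receive high probability for several classes simultaneously, which is what distinguishes the multi-label setting from a multi-class one.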
Reviewers
All mathematical symbols and subscripts used in the Model Architecture section must be explicitly defined. In particular, the variable is undefined, and the index is ambiguously used to denote both class labels and kernel size; distinct symbols should be adopted. Figure 2 does not define the variable , and several terms (e.g., “independent test,” “pos_support,” “PTBXL_test”, “auc_ci_PTBXL_test.csv”) are unclear or inconsistently used, including in Table 1. Class labels should be reported explicitly in the first column of the table.
Author’s reply
In the updated manuscript, we have added a definition table so that readers can look up each symbol before reading the equations.
Table 2. Mathematical symbols, tensor dimensions, and indexing notation used in the proposed ECG-specific TimesNet architecture.
| Symbol | Meaning |
|---|---|
| \(B\) | Batch size |
| \(C\) | Number of ECG leads/channels |
| \(T\) | Number of temporal samples (time steps) |
| \(D\) | Latent embedding dimension |
| \(L\) | Number of stacked TimesBlocks |
| \(M\) | Number of target ECG classes |
| \(r\) | Index of detected dominant temporal period/frequency |
| \(q\) | Convolution kernel-size index |
| \(m\) | Target class index |
| \(p_r\) | Temporal period corresponding to frequency \(f_r\) |
| \(\mathrm{PTBXL}_{\mathrm{train}}\) | PTB-XL training dataset split |
| \(\mathrm{PTBXL}_{\mathrm{val}}\) | PTB-XL validation dataset split |
| \(\mathrm{PTBXL}_{\mathrm{test}}\) | PTB-XL independent test dataset split |
| pos_support | Number of positive samples belonging to a target class |
| \(\mathrm{AUC}_{\mathrm{CI\_low}}\) | Lower bound of the confidence interval for AUC |
| \(\mathrm{AUC}_{\mathrm{CI\_high}}\) | Upper bound of the confidence interval for AUC |
Reviewers
References to preprints (e.g., Reference [21]) should be avoided. All acronyms should be written in extended form at first occurrence. Terminology must be homogenized throughout the manuscript (e.g., “Three-Class” vs. “Three Class,” “GradInput” vs. “GradxInput”, “TimesBlocks” vs. “TimesNet Blocks”, etc.).
Author’s reply
I respect the reviewer’s comment on the use of preprint references. Although the cited work is a preprint, it is the most influential work so far in the time-series deep learning field. In addition, it has been referenced by many peer-reviewed publications from major publishers, such as IEEE, Springer, Elsevier, and MDPI, and had received over 3,800 citations at the time of the manuscript’s revision. The reference is therefore retained to provide the methodological background for developing and adapting the TimesNet framework for time-series data.
In addition, the terminology was systematically standardized across the revised manuscript to improve consistency and readability. Specifically, naming inconsistencies such as “Three-Class” versus “Three Class,” “GradInput” versus “Grad×Input,” and “TimesBlocks” versus “TimesNet Blocks” were homogenized throughout the text, figures, captions, and tables.
Reviewers
Finally, the quality of several figures is insufficient. Figure 7 is scarcely readable and should be improved, for instance by using colored markers to highlight class similarities; similar issues apply to Figures 8 and 9. Several figures (e.g., Figures 10 and 11) lack axis labels and legends.
Authors Reply
All figures have been redrawn and improved for better clarity and readability for the readers.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have completely addressed all my comments, and I have no further concerns. Therefore, I recommend accepting the paper.
Author Response
The authors have completely addressed all my comments, and I have no further concerns. Therefore, I recommend accepting the paper.
Author’s reply
Thank you for your positive evaluation and valuable feedback throughout the review process. We sincerely appreciate your time and constructive comments, which significantly helped improve the quality and clarity of the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
The paper argues that incorporating periodicity-aware temporal modeling within the TimesNet framework can improve ECG representation learning, particularly in terms of feature separability, reduction of class overlap, and confidence estimation, areas that remain underexplored in ECG-specific applications of TimesNet.
This result is consistent with some of the objectives outlined in the paper; however, in the current state of the art, this is not sufficient, as the practical outcomes of the proposed approach and its application are also essential considerations. The paper does not fully substantiate the claims regarding scalability and interpretability of the proposed ECG analysis framework. In particular, critical feasibility metrics, such as inference latency, computational complexity, and resource utilization, are not provided. As a consequence, it remains unclear whether the proposed method offers a tangible improvement over existing approaches or adequately supports its claim of enabling large-scale automated ECG interpretation and facilitating clinical decision-making in realistic settings, as stated in the manuscript.
While classification performance serves as a primary indicator of model effectiveness in an initial evaluation, it is not sufficient on its own to establish practical viability. A comprehensive assessment should also incorporate additional factors, including computational efficiency, scalability, and deployment constraints, which are essential for real-world clinical implementation.
Moreover, although the paper suggests a gap in the state of the art for multiclass classification on the PTB-XL dataset, this task has already been extensively addressed in the literature, including recent contributions (e.g., https://physionet.org/content/ptb-xl/1.0.3/).
As a minor observation, there are inconsistencies in the labeling, such as “3CLASS” in Fig. 9 versus “3-Class” in Fig. 10, as well as discrepancies like “FET” in Fig. 2 and “GradXInput” versus “GradInput,” which were already noted in the first revision.
Author Response
Reviewers
The paper argues that incorporating periodicity-aware temporal modeling within the TimesNet framework can improve ECG representation learning, particularly in terms of feature separability, reduction of class overlap, and confidence estimation, areas that remain underexplored in ECG-specific applications of TimesNet.
Author’s reply
Thank you for the valuable comment. We agree that ECG classification using the PTB-XL dataset has been extensively investigated in prior studies. The novelty of this work does not lie in PTB-XL classification itself, but rather in investigating how periodicity-aware temporal modeling within an ECG-specific TimesNet framework influences representation learning, feature separability, class overlap reduction, classifier confidence, interpretability, and computational feasibility in multi-label ECG classification settings.
We have updated the manuscript in the following sections:
- Abstract
- Introduction
- Figures 7, 8, 9, 10, 11, and 12
- Computational Complexity and Scalability Analysis (newly added)
- Conclusions
Reviewers
This result is consistent with some of the objectives outlined in the paper; however, in the current state of the art, this is not sufficient, as the practical outcomes of the proposed approach and its application are also essential considerations. The paper does not fully substantiate the claims regarding scalability and interpretability of the proposed ECG analysis framework. In particular, critical feasibility metrics, such as inference latency, computational complexity, and resource utilization, are not provided. As a consequence, it remains unclear whether the proposed method offers a tangible improvement over existing approaches or adequately supports its claim of enabling large-scale automated ECG interpretation and facilitating clinical decision-making in realistic settings, as stated in the manuscript. While classification performance serves as a primary indicator of model effectiveness in an initial evaluation, it is not sufficient on its own to establish practical viability. A comprehensive assessment should also incorporate additional factors, including computational efficiency, scalability, and deployment constraints, which are essential for real-world clinical implementation.
Authors Reply
We have updated the MS with the following text
Computational Complexity and Scalability Analysis
The computational complexity and scalability characteristics of the proposed periodicity-aware TimesNet framework are summarized in Table 7. The framework contains approximately 1.957 million trainable parameters, and computational profiling showed that it requires approximately 13.14 GFLOPs per ECG recording during inference. For a 12-lead ECG signal consisting of 5000 samples, the framework achieved an average inference latency of 5.230 ms per recording on CUDA GPU hardware, corresponding to a throughput of approximately 191.21 ECG recordings per second. The median inference latency was 5.229 ms with a standard deviation of 0.064 ms, indicating stable inference performance. Peak allocated and peak reserved GPU memory during inference were approximately 55.17 MB and 76.00 MB, respectively.
These findings indicate practical potential for scalable offline and cloud-assisted ECG analysis applications. The relatively low inference latency together with moderate memory utilization suggests that the framework can efficiently process large-scale ECG datasets and may support future deployment in real-time clinical decision-support systems and remote cardiac monitoring platforms operating in realistic healthcare environments.
Compared with many recently reported transformer-based and recurrent ECG classification frameworks, which often require substantially higher computational complexity, memory utilization, and inference latency, the proposed periodicity-aware framework maintains relatively efficient computational performance while preserving competitive classification accuracy and interpretable temporal feature representations. Specifically, the framework achieves approximately 13.14 GFLOPs computational complexity, moderate GPU memory utilization of 55.17 MB, and fast inference latency of 5.230 ms per ECG recording, making it suitable for scalable automated ECG analysis applications.
In addition, interpretability of the proposed framework was investigated using UMAP and PCA latent-space visualization, probability landscape analysis, confusion-matrix analysis, and gradient-based attribution mapping. These analyses provide insight into feature separability, class overlap reduction, decision-boundary stability, and physiologically relevant waveform regions influencing model predictions, thereby improving transparency and explainability of the ECG classification process.
Although the proposed framework demonstrated favorable computational efficiency on GPU hardware, additional optimization and validation on resource-constrained edge devices and wearable platforms remain necessary before large-scale real-time deployment in low-power clinical environments. Furthermore, the proposed framework is intended to support automated ECG interpretation and assist clinicians during diagnostic assessment rather than replace expert cardiologist evaluation.
Table 7. Computational complexity and scalability profile of the proposed periodicity-aware TimesNet framework.
| Metric | Value |
|---|---|
| Model | Periodicity-aware TimesNet |
| Input ECG size | 12 leads × 5000 samples |
| Trainable parameters | 1.957 million |
| Computational complexity | 13.14 GFLOPs / ECG recording |
| Average inference latency | 5.230 ms / ECG recording |
| Median inference latency | 5.229 ms / ECG recording |
| Latency standard deviation | 0.064 ms |
| Throughput | 191.21 ECG recordings / second |
| Peak allocated GPU memory | 55.17 MB |
| Peak reserved GPU memory | 76.00 MB |
| Hardware | CUDA GPU |
| Deployment implication | Indicates practical potential for scalable offline and cloud-assisted ECG analysis applications |
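The latency statistics in Table 7 and the link between latency and throughput can be illustrated with a small timing helper. This is a hedged sketch, not the profiling script used by the authors: the function names, the warm-up and repetition counts, and the stand-in workload are assumptions. It times a CPU callable only; when timing GPU inference, the device would additionally need to be synchronized before each timer read, which is omitted here.

```python
import statistics
import time

def profile_latency(run_once, n_warmup=5, n_runs=50):
    """Time repeated calls of `run_once` and summarize per-call latency in ms."""
    for _ in range(n_warmup):          # warm-up calls are not timed
        run_once()
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(timings_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(timings_ms),
        "std_ms": statistics.stdev(timings_ms),
        "throughput_per_s": 1000.0 / mean_ms,   # one recording per call
    }

def throughput_from_latency(mean_ms):
    """Recordings per second when one recording takes `mean_ms` milliseconds."""
    return 1000.0 / mean_ms

# Stand-in workload; a real profile would call the model on one ECG recording.
stats = profile_latency(lambda: sum(range(1000)), n_runs=10)

# Arithmetic check against the reported average latency of 5.230 ms.
reported_throughput = throughput_from_latency(5.230)
print(round(reported_throughput, 1))
```

With the rounded latency of 5.230 ms the reciprocal gives roughly 191.2 recordings per second, which agrees with the reported 191.21 up to rounding of the latency value.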
Reviewers
Moreover, although the paper suggests a gap in the state of the art for multiclass classification on the PTB-XL dataset, this task has already been extensively addressed in the literature, including recent contributions (e.g., https://physionet.org/content/ptb-xl/1.0.3/).
Authors Reply
The novelty of this work does not lie in PTB-XL classification itself, but rather in investigating how periodicity-aware temporal modeling within an ECG-specific TimesNet framework influences representation learning, feature separability, class overlap reduction, classifier confidence, interpretability, and computational feasibility in multi-label ECG classification settings.
Reviewers
As a minor observation, there are inconsistencies in the labeling, such as “3CLASS” in Fig. 9 versus “3-Class” in Fig. 10, as well as discrepancies like “FET” in Fig. 2 and “GradXInput” versus “GradInput,” which were already noted in the first revision.
Authors Reply
We have updated the following figures:
Figure 7. Representative 3D ECG surface visualizations for the Five-Class configuration, showing signal amplitude across the time and lead dimensions for NORM, AFIB, MI, STTC, and PVC classes, highlighting morphological similarities between MI and STTC patterns.
Figure 8. Representative 3D ECG surface visualizations for the Three-Class configuration (NORM, AFIB, PVC), illustrating distinct temporal–morphological patterns across the time–lead plane and demonstrating clearer structural separability between rhythm-based cardiac conditions.
Figure 9. GradInput attribution maps for the Five-Class configuration, illustrating the regions of the ECG signal that most strongly influence model predictions for NORM, AFIB, MI, STTC, and PVC classes. Similar attention patterns between MI and STTC indicate overlapping morphological features.
Figure 10. 3D UMAP visualization of the learned latent feature space. The Three-Class model shows clearer cluster separation, whereas the Five-Class model exhibits greater overlap among pathological classes.
Figure 11. 3D PCA projection of learned feature representations. The Three-Class configuration exhibits more compact clusters, whereas the Five-Class configuration shows increased overlap among classes.
Author Response File:
Author Response.pdf
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
Please clarify what “FET” refers to in Figure 2 (FFT period detection), as it is not yet clearly defined.
Fix 5-Class (row 632), PCC (row 651), idicate, Table 8 (row 769), 5 and 3 class (row 829).
Author Response
Please clarify what “FET” refers to in Figure 2 (FFT period detection), as it is not yet clearly defined.
Fix 5-Class (row 632), PCC (row 651), idicate, Table 8 (row 769), 5 and 3 class (row 829).
We would like to thank the reviewers for their continued support in helping improve the readability of our manuscript. We have updated all the required and suggested changes accordingly.
Author Response File:
Author Response.docx

