Systematic Review

Challenges in the Classification of Cardiac Arrhythmias and Ischemia Using End-to-End Deep Learning and the Electrocardiogram: A Systematic Review

1
Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
2
Ingeniería de Sistemas-Ciencias de la Computación y Matemática Aplicada, Centro de Tecnologia & Centro de Ciências Matemáticas e da Natureza, Ilha do Fundão Campus, Universidad Federal de Río de Janeiro, Río de Janeiro 21941-617, Brazil
3
Instituto Nacional de Salud del Niño, Lima 15083, Peru
*
Author to whom correspondence should be addressed.
Diagnostics 2026, 16(1), 161; https://doi.org/10.3390/diagnostics16010161
Submission received: 19 November 2025 / Revised: 22 December 2025 / Accepted: 29 December 2025 / Published: 4 January 2026
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Abstract

Background: Cardiac arrhythmias and ischemia are increasingly problematic worldwide because of their frequency, as well as the economic burden they confer. Methods: This research presents a systematic literature review (SLR), based on the PRISMA 2020 statement, that looks into the difficulties in their classification using end-to-end deep learning (DL) techniques and the electrocardiogram (ECG) from 2019 to 2025. A total of 121 relevant studies were identified from Scopus, Web of Science, and IEEE Xplore, and an inventory was created, categorized into six facets that researchers apply in DL studies: preprocessing, DL architectures, databases, evaluation metrics, pathologies, and explainability techniques. Results: Fifty-three challenges were reported, divided between end-to-end DL techniques (15), databases (18), pathologies (9), preprocessing (2), explainability (8), and evaluation metrics (1). Some of the complications identified were the complexity of pathological manifestations in the ECG signal, the large number of classes, the use of multiple leads, comorbidity, and the presence of different factors that change the expected patterns. Crucially, this SLR identified 18 new issues: four related to preprocessing, three related to end-to-end DL, one to databases, one to pathologies, four to metrics, and five to explainability. Particularly notable are the limitations of current metrics for assessing explainability and model decision confidence. Conclusions: This study clarifies all these limitations and provides a structured inventory and discussion of them, which can be useful to researchers, clinicians, and developers in enhancing existing techniques and designing new ECG-based end-to-end DL strategies, leading to more robust, generalizable, and reliable solutions.

1. Introduction

Cardiovascular diseases (CVDs) cause the most deaths and disabilities worldwide [1]; CVD-related fatalities rose from 12.3 to 19.4 million between 1990 and 2021, and in the U.S., someone dies of CVD every 34 s, an extremely alarming rate that totals about 2580 deaths per day [2]. CVDs thus place an overwhelming burden on health systems, with estimates showing that CVD deaths will reach 20.5 million in 2025 and 35.6 million in 2050 [3]. Several causes lie behind this high incidence, the most important being hypertension, high low-density lipoprotein cholesterol, and hyperglycemia [4]. Behavioral risk factors such as unhealthy diet and smoking, together with overweight and obesity, which afflict 59% of adults across the globe, worsen the scenario [5,6]. Air pollution with fine particulate matter is among the environmental risk factors contributing most to the disease burden [7]. In addition, a critical shortage of specialists hampers timely diagnosis and appropriate treatment. Some of the most common diseases linked to CVDs are cardiomyopathies, heart failure, coronary heart disease, and arrhythmias [8]. They are all serious conditions and require accurate diagnosis and effective clinical management [9]. This review studies the last two pathologies because of their healthcare relevance and the difficulty of classifying them with DL models.
Cardiac arrhythmias are disturbances of the regular heart rhythm and cover an extensive range of pathologies. Atrial fibrillation (AF) is the most prevalent arrhythmia among adults and a serious cause of morbidity and mortality [10]; it is now considered a global epidemic [11]. From the age of 45 onward, the lifetime risk of AF is approximately one in three to five people. The burden of AF increased from 34 to 59 million cases between 2010 and 2019, with 0.34 million deaths in 2021 [12]. Age is the leading risk factor for AF, with cases rising significantly after age 65 [11]. Ischemic heart disease, on the other hand, is a group of conditions resulting from partial or complete blockage of coronary blood flow, caused by the accumulation of plaques of fatty materials and cholesterol in the arterial walls. Coronary ischemia continues to be the predominant cause of mortality globally [4], with cases increasing from 5.37 million in 1990 to 8.99 million in 2021; furthermore, current projections suggest it will dominate mortality statistics until 2050 [2,3]. Myocardial damage is irreversible if ischemic heart disease is permitted to develop or continue; thus, effective management of underlying risk factors is essential not only to reduce the incidence of myocardial infarction but also to improve long-term prognosis.
Cardiac arrhythmia and ischemia classification (CAIC) based on analysis of the electrocardiogram (ECG) is not straightforward, given the complicated waveform and highly dynamic behavior of the ECG. Classic machine learning techniques require considerable preprocessing and manual feature engineering. As a result, they scale poorly, struggle to capture complex patterns, and are difficult to reproduce. End-to-end DL methods, by contrast, analyze the raw signal directly or with minimal preprocessing and extract features automatically, thereby facilitating automation and the handling of large amounts of data. Diagnosis, early detection, and treatment improve as a result [13]. Research has revealed the potential of DL models in portable devices and telehealth, helping to improve timely diagnostic access in rural regions and making healthcare more equitable [14,15,16]. Accordingly, this review examines end-to-end DL techniques, which have great potential to automate the full classification process.
Among the studies on CAIC using ECG and end-to-end DL, ref. [17] employed an end-to-end CNN–Improved Bidirectional LSTM network for arrhythmia classification with the MIT-BIH Arrhythmia Database and the MIT-BIH Atrial Fibrillation Database, achieving 97.85% accuracy and 97.95% sensitivity. Ref. [18] applied ResNet with an attention mechanism to detect six arrhythmias, reporting an F1-score of 88% and a specificity of 97%. Ref. [19] used a CNN combined with GRU for myocardial infarction detection with the PTB-XL Database, achieving both an accuracy and a sensitivity of 99.1%. Ref. [20] utilized CNN, Bi-LSTM, and Bi-GRU in an end-to-end approach to categorize multiple arrhythmias, obtaining an accuracy of 98.55% and a recall of 0.9831. Finally, ref. [21] developed an end-to-end CNN–Transformer model with dual-view and external attention mechanisms, using the CPSC-2018 database to detect six arrhythmias and ST-segment changes, reaching an F1-score of 0.85 and an accuracy of 0.863.
CAIC through DL techniques and ECG still faces significant challenges, including low data quality, high variability in ECG signals, lack of model explainability, the contamination of the signals with noise and several artifacts, and the underrepresentation of certain pathologies in training datasets [13]. These problems limit the widespread use of end-to-end DL architectures in the healthcare domain, which raises an important question: what difficulties does CAIC encounter when using end-to-end DL? In-depth knowledge of each challenge is key to building novel algorithms or enhancing existing ones to improve performance and implementation, which will ultimately help build trust and user uptake [18].
Since 2019, multiple systematic reviews have examined the use of artificial intelligence for cardiac pathology classification with the ECG; however, these reviews typically address a broad spectrum of techniques, including traditional machine learning. To date, few systematic reviews have provided a comprehensive analysis of end-to-end DL pipelines or highlighted the specific challenges they face in classifying arrhythmias and ischemia using a rigorous methodological approach.
Because studies on the classification of arrhythmias and ischemia with end-to-end DL pipelines reflect difficulties across various aspects and methodologies, a systematic review becomes necessary to integrate these findings, identify patterns, and assess their impact on the models.
Unlike previous systematic literature reviews, this study focuses only on end-to-end DL architectures and the critical systematization of the difficulties they face in classifying arrhythmias and ischemia. By applying this particular perspective to the literature published between 2019 and 2025, our work provides a complementary and more focused contribution than existing reviews.
The primary goal of this review is to ascertain and scrutinize the techno-methodological barriers to the use of end-to-end DL models with ECG. Accordingly, 121 articles published from 2019 to 2025 formed the basis of this systematic literature review (SLR). Clinicians, biomedical companies, and researchers can use the findings to refine current algorithms, implement them into clinical practice, and develop more optimized and reliable medical applications. These findings also hint at new avenues of research.
This research aims (a) to provide an overview of heart functioning as well as cardiac arrhythmias and ischemia with respect to their causes, classification, diagnosis, and aspects of study; (b) to present an inventory of CAIC research with end-to-end DL and ECG with respect to their preprocessing approaches, end-to-end DL models, databases, cardiac pathologies, evaluation metrics, and explainability approaches; and (c) to provide an inventory of challenges in CAIC with end-to-end DL and ECG with respect to those already reported in the literature and those not reported yet.
The remainder of this study is structured into five sections. Section 2 contains a tutorial on cardiac function, as well as the characteristics and patterns that define cardiac arrhythmias and ischemia on the ECG waveforms. Section 3 contains a systematic review of CAIC using end-to-end DL architectures and the ECG. The challenges of utilizing end-to-end DL models for CAIC are discussed in Section 4. Section 5 interprets the results. Lastly, Section 6 is dedicated to the conclusions.

2. Background

2.1. Electrical Control of Heart Pumping

The heart sends oxygenated blood to the body and deoxygenated blood to the lungs. Structurally, it comprises four chambers (two atria and two ventricles), valves, arteries, veins, and myocardium [22]. The cardiac cycle alternates between diastole, when the heart muscle relaxes and fills with blood, and systole, when the heart contracts and pushes blood to the lungs and body. The cardiac conduction system controls these phases. Electrical impulses start in the sinoatrial node, travel through the atria, are briefly delayed at the atrioventricular node, and are then transmitted through the bundle of His (AV bundle) and Purkinje fibers, causing the ventricles to contract [23].

2.2. The ECG and Its Leads

ECG signals are recorded using cutaneous electrodes and comprise waves, segments, and intervals [24]. The P wave reflects atrial depolarization, the QRS complex ventricular activation, and the T wave ventricular repolarization. The signal typically spans 2 mV, and a cardiac cycle lasts about 1 s, although both vary across individuals and conditions [22,25]. A typical ECG waveform is presented in Figure 1. Moreover, the standard 12-lead ECG records the heart’s bioelectrical activity through bipolar limb leads (I–III), precordial leads (V1–V6), and augmented unipolar limb leads (aVR, aVL, and aVF). Each lead provides a distinct view of cardiac regions, and the signals of aVR, aVL, and aVF are derived algebraically from leads I, II, and III [26].
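The algebraic relationship among the limb leads can be made concrete with a short sketch. It uses the standard textbook identities (Einthoven's law, III = II − I, and the Goldberger augmented leads); the function name and the sample values are illustrative assumptions and do not come from the reviewed studies.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Derive III, aVR, aVL, and aVF sample by sample from leads I and II.

    Standard identities (amplitudes in mV):
      III = II - I            (Einthoven's law)
      aVR = -(I + II) / 2
      aVL = I - II / 2        (equivalently (I - III) / 2)
      aVF = II - I / 2        (equivalently (II + III) / 2)
    """
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],
        "aVR": [-(i + ii) / 2 for i, ii in pairs],
        "aVL": [i - ii / 2 for i, ii in pairs],
        "aVF": [ii - i / 2 for i, ii in pairs],
    }


# Hypothetical two-sample excerpt of leads I and II, in mV.
leads = derive_limb_leads([0.2, 0.4], [0.6, 1.0])
```

One consequence of these identities is that aVR + aVL + aVF is zero at every sample, so only two of the six limb leads carry independent information.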

2.3. Arrhythmias: Causes and Classification

Arrhythmias are disturbances in the heart’s electrical activity that manifest as irregularities in rhythm, rate, or waveform; their causes range from underlying cardiac disease to stress, drug exposure, or genetic predisposition [28]. They are commonly classified by the site of origin—ventricular, supraventricular, atrioventricular junction, or sinoatrial node—and by the mechanism, which involves either abnormal impulse formation or impaired conduction. Disorders of impulse formation may result from triggered activity or irregular automaticity, producing conditions such as sinus tachycardia, bradycardia, ectopic rhythms, pauses, torsades de pointes, or digitalis-induced arrhythmias [24]. Conduction abnormalities, in turn, involve blocks or delays in propagation, and are divided into non-reentrant conduction blocks—including sinoatrial, atrioventricular, and bundle branch blocks—as well as aberrant supraventricular complexes and reentrant mechanisms, which underlie sinus reentrant tachycardia, atrial and nodal reentrant tachycardias, atrioventricular reentrant tachycardias with accessory pathways, atrial flutter, atrial fibrillation, and ventricular tachycardia or fibrillation [26].

2.4. Ischemia: Causes and Consequences

Cardiac ischemia arises from reduced myocardial perfusion caused by partial or complete obstruction of the coronary arteries. Prolonged ischemia causes tissue necrosis, whereas transient episodes can produce reversible lesions with variable outcomes [29]. The characteristics of myocardial injury depend on the affected artery, and ECG leads provide spatial information to localize the compromised region [23].

2.5. Related Research

Several systematic reviews have addressed denoising techniques for cardiovascular signal analysis. Ref. [30] examined 198 studies published between 2017 and 2023, emphasizing database availability and the classification of eight cardiovascular disease types. Ref. [31] focused on 112 studies from 2020 to 2024, highlighting advanced denoising methods for personalized ECG diagnosis and the challenges of inter-patient variability. Ref. [32], in a review of 368 articles up to 2022, identified major trends and research opportunities in arrhythmia classification, with particular attention to databases and commonly used denoising models. Ref. [13] provided an overview of 78 studies from 2017 to 2023, categorizing deep learning architectures that achieved over 96% accuracy in arrhythmia detection, while also offering medical background and methodological guidance. Together, these systematic reviews trace the evolution of denoising techniques, the role of databases, and the diversity of methodologies in the classification of arrhythmias.
Beyond those systematic reviews, several more general studies have looked at deep learning and artificial intelligence in cardiology. Ref. [33] reviewed randomized controlled trials to evaluate clinical effectiveness and examined applications across arrhythmias, ischemia, and structural heart disease; the authors argued that trained DL strategies show promise in controlled settings, yet noted that real-world implementation will face challenges owing to common variations in datasets, a lack of standardization, and the need for multicenter validation. Ref. [34] provided a comprehensive overview of the use of AI in cardiology, analyzing 200 studies published from 2020 to 2024 and covering general clinical practice, including prevention and intervention for arrhythmias, ischemia, and valvular disease. Ref. [35] concentrated on ECG analysis with AI, especially deep learning applied to arrhythmia detection and prediction, myocardial infarction, and other cardiac conditions; the research also raised ethical issues and problems around lack of interpretability. These reviews highlight the clinical potential, as well as the methodological and ethical challenges, of the use of AI in cardiology.
Extending this perspective, ref. [36] surveyed journal and conference articles published between 2019 and 2024, focusing on transformer-based and large language model methodologies for ECG diagnosis. The study provides a hierarchical classification of the reviewed methods, compares categories of approaches, and highlights research gaps along with future directions.

2.6. Aspects of Study

This study focuses on end-to-end DL techniques for automated CAIC using the ECG; however, various other aspects of cardiac arrhythmias and ischemia exist, which are shown in Figure 2 and described below.
  • Pathophysiology: The study of biological processes that alter heart rhythm or blood flow. For example, an imbalance between the sympathetic and parasympathetic systems can lead to arrhythmias [37].
  • Classification: Techniques for identifying cardiac arrhythmias and ischemia, which may or may not be ECG-based. Visual inspection [29], echocardiography [38], end-to-end DL [17], and conventional machine learning [39] are some examples.
  • Ambulatory Monitoring: Continuous tracking with portable devices to detect cardiac events in real time, such as Holter monitors integrated with IoT technology [40].
  • Risk Factors: Identification of conditions that predispose individuals to cardiac diseases. For instance, obesity increases the risk of AF [41].
  • Prevention: Measures aimed at minimizing cardiac arrhythmias or ischemia through lifestyle modification or early intervention. For example, regular physical activity reduces the risk of myocardial infarction [42].
  • Treatments: Therapies designed to avoid or control arrhythmias and ischemia; for example, catheter ablation, which eliminates tachycardia [43].
  • Impact on Quality of Life: Assessment of how heart disease affects emotional, physical, and social well-being. For example, patients recently diagnosed with ischemia may suffer from chronic anxiety [44].
  • Prediction: Use of sophisticated algorithms to anticipate the occurrence of critical conditions. For example, ref. [45] proposed a fuzzy DL model to predict cardiac arrhythmias at their outset.

3. Materials and Methods

This section presents an SLR on CAIC through end-to-end DL, organized according to the methodology into planning, execution, results, and analysis. In contrast to previous reviews, this study uses stringent inclusion criteria to focus exclusively on end-to-end deep learning architectures. Furthermore, we systematically analyze the methodological challenges in six essential components, namely, preprocessing, databases, pathologies, end-to-end DL models, evaluation metrics, and explainability. This methodological perspective is applied to the literature published from 2019 to 2025; it defines the scope of our SLR and differentiates it from existing SLRs.

3.1. Methodology

The 2020 PRISMA Statement [46] defines the article selection procedure for this SLR to ensure transparency and rigor (see Supplementary Materials, Tables S10 and S11). This strategy is consistent with recent systematic reviews; for instance, refs. [47,48] supply detailed surveys of the uses of artificial intelligence in cardiovascular disease diagnosis using the ECG. The specifications proposed by [49] for software engineering studies have been adopted as well. Following these guidelines, the four phases used in SLRs on DL for CAIC, such as [32,50,51], are explained below.
  • Planning: At this stage, investigators draft a research protocol that contains the research questions and the article search and selection procedure. This includes source selection, date range, search strings, and inclusion and exclusion criteria.
  • Execution: The protocol is applied to select relevant articles that address the formulated research questions.
  • Results: Determination and presentation of statistics on the selected articles, including trends, quality, and distribution.
  • Analysis: The selected articles are analyzed to answer the research questions formulated at the planning stage.

3.2. Planning

To identify the difficulties of CAIC with end-to-end DL and ECG, the following guiding research question was posed: how is CAIC performed with end-to-end DL and ECG? To answer it, a search for journal publications was conducted in the Scopus, Web of Science, IEEE Xplore, and PubMed databases, covering the period from 2019 to 2025. The search string used was as follows: [(diagnosis OR algorithm OR detection OR classification) AND (“cardiovascular diseases” OR “coronary events” OR arrhythmia OR cardiac OR “heart attack” OR “myocardial infarction” OR ischemia OR “atrial fibrillation”) AND (ecg OR electrocardiogram) AND (“deep learning” OR cnn OR rnn OR lstm OR gru OR transformer OR autoencoder)]. This string was applied to “Title–Abs–Key” in Scopus, “Topic” in Web of Science, and “Title–Abstract” in PubMed. After identifying the scientific articles, the selection criteria summarized in Table 1 were applied to determine the eligible studies.
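The Boolean structure of the search string (terms joined by OR within a group, groups joined by AND) can be reproduced programmatically, which helps keep the query identical across databases. The following is a minimal sketch; the helper names are hypothetical, the terms are copied from the string above, and straight quotes replace the typographic ones. Database-specific field restrictions are deliberately left out.

```python
# Term groups copied from the search string reported in the text.
TERM_GROUPS = [
    ["diagnosis", "algorithm", "detection", "classification"],
    ["cardiovascular diseases", "coronary events", "arrhythmia", "cardiac",
     "heart attack", "myocardial infarction", "ischemia", "atrial fibrillation"],
    ["ecg", "electrocardiogram"],
    ["deep learning", "cnn", "rnn", "lstm", "gru", "transformer", "autoencoder"],
]


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(groups):
    """Join terms with OR inside each group and join the groups with AND."""
    clauses = ["(" + " OR ".join(quote(t) for t in group) + ")" for group in groups]
    return " AND ".join(clauses)


query = build_query(TERM_GROUPS)
print(query)
```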

3.3. Execution

After applying the search strategies, 3089 studies were identified. Next, we systematically reviewed these studies; the screening and selection process is summarized in Figure 3 and was conducted using the inclusion–exclusion criteria shown in Table 1. An Excel spreadsheet was used to record the selected studies and capture key data such as title, author, journal, and DOI.
Initially, 1073 studies were eliminated, including duplicates and other removals, leaving 2016 articles. In stage two, titles and abstracts were screened, and 1261 studies that failed to satisfy the eligibility criteria were discarded, leaving 755 studies. In stage three, 718 full-text articles were retrieved. In the final stage, the full texts were read to identify the articles whose contribution is relevant to this review, resulting in 121 articles. Finally, these studies were rigorously analyzed, avoiding subjective interpretations, to answer the research question.
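The counts reported for each screening stage can be checked with simple arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative, and the two derived quantities (reports not retrieved and reports excluded at full-text reading) are inferred by subtraction rather than stated in the text.

```python
# Figures reported in the text.
identified = 3089
removed_before_screening = 1073   # duplicates and other removals
excluded_title_abstract = 1261
full_texts_retrieved = 718
included = 121

# Stage-by-stage arithmetic.
screened = identified - removed_before_screening             # 2016
sought_for_retrieval = screened - excluded_title_abstract    # 755

# Derived by subtraction (inferred, not reported in the text).
not_retrieved = sought_for_retrieval - full_texts_retrieved  # 37
excluded_full_text = full_texts_retrieved - included         # 597

selection_rate = included / screened
print(f"screened={screened}, sought={sought_for_retrieval}, "
      f"included={included} ({selection_rate:.1%} of screened)")
```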

3.4. Results

3.4.1. Potential Articles

In total, 2016 potential articles were identified, and 121 were ultimately selected, accounting for 6% of the total (see Table 2). Although no articles were retrieved, PubMed was included in the search strategy given its relevance as a leading clinical database. Additionally, its inclusion ensured comprehensiveness and avoided potential bias due to a limited selection of sources.

3.4.2. Publication Trends

Figure 4 presents the distribution of the selected articles for the period 2019–2025. A significant increase in research output is observed starting in 2020, following the emergence of the first relevant end-to-end DL studies on the ECG around 2018, with the pioneering contribution of [155]. The volume of publications remained relatively consistent through 2025, reflecting sustained activity in this field.

3.4.3. Selected Articles by Journal Quality Factor

Regarding journal quality, 70.25% (n = 85) of the selected articles were published in Q1 journals and 21.49% (n = 26) in Q2 journals. In total, 91.74% (n = 111) of the 121 selected articles appeared in the top two quartiles, underscoring their quality (see Figure 5).

3.4.4. Selected Articles by Journal

Figure 6 presents the distribution of the chosen articles by journal; those with only one article are grouped under “Other journals with a single occurrence”.

3.5. Analysis

This section responds to the research question outlined in Section 3.2 through the following sub-questions:
  • RQ1: What preprocessing techniques are applied to ECG signals?
  • RQ2: What end-to-end DL techniques are employed for feature extraction and CAIC from ECG?
  • RQ3: Which databases are used to train and validate end-to-end DL algorithms?
  • RQ4: What types of cardiac arrhythmias and ischemia are classified by the algorithms?
  • RQ5: What metrics are used to evaluate the effectiveness of end-to-end DL algorithms?
  • RQ6: Which techniques are used to explain the results of ECG-based CAIC using end-to-end DL?

3.5.1. RQ1

Twelve types of techniques were identified for preprocessing ECG signals prior to their use in end-to-end DL models (see Table 3). Among these, the most recurrent during the training, validation, and inference phases were segmentation, amplitude normalization, and noise and artifact removal, owing to their direct impact on data quality and model stability. By contrast, techniques such as resampling, structural data adjustment, class balancing, and advanced cleaning were less frequently employed (see Figure 7). These were primarily applied during the model construction phase to obtain suitable data because the final model generally operated on signals that already conformed to the required input format and did not require further modification or class-distribution adjustment.
Figure 7 illustrates the distribution of preprocessing techniques across the studies. The left axis shows the techniques’ identifiers, while the top axis shows how many techniques are employed in each study.
Having outlined the overall distribution and relevance of preprocessing techniques (Table 3, Figure 7), we now describe each category and specific techniques in detail (see Table 4). Additional information can be found in Supplementary Tables S7–S9.
  • Techniques T01: Noise and Artifact Removal
    The methods in this category preprocess ECG signals to improve their quality. Wavelet-based methods rely on multi-resolution decomposition to separate waves and suppress noise components. Digital filters (Butterworth, band-pass, and notch) suppress unwanted frequency components such as baseline wander and power-line noise. LOESS, moving average, and Non-Local Means (NLM) smoothing are statistical methods that use local signal similarity to suppress noise. Sliding-window normalization is applied to minimize amplitude changes. Furthermore, a thresholding strategy (such as a hard threshold or wavelet threshold) discards residual noise by removing coefficients that fall below a defined level. Artifact removal, baseline wander correction, high-frequency noise suppression, and residual noise removal are complementary approaches to the common objective of enhancing ECG signal quality. Authors label these techniques differently, but they all address the same noise and distortion problems in ECG preprocessing. In our corpus, 42 studies applied some form of noise or artifact removal.
  • Techniques T02–T05
    Preprocessing techniques T02–T04 make ECG signal amplitudes uniform, segment the temporal structure of the recordings, and harmonize the sampling rates of the various datasets. Normalization methods (T02) include Z-score scaling, Min–Max scaling, and unit variance adjustment, which prevent varied amplitude ranges from reaching the model. Windowing approaches (T03) divide signals into fixed-length segments of 1.5–60 s using either a single window or multiple windows, with or without overlap, for local analysis and feature extraction. Resampling techniques (T04) modify the temporal resolution of a signal through downsampling or upsampling, aiming to bring heterogeneous sources to a uniform sampling frequency aligned in time. These techniques enhance signal comparability and model compatibility, and were reported across a wide range of studies.
    The techniques under T05 enforce identical signal duration and structure prior to model input. They include zero-padding, cropping, trimming, replication, segmentation, and resampling. These methods were applied to obtain fixed-length signals of 2.5 s to 2 min and sample lengths of 4096 and 9000. Short recordings are padded or duplicated, while long recordings are cropped or split into overlapping segments. These adaptations ensure that model architectures can leverage batch processing, allowing consistent feature extraction from various datasets. While the techniques vary across studies, they all attempt to standardize the length and format of the signals to facilitate feature extraction and model training. According to Table S8, these approaches were analyzed in 25 papers.
  • Techniques T06–T12
    In total, 41 techniques were identified in categories T06–T12, reported across 34 studies: 14 techniques in T06, 11 in T07, 11 in T08, 2 in T12, and 1 each in T09–T11.
    Class-balancing techniques (T06) are shown in Table 4; oversampling methods such as SMOTE and GAN, as well as downsampling and replication, are used to improve class balance. Data-cleaning techniques (T07) remove redundant, missing, and duplicate values, noise, indistinct segments, and anomalous signal parts to improve the quality of the input. Augmentation techniques (T08) apply operations such as cropping, jittering, warping, and noise insertion to diversify data and limit overfitting. Several less frequently reported categories serve specialized preprocessing roles. Overall, the objective of these techniques is to improve data quality, balance classes, and increase variability.
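A few of the preprocessing categories above can be illustrated with a minimal, dependency-free sketch: moving-average smoothing (T01), Z-score normalization (T02), fixed-length windowing (T03), zero-padding or cropping (T05), and replication-based oversampling (T06). The function names, default parameters, and toy inputs are assumptions made for illustration; they do not reproduce the pipeline of any reviewed study, and resampling (T04), cleaning (T07), and augmentation (T08) are omitted for brevity.

```python
import random
import statistics


def moving_average(signal, width=5):
    """T01: smooth by replacing each sample with the mean of a centered window."""
    half = width // 2
    out = []
    for k in range(len(signal)):
        window = signal[max(0, k - half): k + half + 1]
        out.append(sum(window) / len(window))
    return out


def z_score(signal):
    """T02: scale to zero mean and unit variance (constant signals map to zeros)."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    return [0.0 if std == 0 else (x - mean) / std for x in signal]


def segment(signal, window, step):
    """T03: cut fixed-length windows; step < window yields overlapping segments."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, step)]


def pad_or_crop(signal, length):
    """T05: zero-pad short recordings and crop long ones to a fixed length."""
    return (list(signal) + [0.0] * (length - len(signal)))[:length]


def random_oversample(samples, labels, seed=0):
    """T06: replicate minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy usage: a 10-sample "recording" cut into 4-sample windows with 50% overlap.
windows = segment(z_score(moving_average([float(v) for v in range(10)], width=3)), 4, 2)
```

At a sampling frequency of fs Hz, a window of d seconds corresponds to `window = int(d * fs)` samples, which is how the 1.5–60 s segment lengths reported above would translate into the `segment` call.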

3.5.2. RQ2

The classification of the 121 DL techniques into seven families is shown in Table 5. The seven families offer complementary approaches to ECG data, and the trend in the use of each family can also be observed. CNN models prevail in the literature, owing to their ability to extract morphological features from complex ECG signals across one or more leads. RNN-based models are less common but remain useful because they model sequential dependencies such as rhythm well. Hybrid CNN–RNN frameworks combine both to exploit spatial and temporal representations. Although less frequent, transformer-based models introduce scalability and parallelization, a promising way forward for modeling long-range dependencies. Increasingly adopted attention-enhanced models emphasize the importance of interpretability and the dynamic weighting of features in clinical applications. Generative and contrastive methods are useful for representation learning and for improving data efficiency, especially when labels are scarce. Lastly, custom, ensemble, and NAS models reflect the pursuit of architectural optimization and deployment efficiency.
As a whole, these families exemplify methodological diversification: CNNs remain the backbone, but attention mechanisms, generative paradigms, and automated design search are becoming increasingly important. This comparative lens highlights not only the contributions of the different families but also how their interplay shapes end-to-end ECG analysis. Supplementary Table S1 includes a full list of all 121 techniques plus references for complete transparency and traceability.
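To make concrete how the convolutional family extracts morphological features from a raw signal, the following sketch applies a single 1D convolution, a ReLU activation, and global max pooling to a toy beat. It is an illustration under stated assumptions only: the signal values and the kernel are invented, the kernel is hand-picked rather than learned, and no reviewed architecture is reproduced.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the core operation of a 1D CNN layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Summarize a feature map by its strongest response."""
    return max(values)


# Toy beat: flat baseline with one sharp, QRS-like spike (made-up values).
beat = [0.0, 0.0, 0.1, 1.0, 0.1, 0.0, 0.0]
# Hand-picked kernel that responds to a sharp peak; a trained CNN learns its kernels.
peak_kernel = [-0.5, 1.0, -0.5]

feature_map = relu(conv1d(beat, peak_kernel))
score = global_max_pool(feature_map)
```

The feature map is non-zero only where the kernel is centered on the spike, which is the sense in which convolutional layers localize waveform morphology before later layers combine such responses into a class decision.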

3.5.3. RQ3

To assess the 52 databases referenced in the studies reviewed, we must first assign standard abbreviations to the cardiac conditions that cause arrhythmic and ischemic effects since each database states which condition it will cover. Supplementary Table S2 contains the full list of abbreviations. The databases serve multiple functions in model development and evaluation, such as training, validation, testing, and inference. Most of the reviewed studies relied on more than one data source because a single database rarely provides sufficient diversity in terms of pathological classes, patient age ranges, recording devices, or annotation quality. The 17 databases most frequently used across studies—together with the cardiac conditions they cover and the studies in which they were applied—are detailed in Supplementary Table S3, while the 35 databases used in a single study are listed in Supplementary Table S4.
Figure 8 presents the statistics on the use of the 52 databases across the selected studies. In total, these databases were used 163 times. The six most-often-employed databases—CPSC-2018, MIT-BIH, PTB-XL, CinC2017, AFDB, and Chapman–Shaoxing ECG Dataset—account for 57.64% of all instances of use. By contrast, the 35 databases used only once represent 21.47% of the total usage.
Table 6 complements the information in Supplementary Tables S3 and S4 by detailing the key characteristics of the 52 identified databases. Supplementary Table S5 provides download links for the 28 public databases.

3.5.4. RQ4

In total, 163 types of cardiac arrhythmias and ischemia were identified; these are listed and abbreviated in Table 6. Supplementary Tables S3 and S4 indicate the databases and studies in which they appear. Figure 9 illustrates the percentage of studies addressing the 14 most-often-investigated cardiac pathologies out of the total selected studies. Among them, AF was the most studied, appearing in 84 of the 121 studies (69%). It should be noted that some studies included more than one pathology.
The articles corresponding to the pathologies shown in Figure 9 were identified from Tables S3 and S4 and are summarized in Supplementary Table S6 according to usage count and references.

3.5.5. RQ5

Table 7 presents the 11 metrics employed in the studies analyzed in this review. Each metric is accompanied by a precise definition and the recommended scenario for its application.
Figure 10 presents the distribution of results by performance metric for the 121 end-to-end DL models for CAIC analyzed in this review. These results of each metric are not necessarily comparable because the studies relied on different databases that vary in the number of classes, the degree of class imbalance, and the allocation of data for training, validation, or testing. The F1-score, precision, accuracy, and sensitivity metrics show high values (above 95%) but with dispersion. AUROC and specificity also achieve high values, though with low dispersion. By contrast, AUPRC and Macro-F1 show more scattered values, generally below 90%. Finally, G-Mean, NPV, and mAP were each reported in only one study, with values around 95%.

3.5.6. RQ6

Interpretability is the inherent ability of a model to be understood, both in terms of its internal logic and the way it generates results. This is characteristic of so-called white-box models, such as Support Vector Machines or Linear Regression, whose structure and operation are transparent. Unlike these classical ML models, DL models can be seen as black boxes because of their complex architectures, which can have thousands or millions of trainable parameters between any two layers. As such, it is difficult to ascertain the logic behind the inferences made by DL models. Incorporating explainability mechanisms helps uncover or clarify the models' decision-making. Explainability can be applied post hoc, that is, externally after training, using techniques such as gradient-weighted class activation mapping (Grad-CAM). Alternatively, it can be embedded directly into the model's design, as in architectures based on attention mechanisms. In either case, the explanations do not fully eliminate the opacity of DL models but instead provide a partial, yet valuable, approximation of the reasoning behind their outputs.
Table 8 presents the 23 explainability techniques identified across 43 selected studies, along with brief descriptions and their type of explainability.
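To make the idea of post hoc explanation concrete, the sketch below uses occlusion sensitivity, a simpler technique swapped in here because Grad-CAM requires a DL framework and a trained network. Each window of the signal is replaced by a baseline value, and the drop in the model score is recorded as that window's importance. The `toy_score` function is a hypothetical stand-in for a trained classifier, and the window size and baseline are assumptions of this example.

```python
def occlusion_map(signal, score_fn, window, baseline=0.0):
    """Return, per window, the score drop caused by masking that window."""
    reference = score_fn(signal)
    importance = []
    for start in range(0, len(signal), window):
        occluded = list(signal)
        for i in range(start, min(start + window, len(signal))):
            occluded[i] = baseline  # mask this window only
        importance.append(reference - score_fn(occluded))
    return importance


def toy_score(signal):
    """Hypothetical model output: responds only to the largest amplitude."""
    return max(signal)


# A 12-sample toy trace with one prominent peak in the second window.
trace = [0.0, 0.1, 0.0, 0.1, 0.0, 1.0, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0]
importance = occlusion_map(trace, toy_score, window=4)
most_influential_window = importance.index(max(importance))
```

As the section notes for post hoc methods in general, such a map shows where the model is sensitive, not why that region matters clinically.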

4. Challenges of CAIC End-to-End DL and the ECG

The challenges discussed in this section refer to the barriers and limitations that hinder the development of CAIC through end-to-end DL techniques and the use of ECG signals, as well as their integration into hospital systems. These challenges were identified using the method described in Section 4.1, with its execution detailed in Section 4.2, and the results—which outline the specific challenges—presented in Section 4.3.

4.1. Method

The method used to identify challenges in CAIC through end-to-end DL and ECG comprised five phases:
  • Phase 1. Study Inventory: Relevant information on CAIC using end-to-end DL and ECG was collected from the specialized literature.
  • Phase 2. Determination of the Purpose of each Analysis Aspect: The purpose of each analysis aspect was derived from its definition.
  • Phase 3. Inventory of Challenges in the Analysis Aspects: A comprehensive review of the challenges reported in the collected studies was conducted for each analysis aspect.
  • Phase 4. Identification of Unaddressed Challenges: Gaps not addressed in the literature were determined by comparing the inventory of challenges with the stated purposes of the analysis aspects.
  • Phase 5. Discussion of Findings: The challenges identified in the previous phases were discussed, highlighting their implications for future research and the development of CAIC solutions. This phase is presented in Section 5.

4.2. Development

In Phase 1, described in Section 3, 121 relevant studies on CAIC using end-to-end DL and ECG were identified. These studies formed the basis for compiling inventories across the following aspects: preprocessing techniques, end-to-end DL methods, databases used, cardiac pathologies studied, evaluation metrics, and explainability approaches. These aspects constitute the analytical dimensions of this review. Because challenges in these areas directly affect the development and implementation of end-to-end DL models for CAIC with ECG, Phase 2 established the purposes of each aspect, which are presented in Table 9.
In Phase 3, fifteen difficulties were identified for end-to-end DL techniques, as reported in 53 of the selected studies (Table 10). Additionally, Table 11, Table 12, Table 13 and Table 14 detail the difficulties associated with the remaining five analysis aspects.
Eighteen database-related challenges were identified, explicitly reported in 72 of the selected studies (Table 11). The effects of these challenges on DL model performance are also detailed.
Table 11. Challenges related to databases.
| ID | Difficulty | Effects | References |
|---|---|---|---|
| D16 | Lack of large, well-annotated databases for portable devices | Limits generalization of models trained on standard clinical ECGs. Makes it difficult to capture artifacts specific to ambulatory use. | [128] |
| D17 | Imbalance between positive classes or between positive and negative classes | Biases the model toward the majority class and reduces performance for clinically important conditions. | [17,18,20,54,55,56,57,58,59,62,64,67,68,69,71,73,74,75,76,80,81,82,83,85,86,88,89,91,92,93,95,98,101,102,107,113,116,118,119,120,121,123,125,126,129,130,131,132,133,136] |
| D18 | Scarcity of sufficiently large, diverse, and annotated databases | Weakens robustness and generalization to new clinical contexts. Leads to overfitting and hinders training of large or complex models. | [17,21,40,53,58,59,62,66,79,80,88,90,93,96,99,102,107,114,121,122,128] |
| D19 | Lack of data standardization or quality | Requires more diverse and labor-intensive preprocessing due to incompatibilities. Complicates cross-validation and benchmarking. | [18,21,57,58,78,79,83,91,126] |
| D20 | Underrepresentation of diverse populations | Introduces bias and limits applicability to generalized clinical use. | [17,18,57,72,92,96,116,126] |
| D21 | Restricted access and privacy issues | Complicates data collection, sharing, and use. Prevents external validation and reproducibility. | [18,20,53,58,59,61,74,78,97,98,100,101,104,107,134] |
| D22 | Different sampling rates across databases | Causes loss of information or signal distortion from resampling. | [57,114,116] |
| D23 | Data from a single source or device | Produces bias toward the source device, excessive dependence on calibration, and poor generalization to other datasets. Overestimates model capability and reduces external validity. | [21,57,91,104,106,119,126,127,129,130,131] |
| D24 | Variability among acquisition devices | Creates dependence on specific recording systems, degrades multicenter performance, and hinders cross-validation and benchmarking. | [72,75,78,98] |
| D25 | Limited metadata: age, sex, weight, ethnic origin and population diversity, comorbidities, etc. | Compromises interpretability, fairness, and adaptability of the model to subgroups or vulnerable populations. | [58,59,73,79,80,103] |
| D26 | Limited availability of databases with concurrent pathologies | Prevents training of robust multi-label models and restricts the design of clinically useful models. | [126] |
| D27 | Inconsistent or automated labeling | Leads the model to learn incorrect associations and reduces performance. | [18,55,57,84,89,98,101,102,103,135] |
| D28 | Absence of standardized protocols for acquisition, annotation, and structuring of records in ECG databases | Reduces interoperability between datasets and limits model generalization, transferability, and comparability. | [55,78] |
| D29 | Variability in the number of ECG leads | Reduces model comparability, introduces differences in spatial information, and prevents transfer to devices using different leads. | [70,126] |
| D30 | Dataset coverage restricted to a single pathology | Limits clinical evaluation and prevents training or testing of multi-class and multi-label models. | [110,113] |
| D31 | Inter-database variability in ECG recording duration and quality | Complicates model architecture and joint training, leading to uneven or biased learning. | [92,93] |
| D32 | Fine-tuning | Requires large, high-quality clinical datasets. | [110] |
| D33 | Different recording durations across databases | Increases computational complexity and training difficulty. Performs poorly on long signals where rare or transient events may occur. | [75,80,85,91,95,116,117,127,129] |
Table 12 presents the nine difficulties related to pathologies identified across 48 selected studies, together with their effects on model performance.
Table 12. Difficulties in cardiac pathologies.
| ID | Difficulty | Effects | References |
|---|---|---|---|
| D34 | Pathology similarity | Makes it difficult to extract discriminative features, reducing accuracy in multi-class classification and increasing diagnostic errors. Requires clinically diverse data, precise labeling, and greater model capacity. | [54,58,62,63,71,82,83,88,93,95,97,106,108,113,117,118,123,134] |
| D35 | Comorbidities or multiple concurrent cardiac pathologies | Introduce diagnostic difficulty because one pathology may mask or distort another. Requires well-annotated multi-level databases and more sophisticated architectures capable of learning multiple patterns. | [40,67,82,108,113] |
| D36 | Intra-patient and inter-patient variability | Reduces generalization by blurring physiological and pathological variability. Lowers performance in external cross-validation and limits transferability to new patients. | [18,19,21,53,67,75,78,83,90,93,108,114,120,125,133,136] |
| D37 | Ambiguity in the patterns of certain pathologies | Reduces diagnostic specificity due to inter-class overlap. | [121] |
| D38 | Pathologies with episodic or paroxysmal occurrence | Require long recordings or sequential models; sensitivity is reduced when using short windows. | [17,53,64,68,75,83,87,88,92,93,122,129,133,134] |
| D39 | Subtypes of pathologies | Demand specialized models and finer expert-labeled annotations, increasing complexity and the risk of diagnostic errors. | [72,73,86] |
| D40 | Complex patterns | Require more sophisticated models and larger volumes of annotated data. | [20,21,73,82,83,85,87,92,101,123,128] |
| D41 | Subtle morphological changes in various pathologies | Make detection difficult and require complex models with high resolution or higher sampling rates. | [21,93] |
| D42 | Redundancy of information in the 12-lead ECG | Limits usefulness in deep models, where combinations can be learned automatically, and reduces suitability for portable devices. | [72,131] |
Two difficulties were identified in preprocessing techniques (D43 and D44), reported in 33 of the selected studies, and one difficulty (D45) in the metrics used. Table 13 presents these difficulties along with their effects on model performance evaluation.
Table 13. Difficulties in preprocessing and metrics.
| ID | Difficulty | Effects | References |
|---|---|---|---|
| D43 | Presence of excessive or unaccounted noise and artifacts | Increases the risk of losing critical information and reduces model performance in real-world settings. | [17,18,20,58,72,74,75,76,78,79,80,82,85,86,87,90,91,92,93,95,99,114,118,119,120,121,122,125,127,128,129,134,136] |
| D44 | Unrealistic generation of synthetic data | May cause the model to capture non-real features, leading to poor generalization and reduced explainability. | [128] |
| D45 | Absence of standardized metrics for evaluation | Hinders comparison across models; the use of inadequate metrics may obscure poor performance in critical classes. | [18] |
Finally, Table 14 presents the eight difficulties related to explainability techniques, identified in 16 of the selected studies. Each difficulty is associated with a specific explainability method.
Table 14. Difficulties in the explainability techniques employed.
| ID | Technique | Difficulty | Effects | References |
|---|---|---|---|---|
| D46 | T02 | Regions highlighted by attention maps do not always match clinically relevant or expected features. | Limits acceptance in medical circles, as the explanations are neither very useful nor unambiguous. | [59,101] |
| D47 | T03 | Does not allow complete reconstruction of the decision-making process; limited in scenarios with high signal variability. | Restricts transparency; the lack of full traceability of the model's reasoning hinders acceptance and validation in clinical settings. | [129,131,134] |
| D48 | T07 | Significant overlap of feature maps; generated maps may not display clinically understandable, relevant, or complete patterns. | A reduction in visual clarity and difficulty in identifying the ECG areas influencing the results can lead to ambiguity and low clinical trustworthiness. | [78,84,102] |
| D49 | T09 | Explanations can show which areas are important to the model but do not always show areas that the clinician would find important for diagnosis. | Creates misalignment between model logic and clinical reasoning; hinders expert validation and reduces trust in automated decisions. | [60] |
| D50 | T11 | Incorrect assignment of relevance to noisy regions. | Produces false conclusions about ECG regions driving predictions; omits significant features, which may mislead analysts and reduce model reliability. | [78] |
| D51 | T13 | It is not possible to trace the complete reasoning of the model using these means. | Prevents full causal understanding of decisions; reduces transparency and limits reliability in clinical validation. | [97] |
| D52 | T17 | Highlights important regions for the decision without explaining why those regions are relevant. | Obscures the decision-making mechanism, reducing usefulness for clinical analysis or expert validation. | [97,98] |
| D53 | T18 | Identifies important ECG regions without establishing correlation with clinical criteria or validating medical relevance. | Limits interpretability; highlighted regions may be technically relevant but not clinically meaningful, reducing their reliability for practitioners. | [61] |

4.3. Unaddressed Difficulties

In Phase 4, the difficulties reported in the selected studies (Table 10, Table 11, Table 12, Table 13 and Table 14) were cross-referenced with the purposes of the analysis aspects defined in Table 9. This process allowed the identification of 18 difficulties not yet addressed in the literature, which are presented in Table 15.
Table 15. Difficulties not yet addressed in studies on CAIC.
| ID | Aspect | Unaddressed Difficulties | Justification of the Affected Activity or Feature |
|---|---|---|---|
| D54 | Preprocessing | Lack of dynamic normalization adapted to changing clinical contexts | Limits real-time processing of signals that vary due to physiological, technical, clinical, or temporal factors. |
| D55 | Preprocessing | Absence of standards for preprocessing multichannel signals from different devices | Creates compatibility and robustness issues due to technical differences between sources. |
| D56 | Preprocessing | Absence of automatic quality control of signals in real-world environments | Models trained on diagnostic-quality signals fail to generalize to uncontrolled environments. |
| D57 | Preprocessing | Fixed windows misaligned with clinical events | Windows that do not follow physiological or diagnostic boundaries lead to missed detection of brief events. |
| D58 | DL end-to-end techniques | Lack of automatic hyperparameter tuning mechanisms for deep architectures | Reduces efficiency and slows model experimentation and optimization. |
| D59 | DL end-to-end techniques | Integration of self-supervised techniques to pretrain models with limited data | Self-supervised pretraining reduces dependence on large annotated databases. |
| D60 | DL end-to-end techniques | Lack of real-time adaptation to patient changes during prolonged monitoring | Prevents models from adjusting parameters to individual physiological changes, reducing performance. |
| D61 | Database | Creation of synthetic databases to balance minority classes without compromising quality | Rare patterns should be included without degrading model performance. |
| D62 | Cardiac pathologies | Limited consideration of dynamic changes in pathologies | Hampers classification when pathologies evolve dynamically during prolonged monitoring. |
| D63 | Metrics | Limitations of metrics for evaluating explainability and confidence in model decisions | Undermines adoption in medical contexts where explainability is critical. |
| D64 | Metrics | Lack of correlation between computational metrics and clinical outcomes | Disconnect between metrics and clinical decision-making fails to account for clinical risk, diagnostic urgency, or therapeutic utility, hindering objective comparisons. |
| D65 | Metrics | Metrics with limitations for evaluating temporal sequences and real-time performance | Fail to capture event timing or latency, persistence, or continuity. Short events go undetected, and real-time inference cannot be evaluated. |
| D66 | Metrics | Metrics for multi-class classification | Conceal poor performance in minority classes and fail to reflect differences in clinical risk between classes. |
| D67 | Explainability techniques | Lack of visual tools to interpret decisions on long signals (e.g., Holter recordings) | Prevents reliable interpretation of extended ECG records. |
| D68 | Explainability techniques | Lack of explainability adapted to each pathological class | Current techniques do not distinguish between classes with different clinical criteria; an explanation valid for one class may be inadequate for another. |
| D69 | Explainability techniques | Limitations of explanations in multi-label and multi-lead contexts | Visual techniques merge explanatory information, preventing separation of influences by class or ECG lead. |
| D70 | Explainability techniques | Lack of standardized evaluations to assess agreement with expected clinical findings | Reduces the reliability of techniques and prevents comparability across studies. |
| D71 | Explainability techniques | Misalignment between the explanation's scale and the clinical event's scale | Explanations highlight very small regions without clinical correlation in duration. |

5. Discussion

5.1. About Preprocessing

Although end-to-end DL models seek to minimize human intervention in ECG analysis, evidence from the 121 reviewed studies shows that preprocessing remains both unavoidable and highly heterogeneous (Table 3 and Table 4). Specifically, 86.7% of the papers used between 1 and 4 of the 12 reported techniques, while 6.7% used no preprocessing at all. This pattern indicates a continuing lack of standardization, which hampers comparability. Segmentation (T03) and length normalization (T05) dominate the landscape, as seen in Figure 7, owing to the technical necessity of fixed-length inputs [155]. Fixed-window segmentation [61] is simpler to implement than beat-based segmentation [84,91], but it may not align with the clinical events of interest. Beat-based segmentation aligns with clinical events and is more precise, but it requires manual feature engineering on the original signals, which is error-prone. This dichotomy reflects a trade-off between automation and clinical fidelity. The absence of a standard protocol is further reflected in amplitude normalization (T02), which mostly uses the Z-score and only occasionally Min–Max scaling; this makes reproducibility difficult (D19). Similarly, noise and artifact removal (T01) is performed with digital bandpass filters whose cut-off frequencies vary across studies, indicating different filtering criteria. Resampling (T04) introduces another source of variation because there is no agreed sampling frequency; signals are commonly downsampled, resulting in a loss of resolution. Additional techniques, including initial data cleaning (T07) [75,110], are generally performed manually and considered optional, which raises concerns about model robustness under real-world inference. Likewise, class balancing and data augmentation (T06, T08) are rarely used, which suggests that little effort was made to prevent bias or to improve generalizability.
Thus, although end-to-end DL models aim for minimal and automatic preprocessing, preprocessing remains an essential component. More importantly, the varied parameters and configurations used across studies diminish comparability and prevent meaningful conclusions. The evidence suggests that the field is still in flux: it aims for end-to-end automation but is held back by the absence of standardized preprocessing pipelines.
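The two amplitude normalization variants (T02) and the fixed-window segmentation (T03) discussed above can be sketched in a few lines of plain Python. This is a generic illustration rather than the pipeline of any reviewed study; the function names and the parameters `fs` (sampling frequency in Hz), `window_s` (window length in seconds), and `overlap` are assumptions introduced for the example.

```python
import statistics


def zscore(signal):
    """Z-score normalization: zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        return [0.0 for _ in signal]  # constant signal: nothing to scale
    return [(x - mean) / std for x in signal]


def minmax(signal):
    """Min-Max normalization: rescale amplitudes to the range [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


def fixed_windows(signal, fs, window_s, overlap=0.0):
    """Split a signal into equal-length windows, ignoring beat boundaries."""
    size = int(round(fs * window_s))
    if size < 1:
        raise ValueError("window shorter than one sample")
    step = max(1, int(round(size * (1.0 - overlap))))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]
```

Two properties of this sketch mirror the concerns raised in the text. The two normalizations produce different amplitude scales for the same record, so results are not directly comparable across studies that choose differently. In addition, `fixed_windows` cuts at sample indices only and drops trailing samples that do not fill a window, so a brief event can be split across windows or lost.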
The surveyed literature reported two key difficulties in preprocessing that can affect the performance and generalizability of the models (Table 13). The first is the high level of noise and artifacts not accounted for during training (D43). Although noise removal mitigates this issue, misclassification still occurs under heavy noise [78]. This suggests that clinically oriented models should include a separate noise class, alongside the normal and pathological classes, so that highly corrupted signals can be rejected. The second is the unrealistic generation of synthetic data (D44), which leads to spurious patterns [128]. When balancing techniques are used, synthetic data can also reproduce biases from the original clinical datasets (such as population composition, acquisition protocols, or labeling practices) [170], which reduces generalization and explainability. Beyond these reported challenges, further limitations remain unaddressed (Table 15). One of the most important is the lack of automatic signal quality control in real-world environments (D56), covering not only excessive noise but also loss of signal, saturation, and baseline drift. Another major problem is the lack of a common standard for preprocessing multichannel signals from heterogeneous acquisition devices (D55); the resulting interoperability and data-handling problems limit the applicability of models across clinical settings. Taken together, these findings call for automated preprocessing strategies that are adaptive and robust to a wide range of scenarios [74] and fully integrated into the pipeline for real-world use [18]. Preprocessing is essential; however, it remains diverse, subjective, and human-driven, which hampers reproducibility and limits the generalization of DL models to uncontrolled clinical scenarios.
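A minimal sketch of what an automatic signal quality gate (D56) might check is given below, covering the three failure modes named above: loss of signal, saturation, and baseline drift. Every threshold and parameter name (`adc_min`, `adc_max`, `flat_eps`, `max_clip_frac`, `max_drift`) is a hypothetical placeholder chosen for illustration, and the drift test is deliberately crude; a deployed system would need clinically validated criteria.

```python
def quality_flags(signal, fs, adc_min=-5.0, adc_max=5.0,
                  flat_eps=1e-3, max_clip_frac=0.01, max_drift=0.5):
    """Return a list of quality problems found in one ECG lead (empty if none)."""
    if not signal:
        raise ValueError("empty signal")
    flags = []

    # Loss of signal: peak-to-peak amplitude is essentially zero.
    if max(signal) - min(signal) < flat_eps:
        flags.append("flatline")

    # Saturation: too many samples sit at the (assumed) converter limits.
    clipped = sum(1 for x in signal if x <= adc_min or x >= adc_max)
    if clipped / len(signal) > max_clip_frac:
        flags.append("saturation")

    # Baseline drift (crude): compare mean level of the first and last second.
    n = max(1, min(int(fs), len(signal) // 2))
    first = sum(signal[:n]) / n
    last = sum(signal[-n:]) / n
    if abs(last - first) > max_drift:
        flags.append("baseline_drift")

    return flags
```

A record with a non-empty flag list could be routed to a rejection or "noise" class instead of being classified, in line with the suggestion above to treat highly corrupted signals separately.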

5.2. About End-to-End DL Techniques

The wide variety of end-to-end DL models for ECG-based CAIC (Table 5) suggests a rapidly evolving field. Hybrid CNN-based techniques such as CNN–BiLSTM and CNN–BiGRU, along with DenseNet, ShuffleNet, and SqueezeNet [20,21,83,85], continue to dominate, with the dual purpose of capturing temporal information and spatial representations [84]. Nevertheless, there is growing interest in newer architectures, namely transformers and attention networks, which model global dependencies and scale better. Beyond architecture, increasingly mature methodological innovations include contrastive learning [73], multitask and continuous learning [55], transfer learning, autoencoders, and knowledge distillation [68]. Taken together, these approaches indicate a growing interest in autonomous and generalizable models as a way to tackle multi-lead, multi-class, and multi-label classification. The convergence of hybrid CNNs with these emerging paradigms signals a turning point in the field, toward system-level solutions, rather than incremental improvements, aimed at efficiency, adaptability, and clinical relevance.
Despite these advances, end-to-end DL techniques still face significant obstacles to deployment in hospitals (Table 10, D14). The complexity of the architectures (D02) is the largest obstacle, as it entails the need for large annotated datasets, demanding hardware, and fine-tuning. This constraint limits portability (D08) and implementation in low-resource contexts. Using several leads (D03) increases model complexity, which raises the risk of overfitting when data are scarce or imbalanced [73]. Similarly, long-sequence analysis (D01) has led to hybrid CNN designs of higher complexity that capture temporal dependencies. Methodological biases persist, and the lack of external cross-validation (D15) undermines robustness, as many models fail when applied to different equipment, environments, or patients [18]. The issue is broader: end-to-end DL models are sensitive to bias in the training data and perform worse in out-of-distribution and atypical scenarios, such as those involving comorbidities. Unaddressed challenges (Table 15) aggravate these constraints. Despite the existence of sophisticated optimizers [59], hyperparameter optimization (D58) is still mostly manual. Self-supervised learning (D59), such as contrastive learning [40,73], could make better use of unlabeled data. Similarly, the diagnostic capacity of the algorithms is limited by their lack of real-time adaptability (D60); continuous learning [55] could overcome this limitation and enable models to incorporate new expressions of cardiac diseases while retaining acquired knowledge. Overall, both the reported and the overlooked challenges call for efficient strategies and personalization mechanisms.
End-to-end deep learning models require strengthening of validity, generalizability, and robustness in order to go from proof-of-concept to reliable tools for clinical use.

5.3. About Databases

The inventory revealed the use of 52 different ECG databases (see Table 6), with significant variability in class definitions, sampling frequency, number of records, number of leads, recording duration, and availability. This heterogeneity indicates a severe lack of standardization (D19) that critically undermines model transferability and comparability [171]. Only 17 databases were reused across studies (Table S3), while 35 appeared only once (Table S4). Such limited reuse reflects fragmented benchmarking practices across the field. Figure 8 also shows that six databases (CPSC-2018, MIT-BIH, PTB-XL, CinC2017, AFDB, and Chapman–Shaoxing) together account for 57.64% of the usage. Excessive reliance on any single dataset, mainly for fine-tuning a model, can introduce domain bias and restrict inter-dataset generalization (D23) [16]. A model may perform convincingly on a single source yet fail to maintain that performance on other sources [29]. These concerns are echoed in Figure 11 and Figure 12. As illustrated in Figure 11, nearly half of the 52 databases are private, 28 are publicly accessible, and two are available only under restricted Data Use Agreements. According to Table 6, 46.15% of the databases have no public access (D21), which obstructs transparency, reproducibility, and collaborative advancement. Of the public datasets, only 13 contain 12-lead recordings; the distribution of lead counts is illustrated in Figure 12. Others offer a different number of leads, such as one, two, or fifteen, which makes it difficult to generalize models to other acquisition setups. Most databases share similar technical characteristics: 31 support 12 leads and 16 support multi-label annotation (most of which also use 12 leads). However, this technical richness is not evenly distributed and is often restricted to the most reused datasets.
Consequently, although a relative wealth of resources is available, researchers tend to use an unrepresentative subset because of access limits and benchmarking bias. To address these limitations, it has been argued that future research should use multiple databases and report results on databases not used for training [59,71]. This would enable broader validation of the results and reduce the risk of overfitting to a specific dataset.
The issues reported in the studies listed in Table 11 reveal major shortcomings that affect robustness, generalizability, and clinical relevance. Class imbalance (D17) biases the model in favor of the majority class, with performance losses for other clinically important conditions. Another common issue is the lack of large, diverse, and well-annotated datasets (D18), which hinders adequate training. Besides these reported impediments, there is an unaddressed challenge: the creation of databases with synthetic signals (D61) that resemble real signals and introduce rare or paroxysmal patterns that are difficult to capture [88]. This task overlaps with D44 (unrealistic synthetic data generation), where the model may learn non-existent patterns, resulting in a lack of generalizability and explainability [85]. Synthetic data should therefore be sufficiently faithful to the complex patterns and subtle morphological changes of the various pathologies (D40, D41). Overcoming D61 would allow end-to-end DL techniques to lessen their dependence on costly expert annotation (D18) and improve class equity by better representing minority classes. Taken together, these challenges indicate that databases are not just collections of data [84] but the source of several serious limitations; they should be not only accessible but also diverse, standardized, and clinically supported in order to enable robust, high-performing models in common clinical settings [172].

5.4. About Cardiac Pathologies: Cardiac Arrhythmias and Ischemia

The cardiac pathologies inventory (Tables S3 and S4) shows that the studies addressed a very large number of pathologies, i.e., 153 pathologies (not merely conditions). However, these pathologies are unevenly represented in the studies, revealing a bias in focus. According to Figure 9, research has concentrated on a handful of cardiac conditions: 69% of the studies covered atrial fibrillation, and PVC and PAC are also among the most studied. Rarer or more complex diseases receive less attention. This trend illustrates the long-tail problem [107,173], which constrains the clinical applicability and diagnostic value of models for the many diseases that are less often covered in research.
Cardiac conditions pose intrinsic challenges (Table 12) due to their dynamic physiology and definition in ECG expression (D40) [55], which often leads to misclassification. The subtle differences between arrhythmias such as AF and AFL (D34) create complications for multi-class tasks, while high intra- and inter-patient variability (D36) impedes generalization. It is observed clinically that comorbidities (D35) are quite common; however, this area is poorly studied [71,113,174]. Episodic conditions such as paroxysmal AF [153] need long recordings (such as Holter and patches) [92,110]. On the other hand, persistent rhythm scenarios were focused on models. One critical gap is neglecting the former expression for dynamic pathology changes (D62). Existing models use fixed windows [101], failing to account for the evolution of other disease like infarction progression or AF transition. The utility in a clinical context is, therefore, reliant on classifications made at a given moment in time as well as on tracking the evolution through time [110]. In conclusion, these limitations thus advocate for the need to build more adaptive and robust DL models to cope with the variability and complexity of cardiac diseases.
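The difference between classifying a single fixed window and tracking evolution through time can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a method taken from the reviewed studies: the window length, step, and per-window labels are invented, and the helper names `sliding_windows` and `label_transitions` are hypothetical.

```python
def sliding_windows(n_samples, window, step):
    """Yield (start, end) sample indices of windows that fit fully in the recording."""
    start = 0
    while start + window <= n_samples:
        yield start, start + window
        start += step


def label_transitions(window_labels):
    """Return (window_index, previous_label, new_label) wherever the label changes."""
    return [
        (i, window_labels[i - 1], window_labels[i])
        for i in range(1, len(window_labels))
        if window_labels[i] != window_labels[i - 1]
    ]


# Hypothetical recording of 3000 samples, split into overlapping windows.
windows = list(sliding_windows(n_samples=3000, window=1000, step=500))

# Hypothetical per-window outputs of some classifier (SR = sinus rhythm).
window_labels = ["SR", "SR", "AF", "AF", "SR"]

# A single fixed window yields one label; the sequence exposes onset and offset.
transitions = label_transitions(window_labels)
for index, previous, new in transitions:
    start, end = windows[index]
    print(f"window {index} (samples {start}-{end}): {previous} -> {new}")
```

A model that sees only one of these windows reports either SR or AF, whereas the sequence of labels shows when the episode starts and ends, which is the information needed for paroxysmal events.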

5.5. About Evaluation Metrics

The studies reviewed used 11 different metrics, reported both per class and in aggregate (Table 7). The three most commonly used were recall, accuracy, and F1-score. The F1-score is the harmonic mean of precision (M01) and sensitivity (M02) and is especially useful for unbalanced datasets [94]. Accuracy on its own, however, is misleading because it may be inflated by true negatives or by dominant classes such as normal rhythm. Figure 13 confirms the trend: F1-score, recall, and accuracy dominate by far, whereas the specialized metrics AUPRC, NPV, G-Mean, and mAP are rarely applied. AUROC, specificity, and precision averaged about 95% with low dispersion, but they remain biased toward describing majority-class performance. Notably, only four studies [76,175] used AUPRC, which is the more informative metric for rare events, whereas Macro-F1 (M08) and mAP (M11) appeared in just two studies and one study, respectively. The performance of minority classes is often underreported, which masks weaknesses in the clinical applicability of a method. Choosing global metrics because they are widely accepted, rather than more class-aware ones, indicates a benchmarking bias that undermines the evaluation of long-tail pathologies. To improve robustness and fairness, future studies should use metrics that capture performance across all classes, particularly in imbalanced and multi-label settings.
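A small worked example can make the gap between accuracy and class-aware metrics concrete. The sketch below uses an invented three-class label set (90 normal, 8 AF, and 2 AFL recordings); the counts, the predictions, and the helper names are assumptions for illustration only and do not come from any reviewed study.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, with F1 the harmonic mean of precision and recall (one-vs-rest)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of recordings whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


labels = ["N", "AF", "AFL"]
# Hypothetical imbalanced test set: 90 normal, 8 AF, 2 AFL.
y_true = ["N"] * 90 + ["AF"] * 8 + ["AFL"] * 2
# Hypothetical model: all normals correct, half of AF missed, both AFL called AF.
y_pred = ["N"] * 90 + ["AF"] * 4 + ["N"] * 4 + ["AF"] * 2

acc = accuracy(y_true, y_pred)
f1_by_class = per_class_f1(y_true, y_pred, labels)
macro = macro_f1(y_true, y_pred, labels)
print(f"accuracy = {acc:.3f}, macro-F1 = {macro:.3f}")
print({c: round(v, 3) for c, v in f1_by_class.items()})
```

With these invented counts, accuracy is 0.94 while Macro-F1 is about 0.52, because the AFL class is never detected and half of the AF recordings are missed. This is the kind of minority-class weakness that a single global figure hides.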
Although traditional metrics are still commonly used, their use is not standardized (D45), which prevents direct comparison between studies. One of the unaddressed issues (see Table 13) is the low usage of metrics that evaluate explanation quality or confidence (D63), which is important for clinical uptake [172]. In addition, the choice of metric is often driven by statistical convenience rather than medical relevance (D64), and the potential impact of a false negative is often overlooked; in the case of arrhythmia, this could mean a delayed diagnosis. Accuracy, for example, does not reflect these clinical consequences [107]. These failings relate to problems of explainability, such as the inability to retrace reasoning (D51), the lack of validation of highlighted regions (D53), and uncertainty about whether visualizations (e.g., Grad-CAM) are consistent with cardiological knowledge (D49). Without solid evaluation, the “black box” (D02) cannot be trusted. In summary, a standardized evaluation framework aligned with clinical goals is necessary to validate models both technically and medically.
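The point that accuracy does not reflect clinical consequences can be illustrated with two hypothetical models that reach the same accuracy but make different kinds of errors. In the sketch below, the test set, the predictions, and the error costs (a missed arrhythmia weighted ten times a false alarm) are all invented for illustration; choosing real costs would require clinical input.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_from_counts(tp, fp, fn, tn):
    """Share of correct decisions, blind to the type of error."""
    return (tp + tn) / (tp + fp + fn + tn)


def mean_error_cost(tp, fp, fn, tn, cost_fn, cost_fp):
    """Average cost per recording when the two error types are weighted differently."""
    return (cost_fn * fn + cost_fp * fp) / (tp + fp + fn + tn)


# Hypothetical test set: 10 arrhythmia recordings (1) and 90 normal ones (0).
y_true = [1] * 10 + [0] * 90
# Model A misses 5 arrhythmias and raises no false alarms.
pred_a = [1] * 5 + [0] * 5 + [0] * 90
# Model B detects every arrhythmia but raises 5 false alarms.
pred_b = [1] * 10 + [1] * 5 + [0] * 85

counts_a = confusion_counts(y_true, pred_a)
counts_b = confusion_counts(y_true, pred_b)
acc_a = accuracy_from_counts(*counts_a)
acc_b = accuracy_from_counts(*counts_b)

# Assumed costs: a false negative is weighted 10x a false positive.
cost_a = mean_error_cost(*counts_a, cost_fn=10, cost_fp=1)
cost_b = mean_error_cost(*counts_b, cost_fn=10, cost_fp=1)
print(f"accuracy A = {acc_a:.2f}, B = {acc_b:.2f}")
print(f"mean cost A = {cost_a:.2f}, B = {cost_b:.2f}")
```

Both models score 0.95 in accuracy, yet under the assumed weighting Model A is ten times more costly than Model B because all of its errors are missed arrhythmias.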

5.6. About Explainability Techniques

The clinical adoption of DL models is hampered by limited model explainability [76,78]. Of the 121 studies reviewed, only 59 used at least 1 of the 23 techniques (Table 8), which indicates that explainability has low priority. Interpretability techniques are of two kinds: post hoc (such as Grad-CAM, saliency maps, SHAP, and t-SNE) and integrated (such as attention and NBET). As shown in Figure 14, post hoc methods are the most widely used. Grad-CAM and t-SNE are the most used post hoc techniques; Grad-CAM was the most common, appearing in 18 studies as a visualization tool for indicating important ECG regions. Out of the 18 studies analyzed, attention mechanisms are the most common integrated approach [108], used for improved training transparency. Overall, the real impact of integrated methods (especially spatial and temporal attention, TE02 and TE08) falls short of effective explainability. The limited use of hybrid approaches that combine post hoc and integrated techniques indicates that interpretability has not been aligned with model complexity. The evidence indicates that explainability is viewed as important, but its implementation is sporadic and often superficial, which limits trust and clinical uptake.
Despite certain accomplishments, the clinical utility of current techniques is limited by eight reported shortcomings. Attention mechanisms in ECG signal processing may fixate on unrelated areas (D46), while Grad-CAM maps may overlap too much (D48), which leaves the attribution of the model prediction open to interpretation. Clinical applicability is further constrained by five more challenges (Table 15) that have not been addressed. Customized visual tools are lacking for long-term recordings such as Holter data (D67), which makes it difficult to find episodic or late-onset events (D71). Visual techniques also face challenges in multi-label or multi-lead contexts (D69) because merged explanations do not indicate which class or lead contributes to which output (D68). For instance, the diagnosis of myocardial infarction (MI) relies on careful analysis of the ST segment and the T wave, while that of atrial fibrillation (AF) relies on the rhythm and the absence of P waves [107]; without pathology-specific validation, explainability loses its clinical relevance. Another significant gap is the unavailability of reference metrics to gauge concordance with clinical expectations (D70). Without tools for response evaluation (D63), it is not possible to quantify how much a model can be trusted. Consequently, current techniques may improve transparency but do not yet suffice for deployment in the clinic [176]. There is an urgent need for explainability methods that are temporally sensitive, pathology-adapted, and compatible with long-term multi-channel recordings for reliable automated diagnosis [177].

5.7. Limitations

This study has some limitations. First, the analysis focused exclusively on six key aspects of CAIC with ECG and end-to-end DL and did not cover other relevant dimensions such as ethical and regulatory issues, implementation in portable or embedded hardware, and integration into clinical settings. Second, the findings of this study were not clinically validated; thus, they may be limited from a medical point of view. Finally, the new challenges identified in this study did not appear in the articles we reviewed, so validating them remains an important task for the near future in order to determine whether they hamper the development of robust and clinically useful DL models.

6. Conclusions

The aim of this review was to investigate CAIC with ECG and end-to-end DL techniques and to examine the key challenges associated with them. These challenges concern preprocessing, DL techniques, databases, cardiac pathologies, evaluation metrics, and explainability techniques. The results include extensive inventories for these six areas based on relevant, impactful studies, as well as the technical barriers that limit CAIC performance and clinical implementation. Collectively, this provides a systematic overview of the current state of the field. Unlike other reviews of CAIC with DL and ECG, this study focused only on end-to-end DL and identified 71 challenges: 53 found in the literature and 18 that are still not addressed. The latter should be considered in order to close the gap toward high-performing models. This paper indicates that preprocessing in end-to-end DL models is minimal, transparent, and automated, so as to improve performance without adding unnecessary complexity. Despite the encouraging results obtained using these architectures, the issues of generalizability and training complexity persist. In addition, there are increasing calls for databases that are more diverse and better balanced between classes, particularly for pathologies with similar morphologies, such as AF and AFL, which make the problem more complicated. There is a need for performance evaluation metrics aligned with clinical practice, as well as for more robust and explainable techniques applicable to a wider range of clinical situations. This study offers a firm basis for designing more generalizable, robust, and clinically useful solutions.
As future work, we propose conducting studies to address the identified difficulties and to accelerate the advancement of CAIC through ECG and end-to-end DL. Additionally, we recommend creating a comprehensive framework for restoring and maintaining normal heart rhythm through the classification, prediction, explanation, treatment, and simulation of arrhythmias and cardiac ischemia (an approach similar to that employed by [178]) to maximize survival in pediatric congenital heart surgery.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/diagnostics16010161/s1, Table S1: End-to-end DL techniques; Table S2: Abbreviations of cardiac pathologies detected by ECG; Table S3: Cardiac pathology databases used in more than one study; Table S4: Cardiac pathology databases used in a single study; Table S5: Links to public databases; Table S6: The 14 most studied cardiac pathologies used; Table S7: Noise and artifact removal—specific techniques; Table S8: Preprocessing techniques T02–T05; Table S9: Preprocessing techniques T06–T12; Table S10: PRISMA 2020 for Abstract Checklist; Table S11: PRISMA 2020 Checklist.

Author Contributions

Conceptualization, E.O. and D.M.; methodology, D.M.; formal analysis, E.O. and D.M.; investigation, E.O., N.M., and G.U.; writing—original draft preparation, E.O.; writing—review and editing, E.O.; supervision, D.M., N.M., and G.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Joseph, P.; Lanas, F.; Roth, G.; Lopez-Jaramillo, P.; Lonn, E.; Miller, V.; Mente, A.; Leong, D.; Schwalm, J.-D.; Yusuf, S. Cardiovascular Disease in the Americas: The Epidemiology of Cardiovascular Disease and Its Risk Factors. Lancet Reg. Health-Am. 2025, 42, 100960. [Google Scholar] [CrossRef] [PubMed]
  2. Martin, S.S.; Aday, A.W.; Allen, N.B.; Almarzooq, Z.I.; Anderson, C.A.M.; Arora, P.; Avery, C.L.; Baker-Smith, C.M.; Bansal, N.; Beaton, A.Z.; et al. 2025 Heart Disease and Stroke Statistics: A Report of US and Global Data from the American Heart Association. Circulation 2025, 151, e41–e660. [Google Scholar] [CrossRef] [PubMed]
  3. Chong, B.; Jayabaskaran, J.; Jauhari, S.M.; Chan, S.P.; Goh, R.; Kueh, M.T.W.; Li, H.; Chin, Y.H.; Kong, G.; Anand, V.V.; et al. Global Burden of Cardiovascular Diseases: Projections from 2025 to 2050. Eur. J. Prev. Cardiol. 2025, 32, 1001–1015. [Google Scholar] [CrossRef] [PubMed]
  4. Yang, L.; Zheng, B.; Gong, Y. Global, Regional and National Burden of Ischemic Heart Disease and Its Attributable Risk Factors from 1990 to 2021: A Systematic Analysis of the Global Burden of Disease Study 2021. BMC Cardiovasc. Disord. 2025, 25, 625. [Google Scholar] [CrossRef]
  5. Reed, J.L.; Zaman, D.; Betancourt, M.T.; Robitaille, C.; Majoni, M.; Blanchard, C.; O’Neill, C.D.; Prince, S.A. Physical Activity, Sedentary Behaviour, and Cardiovascular Disease Risk Factors in Canadians Living with and Without Cardiovascular Disease. Can. J. Cardiol. 2025, 41, 507–518. [Google Scholar] [CrossRef]
  6. Tan, S.C.W.; Zheng, B.-B.; Tang, M.-L.; Chu, H.; Zhao, Y.-T.; Weng, C. Global Burden of Cardiovascular Diseases and Its Risk Factors, 1990–2021: A Systematic Analysis for the Global Burden of Disease Study 2021. QJM Int. J. Med. 2025, 118, 411–422. [Google Scholar] [CrossRef]
  7. Zhou, J.-X.; Zheng, Z.-Y.; Peng, Z.-X.; Ni, H.-G. Global Impact of PM2.5 on Cardiovascular Disease: Causal Evidence and Health Inequities across Region from 1990 to 2021. J. Environ. Manag. 2025, 374, 124168. [Google Scholar] [CrossRef]
  8. Roth, G.A.; Mensah, G.A.; Johnson, C.O.; Addolorato, G.; Ammirati, E.; Baddour, L.M.; Barengo, N.C.; Beaton, A.Z.; Benjamin, E.J.; Benziger, C.P.; et al. Global Burden of Cardiovascular Diseases and Risk Factors, 1990–2019. J. Am. Coll. Cardiol. 2020, 76, 2982–3021. [Google Scholar] [CrossRef]
  9. Schwalm, J.D.; Joseph, P.; Leong, D.; Lopez-Lopez, J.P.; Onuma, O.; Bhatt, P.; Avezum, A.; Walli-Attaei, M.; McKee, M.; Salim, Y. Cardiovascular Disease in the Americas: Optimizing Primary and Secondary Prevention of Cardiovascular Disease. Lancet Reg. Health-Am. 2025, 42, 100964. [Google Scholar] [CrossRef]
  10. Lorenzo-Almorós, A.; Casado Cerrada, J.; Álvarez-Sala Walther, L.-A.; Méndez Bailón, M.; Lorenzo González, Ó. Atrial Fibrillation and Diabetes Mellitus: Dangerous Liaisons or Innocent Bystanders? J. Clin. Med. 2023, 12, 2868. [Google Scholar] [CrossRef]
  11. Kornej, J.; Börschel, C.S.; Benjamin, E.J.; Schnabel, R.B. Epidemiology of Atrial Fibrillation in the 21st Century: Novel Methods and New Insights. Circ. Res. 2020, 127, 4–20. [Google Scholar] [CrossRef]
  12. Linz, D.; Gawalko, M.; Betz, K.; Hendriks, J.M.; Lip, G.Y.H.; Vinter, N.; Guo, Y.; Johnsen, S. Atrial Fibrillation: Epidemiology, Screening and Digital Health. Lancet Reg. Health-Eur. 2024, 37, 100786. [Google Scholar] [CrossRef] [PubMed]
  13. Ansari, Y.; Mourad, O.; Qaraqe, K.; Serpedin, E. Deep Learning for ECG Arrhythmia Detection and Classification: An Overview of Progress for Period 2017–2023. Front. Physiol. 2023, 14, 1246746. [Google Scholar] [CrossRef] [PubMed]
  14. Unnithan, D.R.; Jeba, J.R. A Novel Framework for Multiple Disease Prediction in Telemedicine Systems Using Deep Learning. Automatika 2024, 65, 763–777. [Google Scholar] [CrossRef]
  15. C., D.; J, N. Cardio Vascular Disease Prediction by Deep Learning Based on IOMT: Review. Smart Sci. 2025, 13, 22–32. [Google Scholar] [CrossRef]
  16. Islam, M.R.; Kabir, M.M.; Mridha, M.F.; Alfarhood, S.; Safran, M.; Che, D. Deep Learning-Based IoT System for Remote Monitoring and Early Detection of Health Issues in Real-Time. Sensors 2023, 23, 5204. [Google Scholar] [CrossRef]
  17. Feng, K.; Fan, Z. A Novel Bidirectional LSTM Network Based on Scale Factor for Atrial Fibrillation Signals Classification. Biomed. Signal Process. Control 2022, 76, 103663. [Google Scholar] [CrossRef]
  18. Huang, Z.; MacLachlan, S.; Yu, L.; Herbozo Contreras, L.F.; Truong, N.D.; Ribeiro, A.H.; Kavehei, O. Generalization Challenges in ECG Deep Learning: Insights from Dataset Characteristics and Attention Mechanism. medRxiv 2023. [Google Scholar] [CrossRef]
  19. Wang, J.; Guo, X. Automated Detection of Myocardial Infarction Based on an Improved State Refinement Module for LSTM/GRU. Artif. Intell. Med. 2024, 152, 102865. [Google Scholar] [CrossRef]
  20. Chopannejad, S.; Roshanpoor, A.; Sadoughi, F. Attention-Assisted Hybrid CNN-BILSTM-BiGRU Model with SMOTE–Tomek Method to Detect Cardiac Arrhythmia Based on 12-Lead Electrocardiogram Signals. Digit. Health 2024, 10, 20552076241234624. [Google Scholar] [CrossRef]
  21. Li, H.; Han, J.; Zhang, H.; Zhang, X.; Si, Y.; Zhang, Y.; Liu, Y.; Yang, H. Clinical Knowledge-Based ECG Abnormalities Detection Using Dual-View CNN-Transformer and External Attention Mechanism. Comput. Biol. Med. 2024, 178, 108751. [Google Scholar] [CrossRef] [PubMed]
  22. Wesley, K. Huszar’s ECG and 12-Lead Interpretation, 6th ed.; Elsevier: Amsterdam, The Netherlands, 2022; ISBN 978-0-323-71195-1. [Google Scholar]
  23. Garcia, T.B. 12-Lead ECG: The Art of Interpretation, 2nd ed.; Jones & Bartlett Learning: Burlington, MA, USA, 2015; ISBN 978-0-7637-7351-9. [Google Scholar]
  24. Hampton, J.; Hampton, J.; Adlam, A. The ECG Made Easy, 9th ed.; Elsevier: Amsterdam, The Netherlands, 2019; ISBN 978-0-7020-7457-8. [Google Scholar]
  25. Cleveland Clinic. Heart Conduction System; Cleveland Clinic: Cleveland, OH, USA, 2025. [Google Scholar]
  26. Zimmerman, F.H. ECG Core Curriculum, 1st ed.; McGraw Hill/Medical: New York, NY, USA, 2023; ISBN 978-0-07-178522-8. [Google Scholar]
  27. Mayapur, P. A Review on Detection and Performance Analysis on R-R Interval Methods for ECG. Int. J. Innov. Res. Sci. Eng. Technol. 2018, 7, 11019–11026. [Google Scholar] [CrossRef]
  28. Meloni, S.; Mastenbjörk, M. EKG/ECG Interpretation: Everything You Need to Know about the 12-Lead EKG/ECG Interpretation and How to Diagnose and Treat Arrhythmias, 2nd ed.; Medical Creations: Las Vegas, NV, USA, 2021; ISBN 978-1-7347413-5-3. [Google Scholar]
  29. Vélez Rodríguez, D. ECG: Electrocardiografía, 4th ed.; Marban: Madrid, Spain, 2020; ISBN 978-84-17184-98-8. [Google Scholar]
  30. Wu, Z.; Guo, C. Deep Learning and Electrocardiography: Systematic Review of Current Techniques in Cardiovascular Disease Diagnosis and Management. BioMed Eng. OnLine 2025, 24, 23. [Google Scholar] [CrossRef] [PubMed]
  31. Ding, C.; Yao, T.; Wu, C.; Ni, J. Deep Learning for Personalized Electrocardiogram Diagnosis: A Review. arXiv 2024, arXiv:2409.07975. [Google Scholar] [CrossRef]
  32. Xiao, Q.; Lee, K.; Mokhtar, S.A.; Ismail, I.; Pauzi, A.L.B.M.; Zhang, Q.; Lim, P.Y. Deep Learning-Based ECG Arrhythmia Classification: A Systematic Review. Appl. Sci. 2023, 13, 4964. [Google Scholar] [CrossRef]
  33. Schots, B.B.S.; Pizarro, C.S.; Arends, B.K.O.; Oerlemans, M.I.F.J.; Ahmetagić, D.; van Der Harst, P.; van Es, R. Deep Learning for Electrocardiogram Interpretation: Bench to Bedside. Eur. J. Clin. Investig. 2025, 55, e70002. [Google Scholar] [CrossRef]
  34. Stamate, E.; Piraianu, A.-I.; Ciobotaru, O.R.; Crassas, R.; Duca, O.; Fulga, A.; Grigore, I.; Vintila, V.; Fulga, I.; Ciobotaru, O.C. Revolutionizing Cardiology through Artificial Intelligence—Big Data from Proactive Prevention to Precise Diagnostics and Cutting-Edge Treatment—A Comprehensive Review of the Past 5 Years. Diagnostics 2024, 14, 1103. [Google Scholar] [CrossRef]
  35. Di Costanzo, A.; Spaccarotella, C.A.M.; Esposito, G.; Indolfi, C. An Artificial Intelligence Analysis of Electrocardiograms for the Clinical Diagnosis of Cardiovascular Diseases: A Narrative Review. J. Clin. Med. 2024, 13, 1033. [Google Scholar] [CrossRef]
  36. Ansari, M.Y.; Yaqoob, M.; Ishaq, M.; Flushing, E.F.; Mangalote, I.A.C.; Dakua, S.P.; Aboumarzouk, O.; Righetti, R.; Qaraqe, M. A Survey of Transformers and Large Language Models for ECG Diagnosis: Advances, Challenges, and Future Directions. Artif. Intell. Rev. 2025, 58, 261. [Google Scholar] [CrossRef]
  37. Kingma, J.G. Acute Myocardial Infarction: Perspectives on Physiopathology of Myocardial Injury and Protective Interventions. In Cardiac Diseases-Novel Aspects of Cardiac Risk, Cardiorenal Pathology and Cardiac Interventions; Gaze, D.C., Kibel, A., Eds.; IntechOpen: London, UK, 2021; ISBN 978-1-83968-161-5. [Google Scholar]
  38. Wahlang, I.; Maji, A.K.; Saha, G.; Chakrabarti, P.; Jasinski, M.; Leonowicz, Z.; Jasinska, E. Deep Learning Methods for Classification of Certain Abnormalities in Echocardiography. Electronics 2021, 10, 495. [Google Scholar] [CrossRef]
  39. Rjoob, K.; Bond, R.; Finlay, D.; McGilligan, V.; Leslie, S.J.; Rababah, A.; Iftikhar, A.; Guldenring, D.; Knoery, C.; McShane, A.; et al. Machine Learning and the Electrocardiogram over Two Decades: Time Series and Meta-Analysis of the Algorithms, Evaluation Metrics and Applications. Artif. Intell. Med. 2022, 132, 102381. [Google Scholar] [CrossRef] [PubMed]
  40. Zhang, H.; Liu, W.; Shi, J.; Chang, S.; Wang, H.; He, J.; Huang, Q. MaeFE: Masked Autoencoders Family of Electrocardiogram for Self-Supervised Pretraining and Transfer Learning. IEEE Trans. Instrum. Meas. 2023, 72, 2502015. [Google Scholar] [CrossRef]
  41. Sha, R.; Baines, O.; Hayes, A.; Tompkins, K.; Kalla, M.; Holmes, A.P.; O’Shea, C.; Pavlovic, D. Impact of Obesity on Atrial Fibrillation Pathogenesis and Treatment Options. J. Am. Heart Assoc. 2024, 13, e032277. [Google Scholar] [CrossRef]
  42. Diab, A.; Dastmalchi, L.N.; Gulati, M.; Michos, E.D. A Heart-Healthy Diet for Cardiovascular Disease Prevention: Where Are We Now? Vasc. Heal. Risk Manag. 2023, 19, 237–253. [Google Scholar] [CrossRef]
  43. Kingma, J.; Simard, C.; Drolet, B. Overview of Cardiac Arrhythmias and Treatment Strategies. Pharmaceuticals 2023, 16, 844. [Google Scholar] [CrossRef]
  44. Wan, L.; Yang, G.; Dong, H.; Liang, X.; He, Y. Impact of Cardiovascular Disease on Health-Related Quality of Life among Older Adults in Eastern China: Evidence from a National Cross-Sectional Survey. Front. Public Health 2024, 11, 1300404. [Google Scholar] [CrossRef]
  45. Amiri, H.; Mohammadzadeh, J.; Mirhosseini, S.M.; Nikravanshelmani, A. Early Prediction of Cardiac Arrhythmia Based on Active Fuzzy Deep Learning. Fuzzy Optim. Model. J. 2025, 6, 062506. [Google Scholar] [CrossRef]
  46. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. Syst. Rev. 2021, 10, 89. [Google Scholar] [CrossRef]
  47. Oke, O.A.; Cavus, N. A Systematic Review on the Impact of Artificial Intelligence on Electrocardiograms in Cardiology. Int. J. Med. Inform. 2025, 195, 105753. [Google Scholar] [CrossRef]
  48. Velandia, H.; Pardo, A.; Vera, M.I.; Vera, M. Systematic Review of Artificial Intelligence and Electrocardiography for Cardiovascular Disease Diagnosis. Bioengineering 2025, 12, 1248. [Google Scholar] [CrossRef] [PubMed]
  49. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; Technical Report EBSE 2007-001; Keele University: Keele, UK; University of Durham: Durham, UK, 2007. [Google Scholar]
  50. Musa, N.; Gital, A.Y.; Aljojo, N.; Chiroma, H.; Adewole, K.S.; Mojeed, H.A.; Faruk, N.; Abdulkarim, A.; Emmanuel, I.; Folawiyo, Y.Y.; et al. A Systematic Review and Meta-Data Analysis on the Applications of Deep Learning in Electrocardiogram. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 9677–9750. [Google Scholar] [CrossRef] [PubMed]
  51. Bhushan, M.; Pandit, A.; Garg, A. Machine Learning and Deep Learning Techniques for the Analysis of Heart Disease: A Systematic Literature Review, Open Challenges and Future Directions. Artif. Intell. Rev. 2023, 56, 14035–14086. [Google Scholar] [CrossRef]
  52. Chen, S.W.; Wang, S.L.; Qi, X.Z.; Samuri, S.M.; Yang, C. Review of ECG Detection and Classification Based on Deep Learning: Coherent Taxonomy, Motivation, Open Challenges and Recommendations. Biomed. Signal Process. Control 2022, 74, 103493. [Google Scholar] [CrossRef]
  53. Chen, H.; Wang, G.; Zhang, G.; Zhang, P.; Yang, H. CLECG: A Novel Contrastive Learning Framework for Electrocardiogram Arrhythmia Classification. IEEE Signal Process. Lett. 2021, 28, 1993–1997. [Google Scholar] [CrossRef]
  54. Dong, Y.; Zhang, M.; Qiu, L.; Wang, L.; Yu, Y. An Arrhythmia Classification Model Based on Vision Transformer with Deformable Attention. Micromachines 2023, 14, 1155. [Google Scholar] [CrossRef]
  55. Gao, H.; Wang, X.; Chen, Z.; Wu, M.; Li, J.; Liu, C. ECG-CL: A Comprehensive Electrocardiogram Interpretation Method Based on Continual Learning. IEEE J. Biomed. Health Inform. 2023, 27, 5225–5236. [Google Scholar] [CrossRef]
  56. Lv, Q.-J.; Chen, H.-Y.; Zhong, W.-B.; Wang, Y.-Y.; Song, J.-Y.; Guo, S.-D.; Qi, L.-X.; Chen, C.Y.-C. A Multi-Task Group Bi-LSTM Networks Application on Electrocardiogram Classification. IEEE J. Transl. Eng. Health Med. 2019, 8, 1900111. [Google Scholar] [CrossRef]
  57. Kashou, A.H.; Ko, W.-Y.; Attia, Z.I.; Cohen, M.S.; Friedman, P.A.; Noseworthy, P.A. A Comprehensive Artificial Intelligence–Enabled Electrocardiogram Interpretation Program. Cardiovasc. Digit. Health J. 2020, 1, 62–70. [Google Scholar] [CrossRef]
  58. Le, D.; Truong, S.; Brijesh, P.; Adjeroh, D.A.; Le, N. sCL-ST: Supervised Contrastive Learning with Semantic Transformations for Multiple Lead ECG Arrhythmia Classification. IEEE J. Biomed. Health Inform. 2023, 27, 2818–2828. [Google Scholar] [CrossRef] [PubMed]
  59. Li, L.; Chen, X.; Hu, S. Application of an End-to-End Model with Self-Attention Mechanism in Cardiac Disease Prediction. Front. Physiol. 2024, 14, 1308774. [Google Scholar] [CrossRef] [PubMed]
  60. Li, D.; Wu, H.; Zhao, J.; Tao, Y.; Fu, J. Automatic Classification System of Arrhythmias Using 12-Lead ECGs with a Deep Neural Network Based on an Attention Mechanism. Symmetry 2020, 12, 1827. [Google Scholar] [CrossRef]
  61. Lu, X.; Wang, X.; Zhang, W.; Wen, A.; Ren, Y. An End-to-End Model for ECG Signals Classification Based on Residual Attention Network. Biomed. Signal Process. Control 2023, 80, 104369. [Google Scholar] [CrossRef]
  62. Park, J.; Kim, J.; Jung, S.; Gil, Y.; Choi, J.-I.; Son, H.S. ECG-Signal Multi-Classification Model Based on Squeeze-and-Excitation Residual Neural Networks. Appl. Sci. 2020, 10, 6495. [Google Scholar] [CrossRef]
  63. Ping, Y.; Chen, C.; Wu, L.; Wang, Y.; Shu, M. Automatic Detection of Atrial Fibrillation Based on CNN-LSTM and Shortcut Connection. Healthcare 2020, 8, 139. [Google Scholar] [CrossRef]
  64. Ratnaparkhi, A.; Deshpande, P.; Ghule, G. A Framework for Segmentation and Classification of Arrhythmia Using Novel Bidirectional LSTM Network. Int. J. Comput. Digit. Syst. 2021, 10, 851–861. [Google Scholar] [CrossRef]
  65. Choudhary, P.S.; Dandapat, S. Morphology-Aware ECG Diagnostic Framework with Cross-Task Attention Transfer for Improved Myocardial Infarction Diagnosis. IEEE Trans. Instrum. Meas. 2024, 73, 4007811. [Google Scholar] [CrossRef]
  66. Shi, J.; Li, Z.; Liu, W.; Zhang, H.; Luo, D.; Ge, Y.; Chang, S.; Wang, H.; He, J.; Huang, Q. An Adaptive Threshold-Based Semi-Supervised Learning Method for Cardiovascular Disease Detection. Inf. Sci. 2024, 677, 120881. [Google Scholar] [CrossRef]
  67. Wang, S.; Li, R.; Wang, X.; Shen, S.; Zhou, B.; Wang, Z. Multiscale Residual Network Based on Channel Spatial Attention Mechanism for Multilabel ECG Classification. J. Healthc. Eng. 2021, 2021, 630643. [Google Scholar] [CrossRef]
  68. Yang, S.; Lian, C.; Zeng, Z.; Xu, B.; Zang, J.; Zhang, Z. A Multi-View Multi-Scale Neural Network for Multi-Label ECG Classification. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 648–660. [Google Scholar] [CrossRef]
  69. Yoo, J.; Jin, Y.; Ko, B.; Kim, M.-S. K-Labelsets Method for Multi-Label ECG Signal Classification Based on SE-ResNet. Appl. Sci. 2021, 11, 7758. [Google Scholar] [CrossRef]
  70. Zhang, J.; Liang, D.; Liu, A.; Gao, M.; Chen, X.; Zhang, X.; Chen, X. MLBF-Net: A Multi-Lead-Branch Fusion Network for Multi-Class Arrhythmia Classification Using 12-Lead ECG. IEEE J. Transl. Eng. Health Med. 2021, 9, 1900211. [Google Scholar] [CrossRef] [PubMed]
  71. Zhou, R.; Lu, L.; Liu, Z.; Xiang, T.; Liang, Z.; Clifton, D.A.; Dong, Y.; Zhang, Y.-T. Semi-Supervised Learning for Multi-Label Cardiovascular Diseases Prediction: A Multi-Dataset Study. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 3305–3320. [Google Scholar] [CrossRef]
  72. Zhu, J.; Lv, J.; Kong, D. CNN-FWS: A Model for the Diagnosis of Normal and Abnormal ECG with Feature Adaptive. Entropy 2022, 24, 471. [Google Scholar] [CrossRef]
  73. Chen, X.; Guo, W.; Zhao, L.; Huang, W.; Wang, L.; Sun, A.; Li, L.; Mo, F. Acute Myocardial Infarction Detection Using Deep Learning-Enabled Electrocardiograms. Front. Cardiovasc. Med. 2021, 8, 654515. [Google Scholar] [CrossRef]
  74. Almasoud, A.S.; Mengash, H.A.; Eltahir, M.M.; Almalki, N.S.; Alnfiai, M.M.; Salama, A.S. Automated Arrhythmia Classification Using Farmland Fertility Algorithm with Hybrid Deep Learning Model on Internet of Things Environment. Sensors 2023, 23, 8265. [Google Scholar] [CrossRef]
  75. Alsaleem, M.N.; Islam, M.S.; Al-Ahmadi, S.; Soudani, A. Multiscale Encoding of Electrocardiogram Signals with a Residual Network for the Detection of Atrial Fibrillation. Bioengineering 2022, 9, 480. [Google Scholar] [CrossRef]
  76. Anand, A.; Kadian, T.; Shetty, M.K.; Gupta, A. Explainable AI Decision Model for ECG Data of Cardiac Disorders. Biomed. Signal Process. Control 2022, 75, 103584. [Google Scholar] [CrossRef]
  77. Avanzato, R.; Beritelli, F. Automatic ECG Diagnosis Using Convolutional Neural Network. Electronics 2020, 9, 951. [Google Scholar] [CrossRef]
  78. Bender, T.; Beinecke, J.M.; Krefting, D.; Müller, C.; Dathe, H.; Seidler, T.; Spicher, N.; Hauschild, A.-C. Analysis of a Deep Learning Model for 12-Lead ECG Classification Reveals Learned Features Similar to Diagnostic Criteria. IEEE J. Biomed. Health Inform. 2023, 28, 1848–1859. [Google Scholar] [CrossRef]
  79. Cai, W.; Chen, Y.; Guo, J.; Han, B.; Shi, Y.; Ji, L.; Wang, J.; Zhang, G.; Luo, J. Accurate Detection of Atrial Fibrillation from 12-Lead ECG Using Deep Neural Network. Comput. Biol. Med. 2020, 116, 103378. [Google Scholar] [CrossRef]
  80. Cao, X.-C.; Yao, B.; Chen, B.-Q. Atrial Fibrillation Detection Using an Improved Multi-Scale Decomposition Enhanced Residual Convolutional Neural Network. IEEE Access 2019, 7, 89152–89161. [Google Scholar] [CrossRef]
  81. Chang, K.-C.; Hsieh, P.-H.; Wu, M.-Y.; Wang, Y.-C.; Chen, J.-Y.; Tsai, F.-J.; Shih, E.S.C.; Hwang, M.-J.; Huang, T.-C. Usefulness of Machine Learning-Based Detection and Classification of Cardiac Arrhythmias With 12-Lead Electrocardiograms. Can. J. Cardiol. 2021, 37, 94–104. [Google Scholar] [CrossRef] [PubMed]
  82. Chang, K.-C.; Hsieh, P.-H.; Wu, M.-Y.; Wang, Y.-C.; Wei, J.-T.; Shih, E.S.C.; Hwang, M.-J.; Lin, W.-Y.; Lin, W.-T.; Lee, K.-J.; et al. Usefulness of Multi-Labelling Artificial Intelligence in Detecting Rhythm Disorders and Acute ST-Elevation Myocardial Infarction on 12-Lead Electrocardiogram. Eur. Heart J.-Digit. Health 2021, 2, 299–310. [Google Scholar] [CrossRef] [PubMed]
  83. Che, C.; Zhang, P.; Zhu, M.; Qu, Y.; Jin, B. Constrained Transformer Network for ECG Signal Processing and Arrhythmia Classification. BMC Med. Inform. Decis. Mak. 2021, 21, 184. [Google Scholar] [CrossRef] [PubMed]
  84. Cheng, Y.; Zhu, W.; Li, D.; Wang, L. Multi-Label Classification of Arrhythmia Using Dynamic Graph Convolutional Network Based on Encoder-Decoder Framework. Biomed. Signal Process. Control 2024, 95, 106348. [Google Scholar] [CrossRef]
  85. Choi, J.-W.; Hong, D.-Y.; Jung, C.; Hwang, E.; Park, S.-H.; Roh, S.-Y. A Multi-View Learning Approach to Enhance Automatic 12-Lead ECG Diagnosis Performance. Biomed. Signal Process. Control 2024, 93, 106214. [Google Scholar] [CrossRef]
  86. Dai, H.; Hwang, H.-G.; Tseng, V.S. Convolutional Neural Network Based Automatic Screening Tool for Cardiovascular Diseases Using Different Intervals of ECG Signals. Comput. Methods Programs Biomed. 2021, 203, 106035. [Google Scholar] [CrossRef]
  87. Dai, Y.; Xu, B.; Yan, S.; Xu, J. Study of Cardiac Arrhythmia Classification Based on Convolutional Neural Network. Comput. Sci. Inf. Syst. 2020, 17, 445–458. [Google Scholar] [CrossRef]
  88. Gao, Y.; Wang, H.; Liu, Z. An End-to-End Atrial Fibrillation Detection by a Novel Residual-Based Temporal Attention Convolutional Neural Network with Exponential Nonlinearity Loss. Knowl.-Based Syst. 2021, 212, 106589. [Google Scholar] [CrossRef]
  89. Ge, Z.; Jiang, X.; Tong, Z.; Feng, P.; Zhou, B.; Xu, M.; Wang, Z.; Pang, Y. Multi-Label Correlation Guided Feature Fusion Network for Abnormal ECG Diagnosis. Knowl.-Based Syst. 2021, 233, 107508. [Google Scholar] [CrossRef]
  90. Guo, X.; Wang, Q.; Zheng, J. An Intelligent Computer-Aided Diagnosis Approach for Atrial Fibrillation Detection Based on Multi-Scale Convolution Kernel and Squeeze-and-Excitation Network. Biomed. Signal Process. Control 2021, 68, 102778. [Google Scholar] [CrossRef]
  91. Bui, T.H.; Hoang, V.M.; Pham, M.T. Automatic Varied-Length ECG Classification Using a Lightweight DenseNet Model. Biomed. Signal Process. Control 2023, 82, 104529. [Google Scholar] [CrossRef]
  92. Hassan, S.U.; Mohd Zahid, M.S.; Abdullah, T.A.; Husain, K. Classification of Cardiac Arrhythmia Using a Convolutional Neural Network and Bi-Directional Long Short-Term Memory. Digit. Health 2022, 8, 205520762211027. [Google Scholar] [CrossRef] [PubMed]
  93. He, R.; Liu, Y.; Wang, K.; Zhao, N.; Yuan, Y.; Li, Q.; Zhang, H. Automatic Cardiac Arrhythmia Classification Using Combination of Deep Residual Network and Bidirectional LSTM. IEEE Access 2019, 7, 102119–102135. [Google Scholar] [CrossRef]
  94. Hiriyannaiah, S.; G M, S.; M H M, K.; Srinivasa, K.G. A Comparative Study and Analysis of LSTM Deep Neural Networks for Heartbeats Classification. Health Technol. 2021, 11, 663–671. [Google Scholar] [CrossRef]
  95. Hsieh, C.-H.; Li, Y.-S.; Hwang, B.-J.; Hsiao, C.-H. Detection of Atrial Fibrillation Using 1D Convolutional Neural Network. Sensors 2020, 20, 2136. [Google Scholar] [CrossRef]
  96. Jang, J.-H.; Kim, T.Y.; Yoon, D. Effectiveness of Transfer Learning for Deep Learning-Based Electrocardiogram Analysis. Healthc. Inform. Res. 2021, 27, 19–28. [Google Scholar] [CrossRef]
  97. Jo, Y.-Y.; Kwon, J.; Jeon, K.-H.; Cho, Y.-H.; Shin, J.-H.; Lee, Y.-J.; Jung, M.-S.; Ban, J.-H.; Kim, K.-H.; Lee, S.Y.; et al. Detection and Classification of Arrhythmia Using an Explainable Deep Learning Model. J. Electrocardiol. 2021, 67, 124–132. [Google Scholar] [CrossRef]
  98. Jo, Y.-Y.; Cho, Y.; Lee, S.Y.; Kwon, J.; Kim, K.-H.; Jeon, K.-H.; Cho, S.; Park, J.; Oh, B.-H. Explainable Artificial Intelligence to Detect Atrial Fibrillation Using Electrocardiogram. Int. J. Cardiol. 2021, 328, 104–110. [Google Scholar] [CrossRef]
  99. Katsaouni, N.; Aul, F.; Krischker, L.; Schmalhofer, S.; Hedrich, L.; Schulz, M.H. Energy Efficient Convolutional Neural Networks for Arrhythmia Detection. Array 2022, 13, 100127. [Google Scholar] [CrossRef]
  100. Kennedy, A.; Doggart, P.; Smith, S.W.; Finlay, D.; Guldenring, D.; Bond, R.; McCausland, C.; McLaughlin, J. Device Agnostic AI-Based Analysis of Ambulatory ECG Recordings. J. Electrocardiol. 2022, 74, 154–157. [Google Scholar] [CrossRef] [PubMed]
  101. Kim, H.K.; Sunwoo, M.H. An Automated Cardiac Arrhythmia Classification Network for 45 Arrhythmia Classes Using 12-Lead Electrocardiogram. IEEE Access 2024, 12, 44527–44538. [Google Scholar] [CrossRef]
  102. Kim, J.-K.; Jung, S.; Park, J.; Han, S.W. Arrhythmia Detection Model Using Modified DenseNet for Comprehensible Grad-CAM Visualization. Biomed. Signal Process. Control 2022, 73, 103408. [Google Scholar] [CrossRef]
  103. Le, K.H.; Pham, H.H.; Nguyen, T.B.T.; Nguyen, T.A.; Thanh, T.N.; Do, C.D. LightX3ECG: A Lightweight and eXplainable Deep Learning System for 3-Lead Electrocardiogram Classification. Biomed. Signal Process. Control 2023, 85, 104963. [Google Scholar] [CrossRef]
  104. Li, Y.; Chen, M.; Jiang, X.; Liu, L.; Han, B.; Zhang, L.; Wei, S. An Atrial Fibrillation Detection Algorithm Based on Lightweight Design Architecture and Feature Fusion Strategy. Biomed. Signal Process. Control 2024, 91, 106016. [Google Scholar] [CrossRef]
  105. Li, Y.; Zhang, L.; Zhu, L.; Liu, L.; Han, B.; Zhang, Y.; Wei, S. Diagnosis of Atrial Fibrillation Using Self-Complementary Attentional Convolutional Neural Network. Comput. Methods Programs Biomed. 2023, 238, 107565. [Google Scholar] [CrossRef]
  106. Li, Y.; Qian, R.; Li, K. Inter-Patient Arrhythmia Classification with Improved Deep Residual Convolutional Neural Network. Comput. Methods Programs Biomed. 2022, 214, 106582. [Google Scholar] [CrossRef]
  107. Li, W.; Tang, Y.M.; Yu, K.M.; To, S. SLC-GAN: An Automated Myocardial Infarction Detection Model Based on Generative Adversarial Networks and Convolutional Neural Networks with Single-Lead Electrocardiogram Synthesis. Inf. Sci. 2022, 589, 738–750. [Google Scholar] [CrossRef]
  108. Ma, C.; Long, X.; Sheng, W.; Vullings, R.; Yang, M.; Zhao, L.; Aarts, R.M.; Li, J.; Liu, C. An Atrial Fibrillation Detection Strategy in Dynamic ECGs With Significant Individual Differences. IEEE Trans. Instrum. Meas. 2024, 73, 4002010. [Google Scholar] [CrossRef]
  109. Soumiaa, M.-A.; Elhabbari, S.; Mansouri, M. The Use of the Multi-Scale Discrete Wavelet Transform and Deep Neural Networks on ECGs for the Diagnosis of 8 Cardio-Vascular Diseases. Mendel 2022, 28, 62–66. [Google Scholar] [CrossRef]
  110. Ng, Y.; Liao, M.-T.; Chen, T.-L.; Lee, C.-K.; Chou, C.-Y.; Wang, W. Few-Shot Transfer Learning for Personalized Atrial Fibrillation Detection Using Patient-Based Siamese Network with Single-Lead ECG Records. Artif. Intell. Med. 2023, 144, 102644. [Google Scholar] [CrossRef] [PubMed]
  111. Obayya, M.; Nemri, N.; Alharbi, L.A.; Nour, M.K.; Alnfiai, M.M.; Abdullah Al-Hagery, M.; Salem, N.M.; Al Duhayyim, M. Improved Bat Algorithm with Deep Learning-Based Biomedical ECG Signal Classification Model. Comput. Mater. Contin. 2023, 74, 3151–3166. [Google Scholar] [CrossRef]
  112. Omarov, B.; Baikuvekov, M.; Momynkulov, Z.; Kassenkhan, A.; Nuralykyzy, S.; Iglikova, M. Convolutional LSTM Network for Heart Disease Diagnosis on Electrocardiograms. Comput. Mater. Contin. 2023, 76, 3745–3761. [Google Scholar] [CrossRef]
  113. Prabhakararao, E.; Dandapat, S. Attentive RNN-Based Network to Fuse 12-Lead ECG and Clinical Features for Improved Myocardial Infarction Diagnosis. IEEE Signal Process. Lett. 2020, 27, 2029–2033. [Google Scholar] [CrossRef]
  114. Qiu, X.; Liang, S.; Meng, L.; Zhang, Y.; Liu, F. Exploiting Feature Fusion and Long-Term Context Dependencies for Simultaneous ECG Heartbeat Segmentation and Classification. Int. J. Data Sci. Anal. 2021, 11, 181–193. [Google Scholar] [CrossRef]
  115. Shin, K.; Kim, H.; Seo, W.-Y.; Kim, H.-S.; Shin, J.-M.; Kim, D.-K.; Park, Y.-S.; Kim, S.-H.; Kim, N. Enhancing the Performance of Premature Ventricular Contraction Detection in Unseen Datasets through Deep Learning with Denoise and Contrast Attention Module. Comput. Biol. Med. 2023, 166, 107532. [Google Scholar] [CrossRef]
  116. Srivastava, A.; Pratiher, S.; Alam, S.; Hari, A.; Banerjee, N.; Ghosh, N.; Patra, A. A Deep Residual Inception Network with Channel Attention Modules for Multi-Label Cardiac Abnormality Detection from Reduced-Lead ECG. Physiol. Meas. 2022, 43, 064005. [Google Scholar] [CrossRef]
  117. Sun, Y.; Shen, J.; Jiang, Y.; Huang, Z.; Hao, M.; Zhang, X. MMA-RNN: A Multi-Level Multi-Task Attention-Based Recurrent Neural Network for Discrimination and Localization of Atrial Fibrillation. Biomed. Signal Process. Control 2024, 89, 105747. [Google Scholar] [CrossRef]
  118. Tesfai, H.; Saleh, H.; Al-Qutayri, M.; Mohammad, M.B.; Tekeste, T.; Khandoker, A.; Mohammad, B. Lightweight Shufflenet Based CNN for Arrhythmia Classification. IEEE Access 2022, 10, 111842–111854. [Google Scholar] [CrossRef]
  119. Ullah, A.; Rehman, S.U.; Tu, S.; Mehmood, R.M.; Fawad; Ehatisham-ul-haq, M. A Hybrid Deep CNN Model for Abnormal Arrhythmia Detection Based on Cardiac ECG Signal. Sensors 2021, 21, 951. [Google Scholar] [CrossRef] [PubMed]
  120. Wang, J. A Deep Learning Approach for Atrial Fibrillation Signals Classification Based on Convolutional and Modified Elman Neural Network. Future Gener. Comput. Syst. 2020, 102, 670–679. [Google Scholar] [CrossRef]
  121. Wang, J.; Wu, X. A Deep Learning Refinement Strategy Based on Efficient Channel Attention for Atrial Fibrillation and Atrial Flutter Signals Identification. Appl. Soft Comput. 2022, 130, 109552. [Google Scholar] [CrossRef]
  122. Wang, J.; Zhang, S. An Improved Deep Learning Approach Based on Exponential Moving Average Algorithm for Atrial Fibrillation Signals Identification. Neurocomputing 2022, 513, 127–136. [Google Scholar] [CrossRef]
  123. Wang, J. An Intelligent Computer-Aided Approach for Atrial Fibrillation and Atrial Flutter Signals Classification Using Modified Bidirectional LSTM Network. Inf. Sci. 2021, 574, 320–332. [Google Scholar] [CrossRef]
  124. Wang, J. Automated Detection of Atrial Fibrillation and Atrial Flutter in ECG Signals Based on Convolutional and Improved Elman Neural Network. Knowl.-Based Syst. 2020, 193, 105446. [Google Scholar] [CrossRef]
  125. Wang, J. Automated Detection of Premature Ventricular Contraction Based on the Improved Gated Recurrent Unit Network. Comput. Methods Programs Biomed. 2021, 208, 106284. [Google Scholar] [CrossRef]
  126. Wu, L.; Huang, G.; Yu, X.; Ye, M.; Liu, L.; Ling, Y.; Liu, X.; Liu, D.; Zhou, B.; Liu, Y.; et al. Deep Learning Networks Accurately Detect ST-Segment Elevation Myocardial Infarction and Culprit Vessel. Front. Cardiovasc. Med. 2022, 9, 797207. [Google Scholar] [CrossRef]
  127. Wu, Q.; Sun, Y.; Yan, H.; Wu, X. ECG Signal Classification with Binarized Convolutional Neural Network. Comput. Biol. Med. 2020, 121, 103800. [Google Scholar] [CrossRef]
  128. Xiong, Z.; Stiles, M.K.; Gillis, A.M.; Zhao, J. Enhancing the Detection of Atrial Fibrillation from Wearable Sensors with Neural Style Transfer and Convolutional Recurrent Networks. Comput. Biol. Med. 2022, 146, 105551. [Google Scholar] [CrossRef]
  129. Yao, Q.; Wang, R.; Fan, X.; Liu, J.; Li, Y. Multi-Class Arrhythmia Detection from 12-Lead Varied-Length ECG Using Attention-Based Time-Incremental Convolutional Neural Network. Inf. Fusion 2020, 53, 174–182. [Google Scholar] [CrossRef]
  130. Yildirim, O.; Talo, M.; Ciaccio, E.J.; Tan, R.S.; Acharya, U.R. Accurate Deep Neural Network Model to Detect Cardiac Arrhythmia on More than 10,000 Individual Subject ECG Records. Comput. Methods Programs Biomed. 2020, 197, 105740. [Google Scholar] [CrossRef]
  131. Zhang, S.; Lian, C.; Xu, B.; Su, Y.; Alhudhaif, A. 12-Lead ECG Signal Classification for Detecting ECG Arrhythmia via an Information Bottleneck-Based Multi-Scale Network. Inf. Sci. 2024, 662, 120239. [Google Scholar] [CrossRef]
  132. Zhang, H.; Gu, H.; Gao, J.; Lu, P.; Chen, G.; Wang, Z. An Effective Atrial Fibrillation Detection from Short Single-Lead Electrocardiogram Recordings Using MCNN-BLSTM Network. Algorithms 2022, 15, 454. [Google Scholar] [CrossRef]
  133. Zhang, X.; Gu, K.; Miao, S.; Zhang, X.; Yin, Y.; Wan, C.; Yu, Y.; Hu, J.; Wang, Z.; Shan, T.; et al. Automated Detection of Cardiovascular Disease by Electrocardiogram Signal Analysis: A Deep Learning System. Cardiovasc. Diagn. Ther. 2020, 10, 227–235. [Google Scholar] [CrossRef]
  134. Zhang, J.; Liu, A.; Gao, M.; Chen, X.; Zhang, X.; Chen, X. ECG-Based Multi-Class Arrhythmia Detection Using Spatio-Temporal Attention-Based Convolutional Recurrent Neural Network. Artif. Intell. Med. 2020, 106, 101856. [Google Scholar] [CrossRef]
  135. Zhang, X.; Li, J.; Cai, Z.; Zhang, L.; Chen, Z.; Liu, C. Over-Fitting Suppression Training Strategies for Deep Learning-Based Atrial Fibrillation Detection. Med. Biol. Eng. Comput. 2021, 59, 165–173. [Google Scholar] [CrossRef]
  136. Zhao, Y.; Cheng, J.; Zhan, P.; Peng, X. ECG Classification Using Deep CNN Improved by Wavelet Transform. Comput. Mater. Contin. 2020, 64, 1615–1628. [Google Scholar] [CrossRef]
  137. Ayano, Y.M.; Schwenker, F.; Dufera, B.D.; Debelee, T.G.; Ejegu, Y.G. Interpretable Hybrid Multichannel Deep Learning Model for Heart Disease Classification Using 12-Lead ECG Signal. IEEE Access 2024, 12, 94055–94080. [Google Scholar] [CrossRef]
  138. Ba Mahel, A.S.; Al-Gaashani, M.S.A.M.; Alkanhel, R.I.; Hassan, D.S.M.; Muthanna, M.S.A.; Muthanna, A.; Aziz, A. A Multi-Scale Deep Learning Framework Combining MobileViT-ECA and LSTM for Accurate ECG Analysis. IEEE Access 2025, 13, 85473–85492. [Google Scholar] [CrossRef]
  139. Butt, F.S.; Wagner, M.F.; Schafer, J.; Ullate, D.G. Toward Automated Feature Extraction for Deep Learning Classification of Electrocardiogram Signals. IEEE Access 2022, 10, 118601–118616. [Google Scholar] [CrossRef]
  140. Cai, J.; Sun, W.; Guan, J.; You, I. Multi-ECGNet for ECG Arrythmia Multi-Label Classification. IEEE Access 2020, 8, 110848–110858. [Google Scholar] [CrossRef]
  141. Hooda, S.; Tripathy, R.K. SERN-AwGOP: Squeeze-and-Excitation Residual Network with an Attention-Weighted Generalized Operational Perceptron for Atrial Fibrillation Detection. IEEE Access 2025, 13, 34844–34853. [Google Scholar] [CrossRef]
  142. Ji, W.; Zhu, D. ECG Classification Exercise Health Analysis Algorithm Based on GRU and Convolutional Neural Network. IEEE Access 2024, 12, 59842–59850. [Google Scholar] [CrossRef]
  143. Khan, A.; Mughal, M.R.; Ali Irtaza, S.; Khan, M.; Tahir, M.; Ali, A.; Saeed, Z. A Deep Learning-Based Ultra-Lightweight Architecture for Atrial Fibrillation Detection Using Single-Lead ECG Recordings. IEEE Access 2025, 13, 86474–86486. [Google Scholar] [CrossRef]
  144. Maghawry, E.; Gharib, T.F.; Ismail, R.; Zaki, M.J. An Efficient Heartbeats Classifier Based on Optimizing Convolutional Neural Network Model. IEEE Access 2021, 9, 153266–153275. [Google Scholar] [CrossRef]
  145. Mahim, S.M.; Emamul Hossen, M.; Al Hasan, S.; Islam, M.K.; Iqbal, Z.; Alibakhshikenari, M.; Collotta, M.; Miah, M.S. TransMixer-AF: Advanced Real-Time Detection of Atrial Fibrillation Utilizing Single-Lead Electrocardiogram Signals. IEEE Access 2024, 12, 143149–143162. [Google Scholar] [CrossRef]
  146. Rafi, T.H.; Woong Ko, Y. HeartNet: Self Multihead Attention Mechanism via Convolutional Network with Adversarial Data Synthesis for ECG-Based Arrhythmia Classification. IEEE Access 2022, 10, 100501–100512. [Google Scholar] [CrossRef]
  147. Wong, J.; Nerbonne, J.; Zhang, Q. Ultra-Efficient Edge Cardiac Disease Detection Towards Real-Time Precision Health. IEEE Access 2024, 12, 9940–9951. [Google Scholar] [CrossRef]
  148. Sadiq, I.; Qureshi, H.N.; Rizwan, A.; Imran, A. Cardiac Arrhythmia Classification from Lead I ECG Recorded in a Free-Living Environment. IEEE J. Biomed. Health Inform. 2025, 1–14. [Google Scholar] [CrossRef] [PubMed]
  149. Wang, R.; Fan, J.; Li, Y. Deep Multi-Scale Fusion Neural Network for Multi-Class Arrhythmia Detection. IEEE J. Biomed. Health Inform. 2020, 24, 2461–2472. [Google Scholar] [CrossRef] [PubMed]
  150. Zubair, M.; Woo, S.; Lim, S.; Kim, D. Deep Representation Learning with Sample Generation and Augmented Attention Module for Imbalanced ECG Classification. IEEE J. Biomed. Health Inform. 2024, 28, 2461–2472. [Google Scholar] [CrossRef] [PubMed]
  151. Jiang, H.; Mutahira, H.; Wei, S.; Muhammad, M.S. ECG-Mamba: Cardiac Abnormality Classification with Non-Uniform-Mix Augmentation on 12-Lead ECGs. IEEE J. Transl. Eng. Health Med. 2025, 13, 461–470. [Google Scholar] [CrossRef]
  152. Luan Pham, H.; Tran, T.D.; Trung Duong Le, V.; Nakashima, Y. MINA: A Hardware-Efficient and Flexible Mini-InceptionNet Accelerator for ECG Classification in Wearable Devices. IEEE Trans. Circuits Syst. I 2025, 72, 2740–2753. [Google Scholar] [CrossRef]
  153. Kim, Y.K.; Lee, M.; Song, H.S.; Lee, S.-W. Automatic Cardiac Arrhythmia Classification Using Residual Network Combined with Long Short-Term Memory. IEEE Trans. Instrum. Meas. 2022, 71, 4005817. [Google Scholar] [CrossRef]
  154. Kim, Y.K.; Lee, M.; Song, H.S.; Lee, S.-W. Local-Global Temporal Fusion Network with an Attention Mechanism for Multiple and Multiclass Arrhythmia Classification. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 6569–6584. [Google Scholar] [CrossRef]
  155. Hannun, A.Y.; Rajpurkar, P.; Haghpanahi, M.; Tison, G.H.; Bourn, C.; Turakhia, M.P.; Ng, A.Y. Cardiologist-Level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms Using a Deep Neural Network. Nat. Med. 2019, 25, 65–69. [Google Scholar] [CrossRef]
  156. Li, Y.; Chen, M.; Qu, X.; Han, B.; Liu, L.; Wei, S. An Atrial Fibrillation Signals Analysis Algorithm in Line with Clinical Diagnostic Criteria. Signal Process. 2025, 236, 110068. [Google Scholar] [CrossRef]
  157. Rahman, M.M.; Rivolta, M.W.; Vaglio, M.; Maison-Blanche, P.; Badilini, F.; Sassi, R. Residual-Attention Deep Learning Model for Atrial Fibrillation Detection from Holter Recordings. J. Electrocardiol. 2025, 89, 153876. [Google Scholar] [CrossRef]
  158. Si, J.; Bao, Y.; Chen, F.; Wang, Y.; Zeng, M.; He, N.; Chen, Z.; Guo, Y. Research on Atrial Fibrillation Diagnosis in Electrocardiograms Based on CLA-AF Model. Eur. Heart J.-Digit. Health 2025, 6, 82–95. [Google Scholar] [CrossRef] [PubMed]
  159. Toosi, M.H.; Mohammadi-nasab, M.; Mohammadi, S.; Salehi, M.E. Efficient Quantized Transformer for Atrial Fibrillation Detection in Cross-Domain Datasets. Eng. Appl. Artif. Intell. 2025, 148, 110371. [Google Scholar] [CrossRef]
  160. Wang, L.-H.; Wang, J.-W.; Xie, C.-X.; Lee, Z.-J.; Cai, B.-J.; Chen, T.-Y.; Chen, S.-L.; Chen, C.-A.; Abu, P.A.R.; Yang, T. Hierarchical Multiattention Temporal Fusion Network for Dual-Task Atrial Fibrillation Subtyping and Early Risk Prediction. Mathematics 2025, 13, 2872. [Google Scholar] [CrossRef]
  161. Wu, X.; Yan, M.; Wang, R.; Xie, L. Multiscale Feature Enhanced Gating Network for Atrial Fibrillation Detection. Comput. Methods Programs Biomed. 2025, 261, 108606. [Google Scholar] [CrossRef]
  162. Xia, P.; Bai, Z.; Yao, Y.; Xu, L.; Zhang, H.; Du, L.; Chen, X.; Ye, Q.; Zhu, Y.; Wang, P.; et al. Advanced Deep Neural Network with Unified Feature-Aware and Label Embedding for Multi-Label Arrhythmias Classification. Tsinghua Sci. Technol. 2025, 30, 1251–1269. [Google Scholar] [CrossRef]
  163. Zou, Y.; Wang, P.; Du, L.; Chen, X.; Li, Z.; Song, J.; Fang, Z. A Multi-Level Multiple Contrastive Learning Method for Single-Lead Electrocardiogram Atrial Fibrillation Detection. Bioengineering 2025, 12, 44. [Google Scholar] [CrossRef]
  164. Darmawahyuni, A.; Sari, W.K.; Afifah, N.; Tutuko, B.; Nurmaini, S.; Marcelino, J.; Isdwanta, R.; Khairunnisa, C.Z. A Deep Learning-Based Myocardial Infarction Classification Based on Single-Lead Electrocardiogram Signal. Int. J. Adv. Appl. Sci. 2025, 14, 352. [Google Scholar] [CrossRef]
  165. Guhdar, M.; Mohammed, A.O.; Mstafa, R.J. Advanced Deep Learning Framework for ECG Arrhythmia Classification Using 1D-CNN with Attention Mechanism. Knowl.-Based Syst. 2025, 315, 113301. [Google Scholar] [CrossRef]
  166. Liu, X.; Xu, Y.; Xu, H.; He, L.; Long, S.; Huang, Y.; Wang, Y.; Lu, Y.; Huang, Y.; Wu, J.; et al. Advancing Interpretable Cardiac Disease Diagnosis via a Transformer-Convolutional Hybrid Network on Electrocardiograms. Eng. Appl. Artif. Intell. 2025, 152, 110675. [Google Scholar] [CrossRef]
  167. Sekhar, K.; K, P.; J, J.S. Efficient Detection of Cardiac Arrhythmias Using Low-Latency FPGA-Accelerated Hybrid Deep Learning Models. Eng. Res. Express 2025, 7, 025277. [Google Scholar] [CrossRef]
  168. Xiao, Q.; Wang, C. Adaptive Wavelet Base Selection for Deep Learning-Based ECG Diagnosis: A Reinforcement Learning Approach. PLoS ONE 2025, 20, e0318070. [Google Scholar] [CrossRef] [PubMed]
  169. Fajardo, C.A.; Parra, A.S.; Castellanos-Parada, T.V. Lightweight Deep Learning for Atrial Fibrillation Detection: Efficient Models for Wearable Devices. Ing. Inv. 2025, 45, e114530. [Google Scholar] [CrossRef]
  170. Cross, J.L.; Choma, M.A.; Onofrey, J.A. Bias in Medical AI: Implications for Clinical Decision-Making. PLoS Digit Health 2024, 3, e0000651. [Google Scholar] [CrossRef]
  171. Merdjanovska, E.; Rashkovska, A. A Framework for Comparative Study of Databases and Computational Methods for Arrhythmia Detection from Single-Lead ECG. Sci. Rep. 2023, 13, 11682. [Google Scholar] [CrossRef] [PubMed]
  172. Chung, C.T.; Lee, S.; King, E.; Liu, T.; Armoundas, A.A.; Bazoukis, G.; Tse, G. Clinical Significance, Challenges and Limitations in Using Artificial Intelligence for Electrocardiography-Based Diagnosis. Int. J. Arrhythm. 2022, 23, 24. [Google Scholar] [CrossRef]
  173. Brasil, S.; Pascoal, C.; Francisco, R.; dos Reis Ferreira, V.; Videira, P.A.; Valadão, G. Artificial Intelligence (AI) in Rare Diseases: Is the Future Brighter? Genes 2019, 10, 978. [Google Scholar] [CrossRef]
  174. Liu, G.; Xue, Y.; Liu, Y.; Wang, S.; Geng, Q. Multimorbidity in Cardiovascular Disease and Association with Life Satisfaction: A Chinese National Cross-Sectional Study. BMJ Open 2020, 10, e042950. [Google Scholar] [CrossRef]
  175. Li, X.; Zheng, Q.; Zhang, S.; Fu, S.; Chen, Y.; Ye, K. A Reliable Deep Learning Model for ECG Interpretation: Mitigating Overconfidence and Direct Uncertainty Quantification. Symmetry 2025, 17, 794. [Google Scholar] [CrossRef]
  176. Van De Leur, R.R.; Bos, M.N.; Taha, K.; Sammani, A.; Yeung, M.W.; Van Duijvenboden, S.; Lambiase, P.D.; Hassink, R.J.; Van Der Harst, P.; Doevendans, P.A.; et al. Improving Explainability of Deep Neural Network-Based Electrocardiogram Interpretation Using Variational Auto-Encoders. Eur. Heart J.-Digit. Health 2022, 3, 390–404. [Google Scholar] [CrossRef]
  177. Al-Zaiti, S.S.; Bond, R.R. Explainable-by-Design: Challenges, Pitfalls, and Opportunities for the Clinical Adoption of AI-Enabled ECG. J. Electrocardiol. 2023, 81, 292–294. [Google Scholar] [CrossRef]
  178. Mauricio, D.; Cárdenas-Grandez, J.; Uribe Godoy, G.V.; Rodríguez Mallma, M.J.; Maculan, N.; Mascaro, P. Maximizing Survival in Pediatric Congenital Cardiac Surgery Using Machine Learning, Explainability, and Simulation Techniques. J. Clin. Med. 2024, 13, 6872. [Google Scholar] [CrossRef]
Figure 1. Typical ECG signal of a healthy individual. Adapted from [27].
Figure 2. Changes in the ECG signal associated with each type of ischemia. Adapted from [22].
Figure 3. PRISMA selection process used for the literature review [52].
Figure 4. Distribution of articles by publication year.
Figure 5. Distribution of articles by quartiles.
Figure 6. Counts of selected articles by journal.
Figure 7. Preprocessing techniques used by each author (NR: not reported).
Figure 8. Percentage of database usage.
Figure 9. Percentage of studies per pathology (only the 14 most frequent are shown).
Figure 10. Distribution of performance metric results in end-to-end DL models for CAIC with ECG.
Figure 11. Public vs. private ECG data: a barrier to benchmarking?
Figure 12. Lead diversity in public ECG datasets: a challenge for model generalization.
Figure 13. Frequency of performance metrics used in the selected studies.
Figure 14. Distribution of explainability types in reviewed studies.
Table 1. Inclusion and exclusion criteria.
| Inclusion Criteria | Exclusion Criteria |
| --- | --- |
| Addresses the research question | Studies not aligned with the objectives of this review. |
| Study Type: Original journal article | Studies conducted in different contexts (e.g., sleep disorders, diabetes, neonates or fetuses, non-human subjects, drug effects, recent surgeries). |
| Language: English | Studies focusing on other aspects (e.g., risk factors, treatments, prevention, use of tools other than ECG such as radar, echocardiography, or pulse oximetry, or not employing end-to-end DL). |
| Period: 2019 to 2025 | Conference proceedings, posters, editorials, and theses. |
| | Studies without contributions or results. |
Table 2. Potentially eligible and selected articles.
| Source | Potentially Eligible Articles (n) | Selected Articles (n) | Selected Articles |
| --- | --- | --- | --- |
| Scopus | 478 | 35 | [19,21,40,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72] |
| Web of Science | 1091 | 68 | [17,18,20,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136] |
| PubMed | 0 | 0 | --- |
| IEEE Xplore | 447 | 18 | [137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154] |
| Total | 2016 | 121 | [13,14,15,16,17,30,41,42,43,44,45,46,47,48,49,50,51,32,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123] |
Table 3. Types and descriptions of preprocessing techniques.
| ID | Technique Type | Description | Usage Count | References |
| --- | --- | --- | --- | --- |
| T01 | Noise and Artifact Removal | Noise: Unwanted random signals with a broad spectrum and low amplitude (~0.01–0.1 mV) superimposed on the ECG signal, including thermal and electronic noise. Artifacts: Higher-amplitude distortions (~0.1–10 mV) caused by physiological factors (breathing, muscle movement, physical activity, sweating, pacemakers), technical issues (poor electrode contact, faulty cables), or environmental factors (vibrations, 50/60 Hz interference). Their spectral range is ~0.05–100 Hz and they appear as abrupt spikes, irregular waves, interruptions, or baseline fluctuations around 0.05 Hz. | 61 | [17,19,20,21,40,55,60,64,65,71,72,73,76,79,81,83,87,88,90,94,96,98,99,104,105,106,108,110,112,114,115,119,120,121,122,124,126,130,132,135,136,137,141,142,143,145,146,153,156,157,158,159,160,161,162,163,164,165,166,167] |
| T02 | Amplitude Normalization | Scaling ECG amplitudes into the same range to improve comparability and reduce scale bias. | 57 | [17,19,21,55,60,61,62,63,65,66,68,71,74,75,78,79,80,84,85,86,87,88,90,91,92,98,99,102,104,106,110,115,120,121,122,123,124,127,129,131,132,135,137,141,149,151,153,154,156,159,162,163,164,165,166,167,168] |
| T03 | Segmentation | Dividing the signal into fixed-length segments for efficient processing because DL models require fixed-length inputs. | 66 | [17,18,19,21,40,55,64,65,66,75,78,79,80,83,85,86,87,88,89,90,91,93,95,96,97,99,102,104,105,106,107,108,110,113,114,115,117,118,120,121,122,124,125,126,128,132,133,138,141,142,149,151,153,154,156,157,158,159,160,161,162,163,164,169] |
| T04 | Resampling | Ensuring consistent sampling frequencies when using multiple databases. Downsampling is often used to reduce computational load. | 42 | [17,18,19,21,40,53,58,65,66,68,78,87,91,96,99,100,102,104,108,110,113,115,116,121,122,123,124,125,131,132,134,149,151,153,154,156,158,159,162,163,164] |
| T05 | Length Normalization | Applying techniques like padding and cropping to equalize signal length across ECG records. Required because DL models need fixed-length inputs. | 30 | [18,40,58,60,61,63,67,68,70,71,78,84,85,88,89,91,93,95,101,103,112,116,127,129,131,134,141,151,162,163] |
| T06 | Class Balancing | Adjusting class distribution in datasets when classes are unevenly represented. | 16 | [20,63,64,75,80,87,92,93,106,107,114,118,133,144,146,153] |
| T07 | Data Cleaning | Correcting or removing missing, duplicate, or invalid data, including the removal of noisy sections (clipping). | 13 | [20,67,75,76,80,82,83,85,97,98,110,114,140] |
| T08 | Data Augmentation | Enhancing model robustness through synthetic data generation or transformations. May also help balance class distribution. | 16 | [21,53,58,59,65,85,103,110,112,128,129,140,143,145,149,167] |
| T09 | Z-shaped Reconstruction | Converting one-dimensional data into two-dimensional representations. | 1 | [105] |
| T10 | Lead Expansion | Creating new leads by mathematically combining existing ones. | 2 | [67,68] |
| T11 | Wavelet Decomposition | Decomposing the ECG signal into different frequencies or scales to extract features at each level. | 2 | [80,109] |
| T12 | Inter-Patient Variability Reduction | Minimizing ECG variability across patients with the same pathology to improve the generalization of DL models. | 1 | [135] |
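As an illustration of lead expansion (T10), the sketch below derives four limb leads from leads I and II using the standard Einthoven and Goldberger relations (III = II − I, aVR = −(I + II)/2, aVL = I − II/2, aVF = II − I/2). This is one common way of combining existing leads and is not necessarily the procedure used in the studies cited for T10; the function name `derive_limb_leads` and the sample values are hypothetical.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Derive leads III, aVR, aVL, and aVF sample by sample from leads I and II."""
    if len(lead_i) != len(lead_ii):
        raise ValueError("leads I and II must have the same number of samples")
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],        # Einthoven's law
        "aVR": [-(i + ii) / 2 for i, ii in pairs],  # Goldberger augmented leads
        "aVL": [i - ii / 2 for i, ii in pairs],
        "aVF": [ii - i / 2 for i, ii in pairs],
    }


# Made-up amplitudes in mV, standing in for four samples of a real recording.
lead_i = [0.0, 0.5, 1.0, 0.25]
lead_ii = [0.0, 1.0, 1.5, 0.5]
derived = derive_limb_leads(lead_i, lead_ii)
```

Because the derived leads are linear combinations of I and II, they add no new information; they only present the same signal in the lead layout that a multi-lead model expects.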
Table 4. Preprocessing specific techniques.
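| ID | Technique Type | Specific Techniques | Usage Count |
| --- | --- | --- | --- |
| T01 | Noise and Artifact Removal | Wavelet | 8 |
| | | Digital filter | 21 |
| | | LOESS | 1 |
| | | Moving average | 1 |
| | | Smoothing | 1 |
| | | NLM | 2 |
| | | Normalization | 1 |
| | | Thresholding | 2 |
| T02 | Amplitude Normalization | Z-score | 49 |
| | | Min–Max | 7 |
| | | Unit variance | 1 |
| T03 | Segmentation | Fixed window | 60 |
| | | Multiple fixed windows | 3 |
| | | Overlapping sliding windows | 3 |
| T04 | Resampling | Downsampling | 40 |
| | | Upsampling | 2 |
| T05 | Length Normalization | Zero-padding | 11 |
| | | Cropping | 11 |
| | | Trimming | 5 |
| | | Replication | 2 |
| | | Segmentation | 1 |
| | | Resampling | 3 |
| | | Filling | 4 |
| T06 | Class Balancing | Oversampling: SMOTE | 4 |
| | | Oversampling: GAN | 1 |
| | | Oversampling: ADASYN | 1 |
| | | Downsampling | 2 |
| | | Oversampling | 1 |
| | | Replication | 2 |
| | | Segmentation | 2 |
| | | Data amplification | 2 |
| T07 | Data Cleaning | Remove missing values | 2 |
| | | Remove zeros or NaN data | 2 |
| | | Remove noisy segments | 5 |
| | | Remove duplicates | 1 |
| | | Remove anomalous portions | 2 |
| T08 | Data Augmentation | Cropping | 1 |
| | | Jittering | 1 |
| | | Warping | 1 |
| | | Noise injection | 2 |
| | | Scaling | 2 |
| | | Random sampling | 2 |
| | | Others | 5 |
| T09 | Z-shaped Reconstruction | --- | 1 |
| T10 | Lead Expansion | --- | 2 |
| T11 | Wavelet Decomposition | --- | 2 |
| T12 | Inter-Patient Variability Reduction | FFT- and Hanning window-based filter | 1 |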
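The most frequently reported specific techniques (Z-score normalization, zero-padding or cropping, and fixed-window segmentation) can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not the pipeline of any reviewed study; the function names, the toy record, and the chosen lengths are assumptions, and practical implementations would typically rely on NumPy or SciPy.

```python
from statistics import fmean, pstdev


def zscore(signal):
    """T02: rescale a 1-D signal to zero mean and unit (population) variance."""
    mu = fmean(signal)
    sigma = pstdev(signal, mu)
    if sigma == 0:  # flat signal: avoid division by zero
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


def fix_length(signal, target_len):
    """T05: crop the tail or zero-pad the end so the output has target_len samples."""
    if len(signal) >= target_len:
        return list(signal[:target_len])
    return list(signal) + [0.0] * (target_len - len(signal))


def fixed_windows(signal, window_len):
    """T03: split into non-overlapping windows of window_len samples, dropping any remainder."""
    n_windows = len(signal) // window_len
    return [list(signal[k * window_len:(k + 1) * window_len]) for k in range(n_windows)]


# Toy record of 10 made-up samples standing in for an ECG trace.
record = [0.0, 0.1, 0.9, 0.2, 0.0, -0.1, 0.0, 0.1, 1.0, 0.2]
normalized = zscore(record)
padded = fix_length(normalized, 12)   # two zeros appended
segments = fixed_windows(padded, 4)   # three windows of four samples
```

In this sketch, normalization is applied before padding so that the appended zeros do not alter the mean and standard deviation; after Z-scoring, a zero pad value coincides with the signal mean.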
Table 5. End-to-end DL technique families.
| Family | Representative Techniques | Usage Count | References |
| --- | --- | --- | --- |
| CNN-based models | CNN, ResNet, DenseNet, Inception, SE-ResNet, ShuffleNet, U-Net, AlexNet-1D, Multi-Resolution CNN, Temporal/Dilated CNN, GoogLeNet, XResNet | 35 | [55,57,62,73,76,77,78,80,86,87,89,91,95,97,98,100,102,106,109,114,118,119,128,133,136,140,144,148,149,152,154,156,169] |
| RNN-based models | LSTM, Bi-LSTM, GRU, BiGRU, Elman | 5 | [64,74,81,82,94] |
| Hybrid CNN-RNN models | CNN–LSTM, CNN–BiLSTM, CNN–GRU, CNN–BiGRU, CNN–BiLSTM–BiGRU, Deep CNN–LSTM | 23 | [17,19,63,84,92,93,99,112,120,123,124,125,126,130,132,135,139,142,143,161,164,167] |
| Transformer-based models | CNN–Transformer, Swin–Transformer, Dual-view Transformers | 8 | [21,54,83,108,131,145,159,166] |
| Attention-enhanced models | SE blocks, channel/spatial/temporal attention, multi-head attention, CNN + SSM | 39 | [18,19,56,59,60,61,65,67,68,69,70,71,79,85,88,90,101,103,104,105,113,115,116,117,121,122,129,134,137,138,141,146,150,151,153,157,158,160,165] |
| Generative/Contrastive | Autoencoder, Contrastive Learning | 7 | [40,53,58,96,110,162,163] |
| Custom/Ensemble/Neural Architecture Search | Reinforcement Learning, Bat Algorithm, Binarized Neural Network, AlexNet-1D Semi-supervised | 4 | [66,111,127,168] |
Table 6. Characteristics of the databases used.
ID | Database | fs (Hz) | No. of Records | Record Duration | Access | Leads Used
DB01 aAHA ECG Database (AHA)50045,15210 sPUB 12
DB02Asan Medical Center Liver Transplant Database50065,93210 sPRIV12
DB03 aAUMC ICU Biosignal Database500190,00010–20 sPRIV12
DB04 aAuthor-collected dataset50068776–60 sPUB12
DB05Chinese PLA General Hospital20014366 s–30 minPUBI, II
DB06CPSC-2018 (public set + CPSC-Extra)250358 minPUB1
DB07CPSC-2020250352 hPUB2
DB08 aCPSC-2021 (V1.0.3)50010,3445–10 sPUB12
DB09 aCustom wearable ECG device recordings50032,14210 sPRIVI, II, V1–V6
DB10Datasets from South Korean University Hospitals2577530 minPUB12 
DB11ECG Arrhythmia Classification Dataset3604830 minPUB2
DB12Federal Ministry of Education and Research Dataset2502510 hPUB2
DB13First Affiliated Hospital of Nanjing Medical University ECG Database30085289–61 sPUB1
DB14First People’s Hospital of Guangzhou Database10005492 minPUB12, 3 Frank
DB15 aKorea University Anam Hospital ECG Dataset100, 50021,79910 sPUB12
DB16Lobachevsky University Database (LUDB)50020010 sPUB12
DB17Long-Term AF Database (LTAFDB)50045,15210 sPUB12
DB18Mayo Clinic ECG Database250803 hPRIV2
DB19MIMIC-III125436NRPRIVII
DB20MIT-BIH Malignant Ventricular Arrhythmia Database (VFDB)250, 5002,648,100NRPRIVII
DB21MIT-BIH Noise Stress Test Database (NSTDB)NR6500NRPRIV12 
DB22MIT-BIH Supraventricular Arrhythmia Database (SVDB)500NR10 sPRIV12
DB23 aPatch Database50013,2566–144 sPUB/PRIV12
DB24 aPhysioNet 202040010~24 hPUB1
DB25 aPrivate 12-lead ECG Dataset100–1000>100,0005 s–30 minPUB12
DB26QT Database (QTDB)4002924 hPRIVI
DB27 aShandong Provincial Hospital Database (SPHw) aNR52,04310 sPRIVII
DB28Shandong Provincial Hospital Database (SPH) NR500010 sPRIV12
DB29Shandong Provincial Hospital Database (SPHDB) a51216,000120 sPRIVI, II
DB30Shanghai Ninth People’s Hospital Database (SNPH)NR277,80710–60 sPRIV12
DB31Shanxi Bethune Hospital Dataset10009010 sPRIV12
DB32Telehealth Network Minas Gerais (TNMG)20028,30810 sPUBII
DB33 aThird Affiliated Hospital of Sun Yat-sen University Database50020010 sPUB12
DB34Wearable ECG device recordings1288424–25 hPUB2
DB35 aWearable long-term ECG device recordings5002,499,522~10 sDUA12
DB36AHA ECG Database (AHA)125>67,000Up to several weeksPUBI, II, III, aVR, V
DB37Asan Medical Center Liver Transplant Database2502230 minPUB2
DB38AUMC ICU Biosignal Database3601230 minPUB2
DB39Author-collected dataset1287830 minPUB2
DB40Chinese PLA General HospitalNR32830 sPRIV1
DB41 aCPSC-2018 (public set + CPSC-Extra)100–100043,1015 s–30 minPUB12
DB42CPSC-2020NR549,211NRPRIV12
DB43CPSC-2021 (V1.0.3)25010515 mPUB2
DB44Custom wearable ECG device recordings200NR24 hPUB12
DB45 aDatasets from South Korean University Hospitals50025,77010–60 sPUB12
DB46ECG Arrhythmia Classification Dataset200NR24 hPUB12
DB47Federal Ministry of Education and Research Dataset50075,11111–92 sPRIV12
DB48 aFirst Affiliated Hospital of Nanjing Medical University ECG Database5007000NRPRIV12
DB49 aFirst People’s Hospital of Guangzhou Database300–6002,322,5137–10 sPUB12
DB50Korea University Anam Hospital ECG Dataset100079310 sPRIV12
DB51Lobachevsky University Database (LUDB)5005189NRPRIV12
DB52Long-Term AF Database (LTAFDB)40012~2 daysPRIV12
NR: not reported by the authors; a: multi-label database; PUB: public; PRIV: private; DUA: Data Use Agreement.
Table 7. Metrics used.
| ID | Metric | Count | References |
|---|---|---|---|
| M01 | Precision | 35 | [17,21,55,57,61,62,63,64,66,71,76,77,82,83,84,87,88,89,91,93,95,98,99,102,104,111,112,113,120,121,122,128,130,133,134] |
| M02 | Sensitivity (recall) | 54 | [17,20,21,53,55,56,57,61,62,63,64,66,67,68,70,71,73,76,77,80,81,82,83,84,87,88,89,91,92,93,94,95,98,99,102,103,104,108,110,111,112,113,118,120,122,123,125,126,127,128,129,132,133,135] |
| M03 | Accuracy | 55 | [17,18,20,21,54,55,56,57,58,61,62,63,65,66,67,68,69,71,77,78,80,81,83,85,86,87,88,89,90,91,92,94,95,97,99,102,103,104,107,108,110,111,112,113,115,116,117,118,125,126,127,128,131,134,136] |
| M04 | Specificity | 29 | [21,56,62,66,67,71,73,76,80,81,82,87,89,92,94,102,103,104,110,111,112,123,125,126,127,129,132,134,135] |
| M05 | F1-Score | 59 | [17,18,20,21,40,54,55,56,57,60,61,62,63,64,69,70,74,75,76,77,79,81,82,83,84,86,87,88,89,91,93,95,97,98,99,100,101,104,105,106,108,110,111,112,114,115,116,117,118,120,121,122,129,130,131,132,133,134,136] |
| M06 | AUROC | 22 | [18,53,57,59,60,61,62,68,70,75,76,85,90,98,99,109,115,116,117,123,129,132] |
| M07 | AUPRC | 4 | [18,59,85,117] |
| M08 | Macro-F1 | 2 | [90,128] |
| M09 | G-Mean | 1 | [112] |
| M10 | NPV | 1 | [129] |
| M11 | mAP | 1 | [70] |
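As a reminder of how the threshold-based metrics in Table 7 relate to one another, the following minimal sketch derives them from the four cells of a binary confusion matrix. The function names are illustrative assumptions, and G-Mean is taken here as the geometric mean of sensitivity and specificity, which is one common definition and may differ from the one used in a given study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute M01-M05, M09, and M10 from binary confusion-matrix counts."""
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)   # recall, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": sensitivity,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": specificity,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "g_mean": math.sqrt(sensitivity * specificity),
        "npv": ratio(tn, tn + fn),
    }


def macro_f1(per_class_counts):
    """Macro-F1 (M08): unweighted mean of one-vs-rest F1 scores.

    `per_class_counts` is a list of (tp, fp, tn, fn) tuples, one per class.
    """
    scores = [binary_metrics(*counts)["f1"] for counts in per_class_counts]
    return sum(scores) / len(scores)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=8, fp=2, tn=6, fn=4)
```

Because macro-F1 weights every class equally, a rare arrhythmia class with a poor F1 lowers the score as much as a frequent one, which is why it is often preferred over accuracy on imbalanced ECG datasets.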
Table 8. Explainability techniques used.
| ID | Technique | Description | Type of Explainability | Studies |
|---|---|---|---|---|
| TE01 | Activation Maps | Enable understanding of how a model processes inputs across different convolutional layers. | Post hoc, Integrated | [118] |
| TE02 | Attention Maps | Visualize the spatial or temporal distribution of the attention learned by the model. | Post hoc | [115] |
| TE03 | Attention Mechanism | Allocates weights to portions of the input according to their importance for classification. | Integrated | [18,21,54,56,59,61,67,85,88,103,104,113,116,117,123,129,131,134,138,141,146,157,159,165,166] |
| TE04 | Embedding Visualization | Shows how internal representations are organized in the model’s latent space. | Post hoc | [83] |
| TE05 | Feature Heatmaps | Highlight the local importance of features over the input, typically in the temporal domain. | Post hoc | [105] |
| TE06 | Global Channel Attention Block (GCAB) | Assigns weights to input channels to emphasize the most relevant ones. | Integrated | [101] |
| TE07 | Grad-CAM | Generates activation maps to identify the input regions most relevant for prediction. | Post hoc | [21,60,65,68,84,91,93,102,131,134,137,145,154,156,160,161,166,168] |
| TE08 | Gradient-based Visualization | Visualizes gradient magnitudes with respect to the input as an indicator of importance. | Post hoc | [121] |
| TE09 | Heatmaps | Display the intensity of a feature at each point of input. | Post hoc | [114] |
| TE10 | Integrated Gradients | Accumulates gradients between a baseline signal and the actual input to estimate feature importance. | Post hoc | [78] |
| TE11 | Layer-wise Relevance Propagation (LRP) | Propagates relevance scores back to the input layers to identify key regions. | Post hoc | [78] |
| TE12 | Lead-wise Grad-CAM | Applies Grad-CAM individually to each ECG lead to show its contribution to the prediction. | Post hoc | [103] |
| TE13 | Neural-Backed Ensemble Trees (NBET) | Combines decision trees with neural networks to improve model interpretability. | Integrated | [97,98] |
| TE14 | Optimal Energy Classifier | Applies a minimum-energy principle to identify discriminative features. | Integrated | [99] |
| TE15 | Principal Component Analysis (PCA) | Reduces the dimensionality of representations for visual analysis or pattern identification. | Post hoc | [79] |
| TE16 | SHapley Additive exPlanations (SHAP) | Evaluates the contribution of each feature using principles from game theory. | Post hoc | [76,158] |
| TE17 | Saliency Mapping | Creates sensitivity maps that highlight the influence of each feature on the results. | Post hoc | [79,99] |
| TE18 | Self-attention | Assesses relationships between input elements with respect to themselves to assign relevance. | Integrated | [20] |
| TE19 | Semantic Transformations | Apply semantic transformations to evaluate the model’s robustness and its understanding. | Post hoc | [58] |
| TE20 | Sensitivity Maps | Indicate how the model’s output varies when parts of the input are modified. | Post hoc | [97] |
| TE21 | Similarity Matrix of Embeddings | Represents the similarity between embeddings to understand learned internal relationships. | Post hoc | [83] |
| TE22 | t-Distributed Stochastic Neighbor Embedding (t-SNE) | Permits analysis and visualization of high-dimensional data, often in combination with other techniques to enhance representations. | Post hoc | [66,76,79,105,108,153] |
| TE23 | Weight-based Recursive Feature Elimination | Iteratively removes the least relevant features based on the learned weights. | Post hoc | [72] |
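The description of Integrated Gradients (TE10), which accumulates gradients between a baseline and the actual input, can be made concrete with a minimal sketch. It assumes a toy logistic model with an analytic gradient in place of a deep network, and a midpoint Riemann sum for the path integral; the weights, input, and function names are hypothetical and not drawn from the reviewed studies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy stand-in for a classifier: logistic regression on the input samples."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the model output with respect to each input sample."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Average the gradient along the straight path baseline -> x,
    then scale by (x - baseline) to obtain one attribution per sample."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]


# Hypothetical three-sample input with an all-zero baseline signal.
x = [0.5, -1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
w, b = [1.0, -2.0, 0.5], 0.1
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical integration error, to the difference between the model output at the input and at the baseline.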
Table 9. Purpose of the aspects of difficulty analysis.
| ID | Aspect of Difficulty Analysis | Purpose |
|---|---|---|
| AA1 | Preprocessing | Enhance data quality and usability so models can learn and infer with greater efficiency and accuracy. |
| AA2 | End-to-end DL techniques | Allow the model to automatically learn relevant features and classify arrhythmias and ischemia directly from data, without manual feature extraction or expert intervention. |
| AA3 | Database | Supply high-quality, well-labeled, and balanced data with demographic and pathological diversity of sufficient size and with clear documentation to support accurate ECG-based classification of cardiac pathologies. |
| AA4 | Cardiac pathologies | Serve as diagnostic targets for the model, providing the classes it must learn to classify from ECG signals. |
| AA5 | Metrics | Provide objective and quantitative measures of model performance in classifying cardiac pathologies and excluding non-pathological cases. |
| AA6 | Explainability techniques | Clarify and provide support for the model’s results. |
Table 10. Difficulties in end-to-end DL techniques.
| ID | Difficulty | Effects | References |
|---|---|---|---|
| D01 | Long sequences | Increase resource consumption and cause processing latency, as well as temporal and clinical bias for transient events. | [93] |
| D02 | Model complexity (black box) | Requires large amounts of data, computational resources, and careful tuning. Expensive to train. Limited use on portable or real-time devices. Reduces explainability. | [17,54,59,60,62,76,77,78,80,82,83,86,89,91,92,95,99,102,119,131] |
| D03 | Use of multiple leads | Raises model complexity, computational load, and resource demand. Requires coherent integration of signals and more labeled data. Unsuitable for portable devices. | [56,58,62,68,80,82,95,101,103,107,128,134] |
| D04 | High memory and CPU/GPU requirements | Lead to higher energy consumption and prevent implementation in real time or on resource-constrained hardware. | [54,55,79,91,118] |
| D05 | Multi-labeling | Adds complexity in design, training, and validation, and increases the difficulty of managing the output space. | [58,67,82,91,116] |
| D06 | Large number of classes | Increases model complexity and computational demand while reducing accuracy for rare classes. | [21] |
| D07 | Hyperparameter selection | Strongly affects performance and complicates adaptation to new domains or tasks. | [74,91,93,131] |
| D08 | Embedded hardware or FPGA | Requires model optimization that may reduce performance. Involves high development complexity (FPGA) and limited resources in portable systems. | [17,20,79,80,82,99,103,109,113,118,127,128,131,134] |
| D09 | Lack of standardization in architectures and protocols | Hinders comparison across studies, limits reproducibility, and complicates benchmarking, cross-validation, transfer, and clinical adoption. | [18,128] |
| D10 | Scaling the model to other pathologies | Increases complexity, demands additional databases, hardware, and metrics, and further complicates explainability. | [17,82,83,121,123] |
| D11 | Conversion to 2D spectrogram | Causes loss of fine morphological and temporal information, raises computational complexity, and reduces clinical interpretability. | [95] |
| D12 | Personalization of the model for individual patients | Leads to overfitting due to limited patient data and failure from intra-individual variability. | [127] |
| D13 | Dependence on large volumes of high-quality labeled data | Creates latency, high computational requirements, imbalance issues, and challenges in multimodal integration. | [20,62,107,121,133] |
| D14 | Deployment in diverse real-world or clinical settings | Produces bias toward training datasets and poor performance under atypical conditions or comorbidities. | [20,67,69,72,82,83,85,90,95,100,108,123,125,126,127,128,129,131] |
| D15 | External cross-validation | Omitting it inflates performance estimates and limits clinical acceptance. | [21] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Oporto, E.; Mauricio, D.; Maculan, N.; Uribe, G. Challenges in the Classification of Cardiac Arrhythmias and Ischemia Using End-to-End Deep Learning and the Electrocardiogram: A Systematic Review. Diagnostics 2026, 16, 161. https://doi.org/10.3390/diagnostics16010161
