Efficient ECG Beat Classification Using SMOTE-Enhanced SimCLR Representations and a Lightweight MLP

Gurler Ari, Berna

doi:10.3390/sym17101677

by

Berna Gurler Ari

Department of Computer Engineering, Engineering Faculty, Turkish National Defence University, Ankara 06654, Turkey

Symmetry 2025, 17(10), 1677; https://doi.org/10.3390/sym17101677

Submission received: 29 August 2025 / Revised: 18 September 2025 / Accepted: 20 September 2025 / Published: 7 October 2025

(This article belongs to the Section Computer)

Abstract

Cardiac arrhythmias are among the leading causes of morbidity and mortality worldwide, and accurate classification of electrocardiogram (ECG) beats is critical for early diagnosis and follow-up. Supervised deep learning is effective but requires abundant labels and substantial computation, limiting practicality. We propose a simple, efficient framework that learns self-supervised ECG representations with SimCLR and uses a lightweight Multi-Layer Perceptron (MLP) for classification. Beat-centered 300-sample segments from MIT-BIH Arrhythmia are used, and imbalance is mitigated via SMOTE. Framed from a symmetry/asymmetry perspective, we exploit a symmetric beat window (150 pre- and 150 post-samples) to encourage approximate translation invariance around the R-peak, while SimCLR jitter/scale augmentations further promote invariance in the learned space; conversely, arrhythmic beats are interpreted as symmetry-breaking departures that aid discrimination. The proposed approach achieves robust performance: 97.2% overall test accuracy, 97.2% macro-average F1-score, and AUC > 0.997 across five beat classes. Notably, the challenging atrial premature beat (A) attains 94.1% F1, indicating effective minority-class characterization with low computation. These results show that combining SMOTE with SimCLR-based representations yields discriminative features and strong generalization under symmetry-consistent perturbations, highlighting potential for real-time or embedded healthcare systems.

Keywords:

self-supervised learning; contrastive representation learning; ECG beat classification; SimCLR; MIT-BIH Arrhythmia dataset

1. Introduction

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, in both developed and developing countries, and include a variety of cardiac conditions such as heart attack and hypertension [1]. According to the World Health Organization (WHO), 19.8 million people died of CVD worldwide in 2022 [2], which corresponds to approximately one-third of global deaths. With the gradual aging of the world population, this number is projected to rise to 24 million by 2030 and 32.3 million by 2050 [3,4]. Given this burden, early diagnosis is critical. Technological developments, particularly in artificial intelligence, have driven a major transformation that extends to healthcare [5,6]. Ischemia is characterized by insufficient blood flow to the myocardium and can cause sudden cardiac death. Arrhythmia involves an irregular heart rhythm caused by abnormal electrical activity of the heart. Early and accurate diagnosis of these cardiac diseases is essential for improving patients’ quality of life and preventing deaths [4].

The electrocardiogram (ECG) is the most basic tool for diagnosing heart rhythm disorders. It records the electrical signals produced by the heart through electrodes on the surface of the skin, and these signals effectively assist the diagnosis of ischemia, arrhythmia, and other CVD conditions [7]. Any heart rhythm irregularity can change the ECG signal. The standard 12-lead system measures electrical potentials using ten electrodes placed on the body, six on the chest and four on the limbs [8,9]. Because manual analysis of ECGs is time-consuming, an accurate method of automatic analysis would be of great value. However, many arrhythmias are difficult to diagnose with a standard resting ECG because it provides only a brief snapshot of the patient’s cardiac activity. An intermittent arrhythmia may go unnoticed, and physicians must rely on patients’ self-monitoring and reported symptoms to support their final diagnosis [1]. Detecting arrhythmia from the ECG is challenging for several reasons: the typical ECG waveform varies between individuals, the same disease can present differently in the waveforms of different patients, two different diseases can have roughly similar effects on the waveform, ECG features are inconsistent, and no single detection algorithm is effective for all ECG classification tasks [8,10].

The normal frequency range of the ECG signal is 0.05–100 Hz [1], its amplitude ranges from 10 µV to 5 mV, and a typical value is 1 mV. Various heart diseases are analyzed using ECG signals [11], and many detection techniques for cardiovascular diseases have been presented in recent years. Most of these methods consist of four steps: preprocessing (denoising), dimension reduction, feature selection, and identification of the cardiac arrhythmia type. The preprocessing stage makes the signals suitable for processing, chiefly by using filters to remove noise from the ECG recording. Different transformation techniques are then used to detect R peaks and the QRS complex. Feature extraction, feature selection, and classification techniques based on machine learning and deep learning have been presented for ECG beat classification [12]. In recent years, deep learning-based models have achieved significant success in automatic analysis of ECG signals. In particular, one-dimensional convolutional neural networks (1-D CNNs) [13]; long short-term memory (LSTM) networks [14]; and, more recently, Transformer-based models [15] have achieved high accuracy rates in arrhythmia classification tasks by effectively capturing sequential correlations in time series. The majority of studies on arrhythmia classification have used the MIT-BIH Arrhythmia Database, an open access and labeled resource. This dataset is widely accepted as a benchmark for model training and literature comparison because its patient records cover various types of arrhythmia [16]. However, supervised models present difficulties, chiefly the need for labeled data, that limit their wide-scale applicability. Manual labeling of ECG signals at the beat level is a time-consuming and costly process that requires significant expertise [15]. 
This limits how easily supervised models adapt to different datasets and restricts their generalization capacity. In datasets such as MIT-BIH, critical arrhythmia types such as ventricular premature beats (VBs) are represented by far fewer samples than normal beats. This class imbalance leads to low model performance on minority classes [17]. The main motivation of this study is to make the arrhythmia classification task more flexible and generalizable by reducing the need for labeled data. For this purpose, five clinically significant beat types (N, L, R, V, and A) were first selected from the MIT-BIH Arrhythmia Database; the class imbalance was then addressed with SMOTE (Synthetic Minority Over-sampling Technique). In this way, the number of samples per class was equalized, and the learning performance of the model on small classes was increased. To reduce the need for labeled data, a self-supervised representation learning method, SimCLR (A Simple Framework for Contrastive Learning of Visual Representations), was used, and meaningful vector representations were obtained from two different augmented views of each segment. With these representations, a lightweight two-layer MLP (Multi-Layer Perceptron) classifier reached 97.46% accuracy. In addition, the performance of the resulting model was evaluated in detail with analyses such as the confusion matrix, ROC-AUC curves, F1-score values, and t-SNE visualizations. In these respects, this study shows that self-supervised learning can be applied effectively to medical time series such as ECG, and its simple and explainable structure offers an alternative to the complex supervised models in the literature. Accordingly, expected symmetries in beat-centered segments are made explicit, and invariance in the learned representations is promoted through SimCLR augmentations. 
Departures from these symmetries—typical of arrhythmic beats—are exploited as informative signals that enhance class separability.

The organization of this paper is as follows: Section 2 presents the literature review of recent studies and methods used on ECG classification. Section 3 explains the steps of the proposed method in detail; the dataset, segmentation process, class balancing with SMOTE, SimCLR-based representation learning, and MLP classifier are discussed in this section. Section 4 presents the experimental results of the proposed method; training performance, class-based evaluation metrics, and results supported by visual analysis, such as ROC and t-SNE, are included. Section 5 compares the method with similar studies in the literature and discusses the general implications of the obtained results. Finally, Section 6 provides a general summary of the work, highlights the main contributions, and makes recommendations for future work.

2. Related Work

The vast majority of studies on ECG beat classification use publicly available datasets such as the MIT-BIH Arrhythmia Database and develop supervised models based on deep learning. Acharya et al. [13] reported 94.03% accuracy with a 1D CNN-based model. Similarly, Yildirim [14] achieved 99.39% accuracy with an LSTM-based deep structure. Rajkumar et al. [11] proposed a CNN model that achieved 98.21% accuracy on MIT-BIH data. More recently, Transformer architectures have also been adapted to this field [15]. Although these studies achieve strong results, their high accuracy requires large amounts of labeled data, complex architectures, and long training times. While N (normal) beats are dominant in MIT-BIH data, the number of samples in classes such as VEB (ventricular ectopic beat) and SVEB (supraventricular ectopic beat) is quite low. This imbalance makes it difficult for the model to learn minority classes. Various approaches have been proposed to solve this problem, including resampling methods such as SMOTE, GAN-based data generation, and class-weighted loss functions. However, most of these methods are tightly coupled to the supervised learning setting [17,18,19]. In recent years, self-supervised learning (SSL) approaches have also been used in signal processing, especially to reduce the dependency on labeled data. Contrastive learning methods such as SimCLR [20], BYOL [21], and MoCo [22], which have become widespread in image-based applications, have recently been adapted for 1-D signals such as ECG. Mehari and Strodthoff [23] compared SimCLR, BYOL, and SwAV-based representation learning approaches on 12-lead ECG signals and reported their classification success after both linear evaluation and fine-tuning. In their work, SimCLR was the most successful method under linear evaluation, whereas BYOL-based representations provided higher overall performance in downstream tasks after fine-tuning. 
Nevertheless, studies in this area typically cover a limited number of classes and low sample diversity, and detailed analyses such as ROC-AUC are often not presented.

This study proposes a novel framework that combines SimCLR-based self-supervised representation learning with segment-based data processing, class balancing with SMOTE, and an MLP classifier, working on five classes. Compared to the existing literature, the contributions are as follows:

SimCLR is applied to MIT-BIH in a five-class setting.
Balancing the data with SMOTE before contrastive learning is an original contribution.
An accuracy of 97.46% and AUC values of at least 0.997 in all classes are achieved with only an MLP classifier.
The discriminative performance of the model is detailed visually and numerically with ROC, t-SNE, and confusion matrix analyses.

3. Materials and Methods

In this section, the steps of the ECG classification process are explained in detail. The structure of the MIT-BIH Arrhythmia dataset and the classes used in the study are introduced first. The following steps are then explained: segment generation from the signals, handling of class imbalance, self-supervised representation learning, and classification. The implementation of the SimCLR-based contrastive learning architecture is given particular emphasis, followed by the augmentation strategies. Classification is performed with a Multi-Layer Perceptron (MLP), and the metrics and analysis methods (confusion matrix, ROC-AUC, and t-SNE) used to evaluate the performance of the developed system are presented. The methodology flow summarizing the entire process from signal segmentation to final classification is presented in Figure 1.

3.1. Dataset Description

The dataset used in this study, the MIT-BIH Arrhythmia Database, is an open access resource developed by the Massachusetts Institute of Technology and Beth Israel Hospital [16]. It is the most widely used dataset in the field of arrhythmia diagnosis and consists of 48 ECG recordings, each lasting 30 min, obtained from 47 different patients. The recordings are sampled at 360 Hz [24]. Each recording contains two channels, in most cases MLII and a precordial lead (usually V1, occasionally V2, V4, or V5). In the annotation files that come with each recording, the location and type of each beat are labeled in accordance with the Association for the Advancement of Medical Instrumentation (AAMI) standard. In the experimental study, five beat classes that are both clinically significant and frequently used in the literature were considered: normal sinus beat (N), left bundle branch block beat (L), right bundle branch block beat (R), ventricular premature contraction (V), and atrial premature contraction (A). The restriction to five categories was further justified by three additional reasons: (i) these classes conform to the AAMI EC57 guidelines, which aggregate MIT-BIH annotations into clinically pertinent categories commonly utilized in prior research; (ii) they encompass the most prevalent and clinically significant arrhythmias, guaranteeing sufficient sample sizes for effective training; (iii) the omission of infrequent beat types mitigates severe class imbalance and enhances generalization. Consequently, concentrating on these five classes ensures clinical significance and comparability with previous studies. During data processing, only the MLII channel was utilized, and segmentation was conducted exclusively on this channel [25].

3.2. Beat Segmentation

The signal data of each ECG record was processed in .csv format, with the annotations in .txt files. Since the MLII lead reflects the electrical activity of the heart most clearly at the ventricular and atrial levels, only the MLII channel was used in this study. This channel has also been preferred as the main signal source in many previous studies [25]. The location of each beat is given by the “Sample #” column in the annotation files, and fixed-length segments are formed by taking these locations as references. For each beat, a segment of 300 samples (approximately 0.83 s at 360 Hz) is extracted around the beat center. The window is symmetric, consisting of 150 samples before and 150 samples after the beat. This commonly used window is intended to capture the full morphology of an ECG beat (P wave, QRS complex, and T wave) [26]. By centering 300-sample segments on the R-peak, a symmetric context is imposed, encouraging approximate translation invariance to minor temporal shifts.

Beats close to the signal boundaries, whose window would extend beyond the record, were excluded from classification. The remaining segments were converted to NumPy arrays for use in machine learning algorithms, which also simplified the subsequent augmentation and labeling steps. Methods such as sliding-window or fixed-interval segmentation are not positioned relative to the beat center and may therefore fail to capture the morphology of rhythm disorders. This is one of the motivations for choosing beat-centered segmentation in our study.
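The beat-centered segmentation described above can be illustrated with a minimal sketch. It assumes the signal is already loaded as a 1-D NumPy array and that beat positions are sample indices taken from the annotation file; the function name `extract_beats` and the toy signal are hypothetical and are not taken from the study's code.

```python
import numpy as np

FS_HZ = 360          # MIT-BIH sampling frequency, from the text
HALF_WINDOW = 150    # samples kept on each side of the beat annotation


def extract_beats(signal, beat_positions, labels, half_window=HALF_WINDOW):
    """Cut fixed-length, beat-centered segments from a 1-D ECG signal.

    Beats whose window would run past either end of the record are skipped.
    """
    segments, kept_labels = [], []
    for pos, label in zip(beat_positions, labels):
        start, stop = pos - half_window, pos + half_window
        if start < 0 or stop > len(signal):
            continue  # window does not fit inside the record
        segments.append(signal[start:stop])
        kept_labels.append(label)
    if not segments:
        return np.empty((0, 2 * half_window)), np.array(kept_labels)
    return np.stack(segments), np.array(kept_labels)


# Toy usage with a synthetic signal (not MIT-BIH data).
toy_signal = np.sin(np.linspace(0, 20 * np.pi, 2000))
toy_positions = [100, 500, 1200, 1950]   # first and last are too close to the edges
toy_labels = ["N", "V", "A", "N"]
X, y = extract_beats(toy_signal, toy_positions, toy_labels)
window_seconds = 2 * HALF_WINDOW / FS_HZ   # 300 samples at 360 Hz
```

In this toy run only the two interior beats survive, and the window length works out to 300 / 360 ≈ 0.83 s, matching the figure quoted in the text.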

3.3. Class Imbalance Handling with SMOTE

Class imbalance in the dataset negatively affects the performance of both supervised and unsupervised learning methods. It causes the model to generalize poorly, especially for arrhythmia types with few samples (such as V and A) [27]. This imbalance is quite evident in the MIT-BIH dataset. To address it, SMOTE (Synthetic Minority Over-sampling Technique) was applied in our study [28]. Instead of directly copying examples from the minority classes, SMOTE synthesizes new samples by interpolating between existing minority samples and their nearest neighbors. This reduces the risk of overfitting and increases the representativeness of the data. SMOTE is frequently used in time series and ECG classification tasks. Bing et al. [28] achieved an accuracy exceeding 99% on MIT-BIH data in 2022 by combining SMOTE with focal loss. In another ECG study, Khan et al. reported 98.6% accuracy after resolving the imbalance with SMOTE [29]. These studies indicate that SMOTE is a suitable method both for improving the quality of representations in deep learning and for ensuring that the model learns in a more balanced fashion on classes with few samples [27,28,29,30]. More complex generative methods such as GANs or VAEs were not preferred because of their computational cost and implementation difficulty. In our study, approximately 71,700 segments were obtained for each class, yielding a balanced dataset before SimCLR training, as shown in Table 1.
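To make the interpolation idea behind SMOTE concrete, the following is a simplified sketch in plain NumPy. It is not the implementation used in the study (the text does not name a library), and the function name `smote_oversample`, the neighbor count `k=5`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances within the minority class.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)     # seed sample for each new row
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy usage: 20 random "segments" of 300 samples stand in for a minority class.
rng = np.random.default_rng(1)
minority = rng.normal(size=(20, 300))
synthetic = smote_oversample(minority, n_new=30)
```

Because every synthetic row is a convex combination of two real rows, it always lies on the line segment between them, which is why SMOTE adds variety without leaving the region occupied by the minority class.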

3.4. SimCLR-Based Representation Learning

In this study, the SimCLR (A Simple Framework for Contrastive Learning of Visual Representations) method was used to transform the signals obtained from the segments into meaningful and discriminative representations. Among self-supervised learning methods, SimCLR has been shown to produce strong representations from unlabeled data, primarily through its contrastive loss function [20]. Two differently augmented views of the same example were treated as a positive pair, whereas all other examples in the batch were treated as negatives, thereby encouraging similar samples to be pulled together and dissimilar samples to be pushed apart in the representation space. The architecture comprised an encoder and a projection head. A one-dimensional, three-layer convolutional neural network (1-D CNN) was used as the encoder, with 64, 128, and 256 filters in the first, second, and third layers, respectively; ReLU activations and batch normalization were applied after each layer, and adaptive average pooling was used to obtain a fixed-length feature vector. The encoder output was passed through a two-layer MLP to obtain a 128-dimensional projection used for contrastive learning. The loss function was the Normalized Temperature-scaled Cross-Entropy (NT-Xent):

\mathcal{L}_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_{i}, z_{j}) / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_{i}, z_{k}) / \tau)}

(1)

Here, z_{i} and z_{j} are the projection vectors obtained from two different augmentations of the same sample, sim(z_{i}, z_{j}) is the cosine similarity between these two vectors, τ is the temperature hyperparameter, and 2N is the total number of augmented samples in the batch. This function is computed for each positive pair, and the batch loss is obtained by averaging the per-pair values. During the augmentation process, three distortion techniques (jittering, mild amplitude scaling, and Gaussian noise) were applied randomly and in combination to each signal. Through jitter and mild amplitude scaling, invariance to physiologically plausible transformations is promoted, while preserving morphology relevant for discrimination. Training used a batch size of 512 for 10 epochs, during which the contrastive loss decreased from 5.40 to 4.98, indicating that positive pairs were embedded closer together while negative pairs were pushed apart. Although supervised CNNs or autoencoder-based alternatives can be adopted, such approaches typically entail high labeling costs or may yield representations with weaker class separation. In contrast, SimCLR produces class-agnostic, interpretable, and balanced representations driven solely by augmentations and signal similarity. Moreover, balancing the dataset with SMOTE provided evenly represented classes, enabling balanced batches and more stable contrastive optimization prior to the downstream classifier.
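As an illustration of the NT-Xent loss described above, here is a minimal NumPy sketch that computes the batch-averaged loss for N positive pairs. The temperature value of 0.5 is an assumption (the text does not report τ), the function name `nt_xent_loss` is hypothetical, and the study's actual training code, which is presumably written in a deep learning framework, is not reproduced here.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]); returns the batch mean."""
    z = np.concatenate([z_a, z_b], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit norm, so dot = cosine
    sim = z @ z.T / temperature                          # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                       # enforces the 1[k != i] mask
    n = len(z_a)
    # Row i in the first half is paired with row i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Stable log-sum-exp over all k != i (the denominator of the loss).
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_numerator = sim[np.arange(2 * n), positives]
    return float(np.mean(log_denominator - log_numerator))


# Toy usage: nearly identical views should give a lower loss than unrelated ones.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 128))
loss_aligned = nt_xent_loss(z1, z1 + 0.01 * rng.normal(size=(8, 128)))
loss_random = nt_xent_loss(z1, rng.normal(size=(8, 128)))
```

The loss is written as log-denominator minus log-numerator, which is the same quantity as the negative log of the ratio and avoids overflow for small temperatures.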

3.5. MLP Classifier

After completing SimCLR training, fixed-size representation vectors were obtained for each segment from the encoder network (in eval() mode). These vectors are 256-dimensional embeddings containing the learned representations taken from the output of the encoder layer of SimCLR. These representations were used as input in the classification phase.

A model with a two-layer lightweight Multi-Layer Perceptron (MLP) architecture was selected as the classifier [31]. The MLP model consists of an input layer that receives 256-dimensional vectors, a hidden layer with 128 neurons, and an output layer with SoftMax activation for five classes. The architecture of the model is shown in Figure 2:

The classifier was trained on the representations extracted from the SimCLR encoder for the SMOTE-balanced dataset. Cross-entropy loss, which is suitable for multi-class problems, was used as the loss function, and optimization was performed with the Adam algorithm. Training ran for 10 epochs with a batch size of 256; the accuracy of the model started at 89.8% and reached 97.46% at the end of the 10th epoch. This shows that contrastive representations increase the discrimination power between classes and that high performance can be achieved with a simple classifier. Studies in the literature frequently use more complex classifiers (e.g., LSTM, Transformer, or ensemble structures), whereas in this study similar accuracy rates were achieved with a simple MLP architecture. This indicates that the method has both low computational cost and high interpretability.
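The classifier described above (256-dimensional input, 128 hidden units, five softmax outputs, cross-entropy loss) can be sketched as a forward pass in NumPy. The ReLU hidden activation, the weight initialization, and all function names are assumptions for illustration; the Adam optimizer step is omitted, and random vectors stand in for the SimCLR embeddings.

```python
import numpy as np


def init_mlp(rng, n_in=256, n_hidden=128, n_out=5):
    """Small random weights for a 256 -> 128 -> 5 classifier."""
    return {
        "W1": rng.normal(0, 0.05, size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.05, size=(n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Return class probabilities for a batch of embeddings x with shape (B, 256)."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])   # ReLU is an assumption
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)         # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)                 # softmax


def cross_entropy(probs, targets):
    """Mean negative log-likelihood of integer class targets."""
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12)))


rng = np.random.default_rng(0)
params = init_mlp(rng)
embeddings = rng.normal(size=(4, 256))     # stand-in for SimCLR encoder outputs
targets = np.array([0, 1, 2, 3])           # e.g. indices into ["N", "L", "R", "V", "A"]
probs = forward(params, embeddings)
loss = cross_entropy(probs, targets)
```

Each row of `probs` sums to one, and the predicted class is simply the index of the largest entry in that row.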

The selection of a lightweight MLP classifier was intentional. The proposed approach effectively extracts highly discriminative and linearly separable representations via SimCLR, rendering a sophisticated downstream classifier like LSTM, CNN, or Transformer unnecessary. A straightforward two-layer MLP offers adequate non-linearity while maintaining a low parameter count (~0.45 M), rapid inference (0.8 ms/beat), and straightforward deployment on embedded platforms. This architecture mitigates the risk of overfitting and emphasizes that the efficacy of the proposed strategy predominantly resides in the quality of the learnt representations rather than the intricacy of the classifier.

4. Experimental Results

In this section, the performance of the proposed system, consisting of SimCLR-based representation learning and an MLP classifier, is evaluated in detail. The learning curves during the training process of the model are examined, and the performance is measured with metrics such as class-based accuracy, precision, recall, and F1-score. In addition, analyses such as confusion matrix, ROC-AUC curves, and t-SNE visualizations are performed to evaluate the discrimination and generalizability of the system. The obtained results show that its applicability is high and its classification performance is strong, thanks to its simple structure.

4.1. Training Performance

The proposed method has a two-stage training process: in the first stage, SimCLR-based self-supervised representation learning was performed, and in the second stage, the MLP classifier was trained in line with these representations.

The contrastive loss used during SimCLR training started from 5.40 in the 1st epoch and decreased to 4.97 at the end of the 10th epoch. This decrease shows that the model successfully learned to transform augmented positive pairs into close representations and negative pairs into distant representations. During the augmentations, the intention was to increase the generalizability of the model by using jittering, scaling, and Gaussian noise methods.
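The three augmentations named above could be sketched as follows. The text does not give their parameters or define jittering precisely, so this sketch makes several assumptions: jitter is interpreted as a small random temporal shift, scaling as multiplication by a factor drawn from [0.9, 1.1], noise as additive Gaussian noise with standard deviation 0.01, and each transform is applied with probability 0.5. All names and values are hypothetical.

```python
import numpy as np


def time_shift(segment, shift):
    """Shift a 1-D segment by `shift` samples, padding with the edge value."""
    if shift == 0:
        return segment.copy()
    if shift > 0:
        return np.concatenate([np.full(shift, segment[0]), segment[:-shift]])
    return np.concatenate([segment[-shift:], np.full(-shift, segment[-1])])


def augment(segment, rng, max_shift=5, scale_range=(0.9, 1.1), noise_std=0.01, p=0.5):
    """Return one randomly augmented view of a 1-D beat segment."""
    view = np.asarray(segment, dtype=float).copy()
    if rng.random() < p:   # jitter, interpreted here as a small temporal shift
        view = time_shift(view, int(rng.integers(-max_shift, max_shift + 1)))
    if rng.random() < p:   # mild amplitude scaling
        view = view * rng.uniform(*scale_range)
    if rng.random() < p:   # additive Gaussian noise
        view = view + rng.normal(0.0, noise_std, size=view.shape)
    return view


# Toy usage: two independent views of the same synthetic beat form a positive pair.
rng = np.random.default_rng(0)
beat = np.exp(-0.5 * ((np.arange(300) - 150) / 5.0) ** 2)   # toy R-peak-like bump
view_a, view_b = augment(beat, rng), augment(beat, rng)
```

Keeping the shift and scale ranges small is what makes the transformations physiologically plausible: the beat morphology is preserved while the exact alignment and amplitude vary.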

After the representations were obtained, the MLP classifier was trained with these fixed-size feature vectors. The cross-entropy loss value used in MLP training was 355.26 in the first epoch and decreased steadily to 79.09 at the end of the 10th epoch. In parallel, the accuracy value started from 89.7% and increased at the end of each epoch, reaching 97.46% as of the 10th epoch.

No overfitting was observed during the training and validation processes. The accuracy and loss curves were balanced, and validation performance developed in parallel with training performance. This indicates that the learned representations generalize beyond the training data to validation samples drawn from the same distribution.

The balanced dataset obtained after SMOTE contains 71,700 samples per class. It was split into 80% training and 20% test sets in a way that maintains class balance. Thus, the test set contains 14,340 samples per class, and the experimental results are calculated on this test set.
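The split sizes quoted above can be checked with a few lines of arithmetic, and a class-preserving split can be sketched in NumPy. The helper `stratified_split` and the toy labels are hypothetical; the text does not state which tool performed the split.

```python
import numpy as np


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the same class proportions in both parts."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


# Arithmetic from the text: 71,700 samples per class, 20% held out for testing.
per_class = 71_700
test_per_class = int(round(0.2 * per_class))     # 14,340
train_per_class = per_class - test_per_class     # 57,360

# Toy usage: 5 classes with 100 samples each.
toy_labels = np.repeat(np.arange(5), 100)
train_idx, test_idx = stratified_split(toy_labels)
```

With five classes, these figures correspond to 71,700 test samples and 286,800 training samples in total.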

In addition, as seen in Figure 3, when the course of precision, recall, and F1-score values according to epochs was examined, a balanced improvement was observed. These distributions serve as evidence that the model improves consistently across classes and can learn without neglecting any class. In particular, the performance increase observed in minority classes reveals the positive contribution of balancing applied with SMOTE to representation learning.

All these findings show that representation learning is successfully performed via SimCLR, that even a simple MLP classifier trained on these representations can classify with high accuracy, and that the model has a high overall capacity to generalize.

4.2. Quantitative Evaluation

The performance of the model is evaluated using four basic metrics that are widely used for multi-class classification problems: precision, recall, F1-score, and overall accuracy. These metrics are calculated separately for each beat class (N, L, R, V, and A), and their macro and weighted averages are also included in the evaluation. In addition to accuracy, precision, recall, F1-score, and AUC, we also calculated Cohen’s Kappa and the Matthews Correlation Coefficient (MCC), as shown in Table 2. Kappa corrects for chance agreement and provides a robust measure of agreement between predicted and true labels, while MCC is a balanced correlation coefficient that remains informative on imbalanced datasets.

According to the evaluation results, the overall accuracy of the model is 97.2%, and the F1-score varies between 94% and 99.6% across the five classes. Precision and recall values are above 99% in the left bundle branch block (L) and ventricular premature contraction (V) classes, and between 96% and 98% in the normal beat (N) and right bundle branch block (R) classes. The atrial premature contraction (A) class, a minority class in the original data, exhibits a lower F1-score (94.1%) than the others; however, this level is considered adequate given the class’s morphological diversity and relatively small original sample size. Since the data are balanced, the weighted average, macro average, and accuracy coincide; therefore, only the accuracy value is reported. Performance was observed to remain stable under small temporal shifts, whereas classes with asymmetric morphology (e.g., A and V) were characterized by departures from the expected symmetries that aided separability. When the confusion matrix is examined, the majority of the misclassified A-class samples are assigned to N (normal) and R, as shown in Figure 4. This finding is also clinically plausible, since some atrial premature beats may display morphology close to sinus beats. In addition to these results, the proposed method achieved a Cohen’s Kappa score of 0.965 and an MCC of 0.965, confirming strong agreement beyond chance and showing balanced predictive performance across all classes.
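The metrics discussed above can all be derived from a confusion matrix. The following NumPy sketch shows one way to do so; the function name `summarize` and the 3-class toy matrix are hypothetical and do not reproduce the study's Figure 4. The MCC is computed with the standard multiclass generalization.

```python
import numpy as np


def summarize(cm):
    """Metrics from a confusion matrix with rows = true class, columns = predicted."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    true_counts, pred_counts = cm.sum(axis=1), cm.sum(axis=0)
    tp = np.diag(cm)
    precision = tp / pred_counts
    recall = tp / true_counts
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / total
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = (true_counts * pred_counts).sum() / total**2
    kappa = (accuracy - expected) / (1 - expected)
    # Multiclass Matthews correlation coefficient.
    mcc_num = tp.sum() * total - (true_counts * pred_counts).sum()
    mcc_den = np.sqrt((total**2 - (pred_counts**2).sum())
                      * (total**2 - (true_counts**2).sum()))
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "macro_f1": f1.mean(),
            "kappa": kappa, "mcc": mcc_num / mcc_den}


# Toy 3-class matrix (not the study's results).
toy_cm = [[9, 1, 0],
          [0, 10, 0],
          [1, 0, 9]]
metrics = summarize(toy_cm)

# Sanity check on the reported figures: with five equally sized test classes,
# chance agreement is 1/5, so an accuracy of 0.972 implies this kappa.
kappa_from_text = (0.972 - 0.2) / (1 - 0.2)
```

The last line shows that the reported Kappa of 0.965 follows directly from the 97.2% accuracy on a class-balanced test set, since (0.972 − 0.2) / 0.8 = 0.965.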

The generalizability of the model across classes is also confirmed in the scatter plots of precision–recall and F1-score–recall relationships. These plots, presented in Figure 5, show that learning is not unevenly distributed across classes and that SimCLR representations are discriminative and reliable.

Additionally, the distribution of the features obtained from the SimCLR encoder was examined in two dimensions using the t-SNE visualization given in Figure 6. Each class formed a distinct cluster in the plot, indicating that sufficient separation was achieved in the representation space.

Finally, the ROC curves in Figure 7 were used to evaluate the discrimination of the model for each class. The AUC scores were calculated as 1.000 for classes L, R, and V; 0.998 for N; and 0.997 for A. These values show that the model separates the classes with high sensitivity and can make clinically reliable classifications.
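Per-class AUC values of this kind are typically obtained in a one-vs-rest fashion: each class is treated in turn as the positive class and its predicted probability as the score. The sketch below computes the AUC as the fraction of positive/negative pairs that are ranked correctly. The function names and toy probabilities are hypothetical, and the pairwise comparison is only practical for small inputs.

```python
import numpy as np


def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def one_vs_rest_auc(probs, y_true):
    """One AUC per class, using that class's probability column as the score."""
    probs, y_true = np.asarray(probs), np.asarray(y_true)
    return [binary_auc(probs[:, c], y_true == c) for c in range(probs.shape[1])]


# Toy usage: 6 samples, 3 classes (not the study's predictions).
toy_y = [0, 0, 1, 1, 2, 2]
toy_probs = [[0.8, 0.1, 0.1],
             [0.6, 0.3, 0.1],
             [0.2, 0.7, 0.1],
             [0.4, 0.5, 0.1],
             [0.1, 0.2, 0.7],
             [0.5, 0.4, 0.1]]
aucs = one_vs_rest_auc(toy_probs, toy_y)
```

In the toy data the last sample is a class-2 beat that receives a low class-2 probability, which pulls that class's AUC below the perfect score of the other two.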

4.3. Comparison with Previous Studies

This section compares the proposed SimCLR-based representation learning and MLP classifier architecture with previous work on the MIT-BIH Arrhythmia dataset. Most of the literature focuses on supervised deep learning models such as CNN, LSTM, or Transformer, while self-supervised learning approaches remain rare. Among hybrid models, which combine heterogeneous components rather than a single conventional supervised architecture, Sun et al. [32] achieved an accuracy of 98.5% on the MIT-BIH dataset using the CNN-LSTM-SE architecture, with precision exceeding 97%, recall exceeding 98%, and an F1-score above 0.98 (Table 4). A separate study attained an accuracy of 99.58% using a hybrid CNN–Transformer technique, although the authors highlighted the model's intricate and computationally intensive architecture [33]. Conversely, Alamatsaz et al. [34] indicate that lightweight models, such as 1D CNN-LSTM, can achieve accuracy levels between 98% and 99%.

Research on self-supervised learning is also growing. Chen et al. [35] evaluated the SimCLR, BYOL, and CLOCS methods on multi-channel ECG within their "Temporal-Spatial Self-Supervised Learning" approach and reported high ROC-AUC values with the SimCLR-based variant.

Table 3 summarizes the performance of the most prominent approaches in the literature alongside the proposed framework.

Table 3 shows that Transformer-based models [15,33] attain the highest accuracy levels (~99–99.5%), but they rely on considerably deeper and more computationally demanding architectures. CNN–LSTM hybrids, exemplified by [32,34], exceed 98% accuracy at the cost of greater complexity than lightweight approaches. Patient-specific or residual CNN techniques [26,28,29,30] provide competitive outcomes (~95–98%) with varying trade-offs. The proposed SimCLR + MLP framework achieves an accuracy of 97.2%, an F1 score of 0.971, and an AUC of 0.983 with around 0.45 million parameters. This balance between accuracy and efficiency highlights the competitiveness of the proposed methodology, particularly for real-time and embedded healthcare applications.

4.4. Computational Complexity Analysis

The computational efficiency of the proposed method was evaluated against leading ECG classification techniques. Table 4 summarizes the findings regarding the dataset, number of classes, reported accuracy, supplementary metrics, and computational attributes.

Table 4 shows that Transformer-based approaches attain the highest accuracy (~99.5%), but with intricate designs comprising millions of parameters and extended training durations. Likewise, CNN-LSTM-SE (Sun et al., 2024) [32] achieved 98.5% accuracy; however, it combines multiple convolutional and recurrent layers with channel attention, which increases implementation complexity. The proposed SimCLR + MLP framework achieves competitive performance (97.2% accuracy, F1 = 0.971, AUC = 0.983) while requiring only about 0.45 million parameters, a training time of around 22 min for 10 epochs on a single GPU, and an inference latency of about 0.8 ms per beat. The total model size is approximately 1.8 MB, which is small enough for deployment on embedded and mobile devices. This efficiency–performance balance indicates that the proposed method is lightweight, scalable, and suitable for real-time monitoring and embedded healthcare applications, which makes it worthwhile despite somewhat lower accuracy than more complex alternatives.
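The reported parameter count, model size, and latency can be cross-checked with simple arithmetic. The sketch below assumes that the weights are stored as 32-bit floats and that beats are processed one at a time; neither assumption is stated in the text, and the helper names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights stored as 32-bit floats


def model_size_mb(num_params: float, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


def beats_per_second(latency_ms: float) -> float:
    """Sequential throughput implied by a per-beat latency in milliseconds."""
    return 1000.0 / latency_ms


size_mb = model_size_mb(0.45e6)     # roughly 1.8 MB for ~0.45 M parameters
throughput = beats_per_second(0.8)  # roughly 1250 beats per second at 0.8 ms/beat
```

Under these assumptions, about 0.45 million float32 parameters occupy about 1.8 MB, which matches the reported model size, and a 0.8 ms latency leaves a wide margin over physiological heart rates of a few beats per second.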

5. Discussion and Limitations

This study has shown that the self-supervised learning approach can be used effectively in the classification of beat segments obtained from ECG signals. With SimCLR-based representation learning, features with high discrimination were obtained without the need for class labels, and these representations were used very successfully with a simple MLP classifier.
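For readers unfamiliar with the contrastive objective behind SimCLR, the following minimal sketch implements the NT-Xent loss on a small batch of embeddings in plain Python. It is an illustration under assumptions rather than the training code of this study: the temperature value and the function names are chosen only for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nt_xent(z1: List[List[float]], z2: List[List[float]],
            temperature: float = 0.5) -> float:
    """NT-Xent loss; z1[i] and z2[i] embed two augmented views of the same beat.

    The temperature of 0.5 is an illustrative assumption.
    """
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of sample i
        # All other samples in the batch (including the positive) form the denominator.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        total += math.log(sum(math.exp(v) for v in logits)) - positive
    return total / (2 * n)


# Toy embeddings: matching views should give a lower loss than swapped views.
views_a = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = nt_xent(views_a, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = nt_xent(views_a, [[0.0, 1.0], [1.0, 0.0]])
```

The loss pulls the two views of each beat together and pushes all other beats in the batch away, which is why no class labels are needed at this stage.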

Traditionally, ECG classification studies have been performed with high-parameter models such as CNN, LSTM, and Transformer. Although these models produce strong results, they have limitations in practical applications due to both the need for labeled data and computational costs. On the other hand, since the method used in this study works with unlabeled pre-training, it is more suitable, especially for situations where data labeling is difficult or costly.

The quality of the representations obtained with SimCLR has been demonstrated with both class-based metrics (F1-scores between 94% and 99%) and visualization techniques such as t-SNE. In addition, the ROC analysis yielded AUC values of 0.997 and above for every class, which shows that the model discriminates strongly between classes.

In particular, addressing the data imbalance with SMOTE before contrastive learning is a strategy rarely seen in the literature, yet it yielded effective results in this study. SMOTE helped SimCLR learn from more balanced positive and negative pairs, which made it possible to achieve strong classification results even for minority classes.
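The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified stand-alone illustration, not the implementation used in the study; the function name, the neighbour count `k`, and the toy data are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_oversample(minority: List[Vector], n_new: int, k: int = 5,
                     rng: Optional[random.Random] = None) -> List[Vector]:
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D minority class (made-up points, not ECG data).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_synthetic = smote_oversample(toy_minority, n_new=4, k=2, rng=random.Random(42))
```

In the setting of Table 1, each minority class would be oversampled in this manner until it reaches the size of the majority class (71,700 beats), with each 300-sample beat treated as one vector.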

From this perspective, this study presents a strong architecture not only in terms of providing high accuracy but also in terms of simplicity, explainability, and reproducibility. The low computational requirement and label independence of the model indicate that this method can be integrated into real-time healthcare systems or form a basis for mobile healthcare applications. From a symmetry perspective, robustness to benign transformations was maintained, while clinically meaningful symmetry breaking was leveraged as an informative signal. A dedicated ablation in which the beat window is off-centered and augmentation transforms are removed is expected to quantify these effects more explicitly.
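The symmetric beat window and the jitter/scale augmentations mentioned above can be sketched as below. The window of 150 samples on each side of the R-peak comes from the paper; the exact indexing convention, the noise level `sigma`, and the scaling range are illustrative assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

HALF_WINDOW = 150  # 150 samples before and 150 after the R-peak (300 in total)


def extract_beat(signal: Sequence[float], r_peak: int,
                 half: int = HALF_WINDOW) -> Optional[List[float]]:
    """Return signal[r_peak - half : r_peak + half], or None if the window
    would run past either end of the recording (indexing is an assumption)."""
    start, stop = r_peak - half, r_peak + half
    if start < 0 or stop > len(signal):
        return None
    return list(signal[start:stop])


def jitter(beat: Sequence[float], sigma: float = 0.01,
           rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean Gaussian noise to every sample (sigma is an assumption)."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in beat]


def scale(beat: Sequence[float], low: float = 0.9, high: float = 1.1,
          rng: Optional[random.Random] = None) -> List[float]:
    """Multiply the whole beat by one random amplitude factor (range assumed)."""
    rng = rng or random.Random()
    factor = rng.uniform(low, high)
    return [x * factor for x in beat]


def two_views(beat: Sequence[float],
              rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Produce two independently augmented views of the same beat."""
    rng = rng or random.Random()
    return (scale(jitter(beat, rng=rng), rng=rng),
            scale(jitter(beat, rng=rng), rng=rng))


# Toy ramp signal standing in for an ECG recording (not real data).
toy_signal = [float(i) for i in range(1000)]
toy_beat = extract_beat(toy_signal, r_peak=500)
view_1, view_2 = two_views(toy_beat, rng=random.Random(0))
```

An ablation of the kind proposed above could reuse `extract_beat` with an asymmetric split of the 300 samples and skip `two_views`, so that the contribution of each symmetry assumption can be measured separately.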

However, there are some limitations of the proposed method. The SimCLR framework used in the study is quite sensitive to augmentation techniques, and the effects of different augmentation combinations on representations have not been systematically investigated. In addition, the model has only been evaluated on the MIT-BIH dataset, and its generalizability to other datasets has not yet been tested. Future studies can address issues such as optimization of augmentation strategies, comparison of different self-supervised learning approaches (e.g., MoCo, BYOL), and applicability to different patient groups through transfer learning.

6. Conclusions

This study proposes a simple but effective approach to classifying beat segments obtained from electrocardiography (ECG) signals using self-supervised representation learning. Strong features were learned with SimCLR without any label information, and these representations were evaluated with only a two-layer MLP classifier.

The data imbalance problem was addressed with the SMOTE method; thus, the representation power of minority classes was increased, and balanced learning was achieved for all classes. The overall accuracy of the model was 97.2%, class-based F1-scores were in the range of 94–99%, and ROC-AUC scores were 0.997 and above for all classes. These results show that SimCLR-based representation learning and a simple MLP structure can work effectively together. Overall, ECG-appropriate symmetries were made explicit, and invariance was promoted in the learned representations, while symmetry-breaking patterns were exploited to enhance class separability with minimal computational cost.

Compared to CNN, LSTM, and Transformer-based models common in the literature, the proposed method offers advantages such as lower computational cost, label independence, and high interpretability. In this respect, the proposed system offers an infrastructure that can be integrated into real-time or mobile health applications. In future studies, we plan to diversify augmentation strategies; compare different self-supervised structures; and perform multi-center, patient-dependent/independent tests. Thus, the clinical validity of the method can be evaluated on a larger scale.

Funding

This research received no external funding.

Data Availability Statement

The ECG data used in this study are openly available in the MIT-BIH Arrhythmia Database [16] at PhysioNet (https://physionet.org/physiobank/database/mitdb/, accessed on 27 September 2025). Processed data and analysis codes are available from the corresponding author upon reasonable request.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Oresko, J.J.; Jin, Z.; Cheng, J.; Huang, S.; Sun, Y.; Duschl, H.; Cheng, A.C. A Wearable Smartphone-Based Platform for Real-Time Cardiovascular Disease Detection Via Electrocardiogram Processing. IEEE Trans. Inf. Technol. Biomed. 2010, 14, 734–740.
2. World Health Organization (WHO). Cardiovascular Diseases (CVDs). 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 26 June 2025).
3. Yıldırım, Ö.; Pławiak, P.; Tan, R.-S.; Acharya, U.R. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Comput. Biol. Med. 2018, 102, 411–420.
4. Alam Sathi, T.; Jany, R.; Ela, R.Z.; Azad, A.; Alyami, S.A.; Hossain, A.; Hussain, I. An interpretable electrocardiogram-based model for predicting arrhythmia and ischemia in cardiovascular disease. Results Eng. 2024, 24, 103381, Corrigendum in Results Eng. 2025, 25, 104070.
5. Narasimhan, G.; Victor, A. Empirical analysis of predicting heart disease using diverse datasets and classification procedures of machine learning. Ain Shams Eng. J. 2025, 16, 103470.
6. Rajpurkar, P.; Chen, E.; Banerjee, O.; Topol, E.J. AI in health and medicine. Nat. Med. 2022, 28, 31–38.
7. Ayano, Y.M.; Schwenker, F.; Dufera, B.D.; Debelee, T.G. Interpretable Machine Learning Techniques in ECG-Based Heart Disease Classification: A Systematic Review. Diagnostics 2022, 13, 111.
8. Manivannan, G.S.; Rajaguru, H.; S, R.; Talawar, S.V. Cardiovascular disease detection from cardiac arrhythmia ECG signals using artificial intelligence models with hyperparameters tuning methodologies. Heliyon 2024, 10, e36751.
9. Jambukia, S.H.; Dabhi, V.K.; Prajapati, H.B. ECG beat classification using machine learning techniques. Int. J. Biomed. Eng. Technol. 2018, 26, 32.
10. Singh, Y.N.; Singh, S.K.; Ray, A.K. Bioelectrical Signals as Emerging Biometrics: Issues and Challenges. ISRN Signal Process. 2012, 2012, 1–13.
11. Rajkumar, A.; Ganesan, M.; Lavanya, R. Arrhythmia classification on ECG using Deep Learning. In Proceedings of the 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, 15–16 March 2019; IEEE: New York, NY, USA, 2019; pp. 365–369.
12. Sahoo, S.; Dash, M.; Behera, S.; Sabut, S. Machine Learning Approach to Detect Cardiac Arrhythmias in ECG Signals: A Survey. IRBM 2020, 41, 185–194.
13. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adam, M.; Gertych, A.; San Tan, R. A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. 2017, 89, 389–396.
14. Yildirim, Ö. A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification. Comput. Biol. Med. 2018, 96, 189–202.
15. Hu, R.; Chen, J.; Zhou, L. A transformer-based deep neural network for arrhythmia detection using continuous ECG signals. Comput. Biol. Med. 2022, 144, 105325.
16. Moody, G.B.; Mark, R.G. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng. Med. Biol. Mag. 2001, 20, 45–50.
17. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
18. Esteban, C.; Hyland, S.L.; Rätsch, G. Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs. arXiv 2017, arXiv:1706.02633.
19. Kachuee, M.; Fazeli, S.; Sarrafzadeh, M. ECG Heartbeat Classification: A Deep Transferable Representation. In Proceedings of the 2018 IEEE International Conference on Healthcare Informatics (ICHI), New York, NY, USA, 4–7 June 2018; IEEE: New York, NY, USA, 2018; pp. 443–444.
20. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. July 2020. Available online: http://arxiv.org/abs/2002.05709 (accessed on 1 July 2020).
21. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G. Bootstrap your own latent: A new approach to self-supervised Learning. arXiv 2020, arXiv:2006.07733.
22. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 9726–9735.
23. Mehari, T.; Strodthoff, N. Self-supervised representation learning from 12-lead ECG data. Comput. Biol. Med. 2022, 141, 105114.
24. Clifford, G.D.; Liu, C.; Moody, B.; Lehman, L.H.; Silva, I.; Li, Q.; Johnson, A.E.; Mark, R.G. AF Classification from a Short Single Lead ECG Recording: The PhysioNet/Computing in Cardiology Challenge 2017. In Proceedings of the 2017 Computing in Cardiology (CinC), Rennes, France, 24–27 September 2017.
25. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, E215–E220.
26. Kiranyaz, S.; Ince, T.; Gabbouj, M. Real-Time Patient-Specific ECG Classification by 1-D Convolutional Neural Networks. IEEE Trans. Biomed. Eng. 2016, 63, 664–675.
27. Salmi, M.; Atif, D.; Oliva, D.; Abraham, A.; Ventura, S. Handling imbalanced medical datasets: Review of a decade of research. Artif. Intell. Rev. 2024, 57, 1–57.
28. Bing, P.; Liu, Y.; Liu, W.; Zhou, J.; Zhu, L. Electrocardiogram classification using TSST-based spectrogram and ConViT. Front. Cardiovasc. Med. 2022, 9, 983543.
29. Khan, F.; Yu, X.; Yuan, Z.; Rehman, A.U. ECG classification using 1-D convolutional deep residual neural network. PLoS ONE 2023, 18, e0284791.
30. Kwak, J.; Jung, J. Classification of imbalanced ECGs through segmentation models and augmented by conditional diffusion model. PeerJ Comput. Sci. 2024, 10, e2299.
31. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
32. Sun, A.; Hong, W.; Li, J.; Mao, J. An Arrhythmia Classification Model Based on a CNN-LSTM-SE Algorithm. Sensors 2024, 24, 6306.
33. Kim, D.; Lee, K.R.; Lim, D.S.; Lee, K.H.; Lee, J.S.; Kim, D.-Y.; Sohn, C.-B. A novel hybrid CNN-transformer model for arrhythmia detection without R-peak identification using stockwell transform. Sci. Rep. 2025, 15, 7817.
34. Alamatsaz, N.; Tabatabaei, L.; Yazdchi, M.; Payan, H.; Alamatsaz, N.; Nasimi, F. A lightweight hybrid CNN-LSTM explainable model for ECG-based arrhythmia detection. Biomed. Signal Process. Control. 2023, 90, 105884.
35. Chen, W.; Wang, H.; Zhang, L.; Zhang, M. Temporal and spatial self supervised learning methods for electrocardiograms. Sci. Rep. 2025, 15, 6029.

Figure 1. Overview of the proposed ECG classification methodology.

Figure 2. Architecture of MLP classifier.

Figure 3. Precision, recall, F1-score vs. epoch.

Figure 4. Confusion matrix for five-class arrhythmia classification task.

Figure 5. Class-wise scatter plots of precision vs. recall and F1-score vs. recall.

Figure 6. t-SNE projection of the learned representations from SimCLR encoder.

Figure 7. ROC curves of the five-class ECG classifier with AUC metrics.

Table 1. Class distribution before and after SMOTE oversampling.

Beat Class	Label	Clinical Description	Original Count	After SMOTE
N	0	Normal sinus beat	71,700	71,700
L	1	Left bundle branch block beat	6070	71,700
R	2	Right bundle branch block beat	6002	71,700
V	3	Premature ventricular contraction	6455	71,700
A	4	Atrial premature contraction	2535	71,700

Table 2. Classification performance metrics.

Class	Precision	Recall	F1-Score
N	0.939	0.965	0.952
L	0.998	0.995	0.996
R	0.981	0.982	0.981
V	0.987	0.989	0.988
A	0.955	0.928	0.941
Cohen’s Kappa			0.965
MCC			0.965
Overall Accuracy			0.972
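The F1-scores in Table 2 follow from the listed precision and recall values as their harmonic mean, F1 = 2PR/(P + R). The short sketch below recomputes them and the unweighted (macro) average; the variable and function names are illustrative, and small differences in the third decimal can arise because the table entries are already rounded.

```python
# Per-class (precision, recall) pairs copied from Table 2.
TABLE_2 = {
    "N": (0.939, 0.965),
    "L": (0.998, 0.995),
    "R": (0.981, 0.982),
    "V": (0.987, 0.989),
    "A": (0.955, 0.928),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1_score(p, r) for name, (p, r) in TABLE_2.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)  # about 0.972
```

The recomputed macro average of about 0.972 agrees with the macro-average F1-score reported in the abstract.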

Table 3. Comparison of ECG categorization methodologies in the literature with the suggested approach.

Study	Dataset	Classes	Method	Accuracy	Notes
Kiranyaz et al. (2016) [26]	MIT-BIH	5	1D CNN (patient-specific)	95–96%	Real-time, personalized CNN model
Acharya et al. (2017) [13]	MIT-BIH	5	Deep CNN (9-layer)	94–95%	Benchmark deep CNN for ECG
Yıldırım et al. (2018) [3]	MIT-BIH	5	Deep CNN (long-duration ECG)	~99% (F1 > 0.98)	Long ECG segments
Yıldırım (2018) [14]	MIT-BIH	5	Wavelet sequence + BiLSTM	99.39%	Wavelet + BiLSTM
Hu et al. (2022) [15]	MIT-BIH, MIT-BIH AF	8/4/2	Transformer (ECG-DETR)	99.12–99.49%	End-to-end; no explicit segmentation
Alamatsaz et al. (2022) [34]	MIT-BIH	5	Lightweight CNN-LSTM	98.2%	Low-complexity hybrid
Bing et al. (2022) [28]	MIT-BIH	5	ConViT + TSST spectrogram	98%	Time-frequency + Transformer
Khan et al. (2023) [29]	MIT-BIH	5	1D deep residual CNN	98%	Residual blocks
Kwak & Jung (2024) [30]	MIT-BIH	5	Segmentation + diffusion model	98%	Conditional diffusion augmentation
Sun et al. (2024) [32]	MIT-BIH	5	CNN-LSTM-SE	98.5%	SE attention
Proposed (SimCLR + MLP)	MIT-BIH	5	Contrastive learning + MLP	97.2% (F1 = 0.971, AUC = 0.983)	~0.45 M params; lightweight

Table 4. Performance and complexity comparison of ECG classification methods.

Study/Model	Dataset	Classes	Accuracy	Other Metrics	Parameters/Complexity	Notes
Sun et al. [32]	MIT-BIH	5	98.5%	Precision > 97%, Recall > 98%, F1 > 0.98	Not reported	CNN-LSTM with SE attention
Kim et al. [33]	MIT-BIH + others	Multi-class	99.5%	F1 ≈ 0.99	Not reported	Vision Transformer architecture
Alamatsaz et al. [34]	MIT-BIH	5	97–98%	F1 > 0.96	Not reported	Lightweight CNN for arrhythmia classification
Proposed (SimCLR + MLP)	MIT-BIH	5	97.2%	F1 = 0.971, AUC = 0.983	~0.45 M params, training ~22 min/10 epochs, inference ~0.8 ms/beat	Lightweight, contrastive embeddings and simple MLP

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

