1. Introduction
The cardiovascular system consists of the heart and blood vessels. Cardiovascular disease (CVD) is a general term for a wide array of conditions that can arise in the cardiovascular system, including stroke, peripheral arterial disease, and coronary heart disease, which can lead to myocardial infarction and heart failure [1,2]. According to the World Health Organization, CVD is the leading cause of death worldwide, causing an estimated 17.9 million deaths in 2019, or 32% of all global deaths [3]. The average annual cost associated with CVD is estimated at USD 378 billion in the US and EUR 210 billion in the European Union [4,5]. Cardiac arrhythmia is a type of CVD characterized by an abnormal heart rate or rhythm that is not physiologically justified [6]. Cardiac arrhythmia can involve a slow heartbeat (bradycardia) or a fast heartbeat (tachycardia). Atrial fibrillation (AF) is the most common type of cardiac arrhythmia; it involves irregular and often abnormally rapid heartbeats caused by dysfunctional electrical impulses in the atria [3,7]. It can lead to cardiovascular complications, such as blood clots, stroke, and heart failure, and is responsible for one-third of ischemic strokes [8]. The mortality risk in patients with AF is 4 times higher than in the general population and 1.5 to 1.9 times higher when adjusted for preexisting cardiovascular conditions related to AF [7,9]. The timely detection of AF is vital in order to initiate adequate treatment and prevent strokes, which are often the first manifestation of the disease [8].
The most common test used to diagnose cardiac arrhythmia is the electrocardiogram (ECG), which measures the heart's electrical activity using skin-level electrodes with built-in sensors [10]. ECGs detect irregular heartbeats, blocked or narrowed arteries, and other cardiovascular abnormalities. With the emergence of deep learning as an effective method for medical prediction and diagnosis, several models have been proposed for ECG signal classification using neural networks [11,12,13,14]. However, most ECG devices currently available are expensive and are typically used in limited hospital or ambulatory settings. Newer home-monitoring devices have recently been integrated with ECGs, but they remain considerably more expensive than the alternative of photoplethysmography (PPG). Additionally, the materials used for high-quality equipment may cause skin irritation and discomfort during extended use, limiting the devices' long-term use [15]. PPG is a vascular optical measurement technique that detects variations in the blood volume of skin tissue and can be used for heart rate monitoring and cardiovascular abnormality detection [16]. PPG sensors are non-invasive, unobtrusive, and inexpensive, making PPG an appropriate alternative to ECG, although only as a supplementary approach for screening and not for definitive diagnosis. PPG sensors are available in various devices, including smartphones, smartwatches, and fitness trackers, making them more widely available to the general population [16]. Some limitations of PPG include sensitivity to motion artifacts caused by hand movements and vulnerability to environmental noise [17,18].
The data obtained from ECG and PPG signals can be temporal or morphological [19]. Temporal features capture time-based information, such as the duration between heartbeats, while morphological features describe the appearance of the waveform. Despite certain morphological differences, ECG- and PPG-based signals have been shown to have a high degree of correlation [20,21]. The two signals reflect different physical phenomena; the correlation lies in their peak-to-peak variability. Heart rate variability (HRV) measures the variation in time between consecutive cardiac cycles obtained from ECG signals. PPG signals offer similar information through pulse rate variability (PRV), which represents the variation in time between pulse signals [22]. The similarity between HRV and PRV is higher when PPG signals are not excessively affected by motion artifacts and noise. In the temporal domain, AF irregularities follow similar statistical patterns in both modalities because they originate from the same arrhythmic cardiac activity. In the frequency domain, the chaotic frequency patterns characteristic of AF are preserved across both signals. Transfer learning (TL) is a paradigm that adapts knowledge mined on a specific task to a closely related secondary task. While absolute values may differ between HRV and PRV due to physiological transit time and vascular compliance factors, the relative patterns and feature relationships critical for AF classification remain consistent, making learned feature hierarchies transferable.
Some of the limitations we face when developing PPG-based analytical models that make TL appealing include the scarcity of publicly available, universally reviewed benchmark databases and the degradation of model performance due to low signal quality [23]. There are also many influencing factors, such as skin tone, motion artifacts and signal crossover, body temperature, and ambient light, as well as sensor design characteristics and positioning [24]. The process of manually annotating datasets is complex and suffers from inter-rater variability, which occurs when multiple experts working on a dataset assign different labels to the same data instance [25]. Reaching a consensus among experts is difficult if the data are not optimally preprocessed, an issue present in the majority of PPG signal annotation efforts in the literature. Furthermore, most previously developed algorithms are only applicable in controlled clinical settings, making early prognosis inaccessible to the general population. Another limitation in the usage of large deep learning models with complex architectures for medical screening is their high requirements for memory, computational power, and storage space, making them inappropriate for deployment in real-world applications. Knowledge distillation (KD) is a method that compresses a complex model into a smaller and faster model without significant performance loss [26]. This method employs a larger network (called the teacher) to train a smaller network (called the student), helping the latter generalize better by virtue of the knowledge gained by the complex teacher model. This knowledge is transferred to the simplified student model and used to supervise its training. KD can be used to introduce complementary information from additional modalities during training while avoiding their computation during testing [27]. Research has shown that the usage of KD results in greater generalization capabilities and faster training [28]. KD has previously been used to create small and robust models for tasks such as arrhythmia detection, sleep staging, and emotion recognition using ECG signals [28,29,30,31,32,33]. Within the scope of our experiments, the teacher model is trained on an ECG dataset, while the student model is the smaller variant that is fine-tuned on the PPG dataset and trained using the soft labels generated by the teacher. In accordance with the literature, we use the terms teacher and student to refer to these models throughout this work.
This work focuses on classifying normal sinus rhythm (NSR) and AF. NSR represents the rhythm of the healthy human heart, which originates from the sinus node [34]. In order to address the lack of large-scale annotated PPG datasets, our work proposes the usage of a deep learning model trained on a large ECG dataset to assist in enhancing the PPG-based model, as ECG datasets are relatively better annotated and compiled [35]. The knowledge representation learned from ECG signals is transferred to the PPG-based model using transfer learning along with KD, resulting in a smaller, less computationally expensive, and more robust PPG-based model with enhanced performance.
The objectives of this work are as follows:
To implement a deep learning model trained on gold-standard raw ECG signals and derived HRV features for AF detection to approximate a relationship with raw PPG signals and derived PRV features.
To investigate the effectiveness of transfer learning and KD using an upstream ECG dataset for an AF classification model across two downstream PPG datasets.
To provide a notion of interpretability using multiple techniques to elucidate the inner workings of black-box models and promote clinical acceptance.
This paper is organized as follows. Section 2 reviews the existing literature. Section 3 details the proposed approach. Section 4 reports the results obtained by the model. Section 5 discusses the results. Section 6 presents the conclusions and future work.
3. Proposed Approach
The end-to-end sequential pipeline, illustrated in Figure 1, consists of five primary steps: (i) dataset preprocessing, (ii) model training, (iii) transfer learning, (iv) knowledge distillation, and (v) evaluation. These steps are briefly described as follows, with further details provided in the relevant subsequent sections:
The dataset preprocessing step involves window-based segmentation, signal quality assessment, peak detection, and HRV/PRV feature extraction.
The model training step involves the implementation of multiple neural network architectures and hyperparameter selection.
The transfer learning step involves adapting the learned representation of the ECG signals to fine-tune the PPG-based model.
The knowledge distillation step involves the utilization of the teacher–student learning paradigm to compress the fine-tuned PPG-based model for lower resource consumption.
Finally, in the evaluation step, we assess the performance of the implemented models using standard measures and interpretability techniques.
While ECG and PPG capture different physiological processes, a reliable sensor records the underlying heart rhythm patterns similarly, even if the signal magnitudes differ. A peak in an ECG correlates with a peak in PPG, and rhythmic abnormalities manifest in both modalities. The underlying patterns, i.e., an irregular rhythm and the absence of distinct P-waves in ECGs, translate to comparable irregularities in pulse intervals detected by PPG. In atrial fibrillation specifically, the fundamental irregularity in cardiac rhythm manifests in both signals through a common underlying pathophysiology. ECGs directly measure the electrical activity of the heart, while PPG captures blood volume changes in peripheral vessels. The electrical impulses detected by ECGs trigger mechanical contraction of the heart, which then generates pressure waves that propagate through the arterial system and are detected by PPG with a small, predictable time delay. Our transfer learning approach focuses on these shared pattern characteristics rather than intricate modality-specific details. This physiological correspondence provides the foundation for our teacher–student model's effectiveness, as the network learns to recognize these rhythm patterns regardless of the specific signal characteristics.
3.1. Datasets
ECG Dataset for Supervised Pre-Training: The Massachusetts Institute of Technology–Beth Israel Hospital Atrial Fibrillation Database (MIT-BIH AF-DB) consists of 25 long-term two-channel ambulatory ECG recordings of human subjects with mostly paroxysmal atrial fibrillation [47]. Each recording is up to 10 h in duration and was sampled at 250 Hz by the associated recorders, with a typical bandwidth of approximately 0.1 to 40 Hz. This publicly available dataset is accessible through the PhysioBank repository, an online archive of well-characterized biomedical signals initiated by the United States National Institutes of Health [48]. The signals in the MIT-BIH AF-DB dataset have rhythm labels primarily corresponding to AF and NSR but also include annotated instances of atrial flutter and AV junctional rhythm. The labels define the period of the signal between the start and end of a specific cardiac event. We considered only the Lead 1 channel ECG, associated with the right ventricle and right atrium, and the labels for NSR and AF. We extracted 30 s long non-overlapping segments from each individual recording after downsampling to 50 Hz to match the sampling rate of the provided PPG dataset [49,50]. The rationale for selecting the 30 s duration is twofold. First, it is well suited to rapid AF episode detection in a wearable monitoring context [51] and has shown considerably high accuracy in recent works. Second, a general consensus among the electrophysiology community asserts that at least 30 s of an ECG signal is required to justifiably define brief episodes of isolated AF, even if it may not always correlate with prevailing longer or recurring AF arrhythmia [52]. The stationary signal check, heart rate check, and signal-to-noise ratio check algorithms formulated in [53] were used to ascertain the signal quality of the ECG lead. All signals in the MIT-BIH AF-DB dataset passed the signal quality acceptance checks and were in agreement across the three algorithms. No additional filtering, replication, or augmentation procedures were applied to the raw signals at this point. In total, there were 11,737 and 5151 window segments belonging to the NSR and AF classes, respectively.
PPG Datasets for Downstream Classification: For further analysis, we were granted access to a private database of PPG signals (UMass-DB) held by the authors of [50]. The data were collected at the University of Massachusetts Medical School from 37 human subjects who were instructed to perform several kinds of movements replicating free-living conditions in the clinic [49]. The smart wrist-worn wearable Samsung Simband recorded the PPG signals at an original sampling frequency of 128 Hz. The PPG signals in the database version we were provided with were already preprocessed into 30 s labeled segments downsampled to 50 Hz. As discussed in [43], the signal-to-noise ratios of signals acquired from wearable PPG are not always consistent and depend on a number of external factors. To keep this natural variance in raw sensor values within the data, so that the model could learn to account for it, we did not perform filtering or replication augmentations for this modality. After considering the original authors' annotations regarding signal quality and estimating the perfusion and skewness indices as per [54], we removed the poorest-quality PPG signals. In total, there were 192 and 54 window segments belonging to the NSR and AF classes, respectively.
The second dataset was obtained from the AF subset of the MIMIC III database and made available through [55]. This dataset contained data from 34 critically ill adults in routine clinical care. Data were recorded using a bedside monitor at 125 Hz and then downsampled to 50 Hz to match the previous dataset. The data did not require additional preprocessing or filtering, as this had already been performed by the original authors [55]. Manual labels for AF and non-AF subjects were obtained from the 34 patients (18 in AF, 16 not in AF). In total, there were 1280 and 152 window segments belonging to the NSR and AF classes, respectively.
Finally, a single window segment in either modality contained a raw ECG/PPG signal lasting 30 s and containing 50 Hz × 30 = 1500 sampling points.
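To make the preprocessing concrete, the sketch below shows a minimal Python implementation of the downsampling and window segmentation described above. It assumes SciPy's FFT-based resample; the exact decimation method used by the original pipelines is not specified here, so this is illustrative only.

```python
import numpy as np
from scipy.signal import resample

def segment_recording(signal, fs_in, fs_out=50, window_s=30):
    """Downsample a recording to fs_out and cut it into non-overlapping windows."""
    # Resample the full recording (e.g., 250 Hz ECG or 128 Hz PPG) to 50 Hz
    signal = resample(np.asarray(signal, dtype=float),
                      int(len(signal) * fs_out / fs_in))
    # Each window holds 30 s x 50 Hz = 1500 sampling points
    win_len = window_s * fs_out
    n_windows = len(signal) // win_len
    return signal[: n_windows * win_len].reshape(n_windows, win_len)
```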
Figure 2a,e represent NSR rhythms for ECG and PPG, respectively, where NSR is observed as a rhythm maintaining a steady rate with no abnormalities.
Figure 2b,f represent AF rhythms for ECG and PPG, respectively, where AF is observed as an irregular rhythm exhibiting unsteadiness and rapid fluctuations.
We further leveraged the ability of HRV/PRV to non-invasively assess the functioning of the cardiovascular autonomic nervous system (ANS). Multiple studies have reported the utility of HRV/PRV in reflecting the sympathetic and parasympathetic activity of the ANS in response to stressors and abnormalities [56]. It is worth mentioning that the primary criterion considered for almost all HRV/PRV indices is the interval between two successive heartbeats, known as the NN interval or the RR interval. For ease of reference, we use the term NN interval synonymously with RR interval when discussing HRV/PRV indices throughout this work. We employed the peak detection methods outlined in [57], with respect to QRS complexes for 30 s ECG signals, and in [58], with respect to the pulsatile components for 30 s PPG signals, to extract the NN intervals from both modalities. We ensured the use of dynamic peak detection algorithms with relatively low computational demand that yield good accuracy, and we used the series of detected NN intervals to derive the six HRV/PRV features mentioned previously.
HRV/PRV indices belong to the time-domain, frequency-domain, and non-linear categories [59]. Time-domain indices quantify the statistical properties of the intervals between successive heartbeats over a period of time. Frequency-domain indices estimate the absolute or relative amounts of signal energy within the component bands. Finally, non-linear indices capture the unpredictability, fractality, and complexity associated with a series of heartbeat intervals [60]. In this work, we considered only six HRV/PRV indices in total, from the time domain and non-linear domain, as relatively longer signal segments (≥1 min) are required for reliable frequency-domain outcomes [61]. Although longer signals can be extracted from MIT-BIH AF-DB, this limitation is imposed by the UMass-DB signals. As the available PPG dataset consisted mostly of labeled non-contiguous 30 s segments, there was insufficient energy content for accurate power spectrum analysis. For the time-domain indices, we chose the recommended robust median absolute deviation of the NN intervals (madNN); the madNN divided by the median of the absolute differences of their successive differences (mcvNN); the count of pairs of adjacent NN intervals differing by more than 20 ms over the window (pNN20); and the count of pairs of adjacent NN intervals differing by more than 50 ms over the window (pNN50) [62,63]. For the non-linear indices, we opted for the suggested Shannon entropy (SHAN) and correlation dimension (CD). SHAN quantifies the complexity of the NN interval series distribution based on information theory, and CD draws on chaos theory to measure the number of correlations present in the signal [64].
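A minimal sketch of this peak detection and feature extraction using the NeuroKit2 API is given below. The detector functions and output column names follow the library's conventions and are assumptions insofar as they may differ from the exact configuration used in this work.

```python
import neurokit2 as nk

def hrv_prv_features(window, sampling_rate=50, modality="ecg"):
    """Six time-domain and non-linear indices from one 30 s window."""
    if modality == "ecg":
        _, info = nk.ecg_peaks(window, sampling_rate=sampling_rate)
        peaks = info["ECG_R_Peaks"]       # QRS complex locations
    else:
        _, info = nk.ppg_peaks(window, sampling_rate=sampling_rate)
        peaks = info["PPG_Peaks"]         # pulsatile component locations
    time_dom = nk.hrv_time(peaks, sampling_rate=sampling_rate)
    nonlin = nk.hrv_nonlinear(peaks, sampling_rate=sampling_rate)
    feats = time_dom[["HRV_MadNN", "HRV_MCVNN",
                      "HRV_pNN20", "HRV_pNN50"]].iloc[0].tolist()
    feats += nonlin[["HRV_ShanEn", "HRV_CD"]].iloc[0].tolist()
    return feats
```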
We hypothesize that combining the raw signals with a holistic semantic representation of a series of consecutive heartbeats/NN intervals allows for a more robust transferable approach. To observe initial differences in the HRV and PRV metrics between the NSR and AF instances, hypothesis testing was conducted. The Shapiro–Wilk test of normality revealed deviations from the Gaussian distribution across all HRV metrics from both classes and the PRV metrics for the NSR class. Therefore, we applied the non-parametric, distribution-agnostic Mann–Whitney U-Test to compare the distributions for both classes in both modalities. All p-values were <0.05, which is typically taken as the cut-off. Hence, the null hypothesis can be rejected, denoting statistical significance in the differences between the respective indices for both classes. Additionally, we selected the common top HRV/PRV features across all datasets using mutual information feature selection with respect to the target variable. Cut-offs of >0.5 (MIT-BIH AF-DB), >0.2 (UMass-DB), and >0.3 (MIMIC III) were applied to mitigate multicollinearity and overfitting by reducing the total number of auxiliary features.
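The screening procedure can be sketched with SciPy and scikit-learn as follows; the mutual information cut-off is dataset-specific (0.5, 0.2, and 0.3 above) and is passed as a parameter here. This is an illustrative reconstruction, not the authors' exact script.

```python
from scipy.stats import shapiro, mannwhitneyu
from sklearn.feature_selection import mutual_info_classif
import numpy as np

def screen_features(X, y, mi_cutoff=0.3):
    """X: (n_windows, n_features) HRV/PRV matrix; y: 0 = NSR, 1 = AF."""
    for j in range(X.shape[1]):
        nsr, af = X[y == 0, j], X[y == 1, j]
        _, p_nsr = shapiro(nsr)               # per-class normality check
        _, p_af = shapiro(af)
        _, p_mwu = mannwhitneyu(nsr, af)      # non-parametric comparison
        print(f"feature {j}: Shapiro p = ({p_nsr:.3g}, {p_af:.3g}), "
              f"MWU p = {p_mwu:.3g}")
    # Mutual information with the target; keep feature indices above the cut-off
    mi = mutual_info_classif(X, y, random_state=0)
    return np.where(mi > mi_cutoff)[0]
```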
3.2. Neural Networks
We experimented with various configurations and specifications, leveraging studies that have achieved state-of-the-art performance in heart disease detection [43,65,66,67].
This entire process is illustrated in Figure 3. After combining the architectural components, the final proposed model consisted of a single dilated convolutional layer with a rectified linear unit (ReLU) applied across a time-distributed wrapper, a BiLSTM layer, and an attention mechanism, followed by two fully connected layers with ReLU and sigmoid activations, respectively. With regard to regularization, batch normalization was added after the convolutional layer to reduce covariate shift, and a dropout layer preceded and succeeded the BiLSTM-Attn block to mitigate overfitting. The model accepts the first input with a dimensionality of (10, 150), indicating 10 timesteps (≈3 s each), with features representing 150 signal sampling points. The output of the CNN is a feature vector summarizing the spatial irregularities and patterns found in the raw input signal. This output is propagated into the BiLSTM network, where the temporal patterns are captured and fed into the attention mechanism to highlight the integral aspects of the input. The model then accepts the second auxiliary input of HRV/PRV metrics, with a dimensionality equal to the number of selected indices, and concatenates it with the output of the attention mechanism to aid in a better representation of the input. Finally, this concatenated vector passes through a sigmoid activation function for binary classification of the NSR and AF classes.
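For concreteness, a minimal Keras sketch of this architecture is shown below. The filter counts and units (32, consistent with the hyperparameter search reported in Section 4), the dropout rates, and the use of global average pooling are illustrative assumptions; Table 3 lists the final properties of the model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(n_steps=10, step_len=150, n_hrv=6):
    # Raw signal input: 10 timesteps of 150 samples each (30 s at 50 Hz)
    sig_in = layers.Input(shape=(n_steps, step_len, 1), name="raw_signal")
    # Time-distributed dilated convolution extracts per-step spatial features
    x = layers.TimeDistributed(
        layers.Conv1D(32, kernel_size=5, dilation_rate=2, activation="relu")
    )(sig_in)
    x = layers.BatchNormalization()(x)
    x = layers.TimeDistributed(layers.GlobalAveragePooling1D())(x)
    x = layers.Dropout(0.3)(x)
    # BiLSTM captures temporal dependencies across the 10 steps
    x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
    # Self-attention highlights the most informative timesteps
    attn = layers.Attention()([x, x])
    x = layers.GlobalAveragePooling1D()(attn)
    x = layers.Dropout(0.3)(x)
    # Auxiliary HRV/PRV input concatenated with the attended representation
    hrv_in = layers.Input(shape=(n_hrv,), name="hrv_features")
    x = layers.Concatenate()([x, hrv_in])
    x = layers.Dense(32, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model([sig_in, hrv_in], out)
```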
The KD procedure results in a compressed student model, whereby the knowledge is transferred from the teacher by minimizing a loss function that attempts to match the softened teacher logits and the ground-truth labels [26]. The logits are softened by applying a temperature scaling parameter, which serves to smooth the underlying probability distribution representative of inter-class relationships. We also account for the optimality of the distillation procedure in terms of consistent teaching and patience, as described in [68]. The overall loss function, incorporating both the distillation and student losses and modified for binary classification, is defined as

$$\mathcal{L}(x; W) = \alpha \,\mathcal{L}_{\mathrm{BCE}}\!\left(y, \sigma(z_s)\right) + (1 - \alpha)\,\mathcal{L}_{\mathrm{BCE}}\!\left(\sigma_T(z_t), \sigma_T(z_s)\right). \tag{1}$$

In Equation (1), $x$ is the input; $W$ represents the student model's parameters; $y$ is the ground-truth label; $\mathcal{L}_{\mathrm{BCE}}$ is the binary cross-entropy loss function; $z_s$ and $z_t$ are the logits of the student and teacher, respectively; $\sigma_T$ is the sigmoid function parameterized by the temperature $T$; and $\alpha$ is a weighting factor balancing the student and distillation losses.
As per the findings of [26], lower temperatures work better when the student model is much smaller than the teacher model, as the latter's softened labels carry more information than the student model may be able to capture. This matched our experiments: using temperatures ranging from 1 to 10, we found 5 to be the optimal value for our implementation. The KD process for this work is depicted in Figure 4. The original teacher model, as shown in Figure 3, is trained on the MIT-BIH AF-DB ECG dataset. This model is then fine-tuned on the UMass-DB PPG dataset using TL, where only the last three layers are retrained, preserving the coarse- and medium-granularity representations of the ECG signals. The fine-tuned model then effectively "teaches" the smaller student model its internalized knowledge during concurrent supervised training with softened labels. During this last stage, the teacher's output logits, not the binary labels, are used together with the sigmoid function in the loss calculation, as shown in Equation (1).
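A minimal TensorFlow sketch of the loss in Equation (1) follows. The default temperature reflects the value found optimal above, while alpha = 0.5 is an illustrative assumption, as the weighting factor is not specified here.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def distillation_loss(y_true, z_student, z_teacher, T=5.0, alpha=0.5):
    # Student loss: BCE between ground truth and sigmoid of student logits
    student_loss = bce(y_true, tf.sigmoid(z_student))
    # Distillation loss: BCE between temperature-softened teacher and student
    soft_teacher = tf.sigmoid(z_teacher / T)
    soft_student = tf.sigmoid(z_student / T)
    distill_loss = bce(soft_teacher, soft_student)
    return alpha * student_loss + (1.0 - alpha) * distill_loss
```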
While our approach of combining CNN and BiLSTM with attention and HRV/PRV features yields improved performance, this architectural complexity introduces notable trade-offs. The concatenated representation increases computational overhead and inference time, which can be particularly challenging for real-time monitoring on resource-constrained wearable devices. This approach also requires the successful extraction of HRV/PRV features, which may fail in cases of severe motion artifacts or poor signal quality.
4. Results
4.1. Experimental Settings
All analyses were conducted using Python 3.8.0 on a workstation running Linux OS with 48 GB of RAM, an Intel Quad-Core Xeon CPU (2.3 GHz), and a Tesla K80 GPU (4 GB VRAM). Data were processed with NumPy 1.19.5 [69] and pandas 1.1.5 [70]. Statistical methods and correlation tests were performed using SciPy 1.4.1 [71], and other data-wrangling tasks were carried out with scikit-learn 1.0.0 [72]. Visualizations were prepared using Matplotlib 3.2.2 [73] and UMAP-learn 0.5.3 [74]. Additionally, for the deep learning models and the Grad-CAM pipeline, Keras 2.7.0 [75] and TensorFlow 2.8.0 [76] were used. HRV and PRV indices were calculated using NeuroKit 0.2.0 [77].
The dataset instances were divided into a standard 60-20-20 training–validation–hold-out testing split at the patient level (so that there was no overlap in the data samples in each split), using a five-fold cross-validation approach. To mitigate sampling bias, 5-fold cross-validation was repeated three times and averaged.
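A patient-level split of this kind can be sketched with scikit-learn's GroupKFold, as below; the three-fold repetition and the carving of the 20% validation portion out of each training fold are omitted for brevity.

```python
from sklearn.model_selection import GroupKFold

def patient_level_folds(X, y, patient_ids, n_splits=5):
    """Yield train/test indices with no patient shared between splits."""
    gkf = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in gkf.split(X, y, groups=patient_ids):
        # A further patient-level split of train_idx would provide the
        # validation portion of the 60-20-20 scheme (not shown here)
        yield train_idx, test_idx
```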
Before the signals and HRV indices were fed into the model, Z-score normalization was performed, wherein all ECG and PPG datasets' signals were standardized to zero mean (μ = 0) and unit standard deviation (σ = 1). This sped up convergence and reduced the amplitude scaling and drastic variability caused by outlier values.
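In code, this normalization amounts to the following; standardizing each window independently is an assumption, as the text does not state whether statistics were computed per window or per dataset.

```python
import numpy as np

def zscore(windows, eps=1e-8):
    """Standardize each window to zero mean and unit standard deviation."""
    mu = windows.mean(axis=-1, keepdims=True)
    sd = windows.std(axis=-1, keepdims=True)
    return (windows - mu) / (sd + eps)  # eps guards against flat segments
```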
4.2. Evaluation
Classification metrics, including accuracy, sensitivity, specificity, F1 score, and the Matthews correlation coefficient (MCC), were used for the quantitative evaluation of the algorithms. Accuracy is the proportion of correct predictions across the total test dataset. Sensitivity is the proportion of AF patients correctly identified as positive, and specificity is the proportion of non-AF patients correctly identified as negative. The F1 score is the harmonic mean of precision and recall, i.e., it denotes the balance between the rates of type-1 and type-2 errors. Recent findings show that the MCC can be a more reliable metric, as its score depends on all four quadrants of the confusion matrix, i.e., true positives, false negatives, true negatives, and false positives, in proportion to both classes [78].
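All five metrics can be computed from the confusion matrix, as in the following sketch:

```python
from sklearn.metrics import confusion_matrix, f1_score, matthews_corrcoef

def evaluate(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (AF recall)
        "specificity": tn / (tn + fp),   # true negative rate (NSR recall)
        "f1": f1_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
```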
The final properties of the model, provided for the purpose of replication by the research community, are listed in Table 3. In addition to the proposed model, we also implemented convolution-focused and LSTM-focused variants, and the results are elucidated in Table 4. A random search algorithm was used for hyperparameter selection, deciding the appropriate number of convolutional filters, LSTM units, and fully connected layer neurons from the range [4, 8, 16, 32, 64, 128, 256]. Beyond a value of ≈32, the tendency to overfit increased dramatically for all models without a significant gain in performance (less than 1%). It is worth mentioning that our proposed model for ECG classification appeared to take full advantage of the attention mechanism and the HRV features, eliciting an average ≈6% improvement across all metrics.
For the ECG model, the batch size used for training was 32, and the number of epochs was 25. For the same larger ECG model (teacher) fine-tuned on the PPG dataset, the batch size used for training was 16 with 40 epochs.
When applying TL, the teacher only ever sees the ECG data in a pre-training context and only one signal type at a time. To provide a fair comparison, a model of the same size as the teacher was trained from scratch using the PPG datasets as well, with a batch size of 16 and 40 epochs. Finally, the student model that was reduced by a factor of 32 was trained with a batch size of 16 for 30 epochs using KD and from scratch. Early stopping and automated reduction of the learning rate by a factor of 0.5 upon a plateau in the validation loss were implemented as callbacks during training for all aforementioned processes.
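A sketch of the fine-tuning and callback configuration is shown below; the optimizer choice and early-stopping patience are assumptions, as only the learning rate reduction factor of 0.5 is specified above.

```python
import tensorflow as tf

def prepare_for_fine_tuning(teacher, n_trainable=3):
    """Freeze all but the last n_trainable layers before PPG fine-tuning."""
    for layer in teacher.layers[:-n_trainable]:
        layer.trainable = False
    teacher.compile(optimizer="adam", loss="binary_crossentropy")
    return teacher

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5),
]
# model.fit(inputs, labels, batch_size=16, epochs=40, callbacks=callbacks, ...)
```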
We rationalize the CNN-BiLSTM architecture's performance as follows: CNNs capture spatial features such as shapes, patterns, peaks, and troughs (i.e., low-level features), whereas the LSTM captures temporal dependencies and sequential information (i.e., high-level features), yielding better representational capability.
Table 5 and Table 6 report the metrics for the TL and KD experiments on the UMass-DB and MIMIC-III PPG datasets, respectively. The application of TL resulted in an average ≈48% improvement across all metrics on the small PPG dataset compared to the baseline teacher, indicating that the AF versus NSR representation is indeed transferable to some degree. After using TL and KD in conjunction, the student model that was 32× smaller than the teacher achieved average performance boosts of ≈13.6% and ≈1.38% across all metrics on the respective datasets. Note that the accuracy and specificity were higher than the other metrics due to the class imbalance in favor of the NSR majority. High specificity indicates a high true negative rate, making the model efficient in correctly identifying NSR rhythms. We observed that the average standard deviation across the two PPG datasets also decreased by ≈4% and ≈1%, respectively.
4.3. Interpretability
We used two techniques to confer a sense of interpretability to the classification decisions made by the model to aid clinicians' diagnoses. Inspired by the practicality of Gradient-weighted Class Activation Maps (Grad-CAM) [79] in the ECG analyses presented in [80,81], it was the first technique we leveraged to shed light on the black-box operations of DL algorithms. Grad-CAM uses the gradients of a target class, in terms of either logits or labels, propagating to the final convolutional layer to generate a comprehensible localization map highlighting the features of the input pertinent to predicting the target class. This method allowed us to visualize the activation regions of the input that the model deemed indicative of class-discriminative patterns. As can be seen in Figure 5a,c, the NSR rhythms for ECG and PPG, respectively, have periodic intervals, and these normal regions are indicated in the heat map with similar color intensities. In Figure 5b,d, the regions with specific intensities of interest, as detected by the CNN component to make the AF classification decision, can be observed for the ECG and PPG segments, respectively.
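A minimal Grad-CAM sketch adapted to 1D signals is given below; `conv_layer_name` is a placeholder for the (time-distributed) convolutional block of the model in Figure 3, and `inputs` is the [signal, HRV/PRV] pair for a single window.

```python
import tensorflow as tf

def grad_cam_1d(model, inputs, conv_layer_name):
    """Grad-CAM localization map over a 1D signal for the AF (positive) class."""
    conv_layer = model.get_layer(conv_layer_name)
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(inputs)
        class_score = preds[:, 0]          # sigmoid output for AF
    grads = tape.gradient(class_score, conv_out)
    # Average gradients per filter, then weight and combine the feature maps
    weights = tf.reduce_mean(grads, axis=-2, keepdims=True)
    cam = tf.nn.relu(tf.reduce_sum(weights * conv_out, axis=-1))
    cam = cam.numpy()[0].reshape(-1)       # flatten timesteps into one 1D map
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```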
To summarize, within the purview of interpretability, we note that the Grad-CAM highlighted areas can be further examined by a clinician to ascertain the validity of the model’s internal functioning and reiterate the latter’s role as a clinical support framework. For longer signal durations, measures of complexity and correlations may hold additional utility.
5. Discussion
This study explored the utility of TL and KD in aiding the performance of downstream AF classification across two different one-dimensional signal representations reflecting cardiac activity. During the initial phase of experimentation on the MIT-BIH AF-DB ECG dataset, we observed that convolutions or LSTMs alone were not able to completely capture the underlying patterns. By combining these components, we obtained a model that was essentially greater than the sum of its parts, especially when augmented by attention and auxiliary indices. It can be surmised that the spatio-temporal nature and complexity of the signal data validate the need for larger, complex structures to capture the intricacies in the latent space for larger datasets. The positive results of fine-tuning this representation on a smaller dataset and then applying KD imply that condensing the learned information is more beneficial than random initialization for the same problem.
Downstream performance: With regard to UMass-DB, we note that the variance in performance was considerable, likely owing to greater-than-anticipated noise elements affecting the student models. This was less pronounced for the teacher model, where TL reduced the variance in cross-fold performance, particularly for sensitivity and the MCC. This was likely due to the larger fine-tuned model being able to distinguish between the NSR and AF samples in the PPG dataset, which it was unable to do before TL. While TL and KD did in fact elevate the performance of a much smaller student model as opposed to training from scratch, larger models may be preferred if more accurate predictions are desired. However, smaller models have the benefit of being deployable on edge devices, where, based on the trade-off between false positives and false negatives, their classifications can be meaningful for a clinician remotely. Ultimately, it depends on whether accuracy or frequency is the priority, i.e., seldom but accurate versus often but less accurate.
However, we observed that for the MIMIC-III dataset, where the signals were acquired at the bedside in a clinical setting, the baseline performance was already considerably high. In this context, both TL and KD offered little additional value (i.e., ≈1%). We hypothesize that this was due to two primary factors: data size and quality. First, the dataset was 10× the size of the UMass-DB dataset and contained more representative and diverse samples for both of the predicted classes. Second, the signals had low saturated noise and were consistent in terms of morphology owing to standardized acquisition. We have seen the arguments in [82] and agree that with smaller datasets, fine-tuning approaches yield better performances when looking at single-channel ECG signals. The authors emphasized that transferring knowledge may not always necessarily or consistently improve performance. Moreover, as the number of data samples increases, the student predictions naturally move closer to those of the teacher, provided that the capacity and architecture are sufficient for the scale of the dataset and the complexity of the data [83]. Therefore, we can summarize that sharing initialization (with TL) increases alignment in the activation space, but KD does not offer additional advantages for the MIMIC-III dataset.
Complementary findings: Compared to similar studies focusing on efficient TL or KD paradigms for AF detection [32,43,67], our models achieved comparable results, with the added advantage of a greater model compression ratio with respect to the scores obtained. More recent studies used a pipeline similar to our work to estimate systolic and diastolic blood pressure [84] and for sleep staging [85]. The latter involved combining PPG and wrist-worn actigraphy as input to the student model, whereas the teacher model was pre-trained on ECG data acquired during sleep. The results mirrored ours (in the context of AF) for sleep analysis, as it appears empirically that single-channel sensors collected from different populations, captured at different sampling rates, can be adapted for other types of sensors. Specifically, the electric field-based modality of ECG is indeed comparable to the light absorption mechanisms of PPG.
Research implications: There is an increasing presence of foundational models across different domains, with architectures that are pre-trained on large-scale unlabeled data using representation learning, making them adaptable to a variety of tasks while accommodating increased complexity [86]. Our proposed work considered supervised pre-training on a related modality for better representations, due to the scarcity of large, diverse public datasets in this field, and aligns tangentially, in principle, with the new paradigms of foundational models. The work in [87] leveraged the Apple Heart and Movement Study (AHMS), involving 140,000 participants and 3 years of wearable ECG signals, to implement a foundational model. This involved self-supervised pre-training with participant-level positive pair selection, a stochastic augmentation module, and a regularized contrastive loss optimized using momentum training. The downstream tasks involved the classification and regression of age, body mass index, and biological gender.
Alternatively, [88] proposed a foundational model for general PPG signal analysis with self-supervised contrastive learning using a large semi-private dataset. The premise was that representations of PPG signals from similar physiological states should be similar, and that bad-quality PPG signals in the temporal vicinity of a good-quality one from the same person are likely to be from a similar physiological state. Downstream performance was reported for AF detection and for blood pressure, heart rate, and respiration rate estimation.
Interestingly, [87] advocated for the future use of HRV/PRV indices to infer valuable insights from pre-trained embeddings, which is a direction we have explored. Additionally, [88] achieved generalization across different qualities of data with certain robustness to artifacts and noise but reported limitations in successfully detecting conditions due to the scarcity of annotated PPG signals for diseases. Our work exhibits promise in transferring ECG representations to downstream PPG tasks with high fidelity and contributes to bridging similar gaps in the aforementioned line of research regarding foundational models.
Limitations: We refer to the findings of [83,89], who argued that although KD can indeed improve student generalization, there can be discrepancies between the predictive distributions of the teacher and the student, even when they match closely, due to the inductive and implicit biases of the dataset considered, the nature of the domain, and the augmentation strategies. For instance, some of the information gained by the teacher is lost during the distillation process, resulting in a slight drop in performance in the student model. We suspect that this is because the morphological characteristics of the signals are not preserved independently during distillation, leading to some misclassifications for borderline instances, where AF and NSR signals have similar temporal features and/or higher signal-to-noise ratios. However, the top-level hierarchical features common to both ECG and PPG, such as the intervals between peaks, the periodic behavior, and the general fluctuations of the waves, appear to be captured. Additionally, measuring biological signals is inherently challenging because of the low quality of data (in naturalistic settings), saturated noise, and artifacts (patient movement and/or sensor displacement), as well as individual physiological variations and inter-device differences [90].
Future work may look at the determination of clinically meaningful thresholds through collaborative efforts with cardiologists and the consideration of specific demographics, as well as regulatory requirements.
We therefore envision our work as an initial study emphasizing the utility of different training approaches in the form of TL and KD to develop uniform knowledge representation-based models to enable continuous detection of cardiac arrhythmia. This supports the creation of enhanced foundational models that utilize both ECG and PPG to capture more relevant patterns and improve representation learning for more varied downstream tasks.
6. Conclusions
This work proposes the use of TL and KD with deep neural networks trained on large-scale ECG datasets for the purpose of AF classification in smaller PPG datasets. The objectives are to investigate the feasibility of a unified, modality-agnostic knowledge representation for AF classification and to validate performance measures during distillation and subsequent compression.
We purport that using both raw signals and derived HRV/PRV features can increase the robustness of model behavior in free-living conditions. This can be particularly beneficial when adapting learned representations across modalities due to the implicit translation invariance to noise and motion artifacts.
Empirically, we find that TL and KD enabled a much smaller model (32× smaller) to achieve performance boosts of approximately 13.6% and 1.38% in terms of accuracy, specificity, sensitivity, F1 score, and MCC across two very different types of PPG datasets by leveraging and condensing innate information from training on a large ECG dataset. Additionally, we employ interpretability methods using Grad-CAM to elucidate the inner workings of the convolutional layers and estimate the contribution of HRV/PRV-derived features. We highlight that KD may not offer much additional utility if the available data quality and size are sufficiently high.
Future research could involve the acquisition and testing of the proposed method with additional PPG datasets using foundational model architectures, performing quantization for edge device employment, and conducting longitudinal studies with a controlled cohort of patients to attain qualitative evidence of low-cost wearable monitoring.