Protocol

Integration of EHR and ECG Data for Predicting Paroxysmal Atrial Fibrillation in Stroke Patients

Alireza Vafaei Sadr, Manvita Mareboina, Diana Orabueze, Nandini Sarkar, Seyyed Sina Hejazian, Ajith Vemuri, Ravi Shah, Ankit Maheshwari, Ramin Zand and Vida Abedi
1 Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
2 Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
3 Penn State Hershey Medical Center, Penn State College of Medicine, Hershey, PA 17033, USA
4 Department of Neurology, College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
5 Division of Cardiology, Heart and Vascular Institute, Penn State Hershey Medical Center, Hershey, PA 17033, USA
* Authors to whom correspondence should be addressed.
Bioengineering 2025, 12(9), 961; https://doi.org/10.3390/bioengineering12090961
Submission received: 17 July 2025 / Revised: 26 August 2025 / Accepted: 3 September 2025 / Published: 7 September 2025
(This article belongs to the Special Issue Machine Learning Technology in Predictive Healthcare)

Abstract

Predicting paroxysmal atrial fibrillation (PAF) is challenging due to its transient nature. Existing methods often rely solely on electrocardiogram (ECG) waveforms or electronic health record (EHR)-based clinical risk factors. We hypothesized that explicitly balancing the contributions of these heterogeneous data sources could improve prediction accuracy. We developed a Transformer-based deep learning model that integrates 12-lead ECG signals and 47 structured EHR variables from 189 patients with cryptogenic stroke, including 49 with PAF. By systematically varying the relative contributions of ECG and EHR data, we identified an optimal ratio for prediction. The best performance (accuracy: 0.70, sensitivity: 0.72, specificity: 0.87, area under the receiver operating characteristic curve (AUROC): 0.65, area under the precision-recall curve (AUPRC): 0.43) was achieved under 5-fold cross-validation when EHR data contributed one-third and ECG data two-thirds of the model's input. This multimodal approach outperformed unimodal models, improving accuracy by 35% over ECG-only and 5% over EHR-only methods. Our results support the value of combining ECG and structured EHR information to improve accuracy and sensitivity in this pilot cohort, motivating validation in larger studies.

1. Introduction

Paroxysmal atrial fibrillation (PAF), a major stroke risk factor characterized by transient arrhythmic episodes, poses significant diagnostic challenges because of its sporadic nature, with up to 40% of cases remaining asymptomatic [1,2]. Traditional detection methods, such as Holter monitoring, often fail to capture these fleeting events, leading to delayed diagnosis and an increased risk of recurrent stroke and systemic embolism [3,4]. Recent advances in machine learning have enabled novel approaches to PAF prediction, with convolutional neural networks (CNNs) achieving higher AUROCs than conventional machine learning methods using electrograms or 12-lead electrocardiograms (ECGs) alone [5,6,7,8,9]. However, traditional (unimodal) models face limitations: ECG-based approaches primarily detect electrophysiological anomalies, while electronic health record (EHR)-driven models rely on systemic risk factors such as hypertension and diabetes, which lack key temporal resolution [7,10].
Integrating ECG and EHR data could synergistically enhance predictive accuracy by contextualizing transient ECG abnormalities within longitudinal clinical profiles, a hypothesis supported by studies showing improved diagnosis prediction when these data sources are combined [11]. Multimodal deep learning frameworks, particularly those employing attention mechanisms, have demonstrated promise in cardiac applications; Transformer architectures excel at modeling long-range dependencies in sequential data, while CNNs capture localized ECG features such as P-wave morphology and QRS complex variations [12,13]. For instance, Tzou et al. [13] achieved high sensitivity for PAF prediction by analyzing P-wave dynamics and skin sympathetic nerve activity using a wavelet–CNN hybrid, while Tang et al. [7] reported strong performance for predicting PAF recurrence by fusing intracardiac electrograms with clinical variables. However, no study has systematically determined the optimal integration of raw ECG waveforms and structured EHR data for predicting PAF, a significant gap given the episodic and elusive nature of the condition [2,10].
To address this gap, our pilot study deliberately focuses on dissecting the relative contributions of ECG and EHR data. The primary objective was to develop a multimodal deep learning model for predicting PAF. We hypothesized that a multimodal deep learning model would outperform unimodal models by optimally balancing ECG and EHR contributions. We introduce a Transformer-based architecture that integrates denoised 12-lead ECG signals with 47 EHR parameters, including cardiac monitoring duration and hemoglobin levels, and that quantifies the optimal ratio between ECG and EHR inputs. Class imbalance is addressed through stochastic data augmentation techniques, including time warping, amplitude scaling, and Gaussian noise injection [14]. We further discuss implications for early intervention strategies, address limitations in sensitivity, and propose future directions, including latent EHR feature exploration and prospective and external validation. This work advances personalized PAF management by bridging electrophysiological and systemic risk assessment through explainable multimodal learning [12,15]. It is intended to support outpatient rhythm monitoring triage, not to replace clinician judgment.

2. Materials and Methods

2.1. Study Design and Data Collection

This study utilized a dataset of 189 cryptogenic stroke patients, collected by medical students at Penn State College of Medicine, an academic medical center, from January 2017 to May 2023 under an exempt Institutional Review Board protocol. The dataset included both ECG waveforms and EHR data. All data were validated by a cardiologist and a stroke neurologist to ensure diagnostic accuracy and clinical relevance. Reporting followed the TRIPOD-AI guideline; a completed checklist is provided in Appendix B (Table A1).

2.2. Inclusion/Exclusion

Inclusion criteria were adults aged ≥18 years with cryptogenic ischemic stroke. Patients were excluded if they had persistent or permanent AF before the index ECG, lacked a 12-lead ECG, or were missing core EHR variables.

2.3. Data Preprocessing

ECG waveforms were extracted from XML files. Each 12-lead ECG signal was normalized to a range of 0–1 and preprocessed using wavelet denoising and bandpass filtering (0.5–40 Hz). Multiple representations of ECG data were explored, including raw signals, denoised signals, and denoised–filtered signals. Data augmentation techniques were applied to the ECG signals to address class imbalance, including time warping, amplitude scaling, baseline wander addition, Gaussian noise injection, random permutation, and random shifting. The augmentation probability (Paug) and the number of augmented samples were optimized as hyperparameters.
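The preprocessing code is not included in the paper, so the following is a minimal sketch of the described steps (wavelet denoising, 0.5–40 Hz bandpass filtering, 0–1 normalization, and two of the six augmentations), assuming NumPy, SciPy, and PyWavelets. The sampling rate, wavelet family, decomposition level, and threshold rule are illustrative assumptions, not the authors' exact settings.

```python
# Illustrative ECG preprocessing sketch (not the authors' exact implementation).
# Assumptions: 500 Hz sampling rate, Daubechies-4 wavelet, soft universal threshold.
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

FS = 500  # assumed sampling frequency (Hz)

def wavelet_denoise(sig, wavelet="db4", level=4):
    """Soft-threshold the detail coefficients using a universal threshold estimate."""
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise level estimate
    thr = sigma * np.sqrt(2 * np.log(len(sig)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(sig)]

def bandpass(sig, low=0.5, high=40.0, order=4):
    """Zero-phase 0.5-40 Hz band-pass filter, as described in the text."""
    b, a = butter(order, [low, high], btype="band", fs=FS)
    return filtfilt(b, a, sig)

def min_max(sig):
    """Normalize one lead to the 0-1 range."""
    rng = sig.max() - sig.min()
    return (sig - sig.min()) / rng if rng > 0 else np.zeros_like(sig)

def augment(sig, rng, p_aug=0.1):
    """Stochastic augmentation: Gaussian noise and amplitude scaling (two of the
    six listed techniques); each is applied with probability p_aug."""
    out = sig.copy()
    if rng.random() < p_aug:
        out = out + rng.normal(0.0, 0.01, size=out.shape)    # Gaussian noise injection
    if rng.random() < p_aug:
        out = out * rng.uniform(0.8, 1.2)                     # amplitude scaling
    return out

def preprocess_12lead(ecg):
    """ecg: array of shape (n_samples, 12); returns denoised, filtered, scaled leads."""
    return np.stack(
        [min_max(bandpass(wavelet_denoise(ecg[:, i]))) for i in range(12)], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_ecg = rng.normal(size=(6000, 12))    # 12 s of synthetic 12-lead data
    clean = preprocess_12lead(fake_ecg)
    augmented = augment(clean, rng)
    print(clean.shape, augmented.shape)
```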
We selected 47 clinically relevant EHR parameters based on known PAF risk factors and predictors. We used information available at the index stroke encounter. Predictors included demographics (age, sex, race, ethnicity); BMI (body mass index); comorbidities/new diagnoses at index (hypertension, type 2 diabetes, hyperlipidemia, hypercoagulability, chronic kidney disease, liver disease, active cancer, prior ischemic/hemorrhagic stroke, myocardial infarction, coronary artery disease, peripheral arterial disease, systemic embolism, dementia, systolic/diastolic heart failure); social and family history (tobacco, alcohol, family history of stroke or AF); stroke severity/imaging (NIHSS on admission, ipsilateral ICA stenosis > 50%, intracranial arterial disease, lacunar pattern, stroke laterality); echocardiography (ejection fraction %, left atrial size, left atrial enlargement, cardiac shunt/PFO); risk score (CHA2DS2-VASc); and renal function and labs (eGFR, hemoglobin, platelet count, ALT, AST, HDL, LDL, HbA1c). Following manual chart review, no missing values were present for demographics and laboratory measures. For binary comorbidity indicators, records with unpopulated fields were adjudicated from notes/problem lists; when documentation did not support the condition, the indicator was coded absent. Non-numeric features were excluded, and numeric features were normalized to a range of 0–1. From an initial 223 cases reviewed based on their EHR, we identified 197 with ECG listed for extraction. Ultimately, 189 of these ECGs were successfully extracted and matched to the reviewed EHR record, forming our final study cohort. The sample size was fixed by cohort availability, with 49 PAF events.
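For illustration, the sketch below shows the described EHR handling (coding adjudicated binary comorbidities as absent when unsupported, dropping non-numeric fields, and 0–1 min-max scaling), assuming pandas; the column names are hypothetical and do not reflect the authors' exact variable names.

```python
# Illustrative EHR feature preparation sketch; column names are hypothetical.
import pandas as pd

def prepare_ehr(df: pd.DataFrame, binary_cols, numeric_cols) -> pd.DataFrame:
    out = df.copy()
    # Binary comorbidity indicators: unpopulated fields that chart review did not
    # support are coded as absent (0), as described in the text.
    out[binary_cols] = out[binary_cols].fillna(0).astype(int)
    # Keep only numeric predictors and scale each to the 0-1 range.
    out = out[list(binary_cols) + list(numeric_cols)]
    for col in numeric_cols:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out

# Example with hypothetical columns:
ehr = pd.DataFrame({"age": [71, 64], "hemoglobin": [13.2, 11.8], "hypertension": [1, None]})
features = prepare_ehr(ehr, binary_cols=["hypertension"], numeric_cols=["age", "hemoglobin"])
print(features)
```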

2.4. Deep Learning Model Architecture

The proposed model consisted of a hybrid architecture designed to integrate ECG and EHR data for binary classification of PAF risk. For ECG feature extraction, a series of CNN layers processed the ECG signals to extract temporal features, and multi-head attention layers captured long-range dependencies across leads. The extracted ECG features were then compressed to a fixed dimensionality (nECG) using dense layers. For EHR feature processing, EHR features were passed through dense layers to reduce their dimensionality (nEHR) while preserving critical information. To define the optimal balance between modalities, we systematically adjusted the compression dimensions of the ECG and EHR inputs to identify their relative contributions to predictive performance.
Compressed ECG and EHR features were concatenated and passed through additional dense layers for final classification. The model architecture was optimized through hyperparameter tuning, including the number of attention heads (4–8), compression dimensions (nECG and nEHR), learning rate (1 × 10⁻⁴ to 1 × 10⁻³), and augmentation strategies. The tuning process specifically aimed to identify the balance between ECG and EHR inputs that yielded the best performance. Models were implemented in TensorFlow [16].
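For concreteness, the sketch below shows this kind of hybrid architecture in TensorFlow/Keras, assuming a 600 × 12 ECG input (as in Table A3) and 47 EHR features; the specific layer widths and kernel sizes are illustrative choices, not the published configuration. The EHR share of the fused representation is nEHR/(nECG + nEHR), so nECG = 32 and nEHR = 16 corresponds to the ~1/3 EHR contribution examined in the Results.

```python
# Illustrative multimodal architecture sketch (assumed layer sizes; not the exact model).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(n_ecg=32, n_ehr=16, n_heads=8, ecg_len=600, n_leads=12, n_ehr_feats=47):
    # --- ECG branch: CNN feature extraction followed by multi-head attention ---
    ecg_in = layers.Input(shape=(ecg_len, n_leads), name="ecg")
    x = layers.Conv1D(32, kernel_size=7, padding="same", activation="relu")(ecg_in)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    att = layers.MultiHeadAttention(num_heads=n_heads, key_dim=16)(x, x)
    x = layers.LayerNormalization()(x + att)          # residual + norm, Transformer-style
    x = layers.GlobalAveragePooling1D()(x)
    ecg_latent = layers.Dense(n_ecg, activation="relu", name="ecg_latent")(x)

    # --- EHR branch: dense compression of the 47 structured predictors ---
    ehr_in = layers.Input(shape=(n_ehr_feats,), name="ehr")
    y = layers.Dense(32, activation="relu")(ehr_in)
    ehr_latent = layers.Dense(n_ehr, activation="relu", name="ehr_latent")(y)

    # --- Fusion and classification head ---
    fused = layers.Concatenate()([ecg_latent, ehr_latent])
    z = layers.Dense(32, activation="relu")(fused)
    out = layers.Dense(1, activation="sigmoid", name="paf_risk")(z)
    return Model(inputs=[ecg_in, ehr_in], outputs=out)

# n_ecg=32 and n_ehr=16 give an EHR share of 16 / (32 + 16) = 1/3 at the fusion layer.
model = build_model(n_ecg=32, n_ehr=16)
model.summary()
```

Varying (nECG, nEHR) while holding the rest of the architecture fixed is what the contribution analysis in the Results sweeps over.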

2.5. Training and Validation

The dataset was split into training and testing sets using 5-fold cross-validation to ensure robustness. Each experiment was repeated 10 times with different random seeds to assess variability in performance metrics. Data augmentation was performed during training with an augmentation probability (Paug = 0.1) chosen to improve generalization without overfitting. The model was trained on an NVIDIA RTX 6000 Ada GPU (NVIDIA Corporation, Santa Clara, CA, USA) using the Adam optimizer with an exponential learning rate decay schedule. Binary cross-entropy loss was used as the objective function. Test-time augmentation further improved prediction robustness by averaging predictions across augmented test samples. Total training time was 50 GPU-hours. Decision curve analysis was not performed in this pilot due to limited events; clinical utility was not assessed.
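A condensed sketch of this cross-validated training and test-time augmentation loop is shown below. It reuses build_model() and augment() from the earlier sketches and assumes hypothetical arrays ecg_data, ehr_data, and labels; the decay_steps/decay_rate values are illustrative, not the authors' settings.

```python
# Illustrative 5-fold training/evaluation sketch (not the authors' exact script).
# Assumes build_model() and augment() from the earlier sketches and hypothetical
# arrays: ecg_data (n, 600, 12), ehr_data (n, 47), labels (n,).
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def augment_batch(batch, rng, p_aug=0.1):
    """Apply the stochastic augmentation per sample (for test-time augmentation)."""
    return np.stack([augment(x, rng, p_aug) for x in batch])

def cross_validate(ecg_data, ehr_data, labels, n_tta=5, seed=0):
    rng = np.random.default_rng(seed)
    aucs = []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(ehr_data, labels):
        model = build_model()
        # Adam with exponential learning-rate decay and binary cross-entropy loss.
        schedule = tf.keras.optimizers.schedules.ExponentialDecay(
            initial_learning_rate=1e-4, decay_steps=1000, decay_rate=0.9)
        model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
                      loss="binary_crossentropy")
        model.fit([ecg_data[train_idx], ehr_data[train_idx]], labels[train_idx],
                  epochs=100, batch_size=32, verbose=0)
        # Test-time augmentation: average predictions over perturbed copies of the test ECGs.
        preds = np.mean(
            [model.predict([augment_batch(ecg_data[test_idx], rng), ehr_data[test_idx]],
                           verbose=0)
             for _ in range(n_tta)], axis=0)
        aucs.append(roc_auc_score(labels[test_idx], preds.ravel()))
    return float(np.mean(aucs)), float(np.std(aucs))
```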

2.6. Statistical Analysis

Model performance was evaluated using several metrics: accuracy, area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, precision, and F1 score. The statistical significance of performance differences between models was assessed using paired t-tests. We also conducted pairwise Wilcoxon tests comparing each EHR-contribution setting against the best overall setting, using the top-performing repeats across all metrics. Statistical significance was defined as p < 0.05. Feature importance analysis was conducted using Random Forest models to identify the most predictive variables in the EHR dataset. Additionally, the contribution of ECG versus EHR data was analyzed by varying their respective compression dimensions in the model and evaluating the resulting changes in performance.
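As a minimal sketch of these analyses, the snippet below runs a paired t-test between two models' repeated-run scores and fits a Random Forest on the EHR matrix to rank features; the data are synthetic stand-ins and the n_estimators value is an assumption, not the authors' configuration.

```python
# Illustrative statistics sketch with synthetic placeholder data.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Paired t-test: e.g., multimodal vs. ECG-only accuracy across 10 repeated CV runs.
acc_multimodal = rng.normal(0.70, 0.04, size=10)
acc_ecg_only = rng.normal(0.52, 0.06, size=10)
t_stat, p_paired = ttest_rel(acc_multimodal, acc_ecg_only)
print(f"paired t-test: t={t_stat:.2f}, p={p_paired:.3g}")

# Random Forest feature importance over the structured EHR predictors.
X = rng.normal(size=(189, 47))        # stand-in for the 47 EHR features
y = rng.integers(0, 2, size=189)      # stand-in PAF labels
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print("top-5 feature indices:", ranking[:5])
```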
The study pipeline is shown in Figure 1, which depicts data preparation, an example of a preprocessed 12-lead ECG waveform, and multimodal data processing in the deep learning pipeline for PAF prediction.

3. Results

The study included 189 cryptogenic stroke patients, of whom 49 (26%) had a diagnosis of PAF. The mean age of the cohort was 71.4 years. Patients with PAF were significantly older than those without (75.4 years vs. 70.0 years, p = 0.004). The cohort was predominantly female (57.7%) and White (82.5%), with no significant differences in sex or race between the PAF and non-AF groups. The average monitoring duration for the PAF group was longer than for the non-AF group (22.6 months vs. 18.3 months), though this difference was not statistically significant (p = 0.064) (Table 1, Figure A1).
The results of this study demonstrate the effectiveness of a multimodal deep learning model that integrates ECG (denoised and band-pass filtered at 0.5–40 Hz) and EHR data for predicting PAF. Using data from 189 stroke patients, the model was evaluated across multiple configurations and metrics, with a primary focus on accuracy, sensitivity, and specificity. The best performing configuration achieved an accuracy of 0.70 (SD: 0.04), sensitivity of 0.72 (SD: 0.42), and specificity of 0.87 (SD: 0.06) (Appendix A, Table A2). The large SD for some metrics reflects fold-to-fold variability driven by the small number of PAF events per fold.
This model compressed ECG to 32 and EHR to 16 latent dimensions and incorporated 8 attention heads into the Transformer architecture. Data augmentation with a probability of 0.1 further enhanced model generalization without overfitting. Figure 2 illustrates the overall distribution of performance metrics of different architectures and compares key performance metrics for the best model configuration.
Models using only EHR data achieved an accuracy of 0.67 (SD: 0.2; p < 0.05), a sensitivity of 0.72 (SD: 0.42; p < 0.05), and a specificity of 0.80 (SD: 0.32; p < 0.05), while those using only ECG data showed an accuracy of 0.52 (SD: 0.06; p < 0.05), a sensitivity of 0.51 (SD: 0.23; p < 0.05), and a specificity of 0.84 (SD: 0.07; p < 0.05). The integration of both data modalities not only enhanced overall predictive performance but also highlighted the critical role of achieving the right balance between ECG and EHR inputs. Systematic analysis demonstrated that predictive performance peaked when EHR data comprised approximately one-third of the model input (Figure 3, Table A3), underscoring the importance of balancing clinical data with electrophysiological information. Exploratory Wilcoxon tests versus the 33% EHR condition showed significant differences for nearly all comparisons (Appendix E, Table A4); the only non-significant pair was accuracy versus 67% (p = 0.33).
Figure 4 ranks EHR features by importance. The highest-ranked features included age, hemoglobin, and left atrial measures, consistent with known PAF risk correlates.

4. Discussion

The findings from this study highlight the potential of multimodal deep learning models to predict PAF by leveraging complementary information from ECG waveforms and EHR data. Although the dataset is small and imbalanced, we employed 5-fold cross-validation and repeated the experiments 10 times to mitigate potential overfitting. A novel aspect of this work is the direct comparison of ECG and EHR contributions, which reveals that an optimal balance is achieved when EHR data accounts for approximately 33% of the overall input. The observed improvement in predictive performance when combining these modalities aligns with prior studies [7,17,18]. For instance, Tang et al. demonstrated that integrating ECG with clinical features improved prediction of PAF recurrence after catheter ablation [7], and Khurshid et al. reported that combining ECG with clinical risk factors yielded complementary benefits for PAF prediction [19]. Recent studies also suggest links between brain tissue susceptibility and vascular/arrhythmic risk [20,21]. In clinical practice, multimodal risk scores could be used to prioritize patients for extended ambulatory ECG monitoring and earlier cardiology follow-up.
Although our model's accuracy is comparable to that of other studies, our key insight is that optimizing the EHR data contribution significantly enhances model specificity while maintaining clinical interpretability, a vital step for real-world applicability [6,22]. In this optimal contribution balance, the 33% input from EHR data emerges as a key factor in attaining this performance level. Remaining discrepancies with prior reports may be attributed to differences in dataset size, patient demographics, or the transient nature of PAF, which presents unique challenges for prediction.
The relatively low sensitivity observed in our study reflects the difficulty in detecting sporadic PAF episodes from limited data points, a limitation also noted in other works focusing on PAF [23]. Nonetheless, the high specificity indicates that our model is effective at ruling out low-risk cases, and our analysis suggests that the EHR contribution plays a pivotal role in enhancing specificity while maintaining an acceptable trade-off with sensitivity.
Regarding feature importance analysis (Figure 4 and Figure A2), hemoglobin levels emerged as an important feature, potentially reflecting underlying anemia, polycythemia [24], or other systemic conditions associated with PAF risk [18,25]. This further reinforces the value of the EHR component in our multimodal approach, underscoring that even a one-third contribution from EHR data can capture essential clinical nuances that improve overall model performance.
While our results are promising, several limitations warrant consideration. The primary limitation is the sample size of 189 patients from a single center, which may affect the generalizability of our findings. We observed instability in some metrics, attributable to the small pilot cohort and the rarity of positive cases per cross-validation fold. Accordingly, the retrospective nature of this pilot study requires our model to be validated in larger, more diverse populations, preferably through prospective studies. Given the pilot scope, we did not include full classical machine learning benchmarking; this is planned for a larger, prospective, multi-site validation. Furthermore, the relatively low sensitivity observed in our models reflects the inherent difficulty of predicting sporadic PAF episodes from limited data. Future work should focus on external validation to confirm our findings, refine the optimal 33% EHR data contribution as a benchmark, and explore strategies for improving predictive accuracy and sensitivity (e.g., class-weighted or focal loss) in different clinical settings. Also, medication exposures (e.g., aspirin, statin) were not used at index to avoid post-stroke initiation bias; they are planned for future prospective cohorts.

5. Conclusions

This pilot study provides evidence supporting the use of multimodal deep learning for predicting paroxysmal atrial fibrillation among stroke patients by combining ECG and EHR data. Our analysis uniquely reveals that optimal performance is achieved when EHR data contributes approximately 33% of the overall input, underscoring the critical importance of balancing heterogeneous data sources. By leveraging complementary information from these modalities, our approach offers a scalable solution for early risk stratification and intervention in clinical practice. Future research should validate our findings externally, dynamically optimize the balance between EHR and ECG data, and explore real-time clinical deployment to enhance early PAF detection, clinical decision making, and patient outcomes.

Author Contributions

All authors affirm that they contributed to the writing of the manuscript. A.V.S.: Writing—review and editing, Visualization, Conceptualization, Formal analysis, Methodology, Data curation. M.M., D.O. and N.S.: Data curation, Writing—review and editing. S.S.H. and A.V.: Writing—review and editing. R.S. and A.M.: Data curation, Writing—review and editing. V.A.: Writing—review and editing, Validation, Supervision. R.Z.: Writing—review and editing, Conceptualization, Supervision, Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki. The research protocol was reviewed by the Institutional Review Board of Penn State College of Medicine (protocol number STUDY00023096). The Human Research Protection Program determined that the research met the criteria for exempt research according to institutional policies and federal regulations and therefore waived the need for formal IRB review.

Informed Consent Statement

Patient consent was waived for this study. The research involved a retrospective analysis of existing data, and it was not practicable to obtain consent from the individuals whose data were used. The Institutional Review Board approved this waiver, as the study posed minimal risk to the subjects.

Data Availability Statement

The data that support the findings of this study are not publicly available due to institutional policies and to protect patient privacy and confidentiality.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUROC: Area under the receiver operating characteristic curve
AUPRC: Area under the precision–recall curve
CNN: Convolutional neural network
ECG: Electrocardiogram
EHR: Electronic health record
PAF: Paroxysmal atrial fibrillation

Appendix A. Additional Evaluation and Interpretability

We include the receiver operating characteristic (ROC) and precision–recall (PR) curves for the best performing multimodal configuration (Figure A1). The observed areas match the summary in the manuscript (AUROC ≈ 0.65; AUPRC ≈ 0.43), providing a fuller view of threshold behavior in this imbalanced setting.
Figure A1. ROC and precision–recall curves for the best multimodal model. (a) ROC curve; the dashed diagonal indicates chance performance. (b) Precision–recall curve.
Beyond the EHR feature importance analysis already shown in the main text, we computed SHAP attributions for the multimodal model to characterize local (per-patient) contributions. A representative waterfall plot (Figure A2) illustrates how individual features push the prediction toward/away from PAF for a correctly classified case.
Figure A2. SHAP waterfall plot for a representative prediction. Local explanation showing how the top features cumulatively shift the log-odds toward PAF (right) or away from PAF (left).
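The SHAP computation itself is not shown in the paper. Below is a minimal sketch of how a waterfall plot like Figure A2 can be produced with the shap package; for simplicity it explains a scikit-learn stand-in fitted on synthetic EHR features rather than the multimodal model, whose exact SHAP wiring is not described.

```python
# Illustrative SHAP waterfall sketch on a stand-in EHR-only classifier
# (synthetic data; the paper's Figure A2 is for the multimodal model).
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(189, 47))            # stand-in EHR matrix
y = rng.integers(0, 2, size=189)          # stand-in PAF labels
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

predict_paf = lambda data: clf.predict_proba(data)[:, 1]
explainer = shap.Explainer(predict_paf, X[:100])   # background sample
shap_values = explainer(X[100:105])                # local explanations for 5 patients

# Waterfall plot for one patient, analogous to Figure A2.
shap.plots.waterfall(shap_values[0])
```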

Appendix B

The following is a TRIPOD-AI checklist.
Table A1. TRIPOD-AI checklist.

| Item | Evidence in Manuscript |
| --- | --- |
| Title/Abstract identify model type and purpose | Title and abstract specify multimodal deep learning to predict PAF in stroke patients; metrics reported. |
| Background and objectives (intended use/clinical context) | Rationale for combining ECG + EHR; objective to develop a multimodal model and examine relative contributions. |
| Source of data and study design/setting | Single-center retrospective cohort; Penn State COM; accrual Jan 2017–May 2023; cardiologist and stroke neurologist validated data. |
| Participants (eligibility, selection, numbers) | Cryptogenic stroke patients; flow from 223 reviewed → 197 with ECG → 189 final. |
| Outcome (definition, timing, how assessed, blinding) | Outcome is PAF; data validated by specialists. |
| Predictors (definitions, timing, measurement) | 47 EHR variables (demographics, labs, comorbidities); 12-lead ECG; preprocessing described. |
| Sample size (rationale) | n = 189; 49 events (26%). |
| Handling of missing data | No imputation; chart-adjudicated binary comorbidities coded 0. |
| Bias, drift, data splits (leakage prevention) | 5-fold CV; repeated with different seeds. |
| Modeling details (algorithms, hyperparameters, class imbalance) | Hybrid CNN + attention for ECG; MLP for EHR; concatenation; tuned heads 4–8, learning rate 1 × 10⁻⁴–1 × 10⁻³; augmentation probability 0.1; compression dimensions varied. |
| Internal validation | 5-fold cross-validation; 10 repeats; test-time augmentation. |
| Performance measures (discrimination, calibration, CIs) | Accuracy, AUROC, sensitivity, specificity, precision, F1; mean ± SD and p-values; AUROC/AUPRC reported (best values). |
| Explainability | Global EHR feature importance (Random Forest); figures. |
| Subgroups/fairness | Appendix C, Table A2. |
| Model presentation (final model, thresholds, access) | All code is shared at https://github.com/TheDecodeLab/AFib-multimodal.git (accessed on 4 September 2025). |
| Clinical utility | Not assessed (pilot; limited events). |
| Reproducibility (software, versions) | Python 3.11, TensorFlow 2.20. |
| Data availability | Data not publicly available due to privacy. |

Appendix C

Below is an exploratory subgroup analysis by age (≥52 vs. <52), sex, race, and ethnicity. Values are mean accuracy (±SD) across 5-fold × 2-repeat cross-validation.
Table A2. Subgroup accuracy (±SD) and p-values by age, sex, race, and ethnicity (5-fold × 10-repeat CV). p-values compare groups within each category. Age split at 52 years.

| Category | Group | n | Accuracy | SD | p-Value |
| --- | --- | --- | --- | --- | --- |
| Age | Old (≥52) | 178 | 0.66 | 0.27 | 0.6413 |
| | Young (<52) | 11 | 0.71 | 0.24 | |
| Sex | Male | 80 | 0.69 | 0.28 | 0.0997 |
| | Female | 109 | 0.65 | 0.26 | |
| Race | White | 156 | 0.67 | 0.27 | 0.6539 |
| | Black | 17 | 0.63 | 0.27 | |
| | Asian | 1 | 0.22 | - | |
| | Others | 15 | 0.62 | 0.27 | |
| Ethnicity | Hispanic | 3 | 0.62 | 0.33 | 0.98 |
| | Non-Hispanic | 186 | 0.67 | 0.27 | |

Appendix D

Table A3 summarizes the hyperparameters and cross-validation performance for the best model: eight attention heads, ECG latent dimension 32, EHR latent dimension 16 (≈33% EHR: 67% ECG at fusion), Adam optimizer, initial learning rate 1 × 10⁻⁴, batch size 32, 100 epochs, minority oversampling within training folds, and 5× test-time augmentation. A minimal sketch of the fold-wise oversampling follows Table A3.
Table A3. Final hyperparameters and summary performance for the best performing model (multimodal ECG + EHR).

| Component | Setting (Value) | Notes |
| --- | --- | --- |
| Architecture | CNN encoder → Multi-Head Attention × 1 → latent compressors → concat → classifier | Transformer-style attention stack |
| Attention heads | 8 | |
| ECG representation | Denoised + band-pass filtered 12-lead ECG (600 × 12) | Final choice used for the best model |
| EHR feature set | 47 structured predictors at index encounter | Demographics, comorbidities, labs, echo, stroke features |
| Fusion latent (ECG) | 32 | |
| Fusion latent (EHR) | 16 | ~33% EHR: 67% ECG contribution |
| Augmentation (train) | 0.1 | None used in the final model |
| Test-time augmentation | 5 replicates (averaged) | |
| Optimizer/Loss | Adam, binary cross-entropy | |
| Initial learning rate | 1 × 10⁻⁴ | Fixed across folds |
| Epochs/Batch size | 100/32 | Early stopping as implemented |
| Class imbalance handling | Minority oversampling in training folds | Evaluation on the original class mix |
| Cross-validation | 5-fold × 2 repeats | Model selection by mean CV AUROC |
| Performance (CV mean) | AUROC = 0.654 ± 0.071; Accuracy = 0.751 ± 0.092; Specificity = 0.859 ± 0.105; F1 = 0.487 ± 0.106 | Threshold metrics from CV; full ROC/PR in Figure A1 |
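The sketch below illustrates minority oversampling applied only inside training folds, so evaluation keeps the original class mix; the arrays and the resampling scheme (sampling with replacement up to the majority count) are illustrative assumptions rather than the authors' exact procedure.

```python
# Illustrative sketch of minority oversampling inside training folds only.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def oversample_minority(indices, labels, rng):
    """Resample minority-class training indices (with replacement) to match the majority."""
    idx = np.asarray(indices)
    pos, neg = idx[labels[idx] == 1], idx[labels[idx] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    return rng.permutation(np.concatenate([idx, extra]))

rng = np.random.default_rng(0)
labels = np.array([1] * 49 + [0] * 140)          # 49 PAF events out of 189
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(np.zeros(len(labels)), labels):
    balanced_train = oversample_minority(train_idx, labels, rng)
    # Train on `balanced_train`; evaluate on the untouched `test_idx`.
    print(len(train_idx), "->", len(balanced_train), "| test:", len(test_idx))
```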

Appendix E

We compared performance across ECG–EHR contribution settings using pairwise tests versus the 33% EHR condition, which yielded the best overall performance. For each setting, we retained the top-performing model instances to focus on the best achievable operating regime per condition. We then performed two-sided Wilcoxon rank-sum tests across the comparisons for each metric; a minimal sketch of this procedure follows Table A4.
Table A4. Pairwise comparisons (p-values) versus the 33% EHR condition.

| Comparison with 33% EHR | AUROC | Accuracy | Sensitivity | Specificity | F1 Score |
| --- | --- | --- | --- | --- | --- |
| 0% | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 |
| 11% | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 |
| 20% | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 |
| 50% | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 |
| 67% | ≤0.05 | 0.33 | ≤0.05 | ≤0.05 | ≤0.05 |
| 80% | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 |
| 100% | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 | ≤0.05 |
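A minimal sketch of this comparison procedure is shown below: each EHR-contribution setting is tested against the 33% condition per metric with a two-sided Wilcoxon rank-sum test. The per-run scores here are synthetic placeholders standing in for the retained top-performing runs.

```python
# Illustrative sketch of the Table A4 comparisons (synthetic placeholder scores).
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
metrics = ["AUROC", "Accuracy", "Sensitivity", "Specificity", "F1"]
settings = ["0%", "11%", "20%", "50%", "67%", "80%", "100%"]

# scores[setting][metric] -> array of per-run values (synthetic here).
scores = {s: {m: rng.normal(0.6, 0.05, size=10) for m in metrics}
          for s in settings + ["33%"]}

for s in settings:
    pvals = {m: ranksums(scores["33%"][m], scores[s][m]).pvalue for m in metrics}
    print(s, {m: round(p, 3) for m, p in pvals.items()})
```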

References

1. Boon, K.H.; Khalil-Hani, M.; Malarvili, M.B.; Sia, C.W. Paroxysmal Atrial Fibrillation Prediction Method with Shorter HRV Sequences. Comput. Methods Programs Biomed. 2016, 134, 187–196.
2. Linz, D.; Hermans, A.; Tieleman, R.G. Early Atrial Fibrillation Detection and the Transition to Comprehensive Management. Europace 2021, 23, ii46–ii51.
3. Miyazaki, Y.; Yamagata, K.; Ishibashi, K.; Inoue, Y.; Miyamoto, K.; Nagase, S.; Aiba, T.; Kusano, K. Paroxysmal Atrial Fibrillation as a Predictor of Pacemaker Implantation in Patients with Unexplained Syncope. J. Cardiol. 2022, 80, 28–33.
4. Nayak, T.; Lohrmann, G.; Passman, R. Controversies in Diagnosis and Management of Atrial Fibrillation. Cardiol. Rev. 2024.
5. Xia, Y.; Wulan, N.; Wang, K.; Zhang, H. Detecting Atrial Fibrillation by Deep Convolutional Neural Networks. Comput. Biol. Med. 2018, 93, 84–92.
6. Raghunath, S.; Pfeifer, J.M.; Ulloa-Cerna, A.E.; Nemani, A.; Carbonati, T.; Jing, L.; vanMaanen, D.P.; Hartzel, D.N.; Ruhl, J.A.; Lagerman, B.F.; et al. Deep Neural Networks Can Predict New-Onset Atrial Fibrillation From the 12-Lead ECG and Help Identify Those at Risk of Atrial Fibrillation-Related Stroke. Circulation 2021, 143, 1287–1298.
7. Tang, S.; Razeghi, O.; Kapoor, R.; Alhusseini, M.I.; Fazal, M.; Rogers, A.J.; Rodrigo Bort, M.; Clopton, P.; Wang, P.J.; Rubin, D.L.; et al. Machine Learning-Enabled Multimodal Fusion of Intra-Atrial and Body Surface Signals in Prediction of Atrial Fibrillation Ablation Outcomes. Circ. Arrhythm. Electrophysiol. 2022, 15, e010850.
8. Bhagubai, M.; Vandecasteele, K.; Swinnen, L.; Macea, J.; Chatzichristos, C.; De Vos, M.; Van Paesschen, W. The Power of ECG in Semi-Automated Seizure Detection in Addition to Two-Channel behind-the-Ear EEG. Bioengineering 2023, 10, 491.
9. Neri, L.; Gallelli, I.; Dall’Olio, M.; Lago, J.; Borghi, C.; Diemberger, I.; Corazza, I. Validation of a New and Straightforward Algorithm to Evaluate Signal Quality during ECG Monitoring with Wearable Devices Used in a Clinical Setting. Bioengineering 2024, 11, 222.
10. Murat, F.; Sadak, F.; Yildirim, O.; Talo, M.; Murat, E.; Karabatak, M.; Demir, Y.; Tan, R.-S.; Acharya, U.R. Review of Deep Learning-Based Atrial Fibrillation Detection Studies. Int. J. Environ. Res. Public Health 2021, 18, 11302.
11. Bhattacharya, A.; Sadasivuni, S.; Chao, C.-J.; Agasthi, P.; Ayoub, C.; Holmes, D.R.; Arsanjani, R.; Sanyal, A.; Banerjee, I. Multi-Modal Fusion Model for Predicting Adverse Cardiovascular Outcome Post Percutaneous Coronary Intervention. Physiol. Meas. 2022, 43, 124004.
12. Jo, Y.-Y.; Cho, Y.; Lee, S.Y.; Kwon, J.-M.; Kim, K.-H.; Jeon, K.-H.; Cho, S.; Park, J.; Oh, B.-H. Explainable Artificial Intelligence to Detect Atrial Fibrillation Using Electrocardiogram. Int. J. Cardiol. 2021, 328, 104–110.
13. Tzou, H.-A.; Lin, S.-F.; Chen, P.-S. Paroxysmal Atrial Fibrillation Prediction Based on Morphological Variant P-Wave Analysis with Wideband ECG and Deep Learning. Comput. Methods Programs Biomed. 2021, 211, 106396.
14. Machine Learning for Detecting Atrial Fibrillation from ECGs: Systematic Review and Meta-Analysis. Available online: https://www.imrpress.com/journal/RCM/25/1/10.31083/j.rcm2501008/htm (accessed on 2 February 2025).
15. Jabbour, G.; Nolin-Lapalme, A.; Tastet, O.; Corbin, D.; Jordà, P.; Sowa, A.; Delfrate, J.; Busseuil, D.; Hussin, J.G.; Dubé, M.-P.; et al. Prediction of Incident Atrial Fibrillation Using Deep Learning, Clinical Models, and Polygenic Scores. Eur. Heart J. 2024, 45, 4920–4934.
16. Abadi, M. TensorFlow: Learning Functions at Scale. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, Nara, Japan, 18–22 September 2016; Association for Computing Machinery: New York, NY, USA, 2016; p. 1.
17. Qiu, Y.; Guo, H.; Wang, S.; Yang, S.; Peng, X.; Xiayao, D.; Chen, R.; Yang, J.; Liu, J.; Li, M.; et al. Deep Learning-Based Multimodal Fusion of the Surface ECG and Clinical Features in Prediction of Atrial Fibrillation Recurrence Following Catheter Ablation. BMC Med. Inform. Decis. Mak. 2024, 24, 225.
18. Lin, F.; Zhang, P.; Chen, Y.; Liu, Y.; Li, D.; Tan, L.; Wang, Y.; Wang, D.W.; Yang, X.; Ma, F.; et al. Artificial-Intelligence-Based Risk Prediction and Mechanism Discovery for Atrial Fibrillation Using Heart Beat-to-Beat Intervals. Med 2024, 5, 414–431.e5.
19. Khurshid, S.; Friedman, S.; Reeder, C.; Di Achille, P.; Diamant, N.; Singh, P.; Harrington, L.X.; Wang, X.; Al-Alusi, M.A.; Sarma, G.; et al. ECG-Based Deep Learning and Clinical Risk Factors to Predict Atrial Fibrillation. Circulation 2022, 145, 122–133.
20. Uchida, Y.; Kan, H.; Kano, Y.; Onda, K.; Sakurai, K.; Takada, K.; Ueki, Y.; Matsukawa, N.; Hillis, A.E.; Oishi, K. Longitudinal Changes in Iron and Myelination Within Ischemic Lesions Associate With Neurological Outcomes: A Pilot Study. Stroke 2024, 55, 1041–1050.
21. Uchida, Y.; Kan, H.; Inoue, H.; Oomura, M.; Shibata, H.; Kano, Y.; Kuno, T.; Usami, T.; Takada, K.; Yamada, K.; et al. Penumbra Detection With Oxygen Extraction Fraction Using Magnetic Susceptibility in Patients With Acute Ischemic Stroke. Front. Neurol. 2022, 13, 752450.
22. Attia, Z.I.; Noseworthy, P.A.; Lopez-Jimenez, F.; Asirvatham, S.J.; Deshmukh, A.J.; Gersh, B.J.; Carter, R.E.; Yao, X.; Rabinstein, A.A.; Erickson, B.J.; et al. An Artificial Intelligence-Enabled ECG Algorithm for the Identification of Patients with Atrial Fibrillation during Sinus Rhythm: A Retrospective Analysis of Outcome Prediction. Lancet 2019, 394, 861–867.
23. Chen, W.; Zheng, P.; Bu, Y.; Xu, Y.; Lai, D. Achieving Real-Time Prediction of Paroxysmal Atrial Fibrillation Onset by Convolutional Neural Network and Sliding Window on R-R Interval Sequences. Bioengineering 2024, 11, 903.
24. Abstract 16068: Polycythemia Vera Is Associated with Increased Atrial Fibrillation Compared to the General Population: Results from the National Inpatient Sample Database. Circulation. Available online: https://www.ahajournals.org/doi/10.1161/circ.134.suppl_1.16068 (accessed on 13 July 2025).
25. Truong, E.T.; Lyu, Y.; Ihdayhid, A.R.; Lan, N.S.R.; Dwivedi, G. Beyond Clinical Factors: Harnessing Artificial Intelligence and Multimodal Cardiac Imaging to Predict Atrial Fibrillation Recurrence Post-Catheter Ablation. J. Cardiovasc. Dev. Dis. 2024, 11, 291.
Figure 1. Overview of the study pipeline, including (a) data preparation steps with patient inclusion numbers, (b) an example of a preprocessed 12-lead ECG waveform, and (c) the deep learning pipeline integrating multimodal ECG and EHR data for paroxysmal atrial fibrillation (PAF) prediction.
Figure 2. (a) Performance metrics across different deep learning architectures. (b) Spider plot illustrating the performance metrics (AUROC, sensitivity, specificity, precision, accuracy, and F1 score) for the best performing model configuration. The plot highlights the balanced trade-offs achieved across all six metrics.
Figure 3. Impact of varying the relative contribution of EHR versus ECG data on predictive performance metrics. Moving left along the x-axis increases ECG data’s contribution (reducing EHR), whereas moving right increases EHR data’s contribution. For comparison, the dashed line represents baseline performance (dummy classifier).
Figure 4. Feature importance analysis identifying key predictors in the EHR dataset. The error bars represent variability across training random seeds. The symbol ‘@’ indicates measurements obtained at the index stroke.
Table 1. Baseline demographic and clinical characteristics of the study population, stratified by paroxysmal atrial fibrillation (PAF) diagnosis. p-values compare the PAF and no-PAF groups.

| Variable | Total (n = 189) | No PAF (n = 140) | PAF (n = 49) | p-Value |
| --- | --- | --- | --- | --- |
| Age (years) | 71.4 ± 11.4 | 70.0 ± 11.6 | 75.4 ± 9.6 | 0.004 |
| Sex, n (%) | | | | 0.452 |
| Male | 80 (42.3) | 62 (44.3) | 18 (36.7) | |
| Female | 109 (57.7) | 78 (55.7) | 31 (63.3) | |
| Race, n (%) | | | | 0.205 |
| White | 156 (82.5) | 116 (82.9) | 40 (81.6) | |
| Black | 17 (9.0) | 13 (9.3) | 4 (8.2) | |
| Asian | 1 (0.5) | 0 (0.0) | 1 (2.0) | |
| Others | 14 (7.4) | 11 (7.9) | 3 (6.1) | |
| Ethnicity, n (%) | | | | 0.999 |
| Hispanic | 3 (1.6) | 2 (1.4) | 1 (2.0) | |
| Non-Hispanic | 186 (98.4) | 138 (98.6) | 48 (98.0) | |
| Monitoring (months) | 19.4 ± 13.9 | 18.3 ± 13.6 | 22.6 ± 14.5 | 0.064 |