Machine Learning Modeling to Predict Atrial Fibrillation Detection in Embolic Stroke of Undetermined Source Patients

Background: In patients with embolic stroke of undetermined source (ESUS), occult atrial fibrillation (AF) has been implicated as a key source of cardioembolism. However, only a minority receive implantable cardiac loop recorders (ILRs) to detect occult paroxysmal AF, partly due to financial cost and procedural inconvenience. Without the initiation of appropriate anticoagulation, these patients are at increased risk of ischemic stroke recurrence. Hence, cost-effective and accurate methods of predicting AF in ESUS patients are highly sought after. Objective: We aimed to incorporate clinical and echocardiography data into machine learning (ML) algorithms for AF prediction on ILRs in ESUS. Methods: This was a single-center cohort study that included 157 consecutive patients diagnosed with ESUS from October 2014 to October 2017 who had ILR evaluation. We developed four ML models, with hyperparameters tuned, to predict AF detection on an ILR. Results: The median age of the cohort was 67 (IQR 59–74) years and the median monitoring duration was 1051 (IQR 478–1287) days. Of the 157 patients, 32 (20.4%) had occult AF detected on the ILR. Support vector machine predicted AF with a 95% confidence interval for the area under the receiver operating characteristic curve (AUC) of 0.736–0.737, multilayer perceptron with 0.697–0.708, XGBoost with 0.697–0.697, and random forest with 0.663–0.674. ML feature importance found that age, HDL-C, and admitting heart rate were important non-echocardiography variables, while peak mitral A-wave velocity and left atrial volume were important echocardiography parameters aiding this prediction. Conclusion: Machine learning modeling incorporating clinical and echocardiographic variables predicted AF in ESUS patients with moderate accuracy.


Introduction
The annual burden of ischemic stroke (IS) has increased substantially in the past 20 years, with 12.2 million incident cases in 2019 [1]. Embolic stroke of undetermined source (ESUS) accounts for approximately 17% of acute IS cases and confers an increased stroke recurrence risk of 4-5% annually [2]. ESUS is defined as a non-lacunar brain infarct in the absence of extracranial or intracranial atherosclerosis, major-risk cardioembolic sources, and any other specific cause of stroke [3]. Proposed etiologies of ESUS include atrial cardiopathy, non-obstructive arterial atherosclerotic plaques, left ventricular (LV) systolic dysfunction, cardiac valvular disease, patent foramen ovale, and cancer [4,5].
In patients with ESUS, occult paroxysmal atrial fibrillation (AF) has been implicated as a key source of embolism. In the prospective ASSERT cohort study, 10% of patients with ESUS were found to have atrial tachyarrhythmias detected on an implantable cardiac loop recorder (ILR), with an increased hazard ratio of 5.6 for clinical AF at 2.5 years [6]. An ILR allows for the prolonged recording of heart rhythms for up to three years, increasing the detection rate of paroxysmal cardiac arrhythmias such as AF [7]. In patients with ESUS, continuous heart rhythm monitoring using ILRs identifies AF in approximately 30% of these patients [8], and previous studies have shown that the presence of such paroxysmal AF episodes conferred an increased risk of stroke [6]. Despite current recommendations, only a minority of patients with ESUS receive prolonged cardiac monitoring owing to patient preferences, procedural inconvenience, and the significant cost of an ILR implantation [9,10]. This leads to a missed opportunity to institute ideal treatment with oral anticoagulation to mitigate the risk of a recurrent stroke. As such, cost-effective and accurate ways to risk-stratify patients and predict underlying occult AF in patients with ESUS are highly sought after.
Machine learning (ML) is a branch of artificial intelligence that utilizes data and algorithms to learn and make predictions. While there is established evidence regarding the association of echocardiographic parameters such as atrial and ventricular size, valvular heart disease, and left atrial thrombus with AF, ML using echocardiographic parameters has not been adequately applied to predict AF in patients with ESUS. The incorporation of echocardiographic parameters alongside clinical variables may enhance the predictive accuracy of ML models.
Hence, the aim of this study was to develop an ML prediction model to predict paroxysmal AF in patients with ESUS using a combination of clinical parameters, biomarkers, and echocardiographic parameters. We hypothesized that an ML model incorporating a combination of clinical and echocardiographic parameters would predict the occurrence of AF with moderate-to-high accuracy and provide insights into important variables aiding this prediction that may not be identified using traditional statistics.

Study Design
This study involved a retrospective cohort of consecutive patients with an ESUS diagnosis from a stroke unit at a tertiary care hospital from October 2014 to October 2017. All 291 patients were offered an ILR, of whom 157 proceeded with implantation and were included in the study and analyzed. Clinical and ILR data were collected from the institution's electronic medical record and electronic ESUS database. The data collected comprised patient demographics, medical comorbidities, and laboratory and imaging results. Quantified data from echocardiography were recorded as categorical or numerical features and not run through the ML models as visual images. ESUS was defined according to the criteria outlined by the Cryptogenic Stroke/ESUS International Working Group as a non-lacunar brain infarction without the following: extracranial or intracranial atherosclerosis resulting in a luminal stenosis of ≥50% in the arteries supplying the area of infarction, major cardioembolic source, and other specific cause (e.g., vasculopathy, dissection, vasospasm, or thrombophilia) [3]. All the ILR data were extracted and evaluated by a trained electrophysiologist. This study is reported following the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist [11]. Ethics approval was obtained from the National Healthcare Group Domain Specific Review Board (NHG DSRB Reference: 2021/00623). The study was conducted in accordance with the Declaration of Helsinki. Exemption from informed consent was given in view of the retrospective nature of the cohort and the use of de-identified data.

Data Pre-Processing for Machine Learning Models
Descriptive statistics were used to compare characteristics between AF and non-AF groups, with Pearson's Chi-squared test used for categorical variables and the Mann-Whitney U test for continuous variables. The dataset was first divided into 70% for the training set and 30% for the test set using stratified sampling from the Scikit-learn library [12]. IterativeImputer from the Scikit-learn library was used to impute missing values in numerical features, while missing values in categorical features were imputed using the mode. Integer encoding was applied to ordinal variables, while one-hot encoding was applied to nominal variables to ensure compatibility with ML algorithms.
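The split and imputation steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the feature arrays, missingness rate, and random seeds are hypothetical, and fitting the imputers on the training set only is an assumption (the text states only that the split preceded imputation).

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort: 157 rows, 32 positives (20.4%).
n_patients, n_positive = 157, 32
y = np.array([1] * n_positive + [0] * (n_patients - n_positive))
X_num = rng.normal(size=(n_patients, 3))                        # hypothetical numerical features
X_cat = rng.integers(0, 2, size=(n_patients, 1)).astype(float)  # hypothetical binary feature

# Knock out roughly 10% of values to mimic missing data.
X_num[rng.random(X_num.shape) < 0.1] = np.nan
X_cat[rng.random(X_cat.shape) < 0.1] = np.nan

# 70/30 stratified split keeps the AF proportion similar in both sets.
Xn_tr, Xn_te, Xc_tr, Xc_te, y_tr, y_te = train_test_split(
    X_num, X_cat, y, test_size=0.30, stratify=y, random_state=0
)

# Assumption: imputers are fitted on the training set and then applied to the test set.
num_imputer = IterativeImputer(random_state=0)
cat_imputer = SimpleImputer(strategy="most_frequent")  # mode imputation
Xn_tr = num_imputer.fit_transform(Xn_tr)
Xn_te = num_imputer.transform(Xn_te)
Xc_tr = cat_imputer.fit_transform(Xc_tr)
Xc_te = cat_imputer.transform(Xc_te)
```

Fitting on the training portion alone keeps information from the test set out of the imputed values, which is one common way to avoid leakage.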

Overcoming an Imbalanced Dataset
Several resampling techniques were employed to balance the target distribution in the training set using the imbalanced-learn library in Python [13]. We implemented three broad resampling strategies, namely oversampling, undersampling, and a combination of both. We utilized two approaches to implement oversampling: random oversampling and the synthetic minority oversampling technique (SMOTE) [14]. Undersampling was implemented using random undersampling [15]. Furthermore, we proposed two combinations of oversampling and undersampling to balance the target distribution. Random oversampling and random undersampling were used jointly in the first combination, while SMOTE and random undersampling were used jointly in the second combination.
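To make the two random strategies concrete, here is a plain-Python sketch of what random oversampling and random undersampling do to a training set. It is a stand-in for illustration only: the study used the imbalanced-learn library, and the function names, toy rows, and class counts below are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_n = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        extra = [rng.choice(pool) for _ in range(majority_n - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class equal in size to the minority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_n = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, l in zip(rows, labels) if l == cls]
        out_rows.extend(rng.sample(pool, minority_n))
        out_labels.extend([cls] * minority_n)
    return out_rows, out_labels


# Toy training set with a roughly 80/20 class imbalance (hypothetical rows).
labels = [0] * 87 + [1] * 22
rows = [[i] for i in range(len(labels))]

_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
```

Resampling is applied to the training set only, so the test set keeps its original class distribution. SMOTE differs from random oversampling in that it synthesizes new minority rows by interpolating between neighbors rather than duplicating existing ones.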

Machine Learning Algorithms
Several ML algorithms, namely support vector machine (SVM), random forest, extreme gradient boosting (XGBoost), and multilayer perceptron (MLP), were implemented in Python version 3.9.12 with the aid of open-source packages from Scikit-learn version 1.2.0 and XGBoost version 1.7.2 [12]. Before resampling the training set, StandardScaler from the Scikit-learn library was used to standardize numerical variables for SVM and MLP so that each variable had zero mean and unit variance.
SVM was implemented using Support Vector Classifier, random forest using Random Forest Classifier, XGBoost using XGBoost Classifier, and MLP using MLP Classifier, all of which were derived from the Scikit-learn library. Additionally, XGBoost leveraged the XGBoost library. Our MLP models contained one input layer, one hidden layer, and one output layer. The hyperparameters tuned using grid search for each ML model can be found in Supplementary Material: Table S1.
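A grid search over an SVM with standardized inputs might look like the sketch below. It runs on synthetic data, and the parameter grid is hypothetical (the grid actually searched is listed in Table S1). Placing the scaler inside a pipeline is a design choice made here, not something the text specifies.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (not the study cohort).
X, y = make_classification(
    n_samples=157, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is refitted on each cross-validation fold.
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Hypothetical grid for illustration only.
param_grid = {"svm__kernel": ["linear", "rbf"], "svm__C": [0.1, 1, 10]}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_tr, y_tr)

# SVC exposes decision_function, which roc_auc_score accepts as a ranking score.
test_auc = roc_auc_score(y_te, search.decision_function(X_te))
```

After fitting, `search.best_params_` holds the selected hyperparameters, and the refitted best pipeline is used for the test-set predictions.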
Feature selection was additionally performed on SVM with linear kernel, XGBoost, and random forest. Features were selected based on feature importance in the Scikit-learn library, which measured the individual contribution of each feature towards the performance of the respective classifier [16]. Thus, features with an absolute importance value greater than or equal to the specified threshold selected using grid search were retained in these models. Outside of feature selection models, feature importance values were also obtained for our best-performing random forest model (RF with random undersampling and hyperparameter tuning) without reducing the number of features (Supplementary Materials: Figure S1). Shapley additive explanation (SHAP) [17] was performed on our best-performing model, namely SVM with random oversampling and hyperparameter tuning (Figure 1), with mean absolute SHAP values obtained as a unified measure of feature importance.
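The thresholding rule described above reduces to a simple filter on absolute importance. The sketch below uses hypothetical importance values and a hypothetical threshold; in the study the threshold was itself chosen by grid search.

```python
def select_features(importances, threshold):
    """Keep features whose absolute importance is at least `threshold`."""
    return [name for name, value in importances.items() if abs(value) >= threshold]


# Hypothetical importance values (linear-SVM coefficients can be negative,
# which is why the absolute value is compared).
importances = {"age": 0.42, "eGFR": -0.35, "height": -0.08, "HDL-C": 0.20}

selected = select_features(importances, threshold=0.10)
```

With these illustrative numbers, `height` falls below the threshold and is dropped, while `eGFR` is retained despite its negative sign.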

Figure 1. Mean absolute SHAP values of features in best-performing SVM model (top 20 features displayed). Abbreviations: SHAP, Shapley additive explanation; SVM, support vector machine. Abbreviations of all features can be found in Supplementary Materials: Table S2.

Performance Metrics
For each ML model, the performance scores of 20 iterations were obtained, with each iteration using a different random state. In each iteration, a grid search with 5-fold cross-validation was utilized to select the best hyperparameter set for the model. The 95% confidence interval was calculated for each test performance metric by aggregating the results from 20 iterations. As the outcome was categorical, the summary performance estimates used included test AUC, accuracy, sensitivity, specificity, and F1 score.
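The text does not state how the 95% confidence interval was computed from the 20 iterations. One common convention, assumed here, is a Student's t interval for the mean of the per-iteration scores. The AUC values below are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(scores, t_critical):
    """Two-sided CI for the mean: mean +/- t * (sample SD / sqrt(n))."""
    n = len(scores)
    mean = statistics.mean(scores)
    half_width = t_critical * statistics.stdev(scores) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical test AUCs from 20 runs with different random states.
aucs = [0.70 + 0.01 * (i % 5) for i in range(20)]

# 2.093 is the 97.5th percentile of Student's t with 19 degrees of freedom.
low, high = mean_confidence_interval(aucs, t_critical=2.093)
```

Under this convention the interval narrows as the runs agree more closely with one another, so it describes variation across random states rather than uncertainty from patient sampling.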

Baseline Characteristics
This study included 157 patients who had an ILR implantation after ESUS. The median age was 67 years (IQR 59-74), with 43 (27.4%) patients being female and 128 (81.5%) being of Chinese ethnicity. Patients were monitored for a median duration of 2.88 (IQR 1.31-3.52) years. There were 108 (68.8%) patients with hypertension, 96 (61.1%) with hyperlipidemia, 60 (38.2%) who were current or previous smokers, and 60 (38.2%) with diabetes mellitus. The median National Institutes of Health Stroke Scale (NIHSS) score was 4 (IQR 1.5-8.5). Of the 157 patients, 32 (20.4%) subsequently had AF detected on their ILR. Comparing the group with AF detected versus the group without, the median age and high-density lipoprotein cholesterol (HDL-C) were significantly higher, while the admitting heart rate was significantly lower (Table 1). Apart from the proportion of patients with mitral stenosis, there were no significant differences in echocardiography parameters between these two groups (Table 2).
In the best-performing random forest model, the top five predictive features for AF detection on an ILR were heart rate, estimated glomerular filtration rate (eGFR), age, height, and HDL-C, before optimizing the number of features retained (Supplementary Materials: Figure S1).

Feature Importance via SHAP in SVM
The mean absolute SHAP values of features were obtained from our best-performing model (SVM with random oversampling only and with hyperparameter tuning), of which the top 20 features are displayed in Figure 1. The five features with the highest mean absolute SHAP values were peak mitral A-wave velocity (MitralAVel), HDL-C, age, heart rate, and left atrial volume (LAV), three of which correspond with the five most important features in random forest (Figures 1 and S1). Corresponding violin and beeswarm plots are visualized in Figures 2 and S2, respectively. Force plots are visualized in Supplementary Materials: Figures S3 and S4.
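For readers unfamiliar with the summary statistic, the mean absolute SHAP value of a feature is the average magnitude of its per-patient SHAP contributions, ignoring sign. The sketch below shows only this aggregation step; the SHAP values are hypothetical, and computing them from a fitted model (as done with the SHAP method [17]) is outside its scope.

```python
def mean_absolute_shap(shap_rows, feature_names):
    """Average |SHAP value| per feature across patients, sorted high to low."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [total / n for total in totals]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Hypothetical per-patient SHAP values (rows = patients, columns = features).
features = ["MitralAVel", "HDL-C", "age"]
shap_rows = [
    [0.30, -0.10, 0.05],
    [-0.40, 0.20, -0.05],
    [0.20, -0.30, 0.20],
]

ranking = mean_absolute_shap(shap_rows, features)
```

Because the sign is discarded, this ranking conveys how strongly a feature moves predictions but not in which direction; direction is read from the violin or beeswarm plots instead.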

Feature Selection Using SVM, Random Forest, and XGBoost
Feature selection using the SVM model resampled with SMOTE generated an AUC of 0.676-0.676, sensitivity of 0.200-0.200, and specificity of 0.868-0.868 and retained 50 features (Table 3). The top five most important features were eGFR, sex, height, creatinine levels, and laterality of stroke (left) (Figure 3a). Feature selection using random forest resampled with random undersampling generated an AUC of 0.529, sensitivity of 0.300, and specificity of 0.816 and retained 70 features (Table 3). The top five most important features were sex, peak mitral A-wave velocity, HDL-C, admitting heart rate, and triglyceride levels (Figure 3b). Feature selection using XGBoost resampled with random undersampling generated an AUC of 0.650, sensitivity of 0.700, and specificity of 0.526, while retaining 18 features (Table 3). The top five most important features were height, left atrial diameter, peak mitral A-wave velocity, BMI, and admitting heart rate (Figure 3c).

Abbreviations: SVM, support vector machine; SMOTE, synthetic minority oversampling technique; XGBoost, extreme gradient boosting. Abbreviations of all features can be found in Supplementary Materials: Table S2.
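Sensitivity, specificity, accuracy, and F1 score all derive from the four confusion-matrix counts, as the sketch below shows. The counts are an assumption chosen for illustration: they happen to reproduce the sensitivity (0.300) and specificity (0.816) reported for the random forest feature-selection model, but the confusion matrices themselves are not reported.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts: 10 AF and 38 non-AF patients in a test set.
metrics = classification_metrics(tp=3, fn=7, tn=31, fp=7)
```

With so few positive cases in a test set of this size, a single reclassified AF patient shifts sensitivity by 0.1, which is worth keeping in mind when comparing models.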

Discussion
In this study, we have presented a series of ML models to identify variables that may be important in the prediction of AF occurrence through ILR monitoring to aid decision-making for ILR implantation in ESUS. Our SVM model performed the best in this prediction (AUC = 0.736-0.737) when compared to XGBoost, random forest, and MLP.
To our knowledge, there has been little research specifically regarding the detection of paroxysmal AF (pAF) after ESUS using ML algorithms. Our study is one of the few to incorporate an extensive list of pre-reported echocardiogram results with clinical and biochemical data into the predictive modeling of AF on ILRs.
The C2HEST score was developed using a large French nationwide cohort to predict AF after ischemic stroke, comprising coronary artery disease, chronic obstructive pulmonary disease, hypertension, advanced age, systolic heart failure, and thyroid disease [18]. Other scoring systems were previously developed using traditional statistics to predict AF specifically after cryptogenic stroke, namely the Brown ESUS-AF [19], HAVOC [20], AS5F [21], and AF-ESUS scores [22]. The Brown ESUS-AF score was developed using multivariable logistic regression, comprising age and left atrial enlargement, which predicted AF using cardiac monitoring in ESUS [19]. The HAVOC score stratified patients into low-, medium-, and high-risk groups and evaluated the probability of AF detection after ESUS in each group using seven clinical variables, namely age, obesity, congestive heart failure, hypertension, coronary artery disease, peripheral vascular disease, and valve disease [20]. The AS5F score comprised age and stroke severity [21]. The AF-ESUS score assigned positive weights for age, hypertension, left atrial diameter > 40 mm, and supraventricular extrasystole and negative weights for left ventricular hypertrophy, left ventricular ejection fraction < 35%, subcortical infarct, and non-stenotic carotid plaques [22].
Comparing random forest's feature importance to the feature importance via mean absolute SHAP values using SVM, among the ten most important features in each of the two models, six were identical (admitting heart rate, eGFR, age, HDL-C, peak mitral A-wave velocity, and systolic blood pressure). Among the top five most important features in each of the two models, three were identical (age, HDL-C, and admitting heart rate) (Figures 1 and S1). SHAP values provide an interpretable approximation of the original model with a unique additive measure that adheres to three desirable properties and defines simplified inputs using conditional expectations [17]. Amongst all the aforementioned clinical scores and features of importance in our ML models, age was identified as an important predictor of pAF detection in ESUS. This corroborates the results of past studies that found age to be a predictor of AF in ESUS [23] and age above 60 years to be a robust indicator of occult AF after cryptogenic stroke [24]. The mechanism could be attributed to age-related atrial myocardial electrical and structural remodeling [25]. Thus, this reaffirms the usefulness of age to stratify patients who require extended cardiac monitoring in ESUS.
Mitral A-wave velocity is an echocardiography parameter that assesses blood flow through the mitral valve due to atrial contraction. While it is poorly understood whether transmitral inflow waves and filling can be used to predict AF, studies have reported that patients with progression to permanent AF had lower peak A velocity than those without progression [26]. In our study, peak mitral A-wave velocity was identified as an important variable in feature selection using random forest and XGBoost (Figure 3b,c). However, no association was found using traditional descriptive statistics.
Left atrial volume was identified as important in our ML feature importance assessment, which concurs with a past study that found left atrial volume to be significantly higher in the group with AF detected via ILR in unexplained stroke [27]. Similarly, left atrial enlargement and left atrial diameter are also variables included in the Brown ESUS-AF score and AF-ESUS score, respectively.
Estimated glomerular filtration rate (eGFR) was the second most important feature in our best-performing random forest model and the most important feature in SVM with feature selection (Supplementary Materials: Figures S1 and S3). One explanation is that AF is a prothrombotic state that causes microthrombi of the renal vasculature. This reduces renal perfusion and results in renal ischemia, causing kidney impairment reflected as reduced eGFR [28]. The converse process may be plausible as well. Initial renin-angiotensin-aldosterone system (RAAS) activation in patients with renal impairment may itself be arrhythmogenic, where oxidative stress mediates changes in cellular ion channels [29], resulting in paroxysmal AF. Furthermore, the RAAS has been found to be closely connected to AF development through inflammation and structural and electrical cardiac remodeling [30].
Sex was identified as the most important and second most important feature in the random forest and SVM models, respectively. It is possible that the attributable proportions of etiologies of ESUS may differ between the sexes. In males, the size, composition, and morphology of carotid atherosclerotic plaques were found to be more pronounced than in females [31]; thus, ESUS etiology in males may be more greatly attributed to non-stenotic atherosclerotic plaques than paroxysmal AF. In the AF-ESUS score, variables with negative weights assigned were more predictive of an absence of new AF detection. Similarly, our ML models were able to identify an ordered list of features that are predictive of absence of AF detection on an ILR, such as eGFR, height, and systolic blood pressure (Figures 2 and 3a).
An extensive 48-country survey of stroke units found prolonged cardiac monitoring not to be a routine workup for cryptogenic stroke even in high-income nations [10]. With ESUS patients having a notable risk of stroke recurrence of 4-5% yearly [22], optimizing resource allocation for patients who require prolonged cardiac monitoring after ESUS remains important to reduce the high costs associated with implementing prolonged cardiac monitoring. Our study provides an ML approach to aid decision-making for prolonged cardiac monitoring in ESUS patients and provides future studies with a supplemental list of variables for evaluation.

Strengths and Limitations
Machine learning models have been used to predict outcomes of stroke. However, reporting standards have been suboptimal, such as the exclusion of hyperparameter selection reporting, lack of clear reporting regarding the handling of imbalanced datasets, and the absence of feature selection [32]. In our study, we have presented four distinct ML models with data imbalance addressed and hyperparameters tuned and with feature selection for selected models. Multiple resampling techniques were evaluated to handle the imbalanced target class, including random oversampling, random undersampling, SMOTE, and a combination of these, which facilitated the development of the best-performing model for each distinct ML type. The directional relationship of variables with outcomes should be observed using our violin plots of SHAP values generated with our best-performing SVM model and the SVM feature selection plot (Figures 2 and 3a) instead of our random forest feature importance plot, as the latter was unable to discern the direction of outcome prediction. Thus, absolute predictive ability should not be confused with directional predictability. Internal validation was performed through performance evaluation on unseen validation (test) data. Generalizability should be further evaluated in the future. The performance of our MLP model should be interpreted with caution as it is a neural network algorithm limited by a relatively small patient cohort. In our study, many features were used to predict the presence of AF on ILRs, compared with the few used in existing clinical scores. In the pre-processing stage, imputation of missing values of categorical features may be explored using multiple imputation to compare with the current imputation by mode. Future studies should further explore discordances between important features identified via descriptive statistics and traditional logistic regression compared to machine learning modalities in larger international ESUS datasets.

Conclusions
Machine learning modeling incorporating clinical and echocardiographic variables predicted AF in ESUS patients with moderate accuracy.

Figure 2. Violin plot of best-performing SVM model (top 20 features displayed). Abbreviations: SVM, support vector machine. Abbreviations of all features can be found in Supplementary Materials: Table S2.

Table 1. Characteristics of patients with ILR implantation after ESUS, with comparison between AF detected on ILR and AF not detected on ILR.

Table 2. Characteristics of patients with ILR implantation after ESUS, with comparison between AF detected on ILR and AF not detected on ILR (echocardiography parameters).