Outcome Analysis in Elective Electrical Cardioversion of Atrial Fibrillation Patients: Development and Validation of a Machine Learning Prognostic Model

Background: The integrated approach to electrical cardioversion (EC) in atrial fibrillation (AF) is complex; candidates can resolve spontaneously while waiting for EC, and post-cardioversion recurrence is high. Thus, it is especially interesting to avoid the programming of EC in patients who would restore sinus rhythm (SR) spontaneously or present early recurrence. We have analyzed the whole elective EC of the AF process using machine-learning (ML) in order to enable a more realistic and detailed simulation of the patient flow for decision making purposes. Methods: The dataset consisted of electronic health records (EHRs) from 429 consecutive AF patients referred for EC. For analysis of the patient outcome, we considered five pathways according to restoring and maintaining SR: (i) spontaneous SR restoration, (ii) pharmacologic-cardioversion, (iii) direct-current cardioversion, (iv) 6-month AF recurrence, and (v) 6-month rhythm control. We applied ML classifiers for predicting outcomes at each pathway and compared them with the CHA2DS2-VASc and HATCH scores. Results: With the exception of pathway (iii), all ML models achieved improvements in comparison with CHA2DS2-VASc or HATCH scores (p < 0.01). Compared to the most competitive score, the area under the ROC curve (AUC-ROC) was: 0.80 vs. 0.66 for predicting (i); 0.71 vs. 0.55 for (ii); 0.64 vs. 0.52 for (iv); and 0.66 vs. 0.51 for (v). For a threshold considered optimal, the empirical net reclassification index was: +7.8%, +47.2%, +28.2%, and +34.3% in favor of our ML models for predicting outcomes for pathways (i), (ii), (iv), and (v), respectively. As an example tool of generalizability of ML models, we deployed our algorithms in an open-source calculator, where the model would personalize predictions. Conclusions: An ML model improves the accuracy of restoring and maintaining SR predictions over current discriminators. 
The proposed approach enables a detailed simulation of the patient flow through personalized predictions.


Introduction
Restoring and maintaining sinus rhythm (SR) is an integral part of the atrial fibrillation (AF) process. Electrical cardioversion (EC) quickly and effectively converts AF to SR and can be performed safely for patients with AF of ≥48 h or unknown duration when anticoagulation with vitamin-K antagonists, a factor Xa inhibitor, or a direct thrombin inhibitor is used for at least 3 weeks before and at least 4 weeks after EC [1][2][3].
Although EC restores SR in around 80% of patients, the rate of recurrence is high, around 60% in the coming months [4][5][6][7][8][9][10], even under antiarrhythmic drugs [11][12][13][14][15][16]. Furthermore, it has been described that up to 60% of patients with recent-onset AF who are candidates for rhythm control resolve spontaneously while waiting for scheduled EC [17]. Thus, the integrated approach to elective EC is complex, and it is especially interesting to identify potential predictors of post-cardioversion recurrence, in order to avoid unnecessary drugs or procedures that could involve risks and costs, and to avoid the programming of EC or the use of drugs in patients who would restore SR spontaneously. For this purpose, traditional clinical models have been previously proposed [18][19][20][21][22][23][24], although their use and utility in clinical practice are unclear due to the complexity of AF management.
Interest in machine-learning (ML) in electrophysiology is increasing in order to enhance automatic clinical workflows and increase efficiency [25]. Although ML is starting to be widely applied in arrhythmia [26], examples regarding the whole process workflow for a clinician to make better decisions are scarce [27]. In this study, we used ML to move the EC of AF process management a notch ahead. AF patients go through different pathways: from the diagnosis of the AF and prescription of anticoagulation and antiarrhythmic drugs to the post-cardioversion medical follow-up. We analyzed the whole elective EC of the AF process using ML algorithms, in order to enable a more realistic and detailed simulation of the patient flow for decision making purposes. This study followed the TRIPOD guidelines for reporting the development and validation of prognostic models [28], see Appendix A. Figure 1 summarizes the phases we followed to build our ML models: preparation of the model, model training, and model evaluation. The models were developed in Python and the implementation of the classification algorithms was performed using the open code libraries scikit-learn and xgboost [29].

Task Definition and Clinical Pathways of Patients
The aim of our study was to automatically enhance the process of scheduled EC in AF by incorporating ML in all pathways of the process to predict success. In pursuing a rhythm-control strategy, patients scheduled for planned EC followed a process that is summarized in Figure 2, where the different outcomes at each pathway have been highlighted. Importantly, management options in hemodynamically stable patients with AF >48 h in our hospital do not follow the strategy of treatment guided by transesophageal echocardiography findings [1,2].
Given the outcomes at each pathway, we aimed to build an ML model for each circumstance: (i) spontaneous SR restoration, predicting the conversion to SR in the pre-scheduled EC period for non-antiarrhythmics-treated patients; (ii) pharmacologic cardioversion, predicting the conversion to SR in the pre-scheduled EC period for antiarrhythmics-treated patients; (iii) direct-current cardioversion, predicting the efficacy of direct-current shock application; (iv) AF recurrence, predicting the AF recurrence at the 6-month follow-up for those patients who underwent SR restoration spontaneously, by pharmacologic or direct-current cardioversion; and (v) rhythm control, predicting the overall 6-month follow-up maintenance in SR from the moment EC was scheduled.

Figure 1. Overview of the phases followed to build and evaluate the machine-learning models.

Study Population
From April 2014 to January 2019, 429 consecutive patients scheduled for planned EC at the tertiary referral university hospital of Salamanca were registered and included in the analysis. EC in our center relies on the application of direct-current biphasic waveform shock, with a fixed energy of 150 J with a progressive energy level of 200 J, via two anteroposterior (parasternal and left infrascapular) electrodes. Patients undergo the procedure in our Cardiology Day Hospital by trained personnel, usually under propofol sedation, and remain under observation for at least 3 h before discharge, at which point an ECG is performed to check the heart rhythm [30]. For all the patients, a visit to the outpatient clinic was scheduled at 6 months, where a second ECG was also performed. Implantable loop recorders or Holter ECGs were not used either before or after the scheduled cardioversion.

Data Collection and Preparation
The ML models were trained and validated with the use of the patient charts stored in electronic health records (EHRs). Input data (features) consisted of patient demographics, cardiovascular risk factors, cardiovascular history, comorbidities, clinical and biochemical variables, atrial fibrillation classification, echocardiographic findings, medical treatment, and direct-current shock variables. As for the corresponding outcomes, we labeled the presence of SR in four of the analyzed pathways (spontaneous restoration of SR, pharmacologic cardioversion, direct-current cardioversion, and 6-month rhythm control) and the presence of AF for the 6-month AF recurrence. All EHRs were reviewed by a single investigator who classified the type of AF according to the current guidelines into paroxysmal AF, persistent AF, and long-standing persistent AF [1,2].
We preprocessed our EHR raw data as a set of features to be usable by ML classifiers and a set of labels to classify the different outcomes for each of the patients. For this purpose, multicategory variables were one-hot encoded in binary variables. Missing data were imputed using the average of the rest of the dataset for continuous variables and the median for categorical variables. Weight and height were imputed according to gender-specific averages, and if only weight was missing, BMI was imputed first, then weight was obtained from BMI and height. The value of tricuspid regurgitant jet velocity was imputed using the average of the patients in the dataset with the same severity of tricuspid regurgitation. The dataset was then divided into a training dataset consisting of 316 patients that attended before 1 January 2018 and a testing dataset consisting of 113 patients that attended afterwards.

Figure 2. Patients scheduled for planned electrical cardioversion flow diagram where the different outcomes at each pathway are highlighted. The different machine-learning models were then built for each of these 5 different circumstances: (i) spontaneous sinus rhythm restoration (conversion to sinus rhythm in the pre-scheduled electrical cardioversion period for non-antiarrhythmics-treated patients); (ii) pharmacologic cardioversion (conversion to sinus rhythm in the pre-scheduled electrical cardioversion period for antiarrhythmics-treated patients); (iii) direct-current cardioversion (conversion to sinus rhythm after direct-current shock application); (iv) atrial fibrillation recurrence (atrial fibrillation recurrence at 6-month follow-up for those patients who underwent sinus rhythm restoration spontaneously, by pharmacologic or direct-current cardioversion); and (v) rhythm control (maintenance in sinus rhythm at 6-month follow-up).
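The preprocessing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the field names (`age`, `af_type`, `visit_date`), the toy records, and the helper functions are hypothetical; only the mean imputation of continuous variables, the one-hot encoding, and the 1 January 2018 split date come from the text.

```python
from datetime import date
from statistics import mean

# Toy records; field names and values are hypothetical, for illustration only.
records = [
    {"age": 64.0, "af_type": "paroxysmal", "visit_date": date(2016, 5, 3)},
    {"age": None, "af_type": "persistent", "visit_date": date(2017, 11, 20)},
    {"age": 72.0, "af_type": "long-standing persistent", "visit_date": date(2018, 6, 14)},
]

def impute_mean(rows, key):
    """Replace missing continuous values with the mean of the observed ones."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def one_hot(rows, key):
    """Expand a multicategory variable into one binary column per category."""
    categories = sorted({r[key] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != key}
        for c in categories:
            new[f"{key}={c}"] = 1 if r[key] == c else 0
        encoded.append(new)
    return encoded

def temporal_split(rows, cutoff):
    """Patients seen before the cutoff go to training, the rest to testing."""
    train = [r for r in rows if r["visit_date"] < cutoff]
    test = [r for r in rows if r["visit_date"] >= cutoff]
    return train, test

prepared = one_hot(impute_mean(records, "age"), "af_type")
train, test = temporal_split(prepared, date(2018, 1, 1))
```

Splitting by attendance date rather than at random keeps every test patient later in time than every training patient, which mimics prospective use of the model.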

Machine Learning Classifiers
The goal of the training phase was to produce a working ML model that accepted data from any new patient (formatted in the same way as our processed dataset) and classified it. We applied and compared the performance of the following state-of-the-art ML classifiers: logistic regression with a regularization term, random forest, extremely randomized trees, and boosted trees [26].

Hyperparameter Tuning
Model hyperparameters are the properties that govern the behavior of the classification algorithm, e.g., the number of branches in a boosted trees algorithm. Tuning these parameters may improve the performance of the ML models and was consequently conducted in our pipeline.
To determine the best performing hyperparameters without using the testing dataset, a stratified cross-validation scheme was used. We performed a 10-fold cross-validation methodology to randomly split the training dataset into 10 equally sized parts (folds), with equal distribution of positive and negative cases. Nine folds were used to train the algorithms with different combinations of hyperparameters, and the remaining one was used as a test dataset for evaluating the models. We used these predictions to choose the best hyperparameters for each classification algorithm (Table 1). The models were evaluated on the test dataset. Additionally, internal validation was performed using the training dataset only. This internal validation consisted of a stratified 10-fold cross-validation with 10 repetitions. Since the training of the model also contained a hyperparameter tuning step with its own cross-validation scheme, this resulted in nested cross-validations [31]. The information from this internal validation was used to transform the models into hard classifiers to be used in clinical practice by choosing a probability cutoff threshold that translated continuous probability predictions into distinct clinical decisions.
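The stratified fold construction described above can be sketched as follows. This is a simplified stand-in written with the standard library only, not the scikit-learn routine used in the study; the function names and the toy label vector are assumptions made for illustration.

```python
import random

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample index to a fold, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def train_validation_pairs(folds):
    """Yield (train_indices, validation_indices), holding out one fold at a time."""
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, held_out

# Hypothetical outcome labels: 20 positive and 80 negative cases.
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, n_folds=10)
pairs = list(train_validation_pairs(folds))
```

In a nested scheme, the same construction would be applied again inside each training portion to choose hyperparameters, so that the held-out fold never influences the tuning.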
In both the internal and external validation, the receiver-operating-characteristic (ROC) and the Precision-Recall (PR) curve analysis were used to assess the predictive capacity of the ML models at each clinical pathway [32]. The classification performance of the model at a particular cutoff threshold was evaluated according to its sensitivity (recall), specificity, positive predictive value (precision), and negative predictive value. Confidence intervals were calculated for both the external validation results [33] and the internal validation results. The latter ones were calculated using a t-statistic based on the fold results, corrected for the correlation between fold samples [34,35].
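As an illustration of the metrics named above, the following sketch computes the AUC ROC through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case) and the four threshold-dependent measures. The labels, scores, and cutoff are made-up values, and the function names are hypothetical; the sketch assumes both classes are present and that both predicted groups are non-empty.

```python
def auc_roc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(y_true, scores, cutoff):
    """Sensitivity, specificity, PPV, and NPV of the hard classifier score >= cutoff."""
    pred = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Made-up outcomes and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

auc = auc_roc(y_true, scores)
metrics = threshold_metrics(y_true, scores, cutoff=0.5)
```

The AUC summarizes ranking quality over all cutoffs, whereas the four threshold metrics describe one specific operating point, which is why both kinds of measure are reported.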

Comparison with Standard Successful Cardioversion Risk Scores
We further compared the performance of the developed ML algorithms to existing predictive multivariate logistic regression models: CHA2DS2-VASc [23] and HATCH [36,37] scores. For this comparison, we evaluated the existing scores directly on our dataset, essentially performing an external validation of the prediction rules. In order not to give the ML models an unfair advantage, we further refitted the scores with beta coefficients in our study population for the different pathways' outcomes. In addition, we estimated the Net Reclassification Index (NRI) of the ML models with respect to the existing scores, calculated at the optimum cutoff threshold for the score [38]. This index was the difference of the sum of sensitivity and specificity between two classifiers.
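The index defined in the last sentence can be written as NRI = (Se_new + Sp_new) − (Se_ref + Sp_ref). A minimal sketch follows; the confusion-matrix counts are invented for illustration and are not taken from the study, and the function names are hypothetical.

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

def net_reclassification_index(new, ref):
    """Difference of (sensitivity + specificity) between two classifiers."""
    se_new, sp_new = sens_spec(**new)
    se_ref, sp_ref = sens_spec(**ref)
    return (se_new + sp_new) - (se_ref + sp_ref)

# Invented counts for the same 100 patients (20 events, 80 non-events).
risk_score = {"tp": 10, "fn": 10, "tn": 60, "fp": 20}
ml_model = {"tp": 14, "fn": 6, "tn": 64, "fp": 16}

nri = net_reclassification_index(ml_model, risk_score)

# Equivalent view: net gain in correctly classified events and non-events,
# each divided by the size of its class.
nri_from_gains = (14 - 10) / 20 + (64 - 60) / 80
```

The second formulation shows why the index can be reported alongside the number of additional patients classified correctly in the positive and negative classes.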

Feature Analysis
The differences in data variables between event and non-event patient groups in each AF pathway were compared using χ² or Fisher tests for categorical variables and Student's t-test or ANOVA for continuous variables.
We further computed feature importance for the models by measuring how the area under the ROC curve (AUC ROC) decreased when a feature was not available, through the method known as permutation importance or mean decrease in accuracy (MDA) [39]. The method consisted of replacing each feature in the test dataset with a random-noise feature column and measuring the performance of the ML model. The weights of the features with a positive impact on the predictive model were scaled to the range 0 to 1. This method was chosen because it is classification algorithm-agnostic and offers an intuitive idea of what happens when some part of the data of a given subject is missing and is substituted by a random value distributed according to the rest of the population.
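The permutation procedure can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the synthetic data, the fixed linear `predict` function, the number of repeats, and the choice to scale by the largest drop (so that the top feature receives a weight of 1) are all assumptions.

```python
import random

def auc_roc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean AUC drop after shuffling each column, scaled so the largest drop is 1."""
    rng = random.Random(seed)
    baseline = auc_roc(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # values now follow the population, not the patient
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - auc_roc(y, [predict(r) for r in permuted])
        drops.append(total / n_repeats)
    top = max(drops)
    return [d / top if top > 0 else 0.0 for d in drops]

# Synthetic data: feature 0 tracks the outcome, features 1 and 2 are pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[label + data_rng.gauss(0, 0.5), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
     for label in y]

def predict(row):
    """Hypothetical fitted model: relies on feature 0, slightly on 1, never on 2."""
    return 2.0 * row[0] + 0.3 * row[1]

importance = permutation_importance(predict, X, y)
```

Because the procedure only needs a prediction function, it applies unchanged to logistic regression, tree ensembles, or any other classifier, which is the algorithm-agnostic property mentioned in the text.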

Open-Source Software
The developed code used to train and evaluate the models can be consulted as open-source at https://github.com/IA-Cardiologia-husa/Cardioversion (accessed on 7 April 2022) [40]. We deployed our ML classifiers in an online open-source calculator that can be run on any Google Drive account, as an example tool for prospective external validation of ML models from a small imbalanced sample size. The calculator chained all ML models to provide personalized outcome predictions.

Characteristics and Flow of the Study Population
The characteristics of the study population are shown in Table 2. The EHRA classification of atrial fibrillation symptoms was not widely described in the EHRs, and it was not provided. Figure 2 presents the movement of patients through the elective EC of the AF process. The presence of SR on the ECG recorded at the end of the scheduled EC visit occurred in 374 (87.2%) of the 429 patients included in the study: in 52 (20.6%) of 252 non-antiarrhythmics-treated patients, conversion to SR occurred spontaneously in the pre-scheduled EC period; in 35 (19.8%) of 177 antiarrhythmics-treated patients, pharmacologic cardioversion occurred in the pre-scheduled EC period; and of the 342 patients still in AF at the scheduled EC visit, 287 (83.9%) converted to SR after direct-current shock application. Among the 374 patients in SR after the scheduled-EC visit, a recurrence of AF occurred in 145 (38.8%) patients on the ECG recorded at the 6-month visit. Thus, final successful rhythm control at 6 months was achieved in 229 (53.4%) of the 429 patients initially included in the study.

Table 2. Baseline characteristics of the study cohort. List of continuous and categorical data input of the patients used for ML model development. Continuous variables are expressed as mean ± standard deviation and categorical as n (%). Reference ranges for LVEF were considered normal greater than 50%, mild dysfunction from 49 to 40%, moderate dysfunction from 39 to 30%, and severe dysfunction less than 30%. Paroxysmal AF was defined as AF with episodes recurring with variable frequency; persistent AF was defined as continuous AF that is sustained >7 days; long-standing persistent AF was defined as continuous AF >12 months in duration. Reference ranges for LA volume index were considered normal <35 mL/m², mildly dilated from 35 to 41 mL/m², moderately dilated from 42 to 48 mL/m², and severely dilated >48 mL/m² [41].

Comparison of Prediction Models for Each Pathway
The prediction accuracy of the different models under consideration evaluated at each clinical pathway is shown in Tables 3 and 4 for the cross-validation with training data and the evaluation with testing data, respectively. We used both the CHA2DS2-VASc and HATCH risk scores as baseline models for performance evaluation. With the exception of the direct-current cardioversion pathway, all the standard ML models achieved statistically significant improvements compared to the baseline CHA2DS2-VASc or HATCH scores (p < 0.01).
The best overall ML classifier algorithm in the internal validation was extremely randomized trees with an AUC ROC of 0.81 for spontaneous SR restoration, 0.68 for pharmacological cardioversion, 0.47 for direct-current cardioversion, 0.67 for 6-month AF recurrence, and 0.69 for overall 6-month rhythm control, and it was chosen as the classifier algorithm to be used with the test set and the online open-source calculator.
In order to better assess the clinical significance of these results, we compared the classification performance in the 113 patients test set of the ML model with the CHA2DS2-VASc and HATCH risk scores, operating at an optimal threshold that was selected based on the ROC and PR curves.
For the spontaneous restoration of SR or pharmacologic cardioversion, the ML model classified 35 patients as likely to return to SR before the scheduled direct-current shock application (Figure 3A). Of those, 16 returned to SR before the direct-current shock application (46% precision); meanwhile, of the 78 remaining patients, 68 stayed in AF (87% negative predictive value). For the direct-current cardioversion (Figure 3B), the ML model classified 83 patients as likely to be in SR after the electric shock and 4 as likely to remain in AF; 73/83 of the likely-to-be-in-SR group returned to SR (88% precision), and so did 2/4 of the likely-to-remain-in-AF group (50% negative predictive value). For the recurrence of AF (Figure 3C), out of the 101 patients that returned to SR, the ML model classified 30 as likely to have a recurrence within 6 months and 71 as not likely to have a recurrence. In the likely-to-recur group, 16/30 had a recurrence (53% precision); meanwhile, in the not-likely-to-recur group, 47/71 stayed in SR (66% negative predictive value). Finally, for the overall success of rhythm control at 6 months (Figure 3D), the ML model categorized the 113 patients into 69 patients likely to be successful and 44 patients not likely to be successful. In the likely-to-be-successful group, 45/69 were in SR (65% precision); meanwhile, in the not-likely-to-be-successful group, 28/44 were in AF (64% negative predictive value).
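The percentages in the paragraph above follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes each one as correct cases divided by group size; the counts are copied from the text, while the dictionary layout and names are only an illustrative convention.

```python
# (correct cases, group size, percentage reported in the text)
reported = {
    "A: precision, SR before shock": (16, 35, 46),
    "A: NPV, stayed in AF": (68, 78, 87),
    "B: precision, SR after shock": (73, 83, 88),
    "B: NPV, remained in AF": (2, 4, 50),
    "C: precision, AF recurrence": (16, 30, 53),
    "C: NPV, stayed in SR": (47, 71, 66),
    "D: precision, rhythm control": (45, 69, 65),
    "D: NPV, in AF at 6 months": (28, 44, 64),
}

def percent(correct, total):
    """Proportion expressed as a whole-number percentage."""
    return round(100 * correct / total)

recomputed = {name: percent(c, n) for name, (c, n, _) in reported.items()}
mismatches = [name for name, (_, _, p) in reported.items() if recomputed[name] != p]

# Group sizes per panel, as stated in the text.
panel_totals = {"A": 35 + 78, "B": 83 + 4, "C": 30 + 71, "D": 69 + 44}
```

Panels A and D cover the full 113-patient test set, panel C covers the 101 patients who returned to SR, and panel B covers the patients who received a direct-current shock.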
Compared to the most competitive existing score (Table 5), the ML model correctly classified three more patients in the positive class and six fewer in the negative class for predicting spontaneous SR restoration, for an NRI of +5.9% in favor of the ML model; it correctly classified 2 fewer patients in the positive class and 22 more patients in the negative class for predicting pharmacologic cardioversion, for an NRI of +38.8% in favor of the ML model; it correctly classified 12 more patients in the positive class and 2 fewer in the negative class for predicting direct-current cardioversion, for an NRI of −0.6% favoring the HATCH score; it correctly predicted two more patients in the positive class and six more patients in the negative class for predicting 6-month AF recurrence, for an NRI of +14.8% in favor of the ML model; and finally, it correctly predicted four more patients in the positive class and eight more in the negative class for predicting the overall 6-month rhythm control, for an NRI of +22.1% in favor of the ML model.

Table 3. Performance of all prediction models at each clinical pathway in the cross-validation of the training data, measured in terms of the area under the ROC curve (AUC ROC) and area under the precision-recall curve (AUC PR). Both the CHA2DS2-VASc and HATCH risk scores were used as baseline models for the performance evaluation of each machine-learning developed model.
Figure 3. Panel (A) represents predictions (blue background) to undergo spontaneous restoration of SR or pharmacological cardioversion (CV) and ground truth findings for each patient (yellow or red). Panel (B) represents predictions (blue background) of efficacy of direct-current shock application and ground truth findings for each dataset patient (yellow or red). Panel (C) represents predictions (blue background) of AF recurrence at 6 months after SR restoration and ground truth findings for each patient (yellow or red). Panel (D) represents predictions (blue background) of SR control at 6 months and ground truth findings for each patient (yellow or red).

Table 5. Classification analysis. The classification performance of the CHA2DS2-VASc and HATCH risk scores and the best performance machine-learning model were calculated for each electric cardioversion pathway. The net increase performance (number of patients and percentage) and net reclassification index were provided when utilizing the developed machine-learning model. The most competitive existing risk score, either CHA2DS2-VASc or HATCH, was used as the baseline model for the performance evaluation of the machine-learning developed model at each pathway.

Table 6 shows the five most important variables, along with their importance scores, ranked according to their contribution to the predictions of the ML model at each pathway. Variables related to paroxysmal AF classification and left atrial dilatation appeared to be more important for the predictions than traditional cardiovascular risk factors and age included in the CHA2DS2-VASc or HATCH scores. Paroxysmal AF was on the list of top predictors of three pathways: spontaneous SR restoration, pharmacologic cardioversion, and 6-month rhythm control, while left atrial dilatation was among the most important risk factors of four pathways: spontaneous SR restoration, direct-current cardioversion, AF recurrence, and 6-month rhythm control.

Table 6. Feature importance. Variable ranking by their contribution to the predictions of the extremely randomized tree model at each pathway. The score represents the relative importance of that variable for the machine-learning model. The weight of the features is scaled from 0 to 1; thus, variables close to 1 show a higher impact on the predictive model.

Machine-Learning Models Deployment in a Calculator
We deployed our machine-learning algorithms in a calculator where users can input the information of the 18 features that were found as main predictors for each clinical pathway and see the individual prediction for the concrete outcome. These features are weight, height, time of AF onset, LA volume, mitral regurgitation, LVEF, NYHA functional class, tobacco smoking history, previous direct-current shock application attempt, previous transient ischemic attack or stroke, history of heart failure, history of anticoagulation, pulmonary disease including sleep apnea, impaired physical mobility, beta blockers, ACE inhibitors/Angiotensin receptor blockers, and type of anticoagulation. For the prediction of AF recurrence, additional features, such as the type of cardioversion (spontaneous, pharmacologic, or direct current), antiarrhythmic prescription, and creatinine clearance, were required. The calculator is available at https://colab.research.google.com/drive/1TbHf9waHNQYHQJhu5M9iqnpO5AESGDO5 (accessed on 7 April 2022).

Discussion
To our knowledge, this is the first ML analysis of the whole elective EC of the AF process. In a consecutive and well-characterized cohort, we were able to concatenate different ML algorithms to establish predictions at each different clinical pathway observed throughout the EC process. Our ML prediction models were superior to the classical existing scores, CHA2DS2-VASc and HATCH. Taking into account the difficulty of using ML algorithms in clinical practice, we further integrated them into a simple open-source calculator where predictions are easy to calculate and understand.
Other investigators have used the ML methodology to predict particular pathways of the EC of the AF process, such as Oto et al. studying the successful cardioversion workflow for patients who perform pharmacologic cardioversion after 48 h of flecainide treatment [42] or Sterling et al. predicting successful cardioversion using ECG variables [21]. However, we would like to highlight that the analysis presented here is more integral than the previous ones, that we have considered ML models with nonlinear interactions between the variables, and that we have been more thorough with the feature selection phase to avoid pitfalls in our evaluation phase.
In our study, the performance of the resulting ML models ranged from very good classification for the spontaneous restoration of SR and the pharmacological cardioversion models, through reasonable classification for the AF recurrence and successful cardioversion models, to classification that was not statistically significantly better than random guessing for the direct-current cardioversion model. Differences in performance between models with different ML classification algorithms in each of the workflows were not statistically significant, and we reported the best algorithm result among them. The results from the development were consistent with the ones from the validation.
The classical existing predictive multivariate logistic regression scores, CHA2DS2-VASc and HATCH, are currently considered the cornerstone for the management of AF. Although the CHA2DS2-VASc score is the basis for the management of anticoagulation therapy [1,2], some studies suggest that it has predictive value for AF recurrence after cardioversion. In a pooled meta-analysis collecting data of 2889 patients, the CHA2DS2-VASc score was an independent predictor of early recurrence of AF after pharmacologic cardioversion or EC [23]. In addition, the HATCH score, initially described to predict progression from paroxysmal to persistent AF [36], has also been shown to be useful in predicting the short-term success of EC [37]. The ML models developed here have been consistently better than the CHA2DS2-VASc and HATCH scores at predicting the different outcomes in our cohort of patients. We must acknowledge that the existing scores are facing an external validation, whereas the ML models are evaluated internally, and the possible selection bias or other existing biases in our dataset will play in favor of the ML models. However, the results of both existing scores were poor and unlikely to be useful in clinical practice; meanwhile, the results of the ML models were optimal in the independent dataset, corresponding to 113 patients, where they were validated. An external validation should be performed to confirm these results, which could be improved by using additional datasets for the continuous development of the models. In particular, a bigger dataset or the inclusion of different variables, such as ECG variables, might help in discovering additional interactions between features with predictive properties. This is the main reason why the open-source code of the developed algorithms can be accessed through this publication, with the aim of improving their predictions with the addition of new patient cohorts and new variables.
The clinical application of the ML prediction models is relevant. Using the developed open-source calculator, we could make individual predictions for each AF patient for whom EC is a therapy option and optimize the procedure (e.g., adding pre-cardioversion antiarrhythmic drugs) in cases with a low likelihood of having a successful cardioversion, or we could prioritize the waiting list for those patients for whom the EC is estimated to be successful over time. The use of the developed open-source calculator is simple and can facilitate the implementation of other elective EC models where nursing plays a predominant role [43].
Finally, this study has several limitations. The results of the ML models in the five AF workflows have been uneven. In particular, we have not been able to predict the success of the application of direct-current cardioversion. The results for the recurrence of AF, and subsequently the long-term success of the cardioversion process, were also moderate. We must acknowledge that we are working with a single-hospital dataset and that it would have been desirable to have a different population to validate the ML models. Nevertheless, we consider it enough to showcase the possibilities of applying ML techniques in a clinical workflow and want to emphasize the effort made to offer a rigorous evaluation, performing nested cross-validation steps for selecting features and hyperparameters to report an accurate measure of the performance. A greater number of patients would have allowed for the development of more precise models and a more detailed study of the relationship between variables and outcomes. However, the sample size and number of events were enough to perform a proper evaluation of the ML models, as reflected by the fact that we were able to ascertain statistically significant differences in performance between ML models and risk scores (Figure 4). We encourage researchers with larger databases to use the provided code as a basis to build more refined models.

Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and the outcome. Table 2
13c V For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors, and outcome). Tables 2 and 6

Model development
14a D Specify the number of participants and outcome events in each analysis. Figure 2
14b D If done, report the unadjusted association between each candidate predictor and outcome. NA

Model specification 15a D
Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients and model intercept or baseline survival at a given time point