1. Introduction
Atrial fibrillation (AF) is the most common cardiac arrhythmia diagnosed in clinical practice, with an estimated prevalence of about 1–2% of the general population and above 10% in the elderly [
1]. Moreover, its prevalence is likely to double in the next 50 years as the population ages [
2]. At least 15% of the healthcare budget for cardiac disease is devoted to AF [
3,
4]. This disease is associated with an increased risk of stroke and congestive heart failure, so that AF patients have twice the risk of death as compared with healthy persons [
5]. These facts make AF a major public health challenge, and its medical and economic aspects could worsen in the future [
2]. From an electrophysiological viewpoint, AF is characterized by rapid and chaotic contractions of the atria, originating in disorganized atrial electrical activation [
6]. As with many other arrhythmias, AF may require therapeutic intervention, even in patients who suffer no subjective discomfort [
7]. Since Haissaguerre et al. reported the paramount relevance of the pulmonary veins (PV) in the initiation and maintenance of AF [
8], the procedure of catheter ablation (CA) targeting PV foci, namely pulmonary vein isolation, has become such an effective therapy for AF [9] that it is considered the first-line therapeutic alternative to pharmacological treatments.
Despite its high prevalence, the physiological mechanisms underlying AF are not yet completely understood, and the present therapeutic approaches to AF have major limitations [
10]. In recent years, many efforts have sought to personalize CA treatments by mapping the atrial electrophysiological substrate [11], with the aim of identifying the arrhythmogenic atrial sites responsible for AF generation [12]. In this respect, atrial substrate characterization is one of the most recent approaches aimed at overcoming the limited clinical efficacy of current therapeutic interventions, which stems from major knowledge gaps in the mechanisms that sustain AF [
13].
In clinical practice, one goal of atrial substrate characterization is to discern patients with paroxysmal AF (ParAF) from those with persistent AF (PerAF) [
14], as statistics have shown that the success rate of CA is dependent on the area of ablation and the AF type [
15]. In fact, a high success rate is reached in ParAF by ablating only the PVs, while PerAF requires further ablations to achieve similar results [12]. This poses a challenge for the precise characterization of the atrial substrate aimed at optimally guiding CA: methodologies able to distinguish the complex fractionated atrial electrograms (CFAEs) of ParAF from those of PerAF would support fast and efficient atrial substrate mapping [
16]. CFAEs can be identified by the presence of multiple electrogram deflections without interruption, a baseline perturbation with continuous deflection [
17], or a cycle length ≥120 ms that includes isoelectric intervals between deflections [
18].
In an attempt to personalize AF treatment, nonlinear indices have been applied to CFAEs with the aim of quantifying atrial remodeling and the atrial electrophysiological substrate, supporting clinical management decisions, and suggesting the most appropriate approach for ablation procedures [19]. Several works have proposed classification strategies based on nonlinear metrics, assessed via statistical tests, to classify ParAF versus PerAF through CFAE analysis. Ciaccio et al. measured the CFAE repetitiveness [
16] and quantified the degree of morphological heterogeneity in CFAE deflections [
20]; Acharya et al. adopted recurrence plots, Recurrence Quantification Analysis, and entropy measures, showing that the underlying signal generation process of CFAEs in AF is to some extent repetitive, even for sequences as short as one second [
21]; Ndrepepa et al. and Ravi et al. used the AF cycle length to show that patients with persistent AF had shorter cycle lengths and a higher degree of disorganized activity than patients with paroxysmal AF [
22,
23]; and Sanders et al. employed a spectral analysis to identify localized sites of high-frequency activity, reporting different distributions in paroxysmal versus permanent AF [
24].
However, the aforesaid prior studies do not respond to the clinical demand for easy and intuitive interpretation methods for CFAEs [
25]. In fact, clinicians demand straightforward, readily understandable classification models fed with features that are easily applicable to CFAEs. Furthermore, previous methods do not assess the signal quality or stability of the variables and recordings used to develop the models that discriminate between AF types, which can compromise the robustness of those approaches. Specifically, previously reported results lack an assessment of the intra-recording and intra-patient stability of the analyzed data, as well as of the CFAE signal quality. Omitting these assessments raises two main issues: first, averaging among recording places and AF types without first checking the intra-recording and intra-patient stability may oversimplify the processes taking place at different regions of the atria; second, including artifacted or noisy segments unrelated to the AF mechanism, such as drifts or heavily distorted recordings caused by poor contact of the recording electrode with the atrial walls, may bias the characterization of the atrial electrophysiological substrate and make it unreliable.
The present work has two principal objectives: first, to assess the intra-recording and intra-patient stability of CFAEs with nonlinear indices; and second, to exploit nonlinear strategies and straightforward models to discriminate between the CFAEs of ParAF and PerAF in patients undergoing catheter ablation of AF.
In the first part of the manuscript, the intra-recording and intra-patient stability of CFAEs have been assessed with the nonlinear indices of determinism (DET) of a Recurrence Quantification Analysis (RQA) and sample entropy (SE). Furthermore, the presence of artifacted or noisy segments in CFAEs has been considered as well, evaluating the consequences of their discarding in the final outcome. The idea behind it is that a discarding process to remove poor quality segments may benefit the intra-recording and intra-patient stability assessment. Moreover, this approach may decrease the differences in the DET and SE between the intra-patient recording places, thus helping to quantify the atrial substrate with reliable and representative values.
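To make the SE index named above concrete, the following is a minimal sketch of sample entropy in pure Python. The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are common defaults assumed here for illustration only; they are not taken from this manuscript, and the function name is hypothetical.

```python
import math
import statistics

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D sequence (illustrative sketch).

    m and r_factor are assumed defaults, not the settings of the study.
    """
    n = len(x)
    r = r_factor * statistics.pstdev(x)

    def count_matches(length):
        # The same n - m starting points are used for both template lengths,
        # and self-matches are excluded by only comparing i < j.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                # Chebyshev distance between the two templates
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)        # matching template pairs of length m
    a = count_matches(m + 1)    # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")     # undefined when no matches are found
    return -math.log(a / b)
```

A perfectly regular sequence yields a value of zero, whereas an irregular one yields larger values, which is the sense in which SE is read as a complexity measure in this work.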
In the second part of the study, the exploitation of nonlinear strategies and the development of straightforward models to discriminate between ParAF and PerAF from the CFAEs of patients undergoing a CA of AF has been performed using, besides the nonlinear indices of DET and SE, the widely accepted indices of dominant frequency (DF) and the AF cycle length (AFCL). The indices extracted from CFAEs were processed and selected prior to being converted into features for coarse tree classification models. The assumption is that a thoughtful selection of reduced sets of indices, feeding straightforward classification models, would enable a more accurate discernment of ParAF versus PerAF CFAEs, thus providing a more understandable insight for atrial substrate evaluation and improved therapeutic decision for AF management.
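As an illustration of the DF index mentioned above, a possible sketch is given below: the DF is taken as the frequency of the largest spectral peak inside a search band. The 3 to 15 Hz band, the sampling frequency used in the example, and the function name are assumptions for illustration; the estimation procedure actually used in the study is not described in this section.

```python
import cmath
import math

def dominant_frequency(signal, fs, f_min=3.0, f_max=15.0):
    """Frequency (Hz) of the largest spectral peak within [f_min, f_max].

    The search band is an assumption made for illustration only.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]    # remove the DC offset
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if not f_min <= freq <= f_max:
            continue
        # Direct DFT bin, used to stay dependency-free;
        # numpy.fft.rfft would be the usual faster choice.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq

# Synthetic 2 s example sampled at a hypothetical 100 Hz:
# a 6 Hz component plus a weaker 11 Hz component.
fs = 100.0
demo = [math.sin(2 * math.pi * 6 * t / fs) + 0.3 * math.sin(2 * math.pi * 11 * t / fs)
        for t in range(200)]
demo_df = dominant_frequency(demo, fs)
```

The frequency resolution of such an estimate is fs divided by the number of samples, which is one reason why longer segments are generally preferred for spectral indices.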
3. Results
As a result of the application of the CFAE quality assessment,
of the one-second-length segments were discarded, while for the two- and four-second-length segments, the percentages were slightly higher, respectively, 8.6% and 12.5%, due to the aforesaid increased loss of discarded segments.
Figure 3 shows the discarded segments distribution along the recording places in the datasets. In particular, the number of discards in the LSPV is quite low as compared with the other recording places and in contrast with the RSPV, in which the discards are more frequent. The proportion of the discarded segments in the paroxysmal and persistent AF patients is similar: for 1 s length segments, 46.4% of the discarded segments were in ParAF and 53.6% in PerAF; for the 2 s length segments, 44.4% were in ParAF and 55.6% in PerAF; and finally, for the 4 s length segments, 43.8% were in ParAF and 56.2% in PerAF.
Regarding stability, the averaged statistical descriptors (range, mean, and standard deviation) obtained by applying SE to the 1, 2, and 4 s length datasets, with and without discards, are reported in
Table 1. With the discards, the ranges were reduced as the lower boundary took greater values mainly due to the removal of the drifts, which generally presented high amplitude and low SE values. As a consequence, the standard deviations also diminished with the discards.
The statistical descriptors of the DET were computed in the same way as the SE, and the results are reported in
Table 2. For the DET, the ranges were reduced with discards due to the upper boundary that took lower values, thus also decreasing the standard deviations. In fact, the DET and SE are complementary measures; one measures predictability, and the other, complexity.
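The complementary reading of DET can be illustrated with a minimal sketch of how determinism is obtained from a recurrence plot: it is the share of recurrence points that lie on diagonal lines of at least a minimum length. The embedding dimension, delay, threshold, and minimum line length below are illustrative assumptions, not the parameters of the study, and the function name is hypothetical.

```python
import math

def determinism(x, dim=2, delay=1, eps=0.2, l_min=2):
    """DET of RQA: share of recurrence points on diagonal lines >= l_min.

    dim, delay, eps and l_min are illustrative assumptions.
    """
    n = len(x) - (dim - 1) * delay
    vectors = [[x[i + k * delay] for k in range(dim)] for i in range(n)]

    def recurrent(i, j):
        # Two embedded states are recurrent if their Euclidean distance is <= eps
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
        return dist <= eps

    total = 0       # recurrence points above the main diagonal
    on_lines = 0    # those belonging to diagonal lines of length >= l_min
    for offset in range(1, n):           # each diagonal parallel to the main one
        run = 0
        for i in range(n - offset):
            if recurrent(i, i + offset):
                total += 1
                run += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
        if run >= l_min:                 # close a line reaching the plot border
            on_lines += run
    return on_lines / total if total else 0.0
```

Because the recurrence plot is symmetric, only the upper triangle is scanned. A periodic signal produces long diagonal lines and a DET close to one, while an irregular signal produces mostly isolated points and a lower DET.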
The intra-recording analysis showed a significant variation in the CV(%) for every segment length, for both the SE and DET, as shown in
Table 3. Discarding the segments benefited the stability, decreasing the CV, with deeper decreases for longer segments. These variations were on average of greater magnitude for PerAF (DET = 29.1%, SE = 37.6%) versus ParAF (DET = 19.6%, SE = 31.8%).
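The effect of discarding on the intra-recording CV can be sketched as follows. The SE values and quality flags are invented for illustration, and the use of the sample standard deviation is an assumption, since this section does not state which estimator was used.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical SE values of consecutive segments within one recording.
# The flag is False for a segment that a quality check would discard
# (here, a drift-like segment with a very low SE value).
segments = [(0.31, True), (0.29, True), (0.05, False), (0.33, True), (0.30, True)]

cv_all = cv_percent([se for se, _ in segments])
cv_kept = cv_percent([se for se, ok in segments if ok])
print(f"CV without discards: {cv_all:.1f}%  CV with discards: {cv_kept:.1f}%")
```

In this toy case a single outlying segment dominates the spread, so removing it lowers the CV, which mirrors the direction of the change reported for the intra-recording analysis.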
The intra-patient stability also provided large variations in the CV(%) for the DET and even bigger for the SE at any segment length, as shown in
Table 4. In this case, discarding segments had little effect, and the CV showed similar variations.
The results of the Kruskal–Wallis test, as well as the visual inspection of the box plots computed for every recording place, every window length, and every index (72 plots not included here), suggest that the atrial electrophysiological substrate mostly differs among the recording places analyzed and shows great intra-patient variability. For the 1 s length datasets, the null hypothesis was always rejected for the SE, while for the DET, it was accepted only once with no discard. For the 2 s length, the null hypothesis was still always rejected for the SE, while for the DET, it was accepted in two cases with no discard and in one case with discard. For the 4 s length, the null hypothesis was accepted in seven cases (one for the SE with discard, one for the DET with no discard, and five for the DET with discard); however, when the box plots were inspected to verify these results, the median values showed differences. The inaccuracy found for the 4 s length datasets is explained by the small sample size, for which the test statistic does not follow the chi-square distribution that the test assumes.
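For reference, the Kruskal–Wallis statistic and its chi-square p-value can be sketched in pure Python as below. This is a generic textbook formulation with tie correction, not the software used in the study, and all function names are hypothetical. The p-value relies on the large-sample chi-square approximation, which is precisely what becomes unreliable for small groups.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Survival function of the chi-square distribution for integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:   # recurrence Q(a + 1, y) = Q(a, y) + y^a e^-y / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def kruskal_wallis(groups):
    """Return (H, p) for a list of groups, e.g. one group per recording place."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    acc, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * acc - 3.0 * (n_total + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1.0 - ties / (n_total ** 3 - n_total)    # tie correction
    h = max(h, 0.0)                               # guard against rounding below zero
    return h, chi2_sf(h, len(groups) - 1)
```

With only a handful of values per recording place, an exact or permutation-based p-value would be a safer choice than the approximation used in this sketch.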
With respect to the discrimination between the CFAEs of ParAF and PerAF, the mean and standard deviation values obtained by applying the SE and DET to the 1, 2, and 4 s length datasets are reported in
Table 5, organized by AF type. For the 16 s length CFAEs, the AFCL (mean ± standard deviation) was 7.75 ± 1.56 in ParAF and 7.01 ± 1.46 in PerAF, while the DF was 5.59 ± 1.36 in ParAF and 6.20 ± 1.08 in PerAF.
As shown in
Table 6 for the SE and DET, the Mann–Whitney test rejected the null hypothesis in most of the recording places, with particularly low
p-values in the right superior pulmonary vein (RSPV), while in the right inferior pulmonary vein (RIPV), the null hypothesis was accepted, which suggests the presence of a similar atrial electrophysiological substrate in both AF types. For the AFCL, the
p-values were always higher than the significance level of 0.05, except for the RSPV. Finally, the statistical tests run for the DF showed no differences between ParAF versus PerAF except in the left superior pulmonary vein (LSPV) and in the posterior free wall of the left atrium (POS).
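The two-group comparison above can be illustrated with a minimal sketch of the Mann–Whitney U test. It uses the normal approximation without continuity or tie correction, which is an assumption made for brevity; for small groups an exact p-value is preferable. The function name and the example values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation.

    U counts the pairs in which a value of x exceeds a value of y,
    with ties counted as one half.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail of the normal distribution
    return u, p

# Invented index values at one recording place for the two AF types.
par_af = [0.21, 0.24, 0.19, 0.26, 0.22]
per_af = [0.31, 0.35, 0.29, 0.33, 0.38]
u_stat, p_value = mann_whitney_u(par_af, per_af)
reject_null = p_value < 0.05   # significance level used in the text
```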
The correlation matrix obtained by averaging the correlation values of the 1, 2, and 4 s length datasets is shown in
Figure 4. In particular, a strong negative correlation between the index pairs of the SE-DET and AFCL-DET recorded at the same recording places was observed, while a strong positive correlation appeared between the paired SE-AFCL measured at the same recording place.
The application of the correlation matrix filter to the 1 and 2 s length datasets removed the indices DF and entirely removed the indices of the DET and AFCL, due to their strong correlation with the SE. For the 4 s length dataset, the subset of indices kept was SE, SE, SE, SE, SE, AFCL, DF, DF, DF, DF, and DF.
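One way such a correlation matrix filter could operate is sketched below: indices are visited in order, and an index is kept only if its absolute Pearson correlation with every index already kept stays below a threshold. The greedy order, the 0.8 threshold, the function names, and the data are all assumptions for illustration; the exact rule used in the study is not specified in this section.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_filter(features, threshold=0.8):
    """Keep an index only if |r| with all previously kept indices is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Invented per-patient values: DET is strongly (negatively) correlated with SE,
# while DF is only weakly correlated with it.
features = {
    "SE":  [0.2, 0.4, 0.6, 0.8, 1.0],
    "DET": [0.9, 0.8, 0.7, 0.6, 0.5],
    "DF":  [5.0, 6.5, 5.2, 6.4, 5.6],
}
kept = correlation_filter(features)   # DET is dropped, SE and DF are kept
```

Since the result of a greedy filter depends on the visiting order, placing the index considered most informative first determines which member of a correlated pair survives.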
For the three datasets, the variables scoring provided by the Random Forest is presented in
Figure 5. For the 1 and 2 s length datasets, the variables ranked with a score higher than 40 were DF, DF, DF, SE, and SE, while in the 4 s length dataset, the AFCL was also included as it surpassed the threshold value.
After testing all the possible combinations of the high-ranked features, the group SE, DF, DF, and DF provided the best classification performance in discriminating between the CFAEs of ParAF and PerAF, with an accuracy of 88.3% for all the segment lengths. The same accuracy was reached when SE was also added to the group. However, to keep the model as simple as possible, the minimal set of features was considered optimal. The accuracy values of the models built with the other possible combinations of the highest-ranked features (score > 40) had a mean and standard deviation of 70.7 ± 8.7% for the 1 s length, 70.8 ± 8.3% for the 2 s length, and 69.2 ± 8.8% for the 4 s length, which is a marked reduction in accuracy. Finally, the highest accuracy achieved by a single index was provided by DET, with 82.2% for every segment length, while the averaged single accuracies reached by the other indices were 60.2 ± 11% for the 1 s length, 54.7 ± 13.7% for the 2 s length, and 57.3 ± 11.4% for the 4 s length.
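The idea of classifying with a single index can be illustrated with a one-split decision tree (a stump), which is even simpler than the coarse trees used in this work. The sketch below searches for the threshold on one index that maximizes the training accuracy. It is a simplified stand-in under stated assumptions, not the model of the study, and the function name and data are invented.

```python
def fit_stump(values, labels):
    """Best single threshold on one index for two classes.

    Returns (threshold, label_assigned_above_threshold, training_accuracy).
    """
    classes = sorted(set(labels))            # exactly two class labels expected
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    candidates = [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]
    best = None
    for thr in candidates:
        for above, below in (classes, classes[::-1]):
            hits = sum((above if v > thr else below) == y
                       for v, y in zip(values, labels))
            acc = hits / len(values)
            if best is None or acc > best[2]:
                best = (thr, above, acc)
    return best

# Invented DET values, higher in ParAF than in PerAF as described in the text.
det = [0.90, 0.85, 0.80, 0.60, 0.55, 0.50]
af_type = ["ParAF", "ParAF", "ParAF", "PerAF", "PerAF", "PerAF"]
threshold, label_above, accuracy = fit_stump(det, af_type)
```

The accuracy returned here is measured on the same data used to choose the threshold, so it is optimistic; an estimate on held-out patients would be needed to judge generalization.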
4. Discussion and Conclusions
The present work demonstrates that the assessment of the intra-recording and intra-patient stability of CFAEs benefits significantly from the exclusion of artifacted or noisy segments, which helps to quantify the atrial electrophysiological substrate with reliable and representative values. By contrast, previous studies that omit this step may oversimplify the processes taking place at different regions of the atria and thus provide biased and unreliable results in the characterization of the atrial substrate.
In this respect, the stability of the CFAEs and their corresponding nonlinear indices is significantly influenced by the length of the analyzed segment and, specifically, by the recording site within the left atrium (see
Figure 3). The number of discards in the LSPV was quite low compared with the other recording places, whereas the RSPV was the recording site in which discards were most frequent. This could be due to the difficulty of reaching the RSPV with a Lasso™ catheter (Biosense-Webster, Diamond Bar, CA, USA), as used in the present study, in comparison with basket catheterization. In fact, catheters such as the Constellation™ (Boston Scientific, Natick, MA, USA) have the advantage of fitting better in most veins, adapting to their size and anatomical form [
48], their main disadvantage being the higher cost [
49,
50].
Discarding segments reduced the ranges and standard deviations of both the SE and DET (
Table 1 and
Table 2), thus showing that mapping CFAE with contact catheters on a beating heart is a delicate task [
48]. Furthermore, the discarding process benefited the intra-recording stability (
Table 3) more than the intra-patient stability (
Table 4), so that the CV changed more markedly in the first case. Nonetheless, the persistently high CV suggests that averaging data within the same recording (intra-recording), as well as among recording places (intra-patient), may lead to an unfair oversimplification of the CFAE-based atrial electrophysiological substrate characterization, an issue that has not been considered in many previous studies. In particular, in the intra-patient analyses, the box plots exhibited many instances in which only some of the recording places presented similar atrial electrophysiological properties, as identified by similar DET and SE values. This reinforces the conclusion that averaging erases the singularity of the electrophysiological substrate at the different atrial sites, which is the basis for the development of personalized catheter ablation procedures for AF treatment.
In contrast to more complicated previously published models, this work has also shown that it is possible to develop straightforward solutions for clinical practice that discriminate between ParAF and PerAF from the CFAEs of patients undergoing catheter ablation, thus providing a more understandable insight for atrial substrate evaluation and improved therapeutic decisions for AF management. In this respect, the SE mean values diminished with the segment length (
Table 5), with greater values in PerAF as compared to ParAF, thus reflecting a higher degree of disorganization in PerAF, which is in agreement with a previous study [
22]. Similarly, the DET values increased with the segment length, showing greater values in ParAF versus PerAF, thus highlighting ParAF as more predictable than PerAF.
The computation of the AFCL from the 16 s length CFAEs resulted in higher values for ParAF than for PerAF, as well as in lower values of DF for ParAF than for PerAF, as has been reported in prior work [
40]. However, the discussion remains open in this respect, because other studies have reported DF peak frequencies higher in ParAF than in PerAF [
51].
The study of the correlation between the indices and recording sites is illustrative (see
Figure 4) because it showed a high correlation between the SE and DET at the same recording site. Similarly, a high correlation was also observed between the AFCL and SE, and between the AFCL and DET, indicating that these three indices are strongly related to one another, linking the linear and the nonlinear domains. Finally, as expected, the AFCL and DF were also highly correlated. Altogether, these high correlations suggest that the selection of these indices for substrate assessment is appropriate, because they capture the essence of the atrial electrophysiological substrate.
The study of the discriminatory power of DF using statistical tests led to the wrong conclusion that the index is not discriminative, as the null hypothesis of no differences between ParAF and PerAF was accepted. By contrast, the feature selection ranked the DF, together with SE, DF, and DF, as the set of most important variables to discriminate between the CFAEs of ParAF and PerAF, reaching the highest accuracy of 88.3%. Therefore, a careful selection of limited sets of indices feeding straightforward classifiers is able to discriminate accurately between the CFAEs of different AF types. Furthermore, the use of just one nonlinear index, such as DET, provided a classification accuracy as high as 82.2% for every segment length. This result can serve as a starting point to show that simple, readily understandable classification models can be built to provide improved methodologies for atrial substrate characterization in AF.
The proposed analysis also has some limitations that merit consideration. Firstly, the study was carried out using a limited set of data, which was further reduced by the discarding process. Obviously, in order to obtain more generalizable results, a wider database with many more patients would be desirable. In this regard, our group is now working toward the expansion of the database for future studies. Secondly, regarding the creation of the classification models, the classification between ParAF and PerAF could be affected by model overfitting, especially when using the Random Forest algorithm. However, given that the accuracy obtained with a single index (82.2%) was already close to the accuracy provided by the best-performing model (88.3%), it is reasonable to consider that overfitting has a reduced effect on the overall classification performance.
As an overall conclusion, the observed high variability of the CFAEs has shown that averaging data in one recording place or among different recording places may lead to an unfair oversimplification of the CFAE-based atrial electrophysiological substrate characterization. Furthermore, a thoughtful selection of limited combinations of features feeding straightforward classification models is able to discriminate accurately between the CFAEs of ParAF and PerAF, thus providing improved therapeutic decision making for AF management, as well as clearer insight concerning the evaluation of the atrial substrate.