Automated Detection of Paroxysmal Atrial Fibrillation Using an Information-Based Similarity Approach

Abstract: Atrial fibrillation (AF) is an abnormal rhythm of the heart, which can increase heart-related complications. Paroxysmal AF episodes occur intermittently with varying duration. Human-based diagnosis of paroxysmal AF with a longer-term electrocardiogram recording is time-consuming. Here we present a fully automated ensemble model for AF episode detection based on RR-interval time series, applying a novel approach of information-based similarity analysis and an ensemble scheme. By mapping RR-interval time series to binary symbolic sequences and comparing the rank-frequency patterns of m-bit words, the dissimilarity between AF and normal sinus rhythm (NSR) was quantified. To achieve high detection specificity and sensitivity, and low variance, a weighted variation of bagging with multiple AF and NSR templates was applied. By performing dissimilarity comparisons between unknown RR-interval time series and multiple templates, paroxysmal AF episodes were detected. Based on our results, optimal AF detection parameters are symbolic word length m = 9 and observation window n = 150, achieving 97.04% sensitivity, 97.96% specificity, and 97.78% overall accuracy. Sensitivity, specificity, and overall accuracy vary little despite changes in the m and n parameters. This study provides quantitative information to enhance the categorization of AF and normal cardiac rhythms.


Introduction
Atrial fibrillation (AF), the most common sustained cardiac arrhythmia, is an abnormal heart rhythm characterized by rapid and irregular beating of the atria [1]. The disease is associated with an increased risk of heart failure, dementia, stroke and other heart-related complications [2]. Paroxysmal AF (PAF), also termed intermittent AF, is defined as an episode of AF that terminates spontaneously or with intervention in less than seven days [3]. The frequency of PAF is uncertain, because previous studies have suggested that a majority of these episodes are asymptomatic [4,5], including some that may last more than 48 h [4]. Experienced clinicians can identify AF patterns by visual inspection of the electrocardiogram (ECG) chart. However, due to the paroxysmal nature of the onset and termination of PAF in certain patients, human-based diagnosis of AF is usually time-consuming when using a longer-term ECG recording, such as one from a Holter or event recorder. Therefore, an automated, computerized AF detector may provide timely diagnosis and have substantial clinical utility.
It is challenging to implement diagnostic ECG waveform criteria for AF in a computerized algorithm, partly because P-waves (and their absence) are difficult to quantify and because cardiac inter-beat intervals, i.e., RR intervals, follow no repetitive patterns. One feasible approach is to identify the presence of irregular ventricular rhythm during AF episodes based on analysis of RR intervals [6-25]. Moody and Mark [11] showed that the Markov process model for AF detection is equivalent to determining the arithmetic mean of a series of scores based on the RR-interval sequence. Tateno and Glass applied standard density histograms of the RR and ∆RR intervals to detect the onset and termination of AF using standard coefficients of variation and the Kolmogorov-Smirnov test [12]. Kikillus, et al. [13] and Babaeizadeh, et al. [14] applied a Markov modeling technique to identify AF. Lian, et al. [15] developed an AF detector based on a map of RR intervals versus change of RR intervals. Huang, et al. [16] utilized a histogram of ∆RR_n and standard deviation analysis. Parvaresh, et al. [17] evaluated three classifiers for AF screening by using autoregressive modeling. Dash, et al. [18] proposed the randomness-variability-complexity approach. Lee, et al. [19] introduced the time-varying coherence approach. Petrėnas, et al. [20] applied the low-complexity approach. Zhou, et al. [21,22] proposed the symbolic dynamics approach. Additionally, the entropy and heart rate dynamics approaches were used in many studies [23-26]. Other studies of AF detection are based on the analysis of the probability density or autocorrelation function of the RR-interval series during AF [27-29].
Studies have demonstrated that physiologic systems generate complex fluctuations in their output signals that reflect the underlying dynamics [30-32]. We have previously proposed a novel information-based similarity (IBS) index to detect and quantify the repetitive appearance of certain basic patterns that are embedded in the human heart rate time series using tools from physics and statistical linguistics [33-36]. Human cardiac dynamics are driven by the complex nonlinear interactions of two competing forces: sympathetic stimulation increases and parasympathetic stimulation decreases heart rate. For this type of intrinsically noisy system, it may be useful to simplify the dynamics by mapping the output to binary sequences, where the increase and decrease of the inter-beat intervals are denoted by 1 and 0, respectively. The resulting binary sequence retains important features of the dynamics generated by the underlying control system, but is tractable enough to be analyzed as a symbolic sequence [33-38]. Therefore, analysis of symbolic sequences derived from RR intervals may reveal hidden physiological properties of AF. We hypothesize that symbolic patterns mapped from fluctuations of the RR time series may contain important information representing the underlying dynamics, which can be used to discriminate AF and non-AF ventricular rhythms.
Therefore, here we present a study based on a public ECG database on PhysioNet (http://physionet.org) [39-41]. We aim to develop a computerized AF detector based on quantifying the dissimilarity of AF and normal RR-interval time series using an information-based approach. To achieve high detection specificity and sensitivity, and low variance, we designed an ensemble AF detection model. A weighted variation of bagging with multiple AF and normal sinus rhythm (NSR) templates was applied. With respect to the setting of parameters and selection of training data, this study provides quantitative information to enhance the categorization of AF and normal RR time series.
Figure 1 illustrates a typical tracing of the RR-interval time series for a PAF patient from the MIT-BIH AFDB database. Visual inspection suggests that the cardiac rhythm changes dramatically with the AF onset, and the amplitude of the RR-interval fluctuations in AF episodes is substantially higher than that in non-AF periods.
The MIT-BIH AFDB database consists of 25 ECG recordings (10 h in duration) of patients with paroxysmal AF, whereas the NSR database consists of 18 long-term ECG recordings from healthy subjects who had no significant arrhythmias [39-41].

Information-Based Similarity Index
We have previously proposed an algorithm to measure the distance or dissimilarity between two symbolic sequences [33-36]. The algorithm is based on measuring differences in the occurrence of repetitive patterns between two symbolic sequences. In this study, the RR-interval time series was mapped to a binary symbolic sequence, where an increase in the RR interval was represented by '1' and no change or a decrease in the RR interval was represented by '0'. We map m + 1 successive intervals to a binary sequence of length m, called an m-bit "word". Each m-bit word, therefore, represents a unique pattern of fluctuations in a given RR-interval time series. By shifting one data point at a time, the algorithm produces a collection of m-bit words over the whole time series (a total of 2^m possible words). Therefore, it is plausible that the occurrence of these m-bit words reflects the underlying dynamics of the original RR time series. Different patterns of dynamics thus produce different distributions of these m-bit words.
Figure 2 illustrates this mapping procedure using 6-bit words (m = 6) from a part of the RR-interval time series. For m = 6, there are a total of 64 (= 2^6) possible words. The first binary word (100100) shown in Figure 2 is equivalent to the decimal number 36 (1 × 2^5 + 1 × 2^2 = 36); likewise, (001001) and (010010) correspond to 9 and 18, respectively.
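The mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the RR values in `rr_example` are hypothetical, chosen only so that the first three 6-bit words reproduce the decimal values 36, 9 and 18 quoted for Figure 2.

```python
from collections import Counter

def rr_to_bits(rr):
    """Map RR intervals to bits: 1 for an increase, 0 for no change or a decrease."""
    return [1 if b > a else 0 for a, b in zip(rr, rr[1:])]

def bits_to_words(bits, m):
    """Slide an m-bit window one step at a time; return each word as an integer."""
    words = []
    for i in range(len(bits) - m + 1):
        value = 0
        for bit in bits[i:i + m]:
            value = (value << 1) | bit  # most significant bit first
        words.append(value)
    return words

def word_counts(rr, m):
    """Count how often each m-bit word occurs in an RR-interval series."""
    return Counter(bits_to_words(rr_to_bits(rr), m))

# Hypothetical RR intervals (seconds); m + 1 = 7 intervals give the first word.
rr_example = [0.80, 0.85, 0.84, 0.83, 0.88, 0.86, 0.85, 0.90, 0.89]
example_words = bits_to_words(rr_to_bits(rr_example), 6)
```

Storing each word as an integer keeps the later rank-frequency bookkeeping simple, since all 2^m possible words can be enumerated with `range(2 ** m)`.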

These m-bit words are then sorted according to their frequency of occurrence. The rank-frequency of any given m-bit word may differ between the two sequences mapped from two RR-interval time series. We then plot the rank order of each m-bit word in the first symbolic sequence against its rank order in the second symbolic sequence (Figure 3). Each data point on the graph represents a binary word, with its rank in the first symbolic sequence (horizontal axis) plotted against its rank in the second symbolic sequence (vertical axis). The diagonal line of identity indicates equal rank order in both series. If two symbolic sequences are similar in their rank order, the data points will be located near this diagonal line (Figure 3c,d, comparisons between RR time series for either two healthy subjects or two AF patients). The average deviation of the plotted points from the dashed diagonal line is, therefore, a measure of the distance between two symbolic sequences. Greater distance indicates less similarity (Figure 3e, comparison between AF and normal RR time series), and vice versa.
Let d_r(ψ_1, ψ_2) denote a dissimilarity value between 0 and 1 of two symbolic sequences, s_k denote an m-bit word, and L denote the number of unique m-bit words. Let R and p denote the word's rank and probability, respectively. Let F denote the weight of the word, where F is computed using Shannon's entropy and normalized with the normalization factor Z.
The degree of dissimilarity between two symbolic sequences, as defined in [16-18], is the sum over all words of the absolute difference between a word's ranks in the two sequences, weighted by F. The sum is divided by the value L to keep d_r(ψ_1, ψ_2) in the range [0, 1]. A larger value corresponds to a higher degree of dissimilarity. Therefore, for an unknown RR-interval series, if d_N > d_AF, where d_N is the dissimilarity value between the unknown series and normal RR series, and d_AF is the dissimilarity value between the unknown series and AF RR series, the unknown series is more similar to AF, and vice versa.
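A minimal Python sketch of this index follows. It assumes the form used in the earlier IBS work: d_r = (1/L) Σ_k |R_1(s_k) − R_2(s_k)| F(s_k), with F(s_k) = [−p_1(s_k) log p_1(s_k) − p_2(s_k) log p_2(s_k)] / Z and Z the sum of the bracketed terms over all words. The tie-breaking rule for equally frequent words, the handling of unseen words and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def rank_and_prob(words, m):
    """Return (rank, prob) dicts over all 2**m words; rank 1 is the most frequent."""
    if not words:
        raise ValueError("at least one word is required")
    counts = Counter(words)
    total = sum(counts.values())
    all_words = range(2 ** m)
    # Assumption: ties (including unseen words) are broken by word value.
    ordered = sorted(all_words, key=lambda w: (-counts[w], w))
    rank = {w: i + 1 for i, w in enumerate(ordered)}
    prob = {w: counts[w] / total for w in all_words}
    return rank, prob

def entropy_term(p):
    """Shannon entropy contribution -p log p, taken as 0 when p is 0."""
    return -p * math.log(p) if p > 0 else 0.0

def ibs_dissimilarity(words1, words2, m):
    """Weighted mean rank difference between two word sequences, in [0, 1]."""
    L = 2 ** m
    r1, p1 = rank_and_prob(words1, m)
    r2, p2 = rank_and_prob(words2, m)
    h = {w: entropy_term(p1[w]) + entropy_term(p2[w]) for w in range(L)}
    z = sum(h.values())  # normalization factor Z
    if z == 0:
        return 0.0  # assumption: degenerate single-word sequences carry no weight
    return sum(abs(r1[w] - r2[w]) * h[w] / z for w in range(L)) / L
```

Because the weights F sum to 1 and no rank difference can exceed L − 1, the returned value stays below 1, in line with the stated range.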

Overall Algorithm of the Ensemble Model
The development of the proposed ensemble AF detector follows seven key steps:
1. Retrieving sets of AF and NSR RR-interval series from PhysioNet ECG data;
2. Randomly setting aside a percentage of AF and NSR sets as training data and the rest as testing data. In the AF training data, only AF segments were picked according to annotations provided on the PhysioNet database. This procedure was repeated five times such that five datasets (i.e., datasets 1-5) with different training and testing data could be generated;
3. Extracting RR-interval increment signatures (i.e., the rank-frequency of m-bit words) of the desired observation window length from training and testing data, respectively;
4. Building templates to represent AF and NSR signature patterns;
5. Designing an ensemble classifier, which is composed of various pairs of AF and NSR templates;
6. Comparing the information-based dissimilarity index between an unknown RR-interval time series and the templates; and
7. Tuning ensemble parameters to achieve our twin aims of high detection accuracy and low detection variance.
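The repeated random partition in step 2 can be sketched as follows. The training fraction is not stated in the text, so `train_fraction=0.5` is a placeholder assumption, as are the function name and the fixed seed used to make the five datasets reproducible.

```python
import random

def make_splits(af_records, nsr_records, train_fraction=0.5, repeats=5, seed=0):
    """Return `repeats` random train/test partitions of AF and NSR record IDs."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        split = {}
        for label, records in (("AF", af_records), ("NSR", nsr_records)):
            shuffled = list(records)
            rng.shuffle(shuffled)
            cut = int(round(train_fraction * len(shuffled)))
            split[label] = {"train": shuffled[:cut], "test": shuffled[cut:]}
        splits.append(split)
    return splits
```

Splitting by record rather than by segment keeps every testing subject out of the training templates.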

Weighted Dissimilarity Index
Since cardiac patterns can be quite variable between different subjects and even within the same subject, the single-template approach of averaging multiple and lengthy ECG signals together could result in dilution of significant patterns and data. We thus propose creating multiple AF and NSR templates from shorter observation-window segments and incorporating the ensemble method to obtain better predictive performance for detecting AF. Furthermore, we propose using a weighted variation of bootstrap aggregating (bagging) to perform weighted voting when comparing an unknown RR-interval time series with AF and NSR templates.
In the current study, we randomly generated 10 AF templates and 15 NSR templates in each dataset. More NSR templates were created due to the greater variability of NSR beat patterns between healthy subjects. Each template was created from a set of observation-window segments extracted from the original RR time series of one subject. To assess accuracy, we tested AF RR time series that were not part of the training data. We analyzed these series and compared their diagnoses to the ground truth provided by annotations from the PhysioNet database. Among the 10 AF and 15 NSR templates, all possible template pair combinations were compared against each testing RR-interval time series segment, for a total of 150 dissimilarity comparisons. Each pair of dissimilarities was normalized such that d_AF + d_N = 1. Weighted sums of the dissimilarity values were then calculated for the AF and NSR comparisons and averaged over the T pair comparisons to produce weighted average AF and NSR dissimilarity values, i.e., the weighted dissimilarity indices D_AF and D_N. In this study, T = 150.
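The pairing and normalization can be sketched as below. The exact weights are not recoverable from the text here, so the weight used in this sketch (the distance of each normalized pair from a 50/50 split) is a hypothetical stand-in that mimics a confidence-weighted vote, not the authors' formula. The function names are also illustrative.

```python
from itertools import product

def normalize_pair(d_af, d_n):
    """Scale a pair of dissimilarities so that they sum to 1."""
    total = d_af + d_n
    if total == 0:
        return 0.5, 0.5  # both templates match perfectly: no preference
    return d_af / total, d_n / total

def weighted_indices(d_af_values, d_n_values):
    """Return (D_AF, D_N, T) over all AF/NSR template pairs."""
    if not d_af_values or not d_n_values:
        raise ValueError("at least one AF and one NSR dissimilarity are required")
    pairs = [normalize_pair(a, n) for a, n in product(d_af_values, d_n_values)]
    # Hypothetical confidence weight: how far the pair is from a 50/50 split.
    weights = [abs(n - a) for a, n in pairs]
    total_weight = sum(weights)
    if total_weight == 0:  # every pair is undecided: fall back to a plain mean
        weights = [1.0] * len(pairs)
        total_weight = float(len(pairs))
    D_af = sum(w * a for w, (a, _) in zip(weights, pairs)) / total_weight
    D_n = sum(w * n for w, (_, n) in zip(weights, pairs)) / total_weight
    return D_af, D_n, len(pairs)
```

With 10 AF and 15 NSR dissimilarity values per test segment, `product` enumerates the 150 template pairs, matching T = 150.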

Parameter Tuning
In addition to the m and n parameters, we include a tuning parameter ∆ to optimize our final results. The final step in our computation is to compare the weighted averages of the AF and NSR dissimilarity values, where the smaller dissimilarity index determines the predicted diagnosis of a given test segment. While AF detection accuracy remains high, with optimum accuracies between 98% and 100%, the NSR accuracy can fall as low as 70% to 80%. When observing the normalized dissimilarity values, many of the false positive diagnoses during NSR testing had AF and NSR dissimilarity values that were close to a 50/50 weighting (e.g., a 0.49 weighted AF dissimilarity index compared to a 0.51 weighted NSR dissimilarity index). Although the dissimilarities were close in value, our greater/less than comparison scheme caused many NSR segments to be falsely diagnosed as AF. Using a bias factor to adjust the decision boundary is a well-established statistical method when training data may exhibit an imbalanced distribution [42]. As a result, we implemented a tuning parameter ∆ to shift the dissimilarity comparison boundary and yield more accurate results: if D_N > D_AF + ∆, then the segment is AF; otherwise, the segment is non-AF. For each combination of word length m and observation window n, we tested ∆ values between 0.00 and 0.19 to find the optimum sensitivity and specificity combination.
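The shifted decision rule is simple enough to state directly in code. The sketch below uses illustrative function names, and the 0.01 step of the ∆ grid is an assumption, since the text gives only the range 0.00 to 0.19.

```python
def classify(D_af, D_n, delta):
    """Label a segment AF only if D_N exceeds D_AF by more than delta."""
    return "AF" if D_n > D_af + delta else "non-AF"

def delta_grid(start=0.00, stop=0.19, step=0.01):
    """Candidate delta values from start to stop inclusive (step is assumed)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]
```

With the near-50/50 example from the text, `classify(0.49, 0.51, 0.0)` labels the segment AF, whereas any ∆ of 0.02 or more labels it non-AF, which is how the bias removes borderline false positives.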
Thus, in our ensemble model, we use three parameters: the word length m, the observation window n, and a bias parameter ∆. We experimented with m from 4 to 12, n from 50 to 200, and different values of ∆. We aimed to find the best parameter setting(s) and to assess how sensitive the results are to different parameter settings. To evaluate the predictive performance of our ensemble model, the sensitivity (SEN), specificity (SPE) and overall accuracy (ACC) were calculated, and repeated cross-validation was performed. SEN is defined as (True Positive)/(True Positive + False Negative), SPE is defined as (True Negative)/(True Negative + False Positive), and ACC is defined as (True Negative + True Positive)/(True Negative + True Positive + False Negative + False Positive).
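These three definitions translate directly into code. The function name is illustrative, and the counts in the usage note are made-up numbers, not results from this study.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (SEN, SPE, ACC) from confusion-matrix counts."""
    sen = tp / (tp + fn)                   # sensitivity: detected AF / all AF
    spe = tn / (tn + fp)                   # specificity: detected non-AF / all non-AF
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    return sen, spe, acc
```

For instance, hypothetical counts of 90 true positives, 10 false negatives, 180 true negatives and 20 false positives give 0.90 for all three measures.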

Overall Performance of the Ensemble Model
Our ensemble model achieved high sensitivity, specificity, and accuracy across the cross-validation datasets. Table 1 summarizes the SEN, SPE and ACC of the prediction models for the different cross-validation datasets. Optimal AF-detection parameters are m = 9 and an observation window of n = 150, achieving 97.04% sensitivity, 97.96% specificity, and 97.78% overall accuracy. SEN, SPE, and overall ACC vary little despite changes in word length m, observation window n and cross-validation dataset (see Table 1). The performance of the ensemble model (m = 9, n = 150) for different values of the tuning parameter ∆ is shown in Table 2. The ensemble model performed best with ∆ = 0.08, achieving 96.30% sensitivity, 98.71% specificity and 98.41% accuracy.

Continuous Behavior of the Detector
Here we present a graphic description of the testing results using the proposed ensemble detection model for the case of m = 9, observation window n = 150, and ∆ = 0.1 (see Figure 4). The representative testing data in Figure 4 were taken from the PhysioNet MIT-BIH AFDB database, record number 04908. The upper panel in Figure 4a displays the testing data, i.e., the 10-h raw RR-interval time series recorded from a PAF patient. The lower panel in Figure 4a displays the testing results: D_AF (the black curve) represents the weighted average dissimilarity index between the testing RR-interval time series and the AF templates, and D_N (the red curve) represents the weighted average dissimilarity index between the testing RR-interval time series and the NSR templates. D_AF + 0.1 > D_N indicates that the dynamic patterns of the testing RR intervals are similar to normal beats; otherwise, the testing RR intervals are similar to AF beats. The black step line indicates the ground truth of the AF episodes (marked as AF or non-AF on the y axis) as reported in the annotations of the PhysioNet database, and the detection results for the testing data are shown by the red step line, achieving 94.40% sensitivity, 99.59% specificity, and 97.01% overall accuracy. Moreover, we enlarge a representative AF segment (see the red rectangle in Figure 4a) to show the detailed fluctuations of an AF episode (see Figure 4b). In addition to normal beats, our detection model can successfully distinguish AF episodes from non-AF fluctuations, e.g., the segment of beat numbers 18,800-22,100 in the testing RR-interval time series in Figure 4a, which might be caused by R-peak detection failure or other problems in this segment. For comparison with the AF signal, the time series inside the blue rectangle in Figure 4a is enlarged in Figure 4c to show the signal details.
In the case of sporadic AF episodes of very short duration (e.g., less than 30 s), the performance of the proposed detection model is unsatisfactory. Here we take record 04043 from the MIT-BIH AFDB database as a representative example (see Figure 5). The testing RR-interval time series is shown in the upper panel. With m = 9, n = 150, and ∆ = 0.05, the weighted average dissimilarity indices D_AF (the black curve) and D_N (the red curve) are calculated and shown in the lower panel. D_AF + 0.05 > D_N indicates that the testing RR intervals are similar to normal beats; otherwise, they are similar to AF beats. The black step line indicates the ground truth, and the red step line indicates the detection results (AF or non-AF). The detection performance for this record is 89.55% sensitivity, 67.43% specificity and 72.05% accuracy. The solid and hollow triangles marked on the x axis show some examples of false positive and false negative brief segments.

Main Findings
In this study, we present a fully automated ensemble model for AF episode detection based on RR-interval time series, applying a novel approach that combines the information-based similarity index with an ensemble scheme. By mapping RR time series to binary sequences and comparing the rank-frequency patterns of m-bit words, this study provides quantitative information to enhance the categorization of AF and normal cardiac rhythms. In addition, using a weighted variation of bagging with multiple AF and NSR templates, we can obtain results with low variance and high accuracy. By performing dissimilarity comparisons across multiple templates, we are able to account for RR-interval increment variations between different subjects. Based on our results, optimal AF-detection parameters are symbolic word length m = 9 and observation window n = 150, achieving 97.04% sensitivity, 97.96% specificity, and 97.78% overall accuracy. Sensitivity, specificity, and overall accuracy vary little despite changes in the m and n parameters. Our findings indicate that the information-based similarity index is relatively reliable in distinguishing AF episodes within long-term ECG recordings.

Advantages of Ensemble Model
AF and NSR patterns may not be consistent among different subjects. In addition, not only do different patients exhibit different patterns, but the ECG of the same subject also fluctuates across different activities, such as during wakefulness versus during sleep. This first observation leads us to design multiple templates for each target class to sufficiently represent AF and NSR patterns. Second, a final class detection, AF or NSR, should be a joint decision made by a committee (or an ensemble), with each member consisting of an AF and NSR template pair. The final decision is made through voting by the committee members. When a committee member strongly endorses one class over the other, that member's vote should be weighted higher compared to the vote of a "lukewarm" committee member.
This observation motivated us to design a weighted voting scheme, which reinforces high-confidence votes but discounts low-confidence votes. The empirical study shows that such an ensemble scheme reduces detection noise and hence leads to lower detection variance.
Most of the detectors in Table 3 employ a window length of 127 or 128 beats [15,16,18,19,21,22]; however, detectors with a 128-beat window tend to miss brief clinical episodes. It is important to also consider the ability to detect brief AF episodes when evaluating detector performance. Table 3 also compares the shortest length of the detected AF episodes. We investigated the performance of the proposed detector with windows from 50 to 200 beats. With a short detection window of n = 50 beats, it achieves 89.58% sensitivity, 90.32% specificity, and 90.04% accuracy (see Table 1). The proposed detection process involves the estimation of probabilities, and a shorter window implies increased statistical uncertainty. Petrėnas, et al. [20], Lee, et al. [19], Lake, et al. [23] and Lian, et al. [15] reported on performance for shorter windows. Specifically, Lake, et al. [23] worked with a short window of 12 beats (91% sensitivity and 94% specificity). Petrėnas, et al. [20] used an even shorter window of only 8 beats.
Figure 6 shows the computation time according to word length m and observation window n, with ∆ changing from 0.00 to 0.19 in each computation. Compared to the observation window, the word length has more influence on the computation time. The computation time is between 6.09 and 6.42 ms with m = 4, and between 23.51 and 28.72 ms with m = 12. With the optimal AF-detection parameters (m = 9, n = 150), the computation time is 9.12 ms (programs run in MATLAB R2015a on an Intel(R) Core(TM) i7-6700k CPU @ 4.00 GHz processor, Lenovo, Beijing, China). This shows that our algorithm can be run in real time for practical applications, and that it is faster than many other algorithms: 20-30 ms with observation segment seg = 128 and 3-4 ms with seg = 12 in Lee, et al. [19], 5.2 s with seg = 128 in Lake and Moorman [23], 200 ms with seg = 128 in Dash, et al. [18], and 3 s with seg = 100 in Tateno and Glass [12].

Study Limitations and Future Work
The selection of the observation window places a lower bound on the length of AF episode that can be detected. This approach is suitable for AF episodes that are prolonged, and the detection results reveal limited performance for sporadic AF episodes of very short duration (e.g., less than 30 s). From a diagnostic point of view, using multiple and complementary methods to detect AF episodes may be helpful. Our new approach complements conventional approaches of AF detection, since our algorithm is based on a completely different concept from other approaches. This algorithm may also be easily adapted to other physiological and physical time series data, provided that a meaningful symbolic mapping rule can be defined.
We propose the implementation of a confidence-based voting scheme because of the disproportionate weighting that may be given to certain dissimilarity values. For example, if a dissimilarity computation for a given test segment yields a 0.90 AF dissimilarity and a 0.95 NSR dissimilarity, the values suggest that the test segment is quite dissimilar from both templates. However, normalization of the dissimilarity values results in approximately 0.49 AF dissimilarity and 0.51 NSR dissimilarity. In a second case, where a test segment yields a 0.15 AF dissimilarity and a 0.10 NSR dissimilarity, the test segment is much more similar to each template and thus its weighting may be more significant. However, the dissimilarity values would be normalized to 0.60 AF and 0.40 NSR, which are of the same order as those of the first case despite the much smaller raw dissimilarities. In a future study, we aim to test whether dissimilarity values such as those in the first case should be regarded with less confidence than dissimilarity values such as those in the second case.
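The two cases can be checked with a short script, assuming the normalization is d / (d_AF + d_N), which is the form implied by the constraint that each normalized pair sums to 1. The `raw_confidence` helper is purely hypothetical: it is one possible way to express how close a segment is to both templates before normalization, not a scheme taken from this study.

```python
def normalize_pair(d_af, d_n):
    """Scale a pair of dissimilarities so that they sum to 1."""
    total = d_af + d_n
    return d_af / total, d_n / total

def raw_confidence(d_af, d_n):
    """Hypothetical confidence: 1 minus the mean raw dissimilarity."""
    return 1.0 - (d_af + d_n) / 2.0

case_1 = normalize_pair(0.90, 0.95)  # dissimilar from both templates
case_2 = normalize_pair(0.15, 0.10)  # similar to both templates

confidence_1 = raw_confidence(0.90, 0.95)
confidence_2 = raw_confidence(0.15, 0.10)
```

Normalization discards the absolute scale of the two dissimilarities, so any confidence term has to be computed from the raw values, as the hypothetical helper does.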
In addition to confidence-based voting, we will perform cross-validation with different sets of training data from the AFDB and NSRDB to further validate our current results.

Figure 1. Representative inter-beat (RR) interval time series derived from an electrocardiographic recording of a patient with paroxysmal atrial fibrillation. The dark circles represent consecutive RR intervals and the solid line indicates the presence/absence of AF episodes as reported in the annotations of the PhysioNet database. During an episode of atrial fibrillation, the line is set to "AF"; otherwise it is set to "Non-AF", which means a rhythm that is not atrial fibrillation.

Figure 3. Representative inter-beat time series for a healthy subject (a) and an AF patient (b); (c) rank order comparison of the time series for two healthy subjects; (d) rank order comparison of the time series for two AF patients; (e) rank order comparison of the time series in (a,b). The results in (c-e) are for the case m = 6.

In the weighted dissimilarity index, d_N and d_AF enter as normalized values, D_N and D_AF represent their weighted averages, and T represents the total number of dissimilarity pair comparisons.

Figure 4. (a) Graphic illustration of detection results for a testing record (record 04908, m = 9, observation window n = 150, ∆ = 0.1); (b) enlarged AF segment derived from (a); (c) enlarged signal segment of neither AF nor normal beats, for comparison with (b).

Figure 6. Computation time according to m and n (the computation times are the average values of 100 trials).

Table 1. Detection performance (%) for different m and n parameters.

Table 2. Performance of the ensemble model (m = 9, n = 150) for different values of the tuning parameter ∆.

Table 3. Comparison of detector performance on the MIT-BIH Atrial Fibrillation Database (AFDB).
