Towards Real-Time Heartbeat Classification: Evaluation of Nonlinear Morphological Features and Voting Method

Abnormal heart rhythms are a significant health concern worldwide. In current practice, abnormal heartbeats are recognized and classified manually, by visual inspection by an expert practitioner. This is not only tedious but also error prone, and because it is performed after the recording, it may add unnecessary delay to care. The real key to the fight against cardiac diseases is real-time detection that triggers prompt action. The biggest hurdle to real-time detection is the rarity of abnormal heartbeats, and even more so of some rare types that are underrepresented in signal datasets; this is what makes them difficult for doctors and algorithms to recognize. This work presents an automated heartbeat classification method based on nonlinear morphological features and a voting scheme suited to rare heartbeat morphologies. Although the algorithm was designed and tested on a computer, it is ultimately intended to run on portable devices, i.e., field-programmable gate arrays (FPGAs). Our algorithm was tested on the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) database following the Association for the Advancement of Medical Instrumentation (AAMI) recommendations. The simulation results show the superiority of the proposed method, especially in predicting the minority groups: the fusion and unknown classes, with 90.4% and 100%, respectively.

In the class-oriented approach, some or all of the 16 beat types in the MIT-BIH database, including normal beats, are selected for classification. In [4], 17 types of heartbeats, including normal and pacemaker beats, are classified using features based on various power spectral density methods. A novel genetic algorithm is then used to identify the optimal features, which are finally fed to various standard machine learning algorithms. In [19], 13 types of heartbeats are classified with a support vector machine (SVM) using a combination of higher-order statistics (HOS) of the ECG and Hermite basis representation features. In [18], six types of heartbeats are classified using a nearest neighbor classifier based on the local fractal dimension. In [21], seven types of heartbeats are classified using gray relational analysis. In [20], Ye et al. designed a heartbeat classification algorithm using dynamic and morphological ECG features. For morphological feature extraction, the wavelet transform is combined with a dimensionality reduction technique, independent component analysis (ICA); R-R intervals are used as dynamic features. These features are then fed to an SVM to classify 16 types of heartbeats. In [8], a novel genetic ensemble of classifiers is proposed, in which a new genetic training scheme coupled with genetic optimization is used to classify 17 types of heartbeats. In [17], statistical and nonlinear features are derived from the modes obtained by empirical mode decomposition (EMD) and provided to a one-against-one SVM to classify five types of heartbeats. In [22], ventricular extrasystoles (ectopic beats) are recognized with the help of morphology matching, R-R intervals, and clustering algorithms.
In [6], 17 types of ECG beats are classified using hexadecimal local patterns calculated from wavelet sub-bands. In [7], five primary types of heartbeats are classified using features based on ensemble empirical mode decomposition (EEMD) and a sequential minimal optimization SVM (SMO-SVM). Neural networks also play a crucial role in biological signal analysis [23]. Recently, deep learning-based class-oriented schemes have emerged. Deep learning techniques are a subset of machine learning techniques built on neural networks with many hidden layers. In [9,10], 17 types of heartbeats are classified using a 1D convolutional neural network (1D-CNN) and a novel three-layer deep genetic ensemble of classifiers.
In the subject-oriented approach, the heartbeats of the MIT-BIH database are grouped into five classes according to the American National Standards Institute/Association for the Advancement of Medical Instrumentation (ANSI/AAMI) EC57:1998 standard: non-ectopic (N), supraventricular ectopic (S), ventricular ectopic (V), fusion (F), and unknown (Q). Two strategies are used for classifying these groups: intra-patient and inter-patient schemes. The fundamental difference between them lies in how the training and testing datasets are separated. Intra-patient methods are widely explored in the literature [24][25][26][27][28][29][30]. However, these approaches have less impact in real-time scenarios, because in real-time applications the subject being tested is usually unknown to the constructed model. Thus, the model must be able to capture inter-individual variations in the ECG. In an intra-patient model, data from the same subject may appear in both the training and testing sets. To mitigate this issue, de Chazal et al. [31] introduced inter-patient heartbeat classification, in which the MIT-BIH database is separated into two groups, one for training and the other for testing, ensuring that no subject contributes data to both groups.
The advantages of the aforementioned computer-aided expert systems can be fully exploited only once real-time systems are developed. In recent years, several field-programmable gate array (FPGA)-based ECG signal analysis systems have been implemented. In [32], an FPGA-based heartbeat classification system is developed using a least-squares linear-phase finite impulse response filter and a feed-forward neural network. In [33], three common types of arrhythmia beats, namely premature ventricular contraction, ventricular fibrillation, and heart block, are classified along with normal beats in a real-time FPGA implementation. In [34], an intra-patient arrhythmia classification scheme is implemented on an FPGA. However, most successful FPGA-based systems follow an intra-patient scheme, and very few real-time systems are based on inter-patient schemes [35]. Moreover, these systems still fail to detect rare abnormal beats accurately. Hence, there is a need for a new expert system that can reliably identify rare heartbeats.

Contribution
In this paper, we present an efficient inter-patient heartbeat classification algorithm. For any pattern recognition process, identifying an appropriate set of features and a suitable classifier is essential. As noted in [36], the ECG is a non-stationary, non-Gaussian signal generated by nonlinear systems. Hence, we employ a decomposition method, the improved complete ensemble empirical mode decomposition (ICEEMD), to obtain features from the ECG beats. This technique can reveal the implicit information in the ECG. Nonlinear measures, namely entropies and HOS, are then computed from the modes obtained by ICEEMD and serve as features for discriminating between the heartbeat groups. The fundamental difficulty in classifying these groups is class imbalance: the large majority of heartbeats are non-ectopic, so the results may be biased toward the majority group, which is undesirable. To alleviate this issue, we follow an algorithm-level approach: classification based on majority voting, which is a type of ensemble classification. The advantage of ensemble classification is that it can reduce both variance and bias. In this work, we use different combinations of the following classifiers for majority voting: naïve Bayes, linear and quadratic discriminant functions, J48, and consolidated J48.
The rest of the paper is organized as follows: Section 2 describes the ECG dataset, the division into training and testing data with AAMI labeling, the experimental details, and the theoretical background of the methodology. Section 3 presents the simulation results of the proposed method. Section 4 presents the comparison with existing works, the limitations, and future directions. Section 5 concludes the paper.

Methods
The block diagram of the proposed method is illustrated in Figure 1. The methodology consists of three stages: pre-processing, feature extraction on the training and testing data, and a classification model for evaluation. This section discusses the database used and the theoretical background of the techniques employed.

Database
The proposed method is evaluated on the MIT-BIH arrhythmia database, a standard database widely used for arrhythmia classification. It comprises Holter monitoring records from several male and female patients. Each record is 30 minutes long and sampled at 360 Hz. The records contain normal and abnormal beats of 15 types.
The annotation files available in the database were produced by experts from the chart recordings. These files give the 'R' peak locations and the labels of normal and abnormal beats. Class labels for the various heartbeat groups were assigned based on the AAMI recommendations.

AAMI Class Labeling Recommendations
According to the ANSI/AAMI EC57:1998 standard, the beat labels in the annotation files are divided into five groups, N, S, V, F, and Q, based on the physiological origin of the beats. The N group mainly consists of normal and bundle branch block beats. The S and V groups consist of ectopic beats originating above and below the atrioventricular (A-V) junction of the heart, respectively. The F group consists of fusions of ventricular and normal beats. Unclassifiable beats are placed in the Q group. Following [31], the available heartbeats are divided into a training set for modeling and a testing set for evaluation. The number of heartbeats used in this work is given in Table 1.
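The grouping above can be sketched as a lookup from MIT-BIH beat annotation symbols to AAMI groups. The symbol table below is the mapping commonly used in inter-patient studies following de Chazal et al.; it is an assumption here, since the exact table used in this work is not listed, and the names `AAMI_MAP`, `to_aami`, and `group_counts` are hypothetical. Non-beat annotations (e.g., rhythm or noise markers) are simply dropped.

```python
from collections import Counter

# Assumed MIT-BIH symbol -> AAMI group mapping (common inter-patient convention);
# handling of paced records is not shown.
AAMI_MAP = {
    "N": "N", "L": "N", "R": "N", "e": "N", "j": "N",  # non-ectopic
    "A": "S", "a": "S", "J": "S", "S": "S",            # supraventricular ectopic
    "V": "V", "E": "V",                                # ventricular ectopic
    "F": "F",                                          # fusion
    "/": "Q", "f": "Q", "Q": "Q",                      # paced / unclassifiable
}

def to_aami(symbols):
    """Map beat annotation symbols to AAMI groups, dropping non-beat labels."""
    return [AAMI_MAP[s] for s in symbols if s in AAMI_MAP]

def group_counts(symbols):
    """Count beats per AAMI group in the fixed order N, S, V, F, Q."""
    counts = Counter(to_aami(symbols))
    return {g: counts.get(g, 0) for g in "NSVFQ"}
```

For example, `group_counts(["N", "N", "V", "+", "A", "F", "/", "~"])` ignores the `+` and `~` markers and counts two N beats and one beat in each of the other groups.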

Pre-Processing
Pre-processing is the initial step in any data processing system. Raw ECG signals inherently contain artifacts caused by instrumental noise (power line interference), physiological disturbances (muscle movements), or the environment in which the recording takes place. These artifacts are undesirable and obscure significant features of the ECG. Therefore, to attenuate this noise, we perform denoising as a pre-processing step, using the filtering routine proposed in [37] with minimal modification. The routine comprises the following steps: 1. mean removal from the noisy ECG; 2. a moving average filter of order five; 3. a high-pass filter with a cut-off frequency of 1 Hz (for baseline wander suppression); 4. a low-pass Butterworth filter with a cut-off frequency of 45 Hz (to suppress any remaining high-frequency noise).
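The four denoising steps can be sketched with SciPy as follows. This is not the routine of [37]: the filter orders, the use of a Butterworth design for the high-pass stage, the 5-tap interpretation of "order five", and zero-phase (forward-backward) filtering are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess(ecg, fs=360.0, hp_cut=1.0, lp_cut=45.0, order=4):
    """Sketch of the four denoising steps (orders and zero-phase filtering assumed)."""
    x = np.asarray(ecg, dtype=float)
    # 1. Mean removal
    x = x - x.mean()
    # 2. Moving average filter, read here as a 5-sample centered window
    x = np.convolve(x, np.ones(5) / 5, mode="same")
    # 3. High-pass at 1 Hz to suppress baseline wander
    sos_hp = butter(order, hp_cut, btype="highpass", fs=fs, output="sos")
    x = sosfiltfilt(sos_hp, x)
    # 4. Low-pass Butterworth at 45 Hz for residual high-frequency noise
    sos_lp = butter(order, lp_cut, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos_lp, x)
```

Second-order sections are used because a 1 Hz cut-off at 360 Hz is a very low normalized frequency, where the transfer-function form is more prone to numerical trouble.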
Heartbeat classification requires individual heartbeats from the long-term ECG recording, so we segment the signal after denoising. For segmentation, the 'R' peak locations from the annotation files are used. The annotations show that the spacing between successive R peaks varies considerably over time. Hence, we apply a window of 300 samples to the ECG signal to obtain a segment that covers the QRS complex, an important epoch of the ECG. Unlike segmentation methods centered on the R peak, our segmentation process also retains other important epochs such as the P and T waves.
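At 360 Hz, a 300-sample window spans 300/360 ≈ 0.83 s. The sketch below cuts such windows at annotated R peaks. The paper does not state where the window starts relative to the R peak, so the `pre` offset (samples before the R peak) is a hypothetical parameter, as is the choice to skip beats whose window runs past either end of the record.

```python
import numpy as np

def segment_beats(ecg, r_peaks, win=300, pre=100):
    """Cut fixed-length windows at annotated R peaks.
    `pre` (samples kept before each R peak) is a hypothetical choice."""
    ecg = np.asarray(ecg, dtype=float)
    beats, kept = [], []
    for r in r_peaks:
        start = r - pre
        stop = start + win
        if start < 0 or stop > len(ecg):
            continue  # window would fall outside the record
        beats.append(ecg[start:stop])
        kept.append(r)
    return np.array(beats).reshape(-1, win), np.array(kept, dtype=int)
```

Returning the kept R-peak positions alongside the segments keeps each window traceable to its annotation label.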

Feature Extraction
Feature extraction plays a critical role in heartbeat classification. Features carry crucial information about a signal and facilitate the discrimination of classes. From [36], it is evident that the ECG is a non-stationary signal stemming from a nonlinear system. Hence, analyzing the ECG with nonlinear methods can improve the performance of a model, since such methods extract subtle information from the ECG. Therefore, in the feature extraction stage, we first apply ICEEMD to the ECG segments to obtain intrinsic mode functions (IMFs). Entropy and higher-order cumulants are then extracted from the selected modes. The techniques employed and their role in the methodology are briefly discussed in this section.

ICEEMD
EMD decomposes a given signal in a fully data-driven manner by exploiting its local characteristics. However, EMD suffers from the "mode-mixing" problem when analyzing real data [38]. Noise-assisted data analysis methods can provide a solution: noise is added in a controlled manner to create new extrema, so that the local mean is constrained to that of the original signal where extrema are generated. Examples of such noise-assisted methods are EEMD [39] and CEEMD [40]; of these, CEEMD provides the better solution to the mode-mixing problem. However, CEEMD has some limitations: (i) the modes may contain some residual noise; (ii) compared with EEMD, information may appear "late", with spurious modes, in the early stages of the decomposition.
To address these issues, Colominas et al. [41] introduced a new noise-assisted adaptive data analysis method called ICEEMD. The mathematical details of ICEEMD are given below [41].
Notation used in the algorithm: $E_l(\cdot)$ is the operator that produces the $l$th EMD mode, $M(\cdot)$ is the local mean of a signal, $\langle\cdot\rangle$ is the averaging operator over realizations, $w^{(j)}$ is a realization of white Gaussian noise with zero mean and unit variance, and $x$ is the input signal.
The algorithm steps are:
1. Compute, using EMD, the local means of $J$ realizations $x^{(j)} = x + \beta_0 E_1(w^{(j)})$, $j = 1, 2, \dots, J$, to obtain the first residue $r_1 = \langle M(x^{(j)}) \rangle$.
2. At the first stage ($l = 1$), compute the first IMF:
$$d_1 = x - r_1.$$
3. For $l = 2, \dots, L$, calculate $r_l$ as
$$r_l = \left\langle M\left(r_{l-1} + \beta_{l-1} E_l(w^{(j)})\right) \right\rangle.$$
4. Calculate the $l$th mode as
$$d_l = r_{l-1} - r_l.$$
5. Go to step 3 for the next $l$.
Here, $\beta_l = \varepsilon_0 \sigma(r_l)$ is used to obtain the desired SNR at each stage; we choose $\varepsilon_0 = 0.2$. The resulting IMFs capture significant underlying features of the ECG signal. ICEEMD is a useful tool for analyzing non-stationary signals originating from nonlinear systems, such as bio-signals. Its main advantages are that it avoids spurious modes and reduces the amount of noise in the modes. Thus, the decomposed IMFs capture the morphology of the signal. Entropy and statistical measures are then calculated from the first six modes of each ECG segment.
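The steps above can be sketched in NumPy/SciPy as below. This is a simplified illustration, not the implementation used in this work (which ran in MATLAB): the EMD uses a fixed number of sifting iterations instead of a stopping criterion, envelope boundaries are simply pinned to the end samples, a signal with too few extrema is treated as a trend whose local mean is itself, and $\beta_0$ is computed with the same formula using $r_0 = x$. The helper names `local_mean`, `emd_modes`, and `iceemdan` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def local_mean(x):
    """M(x): mean of cubic-spline envelopes through maxima and minima.
    With too few extrema, x is treated as a trend and returned unchanged."""
    t = np.arange(len(x))
    imax = argrelextrema(x, np.greater)[0]
    imin = argrelextrema(x, np.less)[0]
    if len(imax) < 2 or len(imin) < 2:
        return x.copy()
    # Simplified boundary handling (assumption): pin both envelopes to the end samples
    imax = np.r_[0, imax, len(x) - 1]
    imin = np.r_[0, imin, len(x) - 1]
    upper = CubicSpline(imax, x[imax])(t)
    lower = CubicSpline(imin, x[imin])(t)
    return 0.5 * (upper + lower)

def emd_modes(x, n_modes, n_sift=10):
    """First n_modes EMD modes using a fixed number of sifting steps (assumption)."""
    modes, r = [], np.asarray(x, dtype=float).copy()
    for _ in range(n_modes):
        h = r.copy()
        for _ in range(n_sift):
            h = h - local_mean(h)
        modes.append(h)
        r = r - h
    return modes

def iceemdan(x, n_modes=6, n_real=50, eps0=0.2, seed=0):
    """Return (modes, residue) such that x == modes.sum(axis=0) + residue."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # E_l(w^(j)): EMD modes of each white-noise realization, computed once
    noise_modes = [emd_modes(rng.standard_normal(len(x)), n_modes) for _ in range(n_real)]
    modes, r_prev = [], x
    for l in range(n_modes):
        beta = eps0 * np.std(r_prev)  # beta_{l-1} = eps0 * sigma(r_{l-1}), r_0 = x
        # r_l = < M(r_{l-1} + beta_{l-1} E_l(w^(j))) >
        r = np.mean([local_mean(r_prev + beta * nm[l]) for nm in noise_modes], axis=0)
        modes.append(r_prev - r)      # d_l = r_{l-1} - r_l
        r_prev = r
    return np.array(modes), r_prev
```

Because each mode is defined as a difference of consecutive residues, the modes and the final residue always sum back to the input, which is a convenient sanity check for any ICEEMD implementation.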

Entropy Measures
Entropy measures the uncertainty in given data and is often used in signal processing and pattern recognition applications [42]. A high entropy value corresponds to higher uncertainty or unpredictability. Entropy yields useful information for analyzing non-stationary signals [43]. In this work, we calculated the Shannon [44], log energy, and norm entropies [45]. The entropy E must be an additive cost function such that E(0) = 0 and
$$E(s) = \sum_i E(s_i),$$
where $s$ is the given signal and $s_i$ its value at the $i$th discrete state. The entropies used are defined as follows:
• Shannon entropy: $E(s) = -\sum_i s_i^2 \log(s_i^2)$, with the convention $0 \log(0) = 0$.
• Log energy entropy: $E(s) = \sum_i \log(s_i^2)$, with the convention $\log(0) = 0$.
• Norm entropy: the $l^p$ norm entropy with $1 \le p$ is defined as $E(s) = \sum_i |s_i|^p$.
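The three entropies can be sketched directly from their definitions. The natural logarithm and the default $p = 2$ for the norm entropy are assumptions, since the paper does not state them; the forms follow the standard wavelet-entropy definitions (as in MATLAB's `wentropy`).

```python
import numpy as np

def shannon_entropy(s):
    """E(s) = -sum s_i^2 log(s_i^2), with 0 log 0 = 0 (natural log assumed)."""
    s2 = np.asarray(s, dtype=float) ** 2
    s2 = s2[s2 > 0]
    return float(-np.sum(s2 * np.log(s2)))

def log_energy_entropy(s):
    """E(s) = sum log(s_i^2), with the convention log(0) = 0."""
    s2 = np.asarray(s, dtype=float) ** 2
    return float(np.sum(np.log(s2[s2 > 0])))

def norm_entropy(s, p=2.0):
    """l^p norm entropy E(s) = sum |s_i|^p, requiring 1 <= p (p = 2 assumed)."""
    if p < 1:
        raise ValueError("norm entropy requires p >= 1")
    return float(np.sum(np.abs(np.asarray(s, dtype=float)) ** p))
```

Each function is a sum of per-sample terms, so it satisfies the additivity requirement $E(s) = \sum_i E(s_i)$ stated above.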

HOS
HOS provide meaningful measures for analyzing non-stationary signals originating from nonlinear systems [46,47]. HOS quantify the deviation from Gaussianity and can therefore extract useful information from the non-Gaussian ECG. In our work, we use the second-, third-, and fourth-order cumulants as HOS. The mathematical details of HOS can be found in [48].
We construct a 36 × 1 feature vector for each heartbeat (6 features × 6 modes = 36). The training feature set is then fed to a classifier to build a model, which is evaluated on the testing set.
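The 36-dimensional feature vector can be assembled as sketched below. Assumptions: the cumulants are zero-lag sample cumulants computed from biased moment estimates (the exact estimator of [48] is not specified), the per-mode feature order is arbitrary, and the entropies are written inline (natural log, $p = 2$) so that the block runs on its own. `cumulants`, `mode_features`, and `feature_vector` are hypothetical names.

```python
import numpy as np

def cumulants(x):
    """Zero-lag sample cumulants of order 2, 3, 4 (biased moment estimates assumed)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))
    return m2, m3, m4 - 3.0 * m2 ** 2

def mode_features(s, p=2.0):
    """Six features of one IMF: Shannon, log energy, norm entropy, c2, c3, c4."""
    s = np.asarray(s, dtype=float)
    s2 = s ** 2
    nz = s2 > 0
    shannon = -np.sum(s2[nz] * np.log(s2[nz]))
    log_energy = np.sum(np.log(s2[nz]))
    norm = np.sum(np.abs(s) ** p)
    return [shannon, log_energy, norm, *cumulants(s)]

def feature_vector(imfs):
    """Concatenate the six features of each of the first six IMFs into 36 values."""
    imfs = np.asarray(imfs, dtype=float)[:6]
    if imfs.shape[0] != 6:
        raise ValueError("expected at least six IMFs")
    return np.concatenate([mode_features(m) for m in imfs])
```

For `x = [1, 2, 3, 4]`, `cumulants(x)` gives a variance of 1.25, a zero third-order cumulant (the data are symmetric), and a fourth-order cumulant of 2.5625 − 3 × 1.25² = −2.125.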

Voting Scheme
The ultimate goal of machine learning is good generalization performance. This raises the question of which learning algorithm or classifier should be preferred over another. According to the "No Free Lunch" theorem [49], there is no definitive answer: an algorithm that performs well for one set of training and testing data may fail for another. The overall performance of a learning algorithm depends on prior information, the distribution of the data, the amount of training data, and the cost functions used. Generalization performance depends on the bias and variance errors, and there is always a trade-off between the two. Ensemble classifiers are a good choice for improving generalization by reducing both bias and variance. Combining several classifiers for the final decision is called ensemble classification, the mixture-of-experts model, or modular classification.
The primary motivation behind a classifier ensemble is to improve classification performance by using the complementary information offered by different classifiers. Kittler et al. [50] developed a scheme for combining classifiers by voting based on a set of rules: the min, max, product, sum, and median rules. In our experiments, the product rule outperformed the others, so we adopted it.
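The five combination rules can be compared on per-classifier posterior estimates with a short sketch. It uses the simplified forms of the rules that hold under equal class priors (the prior-corrected product rule is derived below); the names `RULES` and `combine` are hypothetical, and the posterior values in the example are made up for illustration.

```python
import numpy as np

# Simplified combination rules (equal class priors assumed), applied over the
# classifier axis of a (R, n_samples, n_classes) array of posteriors.
RULES = {
    "product": lambda P: np.prod(P, axis=0),
    "sum": lambda P: np.sum(P, axis=0),
    "max": lambda P: np.max(P, axis=0),
    "min": lambda P: np.min(P, axis=0),
    "median": lambda P: np.median(P, axis=0),
}

def combine(posteriors, rule="product"):
    """Combine posteriors of R classifiers and return the winning class index per sample."""
    P = np.asarray(posteriors, dtype=float)
    return np.argmax(RULES[rule](P), axis=-1)
```

With three classifiers giving posteriors (0.7, 0.2, 0.1), (0.1, 0.5, 0.4), and (0.3, 0.5, 0.2) for one beat, the max rule picks class 0 (driven by the single confident vote), whereas the product, sum, min, and median rules all pick class 1. The rules can therefore disagree, which is why the choice among them was made experimentally.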
Mathematical Framework: Consider a pattern recognition problem in which a pattern y is to be assigned to one of m possible classes $(\omega_1, \omega_2, \dots, \omega_m)$, and suppose that R classifiers are combined. Assume that each classifier uses a different representation of the pattern, given by the measurement vector $x_i$, $i = 1, \dots, R$. In the measurement space, each class $\omega_k$ is modeled by the density function $p(x_i|\omega_k)$ and has prior probability $P(\omega_k)$. We assume that the models are mutually exclusive.
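From the Bayesian framework, y is assigned to the class $\omega_j$ with the maximum posterior probability among the m classes:
$$y \to \omega_j \quad \text{if} \quad P(\omega_j|x_1, \dots, x_R) = \max_{k} P(\omega_k|x_1, \dots, x_R). \tag{7}$$
Rewriting the posterior probability $P(\omega_k|x_1, x_2, \dots, x_R)$ using Bayes' theorem:
$$P(\omega_k|x_1, \dots, x_R) = \frac{p(x_1, \dots, x_R|\omega_k)\,P(\omega_k)}{p(x_1, \dots, x_R)}. \tag{8}$$
Here, $p(x_1, x_2, \dots, x_R)$ can be expressed in terms of the conditional measurement distributions as
$$p(x_1, \dots, x_R) = \sum_{l=1}^{m} p(x_1, \dots, x_R|\omega_l)\,P(\omega_l). \tag{9}$$
Product Rule: $p(x_1, x_2, \dots, x_R|\omega_k)$ represents the joint probability distribution of the measurements extracted by the classifiers. Assuming that these representations are statistically independent, the joint probability distribution can be rewritten as
$$p(x_1, \dots, x_R|\omega_k) = \prod_{i=1}^{R} p(x_i|\omega_k), \tag{10}$$
where $p(x_i|\omega_k)$ is the measurement process model of the $i$th representation. Substituting Equations (10) and (9) into Equation (8) gives
$$P(\omega_k|x_1, \dots, x_R) = \frac{P(\omega_k)\prod_{i=1}^{R} p(x_i|\omega_k)}{\sum_{l=1}^{m} P(\omega_l)\prod_{i=1}^{R} p(x_i|\omega_l)}, \tag{11}$$
and using Equation (11) in Equation (7), we obtain the decision rule
$$y \to \omega_j \quad \text{if} \quad P(\omega_j)\prod_{i=1}^{R} p(x_i|\omega_j) = \max_{k} P(\omega_k)\prod_{i=1}^{R} p(x_i|\omega_k). \tag{12}$$
Rewriting in terms of the posterior probabilities $P(\omega_k|x_i)$ obtained from the individual learning algorithms:
$$y \to \omega_j \quad \text{if} \quad P^{-(R-1)}(\omega_j)\prod_{i=1}^{R} P(\omega_j|x_i) = \max_{k} P^{-(R-1)}(\omega_k)\prod_{i=1}^{R} P(\omega_k|x_i). \tag{13}$$
Equation (13) is the decision rule obtained by combining the posterior probabilities generated by the different classifiers using the product rule.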
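Equation (13) can be sketched as follows, working in the log domain to avoid underflow when many small probabilities are multiplied. The `eps` guard against $\log(0)$ and the function name `product_rule` are assumptions for illustration; the priors would normally be estimated from the class frequencies of the training set.

```python
import numpy as np

def product_rule(posteriors, priors, eps=1e-12):
    """Eq. (13): argmax_k P(w_k)^-(R-1) * prod_i P(w_k | x_i), in log form.
    posteriors: (R, n_samples, m) per-classifier posteriors; priors: (m,).
    eps guards against log(0) and is an assumption."""
    P = np.asarray(posteriors, dtype=float)
    R = P.shape[0]
    log_score = np.sum(np.log(P + eps), axis=0) - (R - 1) * np.log(np.asarray(priors, dtype=float))
    return np.argmax(log_score, axis=-1)
```

The prior factor matters for imbalanced data. If two classifiers both output (0.6, 0.4) for a beat, the plain product favors class 0, but with priors (0.9, 0.1) the correction $P^{-(R-1)}$ rescales the scores to 0.36/0.9 = 0.4 and 0.16/0.1 = 1.6, so the beat is assigned to the rare class 1.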
In this work, we used five different classifiers, combined through a voting scheme to enhance system performance: naïve Bayes [51], linear and quadratic discriminant functions [52], J48 [53], and consolidated J48 [54]. These classifiers are briefly described below.
Naïve Bayes Classifier: This is a probability-based learning algorithm built on the Bayesian framework. According to Bayes' theorem, an unknown pattern y is assigned to the class with the highest posterior probability:
$$y \to \omega_j \quad \text{if} \quad P(\omega_j|y) = \max_{k} P(\omega_k|y), \tag{14}$$
where
$$P(\omega_k|y) = \frac{P(y|\omega_k)\,P(\omega_k)}{P(y)}.$$
Naïve Bayes is a simplified version of the Bayes classifier based on the assumption that the features $y_1, y_2, \dots, y_m$ of an unknown example vector are independent given the class. Therefore, the posterior probability can be written as
$$P(\omega|y) \propto P(y|\omega)\,P(\omega) = P(y_1, y_2, \dots, y_m|\omega)\,P(\omega) = P(y_1|\omega)\,P(y_2|\omega)\cdots P(y_m|\omega)\,P(\omega). \tag{15}$$
Hence, Equation (14) can be modified as
$$y \to \omega_j \quad \text{if} \quad P(\omega_j)\prod_{i=1}^{m} P(y_i|\omega_j) = \max_{k} P(\omega_k)\prod_{i=1}^{m} P(y_i|\omega_k). \tag{16}$$
The naïve Bayes classifier operates with this final rule. The parameters used for the naïve Bayes classifier are given in Table 2. In general, the naïve Bayes classifier assumes that the features follow a normal distribution; to follow this assumption, the kernel estimator parameter in Table 2 is set to false. Supervised discretization converts ranges of numeric attribute values into nominal values. The term supervised is used because the class information of the training instances is used for discretization, which is possible only when the class labels are nominal. The advantages of supervised discretization in the naïve Bayes classifier are discussed in [55].
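A minimal Gaussian naïve Bayes applying rule (16) in log form is sketched below. It is not WEKA's `NaiveBayes` and does not reproduce the Table 2 settings: each feature is modeled by a per-class normal density (matching the kernel-estimator-off assumption), discretization is omitted, and the small variance floor is an assumption. The `predict_proba` output is the kind of per-classifier posterior that the product-rule vote combines.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes applying decision rule (16) in log form."""

    def fit(self, X, y, var_floor=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Variance floor (assumption) avoids division by zero for constant features
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_floor
        return self

    def log_joint(self, X):
        """log P(w_k) + sum_i log N(y_i; mu_ki, var_ki) for every sample and class."""
        X = np.asarray(X, dtype=float)
        diff2 = (X[:, None, :] - self.mu_[None, :, :]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        return log_lik + self.log_prior_[None, :]

    def predict_proba(self, X):
        """Normalized posteriors P(w_k | y), usable as inputs to the voting rules."""
        lj = self.log_joint(X)
        lj -= lj.max(axis=1, keepdims=True)  # stabilize before exponentiating
        p = np.exp(lj)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_joint(X), axis=1)]
```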
Linear and Quadratic Discriminant Analysis Based Classifiers: Discriminant analysis derives a decision boundary, or discriminant function, from the linear combinations of features that best separate the given classes. It assumes that the examples of each class follow a Gaussian distribution. For instance, based on Bayes' theory, the discriminant function for a two-class problem can be written as
$$(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + \ln|\Sigma_1| - (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) - \ln|\Sigma_2| > T, \tag{17}$$
where x is assigned to class 2 when the inequality holds, $\mu_1, \mu_2$ are the mean vectors of class 1 and class 2, $\Sigma_1, \Sigma_2$ are their covariance matrices, and T is the threshold value. Without further assumptions, this function is the quadratic discriminant function. If the covariance matrices are equal, $\Sigma_1 = \Sigma_2 = \Sigma$, the quadratic terms cancel and the discriminant function simplifies to a dot product:
$$w \cdot x > c, \tag{18}$$
where $w = \Sigma^{-1}(\mu_2 - \mu_1)$ and $c = \frac{1}{2}\left(T - \mu_1^T \Sigma^{-1} \mu_1 + \mu_2^T \Sigma^{-1} \mu_2\right)$. This decision rule represents classification based on a linear discriminant.
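The reduction from Equation (17) to Equation (18) can be checked numerically with a short sketch: for a shared covariance, the left-hand side of (17) minus T equals $2(w \cdot x - c)$ for every x, so both rules make the same decision. The function names are hypothetical, and this is a two-class illustration, not the multi-class WEKA classifiers used in this work.

```python
import numpy as np

def quadratic_score(x, mu1, mu2, S1, S2):
    """Left-hand side of Eq. (17); x goes to class 2 when it exceeds T."""
    d1, d2 = x - mu1, x - mu2
    return (d1 @ np.linalg.solve(S1, d1) + np.log(np.linalg.det(S1))
            - d2 @ np.linalg.solve(S2, d2) - np.log(np.linalg.det(S2)))

def linear_rule(mu1, mu2, S, T=0.0):
    """Eq. (18) for a shared covariance S: returns (w, c), with w.x > c meaning class 2."""
    w = np.linalg.solve(S, mu2 - mu1)
    c = 0.5 * (T - mu1 @ np.linalg.solve(S, mu1) + mu2 @ np.linalg.solve(S, mu2))
    return w, c
```

For example, with $\mu_1 = (0, 0)$, $\mu_2 = (2, 0)$, $\Sigma = I$, and $T = 0$, we get $w = (2, 0)$ and $c = 2$. The point $x = (1.5, 0)$ gives $w \cdot x = 3 > 2$, and its quadratic score is $2.25 - 0.25 = 2 = 2(3 - 2)$, so both rules assign it to class 2.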
The parameters used for the linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) classifiers are given in Table 3.

Table 3. LDA and QDA classifier parameters used in this work (columns: Parameters, LDA, QDA).

The ridge parameter in the discriminant analysis classifiers reduces overfitting by penalizing large coefficients. In our work, we use the default values given in Table 3.
J48 Classifier: Decision tree-based algorithms have become popular in machine learning. J48 is the WEKA implementation of the popular C4.5 algorithm proposed by Quinlan [53]. In this algorithm, the decision process involves constructing a tree by splitting on features. How well y is matched to a class label $\omega_k \in \omega$ depends on the choice of feature splits, which is based on the value of information gain.
Information gain is measured using entropy, as the difference between the entropy of the parent node and the weighted sum of the entropies of its child nodes. It measures how well a given feature separates the training data according to their class labels. A feature with high information gain is preferred for splitting.
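Information gain as described above can be computed as in the sketch below (entropies in bits, children weighted by their share of the parent's samples). Note that C4.5, and hence J48, actually ranks splits by the gain ratio, which normalizes this gain by the entropy of the split itself; that normalization is omitted here. The function names are hypothetical.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Class entropy of a node, in bits."""
    n = len(labels)
    probs = np.array([count / n for count in Counter(labels).values()])
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
```

A split of the labels `[0, 0, 1, 1]` into `[0, 0]` and `[1, 1]` gains a full bit, whereas a split into `[0, 1]` and `[0, 1]` gains nothing, so the first feature would be preferred.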
J48 Consolidated (J48-C) Classifier: This is a consolidated version of the C4.5 classifier. J48 Consolidated is a WEKA implementation of the consolidated tree construction algorithm proposed by Arbelaitz et al. [54]. The basic idea is to build a single tree from several subsamples. At each step, the best split feature is found using information gain, similarly to J48, and all the subsamples are then divided using that same feature. More details can be found in [54]. The parameters used for the J48 and J48-C classifiers are given in Table 4. J48 and J48-C are decision tree classifiers in which the tree splitting criteria play a significant role. These parameters determine the growth and shape of the tree, which influence the final model accuracy. Sub-tree raising determines whether sub-trees may be raised when pruning is enabled. The minimum number of objects sets the minimum number of instances per leaf. Minimum description length (MDL) correction is a statistical measure, similar to information gain, used to identify the best split. The number of folds determines the amount of data used for reduced-error pruning: one fold is used for pruning and the remaining folds for building the tree.
All parameter values were fixed based on the final results. Full details of the parameters can be found in the WEKA 3.9 documentation [56].

Results
In this work, we classify five classes: N, V, S, F, and Q. The training set is constructed from the records DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, …]. We start with scatter plots that justify the choice of features for discriminating between the heartbeat classes. The individual performance of the five classifiers (naïve Bayes, LDA, QDA, J48, and J48 consolidated) is presented, followed by an analysis of the voting scheme with various combinations of these classifiers; the performance is reported for each combination. We used WEKA 3.9 (University of Waikato, New Zealand) [56] to implement the classification algorithms and scatter plots. Data pre-processing and feature extraction were implemented in MATLAB 2018a (MathWorks, MA, USA). All experiments were carried out on a 64-bit Windows 8 system with 8 GB of RAM.

The Performance Measures
An algorithm's performance can be validated with appropriate performance measures. In this work, following the AAMI recommendations, sensitivity (SEN), false positive rate (FPR), positive predictive value (PPV), and overall accuracy (OA) are used to compare with state-of-the-art methods. The confusion matrix required for calculating these measures is given in Table 5. For the V and S classes, the measures are calculated as in [31]; for the remaining classes, we follow [57].

Table 5. Confusion matrix (actual versus predicted labels).
The performance measures are calculated from Table 5 as follows. Let $C_{ij}$ denote the number of beats of actual class $i$ assigned to class $j$. The row and column sums give the numbers of actual and predicted beats of each class, and for each class $k \in \{N, V, S, F, Q\}$:
$$TP_k = C_{kk}, \quad FN_k = \sum_{j} C_{kj} - C_{kk}, \quad FP_k = \sum_{i} C_{ik} - C_{kk}, \quad TN_k = \sum_{i,j} C_{ij} - TP_k - FN_k - FP_k.$$
The performance measures are then given by
$$SEN = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN}, \quad PPV = \frac{TP}{TP + FP}, \quad OA = \frac{\sum_k TP_k}{\sum_{i,j} C_{ij}},$$
where TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
We present scatter plots with marginal histograms on the testing dataset DS2 for pairs of features in a two-dimensional feature space. These scatter plots reveal how the features are spread in the feature space and thereby the relationships between the heartbeat classes. Figure 2 shows the two-dimensional scatter plot of cumulant 2 of IMF1 against the norm entropy of IMF2. In this plot, N and V beats dominate the space, and the histograms reveal good discrimination among the N, V, and Q classes. Figure 3 shows the log energy entropy of IMF1 against cumulant 2 of IMF1. Here, the N, V, S, and F classes are spread across the space, which provides particularly good discrimination between the N, V, and S classes. Figures 4-6 show that log energy entropy values extracted from different IMFs give a good separation of the N, V, and S classes.
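The per-class SEN, FPR, PPV, and the OA defined earlier in this section can be computed from a confusion matrix with the sketch below. It assumes rows hold actual classes and columns hold predicted classes, and it applies the generic one-versus-rest definitions to every class; the special counting conventions of [31] for the V and S classes are not reproduced. `per_class_measures` is a hypothetical name.

```python
import numpy as np

def per_class_measures(C):
    """C[i, j] = beats of actual class i predicted as class j (assumed layout).
    Returns per-class SEN, FPR, PPV arrays and the overall accuracy OA."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    tp = np.diag(C)
    fn = C.sum(axis=1) - tp  # row sum minus diagonal
    fp = C.sum(axis=0) - tp  # column sum minus diagonal
    tn = total - tp - fn - fp
    with np.errstate(divide="ignore", invalid="ignore"):  # empty classes give NaN
        sen = tp / (tp + fn)
        fpr = fp / (fp + tn)
        ppv = tp / (tp + fp)
    return sen, fpr, ppv, tp.sum() / total
```

Because OA sums the diagonal over all classes, a classifier that labels nearly every beat as N can still reach a high OA on a dataset dominated by N beats, which is why the per-class SEN and PPV are reported alongside it.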
Similarly, Figures 7-8 show better discrimination of the Q beats, which are very rare. In these figures, the characteristic feature is the norm entropy, and combining other features with norm entropy reveals different class spreads and discrimination capabilities. Overall, combinations of selected features from different IMFs can discriminate the heartbeat classes as required.
After the training and testing feature sets are constructed, a classification model must be learned. In this work, we use an ensemble learner, which combines the decisions of multiple learning algorithms and can be more accurate than the individual classifiers. The main advantage of ensemble classifiers is that they can achieve both low bias and low variance: an ensemble of multiple trained low-bias, high-variance models can average out the variance, leaving mainly the bias. In addition, ensemble classifiers are preferred for imbalanced datasets. Our DS1 and DS2 datasets are highly imbalanced, with N as the majority class and F and Q as minority classes. Therefore, we use a voting scheme based on the product rule to combine the classifiers. The performance of the individual classifiers on DS2 (the testing data) is presented in Tables 6 and 7: Table 6 shows the confusion matrices of the LDA, QDA, naïve Bayes, J48, and J48-C classifiers, and Table 7 gives the corresponding performance measures based on Table 5. These tables show that each classifier yields different predictions. LDA and J48 give better classification for the N and V classes, which is understandable given the large number of N and V examples. LDA and J48-C provide better discrimination of the Q group, J48-C and naïve Bayes give better SEN for the F group, and the S class is predicted most accurately by J48-C and QDA. Another important point is that, although each classifier performs well for specific classes, OA is dominated by the discrimination of the N class. OA is therefore not a useful performance measure for imbalanced data classification.
Table 8 presents the confusion matrix obtained by combining the J48, LDA, and naïve Bayes classifiers with the voting scheme, and Table 9 gives the corresponding performance measures. This combination yields better results for the N, V, F, and Q classes and an average result for the S class. The critical point is that the N and S classes are morphologically similar, so the individual classifiers give complementary results for N and S; this ensemble improves the generalization of the predictions for both classes. Similarly, we performed ensemble voting for other combinations, with results presented in Tables 10-15; each combination improves the results in some respects. As mentioned earlier, the dataset is dominated by the N, V, and S classes, in that order, whereas F and Q beats are sporadic. Therefore, some works consider only the N, V, and S classes for classification. Results for this scheme are given in Tables 16-25: results for the individual classifiers are presented first, followed by ensemble voting on different combinations of classifiers, each of which performs better than the individual classifiers.

Discussion
This section discusses the simulation results of the proposed methodology illustrated in Figure 1. In this work, we employ ICEEMD, an adaptive decomposition method for non-stationary and nonlinear signals, to analyze the ECG heartbeats. ICEEMD produces a local and entirely data-driven separation of a signal into fast and slow oscillations called IMFs. The main advantages of ICEEMD are that it avoids spurious modes and reduces the amount of noise in the modes.
Six nonlinear morphological features, the second-, third-, and fourth-order cumulants and the Shannon, log energy, and norm entropies, are then extracted from the first six IMFs of each heartbeat to generate a 36 × 1 feature vector. These feature vectors are divided into the training and testing sets DS1 and DS2 as specified above. Statistics (median and interquartile range) of these features for each class are presented in Table 26, which shows how the attributes vary across heartbeat types.
Table 6 presents the individual performance of the various classifiers on the given problem. Each classifier model yields different results across the classes and performs well for a specific class or classes, but none provides the best overall performance. For example, of the 1837 beats in the S class, J48, LDA, and naïve Bayes correctly predict 51, 2, and 1516 beats, respectively. When we combine these three models using the voting scheme, as shown in Table 12, the combined model correctly identifies 779 beats. The voting scheme uses the product-of-probabilities rule, which assumes that the model representations for a given class are statistically independent; this assumption is motivated by the different representation capabilities of the models. From this, a final decision rule is formed as described in Section 2.4. This decision rule quantifies the probability of each class choice from the combined hypothesis models, and similar results can be observed for the other classes. In this work, we implemented four voting schemes with different classifier combinations, each of which gives different but better results than the individual classifier models. The details of the proposed classifier combinations are given in Table 27.
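Expressed as S-class sensitivity, the counts quoted above correspond to the following values, assuming that each count is the number of actual S beats classified as S (so that TP + FN = 1837):

$$SEN_S = \frac{TP_S}{1837}: \quad \frac{51}{1837} \approx 2.8\%\ (\text{J48}), \quad \frac{2}{1837} \approx 0.1\%\ (\text{LDA}), \quad \frac{1516}{1837} \approx 82.5\%\ (\text{naïve Bayes}), \quad \frac{779}{1837} \approx 42.4\%\ (\text{vote}).$$

SEN alone does not reflect false positives, so the corresponding FPR and PPV values are needed to judge the trade-off each model makes on the S class.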

Comparative Analysis
To assess the performance of the proposed methodology, we compared our results with existing methods in the literature; the comparisons are presented in Tables 28 and 29. The features and classification schemes employed by the researchers listed in Tables 28 and 29 are given in Table 27. Table 28 compares our four voting schemes with works that followed the AAMI recommendations based on the division scheme of [31], and Table 29 compares the performance measures with works that classify only the N, V, and S classes. In Table 28, our proposed methods perform comparably to the state of the art for the N, S, and V classes, whereas for the F and Q classes, our proposed methods 1 and 4 outperform the other methods. Table 29 shows that our proposed methods 1 and 3 distinguish the N, S, and V classes efficiently. The best results of our method are highlighted in bold. Overall, our results compare favorably with those of other approaches.

Limitation and Future Scope
Although the proposed method gives significant results, its performance on the S class is still limited compared with the other classes. Similar behavior is observed in other state-of-the-art methods. Hence, new sets of attributes and learning algorithms need to be explored to improve this. In addition, incorporating other physiological signals, such as blood pressure and plethysmographic signals, along with the ECG may improve the description of heart function.

Conclusions
In this work, we implemented a computer-aided inter-patient heartbeat classification algorithm. We employed a nonlinear decomposition method, ICEEMD, to extract important information from the ECG. HOS and entropy measures were then calculated on the modes obtained by ICEEMD and used as features. Class imbalance is one of the critical challenges in medical diagnosis; we addressed it by using a voting scheme as the learning model, to which the extracted features are fed for classification. To design this model, naïve Bayes, linear and quadratic discriminant functions, J48, and J48 consolidated classifiers were explored. The proposed method showed promising results compared with state-of-the-art techniques. Our method opens new avenues for identifying rare heartbeat groups, toward a real-time heart monitoring system.