ECG Heartbeat Classification Using Machine Learning and Metaheuristic Optimization for Smart Healthcare Systems

Early diagnosis and classification of arrhythmia from an electrocardiogram (ECG) plays a significant role in smart healthcare systems for the health monitoring of individuals with cardiovascular diseases. Unfortunately, the nonlinearity and low amplitude of ECG recordings make the classification process difficult. Thus, the performance of most traditional machine learning (ML) classifiers is questionable, as the interrelationship between the learning parameters is not well modeled, especially for data features with high dimensions. To address the limitations of ML classifiers, this paper introduces an automatic arrhythmia classification approach based on the integration of a recent metaheuristic optimization (MHO) algorithm and ML classifiers. The role of the MHO is to optimize the search parameters of the classifiers. The approach consists of three steps: the preprocessing of the ECG signal, the extraction of the features, and the classification. Four supervised ML classifiers were utilized for the classification task: support vector machine (SVM), k-nearest neighbors (kNN), gradient boosting decision tree (GBDT), and random forest (RF); their learning parameters were optimized using the MHO algorithm. To validate the advantage of the proposed approach, several experiments were conducted on three common databases: the Massachusetts Institute of Technology (MIT-BIH), the European Society of Cardiology ST-T (EDB), and the St. Petersburg Institute of Cardiological Techniques 12-lead Arrhythmia (INCART) databases. The obtained results showed that the performance of all the tested classifiers was significantly improved after integrating the MHO algorithm, with the average ECG arrhythmia classification accuracy reaching 99.92% and a sensitivity of 99.81%, outperforming the state-of-the-art methods.


Introduction
The recent developments in biomedical sensors, the Internet of Medical Things (IoMT), and artificial intelligence (AI)-based techniques have increased interest in smart healthcare technologies [1,2]. Microelectronics, smart sensors, AI, 5G, and IoMT constitute the cornerstone of smart healthcare [3,4]. A smart healthcare system does not suffer fatigue; hence, it can process big data at a much higher speed than humans with greater accuracy [5]. With smart healthcare systems, the diagnosis and treatment of diseases have become more intelligent. For instance, smart patient monitoring empowers the observation of a patient outside the traditional clinical settings, which offers a lower cost through reducing visits to physician offices and hospitalizations [6].
The human body is known as a complex electromechanical system generating several types of biomedical signals, such as an electrocardiogram (ECG), which is a record of the dynamic changes of the human body that need to be monitored by smart healthcare systems.
For instance, an ECG sensor measures cardiac electrical potential waveforms. It is used to create standard 3-lead electrocardiogram (ECG) tracings to record the electrical activity of the heart or to collect surface electromyography (sEMG) to study the contractions in the muscles of the arm, leg, or jaw. Simply put, an ECG graphs heartbeats and rhythms. The classification of ECG heartbeats plays a substantial role in smart healthcare systems [7,8], as the presence of multiple cardiovascular problems is generally indicated by an ECG: diseases cause defects in the resulting ECG waveform. Early diagnosis via an ECG allows for the selection of suitable cardiac medication and is thus very important and helpful for reducing heart attacks [9]. Detecting and classifying arrhythmia is not an easy task and may be very difficult even for professionals, because it is sometimes necessary to examine many beats of ECG data obtained, for example, over hours, or even days, by a Holter monitor. Furthermore, there is a possibility of human error during the analysis of ECG recordings due to fatigue. Building a fully automatic arrhythmia detection or classification system is difficult. The difficulty comes from the large amount of data and the diversity in the ECG signals due to the nonlinearity, complexity, and low amplitude of ECG recordings, as well as nonclinical conditions, such as noise [10].
Despite all these difficulties, methods for ECG arrhythmia classification have been widely explored [11,12], but choosing the best technique for smart patient monitoring depends on the robustness and performance of these methods. Several convolutional neural network (CNN)-based approaches have been introduced for the task [13,14]. Bollepalli et al. [10] proposed a CNN-based heartbeat detector to learn fused features from multiple ECG signals. It achieved an accuracy of 99.92% on the MIT-BIH database using two ECG channels. In [15], a subject-adaptable ECG arrhythmia classification model was proposed and trained with unlabeled personal data. It achieved an average performance of 99.4% classification accuracy on the MIT-BIH database. In [16], an end-to-end deep multiscale fusion CNN model of multiple convolution kernels with different receptive fields was proposed, achieving F1 scores of 82.8% and 84.1% on two datasets. Chen et al. [17] combined a CNN with long short-term memory to classify six types of arrhythmia and achieved an average accuracy of 97.15% on the MIT-BIH database. A recent approach by Atal and Singh [18] proposed using bat-rider optimization to optimally tune a deep CNN, achieving an accuracy of 93.19% with a sensitivity of 93.9% on the MIT-BIH database. Unfortunately, most CNN-based methods are effective only for small numbers of arrhythmia classes, are computationally intensive, and need a very large amount of training data [13]. This is a great challenge for using CNN-based methods in real-time applications or wearable devices with limited hardware [19].
On the other hand, many research efforts have been devoted to ECG arrhythmia classification using ML classifiers, such as the SVM, RF, kNN, linear discriminants, multilayer perceptron, and regression tree [20,21]. It is well known that the SVM classifier does not become trapped in local minima, requires less training data, and is faster than CNN-based methods [22]. In [23], wavelet transform and independent component analysis (ICA) were used to describe the morphological features of the segmented heartbeats. The features were fed into an SVM to classify an ECG into five classes. In [24], least square twin SVM and kNN classifiers based on the sparse representation of features were used for cardiac arrhythmia recognition. The experiments were carried out on the MIT-BIH database in category and personalized schemes. A method based on improved fuzzy C-means clustering and the Mahalanobis distance was introduced in [25], while in [26], abstract features from the abductive interpretation of the ECG signals were utilized in heartbeat classification. Borui et al. [27] proposed a deep learning model integrating long short-term memory with an SVM for ECG arrhythmia classification. Martis et al. [28] evaluated the performance of several ML classifiers and concluded that the kNN with higher-order statistics features achieved an average accuracy of 97.65% and sensitivity of 98.16% on the MIT-BIH database. In [29], the RF classifier was utilized with a CNN and PQRST features for arrhythmia classification from imbalanced ECG data. The major drawback of ML classifiers (e.g., the SVM) is their deficiency in interpreting the impact of ECG data features on different arrhythmia patterns for extracting the optimal features. Further, the performance of most ML classifiers is questionable because the interrelationship between the learning parameters is not well modeled, especially for data features with high dimensions.
Despite the large number of previous studies in the field, ECG arrhythmia classification has not been completely solved and remains a challenging problem. Consequently, there is room for improvement in several aspects, including classification, feature extraction, preprocessing, and ECG data segmentation. Most ML classifiers have some limitations; for example, the SVM does not perform well with noisy data, while the random forest (RF) suffers from interpretability issues and fails to determine the significance of variables. In addition, these ML classifiers have many parameters, and tuning such parameters has a crucial influence on the efficiency of the classification. ML classifiers have clear advantages over CNN-based methods but face a major challenge in their low classification accuracy; in this work, we therefore focus on enhancing the classification accuracy of the ML classifiers. To this end, and to develop an efficient classifier model, we propose to optimize the learning parameters of these classifiers using a naturally inspired metaheuristic algorithm called the marine predators algorithm (MPA). The parameters of each classifier are gradually optimized using the MPA, which yields an optimal classifier model that can classify the ECG features efficiently. Four different machine learning classifiers are considered, namely the SVM, GBDT, RF, and kNN. The performance of these classifiers without learning parameter optimization and with optimization (i.e., MPA-SVM, MPA-GBDT, MPA-RF, and MPA-kNN) is compared. The experiments are validated on three common benchmarking databases: the MIT-BIH, EDB, and INCART.
The remainder of this paper is organized as follows. Section 2 presents the methodology proposed to classify the ECG arrhythmia based on the optimization of the parameters of the ML classifiers. The experimental results and analysis as well as a comparison with the state of the art are presented in Section 3. Finally, the paper is concluded in Section 4.

Methodology
A complete smart healthcare system consists of several parts, such as sensors for heartbeat recording, dry electrodes for sensing heartbeats, interpretation of the heartbeat signals, a personalized system for heartbeat monitoring, and the incorporation of the heartbeat monitoring system into healthcare. An overview of an early diagnosis and classification of ECG arrhythmia healthcare system is illustrated in Figure 1. It consists of three main steps: data preprocessing, feature extraction, and classification. Detection or classification is the vital step in the system; thus, the contribution of this work is mainly in the classification step, as explained in the following.

Data Preprocessing and Feature Extraction
Denoising and reliable segmentation increase the efficiency of the classifiers [30]; the frequency content of an ECG lies between 0.5 Hz and 50 Hz [31]. To eliminate disturbances from the digital ECG signal, an FIR band-pass filter [32] designed with these cutoff frequencies was utilized. For the segmentation task, the R-peak annotations provided by the MIT-BIH, EDB, and INCART datasets were considered as an indication of the beat boundaries, and for every beat, a patch of approximately 200 ms was centered around its R-peak, extending 75 ms before and 110 ms after the R-peak.
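The preprocessing stage can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the windowed-sinc filter design and the `bandpass_fir`/`segment_beats` helper names are assumptions.

```python
import numpy as np

def bandpass_fir(signal, fs, f_lo=0.5, f_hi=50.0, numtaps=201):
    """Windowed-sinc FIR band-pass filter (Hamming window).

    The band-pass kernel is the difference of two unit-DC-gain
    low-pass kernels, so the DC component is removed exactly.
    """
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def lowpass(fc):
        h = np.sinc(2 * fc / fs * n) * np.hamming(numtaps)
        return h / h.sum()  # normalize to unit DC gain

    h = lowpass(f_hi) - lowpass(f_lo)
    return np.convolve(signal, h, mode="same")

def segment_beats(signal, r_peaks, fs):
    """Extract a window from 75 ms before to 110 ms after each R-peak."""
    pre, post = int(0.075 * fs), int(0.110 * fs)
    beats = [signal[r - pre:r + post] for r in r_peaks
             if r - pre >= 0 and r + post <= len(signal)]
    return np.array(beats)
```

With fs = 360 Hz (the MIT-BIH sampling rate), each segmented beat is a fixed-length vector, which is convenient for the feature extraction stage.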
After the segmentation phase, the features were extracted around the regions of the segmented ECG signal. In this work, different techniques were used for the feature extraction phase, including the 1D-local binary pattern (LBP) [33], higher-order statistics (HOS) [34], discrete wavelet transform (DWT), the Hermite basis function (HBF) [35], the central moment (CM), and the R-R intervals. Table 1 summarizes the feature extraction techniques used in this work.
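As one concrete illustration of this stage, a minimal sketch of a 1D-LBP histogram and simple R-R interval statistics is given below; the `lbp_1d` and `rr_features` helper names, the neighborhood size `p`, and the chosen statistics are assumptions, not the paper's exact configuration.

```python
import numpy as np

def lbp_1d(beat, p=4):
    """1D-LBP: compare each sample against p neighbors on each side,
    form a 2p-bit code, and return the normalized code histogram."""
    codes = []
    for i in range(p, len(beat) - p):
        neighbors = np.concatenate([beat[i - p:i], beat[i + 1:i + p + 1]])
        bits = (neighbors >= beat[i]).astype(int)
        codes.append(int(bits @ (2 ** np.arange(2 * p))))
    n_bins = 2 ** (2 * p)
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()

def rr_features(r_peaks, fs):
    """R-R interval statistics (seconds): mean, minimum, and maximum."""
    rr = np.diff(r_peaks) / fs
    return rr.mean(), rr.min(), rr.max()
```

The LBP histogram is a fixed-length vector regardless of beat length, so it concatenates cleanly with the other feature groups.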

Classification
The supervised machine learning classifiers considered for detecting rhythm diseases were the SVM, random forest (RF), gradient boosting decision tree (GBDT), and k-nearest neighbor (kNN); a number of their parameters were optimized using the proposed artificial intelligence metaheuristic optimization (MHO) algorithm.

Support Vector Machine
The SVM offers strong insight for practical applications and contributes to high efficiency [36]. It works by transforming the input data from the basic domain P into a new higher-dimensional feature space; thereafter, it searches in this space for the optimal hyperplane, aiming to split the training data into groups by finding the maximum-margin hyperplane. Mathematically, an instance x_i is associated with a label y_i ∈ {+1, −1}. The hyperplane divides the multidimensional space into negative and positive instances induced by the kernel function with the maximum margin and the minimum classification error [37]. Suppose z = φ(x) is a feature space vector, where φ maps from P to the feature space Z; then, the hyperplane is defined by the pair (w, b), which is obtained by separating the points x_i such that

y_i (w · φ(x_i) + b) ≥ 1, for i = 1, . . . , N,

where w ∈ Z and b is a scalar bias. The training set S is linearly separable if there exists a pair (w, b) for which this constraint holds for every element of S. If S is not linearly separable, the SVM formulation must allow for classification violations through slack variables ξ_i ≥ 0. The optimal hyperplane is then the solution to

min_{w, b, ξ} (1/2) ||w||² + C Σ_i ξ_i, subject to y_i (w · φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0,

where the parameter C controls the trade-off between the margin width and the classification violations.

Gradient Boosting Decision Tree
In decision trees, every internal node is labeled with a distinctive input feature. The arcs emerging from a node marked with a certain feature are labeled with each of the possible feature values. Every tree leaf is labeled with a class or a probability distribution over the classes. The basic concept of the gradient boosting decision tree is to combine a series of weak base classifiers into one strong classifier. Unlike traditional boosting methods that reweight positive and negative samples, the GBDT achieves global convergence by following the negative gradient direction. The weak learner estimates the error at every splitting node based on a test function κ : R^n → R with a threshold τ and left/right responses η_l and η_r. The optimal split is achieved by identifying the triplet (τ, η_l, η_r) that minimizes the error after the split:

E(τ, η_l, η_r) = Σ_{i: κ(x_i) ≤ τ} (r_i − η_l)² + Σ_{i: κ(x_i) > τ} (r_i − η_r)²,

where the weight w_i^j and response r_i^j of x_i at an iteration j are derived from the negative gradient of the loss function at the current model prediction.
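The optimal-split search can be sketched directly: for a single feature, enumerate candidate thresholds τ and pick the triplet (τ, η_l, η_r) with the smallest squared error against the current gradients. This is an illustrative sketch; the `best_split` name and the use of per-side mean responses are assumptions.

```python
import numpy as np

def best_split(x, g):
    """Find the threshold tau and responses (eta_l, eta_r) that minimize
    the squared error of a one-feature stump fit to the gradients g."""
    best = (None, None, None, np.inf)  # (tau, eta_l, eta_r, error)
    for tau in np.unique(x):
        left, right = g[x <= tau], g[x > tau]
        if len(left) == 0 or len(right) == 0:
            continue  # a split must leave samples on both sides
        eta_l, eta_r = left.mean(), right.mean()
        err = ((left - eta_l) ** 2).sum() + ((right - eta_r) ** 2).sum()
        if err < best[3]:
            best = (tau, eta_l, eta_r, err)
    return best
```

The per-side means are the optimal constant responses under squared error, which is why they appear as η_l and η_r in the stump fit.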

Random Forests
RF is close to the Bayesian method and is used to recognize an ensemble with a combination of hierarchical tree structure predictors [38]. The basic concept behind the RF is that a set of learning tree models may perform well compared to single decision trees if they make uncorrelated mistakes. In this context, we develop several trees instead of a single tree, where each tree is constructed upon values of random vectors sampled independently following the whole forest distribution. Consequently, the RF is an ensemble classifier consisting of many random decision trees. A single classification output of these decision trees is taken, and the values are collected to produce the final result of the classifier [39]. The RF, once constructed, is very fast, as it requires little computation. It has clear interpretability, which provides a natural way to incorporate prior knowledge. Employing appropriate randomness produces precise regressors and classifiers. Moreover, some studies have shown that random input features result in a high classification performance [40].

K-Nearest Neighbor
The kNN is one of the most straightforward machine learning schemes based on supervised learning. It is a non-parametric technique, which means that no assumptions about the underlying data are needed during classification. It measures the similarity between a new instance and the available instances and assigns the new instance to the category most similar to it. Generally, the estimation obtained with the kNN scheme is prone to local noise and not always satisfactory.
The larger the value of k, the smoother the classification boundary, while a smaller k produces a more convoluted boundary. An advantage of the kNN is that no training is required.
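The effect of k on noise sensitivity can be demonstrated in a few lines of NumPy; this `knn_predict` helper is a hypothetical minimal implementation, not the paper's classifier.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    nearest = y_train[np.argsort(d)[:k]]         # labels of k nearest
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]
```

With k = 1, a single mislabeled neighbor flips the prediction; a larger k votes the noisy label down, at the cost of a smoother, less flexible boundary.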

Marine Predator Algorithm (MPA)
The MPA is a naturally inspired metaheuristic algorithm that imitates the behavior of predators catching their prey, employing two movement strategies (Brownian and Lévy) when targeting the prey [41].

Initialization
Similar to all metaheuristic schemes, the algorithm begins with an initial population of solutions uniformly distributed over the search space:

Y_0 = Y_L + r ⊗ (Y_U − Y_L),

where r is a vector of uniform random numbers in [0, 1], and Y_L and Y_U are the lower and upper boundary limits of the search space, respectively.

Elite and Prey Matrix Construction
An n × d Elite matrix E is constructed by replicating the fittest solution:

E = [Y^I ; Y^I ; . . . ; Y^I],

where n refers to the number of search agents, and Y^I symbolizes the top predator vector, repeated n times to create the matrix. The Prey matrix has the same dimensions n × d as the Elite matrix. The optimization process in the MPA is mainly based on these two matrices: the initialization generates the starting prey, from which the fittest solution builds the Elite matrix.
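A minimal sketch of the initialization and the Elite/Prey construction, assuming uniform initialization over the bounds and tiling of the best agent (the helper names are hypothetical):

```python
import numpy as np

def init_population(n, d, y_l, y_u, rng):
    """Uniform initialization over the search space: Y0 = Y_L + r * (Y_U - Y_L)."""
    return y_l + rng.random((n, d)) * (y_u - y_l)

def build_elite(prey, fitness):
    """Tile the fittest agent n times to form the n x d Elite matrix
    (assuming a minimization problem: lower fitness is better)."""
    best = prey[np.argmin(fitness)]
    return np.tile(best, (prey.shape[0], 1))
```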

Optimization Process
The most critical step, in which the predators seek the optimal solution, is the optimization cycle. In the discovery (exploration) phase, which is the starting point, the prey moves at a higher velocity than the predators before being detected, and the positions are updated using Brownian motion: the random vector R_B consists of values drawn from the normal distribution, representing Brownian movements. The element-wise multiplication of R_B and the Elite matrix simulates the movement of the predators, while adding the scaled step size to the current position updates the position of the prey.
In the middle stage ((1/3) t_max < t < (2/3) t_max), the algorithm divides the population into two halves to balance exploration against exploitation: the first half of the population is updated using Lévy movements, while the second half is updated using Brownian movements with the adaptive factor CF. In the final stage (t > (2/3) t_max), the whole population is modified using Lévy flight. The predators accurately remember the previous locations of successful foraging because of their good memory. Using memory saving, the MPA algorithm simulates this ability to remember successful foraging places, which increases the quality of the solutions as the iterations progress: the fitness of each solution at the present iteration is compared with its counterpart in the previous one, and the new solution replaces the old one if it is fitter. The steps of the MPA are summarized in Algorithm 1.
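The MPA's three-stage update schedule can be sketched as follows. This is a rough illustration of the commonly published MPA phase structure, not the paper's implementation: Brownian moves are drawn from a standard normal, and the Lévy draw is approximated here by a scaled Cauchy sample for simplicity.

```python
import numpy as np

def mpa_step(prey, elite, t, t_max, rng, P=0.5):
    """One MPA iteration sketch over the three stages of the search."""
    n, d = prey.shape
    R = rng.random((n, d))                    # uniform random factors
    RB = rng.standard_normal((n, d))          # Brownian motion
    RL = rng.standard_cauchy((n, d)) * 0.05   # heavy-tailed stand-in for Levy
    CF = (1 - t / t_max) ** (2 * t / t_max)   # adaptive step-control factor
    new = prey.copy()
    if t < t_max / 3:                         # stage 1: exploration (Brownian)
        step = RB * (elite - RB * prey)
        new = prey + P * R * step
    elif t < 2 * t_max / 3:                   # stage 2: half Levy, half Brownian
        half = n // 2
        step1 = RL[:half] * (elite[:half] - RL[:half] * prey[:half])
        new[:half] = prey[:half] + P * R[:half] * step1
        step2 = RB[half:] * (RB[half:] * elite[half:] - prey[half:])
        new[half:] = elite[half:] + P * CF * step2
    else:                                     # stage 3: exploitation (Levy)
        step = RL * (RL * elite - prey)
        new = elite + P * CF * step
    return new
```

The factor CF shrinks as t approaches t_max, so late-stage moves stay close to the Elite positions, which is what drives the convergence behavior discussed in the experiments.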

Parameters Optimization
Using the MPA algorithm, four optimized versions of the ML classifiers were introduced for ECG signal classification, namely the MPA-SVM, MPA-GBDT, MPA-RF, and MPA-kNN. The fitness function acts according to each classifier and its parameters. Tuning the parameters has a crucial influence on the efficiency of the classification; thus, a diverse set of parameters for each classifier was considered to optimize the classification stage. To fine-tune the best value for the parameters, the holdout strategy was considered, with 80% of the data used for the training set and the remaining 20% used to test the performance. The list of parameters considered in the experiments for each classifier is provided in Table 2.

Algorithm 1 The steps of the MPA
1: Initialize the prey population py_i, i = 1, . . . , n
2: while t < t_max do
3:   Compute the fitness value of each py_i
4:   Construct the Elite matrix
5:   Implement the memory saving
6:   Update CF using Equation (16)
7:   for each py_i do
8:     if t < (1/3) t_max then
9:       Reposition the current py_i based on Equation (11)
10:    else
11:      if (1/3) t_max < t < (2/3) t_max then
12:        if i < (1/2) n then
13:          Reposition the current py_i using Equation (13)
14:        else
15:          Reposition the current py_i using Equation (15)
16:        end if
17:      else
18:        Reposition the current py_i using Equation (18)
19:      end if
20:    end if
21:  end for
22:  Compute the fitness value of each py_i, f(py_i)
23:  Update the Elite matrix
24:  Apply the memory saving
25:  Apply the FADs effect for each py_i
26:  t ← t + 1
27: end while
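The fitness function driving the parameter search can be sketched for the kNN case: an MPA position is decoded into a hyperparameter value, and the fitness is one minus the holdout accuracy, so minimizing the fitness maximizes the accuracy. The `decode_k`/`fitness` names and the [1, 15] range for k are assumptions for illustration.

```python
import numpy as np

def decode_k(position, k_min=1, k_max=15):
    """Map a continuous MPA position to an integer k within bounds."""
    return int(np.clip(round(float(position[0])), k_min, k_max))

def fitness(position, X_tr, y_tr, X_te, y_te):
    """Fitness = 1 - holdout accuracy of a kNN built from the position,
    so the MPA (a minimizer) drives the accuracy upward."""
    k = decode_k(position)
    correct = 0
    for x, y in zip(X_te, y_te):
        d = np.linalg.norm(X_tr - x, axis=1)
        votes = y_tr[np.argsort(d)[:k]]
        vals, counts = np.unique(votes, return_counts=True)
        correct += int(vals[np.argmax(counts)] == y)
    return 1.0 - correct / len(y_te)
```

The same pattern applies to the other classifiers: each continuous position dimension is decoded into one hyperparameter (e.g., C and gamma for the SVM), and the classifier is retrained and scored on the 20% holdout split at every fitness evaluation.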

Database Descriptions
The main characteristics of the three ECG databases used in the evaluation process are summarized in Table 3.

The MIT-BIH Arrhythmia Dataset (MIT-BIH)
This is a public dataset covering the regular investigation content of cardiac rhythm detection, collected from 47 patients. It consists of 48 records, where each one is 30 min in duration, with a 360 Hz sample rate. Moreover, the records have two signals: the first is a bipolar limb lead named the modified lead (MLII), and the second is related to the unipolar chest leads called V leads (V1, V2, V3, V4, V5, and V6). The MLII lead is shared across all records because it provides an ideal view of the significant waves (e.g., Q-waves, P-waves, R-waves, T-waves, and S-waves) [42].

The European ST-T Dataset (EDB)
The EDB was designed for evaluating the performance of ST and T-wave analysis algorithms. It is a collection of 90 annotated ECG records taken from 79 subjects. Each record has a two-hour duration, with two signals sampled at 250 samples per second [43].

St. Petersburg INCART Dataset (INCART)
This is a 12-lead arrhythmia dataset containing 75 annotated recordings taken from 32 Holter records, each 30 min long. The INCART consists of 12 regular leads, and each lead is sampled at 257 Hz. The main records were acquired from patients undergoing coronary heart disease examinations [43].

Evaluation Criteria
As a classification strategy, the holdout strategy was used to evaluate the performance of the optimized ML classifiers against five standard criteria: accuracy (Acc), precision (Pr), specificity (Sp), sensitivity (Sn), and the F1-score (F1). These performance criteria rely on the four basic counts of a binary classification test (true positives, true negatives, false positives, and false negatives) as follows:
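The five criteria can be computed from the four confusion counts; the sketch below is a direct transcription of the standard definitions.

```python
def metrics(tp, tn, fp, fn):
    """Standard binary-classification criteria from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    pr = tp / (tp + fp)                    # precision (positive predictivity)
    f1 = 2 * pr * sn / (pr + sn)           # harmonic mean of Pr and Sn
    return acc, sn, sp, pr, f1
```

For the multi-class AAMI evaluation, these counts are typically computed per class (one-vs-rest) and then averaged.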

Evaluation of the MPA-SVM
For the MPA-SVM classifier, two parameters, C and gamma, which had important effects on the classification process, were optimized. According to Table 4, all measures were higher than 98%. Looking at the class level, it is clear that the Acc and Sn were high on the three datasets, with Acc > 99.14% and Sn > 98.11%. The accuracy of the classification process by all models was remarkably enhanced and nearly balanced for all classes. The best reported accuracy was obtained for the class F with 99.93%, and the lowest performance was obtained for the class N with an accuracy of 99.86%. Regarding misclassification, only ≤0.49% of the S class and ≤0.82% of the VEBs class were not classified accurately. The results for the S and VEBs classes were very promising, signifying improvements over counterpart studies. These two classes are important cases for the AAMI, which recommends that evaluation measures should focus on the classification of the S and VEBs classes. The performance of the MPA-SVM classifier was sufficient for these two classes on the three databases.

Evaluation of the MPA-GBDT
The MPA-GBDT was introduced to optimize three parameters: the max depth, gamma, and the learning rate. Table 5 lists the results of the optimized parameters of the MPA-GBDT model for the AAMI classes (N, S, VEBs, and F). In addition to the high classification accuracy (≥99.45%), the classification performance for the S and VEBs classes (sensitivity of ≥98.49% and positive predictivity of ≥98.81%) was very high, with the positive predictivity reaching 100% for these classes on the MIT-BIH and EDB databases.

Evaluation of the MPA-RF
The proposed MPA-RF optimized the same parameters as the MPA-GBDT. Table 6 reports the results obtained on the same databases with the same validation scheme; the MPA-RF provided the highest accuracy (Acc = 99.93%) and sensitivity (Sn = 100%) in the recognition of cardiac disorders. At the class level, the MPA-RF achieved a sensitivity of 100% for class F on the EDB database and ≥99.75% on the other two databases, although class F had the minimum number of samples, at ≤0.08% of the total number of class samples.

Evaluation of the MPA-kNN
For the MPA-kNN classifier, the k parameter (number of nearest neighbors) was optimized. The MPA-kNN provided the lowest classification performance compared to the MPA-SVM, MPA-GBDT, and MPA-RF due to the characterization of the kNN as a lazy classifier that depends on distances for the classification process. However, the optimized version, the MPA-kNN, performed well compared to the kNN itself. The detection accuracy of the AAMI classes with the MPA-kNN classifier was 94.96%, 95.40%, and 92.14% on the MIT-BIH, EDB, and INCART databases, respectively. As depicted in Table 7, the MPA-kNN achieved an average Acc of 94.96%. Approximately ≤6.97% of the S class and ≤9.07% of the VEBs class were not classified correctly. Thus, according to the experimental results, we can conclude that the MPA-kNN performed well in terms of the classification accuracy.

For further investigation, the convergence curves of each optimized classifier on the MIT-BIH, EDB, and INCART datasets are presented in Figure 2. The MPA-SVM exhibited high-speed convergence on the MIT-BIH database compared to the other models, while the MPA-kNN was in last place. On the other two databases, the MPA-GBDT and MPA-RF had the highest convergence speed, and the MPA-kNN still had the slowest convergence. Moreover, the MPA-GBDT and MPA-RF had close convergence behavior on the three databases.

To highlight the improvement in the performance of the classifiers after optimization using the MPA algorithm, Tables 8-11 show the improvements in the performance of each classifier (i.e., the classifier and its optimized version). The reported performance criteria were the average values of the Acc, Sn, Sp, Pr, and F1 on the three databases. The highest improvement was in the performance metrics of the SVM. Moreover, the improvement on the INCART database was higher than on the other two databases.
Thus, it is clear that utilizing the proposed optimization algorithm improved the performance of these four ML classifiers significantly.

Comparison with Other Methods
The classification performance of the four optimized ML classifiers was compared to 16 state-of-the-art methods, and the obtained results are reported in Table 12. The results achieved by the previous works were obtained for only five classes, of which four were known classes and only one was unknown. The proposed approaches accomplished average accuracies of 99.67%, 99.91%, 99.92%, and 97.07% on the EDB dataset for the MPA-SVM, MPA-GBDT, MPA-RF, and MPA-kNN, respectively. The MPA-SVM, MPA-GBDT, and MPA-RF achieved the highest percentages in terms of the Acc and Sn. Even the MPA-kNN, which was based on the lazy classifier kNN, performed well against the SVM, CNN, and kNN models in [44][45][46]. It can be concluded from Table 12 that the proposed method yielded a significantly improved classification performance in terms of the overall measurement factors compared to the other methods, which confirms the effectiveness of the proposed optimized classifiers.

Conclusions
This paper proposed an automatic arrhythmia classification method based on a new AI metaheuristic optimization algorithm and four ML classifiers for IoT-assisted smart healthcare systems. Among the optimized classifiers, the RF showed the most accurate results. Hence, it can be concluded that incorporating the MPA scheme can effectively optimize the ML classifiers, even a lazy one such as the kNN. The performance achieved by the optimization step was ranked among the highest reported to date.
In future works, to enhance the ability to predict heart problems, other optimization algorithms can be investigated. Efficient feature extraction and classification methods also need to be incorporated into the real-time surveillance of cardiac patients. Using powerful classification models (e.g., deep learning) is a possible next step for this research. To obtain meaningful classification outcomes with greater accuracy, these powerful models can be combined with the MPA algorithm, as it performed very well and enhanced the accuracy of the classification process.