Detection of Atrial Fibrillation Using a Machine Learning Approach

Abstract: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias in clinical practice, with a prevalence of 1–2% in the community, and it increases the risk of stroke and myocardial infarction. Detecting AF from the electrocardiogram (ECG) can support early diagnosis. In this paper, we further develop a framework for processing the ECG signal in order to identify AF episodes, and we implement both machine learning and deep learning algorithms to detect AF. The initial experimental results show that deep learning algorithms, such as long short-term memory (LSTM) and convolutional neural network (CNN), achieve better performance (a 10% improvement) than machine learning classifiers, such as support vector machines and logistic regression, with LSTM performing best. This preliminary work can help clinicians detect AF with high accuracy and a lower probability of error, which can ultimately reduce the fatality rate.


Introduction
Atrial fibrillation (AF) is a major concern among irregular heart rhythms, especially as a person approaches the age of 65. The heart is a pump whose operation is regulated by an internal pacemaker that controls the heartbeat: the electrical impulse usually starts from the sinoatrial node [1][2][3] and moves from the atria to the ventricles, producing regular rhythmic contractions of the chambers. AF goes unrecognized in some patients because they are unaware of it, whereas other patients notice the erratic heartbeat and find the sensation uncomfortable. An irregular heartbeat can be a warning sign of stroke, which may result in prolonged illness and, ultimately, heart failure [4][5][6][7][8]. AF detection methods mainly focus on RR intervals, short-term analysis of heart rate variability, and sequential examination for the presence of the P-wave [9]. Current studies mostly rely on feature extraction techniques; therefore, the chosen features have a significant effect on the final outcome of the models. This situation has changed with major improvements in electrocardiogram (ECG) data and the advancement of deep neural networks (DNNs), in particular algorithms such as long short-term memory (LSTM) and convolutional neural network (CNN). These algorithms can be trained directly on large-scale datasets of ECG signals and RR intervals, and the performance of ECG models has improved dramatically [10]. It is worth mentioning that, in the past decade, deep learning (DL) classifiers have been successfully applied in different fields, such as speech recognition, image classification [11], and natural language processing [12]. However, DL algorithms have not been widely applied to AF detection.
Although a few studies have applied the aforementioned classifiers to AF prediction, the comparison results show that their performance was unsatisfactory when compared with image classification and image recognition [13]. To the best of our knowledge, no available study detects AF satisfactorily using DL classifiers.
To overcome these issues, we build six models based on feature-based and DL approaches: support vector machine (SVM), logistic regression, multilayer perceptron (MLP), XGBoost, CNN, and LSTM. The feature-based models are trained on manually extracted features, whereas the DL methods are trained on raw data without any feature engineering. To the best of our knowledge, this is the only work that exploits automated feature engineering with DL for automatic AF detection and classification. Additionally, the DL classifiers are compared with shallow learning classifiers.
In summary, the major contributions of this paper are outlined below:

• We developed a novel deep learning architecture based on a convolutional neural network (CNN) and long short-term memory (LSTM) to detect AF automatically. In addition, an in-depth comparison has been made with state-of-the-art approaches as well as baseline models, such as ResNet and Convolutional LSTM.

• We carried out a comparative analysis of the proposed approach on two widely used online benchmark datasets.

• We developed an end-to-end approach based on deep learning that does not require feature selection or feature extraction. Unlike traditional machine learning algorithms, deep learning methods integrate feature extraction into the model, so handcrafted features are not needed. In addition, these methods can mine different types of data sources well and have good generalization ability, allowing the computer to learn and extract relevant features automatically for a given problem.

• Additionally, we developed a novel framework that detects AF from raw ECG signals instead of other ECG features.
This paper is structured as follows. Section 2 presents related work on machine learning (ML) and DL classifiers for AF detection. Section 3 describes our novel approach for automated AF detection using ML and DL. Section 4 presents the experimental results. Section 5 discusses the experimental results. Lastly, Section 6 draws conclusions and gives recommendations for future work.

Related Work
In this section, the current state-of-the-art approaches for automatic atrial fibrillation detection are discussed. Different ML and DL methods are used to detect AF; however, for ML methods the data must first be transformed into an acceptable representation that enables classifiers to recognize the most suitable classes. DL methods remove this requirement and outperform the state-of-the-art methods in many fields, such as text classification and image classification [14][15][16]. DL methods essentially extract and classify abstract features directly from the raw data [17][18][19].

ML Methods
Bruser et al. [20] proposed an approach that employs bed-mounted sensors to detect AF from cardiac vibration signals. The approach monitors patients remotely, and various ML classifiers are used to evaluate its performance. Xiong et al. [21] presented an approach based on k-means to identify AF. The PubMed dataset is used to evaluate the approach, and the empirical results show that maximum entropy outperforms the other algorithms. Hurnanen et al. [22] introduced a method based on linear classification of spectral entropy and seismocardiogram (SCG) signals. The approach was tested on thirteen patients. The technique does not require the identification of heartbeat peaks from SCG data, and it works well on low-quality SCG signals.
Asgari [23] presented a method that employs the stationary wavelet transform and ML algorithms, such as SVM, to detect AF. The method does not require P and R peak detection. A comparison with different algorithms demonstrated that the presented method obtains considerable performance. In addition, Andersen et al. [24] presented a method based on ML algorithms to predict AF from long-term ECG recordings. The method uses features extracted from the data. The experimental results reveal that the model achieved better performance than state-of-the-art approaches on the same benchmark dataset and that the method is computationally efficient, processing an entire recording in less than one minute. Additionally, the procedure was tested on healthy subjects, taking the signal-to-noise ratio (SNR) into account, in order to identify false positive and false negative predictions and assess the robustness of the technique. The study of the misclassified data shows that the classifiers produce many false positives, which affect the output of the approach. A potential solution to this problem is to locate and manually remove the noisy parts of the ECG signal to maximise the efficiency of the method, but this is costly. Wu et al. [25] used different DL algorithms, such as CNN and LSTM, to improve performance; the neural networks obtained better results without a feature extraction process. A hybrid of CNN and LSTM was used to improve the performance of AF detection.
Nemati et al. [26] developed an approach based on ML algorithms to detect multiple diseases. The approach can detect AF from noisy signals recorded at the wrist. In addition, the timing of movement and the ambulatory pulsatile signal are recorded. The experimental results were based on 10-fold cross-validation, and the proposed approach obtained high performance. The approach was accurate in monitoring AF, and it is reported as the first algorithm of its kind to achieve high performance in AF detection in the general population.
Recently, Aschbacher et al. [27] introduced a system that uses a photoplethysmography (PPG) signal to classify AF without extensive feature engineering; however, the system required a lot of pre-processing and used heart rate variability and statistical approaches to achieve high accuracy. AF detection tools based on artificial intelligence have great potential for minimising the risk of AF, and they can enable early treatment for patients. The main limitation of the system is the lack of comparison with current state-of-the-art approaches. Wang et al. [4] presented an automated AF detection method based on the correlation among wavelet coefficient series in ECG signals. The method performed well in detecting AF in clinical diagnosis, and the MIT-BIH AF database was used to evaluate the performance of the feature construction strategy. A variety of wireless sensors are used to detect AF; however, short, noisy ECG signals remain a big challenge. DL classifiers have correctly identified AF from wireless ECG records without feature extraction [28]. In another study, three models based on an artificial neural network (ANN), a binary decision tree, and an SVM were compared on 10 ECG signals. The best classifier for atrial fibrillation was the binary decision tree with the split setting equal to 100, and the worst was the SVM using one feature [29].

Feature-Based Methods
Sadr et al. [30] developed a method based on different features, including RR inter-beat intervals, time-domain features, frequency-domain features, and distribution features. Three classifiers, namely a linear classifier, an SVM, and a quadratic neural network, were used to assess the efficiency of the method. Lim et al. [31] implemented a feature selection technique for AF that operates on ECG morphological features and heart rate variability, and several ML classifiers were used to assess the performance of the feature selection technique. However, there is a lack of comparison with current feature selection techniques, such as information gain and mutual information. Lahdenoja et al. [32] introduced a system that collects data using a mobile phone running the Google Android OS to detect AF. The system can pre-process the data, extract features, and classify the data. The application obtained high performance; however, it is not available online.
Xia et al. [33] proposed an approach that converts the one-dimensional ECG into a two-dimensional form using the short-time Fourier transform and the wavelet transform in order to detect AF with an SVM. The experimental results on the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) AFIB dataset show superior performance in the detection of AF. Cao et al. [34] proposed a data augmentation technique that combines ECG episodes to improve the diversity of samples. A two-layer benchmark LSTM is used to train the method. The method successfully detected AF in a small and imbalanced dataset. However, the key issue is that it relies on a large amount of histogram data and many threshold values, which makes AF detection difficult.

Wearable Devices for AF Detection
Different approaches have been proposed to detect AF using wearable devices, such as smart watches. For example, Dorr et al. [35] introduced an approach for detecting AF with smart watches with high accuracy, although it is challenging to obtain high performance from the wrist. The device can improve the detection of AF in patients through real-time monitoring. However, the proposed device still needs to be tested on a wider population. ML algorithms improve outcomes, especially when diagnoses are derived from large datasets or complicated patterns of data, as in AF. In order to reduce the amount of data, the wavelet transformation has been used to compress an image, and the compressed image is then processed by an SVM classifier [2]. In another approach, heart rates are calculated from ECG signals, R peaks are extracted using R-R intervals, and classifiers including K-nearest neighbour and random forest are applied [36]. Zhang et al. [37] developed a mobile app for monitoring AF with active measurement. Sensitivity and accuracy were used to evaluate the performance of the approach.

Methodology
This section describes our proposed methodology for detecting AF. Figure 1 depicts the proposed conceptual framework, and its details are presented in the subsequent sections.

Pre-processing: two pre-processing methods were applied before feeding the data into the models. The ECG signals were normalized, which reduces noise, and zero padding was used to bring all recordings to the same length. The normalization is

$$x' = \frac{x - \mu}{\sigma} \qquad (1)$$

where x denotes the matrix composed of the ECG recordings, with shape 8528 × 18,000, and µ and σ are the mean and standard deviation of the matrix.
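The two pre-processing steps can be illustrated with a minimal sketch. It is not the authors' code: the function names `zero_pad` and `z_score` are hypothetical, and it assumes that padding is applied first so that the statistics µ and σ are taken over the full fixed-length matrix. Only the Python standard library is used.

```python
from statistics import fmean, pstdev


def zero_pad(recordings, target_len):
    """Right-pad (or truncate) every recording to exactly target_len samples."""
    padded = []
    for rec in recordings:
        clipped = list(rec[:target_len])
        padded.append(clipped + [0.0] * (target_len - len(clipped)))
    return padded


def z_score(matrix):
    """Normalize with the mean and standard deviation of the whole matrix."""
    flat = [v for row in matrix for v in row]
    mu = fmean(flat)
    sigma = pstdev(flat)
    if sigma == 0:
        raise ValueError("constant input: standard deviation is zero")
    return [[(v - mu) / sigma for v in row] for row in matrix]


# Toy example: three short made-up "recordings" padded to 6 samples.
# For the dataset described here, target_len would be 18,000 (60 s at 300 Hz).
raw = [[0.1, 0.4, 0.2], [0.3, 0.9, 0.5, 0.2, 0.1], [0.7]]
x = z_score(zero_pad(raw, target_len=6))
```

One consequence of this ordering is that padded samples are no longer exactly zero after normalization; normalizing each recording before padding would avoid that, at the cost of using per-recording rather than matrix-wide statistics.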
To train the ML classifiers, the fast Fourier transform (FFT) is first used to convert the signal from the time domain to the frequency domain, so that each heart rhythm is decomposed into its component rhythms, and feature extraction is then performed. To extract ECG features, each ECG is first divided into 10 s segments and four QRS detectors are applied: gqrs, Pan-Tompkins, maxima search, and matched filtering. A voting system combines the results of the QRS detectors, based on the ECG records and QRS labels. In total, 83 features are extracted, such as R-R intervals, single-wave features, and full-wave features. Features such as age, gender, and information on comorbidities are also used. To train the ML models, principal component analysis (PCA) is used to reduce the dimensionality of the data. The DL methods, in contrast, are trained directly on the raw data.
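To make the voting step concrete, the sketch below merges the R-peak positions reported by several QRS detectors and derives a few R-R interval statistics. The paper does not specify the voting rule, so the clustering tolerance, the `min_votes` threshold, the function names, and the three features shown (mean RR, SDNN, RMSSD) are all assumptions chosen for illustration.

```python
from statistics import fmean, median, pstdev


def vote_qrs(detections, tolerance, min_votes):
    """Merge per-detector R-peak sample indices by majority vote.

    detections: one list of sample indices per detector.
    Peaks closer than `tolerance` samples are treated as the same beat;
    a beat is kept when at least `min_votes` distinct detectors report it.
    """
    tagged = sorted((idx, det) for det, peaks in enumerate(detections) for idx in peaks)
    beats, cluster = [], []
    for idx, det in tagged:
        if cluster and idx - cluster[-1][0] > tolerance:
            beats.append(cluster)
            cluster = []
        cluster.append((idx, det))
    if cluster:
        beats.append(cluster)
    return [int(median(i for i, _ in c)) for c in beats
            if len({d for _, d in c}) >= min_votes]


def rr_features(r_peaks, fs):
    """Basic R-R interval statistics (in seconds) from R-peak sample indices."""
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    if len(rr) < 2:
        raise ValueError("need at least three R peaks")
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return {
        "mean_rr": fmean(rr),
        "sdnn": pstdev(rr),
        "rmssd": fmean(d * d for d in diffs) ** 0.5,
    }


# Made-up detector outputs for one segment sampled at 300 Hz.
detections = [
    [300, 600, 900, 1200],  # detector A
    [302, 598, 905, 1199],  # detector B
    [299, 603, 1500],       # detector C: misses two beats, one false alarm
    [301, 600, 899, 1201],  # detector D
]
peaks = vote_qrs(detections, tolerance=15, min_votes=3)
features = rr_features(peaks, fs=300)
```

Requiring agreement between detectors discards isolated false alarms (the peak at sample 1500 here) while tolerating a detector that misses some beats.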

ML Models
After feature extraction, different classifiers, including SVM, logistic regression, MLP, XGBoost, CNN, and LSTM, were applied in order to evaluate the performance of the approach.

Support Vector Machine (SVM): the SVM is one of the best-known machine learning algorithms. It determines a separating hyperplane in the n-dimensional space of the data, and it can be applied to non-linear classification problems by using different kernel functions [38]. The main idea behind the SVM is to divide the space with a decision boundary between the data points. Here, w is the vector perpendicular to the median of the decision boundary, u is an unknown vector, and b is a constant. Equations (2) and (3) show the constraints for the positive samples x₊ and the negative samples x₋, respectively:

$$\vec{w} \cdot \vec{x}_{+} + b \geq 1 \qquad (2)$$

$$\vec{w} \cdot \vec{x}_{-} + b \leq -1 \qquad (3)$$

Multilayer Perceptron (MLP): the MLP consists of interconnected processing elements called neurons, each of which is connected to its inputs through different weights. A neuron computes the weighted sum of its connected inputs and passes it through a non-linear processing unit called a transfer function. The aim of the MLP is to transform the inputs into a meaningful output by learning the relation between input and output, so that it can offer a solution to unseen problems [39][40][41]. In our experiment, the features are extracted from the fully connected layer in order to train the MLP classifier to identify AF automatically. The MLP classifier contains one hidden layer. Its best performance is obtained with a learning rate of 0.09, and it is trained for 100 epochs.
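As an illustration of the two classifiers described above, the following sketch shows a linear SVM decision rule for an unknown vector u and the forward pass of an MLP with one hidden layer. It is a simplified example rather than the trained models: the weights are placeholders, the sigmoid is an assumed choice of transfer function, and no kernel or training procedure is shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def svm_predict(w, b, u):
    """Linear SVM decision rule: class +1 if w.u + b >= 0, otherwise -1."""
    return 1 if dot(w, u) + b >= 0 else -1


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One-hidden-layer MLP: weighted sums followed by a sigmoid transfer function."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    hidden = [sigmoid(dot(w, x) + b) for w, b in zip(hidden_w, hidden_b)]
    return sigmoid(dot(out_w, hidden) + out_b)


# Placeholder parameters, chosen only to exercise the functions.
label = svm_predict(w=[1.0, -1.0], b=0.0, u=[2.0, 1.0])
score = mlp_forward(
    x=[0.2, 0.7],
    hidden_w=[[0.5, -0.3], [0.1, 0.8]],
    hidden_b=[0.0, 0.1],
    out_w=[1.2, -0.7],
    out_b=0.05,
)
```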
Logistic Regression: logistic regression, also called logit regression, is a statistical method that predicts the probability of an event from previously observed data. It can work on binary data, where the event either happens or does not happen, and on multi-class data [42][43][44].
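A minimal sketch of binary logistic regression is given below: the probability of the event is the sigmoid of a weighted sum, and the weights are fitted by batch gradient descent on the mean log-loss. The learning rate, epoch count, and toy data are assumptions for illustration and are unrelated to the experiments in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that the event occurs for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit(samples, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    n = len(samples)
    for _ in range(epochs):
        # Prediction errors are computed once per epoch, before any update.
        errors = [predict_proba(weights, bias, x) - y for x, y in zip(samples, labels)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * x[j] for e, x in zip(errors, samples)) / n
        bias -= lr * sum(errors) / n
    return weights, bias


# Toy one-feature data; values and labels are made up.
X = [[0.0], [1.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
w, b = fit(X, y)
```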
XGBoost: XGBoost is a widely used and well-known classifier for the detection of AF, as it has shown superior performance on large datasets. It is highly flexible and handles most regression, classification, and ranking problems. We used the scikit-learn package to train the classifier.

DL Models
Convolutional Neural Networks (CNN) architecture: a CNN is a type of classifier that can be applied to different tasks, such as analysing images and text. CNNs are also described as shift-invariant, because the same weights are shared across positions. CNNs can be applied to sentiment analysis, recommendation systems, image classification, healthcare, etc. [43][45][46][47]. A CNN is a regularized version of the multilayer perceptron, a fully connected network in which each neuron in one layer is connected to all neurons in the next layer. Regularization is typically achieved by adding a penalty term to the loss function. CNNs take a different approach to regularization: they assemble more complex patterns from small and simple patterns in order to make the prediction. CNNs were inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the visual cortex. In deep learning there is no requirement for explicit feature extraction, whereas in machine learning algorithms the extracted features can sometimes reduce the performance of the approach because the appropriate features are not selected by the framework [48][49][50][51].
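Weight sharing and shift invariance can be seen in a small one-dimensional example. The sketch below implements a single convolutional filter, a ReLU activation, and max pooling in plain Python; the kernel values and the toy signal are assumptions, and a real CNN would learn many such kernels per layer.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation: the same kernel weights slide over every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# A made-up signal with one sharp peak and a hand-picked peak-detecting kernel.
ecg = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
kernel = [-1.0, 2.0, -1.0]
feature_map = conv1d(ecg, kernel)
pooled = max_pool1d(relu(feature_map), size=2)
```

Because the kernel is reused at every position, shifting the input by one sample shifts the feature map by one sample, which is the property the text refers to as shift invariance.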
We now present the architectural details of the proposed CNN model. The CNN classifier has recently become one of the best-known supervised learning techniques in the field of computer vision, owing to parameter sharing and sparse connectivity, which make the model computationally efficient [52,53]. The main benefit of using a CNN classifier for this problem is that it detects important features automatically, without any human intervention. For instance, given a large amount of data, a CNN can learn distinctive features for each class by itself. It should be noted that most DL classifiers do not require any feature engineering or feature selection, which saves time during the pre-processing and feature engineering stages of developing a framework. The proposed CNN is detailed in Table 1.

Long Short-Term Memory (LSTM): the LSTM architecture consists of an input layer, two LSTM layers, and one fully connected layer. The two LSTM layers consist of 128 and 64 cells, followed by dropout layers with a probability of 0.2, and the dense layer has two neurons. The proposed architecture was trained with the Adam optimizer. In the LSTM formulation [54], σ denotes the sigmoid function, tanh is the hyperbolic tangent function, whose output lies in [−1, 1], and ⊙ denotes component-wise multiplication. The forget gate f_t controls how much of the old memory is discarded, and the input gate i_t controls the information that is stored in the new memory.

ResNet: the Residual Network is a classic neural network used for classification in many computer vision tasks. The model won the ImageNet challenge in 2015. The fundamental breakthrough of ResNet was that it allowed very deep neural networks to be trained. The building blocks involved are the two-layer ResNet block, two generalized residual blocks (ResNet Init), the two-layer ResNet block formed from two generalized residual blocks (greyed-out connections are 0), and the two-layer RiR block.
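The gating behaviour described in the LSTM paragraph above can be sketched as a single time step of a standard LSTM cell. This follows the widely used textbook formulation, which may differ in notation from [54]; the output gate `o`, the candidate memory `g`, and the `params` layout are assumptions introduced for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b, one output row at a time."""
    return [sum(w * xi for w, xi in zip(W[k], x))
            + sum(u * hi for u, hi in zip(U[k], h)) + b[k]
            for k in range(len(b))]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, U, b) triple."""
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate f_t
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate i_t
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate memory
    # New memory: keep part of the old memory, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Hidden state: gated, squashed view of the new memory.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# All-zero parameters for a cell with 1 input and 2 hidden units (illustration only).
zero_gate = ([[0.0], [0.0]], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
params = {name: zero_gate for name in ("f", "i", "o", "g")}
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate equals 0.5 and the candidate memory is 0, so the step simply halves the old memory, which makes the role of f_t and i_t easy to check by hand.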

Experimental Results
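In this section, we explain the experimental setup, followed by the results and discussion. To evaluate the performance of the proposed approach, several evaluation metrics are used, including accuracy, precision, recall, and F-measure [54]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$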
where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.

PhysioNet Dataset: to evaluate the performance of our proposed approach, we used the open labelled dataset of the 2017 PhysioNet challenge, which consists of 8528 single-channel ECG waveforms donated by AliveCor. The waveforms last 30 s on average, ranging from 9 s to about 60 s (the longest waveform is 61 s). The records were manually labelled into four classes: normal rhythm, AF rhythm, other rhythms, and noisy recordings. The sampling frequency of the recordings is 300 Hz.
MIT-BIH Atrial Fibrillation Dataset: the dataset consists of 23 ECG recordings of human subjects with AF arrhythmia. Each ECG recording is sampled at 250 Hz with 12-bit resolution over a range of ±10 millivolts. In this study, we used four segments and labelled each of them based on a threshold parameter.
To evaluate the performance of the approach, the dataset is divided into 70% for training, 20% for testing, and 10% for validation. The generalisation of the model has been evaluated using its performance on 48 short-term recordings. The dataset consists of four different subjects, from which the features are more likely to be learned. Table 2 shows the summary of parameters used to train the ML and DL models. Table 3 summarizes the results of the ML and DL classifiers on PhysioNet: the DL algorithms, such as CNN and LSTM, achieved better performance than the ML classifiers. Table 4 summarizes the results of the ML and DL classifiers on the MIT-BIH Atrial Fibrillation dataset, where the DL algorithms, such as CNN and LSTM, again achieved better performance than the ML classifiers. Table 5 shows the summary of results for the ML and DL classifiers in identifying AF.

The proposed CNN architecture does not require any feature engineering, unlike the machine learning classifiers, and it performs better than machine learning algorithms such as MLP and logistic regression. We point out that the proposed CNN, which relies on feature learning, is more accurate than MLP and logistic regression, which shows that DL classifiers can improve AF detection. The CNN results show that its capability to learn features for identifying AF allows it to outperform the other ML algorithms. To put the current results in context, Convolutional LSTM and ResNet baselines were added. However, the baseline ResNet and Convolutional LSTM achieved lower performance than the proposed CNN and LSTM architectures.
However, the training time for Convolutional LSTM is lower than the training time for CNN and LSTM (Tables 3-6 report the updated results). The major drawback of ResNet is that it takes a long time to train. In addition, the Convolutional LSTM required a lot of resources and training data to be trained on the AF dataset; therefore, it obtained lower performance than the proposed CNN and LSTM architectures. Table 5 presents the summary of results for the ML and deep learning classifiers in identifying AF using the MIT-BIH Atrial Fibrillation dataset. The proposed DL approaches, CNN and LSTM, achieved better performance than traditional machine learning classifiers, such as SVM, MLP, logistic regression, and XGBoost, as shown in Table 6. The overall accuracy of LSTM is 87.5%. The main reason why the deep learning models achieved better performance than the machine learning models is the feature extraction ability of DL classifiers. However, the main problem of deep learning classifiers is computation speed. The TensorFlow Python package is used to train the models. The models were trained for two hundred epochs with backpropagation, and the Adam optimizer was used to minimise the categorical cross-entropy loss function. To find the best accuracy for the CNN, different numbers of convolutional layers were evaluated on the 2017 PhysioNet challenge dataset, as shown in Table 7.
The initial experimental results show that performance improves as the number of convolutional layers increases up to four; once the number of convolutional layers reaches five, the performance decreases gradually. Therefore, we chose four convolutional layers. It is worth mentioning that, as the number of convolutional layers increases, the model takes more time to train. According to the results shown in Tables 7 and 8, the precision, recall, and F-measure with four layers are better than with other numbers of layers. In addition, the Cohen's kappa values were above 0.8, which indicates that our proposed system achieves almost perfect agreement in detecting AF.

Different numbers of LSTM layers were also evaluated to find the best model architecture for detecting AF. As shown in Table 9, the 2017 PhysioNet challenge dataset has been evaluated with different numbers of LSTM layers. The experimental results reveal that increasing the number of LSTM layers increases the time needed to train the model, and that the model trains faster as the number of layers decreases. According to the results, as the number of layers increases from one to two, the performance of the model improves rapidly. However, as the number of layers increases from two to three, the accuracy decreases gradually. The two-layer LSTM also achieved better performance in terms of the other evaluation metrics, such as precision, recall, and F-measure. In addition, Table 10 reports the evaluation of different numbers of LSTM layers on the MIT-BIH Atrial Fibrillation dataset. As shown, the two-layer LSTM achieved better performance than the other configurations.

Comparison with State-of-the-Art Approaches
Table 11 compares our proposed approach with other state-of-the-art approaches.
As demonstrated, our proposed approach achieved better performance in terms of accuracy, precision, recall, and f-measure.
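For reference, the metrics reported throughout this section (accuracy, precision, recall, F-measure, and Cohen's kappa) can be computed from the confusion-matrix counts of a binary AF versus non-AF classifier as sketched below. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's tables.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F-measure and Cohen's kappa from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

Kappa is lower than accuracy whenever part of the agreement could arise by chance, which is why it is a useful complement on imbalanced datasets.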

Discussion
The performance achieved by the deep learning classifiers, such as CNN and LSTM, on the PhysioNet and MIT-BIH Atrial Fibrillation datasets exceeds that of the traditional machine learning classifiers, such as MLP and logistic regression. The performance achieved on these datasets indicates that our proposed data-driven DL models outperform traditional feature extraction techniques. It should be noted that the proposed model was trained on an NVIDIA GeForce 940M GPU with 384 CUDA cores and 2 GB of DDR3 memory.
The major strength of deep learning algorithms is their ability to generalise well by learning a good representation of the data. It should be noted that the datasets used in our study contain a lot of noise and variation due to factors that may contribute to the irregularity of heart rhythms, which can lower the performance of the proposed deep learning classifiers. However, in our method, invariance to these factors can be obtained by using a combination of time-frequency representation and an attention mechanism, which can detect AF with better performance; an example of such a factor is a change in the position of a high heart rate in the time-frequency representation. One of the main limitations of machine learning methods is the lack of datasets with gold-standard labels. Therefore, in this study, two widely used online benchmark datasets, which are clinically annotated, were used. It should be noted that a lot of noise is still present in these datasets. In summary, this paper provides motivation for researchers to utilize deep learning models to detect AF, because the preliminary experimental results demonstrate that the deep learning models achieved better performance than traditional machine learning classifiers. To the best of our knowledge, this is the first study to consider AF prediction in this way. Our future work will focus on ensemble classification that combines machine learning and deep learning classifiers to achieve better performance in detecting AF.

Conclusions
Abnormal electrical activity of the heart causes cardiac arrhythmias, of which atrial fibrillation (AF) is one of the most common types, and AF can lead to a high mortality rate. More than 70% of AF episodes occur without the patient noticing, and around eight million people in the world suffer from this disease. Therefore, a novel technique for accurately and proactively detecting AF is required. In this study, we presented a novel framework to detect AF using both machine learning and deep learning techniques. Most current approaches require feature selection; in this work, however, we proposed a deep learning algorithm that does not require any feature engineering. The experimental results demonstrated that the deep learning approaches achieved better performance than traditional shallow learning classifiers, such as SVM. As future work, we plan to develop a real-time approach to detect AF without requiring any labelled data.