Lightweight Ensemble Network for Detecting Heart Disease Using ECG Signals

Abstract: Heart disease should be treated quickly when symptoms appear. Machine-learning methods for detecting heart disease require desktop computers, an obstacle that can have fatal consequences for patients who must check their health periodically. Herein, we propose a MobileNet-based ensemble algorithm for arrhythmia diagnosis that can be easily and quickly operated in a mobile environment. The electrocardiogram (ECG) signal measured over a short period of time was augmented using the matching pursuit algorithm to achieve a high accuracy. The arrhythmia data were classified through an ensemble classifier combining MobileNetV2 and BiLSTM. By classifying the data using this algorithm, an accuracy of 91.7% was achieved. The performance of the algorithm was evaluated using a confusion matrix and a receiver operating characteristic curve. The sensitivity, specificity, precision, and F1 score were 0.92, 0.91, 0.92, and 0.92, respectively. Because the proposed algorithm does not require long-term ECG signal measurement, it facilitates health management for busy people. Moreover, parameters are exchanged when learning data, enhancing the security of the system. In addition, owing to the lightweight deep-learning model, the proposed algorithm can be applied to mobile healthcare, object detection, text recognition, and authentication.


Introduction
Biosignals are indicators of physical health that allow the management of various diseases, such as muscle pain, insomnia, and heart disease [1][2][3]. The electrocardiogram (ECG) is the most important signal for confirming the state of the heart [4,5].
In an ECG, the heartbeat is represented by electrical signals. The heart rate of a healthy person is usually between 60 and 100 beats per minute [6]. When a person is exercising, tense, or excited, the heart beats faster. There are typically no problems in such cases; however, when the heart beats irregularly for no reason, this symptom is called arrhythmia. According to NHANES 2015-2018 data, the prevalence of cardiovascular disease (comprising coronary heart disease, heart failure, stroke, and hypertension) in adults ≥20 years of age is 49.2% overall (126.9 million in 2018) and increases with age for both males and females. The prevalence of cardiovascular disease excluding hypertension is 9.3% overall (26.1 million in 2018) [7]. The rapid increase in the number of heart-disease patients due to changes in eating habits and reduced exercise has made heart disease one of the most serious causes of death today. The most representative type of heart disease is arrhythmia. Early detection of arrhythmia is crucial, because arrhythmia causes symptoms such as dizziness, fainting, chest pain, and difficulty breathing, and can lead to heart attacks [8].
Methods of diagnosing arrhythmia include periodically visiting a hospital or using a Holter monitor. However, both of these are inconvenient for patients and can be expensive. In addition, ECG signals measured by widely used smart watches are acquired over a short period of <1 min; therefore, it is impossible to identify cardiovascular diseases, including arrhythmia, using such devices.
MobileNetV2 is an image-classification model that was proposed in 2018 [17]. It is a lightweight network that preserves performance to the greatest extent possible, and compared with previously proposed networks, the size of the classifier model is significantly reduced by applying the average-pooling technique in the process of converting the feature group into the classifier group. The feature stage has a structure in which convolutional modules are repeatedly stacked. MobileNetV2 has a conv. + batch normalization (BN) + rectified linear unit (ReLU) structure, and the computational burden and model size are reduced by using a depth-wise convolutional layer as an intermediate layer. In the process of converting spatial information into a fully connected (FC) layer, the weight of the FC layer is limited to the number of channels by applying average pooling rather than the existing tensor shape conversion [18].
In a previous study, computed-tomography and X-ray data related to COVID-19 were classified using six different models, including MobileNetV2 [19]. The model performance was evaluated according to the accuracy, precision, recall, and F1 score. Among the models tested, MobileNetV2 and VGG19 exhibited the best performance.
There are many reports of heart-disease diagnosis using machine learning. A review was conducted to identify the trends of machine-learning-based and data-driven techniques for heart-disease diagnosis with imbalanced data [20]. A meta-analysis was performed using 451 reports acquired from reputed journals between 2012 and 15 November 2021.
Machine-learning models that classify data accurately are typically run on desktop systems with a high-performance central processing unit (CPU) and multiple high-performance graphics processing units (GPUs), which provide excellent computational power. Because machine-learning algorithms are developed with a focus on performance, the operating environment of the algorithm is often not considered. However, patients with heart disease must check their health regularly and cannot carry a high-performance desktop. To solve this problem, a machine-learning method that can be implemented on a mobile device is required. MobileNetV2 is a convolutional neural network (CNN)-based algorithm designed for applications wherein the computational resources are limited or the battery performance is important.

Bidirectional Long Short-Term Memory (BiLSTM)
BiLSTM is a recurrent neural network that can process data that change over time, such as video [21,22]. The original recurrent neural network suffered from information loss after repeated backpropagation, but BiLSTM mitigates this problem using a forget gate. The core of the BiLSTM neural network comprises the sequence input layer and the long short-term memory (LSTM) layer. The sequence input layer feeds sequence or time-series data into the neural network. The LSTM layer learns the data according to the temporal order of the sequence.

Matching Pursuit
The matching pursuit algorithm was developed to linearly decompose signals in order to understand their characteristics. When the original signal is decomposed, its characteristics are identified by considering the time and frequency domains simultaneously (e.g., through the wavelet and Fourier transforms), with components extracted in order of energy [23].

Database
The MIT-BIH database, which is widely used in arrhythmia-related studies, contains ECG data categorized into 17 classes. This dataset consists of ECG signals with durations of 10 s from 45 participants. The MIT-BIH database is commonly used by researchers to develop heart-related algorithms [24][25][26]. Table 1 presents details regarding the databases used in this study. Figure 1 shows the ECG signals included in the database. "NSR" corresponds to a normal ECG signal (normal sinus rhythm). AFIB (atrial fibrillation) is one of the heart diseases that can be detected using ECG signals. PVC and LBBB correspond to premature ventricular contractions and left bundle branch block beats, respectively, and also represent abnormal heartbeats.

Wavelet Transform
The wavelet transform is a multi-resolution system capable of processing signals in various frequency bands by converting the input sampling frequency into another set of sampling frequencies. By applying this transform to an ECG signal, noise removal and waveform segmentation can be performed simultaneously, providing a high resolution for each feature element in the signal. The wavelet transform can be used to analyze the converted signal in the desired frequency band by multiplying the input signal by the wavelet function and scale function, and dividing the frequency band into high- and low-frequency segments [27][28][29].

Previous Study
Tuncer et al. [2] classified arrhythmia signals using the 1D-HLP technique. Using the 1D-HLP, 512-dimensional features are extracted from each of the five levels of the low-pass filter. These features are fed to a 1-nearest-neighbor (1-NN) classifier with four distance metrics. The authors obtained a classification accuracy of 95.0% when classifying 17 arrhythmia classes using the MIT-BIH arrhythmia ECG Database.
Ribeiro, H.D.M., et al. [30] proposed an algorithm that could classify the ECG signals of both healthy and sick people. The proposed lightweight solution uses quantized one-dimensional deep convolutional neural networks and is suited to real-time continuous monitoring of cardiac rhythm. It is capable of providing one output prediction per second. It is accurate (sensitivity of 98.5% and specificity of 99.8%) and can be implemented on a smartphone, where it is energy-efficient and fast, requiring 7.65 ms per prediction.
Naz et al. [31] proposed a new deep-learning approach for the detection of ventricular arrhythmias (VA). Initially, the ECG signals were transformed into images, which had not been done before. Subsequently, these images were normalized and utilized to train the AlexNet, VGG-16, and Inception-v3 deep-learning models. The results were evaluated on the MIT-BIH Database, and an accuracy of 97.6% was achieved.
Cai et al. [32] developed a deep-learning-based approach for multi-label classification of ECG, named Multi-ECGNet, which can effectively identify patients with multiple heart diseases simultaneously. The experimental results show that Multi-ECGNet can achieve a high score of 0.863 (micro-F1-score) in classifying 55 types of arrhythmias.

Park et al. [33] proposed an ECG signal multiclassification model using deep learning. Their classifier with 152 layers achieved an F1 score of 97.05% for seven-class classification. The model surpassed the baseline model, ResNet, by 1.40% for seven-class classification.
Lee et al. [34] proposed a novel method for generating a gray-level co-occurrence matrix (GLCM) and gray-level run-length matrix (GLRLM) from one-dimensional signals. The authors extracted the morphological features for automatic ECG signal classification. The extracted features were combined with six machine-learning algorithms to classify cardiac arrhythmias. Of the six machine-learning algorithms, combining XGBoost with the proposed features yielded an accuracy of 90.46%, an AUC of 0.982, a sensitivity of 0.892, a precision of 0.900, and an F1 score of 0.895, and presented better results than wavelet features with XGBoost.
Existing technology for arrhythmia diagnosis makes it difficult for patients to detect diseases in their daily lives, owing to structural problems with the technology. Deep-learning models for arrhythmia detection involve a very large amount of computation, because they are developed with a focus on performance. The computers that run them use high-performance GPUs and memory and are not portable, which limits where they can be used. Although arrhythmia diagnosis technology is being developed, it is not practically helpful for patients with arrhythmia. To solve these problems, solutions must be available on the mobile devices that many people now carry. Figure 2 shows the workflow of the proposed algorithm. The proposed arrhythmia detection method involves preprocessing to minimize noise and resize the ECG signals.

Methodology
To increase the accuracy of the algorithm, the training data were augmented through the matching pursuit algorithm so that sufficient data were available. The wavelet transform was used for feature-point extraction. Arrhythmia was detected using a MobileNetV2-BiLSTM neural network.

Preprocessing
Noise in ECG signals is caused by various factors, such as the environment, whether the electrodes are in good contact with the patient, movement of the measurement cable, movement of the baseline due to breathing, and movement of the patient. These factors affect the shape and size of the waveform, reducing the accuracy of arrhythmia detection. To minimize noise, a Butterworth notch filter and a moving-average filter were designed. Figure 3 shows the signal-processing results.
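The noise-reduction step described above can be sketched as follows. This is only an illustration under stated assumptions, not the MATLAB filter used in the study: the sampling rate `fs`, the stop-band width, the filter order, and the moving-average window length are hypothetical values chosen for the example, and the function name `denoise_ecg` is introduced here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def denoise_ecg(ecg, fs=360.0, notch_hz=60.0, notch_width_hz=4.0, window=5):
    """Suppress power-line noise and smooth an ECG trace.

    Assumptions (not taken from the paper): fs, notch_width_hz, the
    filter order (2), and the moving-average window length.
    """
    ecg = np.asarray(ecg, dtype=float)

    # Butterworth band-stop (notch) filter centred on the power-line frequency.
    low = notch_hz - notch_width_hz / 2.0
    high = notch_hz + notch_width_hz / 2.0
    sos = butter(2, [low, high], btype="bandstop", fs=fs, output="sos")
    notched = sosfiltfilt(sos, ecg)  # zero-phase filtering keeps wave timing

    # Moving-average filter to attenuate residual high-frequency noise.
    kernel = np.ones(window) / window
    return np.convolve(notched, kernel, mode="same")


# Usage: a 10 s synthetic trace with 60 Hz interference.
fs = 360.0
t = np.arange(int(10 * fs)) / fs
noisy = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 60.0 * t)
filtered = denoise_ecg(noisy, fs=fs)
```

Zero-phase filtering is used in this sketch so that the positions of the waves are not shifted; a longer moving-average window smooths more but also flattens sharp QRS peaks.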

Figure 3. Digital filter designed in MATLAB and applied to the ECG signal. Butterworth high-pass and low-pass filters were designed (5-250 Hz), and a band-stop filter was used to reduce the noise at 60 Hz.

Data Augmentation
A matching pursuit algorithm was used to augment the training data. This algorithm can generate additional signals that are similar to the original signal.
The original signal must be decomposed to generate a similar signal. First, the base signal g_0 and the coefficient a_0 that most closely reflect the original signal f(t) are calculated, as follows:
$$f(t) = a_0\, g_0(t) + R(t) \qquad (1)$$
where R(t) represents the remainder of the original signal after it is decomposed using the base signal g_0, and a_0 is the coefficient that optimally represents the given signal in terms of the minimal mean square error for a given base signal g_0. The approximated signal that best represents a given original signal has the smallest error value when approximating the original signal. Therefore, to determine this base signal and the coefficient value, we define Equation (2), which gives the difference between the original signal and its approximation.
$$E(a_0 \mid g_0) = \left\| f(t) - a_0\, g_0(t) \right\|^2 \qquad (2)$$
The coefficient a_0 that minimizes Equation (2) is optimal in the least-squares sense. For a unit-norm base signal, the minimized E(a_0 | g_0) is given as
$$E(a_0 \mid g_0) = \| f \|^2 - \langle f, g_0 \rangle^2 \qquad (3)$$
To minimize Equation (2), the base signal with the maximum value of ⟨f, g_0⟩^2, i.e., the one with the largest inner-product absolute value, is chosen as the signal g_0 used to decompose the input signal. The expansion coefficient a_0 that minimizes Equation (2) is expressed as follows:
$$a_0 = \langle f, g_0 \rangle \qquad (4)$$
Equations (2)-(4) are used to determine the base signal whose inner product with the original signal has the largest absolute value. This procedure represents the input signal using the base signal that is most similar to the original signal. After a given input signal is decomposed, the original signal is redefined using the approximation error signal, as indicated by Equation (5).
$$f(t) \leftarrow R(t) = f(t) - a_0\, g_0(t) \qquad (5)$$
This procedure is repeated until the original signal is completely decomposed. Thus, the matching pursuit algorithm decomposes the original signal into appropriate base signals. Figure 4 shows the results of generating a signal similar to the original signal by changing the coefficient.
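The decomposition and the coefficient-based augmentation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the cosine dictionary, the number of iterations, and the random `jitter` applied to the coefficients are assumptions made for the example, and all function names are hypothetical.

```python
import numpy as np


def cosine_dictionary(n, n_atoms):
    """Unit-norm cosine atoms (an assumed dictionary; the paper does not specify one)."""
    t = np.arange(n)
    atoms = np.array([np.cos(np.pi * (t + 0.5) * k / n) for k in range(n_atoms)])
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)


def matching_pursuit(f, atoms, n_iter):
    """Greedy decomposition: repeatedly pick the atom with the largest |<f, g>|."""
    residual = np.asarray(f, dtype=float).copy()
    indices, coeffs = [], []
    for _ in range(n_iter):
        inner = atoms @ residual               # inner products <R, g_k>
        k = int(np.argmax(np.abs(inner)))      # best-matching base signal
        a = inner[k]                           # expansion coefficient a = <R, g_k>
        residual = residual - a * atoms[k]     # redefine the signal as the remainder
        indices.append(k)
        coeffs.append(a)
    return np.array(indices), np.array(coeffs), residual


def augment(f, atoms, n_iter=20, jitter=0.05, rng=None):
    """Generate a similar signal by slightly changing the coefficients."""
    rng = np.random.default_rng() if rng is None else rng
    indices, coeffs, residual = matching_pursuit(f, atoms, n_iter)
    scaled = coeffs * (1.0 + jitter * rng.standard_normal(coeffs.shape))
    return scaled @ atoms[indices] + residual


# Usage on a synthetic signal (stand-in for an ECG segment).
t = np.linspace(0.0, 1.0, 256)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
atoms = cosine_dictionary(256, 64)
idx, coeffs, residual = matching_pursuit(signal, atoms, 10)
similar = augment(signal, atoms, n_iter=10, rng=np.random.default_rng(0))
```

A small `jitter` keeps the generated signal close to the original in the mean-square-error sense; larger values produce more varied but less faithful signals.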

Wavelet Transform
Equation (6) represents the wavelet function ψ_(a,d)(x) using the scale coefficient a and the transition element d.
$$\psi_{(a,d)}(x) = \frac{1}{\sqrt{a}}\, \psi\!\left( \frac{x - d}{a} \right) \qquad (6)$$
The discrete signal x(n) of the ECG can be converted into a discrete wavelet by discretizing the scaling element (a) and the transition element (d) in Equation (6). At the level j = −1, it can be expressed by combining a high-frequency signal D_{2^j}[x(n)] and a low-frequency signal A_{2^j}[x(n)]. The ECG signal is expressed as
$$x(n) = D_{2^j}[x(n)] + A_{2^j}[x(n)]$$
and generally satisfies Equations (7) and (8).
In this case, 2^j indicates that the number of samples is divided by two as the level j decreases. Equation (7) represents a signal with a high-frequency component, which is related to the transition elements of the signal. Equation (8) shows the low-frequency component of the signal, which is related to the scale of the signal. The high- and low-frequency components of the input signal are divided according to the level j. D_k (detail) is the finite impulse response (FIR) high-band filter coefficient associated with the wavelet coefficient, and A_k (approximation) is the FIR low-band filter coefficient associated with the scale function coefficient. The signal, whose length is halved by each filter, is repeatedly converted to the next scale level. The wavelet coefficient indicates the similarity to the wavelet-generating function and represents the frequency content of the signal. Figure 5 shows the results of applying the wavelet transform to the ECG signal.
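The repeated split into approximation (low-band) and detail (high-band) coefficients, with the length halved at each level, can be illustrated with a small sketch. The Haar wavelet is used here purely as an assumption for simplicity; the paper does not state which wavelet was used, and the function names are hypothetical.

```python
import numpy as np


def haar_dwt_step(x):
    """One decomposition level: low-band (approximation) and high-band (detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass filter + downsample by 2
    detail = (even - odd) / np.sqrt(2.0)   # high-pass filter + downsample by 2
    return approx, detail


def haar_idwt_step(approx, detail):
    """Invert one level, recovering the signal at the previous scale."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def haar_wavedec(x, levels):
    """Repeatedly decompose the approximation; returns (approx, [details...])."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return approx, details


# Usage: three-level decomposition of a synthetic 256-sample signal.
t = np.linspace(0.0, 1.0, 256)
x = np.sin(2 * np.pi * 3 * t) + 0.2 * np.sin(2 * np.pi * 40 * t)
approx, details = haar_wavedec(x, levels=3)
```

Each level halves the number of samples, so three levels reduce 256 samples to 32 approximation coefficients plus detail bands of 128, 64, and 32 coefficients.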

Proposed ECG Signal Classification Method
An algorithm combining MobileNetV2 and BiLSTM was developed for classifying arrhythmia data. MobileNetV2 classified the arrhythmia data, and BiLSTM maintained the sequence data to improve the performance of the arrhythmia classification model. A diagram of the MobileNetV2-BiLSTM algorithm for classifying arrhythmia data is shown in Figure 6.

Figure 6. Diagram of the proposed algorithm. The input data consisted of four classes, and a MATLAB-based digital filter was used for preprocessing. The matching pursuit algorithm was used for data augmentation. MobileNetV2-BiLSTM was applied in the data classification process.
The dataset was imbalanced: the NSR class had the most data (283) and the LBBB class had the fewest (103). Such imbalanced data may cause overfitting problems, because the number of data is insufficient. To solve this problem, data were added using the matching pursuit algorithm in this study.
The matching pursuit algorithm was developed to decompose signals in order to understand their characteristics. During the decomposition of the signal to be analyzed, the time and frequency domains are simultaneously considered by applying the wavelet transform or Fourier transform.
When a signal is added, the matching pursuit algorithm first determines the length of the signal to be added. Subsequently, a signal that most closely reflects the original signal is generated. At this stage, the mean square error (MSE) is used. A smaller MSE corresponds to a higher degree of similarity to the original signal. In Equation (1), g_0 represents a base signal, and a_0 represents the coefficient that minimizes the MSE. R(t) represents the signal remaining after the original signal is decomposed using g_0. A matching pursuit function can be applied to the original signal to generate several signals with small MSE values.
MobileNetV2 extracts random features from the input data. Figure 7 shows the MobileNetV2-BiLSTM structure. The input size was 227 × 227 × 3. The data size was changed to 114 × 114 × 32 by applying a stride of 2 × 2 in the first convolutional layer. BN and ReLU functions were applied, and subsequently, depth-wise convolution (3 × 3 × 1), BN, the ReLU function, and convolution were applied to reduce the amount of computation. MobileNetV2 contains 16 blocks, and all the blocks were implemented in the same manner, as shown in Figure 8.
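The arithmetic behind these layer sizes, and the reason depth-wise convolution reduces computation, can be checked with a short sketch. The "same"-style padding rule and the 32-to-64 channel example are assumptions chosen for illustration (biases are ignored); they are not taken from the paper.

```python
import math


def conv_output_size(size, stride):
    """Spatial size after a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)


def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Depth-wise kernel x kernel x 1 filters plus a 1 x 1 point-wise convolution."""
    return kernel * kernel * c_in + c_in * c_out


# First layer: a 227 x 227 input with a 2 x 2 stride gives a 114 x 114 map.
first_layer_size = conv_output_size(227, 2)               # 114

# Hypothetical 3 x 3 layer mapping 32 channels to 64 channels.
standard = standard_conv_params(3, 32, 64)                # 18432
separable = depthwise_separable_params(3, 32, 64)         # 2336
print(first_layer_size, standard, separable, round(standard / separable, 1))
```

For this hypothetical layer, the depth-wise separable form needs roughly one-eighth of the weights of a standard convolution, which is the kind of saving that makes the network suitable for mobile devices.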
Because sequence data processing is difficult with general neural networks, a special recurrent neural network, BiLSTM, was used. Each LSTM layer has three gates that transmit or control data and can learn while mitigating the gradient decay problem. LSTM obtains the information of all the cells over time. However, it cannot learn from data that come after the cell currently being processed. BiLSTM is an improved version of LSTM, in which a forward pass and a backward pass over the sequence are used to learn information from the past and future, respectively. Consequently, this model can handle time-series data more efficiently. Figure 9 shows the structure of the BiLSTM model.
The first LSTM layer was used to calculate the sequence information at the current time. The second layer was used to read the same sequence in the reverse direction and add reverse sequence information to extract meaningful features of the input data. The output value between the LSTM layers was transmitted to not only the adjacent unit but also the input of the next LSTM layer. The weights of the LSTM were updated through the forward and backward propagation of the neurons. After the characteristics of each input signal were extracted, a BiLSTM classification model was configured. Dropout was added to the BiLSTM layer to prevent overfitting of the model. The learning results were used as inputs to an FC layer. The ECG signal was classified in the FC layer, and a softmax layer was used to output the result.
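The forward and reverse readings of a sequence described above can be sketched with a small NumPy forward pass. This is a didactic sketch only: the weights are random, the layer sizes are hypothetical, and training, dropout, the FC layer, and the softmax layer are omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_dim, hidden_dim, rng):
    """Random parameters for one LSTM direction (hypothetical initialisation)."""
    W = 0.1 * rng.standard_normal((4 * hidden_dim, input_dim))
    U = 0.1 * rng.standard_normal((4 * hidden_dim, hidden_dim))
    b = np.zeros(4 * hidden_dim)
    return W, U, b


def lstm_forward(x, W, U, b):
    """Run one LSTM direction over x of shape (T, D); returns states of shape (T, H)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        o = sigmoid(z[2 * H:3 * H])   # output gate
        g = np.tanh(z[3 * H:])        # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def bilstm_forward(x, fwd_params, bwd_params):
    """Concatenate a forward reading and a reverse reading of the same sequence."""
    h_fwd = lstm_forward(x, *fwd_params)
    h_bwd = lstm_forward(x[::-1], *bwd_params)[::-1]  # re-align to original time order
    return np.concatenate([h_fwd, h_bwd], axis=1)


# Usage: 50 time steps of 8 features, 16 hidden units per direction.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))
fwd, bwd = init_lstm(8, 16, rng), init_lstm(8, 16, rng)
features = bilstm_forward(x, fwd, bwd)   # shape (50, 32)
```

Each time step of `features` holds both the past context (first 16 values) and the future context (last 16 values), which is what a unidirectional LSTM cannot provide.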

Performance Evaluation
The sensitivity, specificity, precision, and F1 score were calculated to evaluate the performance of the presented model. Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Sensitivity refers to the percentage of data that are actually positive and are classified as positive. The sensitivity of the proposed model was calculated as 0.92 using Equation (9).
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (9)$$
Specificity refers to the ratio of negative data classified as negative. The specificity of the proposed model was calculated as 0.91 using Equation (10).
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (10)$$
Precision refers to the proportion of data predicted as positive that are actually positive. It indicates how well positive data are classified. The precision of the proposed model was calculated as 0.92 using Equation (11).
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (11)$$
The F1 score is the most representative measure for evaluating the performance of deep-learning classification models. The proposed model exhibited high F1 scores, as the precision and sensitivity were not biased toward either side.
$$\text{F1 score} = \frac{2 \times \text{precision} \times \text{sensitivity}}{\text{precision} + \text{sensitivity}} \qquad (12)$$

Appl. Sci. 2022, 12, 3291
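As a worked illustration of Equations (9)-(12), the sketch below computes the four metrics per class from a confusion matrix and then averages them. The one-vs-rest treatment, the macro average, and the matrix values are assumptions for the example; they are not the results reported in this study.

```python
def one_vs_rest_metrics(cm, k):
    """Metrics for class k, where cm[i][j] counts true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp     # other classes predicted as class k
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


def macro_average(cm):
    """Unweighted mean of each metric over all classes."""
    per_class = [one_vs_rest_metrics(cm, k) for k in range(len(cm))]
    return {name: sum(m[name] for m in per_class) / len(per_class)
            for name in per_class[0]}


# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted).
example_cm = [
    [45, 2, 2, 1],
    [3, 46, 1, 0],
    [1, 1, 47, 1],
    [0, 1, 0, 49],
]
averaged = macro_average(example_cm)
```

Averaging over classes treats every class equally, which is a reasonable choice when each class has the same number of samples after augmentation.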

Results and Discussion
Using the proposed model, i.e., the MobileNetV2-BiLSTM algorithm, 2000 ECG signals were classified. The data for each class were augmented to 500 using the matching pursuit algorithm. Figure 10 presents the classification results. As shown, the LBBB data were classified best among the four classes. This is because the LBBB data had the most prominent features among the ECG signals, such as baseline fluctuations and changes in the QRS complex. The AFIB data exhibited an accuracy of 92.8% and were also classified relatively well compared with the other data, because the signal interval was not constant and the signal amplitude was small. The NSR data had the lowest classification accuracy because they had no noticeable features compared with the other data.
The sensitivity of the proposed model was calculated as 0.92 using Equation (9). The specificity of the proposed model was calculated as 0.91 using Equation (10). The precision of the proposed model was calculated as 0.92 using Equation (11). The F1 score of the proposed model was calculated as 0.92 (on average) using Equation (12). Table 2 presents the performance of the proposed algorithm.
The receiver operating characteristic (ROC) curve is an important indicator for measuring the performance of classifiers [35]. It indicates how the true positive rate (TPR) changes when the false positive rate (FPR) changes. Here, the TPR represents the sensitivity. The true negative rate (TNR) corresponds to the specificity, and the FPR equals 1 − TNR. By setting the FPR as the X-axis and the TPR as the Y-axis, the changes in the TPR with respect to the FPR were examined.
The presented algorithm was trained using the following parameters. The values were determined by conducting several experiments with MobileNetV2-BiLSTM and comparing the results.
Table 3. Computational or time-complexity evaluation of the proposed algorithm.
The batch size refers to the size of a group when the dataset used for training is divided into several groups. The training dataset is divided because training takes a long time if the entire dataset is fed into the neural network at once. The batch size used in this study was 30. Data rotation is a procedure applied to improve the learning efficiency of the algorithm. The learning outcome of a neural network depends on the state of the data: when the same image is input in a different form, the network recognizes it as different data. For example, if input (1) is a normal image and input (2) is the same image rotated, the network treats the two as different data. The data rotation used in this study was set to 5. 'Data shift' refers to moving the data by a number of pixels. As with data rotation, the input can become completely different data when shifted by pixels, which can improve the learning efficiency. The data shift for this algorithm was set to 3. Overfitting may occur in deep-learning algorithms. To prevent this, a validation process is required; in this study, validation was conducted 50 times.
'Time elapsed' refers to the time taken by this algorithm to classify the data. This algorithm classified the data in 23 min 30 s. An epoch is one pass of the MobileNetV2-BiLSTM algorithm over the entire training dataset. The number of epochs was set to 40. An iteration is one training step on a single batch, i.e., on 1/n of the dataset when the dataset is divided into n equal parts. In this arrhythmia detection study, the number of iterations was set to 120. The learning rate determines how far the weights are adjusted in a single update; in this study, it was set to 0.01. If the learning rate is too large, training can become unstable or diverge, and if it is too small, training converges slowly; therefore, it is commonly set between 0.001 and 0.01.
Figure 11 shows the ROC curve for the MobileNetV2-BiLSTM algorithm. The closer the curve lies to the upper-left corner, the better the classifier. The ROC curve is obtained by calculating the change in the TPR while the FPR changes from 0 to 1. When the threshold is 1, the FPR is 0; conversely, when the threshold is 0, the FPR is 1. Plotting the TPR against the FPR as the threshold varies gives the ROC curve.
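The threshold sweep described above can be illustrated with a short Python sketch. It assumes a binary (one-vs-rest) setting in which each sample has a score between 0 and 1, such as a softmax output, and a label of 1 (positive) or 0 (negative). The function names `roc_points` and `auc` and the example scores are hypothetical and are not data from this study.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) by lowering the threshold from high to low.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores for six samples, for illustration only.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
example_points = roc_points(example_scores, example_labels)
example_auc = auc(example_points)
```

For a four-class problem, one curve per class can be obtained by treating that class as positive and the remaining three as negative; whether the curves in Figure 11 were produced in exactly this way is an assumption.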
Figure 11. Arrhythmia detection results. AUC denotes the area under the curve, and a larger AUC corresponds to better performance. The AUC of each of the four curves reflects how accurately the corresponding class was classified.
The area under the ROC curve (AUC) for each class indicates the performance of the classifier, with a larger area corresponding to better performance. The AUCs for all the data are presented in Table 4. The PVC, LBBB, and AFIB data had AUCs close to 1, indicating that the signals were classified accurately. The NSR data had an AUC of 0.982, which was smaller than those of the other classes but still indicates a high-performance classifier (defined as having an AUC of ≥0.8). Therefore, the NSR data were also classified accurately.
Figure 12 shows the results of K-fold cross validation. The most important question for a deep-learning algorithm is how accurately it can classify data. In general, deep-learning algorithms are used to predict unknown data from the limited data held by the system, and the accuracy increases with the amount of data. This poses no problem if the data are sufficient, but when the system learns from limited data, it is important to use the available data as efficiently as possible. Cross validation reveals how efficiently the system can use the data. Therefore, this method was used to verify the performance of the proposed algorithm.
The performance of the MobileNetV2-BiLSTM algorithm was analyzed using K-fold cross validation. The dataset was divided into 12 groups; one group was held out as the test set, and the remaining 11 groups were used as training sets. The test was repeated 12 times.
The accuracy tended to increase with the K value up to K = 10, where it was maximized, and decreased as K increased further. The minimum accuracy was 79.7%, the maximum accuracy was 93.3%, and the average accuracy was 86.21%.
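The 12-group procedure can be sketched as follows. This is a minimal illustration, assuming that the samples are shuffled once and that each group serves as the test set exactly once. The function names `k_fold_splits` and `cross_validate`, the `seed` parameter, and the `evaluate` callback (which would train the model on the training indices and return the test accuracy) are hypothetical.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k groups of nearly equal size
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validate(n_samples, k, evaluate):
    """Run evaluate(train, test) on every fold; return (min, max, mean) accuracy."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k)]
    return min(scores), max(scores), sum(scores) / len(scores)


# 2000 signals split into 12 groups, as in the text.
splits = list(k_fold_splits(2000, 12))
fold_sizes = sorted(len(test) for _, test in splits)
```

Because 2000 is not divisible by 12, the groups in this sketch contain either 166 or 167 signals; how the uneven remainder was handled in the experiments is not stated.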
ECG signals contain noise. For example, noise caused by patient movement, impedance between the skin and electrodes, and movement of cables reduces the accuracy of the algorithm and should be minimized. In this study, a MATLAB-based Butterworth filter was designed. The cutoff frequencies were set to 5 Hz for the high-pass filter and 250 Hz for the low-pass filter, and a band-stop filter was designed for 60 Hz. Figure 13 shows the experimental evaluation results for the performance of the MATLAB-based filter. The raw signal with noise is shown in blue, and the result of applying the filter is shown in red. Comparing the two signals revealed that the baseline noise was reduced.
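The filter chain can be approximated in pure Python with second-order sections. This is a sketch under stated assumptions and not the MATLAB design used in this study. The coefficients follow the widely used audio-EQ "cookbook" biquad formulas, for which Q = 1/√2 gives a second-order Butterworth response. The filter order, the notch quality factor of 30, and the sampling rate of 1000 Hz are all assumptions. The sampling rate is chosen only so that the 250 Hz cutoff lies below the Nyquist frequency.

```python
import math

BUTTERWORTH_Q = 1 / math.sqrt(2)  # Q of a second-order Butterworth section


def biquad_coeffs(kind, f0, fs, q=BUTTERWORTH_Q):
    """Return normalized (b, a) coefficients of a second-order section."""
    if not 0 < f0 < fs / 2:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "notch":
        b = [1.0, -2 * cos_w0, 1.0]
    else:
        raise ValueError("unknown filter kind: " + kind)
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]


def biquad_filter(b, a, x):
    """Apply one section to the sequence x (direct form I, zero initial state)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def denoise(x, fs):
    """5 Hz high-pass, 250 Hz low-pass, and 60 Hz notch applied in sequence."""
    stages = (("highpass", 5.0, BUTTERWORTH_Q),
              ("lowpass", 250.0, BUTTERWORTH_Q),
              ("notch", 60.0, 30.0))  # notch Q of 30 is an assumed value
    for kind, f0, q in stages:
        b, a = biquad_coeffs(kind, f0, fs, q)
        x = biquad_filter(b, a, x)
    return x


# Synthetic test tones (not ECG data): 60 Hz interference and a 20 Hz component.
fs = 1000.0  # assumed sampling rate in Hz
hum = [math.sin(2 * math.pi * 60 * i / fs) for i in range(5000)]
tone = [math.sin(2 * math.pi * 20 * i / fs) for i in range(5000)]
hum_out = denoise(hum, fs)
tone_out = denoise(tone, fs)
```

In this sketch the 60 Hz tone is suppressed once the filter has settled, a constant baseline offset is removed by the high-pass stage, and the 20 Hz tone passes almost unchanged. A zero-phase (forward-backward) implementation would avoid the phase shift that this single-pass version introduces.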
Figure 13. Noise removal using a MATLAB-based digital filter. The raw and filtered signals are shown in blue and red, respectively. The cutoff frequencies were set as 5, 250, and 60 Hz for high-pass, low-pass, and band-stop, respectively.
The digital filter was designed with general settings. Sometimes, the input data require a different filter band. For example, if another ECG study is performed using a different ECG database, the settings of the Butterworth filter used in this study must be changed. Alternatively, a filter other than the Butterworth filter may need to be used. Figure 14 presents the results of applying the fast Fourier transform to two ECG signals, which exhibit different frequency characteristics. Considering these frequency characteristics, the proposed algorithm should design an appropriate filter regardless of which ECG signal is input. The plan currently under consideration involves calculating the signal-to-noise ratio (SNR) of the signal and redesigning the filter when the SNR is too low.
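The SNR check under consideration could take the following form. This is only a sketch of one possible approach, assuming that the filtered signal is treated as the signal estimate and that the difference between the raw and filtered signals is treated as the noise estimate. The function names and the 10 dB threshold are hypothetical, because no estimation method or threshold is specified here.

```python
import math


def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_noise = mean_power(noise)
    if p_noise == 0:
        return math.inf  # no measurable noise
    return 10 * math.log10(mean_power(signal) / p_noise)


def needs_redesign(raw, filtered, min_snr_db=10.0):
    """Flag the filter for redesign when the estimated SNR is below the threshold."""
    residual = [r - f for r, f in zip(raw, filtered)]  # assumed noise estimate
    return snr_db(filtered, residual) < min_snr_db


# Synthetic example: unit-amplitude signal with noise at one tenth of its amplitude.
clean = [1.0, -1.0] * 50
noise = [0.1, -0.1] * 50
raw = [c + n for c, n in zip(clean, noise)]
example_snr = snr_db(clean, noise)
```

In this synthetic example the power ratio is 100, which corresponds to 20 dB.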
Figure 14. Fourier transform applied to two ECG signals (a,b) with different frequency characteristics. After digital filters were applied to eliminate noise, the Fourier transforms confirmed that the designed filter removed the signal in the appropriate frequency band.
The ECG signals used in this study were obtained from the MIT-BIH database. Digital filters were designed and applied to increase the classification accuracy. Subsequently, the learning data were sufficiently secured using the matching pursuit algorithm. Data classification was performed using the MobileNetV2-BiLSTM algorithm.
The dataset used consisted of four classes, and the number of samples per class was unbalanced. To solve this problem, the matching pursuit algorithm was used to analyze the signals and augment the data so that every class contained 500 signals. In this study, the proposed data augmentation method was used only to detect heart disease. However, if research on data augmentation progresses, it should become possible to solve problems arising from unbalanced datasets more generally. In addition, because the proposed method can add data derived from signals measured over a short period of time, e.g., ECG signals, sufficient learning data can be secured, which can increase the accuracy of the algorithm.
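For reference, the core of the matching pursuit algorithm is a greedy decomposition of a signal over a dictionary of atoms. The sketch below shows only this standard decomposition step, assuming unit-norm atoms. The dictionary, the function name `matching_pursuit`, and the example signal are hypothetical, and the way in which the decomposition is turned into augmented ECG signals in this study is not reproduced here.

```python
def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit over a dictionary of unit-norm atoms.

    Returns the coefficient of each atom and the final residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        products = [sum(r * a for r, a in zip(residual, atom))
                    for atom in dictionary]
        # Select the atom that is most correlated with the residual.
        best = max(range(len(dictionary)), key=lambda i: abs(products[i]))
        c = products[best]
        coeffs[best] += c
        # Remove the selected component from the residual.
        residual = [r - c * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


# Toy example with the standard basis of R^4 as the dictionary.
basis = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
toy_signal = [0.0, 5.0, 0.0, -1.0]
toy_coeffs, toy_residual = matching_pursuit(toy_signal, basis, n_iter=2)
```

With an orthonormal dictionary, each iteration recovers one nonzero component exactly, so two iterations suffice for the toy signal. With a redundant dictionary, as is typical for ECG analysis, the residual shrinks gradually and the number of iterations becomes a tuning parameter.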
The advantage of this algorithm is that it makes arrhythmia diagnosis possible on a mobile device. Deep learning, which has recently been used in various ways, focuses on performance and therefore requires high-performance CPUs and consumes large amounts of memory and power. In contrast, MobileNet is designed for use in situations where the computational performance is limited; thus, it can mitigate these problems.
The accuracy of the proposed model was 91.7%. Considering that the accuracy of existing arrhythmia detection algorithms has reached approximately 99%, the performance of the proposed algorithm was not excellent. In addition to the accuracy, the sensitivity, specificity, and precision were lower than those reported in previous studies. However, the objective of the proposed method is to allow heart patients to check their health using mobile devices in their daily lives.
To overcome the disadvantages of this study, further research on the weight reduction of the model, the learning method of deep learning, and noise reduction is needed.

Conclusions
We proposed an artificial-intelligence model for classifying arrhythmia using MobileNetV2-BiLSTM and a matching pursuit algorithm. The ECG data measured over a short period were augmented with sufficient quantities of data using the matching pursuit algorithm, and the MobileNetV2-BiLSTM-based arrhythmia diagnosis results exhibited an accuracy of 91.7%. The performance of the model was evaluated using the ROC curve, and the average AUC was 0.994, indicating that the performance of the classifier was excellent. The algorithm arbitrarily added ECG data to increase its accuracy. In this process, the matching pursuit algorithm was used, and a large number of data could be secured. The data augmentation method used in the present study can be applied to imbalanced datasets. If the dataset is imbalanced, an overfitting problem can occur, reducing the accuracy. If the matching pursuit algorithm can solve the imbalance problem, the proposed algorithm can classify data from various datasets in addition to the ECG datasets used in this study. Owing to the widespread use of portable devices, various applications of lightweight algorithms will be developed in the future. After further research, the proposed MobileNetV2-BiLSTM model is expected to be useful in various fields, such as healthcare and the Internet of Things, in addition to disease monitoring.
Author Contributions: S.S. constructed the arrhythmia-detection algorithm and suggested the concepts for the work; M.K. performed the experiments; G.Z. analyzed the ECG data; J.J. and Y.T.K. supervised the writing of the article. All authors have read and agreed to the published version of the manuscript.