Compressed Deep Learning to Classify Arrhythmia in an Embedded Wearable Device

The importance of an embedded wearable device with automatic detection and alarming cannot be overstated, given that 15–30% of patients with atrial fibrillation are reported to be asymptomatic. These asymptomatic patients do not seek medical care; hence, traditional diagnostic tools including Holter are not effective for the further prevention of associated stroke or heart failure. This is likely to be even more so in the era of COVID-19, in which patients have become more reluctant to undergo hospitalization and checkups. However, little literature is available on this important topic. For this reason, this study developed efficient deep learning with model compression, which is designed to use ECG data and classify arrhythmia in an embedded wearable device. ECG-signal data came from Korea University Anam Hospital in Seoul, Korea, with 28,308 unique patients (15,412 normal and 12,896 arrhythmia). Resnets and Mobilenets with model compression (TensorFlow Lite) were applied and compared for the diagnosis of arrhythmia in an embedded wearable device. The weight size of the compressed model decreased markedly from 743 MB to 76 KB (about 1/10,000), whereas its performance was almost the same as that of its original counterpart. Resnet and Mobilenet were similar in terms of accuracy, i.e., Resnet-50 Hz (97.3) vs. Mobilenet-50 Hz (97.2), Resnet-100 Hz (98.2) vs. Mobilenet-100 Hz (97.9). Here, 50 Hz/100 Hz denotes the down-sampling rate. However, Resnets took more flash memory and longer inference time than did Mobilenets. In conclusion, Mobilenet would be a more efficient model than Resnet to classify arrhythmia in an embedded wearable device.


Introduction
Heart disease is a major contributor to the global disease burden [1][2][3][4][5][6]. The estimated number of deaths from cardiovascular disease was 17.9 million in the world for the year 2019 (Y2019 hereafter), which was 32% of global deaths [1]. The age-standardized death rate from atrial fibrillation, the most common arrhythmia, showed a great increase from 0.8 to 1.6 per 100,000 for men (or 0.9 to 1.7 per 100,000 for women) in the world during 1990-2010 [2]. This worldwide trend agrees with its Korean counterpart. Heart disease ranked second in Korea as the cause of death for Y2020 (63.0 per 100,000) [3] and as the source of disease burden for Y2015 (3475 disability-adjusted life years per 100,000) [4]. In addition, the number of hospitalizations for atrial fibrillation registered a rapid growth of 420% from 767 to 3986 per 1 million during 2006-2015 [5].
For this reason, emerging literature has focused on the early diagnosis of arrhythmia, using deep neural networks for their better performance measures than those of other approaches [6][7][8][9][10][11][12][13][14]. These studies utilized electrocardiogram (ECG) data, applying convolutional neural networks (Alexnet, Resnet) [6][7][8][9][10][11][12], recurrent neural networks (long short-term memory) [13] or both [14] with various class categories and accuracy results (80-99%). For instance, a recent study [11] employed ECG data in a general hospital, comparing 30 convolutional neural networks for the classification of the normal sinus rhythm vs. atrial fibrillation status: six Alexnets with five convolutional layers, three fully connected layers and 3 to 256 kernels; and 24 Resnets with 2 to 8 residual blocks and 2 to 64 kernels. The accuracy of the best Alexnet was 0.997 with 24 kernels in the first layer, 5,268,818 parameters and the training time of 89 s, while the best Resnet showed the accuracy of 0.999 with six residual blocks, 32 initial kernels in the first layer, 248,418 parameters and the training time of 253 s. In general, the performance of Resnet improved as the number of its residual blocks (its depth) increased. Based on the results of this study, for atrial fibrillation diagnosis, Resnet might be a good model with higher accuracy and fewer parameters than its Alexnet counterparts.
A recent follow-up [12] made two extensions to the study above. In this follow-up, six types of arrhythmia were considered, i.e., atrial fibrillation, atrial flutter, sinus bradycardia, sinus tachycardia, premature ventricular contraction and first-degree atrioventricular block. This study also introduced Resnet with a squeeze-and-excitation block (SE-Resnet) and compared SE-Resnet to its baseline counterpart for varying layer depth (18, 34, 50, 101, 152). Based on the findings of this study, SE-Resnet outperformed its baseline counterpart across the board. Specifically, SE-Resnet with 152 layers showed the highest F1 score of 97.05% with a margin of 1.40% compared to its baseline counterpart. However, these models are reported to take too much memory for an embedded wearable device. The importance of an embedded wearable device with automatic detection and alarming cannot be overstated, given that 15-30% of patients with atrial fibrillation are reported to be asymptomatic [15][16][17]. These asymptomatic patients do not seek medical care; hence, traditional diagnostic tools including Holter are not effective for the further prevention of associated stroke or heart failure [18]. This is likely to be even more so in the era of COVID-19, in which patients have become more reluctant to undergo hospitalization and checkups [19,20].
Resnet [21], Mobilenet [22] and Litenet [23] are deep learning candidates for embedded vision applications. Resnet is based on residual learning (to be explained in the next section). Residual learning brought it first place in the 2015 ImageNet Large Scale Visual Recognition Challenge with 152 layers and a top-5 error rate of 3.6%, that is, much greater depth and accuracy than the Visual Geometry Group network (the runner-up in 2014 with 24 layers and a top-5 error rate of 6.8%) [21]. Mobilenet [22] and Litenet [23] center on depth-wise and point-wise convolutions, which reduce the size of input image and the number of its channels, respectively. A recent study used Litenet to classify arrhythmia and achieved an accuracy of 97.78% with an inference time of 25 microseconds [23]. These deep learning models depend on the strengths of convolutional layers, which focus on global information. On the other hand, another group of models relies on the distinctive characteristics of recurrent layers, which focus on sequential information [24,25]. One recent study used a linear combination of simple recurrent neural networks for the diagnosis of arrhythmia, recording an accuracy of 99.60% with an inference time of 31.2 ms [24]. Likewise, another recent study highlighted the advantage of combining convolutional layers and the simplest (vanilla) recurrent layers for the diagnosis of arrhythmia, recording an accuracy of 99.80% with an inference time of 3 min [25]. However, the existing literature employed a public dataset (MIT-BIH Arrhythmia Database) and its inference was performed on personal computers, not in an embedded wearable device. In this context, this study introduced efficient deep learning with model compression, which is tailored for ECG data and arrhythmia classification in an embedded wearable device. To the best of our knowledge, this is the first study in this direction.
This article is organized in the following manner. Participants, deep learning models and their compression methods are described in the next section. This is followed by the presentation of their results in terms of performance, model size, inference time and current consumption. Finally, the contributions, limitations and conclusions of this study are discussed in the last section.

Participants and Categories
ECG-signal data came from Korea University Anam Hospital in Seoul, Korea, with 28,308 unique patients. Other information including age, gender and medical history was excluded from this dataset because of hospital rules and regulations. This retrospective study was approved by the Institutional Review Board (IRB) of Korea University Anam Hospital on 12 February 2018 (2018AN0037). Informed consent was waived by the IRB given that data were de-identified. Lead-II ECG-signal data (taken from 12-lead ECG image traces) were recorded for 10 s at a sampling frequency of 200 Hz. Among the 28,308 patients, 80%, 10% and 10% were used as training, validation and test sets, respectively. Training/validation was performed on a personal computer whereas testing was completed on an embedded wearable device. Among the 28,308 patients, 15,412 were diagnosed as normal (Categories 1-4 in Table 1) and 12,896 as arrhythmia (Categories 5-7 in the table). A normal ECG wave has five elements: P (atrial contraction); Q (downward deflection immediately before ventricular contraction); R (the peak of ventricular contraction); S (downward deflection immediately after ventricular contraction); and T (ventricular recovery). On the other hand, an atrial fibrillation wave registers irregularity, e.g., the P element is missing and the QRS element shows no regular pattern. An example of the preprocessed ECG signal is given in Figure 1.
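The record length and data split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say whether the split was random or stratified, nor how the signals were down-sampled to 50 Hz and 100 Hz, so the shuffled split and the plain decimation (without an anti-aliasing filter) are hypothetical choices, as are the function names.

```python
import random

SAMPLING_RATE_HZ = 200   # sampling frequency stated in the text
DURATION_S = 10          # record length stated in the text


def split_patients(patient_ids, seed=0):
    """Shuffle patient IDs and split them 80/10/10 (assumed random split)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # remainder goes to the test set
    return train, val, test


def decimate(signal, source_hz, target_hz):
    """Keep every (source_hz // target_hz)-th sample (assumed method)."""
    if source_hz % target_hz != 0:
        raise ValueError("source rate must be an integer multiple of target rate")
    return signal[::source_hz // target_hz]


train, val, test = split_patients(range(28308))

# A placeholder 10 s record: 200 Hz x 10 s = 2000 samples.
ecg = [0.0] * (SAMPLING_RATE_HZ * DURATION_S)
ecg_50hz = decimate(ecg, SAMPLING_RATE_HZ, 50)
ecg_100hz = decimate(ecg, SAMPLING_RATE_HZ, 100)
```

Down-sampling shortens each record from 2000 samples to 500 (50 Hz) or 1000 (100 Hz), which reduces the input size the model has to process on the device.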

Deep Learning Models
For the diagnosis of arrhythmia in an embedded wearable device, this study applied and compared two neural network models, Resnet [21] and Mobilenet [22], with model compression in TensorFlow Lite [26]. The models used in this study are shown in Figures 2 and 3. A neural network is a network of "neurons", i.e., information units combined through weights. Usually, the neural network has one input layer, one, two or three intermediate layers and one output layer. Neurons in one layer connect to neurons in the next layer through "weights", which represent the strengths of these connections. This process starts from the input layer, continues through intermediate layers and ends in the output layer (feedforward operation). Then, learning happens: These weights are adjusted based on how much they contributed to the loss, a difference between the actual and predicted final outputs. This process starts from the output layer, continues through intermediate layers and ends in the input layer (backpropagation operation). The two operations are repeated until a certain expectation is met regarding the accurate prediction of the dependent variable. In other words, the performance of the neural network is expected to improve as its learning continues. Finally, a deep neural network is a neural network with a large number of intermediate layers, e.g., 5, 10 or even 1000. The deep neural network is called "deep learning" given that learning "deepens" through numerous intermediate layers [11,12]. Specifically, a certain type of deep learning model, the so-called convolutional neural network, has emerged as the dominant deep learning model in the past decade. The convolutional neural network has convolutional layers, in which a kernel passes across input data and performs "convolution", that is, computes the dot product of its own elements and their input-data counterparts.
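The sliding dot product described above can be sketched for a one-dimensional signal such as an ECG trace. This is an illustrative toy, not the layers used in this study: the signal values and the kernel are hypothetical, and, as in most deep learning frameworks, the kernel is not flipped (strictly speaking, a cross-correlation).

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical trace with one sharp peak (loosely resembling an R peak).
signal = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]

# Hypothetical kernel that responds to rising (+) and falling (-) slopes.
slope_kernel = [-1.0, 0.0, 1.0]

response = conv1d_valid(signal, slope_kernel)
# response == [1.0, 5.0, 0.0, -5.0, -1.0]
```

The response is large and positive on the rising edge of the peak and large and negative on the falling edge, which is how a learned kernel can mark a specific waveform characteristic.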
The operation of convolution helps the convolutional neural network to detect specific characteristics of the input data, e.g., the form of a normal rhythm vs. its arrhythmia counterpart. However, the convolutional neural network has an issue of gradient vanishing: As it becomes deeper (the number of its layers increases), the gradient of the loss with respect to the weights in early layers quickly approaches 0. In this context, it has been an important task for deep learning experts to develop a new deep learning model, which manages considerable depth (e.g., 100 layers) and unprecedented performance at the same time [11,12,21]. Resnet solved this great challenge based on residual learning explained below. This new deep learning model, which ranked first in the 2015 ImageNet Large Scale Visual Recognition Challenge, was much deeper and more accurate than the Visual Geometry Group network, the runner-up in 2014: the former network with 152 layers and a top-5 error rate of 3.6% vs. the latter network with 24 layers and a top-5 error rate of 6.8%. In its predecessor networks, output y was a function of input x, i.e., f(x), whereas in Resnet, y is f(x) + x. This helps to focus on "residual learning", i.e., learning the residual part f(x) besides x. In addition, this helps to overcome the gradient-vanishing problem: the derivative of y with respect to x is f'(x) + 1, so the identity term keeps the gradient from approaching 0 even when f'(x) is small [21]. In turn, Mobilenet was presented as an efficient deep learning model for embedded vision applications: It is based on depth-wise and point-wise convolutions, which reduce the size of input image and the number of its channels, respectively [22].
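Two points in this paragraph can be made concrete with a small sketch. The first function multiplies per-layer derivatives through a stack of layers, with and without the "+ x" skip connection; the second pair counts the weights of a standard 1D convolution and of a depth-wise plus point-wise replacement. All numbers (50 layers, a local derivative of 0.01, kernel size 3, 64 and 128 channels) are hypothetical and are not taken from the models in this study; biases are ignored.

```python
def chain_gradient(local_derivatives, residual):
    """Multiply per-layer derivatives; a skip connection adds 1 to each factor."""
    grad = 1.0
    for d in local_derivatives:
        grad *= (d + 1.0) if residual else d
    return grad


def conv1d_params(kernel_size, in_channels, out_channels):
    """Weight count of a standard 1D convolution."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size, in_channels, out_channels):
    """Depth-wise (one kernel per input channel) plus point-wise (1x1) weights."""
    depthwise = kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical: every layer has a small local derivative f'(x) = 0.01.
layers = [0.01] * 50
plain = chain_gradient(layers, residual=False)   # product of 0.01, fifty times
skip = chain_gradient(layers, residual=True)     # product of 1.01, fifty times

standard = conv1d_params(3, 64, 128)             # 3 * 64 * 128 = 24576
separable = separable_conv1d_params(3, 64, 128)  # 3 * 64 + 64 * 128 = 8384
```

Without the skip connection the product collapses towards 0, whereas with it the product stays on the order of 1. In the second example, the separable layer needs roughly a third of the weights of the standard layer for the same channel configuration.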
Finally, TensorFlow Lite is a collection of tools for the compression and inference of an original TensorFlow model in an embedded device [26]. Once we complete the training of the original model, we can compress it in TensorFlow Lite (model compression) and run the inference of the compressed model in an embedded device. At this point, training a model is not an option in TensorFlow Lite. The common strategies of model compression are pruning, quantization, clustering, low-rank approximation and knowledge distillation [26][27][28] (Table 2). We use pruning to remove some of the model weights, i.e., to set their values to zero (suitable for both training from scratch and using a pre-trained model) [29]. We use quantization to decrease the sizes of the weights by mapping their values in an original set to their smaller-set counterparts (e.g., 8-bit to 1-bit) (suitable for both training from scratch and using a pre-trained model) [30]. We use clustering to divide the weights into several groups, then share central values for all weights in the same group (suitable for both training from scratch and using a pre-trained model) [31]. We use low-rank approximation to reduce the redundancy (or "rank") of convolutional filters, that is, to approximate the original filters based on their lower-rank counterparts (suitable for both training from scratch and using a pre-trained model). Finally, we use knowledge distillation to condense an original model to its smaller counterpart with a similar loss function (and performance) (suitable for using a pre-trained model) [32]. TensorFlow Lite currently supports pruning, quantization and clustering [26].
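The idea of quantization, mapping weight values to a smaller set, can be sketched in plain Python. This is not the TensorFlow Lite API and not necessarily the configuration used in this study: it assumes floating-point weights mapped to signed 8-bit integers with a single shared scale (symmetric quantization), and the weight values and function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


# Hypothetical weights for illustration only.
weights = [-0.50, -0.12, 0.0, 0.03, 0.25, 0.50]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error of each weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is then stored as one small integer plus a shared scale, at the cost of a bounded rounding error; this trade-off is why a compressed model can shrink substantially while its accuracy changes little.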

Table 2. Common strategies of model compression.

Strategy | Description | Suitable for | Reference
Pruning | We use pruning to remove some of model weights, i.e., to set their values as zeroes | Both training from scratch and using a pre-trained model | [29]
Quantization | We use quantization to decrease the sizes of the weights by mapping their values in an original set to their smaller-set counterparts (e.g., 8-bit to 1-bit) | Both training from scratch and using a pre-trained model | [30]
Clustering | We use clustering to divide the weights into several groups, then share central values for all weights in the same group | Both training from scratch and using a pre-trained model | [31]
Low-Rank Approximation | We use low-rank approximation to reduce the redundancy (or "rank") of convolutional filters, that is, to approximate the original filters based on their lower-rank counterparts | Both training from scratch and using a pre-trained model | -
Knowledge Distillation | We use knowledge distillation to condense an original model to its smaller counterpart with a similar loss function (and performance) | Training from scratch | [32]
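Pruning and clustering, as described in the table, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the TensorFlow Lite implementation: pruning is done by magnitude (the smallest weights are zeroed), clustering uses a simple one-dimensional k-means, and the weights, sparsity level and initial centroids are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Set the smallest-magnitude fraction of weights to zero."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def cluster_weights(weights, centroids, iterations=10):
    """1D k-means: replace each weight with the central value of its group."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for w in weights:
            nearest = min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))
            groups[nearest].append(w)
        # Keep the old centroid if a group happens to be empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]


# Hypothetical weights for illustration only.
weights = [0.9, -0.8, 0.05, -0.02, 0.4, 0.45, -0.01, 0.85]

pruned = prune_by_magnitude(weights, sparsity=0.5)
clustered = cluster_weights(weights, centroids=[-1.0, 0.0, 1.0])
```

After pruning, half of the weights are exactly zero and can be stored sparsely; after clustering, the eight weights take at most three distinct values, so only a small table of central values and one group index per weight need to be stored.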

Results
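Firstly, Resnet and Mobilenet were compared in terms of six performance measures in this study, i.e., accuracy, sensitivity (or recall), specificity, area under the receiver-operating-characteristic curve (AUC), precision and F1 score. Five of these measures are defined in Equations (1)-(5) below, whereas the AUC is obtained from the receiver-operating-characteristic curve. Here, TP, FP, FN and TN represent true positive, false positive, false negative and true negative defined in a confusion matrix (Table 3).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \quad (1)$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (2)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \quad (3)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (4)$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \quad (5)$$

Comparison was made between the original Resnet and its compressed counterpart in terms of the model weight size and performance (accuracy) in Table 4. The weight size of the compressed model decreased markedly from 743 MB to 76 KB (about 1/10,000), whereas its performance was almost the same as that of its original counterpart. In addition, a comparison was made between Resnet and Mobilenet in terms of the six performance measures in Table 5. Resnet and Mobilenet were similar in terms of accuracy, i.e., Resnet-50 Hz (97.3) vs. Mobilenet-50 Hz (97.2), Resnet-100 Hz (98.2) vs. Mobilenet-100 Hz (97.9). Secondly, Resnets took more flash memory than did Mobilenets. Thirdly, Resnets took longer inference time than did Mobilenets (Figure 6). Fourthly, current consumption was reported to be similar among the four models, i.e., Resnet-50 Hz (7.4 mA), Mobilenet-50 Hz (7.5 mA), Resnet-100 Hz (7.4 mA), Mobilenet-100 Hz (7.5 mA) (Figure 7). Overall, Mobilenet would be a more efficient model than Resnet to classify arrhythmia in an embedded wearable device.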
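The five confusion-matrix measures named in this section follow their standard definitions, which can be sketched as below together with the reported size reduction. The confusion-matrix counts are hypothetical and are not the results of this study; the conversion 1 MB = 1024 KB is also an assumption, since the source does not state the unit convention.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard measures computed from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)            # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (arrhythmia = positive class).
m = classification_metrics(tp=90, fp=5, fn=10, tn=95)

# Size reduction reported in the text: 743 MB down to 76 KB.
original_kb = 743 * 1024      # assumes 1 MB = 1024 KB
compressed_kb = 76
reduction_factor = original_kb / compressed_kb
```

With these counts, accuracy is 0.925, sensitivity 0.90 and specificity 0.95. The reduction factor comes out close to 10,000 under either the 1024 or the 1000 convention, consistent with the "about 1/10,000" figure in the text.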

Contributions of Study
The emerging literature has focused on the early diagnosis of arrhythmia, using deep neural networks for better performance measures than those of other approaches. These studies utilized ECG data, applying convolutional neural networks, recurrent neural networks or both with various class categories and accuracy results. However, these models are reported to take too much memory for an embedded wearable device. The importance of an embedded wearable device with automatic detection and alarming cannot be overstated, given that 15-30% of patients with atrial fibrillation are reported to be asymptomatic. These asymptomatic patients do not seek medical care; hence, traditional diagnostic tools including Holter are not effective for the further prevention of associated stroke or heart failure. This is likely to be even more so in the era of COVID-19, in which patients have become more reluctant to undergo hospitalization and checkups. However, little literature is available on this important topic. For this reason, this study developed efficient deep learning with model compression, which is designed to use ECG data and classify arrhythmia in an embedded wearable device.
A rare attempt was made to use a "lightweight" convolutional neural network (Litenet) for the classification of arrhythmia, which achieved an accuracy of 97.78% with an inference time of 25 microseconds [23]. Here, the term "lightweight" means that the size of input image and/or the number of its channels were reduced as in Mobilenet. The core of Litenet is the Lite module, a modified version of the inception module with two distinctive characteristics, i.e., (1) the kernel sizes of 1 × 1, 1 × 2 and 1 × 3 and (2) depth-wise and point-wise convolutions, which reduce the size of the input image and the number of its channels (Figure 8). ECG data for this study came from the MIT-BIH Arrhythmia Database with 109,449 samples from 48 unique participants. These samples were augmented and oversampled to achieve a balance between normal and arrhythmia categories. Then, five deep learning models were compared in terms of accuracy and inference time: Alexnet, Googlenet, Litenet, Mobilenet and Squeezenet. Litenet ranked third in accuracy and first in inference time.
Another study employed a lightweight recurrent neural network for the diagnosis of arrhythmia and recorded an accuracy of 99.80% with an inference time of 3 min [25]. This study developed the fused lightweight recurrent neural network module: a combination of convolutional layers and the simplest (vanilla) recurrent layers to achieve efficiency and accuracy at the same time (Figure 9). ECG data for this study also came from the MIT-BIH Arrhythmia Database with 48 unique participants. Their samples were undersampled to achieve a balance between normal and arrhythmia categories. However, these studies relied on a public dataset (MIT-BIH Arrhythmia Database) and their inference was carried out on personal computers, not in an embedded wearable device. For this reason, this study developed efficient deep learning with model compression, which is designed to use ECG data and classify arrhythmia in an embedded wearable device. To the best of our knowledge, this is the first study in this direction.

Limitations of Study
First, this study used the binary categories of normal vs. arrhythmia conditions. Introducing the multiple categories of arrhythmia would be a great extension of research on this topic. Secondly, little literature is available, and more study is to be done regarding the comparison of convolutional neural networks and their recurrent counterparts in terms of model compression, model performance and inference time. As addressed above, the convolutional neural network has convolutional layers, in which a kernel passes across input data and performs "convolution", that is, computes the dot product of its own elements and their input-data counterparts. The operation of convolution helps the convolutional neural network to detect specific characteristics of the input data, e.g., the form of a normal rhythm vs. its arrhythmia counterpart. On the other hand, in the recurrent neural network, the current output information depends, in a repetitive (or "recurrent") pattern, on the current input information and the previous hidden state (which is the memory of the network on what happened in all previous periods) [24,25,33]. In other words, the convolutional neural network focuses on global information whereas its recurrent counterpart focuses on sequential information. Combining these unique strengths is expected to render great insights and rich applications for the field of efficient deep learning with model compression. To the best of our knowledge, however, no study has been completed in this direction.
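The recurrent dependence described above, in which the current state depends on the current input and the previous hidden state, can be sketched with a single-unit vanilla recurrent layer. This is an illustrative toy and not a model from this study or from the cited works: the weights, the tanh activation and the input sequences are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent update: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Carry the hidden state (the network's memory) across the sequence."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# The same single pulse placed at two different positions in time.
states_early = run_rnn([1.0, 0.0, 0.0])
states_late = run_rnn([0.0, 0.0, 1.0])
```

The two sequences contain the same values but end in different final states, because the early pulse has decayed through the hidden state by the last step. This sensitivity to order is what the text refers to as sequential information.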
Thirdly, the standardization of ECG diagnostic criteria would strengthen the agreement of clinical experts and the performance of computer algorithms regarding ECG interpretation [31]. Clinical experts with rich experience, the gold standard, often disagree in their ECG interpretation; hence, more effort is needed in this direction. Finally, this study did not consider the application of reinforcement learning to find the most efficient deep learning models [32,33] for the classification of arrhythmia in an embedded wearable device. Reinforcement learning helps to find the optimal deep learning model with the best performance in an embedded wearable device, given the budget constraints of model size, inference time, current consumption and so on as in this study [32,33]. This study considered Resnet and Mobilenet to overcome the issue of gradient vanishing and to manage considerable depth and best performance in an embedded wearable device, given the budget constraints of model size, inference time and current consumption. These two models were chosen largely because few options have been available. However, various deep learning models can be developed with different sets of metrics including performance, model size, inference time and current consumption. How to optimize the deep learning model in an embedded wearable device given the constraints of various metrics is still uncharted territory, and much more research is needed in this emerging field.

Conclusions
Little literature is available on compressed deep learning to classify arrhythmia in an embedded wearable device. In this context, this study introduced efficient deep learning with model compression, which is tailored for ECG data and arrhythmia classification in an embedded wearable device. To the best of our knowledge, this is the first study in this direction. Based on the results of this study, Mobilenet would be a more efficient model than Resnet to classify arrhythmia in an embedded wearable device.