Arrhythmia Diagnosis by Using Level-Crossing ECG Sampling and Sub-Bands Features Extraction for Mobile Healthcare.

Mobile healthcare is an emerging technique for clinical applications. It is usually based on cloud-connected biomedical implants. In this context, a novel solution is presented for the detection of arrhythmia by using electrocardiogram (ECG) signals. The aim is to achieve an effective solution by using real-time compression, efficient signal processing, and data transmission. The system utilizes level-crossing-based ECG signal sampling, adaptive-rate denoising, and wavelet-based sub-band decomposition. Statistical features are extracted from the sub-bands and used for automated arrhythmia classification. The performance of the system was studied by using five classes of arrhythmia obtained from the MIT-BIH dataset. Experimental results showed a three-fold decrease in the number of collected samples compared to conventional counterparts. This resulted in a significant reduction of the computational cost of the subsequent denoising, features extraction, and classification. Moreover, a seven-fold reduction was achieved in the amount of data that needed to be transmitted to the cloud. This resulted in a notable reduction in transmitter power consumption, bandwidth usage, and cloud application processing load. Finally, the performance of the system was also assessed in terms of arrhythmia classification, achieving an accuracy of 97%.


Introduction
An electrocardiogram (ECG) signal possesses critical information about cardiac functionality [1]. An abnormality of cardiac rhythm is a sign of certain diseases that can be diagnosed by an effective analysis of an ECG [2]. Heart diseases are one of the major threats to human life [3], and a timely diagnosis allows better clinical measures to be taken. To this end, individual heartbeats are analyzed by exploiting their frequency content and morphological patterns for the automatic diagnosis of ECG arrhythmia [4][5][6][7][8].

Dataset
The performance of the designed solution was studied by using the MIT-BIH arrhythmia database (ECG arrhythmia time series are available at https://physionet.org/content/mitdb/1.0.0/) [36]. A set of twelve 30-min ECG recordings containing clinically important arrhythmias was employed. Each channel was band-limited to [0.5, 60] Hz by using an analog antialiasing filter and was then acquired with an 11-bit resolution analog-to-digital converter (ADC). The sampling rate was 360 Hz. Experienced cardiologists labeled all acquired heartbeats. To extract individual heartbeats, each considered ECG signal was segmented for a time length of 0.9 s. Five different arrhythmia classes were considered: normal signals (N), right bundle branch block (RBBB), left bundle branch block (LBBB), atrial premature contraction (APC), and premature ventricular contraction (PVC). To achieve equal representation, 300 instances were considered for each class. Therefore, in total, there were 1500 instances, each belonging to one of the five considered classes. To avoid any bias, heartbeats related to each class were collected from various records (see Table 1).


Figure 1. The proposed system block diagram. LCADC: level-crossing analog-to-digital converter; ASA: activity selection algorithm.


Level-Crossing A/D Converter (LCADC)
To evaluate the LCADC, the sampled version of a band-limited signal, given by y(t n ) and obtained from the MIT-BIH dataset, was reconstructed. If x(t) is the reconstructed quasi-analog version of y(t n ), then the relationship between x(t) and y(t n ) can be mathematically presented by using Equation (1), where U is the up-sampling factor. The up-sampling is realized by using the standard cascaded cubic spline interpolators and anti-imaging filters [37].
The band-limited signal x(t) is digitized with an LCADC. The frequency content of x(t) is limited to [0.5, 60] Hz [9,36]. The LCADC is designed on the basis of level-crossing sampling (LCS) [28,38]. In this case, a sample is only acquired when x(t) crosses one of the predefined thresholds. For a given LCADC amplitude dynamic, ∆V, and resolution, M, the sampling frequency is piloted by the signal. Samples are irregularly spaced in time, and the count of samples is proportional to the slope of x(t) [31]. Sample amplitudes are equal to the predefined thresholds. The sampling instants are defined by Equation (2), and the process is shown in Figure 2 [28,38], where t n is the present sampling instant, t n−1 is the preceding one, and the time step between the present and the preceding sampling instants is dt n:

t n = t n−1 + dt n. (2)

LCADC parameters are selected according to the approach described in [35]. This approach is based on the uniform-quantization scheme. Therefore, its quantum, q, can be calculated as: q = ∆V/(2^M − 1) [38].
The phenomenon of hysteresis is also embedded in the LCADC [29]. A new sample is only acquired when there is a difference of q with respect to the preceding sample amplitude. The process is shown in Figure 2 and can be mathematically expressed as: x n = x n−1 ± q. It improves LCADC efficiency in terms of real-time compression [29]. The QRS complexes of heartbeats contain the most significant arrhythmia-related information [16,24]. The LCADC acquires the relevant ECG information, the QRS complexes, at adaptive rates while avoiding the remaining low-amplitude components [24]. Therefore, it collects a reduced number of samples compared to its classical counterparts [24].
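The level-crossing acquisition with hysteresis described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the quantum q, amplitude range, and the toy pulse are assumptions made for the example.

```python
# A minimal sketch of level-crossing sampling with hysteresis, assuming a
# uniform quantum q: a sample (t_n, x_n) is emitted only when the signal
# moves at least one quantum away from the amplitude of the preceding
# sample. The toy pulse below is illustrative, not MIT-BIH data.
import math

def level_cross_sample(t, x, q):
    last = q * round(x[0] / q)            # start at the nearest threshold
    samples = [(t[0], last)]
    for ti, xi in zip(t[1:], x[1:]):
        if xi >= last + q:                # upward crossing(s)
            last += q * math.floor((xi - last) / q)
            samples.append((ti, last))
        elif xi <= last - q:              # downward crossing(s)
            last -= q * math.floor((last - xi) / q)
            samples.append((ti, last))
    return samples

# 1000 uniform points of a narrow "QRS-like" pulse on a flat baseline
t = [i / 1000.0 for i in range(1000)]
x = [math.exp(-((ti - 0.5) ** 2) / 0.001) for ti in t]
out = level_cross_sample(t, x, q=0.125)
print(len(out))                           # far fewer than 1000 points
```

The flat baseline produces no crossings, so only the steep pulse is sampled, which mirrors how the LCADC concentrates samples on QRS complexes.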
x(t) = y(t nU). (1)
However, the amplitudes of samples are quantized [38]. Quantization is the only source of error, and it depends on the selected ∆V and M [35]. It is assessed in terms of the signal-to-noise ratio (SNR) [38]. The SNR is computable as: SNR dB = 6.02M + 1.76. It presents an ideal ADC SNR for a full-scale monotonous sinusoidal input and describes its dependency on M. On the contrary, the amplitudes of samples are ideally known for an LCADC. However, the instants of these samples are quantized according to the operating frequency, F Timer, of the timer circuit that is used to record these instants [28]. The SNR of an ideal LCADC can be calculated by using Equation (3) [38], where f sig is the frequency of the full-scale sinusoid used to evaluate the LCADC:

SNR dB = −11.19 − 20 log 10 (f sig ·T Timer). (3)

Equation (3) shows that the SNR of an ideal LCADC is independent of M and is a function of f sig and T Timer = 1/F Timer. A 6.02 dB improvement in the value of the SNR is achievable by halving T Timer [28]. In this study, a 21-bit resolution timer was used with F Timer = 1 MHz. These parameters allowed us to properly record a heartbeat without timer overflow and resulted in an ideal LCADC SNR of 73.25 dB. According to [4][5][6][12][13][14][15], an 11-bit ADC resolution is appropriate and results in a precise, computer-based arrhythmia diagnosis. For the selected timer parameters, the obtained LCADC SNR was equal to the theoretical SNR of an 11.9-bit classical ADC. This justified the selected system parameters for the considered application.
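The quoted figures can be checked numerically. The snippet below evaluates the ideal-ADC formula and the ideal-LCADC SNR for the paper's settings (f sig = 60 Hz, T Timer = 1 µs); the function names are of course our own.

```python
import math

# A check of the SNR figures quoted above.
def snr_classical_adc(M):
    """Ideal M-bit uniform ADC, full-scale sinusoid: 6.02*M + 1.76 dB."""
    return 6.02 * M + 1.76

def snr_lcadc(f_sig, T_timer):
    """Ideal LCADC with time quantum T_timer: -11.19 - 20*log10(f_sig*T_timer) dB."""
    return -11.19 - 20.0 * math.log10(f_sig * T_timer)

snr = snr_lcadc(f_sig=60.0, T_timer=1e-6)     # the paper's settings
print(round(snr, 2))                          # ~73.25 dB, as reported
# equivalent classical ADC resolution: invert 6.02*M + 1.76
M_eq = (snr - 1.76) / 6.02
print(round(M_eq, 1))                         # ~11.9 bits, as reported
```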

Activity Selection Algorithm (ASA)
The activity selection algorithm (ASA) segments the LCADC output [27,28]. This algorithm employs the non-uniformity of the sampling process to select the active parts of the signal while avoiding the redundant baseline [16,24,28]. The principle is clear from the algorithmic state machine (ASM) chart shown in Figure 3. In Figure 3, T 0 = 1/f min is the fundamental period of x(t), and f min is the lowest frequency component, whose value is equal to 0.5 Hz [9,36]. The active parts of the level-crossing sampled signal are identified by using the values of T 0 and dt n. The condition dt n ≤ T 0 is selected to respect the Nyquist criterion for f min. L i is the length of the i-th selected segment W i. N i is the number of samples of W i. L ref is the superior bound on L i, and its choice depends on the system parameters and the characteristics of the intended signal [28,38]. For this study, L ref = 1 s was selected. At the beginning of each iteration, i is incremented, and N i and L i are initialized to zero.
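The segmentation logic above can be sketched as follows, under the paper's parameter values (T 0 = 2 s for f min = 0.5 Hz, L ref = 1 s). The toy sample stream and the segment-closing details are our own illustrative assumptions.

```python
# A sketch of the ASA: segment a level-crossing sampled stream (t_n, x_n)
# into active windows W_i using the condition dt_n <= T0, with an upper
# bound L_ref on the window length. Variable names follow the text.

def asa_segment(samples, T0=2.0, L_ref=1.0):
    segments, current = [], []
    for n in range(1, len(samples)):
        dt = samples[n][0] - samples[n - 1][0]
        if dt <= T0:                      # active part (Nyquist for f_min)
            if not current:
                current = [samples[n - 1]]
            current.append(samples[n])
            L_i = current[-1][0] - current[0][0]
            if L_i >= L_ref:              # superior bound L_ref reached
                segments.append(current)
                current = []
        elif current:                     # idle gap: close the window
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

# two bursts of activity separated by a long idle baseline gap
stream = [(0.0, 0), (0.1, 1), (0.2, 2), (5.0, 1), (5.1, 2), (5.2, 3)]
print(len(asa_segment(stream)))           # two selected segments
```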
The traditional windowing functions do not provide the interesting features of the ASA [27,29]. The ASA allows for the selection of the signal active portions while avoiding the redundant, unwanted ones [16,24,28]. In addition, the length of the window function is automatically adjusted according to the temporal variations of the signal. This process avoids signal truncation, and, therefore, segmentation can be performed by using adaptive-length rectangular windows. This avoids the use of arithmetically complex smoothening-window functions and creates an effective solution to the spectral leakage phenomenon [27].



Adaptive-Rate Resampling
For a given resolution M, the LCADC sampling frequency adapts by following the temporal variations of x(t). The maximum sampling frequency, Fs max, of a uniform quantization-based LCADC is defined by Equation (4) [28,38], where f max is the x(t) bandwidth and A in is the input signal amplitude.
The sampling frequency for W i can be calculated as: Fs i = N i /L i. To benefit from the established signal processing techniques, W i is uniformly resampled by using simplified linear interpolation (SLI) [29], which modifies the resampled signal compared to the original; this variation depends on M, q, and the employed interpolator [39]. For SLI, the superior bound of the error per resampled observation is q/2 [39].
The LCADC focuses on the active signal parts. Nevertheless, one LCADC shortfall is that the active signal parts can be digitized at superior rates compared to conventional digitization approaches [27,29]. The ASA overcomes this shortfall by examining the features of W i and then adjusting the system parameters accordingly [29]. In this way, the resampling frequency, Frs i, and the arithmetic complexity of the post-processing modules are adjusted by following the x(t) temporal disparities [29].
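The uniform resampling step can be sketched as plain linear interpolation between the non-uniform samples; this stands in for the paper's simplified linear interpolator (SLI), whose exact arithmetic shortcuts are not reproduced here. The ramp input is an illustrative assumption.

```python
# A sketch of uniformly resampling a selected segment W_i at Frs_i by
# linear interpolation between the non-uniform level-crossing samples
# (used here in place of the paper's SLI).

def resample_segment(t_n, x_n, frs):
    out, j = [], 0
    n = int((t_n[-1] - t_n[0]) * frs) + 1
    for k in range(n):
        t = t_n[0] + k / frs              # uniform time grid at Frs_i
        while t_n[j + 1] < t:             # locate the bracketing samples
            j += 1
        w = (t - t_n[j]) / (t_n[j + 1] - t_n[j])
        out.append((t, x_n[j] + w * (x_n[j + 1] - x_n[j])))
    return out

# non-uniform samples of a ramp x(t) = 2t; a ramp is reproduced exactly
t_n = [0.0, 0.05, 0.25, 0.3, 0.9, 1.0]
x_n = [2.0 * v for v in t_n]
res = resample_segment(t_n, x_n, frs=20.0)
print(all(abs(xv - 2.0 * tv) < 1e-9 for tv, xv in res))   # True
```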

Adaptive-Rate Denoising
A band-pass finite impulse response (FIR) filter bank is designed for the effective online reduction of unwanted noise in ECG signals [9]. The FIR filtering process can be mathematically described by using Equation (5), where x n is the incoming signal, x f n is the filtered signal, and h k are the coefficients of the K-th order FIR filter:

x f n = ∑ k=1..K h k ·x n−k. (5)
Here, the filter bank was designed for the cut-off frequencies of [Fc min, Fc max] Hz. Each filter was implemented for a different sampling frequency chosen from the set Fref (cf. Equation (6)). The upper bound on Fref was selected as F r and, to assure a proper digital filtering operation, the lower bound on Fref was chosen as Fs min ≥ 2·Fc max [29]. F r was the sampling frequency whose value remained greater than and closest to F Nyq = 2·Fc max. Q is the length of Fref, and its value is always chosen as a power of two. In Equation (6), ∆ is a unique offset. The ASA examines the properties of W i and uses them for adjusting the denoising parameters, such as the resampling frequency, Frs i, and the filter order, K i. The method of choosing Frs i and keeping it aligned with Fref c is shown in Figure 4, which shows that a suitable filter, from the reference bank, is selected for each W i. Let h c k be the filter selected for W i that is sampled at Fref c. Then, this selection can be made on the basis of Fref and Fs i. For proper denoising, Frs i = Fref c must be chosen [29].
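The filter-selection step described above can be sketched as a binary search over the sorted reference frequencies, which matches the log2(Q) worst-case comparison count discussed later. The reference frequencies below are illustrative assumptions, not the paper's actual bank.

```python
# A sketch of reference-filter selection: pick the smallest reference
# sampling frequency Fref_c >= Fs_i by successive approximation (binary
# search), needing at most log2(Q) comparisons for Q reference filters.

FREF = [180.0, 360.0, 720.0, 1440.0]      # Q = 4 illustrative frequencies, sorted

def select_fref(fs_i, fref=FREF):
    lo, hi = 0, len(fref) - 1
    while lo < hi:                        # at most log2(Q) iterations
        mid = (lo + hi) // 2
        if fref[mid] >= fs_i:
            hi = mid
        else:
            lo = mid + 1
    return fref[lo]                       # Frs_i = Fref_c for proper denoising

print(select_fref(500.0))                 # smallest entry >= 500 -> 720.0
```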

Features Extraction
The wavelet transform (WT) is frequently used for the multi-resolution time-frequency analysis of non-stationary ECG-like signals [4,40]. This transform can be mathematically expressed by Equation (7), where s and u, respectively, represent the dilation and the translation parameters and ψ is the mother wavelet:

WT x (s, u) = (1/√s) ∫ x(t)·ψ((t − u)/s) dt. (7)
A discrete-time wavelet transform (DWT) is used for analyzing digital signals. A translation-dilation representation is attained by employing digital filters. In this case, a denoised segment, W i, is passed through the Daubechies algorithm-based wavelet decomposition, which consists of half-band high-pass and low-pass filters. This allows for the computation of the approximation, a i m, and detail, d i m, coefficients at each level of decomposition. The mathematical processes of computing a i m and d i m are, respectively, expressed by Equations (8) and (9), where m represents the level of decomposition:

a i m [n] = ∑ k a i m−1 [k]·g 2n−k, (8)

d i m [n] = ∑ k a i m−1 [k]·h 2n−k. (9)

In this study, a fourth level of decomposition was performed, i.e., m ∈ {1, 2, 3, 4}. g 2n−k and h 2n−k are, respectively, the low-pass and high-pass FIR filters with a subsampling factor of 2. The process is further illustrated in Figure 5.
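The four-level decomposition can be sketched as follows. For brevity this sketch uses the Haar (db1) filter pair rather than the higher-order Daubechies wavelet of the paper, and a synthetic 320-sample segment stands in for a denoised W i.

```python
# A sketch of the four-level sub-band decomposition: at each level m the
# current approximation passes through half-band low-pass (g) and
# high-pass (h) filters and is subsampled by 2. Haar (db1) filters are an
# assumption made here for brevity.
import math

def haar_dwt_4level(x):
    a, details = list(x), []
    for _ in range(4):                    # m = 1..4
        d = [(a[i] - a[i + 1]) / math.sqrt(2) for i in range(0, len(a) - 1, 2)]
        a = [(a[i] + a[i + 1]) / math.sqrt(2) for i in range(0, len(a) - 1, 2)]
        details.append(d)
    return a, details                     # a_4 and [d_1, d_2, d_3, d_4]

# a synthetic 320-sample segment (0.9 s at ~360 Hz, as in the classical case)
x = [math.sin(2 * math.pi * i / 320.0) for i in range(320)]
a4, ds = haar_dwt_4level(x)
print([len(d) for d in ds], len(a4))      # [160, 80, 40, 20] 20
```

Because the Haar pair is orthonormal, the total energy of a 4 plus the four detail bands equals that of the input, a useful sanity check on any decomposition code.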


According to [40], the sub-bands extracted by wavelet decomposition are functions of the incoming signal sampling frequency. In the proposed solution, each W i can be resampled at a specific frequency Frs i, resulting in an adaptive-rate decomposition for each W i and potentially achieving the sub-bands with a lesser computational cost. This happens because the system has to process fewer samples compared to fixed-rate decomposition concepts [4][5][6][12][13][14][15]. Furthermore, the adjustment of Frs i allows for a better focus on the incoming signal band of interest compared to the fixed-rate counterparts [4][5][6][12][13][14][15].

Features
The wavelet coefficients obtained for each intended sub-band (d i 1, d i 2, d i 3, d i 4, and a i 4) were used for extracting classifiable signal features. Nine statistical features were extracted for each considered sub-band. Therefore, in total, 45 features were employed to represent each selected segment.
The same features were extracted from each considered sub-band, as listed below.
Power Spectrum of the Signal (PS)
The power spectrum (PS) is computed as the average absolute value of the spectral means.

Mean Absolute Value of the Signal (MAV)
The mean absolute value (MAV) is calculated by adding all the absolute values of coefficients and then normalizing the sum.

Standard Deviation (STD)
Standard deviation (STD) is the measure of intended coefficients dispersion from the mean value.

Skewness of the Signal (SK)
Skewness (SK) is a measure of the asymmetry of the frequency distribution around its mean.

Kurtosis of the Signal (K)
Kurtosis (K) is a measure of the curvature of the considered coefficients.

Mean Ratio (R)
The mean ratio (R) is the ratio of the mean value of the detailed signal to the mean value of the approximate signal.

Peak Positive Value (PV)
The peak positive value (PV) is the maximum positive amplitude of the considered coefficients.

Peak Negative Value (NV)
The peak negative value (NV) is the maximum negative amplitude of the considered coefficients.

Second Peak Negative Value (NV2)
The second peak negative value (NV2) is the second maximum negative amplitude of the considered coefficients.
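The nine features listed above can be sketched as a single function over one sub-band. The precise formulations are paraphrased from the short descriptions in the text: PS is taken here as the mean absolute value of the DFT spectrum, and R divides the mean of the detail coefficients by the mean of the approximation coefficients; the paper's exact definitions may differ slightly.

```python
# A sketch of the nine per-sub-band statistical features described above.
import cmath

def subband_features(c, approx):
    n = len(c)
    mu = sum(c) / n
    sd = (sum((v - mu) ** 2 for v in c) / n) ** 0.5
    spec = [abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(c))) for k in range(n)]
    neg = sorted(v for v in c if v < 0) or [0.0, 0.0]
    return {
        "PS":  sum(spec) / n,                          # power spectrum
        "MAV": sum(abs(v) for v in c) / n,             # mean absolute value
        "STD": sd,                                     # standard deviation
        "SK":  sum((v - mu) ** 3 for v in c) / (n * sd ** 3),   # skewness
        "K":   sum((v - mu) ** 4 for v in c) / (n * sd ** 4),   # kurtosis
        "R":   mu / (sum(approx) / len(approx)),       # mean ratio detail/approx
        "PV":  max(c),                                 # peak positive value
        "NV":  neg[0],                                 # peak negative value
        "NV2": neg[1] if len(neg) > 1 else neg[0],     # 2nd peak negative value
    }

f = subband_features([1.0, -2.0, 3.0, -0.5], approx=[2.0, 2.0])
print(len(f))                                          # 9 features
```

Applying this to the five sub-bands of a segment yields the 45-value feature vector used for classification.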

Classification Methods
Here, once the relevant features were extracted, the data were represented in the form of a reduced data matrix, composed of 45 features, with 9 from each of the 5 selected sub-bands for each intended instance. Since the employed dataset had 5 ECG classes with 300 instances per class, the resulting data matrix had a size of 1500 × 45. To classify this data matrix, the following classification techniques were employed.
Several of the used classification techniques require parameter tuning. To do this, we used the standard validation method to tune the parameters during the training phase until an appropriate average result was obtained. These values were then fixed for the testing phase and are mentioned below.

k-Nearest Neighbors (k-NN)
The k-nearest neighbors (k-NN) algorithm [41,42] uses the k nearest neighbors of a test sample from the training dataset. Let v j represent a sample and <v j, l j> denote a tuple of a training sample and its label, l j ∈ [1, C], where C is the number of classes. Given a test sample, z, the mathematical process of computing the nearest neighbor, j, is presented by Equation (10):

argmin j dist(v j, z), ∀j = 1..N. (10)

In the designed solution, k = 3 and the Euclidean distance metric was used. The final label of z was selected as the most frequent label of the k chosen neighbors.
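The rule above (Euclidean distance, k = 3, majority vote) can be sketched directly; the two-dimensional toy training set is an illustrative assumption in place of the 45-dimensional feature vectors.

```python
# A sketch of the k-NN rule used above: Euclidean distance, k = 3,
# majority vote among the k nearest training tuples <v_j, l_j>.
import math
from collections import Counter

def knn_predict(train, z, k=3):
    nearest = sorted(train, key=lambda vl: math.dist(vl[0], z))[:k]
    return Counter(l for _, l in nearest).most_common(1)[0][0]

# toy 2-D training tuples standing in for 45-D feature vectors
train = [((0.0, 0.0), "N"), ((0.1, 0.2), "N"), ((0.2, 0.1), "N"),
         ((5.0, 5.0), "PVC"), ((5.1, 4.9), "PVC"), ((4.9, 5.2), "PVC")]
print(knn_predict(train, (0.3, 0.3)))              # "N"
print(knn_predict(train, (5.0, 4.8)))              # "PVC"
```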

Artificial Neural Network (ANN)
The artificial neural network (ANN) is a popular class of algorithms and is based on the perceived mechanics of the human brain [43]. A standard multi-layered perceptron (MLP) was used here. The input and output layers, respectively, corresponded to the 45 features and the 5 possible classes. Hidden layers are important for modeling complex data, but care must be taken to avoid over-fitting. The number of hidden layers was chosen to be equal to 5. The training function property 'traingdx' in MATLAB [44] was used. This set the learning to gradient descent with momentum and a variable learning rate. The activation function used was the radial basis, and the maximum number of epochs was set to 1000.

Support Vector Machines (SVMs)
The support vector machine (SVM) was developed by Cortes and Vapnik [45]. It searches for the optimal separating hyperplane between the support vectors of two classes. The separating hyperplane is described mathematically in Equation (11), where x = [x 1, x 2 . . . x p] is the sample vector with p attributes, w = [w 1, w 2 . . . w p] is the weights vector, and b is a scalar bias:

w·x + b = 0. (11)
For categorizing multiple classes, different approaches can be used [46,47], such as the one-vs-all approach or the one-vs-one approach. In this solution, the one-vs-all strategy was used with sequential minimal optimization (SMO) to train the classifier weights. The kernel function used was a polynomial of order 3, and the regularization parameter was set to 50.

Random Forest (RF)
The random forest (RF) was developed by using ideas put forward by Tin Kam Ho [48]. It is a technique that takes advantage of multiple classifiers. It constructs a multitude of decision trees at training time and uses the output from these trees to form a consensus. In contrast with bagging, RF may employ different decision tree techniques for the multiple subsets created. In the designed solution, the number of trees was set equal to 60, and out-of-bag predictions were retained for each tree. For the split at the nodes, the interaction-curvature method was selected, as it minimized the p-value of the chi-square tests of independence between each predictor and the response. The number of splits on each branch was limited to 10.

Bagging (BG)
Bagging (BG) is a bootstrap aggregation of classification trees. Multiple trees are allowed to fit the training data so that any bias, such as over-fitting, can be dealt with by using the ensemble of trees. Hence, bagging can be a powerful tool in classification. In this study, the number of bagged trees was set to 50, and ensemble predictions were used for out-of-bag observations.

Compression Ratio
The compression ratio quantifies the proposed system performance in terms of the reduction in the amount of information to be transmitted and classified compared to the conventional approach. In the classical case, the acquired ECG data points are transmitted to the cloud without performing any features selection [24]. If N r and P are, respectively, the count of data points to be classified in the conventional and the proposed approach, then the compression ratio, R COMP, can be calculated by using Equation (12):

R COMP = N r /P. (12)
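As a worked illustration of this ratio: the text quotes 320 uniform samples per 0.9 s segment in the classical case and a 45-value feature vector in the proposed case, and pairing those two figures is our own reading of how the reported seven-fold reduction arises.

```python
# A sketch of the compression ratio of Equation (12) as reconstructed
# above: conventional data-point count N_r over proposed count P.

def compression_ratio(n_r, p):
    return n_r / p

r = compression_ratio(320, 45)            # 320 samples vs. 45 features
print(round(r, 1))                        # ~7, matching the seven-fold claim
```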

Computational Complexity
The computational complexity of the embedded processing chain, up to the denoising module, was studied in detail. The complexity of the wavelet decomposition, features extraction, and cloud processing modules was analyzed at an abstract level by considering the reduction in the amount of information to be processed by these modules.
The resampled signal was denoised by using an enhanced adaptive-rate FIR (ARFIR) filtering concept [27]. The arithmetic complexity of a classical K-th order FIR filter is clear from Equation (5). It executes (K − 1) additions and K multiplications while calculating an output sample. Therefore, for N samples, the entire computational complexity, C FIR, can be calculated by using Equation (13):

C FIR = (K − 1)·N additions + K·N multiplications. (13)

For the suggested solution, the online filter selection and the selected segment resampling processes necessitated additional operations.

• Filter selection for W i was resolved by using the successive approximation algorithm. Therefore, resolving the value of Frs i for Q reference filters, in the worst case, requires log 2 (Q) comparisons [29].
• Resampling was realized by using the SLI. For W i, the complexity of the SLI was Nr i additions and Nr i binary-weighted right shifts. The complexity of a binary-weighted right shift was negligible compared to the addition and multiplication processes [49]. Therefore, it was ignored.
• The complexity of the K i order FIR filtering for Nr i samples could be calculated as: (K i − 1)·Nr i additions and K i ·Nr i multiplications.
Therefore, the overall complexity of the proposed ARFIR method for W i can be calculated by using Equation (14):

C ARFIR = log 2 (Q) comparisons + K i ·Nr i additions + K i ·Nr i multiplications. (14)
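The operation counts above can be compared numerically. The filter orders and sample counts below are illustrative assumptions only, chosen to show how a lower-order filter on fewer samples reduces the totals.

```python
# A hedged operation-count comparison between classical FIR filtering
# (Equation (13)) and the adaptive-rate steps itemized above.
import math

def classical_fir_ops(K, N):
    return {"add": (K - 1) * N, "mul": K * N}

def arfir_ops(K_i, Nr_i, Q):
    return {
        "cmp": int(math.log2(Q)),         # worst-case filter selection
        "add": Nr_i + (K_i - 1) * Nr_i,   # SLI additions + filtering additions
        "mul": K_i * Nr_i,                # filtering multiplications
    }

print(classical_fir_ops(K=40, N=320))     # fixed order, fixed rate
print(arfir_ops(K_i=20, Nr_i=120, Q=8))   # lower order on fewer samples
```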

Classification Precision
The proposed solution seems appealing in terms of hardware complexity, compression, and processing and transmission efficiencies. However, it can lag in terms of precision. Therefore, the performance of the whole system was studied in terms of its classification accuracy. To avoid any bias in estimating the classification performance due to the limited dataset, a cross-validation schema, as popularly used in the literature [47], was adopted. Therefore, 10-fold cross-validation was used in this study. All tested algorithms were provided with the same dataset, both training and test, for each fold. Similarly, to avoid any bias in the findings from any one measure, the following evaluation measures were used.

Accuracy (Acc)
Accuracy (Acc) is the percentage of labels that have been correctly classified. Let TP, TN, FP, and FN, respectively, denote the true positives, true negatives, false positives, and false negatives in the predicted labels. Then, the mathematical formulation for accuracy is given by Equation (15):

Acc = (TP + TN)/(TP + TN + FP + FN). (15)

The accuracy measure results in a value between 0 and 1, with a higher value signifying a better performance.
Normalized Mutual Information (NMI)
Normalized mutual information (NMI) is an information-theoretic score calculated as the mutual information between two distributions: the labels predicted by a classifier and the real labels of the data. The value of the NMI score ranges between 0 and 1, with a higher score signifying a better classification. It is mathematically computable by using Equation (16), where X is the predicted clustering labels via the algorithm, Y is the real labels from the data, k(X) denotes the number of clusters in X, r j X is the number of elements in cluster j according to X, r uj is the number of elements predicted in u but actually belonging to j, and n is the total number of elements.
F-Measure (F1)
The F-measure (F1) balances the values of precision and recall. We usually talk about a macro (without taking class sizes into account) or micro (taking class sizes into account) F-measure. However, as all classes had the same data size in the studied case, we simply employ the F-measure. Mathematically, the F-measure is expressed by Equation (17):

F1 = 2·Precision·Recall/(Precision + Recall). (17)

Kappa Index (Kappa)
The kappa index (kappa) is a widely used statistic to judge the agreement of two clustering results. It is usually considered more robust than simple accuracy because it takes the possibility of agreement by chance into account. The most popular version is Cohen's kappa measure, which is mathematically expressed by Equation (18), where p 0 is the percentage of agreement between the predicted and actual labels, similar to accuracy, and p e is the hypothetical probabilistic chance of such an agreement occurring randomly. It is given as p e = ((TP + FP)(TP + FN) + (FN + TN)(FP + TN))/(TP + TN + FP + FN)^2. A perfect classification gives kappa = 1, and if the classification is merely due to chance, kappa = 0.

Specificity (Sp)
Specificity (Sp) measures a test's ability to correctly generate a negative result for instances that do not belong to a given class. It is also known as the "true negative" rate. It is expressed by Equation (19):
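The per-class count-based measures above, Equations (15) and (17)-(19), can be sketched directly from the TP, TN, FP, and FN counts (function names are illustrative; p_e follows the standard Cohen's kappa marginals):

```python
def accuracy(tp, tn, fp, fn):
    # Equation (15): fraction of correctly classified labels
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    # Equation (17): harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cohen_kappa(tp, tn, fp, fn):
    # Equation (18): chance-corrected agreement
    n = tp + tn + fp + fn
    p0 = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (p0 - pe) / (1 - pe)

def specificity(tn, fp):
    # Equation (19): true negative rate
    return tn / (tn + fp)
```

For example, with TP = 40, TN = 45, FP = 5, and FN = 10, accuracy is 0.85, specificity is 0.90, and kappa is 0.70.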

Results
The performance of the suggested solution was studied for five arrhythmia classes [36]. All system modules were implemented and validated by using MATLAB® [44]. Examples of the pre-segmented heartbeats from all considered classes are shown in Figure 6. In the classical case, the signal is band-limited to 60 Hz, and it is acquired with a traditional ADC of 11-bit resolution at a sampling rate of 360 Hz [38]. The signal is divided into segments of 0.9 s, and each segment consists of 320 samples.
In the proposed case, to test the LCADC, the considered ECG signals were reconstructed by using U = 400. The reconstructed signals were acquired with an LCADC of M = 5-bit. The ECG signals were band-limited up to f max = 60 Hz. Therefore, the maximum LCADC sampling frequency was Fs max = 3.6 kHz. Examples of the pre-segmentation ECG signal, from the PVC class, acquired with an 11-bit classical ADC and a 5-bit LCADC, are shown in Figure 7.
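The level-crossing acquisition principle can be illustrated with a minimal offline sketch (hypothetical function name; an actual LCADC is an analog circuit). With an M-bit resolution, 2^M uniform levels span the input range, and a (time, level) pair is recorded each time the signal crosses a level, with the crossing instant linearly interpolated between the two surrounding uniform samples:

```python
def level_cross_sample(x, fs, n_bits=5, vmin=-1.0, vmax=1.0):
    """Offline sketch of level-crossing sampling.

    x: uniformly sampled input signal, fs: its sampling rate in Hz.
    Returns a list of (time, level) pairs, one per level crossing.
    """
    n_levels = 2 ** n_bits
    levels = [vmin + (vmax - vmin) * k / (n_levels - 1)
              for k in range(n_levels)]
    out = []
    for i in range(1, len(x)):
        lo, hi = min(x[i - 1], x[i]), max(x[i - 1], x[i])
        for lv in levels:
            if lo < lv <= hi:  # the segment crosses this level
                # linearly interpolated crossing instant, in seconds
                frac = (lv - x[i - 1]) / (x[i] - x[i - 1])
                out.append(((i - 1 + frac) / fs, lv))
    return out
```

A full ramp from vmin to vmax over one sample interval crosses every level except the first, so a 5-bit converter records 31 samples for it; for a real ECG signal, the sample count tracks the local signal activity rather than a fixed clock.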
The LCADC output was segmented by using the ASA. The ASA adapted its parameters according to the temporal variations of x(t), which contributed to the enhancement of the system's computational efficiency. The average compression ratios were computed for all 300 instances of each considered class. These were, respectively, 3.13-, 2.86-, 3.01-, 3.02-, and 3.05-fold for classes N, RBBB, LBBB, APC, and PVC. The attained overall average reduction in the number of collected samples for all five classes was 3.01-fold.
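The overall figure follows directly from the per-class ratios; a quick arithmetic check:

```python
# Per-class average compression ratios reported above
ratios = {"N": 3.13, "RBBB": 2.86, "LBBB": 3.01, "APC": 3.02, "PVC": 3.05}

# Overall average reduction across the five classes
overall = sum(ratios.values()) / len(ratios)
print(round(overall, 2))  # 3.01
```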
The resampled signal was denoised by using the ARFIR filtering technique [29]. A band-pass filter bank was designed for the cut-off frequencies [FCmin = 0.7; FCmax = 35] Hz. The filter bank was implemented for a set of sampling frequencies, Fref, ranging from 75 Hz (> 2·FCmax) to 360 Hz.
In this case, ∆ = 19 Hz was chosen. It realized a bank of Q = 16 band-pass filters. A summary of the designed filter bank is shown in Table 1.
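Under the stated parameters, the Q = 16 reference sampling frequencies of the filter bank can be enumerated as a quick sketch (variable names are illustrative):

```python
FC_MIN, FC_MAX = 0.7, 35.0        # band-pass cut-off frequencies (Hz)
FREF_MIN, FREF_MAX = 75, 360      # reference sampling frequency range (Hz)
DELTA = 19                        # spacing between reference frequencies (Hz)

assert FREF_MIN > 2 * FC_MAX      # lowest rate still satisfies Nyquist

# Reference sampling frequencies for the band-pass filter bank
fref = list(range(FREF_MIN, FREF_MAX + 1, DELTA))
Q = len(fref)
print(Q, fref[0], fref[-1])  # 16 75 360
```

With a spacing of 19 Hz, 75 Hz + 15 x 19 Hz = 360 Hz, so the range is covered by exactly 16 filters, matching Table 1.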
Figure 7. The pre-segmented ECG signal, acquired with an 11-bit resolution conventional ADC (a) and with a 5-bit resolution LCADC (b).

The designed acquisition and processing chain with RF and bagging resulted in superior classification performance compared to k-NN, ANN, and SVM. Since RF uses multiple classifiers, it is less likely to be biased; this was reflected in the evaluation, where RF attained the best NMI score and kappa statistics. On the other hand, k-NN can easily be biased by the chosen neighbors, particularly if there are outliers in the data, which was reflected in its low score across all indices. Overall, the highest accuracy score of 0.97 was attained by the designed solution.

For further analysis, we considered the detailed classification results for each intended class, as shown in Figure 8. The overall false positive and false negative counts obtained with RF were the lowest compared to bagging, k-NN, ANN, and SVM, thus confirming the outperformance of RF over the other considered algorithms. The most difficult class to discriminate was RBBB, which was confused with LBBB, while the easiest class to discriminate was the normal class.

Discussion
The appealing features of the suggested solution are clear from the results presented in Section 3. This solution was attained by intelligently exploiting level-crossing sampling, adaptive-rate processing, and robust classifiers. The values of Nr_i were adapted by the ASA. We showed how Fs_i, L_i, and Frs_i were adjusted as a function of the temporal variations of x(t). The adjustment of Frs_i avoided unnecessary interpolations during the resampling process [29]. This signal-driven, adaptive-rate sampling resulted in a three-fold decrease in the number of collected samples compared to classical counterparts and guaranteed a noticeable reduction in the processing load of the post denoising and features extraction.
The adjustment of hc_k for each window W_i avoided unnecessary operations while conditioning the selected segments, yielding a noteworthy computational gain of the proposed denoising method over the conventional counterparts. The average gains in additions and multiplications, for the considered 1500 instances, of the employed ARFIR over the conventional one were 7.81- and 7.99-fold, respectively. Additionally, the adaptation of Frs_i resulted in an adaptive-rate sub-band decomposition. Frs_i reflected the frequency content of x(t) [35]. Therefore, it improved the focus on the band of interest, as the decomposition process is a function of Frs_i [40]. The decomposition was attained by using half-band FIR filters. A three-fold decrease in the number of collected samples confirmed a computationally efficient sub-band decomposition compared to the fixed-rate counterparts [4,6,12,13,29,40].
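One level of the half-band decomposition can be sketched with the Haar analysis pair, the simplest half-band FIR filters (the actual wavelet filters used in the system are not restated here, so this is illustrative only): the low-pass output gives the approximation sub-band and the high-pass output gives the detail sub-band, each downsampled by two.

```python
import math

def haar_analysis(x):
    """One level of sub-band decomposition with Haar half-band filters.

    Returns (approximation, detail) coefficient lists, each half the
    input length; energy is preserved across the two sub-bands.
    """
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail
```

Applying the same step recursively to the approximation output yields the multi-level sub-band tree from which the statistical features are drawn.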
Originally, each intended instance was composed of 320 samples. After features extraction, the dimensionality was reduced to 45 features, resulting in a 7.1-fold dimensionality reduction and guaranteeing the same factor of reduction in the data transmission activity, bandwidth, and power consumption. Additionally, on the cloud side, processing a 7.1-fold smaller amount of information assured a similar gain in processing and resource utilization during the classification process.
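The feature extraction step can be sketched as follows; the particular statistics shown (mean, standard deviation, min, max, median per sub-band) are illustrative, since the paper's exact 45-feature set is not restated here:

```python
import statistics

def subband_features(subbands):
    """Extract simple statistical features from each sub-band.

    subbands: list of coefficient lists, one per wavelet sub-band.
    Returns a flat feature vector, five statistics per sub-band.
    """
    feats = []
    for band in subbands:
        feats.extend([
            statistics.mean(band),    # central tendency
            statistics.pstdev(band),  # spread
            min(band),                # range endpoints
            max(band),
            statistics.median(band),  # robust central tendency
        ])
    return feats
```

The key point is the dimensionality reduction: however many samples a sub-band holds, it contributes a fixed handful of features, which is what shrinks each 320-sample instance down to a short vector before transmission.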
Performing most of the signal processing tasks on the front-end processor and transmitting only the extracted features to the cloud was also beneficial in realizing an optimized front-end ECG wearable device while keeping the system configurable. Moreover, in the designed framework, the signal was digitized with an M = 5-bit resolution LCADC. In the classical case, the digitization is performed with an M = 11-bit resolution ADC [4][5][6][12][13][14][15]. Our study confirmed a simpler circuit-level realization compared to the traditional ones. A low-resolution ADC can also be used in the classical case, but it reduces the system SNR [38] and can therefore degrade classification performance. On the other hand, the effective resolution of an LCADC is fairly independent of M [38]. Therefore, an accuracy of 97% was achieved by the proposed solution, even when using a 5-bit resolution LCADC. The idea of embedding level-crossing sampling and adaptive-rate processing in ECG signal-based automatic arrhythmia diagnosis is a novel concept. In [16,24,30,31,32], new approaches were proposed for effective ECG acquisition. These mainly focused on the analog-to-digital conversion of ECG signals. In [30,31], the authors demonstrated how the use of LCS-based architectures led to a simplified and power-efficient analog-to-digital (A/D) conversion. In [32], the authors showed how the proposed ADC could adjust its effective resolution as a function of the activity of the input signal. In [16,24], efficient and real-time QRS detection mechanisms were proposed.
In contrast to [16,24,30,31,32], the proposed solution not only focused on the design of an effective LCADC for ECG signal acquisition or QRS complex detection; it additionally applied the ASA and the adaptive-rate resampling, denoising, and sub-band decomposition approaches to the LCADC output. This was performed to achieve efficient segmentation, noise reduction, and features extraction of the digitized ECG pulses, preparing them well for the subsequent cloud-based automatic arrhythmia classification.
The proposed technique is original, and comparing it with existing state-of-the-art methods was not straightforward, since they are based on classical sampling and processing. Additionally, each study uses a different number of subjects and different types of arrhythmia classes. It is also difficult to compare the diversity of classification methods and techniques for ECG signal processing. However, a comparison was performed with previous studies that use the same ECG dataset and DWT-based features extraction. The highest classification accuracies for all considered studies are presented in Table 6, which shows that the proposed solution attained an analogous or better classification accuracy compared to the fixed-rate counterparts [4][5][6][12][13][14][15], while assuring noticeable processing, transmission, and hardware simplicity gains. The main advantage of the proposed solution over the previous ones is the elimination of unnecessary samples, which introduces a real-time compression gain into the system. This was achieved by tactfully embedding the level-crossing sampling and ASA into the system. Similar gains could be achieved by embedding these concepts in the counterparts [4][5][6][12][13][14][15].

Conclusions
An original level-crossing ECG signal sampling, adaptive-rate denoising, sub-band decomposition, and features extraction approach was proposed. It was shown that the proposed framework achieved a three-fold reduction in the number of acquired samples, thus leading to a significant reduction in the computational complexity of the designed system compared to conventional counterparts. The overall average gains in additions and multiplications for the designed adaptive-rate denoising module were computed as 7.81- and 7.99-fold, respectively. The adaptive-rate processing promised a similar factor of processing gain during the adaptive-rate sub-band decomposition. The proposed features extraction concept reduced the incoming data dimensionality by a factor of 7.1 and assured a remarkable gain in terms of the power consumption and transmission activity between the ECG wearable device and the cloud-based classification. Moreover, a similar magnitude of processing gain was also evident in the cloud-based classifier, since it had to deal with 7.1 times less information. The highest classification accuracy of 97% was attained by the random forest, which is comparable to, and in some cases better than, existing state-of-the-art solutions. Combined with the gains in compression, processing, and transmission bandwidth, these results show the potential of the suggested solution for the design and development of low-power and efficient ECG wearable devices in the mobile healthcare framework.