An Efficient Method to Learn Overcomplete Multi-Scale Dictionaries of ECG Signals

The electrocardiogram (ECG) was the first biomedical signal to which digital signal processing techniques were extensively applied. By its own nature, the ECG is typically a sparse signal, composed of regular activations (the QRS complexes and other waveforms, like the P and T waves) and periods of inactivity (corresponding to isoelectric intervals, like the PQ or ST segments), plus noise and interferences. In this work, we describe an efficient method to construct an overcomplete and multi-scale dictionary for sparse ECG representation using waveforms recorded from real-world patients. Unlike most existing methods (which require multiple alternating iterations of the dictionary learning and sparse representation stages), the proposed approach learns the dictionary first, and then applies an efficient sparse inference algorithm to model the signal using the learnt dictionary. As a result, our method is much more efficient from a computational point of view than other existing methods, thus becoming amenable to dealing with long recordings from multiple patients. Regarding the dictionary construction, we first locate all the QRS complexes in the training database, then we compute a single average waveform per patient, and finally we select the most representative waveforms (using a correlation-based approach) as the basic atoms that are resampled to construct the multi-scale dictionary. Simulations on real-world records from Physionet's PTB database show the good performance of the proposed approach.


Introduction
Since the development of the first practical apparatus for the recording of the electrocardiogram (ECG) by Willem Einthoven in 1903, ECGs have been widely used by physicians to diagnose and monitor many cardiac disorders. Indeed, the use of the ECG has become so widespread that it is nowadays routinely used both in clinical and ambulatory settings to obtain a series of indicators related to the health status of patients [1]. This ubiquitous presence of ECGs in the medical field has been greatly enabled by digital signal processing (DSP) techniques: ECGs were the first biomedical signals where DSP algorithms were extensively applied to remove noise and interferences, detect and characterize the different waveforms contained in the ECG, extract the signals of interest (e.g., the fetal ECG) from the composite ECG, etc. [1,2].
By its own nature, the ECG is typically a sparse signal, composed of regular activations (the QRS complexes and other waveforms, like the P and T waves) and periods of inactivity (corresponding to isoelectric intervals, like the PQ or ST segments), plus noise and interferences (baseline wander, AC interference, electromyographic noise, motion artifacts, etc.) [1]. Since the introduction of the LASSO (Least Absolute Shrinkage and Selection Operator) regularizer by Tibshirani in 1996 [3], many sparse inference and representation techniques have been developed and successfully applied to all kinds of signals [4]: images, sound/audio recordings, biomedical waveforms, etc. However, in order to obtain a good sparse model for a given signal, it is essential to have an adequate dictionary composed of atoms that properly represent the significant waveforms contained in the observed signals. This has led to the development of many families of dictionaries (based on wavelets and wavelet packets, curvelets, contourlets, etc.) for different applications, as well as several on-line dictionary learning algorithms (e.g., see [5][6][7] for reviews of different dictionary learning methods for sparse inference) that typically require multiple alternating iterations of the dictionary learning and sparse representation stages.
In electrocardiographic signal processing, many approaches have been devised for the sparse representation of single-channel and multi-channel ECGs using different types of simple analytical waveforms: Gaussians [8][9][10], generalized Gaussians and Gabor dictionaries [11], several families of wavelets (like the Mexican hat or the coiflet4) [12,13], etc. Although these approaches can lead to good practical results, they usually result in many spurious activations that must be removed in order to obtain physiologically interpretable signals, for instance by means of a post-processing stage [9,13] or through the minimization of a complex non-convex cost function [14]. However, a customized dictionary, built from real-world signals, will provide a better performance in terms of the reconstruction error obtained for a given level of sparsity. Consequently, several on-line dictionary learning approaches have also been applied, both in the context of sparse inference and compressed sensing (CS), to ECG signals: the K-SVD algorithm in [15], the shift-invariant K-SVD in [16], and the method of optimal directions in [17]. Unfortunately, all these methods have a high computational cost (due to their need to iterate between the dictionary learning and sparse approximation stages) and lead to dictionaries whose atoms do not correspond to real-world signals (thus reducing the interpretability of the sparse model, as well as the ability to easily locate the relevant waveforms). Alternatively, an off-line dictionary construction methodology (where a dictionary with real-world waveforms is initially built and then directly used for CS and sparse modeling without any further modification) has been recently proposed by Fira et al. [12,[18][19][20]. However, the atoms of the dictionary are either selected randomly from segments of the signal or taken directly from the first half of the ECG, without any attempt to determine the most relevant waveforms.
In this work, we describe an efficient method to construct an overcomplete and multi-scale dictionary for sparse ECG representation using waveforms recorded from real-world patients. Unlike on-line dictionary methods (which require multiple alternating iterations of the dictionary learning and sparse representation stages), the proposed approach learns the dictionary first, and then applies an efficient sparse inference algorithm (CoSA [21]) to model the signal using the learnt dictionary. As a result, our method is much more efficient from a computational point of view than other existing methods, thus becoming amenable to dealing with long recordings from multiple patients. Regarding the dictionary construction, we first locate all the QRS complexes in the training database, then we compute a single average waveform per patient, and finally we select the most representative waveforms (using a correlation-based approach) as the basic atoms that will be resampled to construct the multi-scale dictionary. With respect to the approach of Fira et al., our method selects the optimal atoms to construct the dictionary, thus resulting in a much more compact solution. Numerical simulations demonstrate that the proposed approach is able to obtain a very sparse representation without missing any QRS complex or introducing spurious activations. 1 The rest of the paper is organized as follows. Section 2 formulates the sparse representation problem of ECGs, emphasizing the importance of an appropriate dictionary. Then, in Section 3 we describe in detail the procedure followed to derive a multi-scale dictionary from real-world signals: the database used, the pre-processing steps, and the actual dictionary construction. Finally, Section 4 validates the proposed approach (focusing on the capability of the derived dictionary to model different ECG leads from multiple patients), and the paper is closed by the conclusions and future research lines in Section 5.

1
A preliminary version of this paper was published in [22]. With respect to that version, a much more detailed literature review has been performed, the pre-processing has been improved, multiple waveforms have been used to construct the dictionary, and many more numerical simulations have been performed.

Problem Formulation
Let us assume that we have a single discrete-time ECG, x[n], that has been obtained from a properly filtered and amplified continuous-time ECG, x_c(t), through uniform sampling with a sampling period T_s = 1/f_s, i.e., x[n] = x_c(nT_s). An external ECG captures the electrical activity occurring within the heart that triggers the mechanical cycle (systole and diastole) of the heart. Consequently, it is composed of a set of waveforms that reflect the different stages of the electrical cycle of the heart [1]: atrial depolarization (P waveform), ventricular depolarization (QRS complex) and ventricular repolarization (T waveform). 2 All of these waveforms repeat themselves regularly during the heart's electrical cycle (thus leading to the well-known P-QRS-T cycle), although important changes in morphology, as well as fluctuations in amplitude, duration and interarrival times, can be observed both for intra-patient and inter-patient recordings. Figure 1 shows an example of a single cycle from a clean synthetic ECG generated using the ECGSYN waveform generator [23] downloaded from Physionet [24], where all the relevant P-QRS-T waveforms, as well as the QRS onset and offset points [25], can be clearly identified.

Figure 1. Example of a clean P-QRS-T cycle, with the peaks of all the relevant waveforms marked. The signal has been generated in Matlab using the ECGSYN waveform generator [23] with the following command: [s, ipeaks] = ecgsyn(1000,60,0,60,1,0.5,1000);

On top of the relevant electrical activity, the ECG also contains several types of noise and interference signals [1]: additive white Gaussian noise introduced by the electronic equipment used to acquire the ECGs (sensors, amplifiers and filters), baseline wander caused by the patient's respiration, powerline interference, electromyographic noise, motion artifacts, etc.
Mathematically, this situation can be modelled as the superposition of the waveforms of interest (QRS complexes, P and T waveforms) plus all the noise and interferences:

x[n] = Σ_k E_k Φ_k[n − T_k] + ε[n], (1)

where T_k denotes the arrival time of the k-th electrical pulse; E_k its amplitude; Φ_k the associated, unknown pulse shape corresponding to QRS complexes, P and T waveforms; and ε[n] the noise and interference term. Note that, in real-world applications, neither Φ_k nor T_k and E_k are known. However, for the ECG the typical shapes and durations of the Φ_k are known for the different relevant waveforms. Therefore, they can be approximated by a time-shifted, multi-scale dictionary of known waveforms with finite support M ≪ N that can then be used to infer the E_k and T_k. More precisely, let us define a set of P candidate waveforms, Γ_p for p = 1, ..., P, with a finite support of M_p samples such that M_1 < M_2 < ··· < M_P and M = max_{p=1,...,P} M_p = M_P. If properly chosen, these waveforms can provide a good approximation of the local behavior of the signal around each sampling point, thus allowing us to approximate (1) through the following model:

x[n] = Σ_{k=0}^{N−M−1} Σ_{p=1}^{P} β_{k,p} Γ_p[n − k] + ε[n], (2)

where the β_{k,p} indicate the amplitude of the p-th waveform shifted to the k-th time instant, t_k = kT_s; and ε[n] now also includes the approximation error from (2) to (1). Let us now group all the candidate waveforms into a single matrix A = [A_0, A_1, ..., A_{N−M−1}], where the N × P matrices A_k (for k = 0, ..., N − M − 1) have column entries equal to Γ_p[m − k] for m = k, ..., k + M − 1 and 0 otherwise. Then, the model of (2) can be expressed more compactly in matrix form as follows:

x = Aβ + ε, (3)

where x = [x[0], ..., x[N−1]]ᵀ is an N × 1 vector with all the ECG samples, β = [β_{0,1}, ..., β_{0,P}, ..., β_{N−M−1,1}, ..., β_{N−M−1,P}]ᵀ is the (N − M)P × 1 coefficient vector, and ε = [ε[0], ..., ε[N−1]]ᵀ is the N × 1 noise vector. Note that the matrix A can be considered as a global dictionary that contains N − M replicas, A_k for k = 0, 1, ..., N − M − 1, of the candidate waveforms time-shifted to t_0 = 0, t_1 = T_s, ..., t_{N−M−1} = (N − M − 1)T_s. In practice, the usual approach is to use either a single waveform or several different waveforms (atoms) with different time scales in order to cope with the uncertainty about the shape and duration of the pulses that can be found in x[n]. Hence, as a result we obtain an overcomplete dictionary (as the number of columns is larger than the number of rows, i.e., (N − M)P > N) composed of time-shifted, multi-scale waveforms which resemble the relevant electrical impulses that can be observed in the recorded ECGs. Now, two questions arise:

1. Which is the optimal dictionary to model external ECG signals and how can we construct it?
2. Given an overcomplete dictionary, how can we obtain the optimal set of coefficients that represent only the relevant signal components?

2 Atrial repolarization cannot be observed in external ECGs, since it is masked by the ventricular depolarization, which happens simultaneously and produces a much stronger signal.
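The structure of the global dictionary A in (3) can be illustrated with a small sketch. This is a dense NumPy/SciPy construction for a toy signal, purely for illustration: at realistic sizes the dictionary is never formed explicitly, and convolutional algorithms operate with the atoms directly.

```python
import numpy as np
from scipy import sparse

def build_shifted_dictionary(atoms, N):
    """Stack time-shifted copies of each atom (zero-padded to the maximum
    support M) into a sparse N x (N - M) * P global dictionary A."""
    M = max(len(g) for g in atoms)
    cols = []
    for k in range(N - M):                 # shift index k -> t_k = k * Ts
        for g in atoms:                    # one column per atom per shift
            col = np.zeros(N)
            col[k:k + len(g)] = g
            cols.append(col)
    return sparse.csc_matrix(np.column_stack(cols))

# toy example: two short atoms (P = 2), a 12-sample "signal" (N = 12)
atoms = [np.array([1.0, 2.0, 1.0]), np.array([0.5, 1.0, 1.0, 0.5])]
A = build_shifted_dictionary(atoms, N=12)
print(A.shape)   # (12, (12 - 4) * 2) = (12, 16): overcomplete, 16 > 12
```

Even in this toy case the dictionary is overcomplete, since (N − M)P = 16 columns exceed the N = 12 rows.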
Regarding the second question, let us remark that the only unknown term in (3) is β when the dictionary is fixed. A classical solution to obtain this set of coefficients β is then minimizing the L2 norm of the error between the model and the observed signal, thus obtaining the least squares (LS) solution:

β̂_LS = arg min_β ‖x − Aβ‖₂². (4)

The solution of (4) is not unique, as it requires solving an underdetermined system of linear equations, but the standard solution (i.e., the solution that minimizes the L2 norm of the obtained coefficients) is β̂_LS = A†x, where A† = (AᵀA)⁻¹Aᵀ denotes the pseudoinverse of A. 3 However, the LS approach leads to a solution where all the coefficients in β̂_LS are likely to be non-zero. This solution does not take into account the sparse nature of the relevant waveforms in x[n] and results in overfitting, as part of the noise and interference terms are also implicitly modeled by the first term in (3). Hence, a better alternative in this case is explicitly enforcing sparsity in β by applying the so-called LASSO approximation [3], which minimizes a cost function composed of the L2 norm of the reconstruction error and the L1 norm of the coefficient vector:

β̂ = arg min_β ‖x − Aβ‖₂² + λ‖β‖₁, (5)

where λ is a parameter defining the trade-off between the sparsity of β and the precision of the estimation.
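A minimal sketch of solving a problem like (5) is the classical ISTA (iterative soft-thresholding) iteration, one of many possible LASSO solvers (the paper itself uses a convolutional algorithm, not ISTA, and the random dictionary below is purely illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, x, lam, n_iter=2000):
    """Minimize ||x - A b||_2^2 + lam * ||b||_1 with the ISTA iteration."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant (spectral norm^2)
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        b = soft_threshold(b + A.T @ (x - A @ b) / L, lam / (2.0 * L))
    return b

# toy compressed-sensing example: a 2-sparse coefficient vector observed
# through a random 40 x 100 dictionary (sizes chosen only for illustration)
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b_true = np.zeros(100)
b_true[5], b_true[37] = 3.0, -2.0
x = A @ b_true
b_hat = ista_lasso(A, x, lam=0.5)
print(np.flatnonzero(np.abs(b_hat) > 0.5))   # indices of the large coefficients
```

Unlike the LS solution, the recovered β̂ is sparse: only the coefficients corresponding to the truly active atoms remain large.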
Regarding the first question, it is obvious that a good dictionary, tailored to the shapes of the relevant waveforms in the ECG, will lead to a sparser representation and thus a better temporal localization of those waveforms. In the particular case of ECG modeling, many families of waveforms have been proposed within the related fields of sparse inference and compressed sensing, as discussed in the Introduction. However, there is increasing evidence that the best dictionaries are those constructed using atoms directly extracted from the signals to be modeled [15,17,20]. In the following, we describe a novel approach to construct a single overcomplete and multi-scale dictionary by learning the most representative waveforms from multiple patients. The goal of this paper is then to investigate whether the resulting dictionary is able to model the outputs from multiple patients with a single set of representative waveforms.

3 Note that, even though β̂_LS can be computed analytically from a theoretical point of view, it requires inverting an (N − M)P × (N − M)P matrix. Hence, we can easily encounter computational or numerical problems when (N − M)P is large or A is ill-conditioned.

Multi-Scale Dictionary Derivation
In this section, we describe the novel approach for off-line construction of a single mega-dictionary from QRS complexes extracted from multiple ECGs recorded from healthy patients. The database used to construct the dictionary is described first in Section 3.1. The method is described next: it is composed of the pre-processing stage described in Section 3.2 and the dictionary creation stage detailed in Section 3.3. Finally, the obtained dictionary is stored and applied to attain a sparse reconstruction of the desired ECGs (which may be in the database or not) using the LASSO approach, as described in Section 4.

Database
In order to construct the dictionary, we use the Physikalisch-Technische Bundesanstalt (PTB) database, compiled by the National Metrology Institute of Germany for research, algorithmic benchmarking and teaching purposes [26]. The ECGs were collected from healthy volunteers and patients with different heart diseases by Prof. Michael Oeff, at the Dep. of Cardiology of Univ. Clinic Benjamin Franklin in Berlin (Germany), and can be freely downloaded from Physionet [24]. 4 The database contains 549 records from 290 subjects (aged 17 to 87 years) composed of 15 simultaneously measured signals: the 12 standard leads plus the 3 Frank lead ECGs [1,2]. Each signal lasts approximately 2 minutes and is digitized using a sampling frequency f_s = 1000 Hz with a 16-bit resolution. Out of the 268 subjects for which the clinical summary is available, we selected channel 10 (lead V4) of the first recording of the Q = 51 healthy patients available in order to build the dictionary.

Pre-Processing
The block diagram of the pre-processing stage is shown in Figure 2. 5 Firstly, all the QRS complexes are extracted separately from each of the Q available ECGs, and those patients for which a significant number of QRS complexes cannot be reliably obtained are removed from subsequent stages. After resampling to the maximum length of all the QRS complexes found for each of the remaining Q′ ≤ Q patients, an individual average QRS complex is obtained per patient. Then, a second resampling stage is applied to the average QRS complexes of all the Q′ valid patients to ensure that they have the same length, followed by a windowing stage to obtain initial and final samples equal to zero, and a normalization to remove the mean and enforce unit energy on all the signals. Finally, these Q′ waveforms are stored in a QRS complexes database and used to construct the desired overcomplete dictionary. In the sequel, we provide a detailed description of each of the blocks in Figure 2.

QRS Extraction
The first pre-processing step consists in extracting all the QRS complexes from each of the Q ECGs, x_q[n] for q = 1, ..., Q. In order to attain this goal, we follow the approach described in [27]:

1. Apply a 4th order Butterworth bandpass filter with cut-off frequencies f_c1 = 1 Hz and f_c2 = 40 Hz to remove noise and interferences. Forward-backward filtering, with an appropriate choice of the initial state to remove transients [28], is used to avoid phase distortion.
2. Locate the positions of the R waveforms using the well-known Pan-Tompkins QRS detector [29].
3. Determine the fiducial points that mark the beginning and end of the QRS complexes by tracking backwards and forwards from the R peaks, estimating the QRS onset and offset points using the minimum radius of curvature technique, as described in Section 4.2 of [27].

Let us remark that we focus here on the QRS complexes because they are the most relevant waveforms that can be found in the ECGs. However, the proposed approach can also be applied to construct dictionaries of typical P and T waveforms.
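The filtering and detection steps above can be sketched as follows. Note that the R-peak detector below is a heavily simplified Pan-Tompkins-style stand-in (band-pass, differentiate, square, integrate, threshold), not the full algorithm of [29], and the synthetic test signal is purely illustrative:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_r_peaks(ecg, fs=1000.0):
    """Simplified Pan-Tompkins-style R-peak detector (a sketch, not the full
    algorithm): 1-40 Hz band-pass, differentiate, square, moving-window
    integration, fixed threshold and a 200 ms refractory window."""
    b, a = butter(4, [1.0 / (fs / 2), 40.0 / (fs / 2)], btype="bandpass")
    x = filtfilt(b, a, ecg)                    # zero-phase band-pass filtering
    d = np.diff(x)                             # emphasize steep QRS slopes
    win = int(0.15 * fs)                       # 150 ms integration window
    e = np.convolve(d ** 2, np.ones(win) / win, mode="same")
    thr = 0.3 * e.max()
    peaks, i = [], 0
    while i < len(e):
        if e[i] > thr:
            stop = min(i + int(0.2 * fs), len(e))   # 200 ms search window
            peaks.append(i + int(np.argmax(np.abs(x[i:stop]))))
            i = stop
        else:
            i += 1
    return np.array(peaks)

# purely synthetic test: 20 ms bumps once per second on a noisy baseline
fs = 1000
rng = np.random.default_rng(1)
ecg = 0.05 * rng.standard_normal(5 * fs)
for c in range(500, 5 * fs, fs):
    ecg[c - 10:c + 10] += np.hanning(20)       # crude stand-in for a QRS
peaks = detect_r_peaks(ecg, fs)
print(peaks)
```

On real ECGs the adaptive thresholds and search-back logic of the full Pan-Tompkins algorithm are needed; this sketch only illustrates the processing chain.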
This approach results in Q′ = 44 valid subjects (i.e., ≈ 86.3% of the Q = 51 available individuals), for which a significant and variable number of QRS complexes, y_{q,i}[n] for 1 ≤ q ≤ Q′, 1 ≤ i ≤ P_q and 0 ≤ n ≤ L_{q,i} − 1, each with a variable length of L_{q,i} samples, are extracted from the database used. A total of 6266 QRS complexes are extracted, implying an average of 142.4 QRS complexes per patient (with a maximum of 194 QRS complexes obtained from a single subject) and lengths ranging from 90 to 124 samples (i.e., from 90 to 124 ms).

Resampling and Averaging
In order to compute an average QRS complex for each individual, we need to work with QRS complexes that have a fixed length. The easiest solution to achieve this goal is resampling the extracted QRS complexes to the maximum length for each patient, L_q^max = max_{1≤i≤P_q} L_{q,i} ≤ L_max = max_{1≤q≤Q′} L_q^max = 124 samples. A change in the sampling rate of discrete-time signals can be accomplished by means of interpolation and decimation [30]. If the i-th QRS complex (i = 1, ..., P_q) has a length of L_{q,i} ≤ L_q^max samples and M_{q,i} = GCD(L_{q,i}, L_q^max), with GCD denoting the greatest common divisor, then we first need to interpolate by a factor L_q^max/M_{q,i} and then to decimate by a factor L_{q,i}/M_{q,i} (i.e., the fractional resampling rate is L_q^max/L_{q,i} ≥ 1). The aforementioned approach includes a digital lowpass antialiasing filter between the interpolator and the decimator, with a cut-off frequency ω_c^{q,i} = πM_{q,i}/L_q^max rad, and assumes that the sequence to process starts and ends with runs of zeroes. Although the recorded ECGs have been initially bandpass filtered to remove baseline wander and other artifacts, the QRS complexes cannot be assumed to start and end with the required zero samples. In fact, the available QRS complexes (from the QRS onset to the QRS offset) always contain negative starting and ending values: from -0.0219 to -0.2349 mV for the initial sample, and from -0.0318 to -0.1796 mV for the final sample. As the starting and ending samples of all the QRS complexes are not equal to zero, resampling produces an undesired edge effect, which consists of deviations from the expected starting and ending values of the resampled signals. In order to remove the edge effect, we propose the following simple and effective approach:

1. From the i-th QRS complex, y_{q,i}[n] for n = 0, 1, ..., L_{q,i} − 1, we first construct two auxiliary sequences, y_ℓ^{(q,i)}[n] and y_r^{(q,i)}[n], obtained by extending y_{q,i}[n] so that they are not likely to be affected by the edge effect on their leftmost and rightmost samples, respectively.
2. We perform the resampling by the factor L_q^max/L_{q,i} separately on y_ℓ^{(q,i)}[n] and y_r^{(q,i)}[n], obtaining two resampled sequences ỹ_ℓ^{(q,i)}[n] and ỹ_r^{(q,i)}[n].
3. The desired resampled sequence, ỹ_{q,i}[n], is finally obtained by taking its leftmost samples from ỹ_ℓ^{(q,i)}[n] and its rightmost samples from ỹ_r^{(q,i)}[n].

Figure 3 shows the leftmost and rightmost samples corresponding to one of the resampled QRS complexes (from patient 214 in the PTB database) when resampling is performed directly on y_{q,i}[n]. The edge effect on the left and right parts of the signal (i.e., the deviation of the red line w.r.t. the desired values indicated by the black dots) is evident in this case. On the other hand, when the proposed approach is applied, the resampled sequence ỹ_{q,i}[n] is not affected by the edge effect on either its left or its right side, as also seen in Figure 3. Finally, the averaged QRS complex for each patient is obtained simply by computing the sample mean at each time instant:

ȳ_q[n] = (1/P_q) Σ_{i=1}^{P_q} ỹ_{q,i}[n],

with P_q denoting the number of QRS complexes found in the q-th ECG.

Peer-reviewed version available at Appl. Sci. 2018, 8, 2569; doi:10.3390/app8122569.
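The fractional resampling with edge-effect mitigation can be sketched as follows. Here the two auxiliary sequences are realized by a symmetric (mirror) extension of the QRS complex, which is one plausible implementation of the idea above, not necessarily the exact construction used in the paper:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def resample_qrs(y, L_target):
    """Fractionally resample a QRS complex to L_target samples. The edge
    effect is mitigated by mirror-extending the complex at both ends before
    resampling, keeping only the central part afterwards (a sketch of the
    left/right auxiliary-sequence idea described in the text)."""
    L = len(y)
    g = gcd(L_target, L)
    up, down = L_target // g, L // g             # fractional rate L_target / L
    ext = np.concatenate([y[::-1], y, y[::-1]])  # mirror-padded sequence
    z = resample_poly(ext, up, down)             # polyphase interp./decimation
    start = (len(z) - L_target) // 2             # crop the central copy
    return z[start:start + L_target]

# toy "QRS" with non-zero endpoints (y[0] = y[-1] = -0.1), 93 -> 124 samples
y = np.hanning(93) - 0.1
z = resample_qrs(y, 124)
print(len(z))
```

Because the mirrored extension is smooth at both junctions, the endpoints of the cropped output stay close to the original endpoint values instead of being dragged towards zero by the antialiasing filter.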

Windowing and Normalization
After averaging, all the averaged QRS complexes are resampled again in order to obtain Q′ signals with the same number of samples, L_max = L_q^max for q = 1, ..., Q′, using a resampling factor L_max/L_q^max. Then, the resulting signals are windowed to ensure a smooth decay of the QRS complexes towards zero, and normalized by removing their mean and dividing by their standard deviation. The signals that are finally stored in the QRS complex database are

z̄_q[n] = w[n] (z̃_q[n] − µ_q)/σ_q,

where z̃_q[n] are the averaged QRS complexes after the second resampling stage (i.e., after ensuring that their sample length is equal to L_max), µ_q and σ_q are their sample mean and standard deviation, respectively, and w[n] is the window used. In this case, we use a window that follows the spectral shape of the raised cosine filter, widely used in digital communications [31], in the time domain:

w_c(t) = 1, for |t| ≤ (1 − α)T_0,
w_c(t) = (1/2)[1 + cos(π(|t| − (1 − α)T_0)/(2αT_0))], for (1 − α)T_0 < |t| ≤ (1 + α)T_0,
w_c(t) = 0, otherwise,

with w[n] = w_c(nT_s), T_0 = L_max T_s/2, and α denoting the roll-off factor that controls the decay of w_c(t) towards zero: for α = 0 we have a rectangular window that abruptly goes to zero at ±T_0, whereas for α = 1 the window is bell-shaped and starts decaying smoothly towards zero immediately after |t| > 0. This window, whose time-domain shape is shown in Figure 4 for several values of α, ensures that the central samples of the QRS complexes remain undistorted, while their amplitudes decay slowly towards zero at the borders. Throughout the paper, we used α = 0.25.
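The window can be sketched as follows, assuming (for illustration only) that T_0 is normalized so the window just reaches zero at the first and last samples:

```python
import numpy as np

def raised_cosine_window(L, alpha=0.25):
    """Time-domain raised-cosine window: flat up to (1 - alpha) * T0, cosine
    roll-off until (1 + alpha) * T0, zero beyond. T0 is chosen here so the
    window reaches exactly zero at the window edges (an assumption for this
    sketch, not necessarily the paper's exact normalization)."""
    t = np.linspace(-1.0, 1.0, L)        # |t| in units of (1 + alpha) * T0
    T0 = 1.0 / (1.0 + alpha)
    w = np.zeros(L)
    flat = np.abs(t) <= (1 - alpha) * T0
    roll = (~flat) & (np.abs(t) <= (1 + alpha) * T0)
    w[flat] = 1.0                        # central samples left undistorted
    w[roll] = 0.5 * (1.0 + np.cos(np.pi * (np.abs(t[roll]) - (1 - alpha) * T0)
                                  / (2.0 * alpha * T0)))
    return w

w = raised_cosine_window(124, alpha=0.25)
print(w[0], w[62], w[-1])
```

With α = 0.25 the central ~60% of samples are untouched, while the borders decay smoothly to zero, as described above.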

Dictionary Construction
After the pre-processing described in the previous section, we have Q′ ≤ Q waveforms from different patients stored in the QRS complexes database. These waveforms (z̄_q[n] for 1 ≤ q ≤ Q′ and 0 ≤ n ≤ L_max − 1) could be directly used to build the sub-dictionaries. However, they are highly correlated, and thus the resulting dictionary would provide a poor performance and lead to a high computation time. Therefore, in order to obtain a reduced dictionary composed of distinct shapes, we perform the procedure described in the following sections.

Selection of the First Atom
For the first atom of the dictionary, we seek the most representative waveform in the QRS complexes database. Indeed, if we intend to construct a sparse model for all the ECGs of all the patients using a single basis signal (atom), then we should choose a waveform which resembles as closely as possible the set of QRS complexes (the most relevant part of the ECGs) in the different patients. In order to achieve this goal, using the average signals stored in the QRS complexes database, we follow two steps:

1. Compute the correlation coefficient between every pair of candidate waveforms, 6

ρ_ij = C_ij/(√C_ii √C_jj) = C_ij,

where C_ij denotes the cross-covariance between the i-th and j-th waveforms at lag 0 (i.e., without any time shift), and the last equality (ρ_ij = C_ij) is due to the energy normalization described in Section 3.2.3, which implies that C_ii = 1 ∀i.

2. Select the waveform with the highest average correlation (in absolute value) with respect to all the other candidate waveforms, i.e.,

ℓ_0 = arg max_{1≤i≤Q′} (1/(Q′ − 1)) Σ_{j≠i} |ρ_ij|,

which corresponds to the most representative waveform of all the candidate waveforms.
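The two steps above can be sketched directly in terms of the normalized waveform matrix (the random stand-ins below are illustrative only; in the paper the rows would be the Q′ averaged QRS complexes):

```python
import numpy as np

def select_first_atom(Z):
    """Z: (Q', L) array of averaged QRS complexes, zero-mean and unit-energy,
    so the lag-0 correlation matrix is simply Z @ Z.T with unit diagonal.
    Returns the index of the waveform with the highest mean |correlation|
    with respect to all the others, plus the correlation matrix."""
    rho = Z @ Z.T                                      # rho_ij, rho_ii = 1
    Qp = Z.shape[0]
    avg = (np.abs(rho).sum(axis=1) - 1.0) / (Qp - 1)   # exclude the diagonal
    return int(np.argmax(avg)), rho

# illustrative random stand-ins for Q' = 5 averaged complexes of length 50
rng = np.random.default_rng(2)
Z = rng.standard_normal((5, 50))
Z -= Z.mean(axis=1, keepdims=True)        # remove the mean ...
Z /= np.linalg.norm(Z, axis=1, keepdims=True)   # ... and enforce unit energy
idx, rho = select_first_atom(Z)
print(idx)
```

Thanks to the normalization, only the upper triangle of ρ actually needs to be computed, as noted in the footnote below.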

Selection of Additional Atoms
If desired, additional atoms can be selected. These atoms should be highly correlated with the remaining candidate waveforms (to obtain representative dictionary atoms), but should have a low absolute correlation with respect to the already selected waveforms (to avoid similar atoms). Here, we propose the following procedure to select a total of K representative waveforms:

1. Set the number of accepted atoms equal to one (k = 1), the pool of candidate waveforms as C = {1, ..., ℓ_0 − 1, ℓ_0 + 1, ..., Q′} (i.e., all the waveforms except for the one selected as the first atom), and the pool of accepted waveforms as A = {ℓ_0}. Construct a reduced correlation matrix C_r by removing the row and column corresponding to the first selected atom from the global correlation matrix C.
2. WHILE k < K:
(a) Select, from the remaining waveforms in the pool of candidates, the one with the highest average correlation (in absolute value) with respect to all the other candidate waveforms in the pool, i.e.,

ℓ_k = arg max_{i∈C} (1/(|C| − 1)) Σ_{j∈C, j≠i} |ρ_ij|.

(b) Compute the maximum correlation (in absolute value) between the selected candidate and all the already accepted atoms,

ρ_max = max_{j∈A} |ρ_{ℓ_k, j}|.

(c) Remove the ℓ_k-th waveform from the pool of candidates (i.e., set C = C \ {ℓ_k}), and construct a new reduced matrix C_r by removing the ℓ_k-th row and column from the current C_r.
(d) IF ρ_max < γ (with 0 ≤ γ ≤ 1 denoting a pre-defined threshold), THEN add the selected waveform to the pool of accepted atoms (i.e., A = A ∪ {ℓ_k}) and set k = k + 1.

END
6 In practice, this only has to be done Q′(Q′ − 1)/2 times, since ρ_ii = 1 ∀i thanks to the normalization and ρ_ij = ρ_ji.
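The greedy selection loop above can be sketched as follows (again with random unit-energy stand-ins for the averaged QRS complexes; the structure of the loop, not the data, is the point):

```python
import numpy as np

def select_atoms(Z, K, gamma):
    """Greedy atom selection sketch: start from the most representative
    waveform, then repeatedly pick the candidate with the highest mean
    |correlation| over the remaining pool, accepting it only if its maximum
    |correlation| with the already accepted atoms is below gamma."""
    rho = np.abs(Z @ Z.T)                 # |rho_ij|, with rho_ii = 1
    Qp = Z.shape[0]
    cand = list(range(Qp))
    first = int(np.argmax((rho.sum(axis=1) - 1.0) / (Qp - 1)))
    accepted = [first]
    cand.remove(first)
    while cand and len(accepted) < K:
        # mean |correlation| of each candidate w.r.t. the rest of the pool
        scores = [rho[i, cand].sum() - 1.0 for i in cand]
        i_star = cand[int(np.argmax(scores))]
        cand.remove(i_star)               # always removed from the pool
        if rho[i_star, accepted].max() < gamma:
            accepted.append(i_star)       # accepted only if dissimilar enough
    return accepted

rng = np.random.default_rng(3)
Z = rng.standard_normal((6, 50))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
atoms = select_atoms(Z, K=3, gamma=0.9)
print(atoms)
```

Note that a rejected candidate is still removed from the pool, so the loop always terminates even if fewer than K sufficiently dissimilar waveforms exist.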
Figure 5. Q′ = 44 reliable average QRS complexes extracted from the Q = 51 healthy patients in the PTB database [26] after resampling, windowing and normalization.

Construction of the Multi-Scale Dictionary
Finally, the K selected waveforms are resampled in order to obtain a multi-scale sub-dictionary composed of R different time scales. Note that the total number of waveforms is thus P = KR. In this case, we have used R = 11 different time scales spanning the typical durations of QRS complexes: 60, 70, ..., 160 ms. Note also that the global dictionary is simply obtained by performing N − M different time shifts of the resulting sub-dictionary [13].
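This multi-scale construction can be sketched as follows. The FFT-based `scipy.signal.resample` used here is a convenient stand-in for the polyphase resampling described in Section 3.2, and the Hanning-based atoms only illustrate the shapes of learnt waveforms:

```python
import numpy as np
from scipy.signal import resample

def build_multiscale(atoms, fs=1000, durations_ms=range(60, 170, 10)):
    """Resample each of the K selected atoms to R = len(durations_ms)
    supports (60, 70, ..., 160 ms at fs = 1000 Hz), yielding the P = K * R
    sub-dictionary waveforms, each renormalized to unit energy."""
    sub = []
    for g in atoms:
        for d in durations_ms:
            L = int(d * fs / 1000)            # target duration in samples
            h = resample(g, L)                # FFT-based resampling (sketch)
            sub.append(h / np.linalg.norm(h))
    return sub

atoms = [np.hanning(124), np.hanning(124) ** 2]   # stand-ins for K = 2 atoms
sub = build_multiscale(atoms)
print(len(sub), len(sub[0]), len(sub[-1]))
```

With K = 2 atoms and R = 11 scales this yields P = 22 sub-dictionary waveforms, from 60 to 160 samples each.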

Numerical Results
In this section, we first detail the construction of the dictionary, and then we apply it to perform the sparse reconstruction of the signals from different patients.

Dictionary Construction
As mentioned earlier, in order to construct the dictionary we used channel 10 (lead V4) from the first recording of the Q = 51 healthy patients in the PTB database; the seven patients (116, 166, 172, 173, 233, 255, 266) for which the QRS extraction fails will be used as the test set. The average waveforms for the Q′ = 44 patients, after resampling to L_max = 124 samples (i.e., 124 ms), windowing and normalization, can be seen in Figure 5. Note that, as expected, there is a large degree of similarity among all the waveforms, since they all correspond to regular heartbeats from healthy patients. This similarity is also illustrated by the color plot of the absolute value of the correlation coefficient in Figure 6. Note the large correlation (corresponding to dark red points) among most waveforms. For this reason, in [22] we decided to extract a single waveform in order to construct the multi-scale and overcomplete dictionary. However, in Figure 6 we also notice that some waveforms exhibit low correlation values (shown as blue points). This motivates us to explore here the performance as more than one atom is extracted from the pool of average QRS complexes shown in Figure 5 in order to construct the dictionary. In order to build dictionaries composed of different numbers of waveforms, we have tested several threshold levels: γ = 0.1, 0.2, ..., 0.9. Note that the larger the value of γ, the less restrictive the condition to incorporate a new atom into the dictionary, and thus the larger the final number of atoms (K): for γ = 0 we would always obtain K = 1 atom (since no new atoms can be incorporated after the first one), whereas for γ = 1 we would obtain K = Q′ (as all waveforms would be considered valid). Following this approach, we obtain K = 1 atom for γ ≤ 0.2, K = 2 atoms for 0.3 ≤ γ ≤ 0.6, K = 3 atoms when γ = 0.7, K = 4 atoms when γ = 0.8, and K = 6 atoms when γ = 0.9.
Figure 7 shows the six atoms selected for γ = 0.9. Note that, since we follow a deterministic procedure, the first atom is the one obtained for K = 1 (i.e., when γ ≤ 0.2), the first and second ones are those obtained for K = 2 (i.e., for 0.3 ≤ γ ≤ 0.6), and so on.
Finally, the selected waveforms for K = 6 are resampled in such a way that their duration ranges from 60 ms to 160 ms (with a time step of 10 ms). The resulting base dictionary, composed of P = 11 × K = 66 atoms, is shown in Figure 8. The final multi-scale and overcomplete dictionary consists of these P atoms (from P = 11 for K = 1 up to P = 66 for K = 6) time-shifted to the N − M locations of the sampled ECGs, implying that its size is N × P(N − M). For instance, for N = 115200 samples (a typical signal size in the PTB database) and M = 160 samples (the maximum support of the selected waveforms), the size of the dictionary matrix ranges from N × 11(N − M) = 115200 × 1265440 for K = 1 up to N × 66(N − M) = 115200 × 7592640 for K = 6.
Figure 8. Multi-scale dictionary constructed using the K = 6 most representative waveforms shown in Figure 7.

Sparse ECG Representation
Now we test the constructed dictionary on 21 recordings corresponding to 17 healthy patients from the PTB database. Our main goal in this section is determining whether the constructed dictionary (which is not patient-specific) can be effectively used to perform a sparse representation of ECG signals from multiple patients. In order to solve (5), we use the CoSA (Convolutional Sparse Approximation) algorithm recently proposed in [21], which allows us to process the N ≈ 115200 samples (almost 2 minutes of recorded time) of the whole signal at once (i.e., without having to partition it into several segments to be processed separately) in a reasonable amount of time. Since several signals showed a significant degree of baseline wander, before applying CoSA all signals were filtered using a third-order high-pass IIR (infinite impulse response) Butterworth filter designed using Matlab's filterDesigner tool: stop-band frequency f_stop = 0.1 Hz, pass-band frequency f_pass = 1 Hz, minimum stop-band attenuation A_stop = 40 dB, and maximum pass-band attenuation A_pass = 1 dB. Forward-backward filtering was applied again to avoid phase distortion. An example of the reconstructed signal using K = 1 and λ = 20 is shown in Figure 9. Note that all the QRS complexes (our main goal here) are properly represented by the sparse model. By increasing the number of atoms (K), and especially by decreasing the sparsity factor (λ), a better model that also includes the P and T waveforms can be obtained. However, let us remark that this is not our main goal. Indeed, a better option to model the P and T waveforms would be constructing specific dictionaries of P and T waveforms: either using synthetic waveforms (e.g., Gaussians) or applying the proposed approach to construct real-world dictionaries of P and T waveforms. We intend to explore this issue in future works.
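The high-pass filter specification above can be reproduced with SciPy's `buttord`/`butter` (a sketch: the paper used Matlab's filterDesigner, but the same design specs determine the same minimum Butterworth order):

```python
import numpy as np
from scipy.signal import buttord, butter, filtfilt

fs = 1000.0
# design specs from the text: f_stop = 0.1 Hz, f_pass = 1 Hz,
# A_stop = 40 dB, A_pass = 1 dB (frequencies normalized by fs / 2)
n, wn = buttord(wp=1.0 / (fs / 2), ws=0.1 / (fs / 2), gpass=1.0, gstop=40.0)
b, a = butter(n, wn, btype="highpass")
print(n)          # minimum Butterworth order meeting the specs

# forward-backward (zero-phase) filtering of a toy signal: slow baseline
# wander (0.05 Hz) plus a 10 Hz component standing in for the ECG content
t = np.arange(int(10 * fs)) / fs
x = np.sin(2 * np.pi * 0.05 * t) + np.sin(2 * np.pi * 10 * t)
y = filtfilt(b, a, x)
```

The specs yield a third-order filter, matching the one reported above, and the baseline wander is suppressed while the in-band component passes essentially unchanged.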
In order to measure the effectiveness of the proposed approach, we use several performance metrics that gauge both the model's sparsity and its accuracy in representing the original signal. On the one hand, the sparsity is gauged by the coefficient sparsity (C-Sp), which measures how many coefficients (out of the total number) are required to represent the signal, and the signal sparsity (S-Sp), which measures how many samples of the signal's approximation (out of the total number of samples) are equal to zero. Note that ‖·‖₀ denotes the ℓ₀ "norm" of a vector, i.e., its number of non-null elements. On the other hand, the reconstruction error is measured by the normalized mean squared error (NMSE) and its logarithmic counterpart, the reconstruction signal-to-noise ratio (R-SNR), where ‖·‖₂ denotes the ℓ₂ norm of a vector. The results, for the four performance metrics previously described, are displayed in Figure 10. On the left-hand side, we show the results (mean value and standard deviation) for the first 10 patients in the training set: patients 104, 105, 117, 121, 122, 131, 150, 155, 156, and 169. On the right-hand side, we show the results (mean value and standard deviation again) for the 11 signals in the test set: patients 116, 166, 172, 173, 233 (5 recordings), 255, and 266. Note the large sparsity attained in all cases (especially as λ increases), and the good reconstruction error for small/moderate values of λ (i.e., λ ≤ 5). Note also the improvement in performance when incorporating additional waveforms into the dictionary (e.g., for λ = 2 there is a 3.32 dB improvement in mean R-SNR for the patients in the test set when using K = 6 instead of K = 1), although this comes at the expense of a reduction in the signal sparsity (e.g., for the same case, the mean S-Sp goes down by 4.2% when using K = 6 instead of K = 1). Finally, we applied the Pan-Tompkins algorithm to the reconstructed signal in order to test whether the sparse model introduced some distortion in the location of the QRS complexes. As a result, we found that all the QRS complexes were always properly detected and located within 2 samples (i.e., ±2 ms) of the QRS complexes found in the original signal, even in the sparsest case (i.e., with K = 1 and λ = 20). Therefore, if we are only interested in the QRS complexes or some analysis derived from them (e.g., heart rate variability studies), the proposed model is a very good option to construct a sparse model that keeps all the relevant information.
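The four metrics described above can be computed directly from the signal, its sparse approximation, and the coefficient vector. A minimal sketch follows; the exact normalizations (e.g., whether C-Sp and S-Sp are reported as fractions or percentages) are assumptions, not taken from the paper:

```python
import numpy as np

def sparsity_metrics(x, x_hat, coeffs):
    """Four performance metrics for a sparse approximation x_hat of x.

    C-Sp and S-Sp are returned here as fractions in [0, 1]; the paper may
    report them as percentages instead.
    """
    c_sp = np.count_nonzero(coeffs) / coeffs.size       # fraction of non-zero coefficients (lower = sparser)
    s_sp = np.count_nonzero(x_hat == 0) / x_hat.size    # fraction of zero samples in the approximation
    nmse = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)    # ||x - x_hat||^2 / ||x||^2
    r_snr = -10.0 * np.log10(nmse)                      # R-SNR in dB
    return c_sp, s_sp, nmse, r_snr
```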

Sparse ECG Representation of Other Channels
Finally, in this section we investigate the feasibility of using the dictionary learnt on lead V4 to represent ECGs recorded in other leads. In order to do so, we use the constructed dictionary to model all the 15 channels available for patient 104. Table 1 displays the results for λ = 1 and K = 2, showing that good results are obtained in general for most of the leads. Indeed, the performance for several leads (ii, avr, v5, v6 and vx) in terms of R-SNR is better than for lead v4, which was used to construct the dictionary, and poor R-SNR results were only attained for leads avl and vy. Similar conclusions were obtained when considering other values of λ and K. Overall, this shows the feasibility of constructing a single multi-scale and overcomplete dictionary (possibly using QRS complexes extracted from several leads) for multiple channels.

Conclusions
In this paper, we have described a novel mechanism to derive a realistic, multi-scale and overcomplete dictionary from recorded real-world ECG signals. The dictionary is constructed offline, thus avoiding the computational burden of on-line approaches and ensuring the scalability of the proposed methodology for large data sets with many individuals and/or sample sizes. The obtained dictionary has then been used to obtain an accurate sparse representation of several ECGs recorded from healthy patients, showing that it is able to properly capture all the QRS complexes without introducing false alarms. Future lines include testing the proposed approach on a larger number of patients (especially including subjects with cardiac pathologies), the construction of dictionaries composed of multiple waveforms (e.g., P and T waveforms), and the combination of waveforms extracted from real patients with synthetic waveforms.

Figure 2. Block diagram of the pre-processing stage applied before the creation of the dictionary.

Figure 3. Leftmost and rightmost samples of an original QRS complex (from patient 214 in the PTB database) and its resampled version with and without edge effects.

Figure 4. Several examples of the raised cosine window used for T s = 1 ms, L max = 201, and different values of the parameter α.
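Windows like the one in Figure 4 can be generated with SciPy's tapered-cosine (Tukey) window, assuming the paper's raised cosine parameter α plays the role of the Tukey taper fraction; this correspondence is an assumption, since the exact window formula is given elsewhere in the paper:

```python
import numpy as np
from scipy.signal import windows

# Tapered-cosine (Tukey) windows as a stand-in for the raised cosine window of
# Figure 4: L_max = 201 samples at T_s = 1 ms. Whether the paper's alpha equals
# the Tukey taper fraction is an assumption.
L_max = 201
tapers = {alpha: windows.tukey(L_max, alpha=alpha) for alpha in (0.1, 0.5, 1.0)}
```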

Figure 6. Color map showing the absolute value of the correlation coefficient, |ρ_ij|, for the Q = 44 average QRS waveforms. Dark red colors indicate values close to 1, whereas dark blue colors indicate values closer to 0.

Figure 7. K = 6 atoms selected from the Q = 44 reliable average QRS complexes obtained from the Q = 51 healthy patients in the PTB database, using a threshold γ = 0.9.
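The correlation-based selection behind Figures 6 and 7 can be sketched as a greedy pass over the Q average waveforms: a waveform is kept only if its |ρ| with every already-selected atom stays below the threshold γ. The greedy order and the equal-length assumption for computing ρ are simplifications of whatever the paper actually does:

```python
import numpy as np

def select_atoms(waveforms, gamma=0.9):
    """Greedily keep waveforms whose |correlation| with all previously
    selected atoms is below gamma. Assumes equal-length waveforms."""
    selected = []
    for w in waveforms:
        w0 = w - w.mean()
        w0 = w0 / (np.linalg.norm(w0) + 1e-12)   # zero-mean, unit-norm
        # |rho| with a selected atom is just the dot product of the
        # normalized versions; keep w only if all of them are below gamma.
        if all(abs(np.dot(w0, s)) < gamma for s in selected):
            selected.append(w0)
    return selected
```

In the paper, this kind of thresholding (with γ = 0.9) reduces the Q = 44 reliable average QRS complexes to the K = 6 atoms shown in Figure 7.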

Figure 10. Different performance metrics. Left-hand side: results on signals from the training set. Right-hand side: results on signals from the test set.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 8 November 2018 doi:10.20944/preprints201811.0199.v1
Figure 9. Example of sparse ECG representation with the derived dictionary for a segment of signal 121 from the PTB database. Real signal in blue; sparse representation (QRS complexes) in red.

Table 1. Performance of the constructed dictionary (using waveforms extracted from lead v4) on other leads from patient 104.