Correction of the Unobtrusive ECG Using System Identification

Abstract: Unobtrusively acquired electrocardiograms (ECG) could substantially improve the comfort of patients. However, such ECGs are not used in clinical practice because (among other reasons) signal deformations impede correct diagnosis of the ECG. Here, methods are proposed for correction of the unobtrusive ECG, based on system identification. Knowing the reference ECG, models were developed to correct the unobtrusively acquired ECG. A finite impulse response (FIR) model, a state space model and an autoregressive model were developed. The models were trained and evaluated on the Goldberger leads recorded from an ECG T-shirt with dry electrodes, and from a gold standard ECG. It was found that the FIR model corrects the unobtrusive ECG with good agreement (ρ aVR = 0.84 ± 0.10, ρ aVL = 0.65 ± 0.24, ρ aVF = 0.88 ± 0.04), while the other models do not yield significant improvements, or become unstable. The R-peaks were also accurately corrected by the FIR model (MSE aVR = 0.10 ± 0.10, MSE aVL = 0.14 ± 0.27, MSE aVF = 0.03 ± 0.02). To conclude, the proposed FIR method succeeded in significantly correcting the unobtrusive ECG signal.


Introduction
The electrocardiogram (ECG) is one of the standard clinical methods of diagnosis for cardiac-related diseases. By means of the ECG, the electrical activity of the heart is recorded and analyzed. This technology is inexpensive, easy to use and clinically established. The main drawback of the ECG is the use of adhesive electrodes that can provoke allergic reactions, especially if applied during a relatively long period. Another issue is the reduced comfort for patients, mainly due to the cables attached to the electrodes.
To overcome these issues, various unobtrusive ECG (uECG) systems have been developed. One realization method is capacitive ECG (cECG), which has been built into everyday objects such as chairs [1,2], car seats [3], beds [4][5][6][7] and clothing [8]. The principle is to use ECG electrodes incorporated in these objects that do not require conductive contact with the human body, but acquire the physiological surface potentials through capacitive coupling. Similar unobtrusive systems are dry ECGs; these have conductive contact with the skin but do not use any electrolyte gel or adhesive material [9][10][11][12]. However, the uECG requires a more complicated electronic setup [13,14]. In many cases, an operational amplifier is used as a buffer located directly at the electrode, which drives the electrode cable and reduces interference (active electrode). Nevertheless, these systems have their drawbacks, e.g., they are especially prone to movement artifacts. So far, uECG systems have mainly been used in laboratory settings or clinical studies.
For the cECG in particular, several hardware-based approaches have been proposed to improve the quality, but they have also increased the complexity of the circuits [14][15][16]. Furthermore, the morphology of the cECG differs slightly from that of the standard ECG, which is important

Study
A study was conducted with seven healthy participants (4 male, 3 female); in addition, data from three healthy male volunteers who participated in [19] were used. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of RWTH Aachen University (Reference code: EK 283/17). The volunteers were asked to wear the dry 12-lead ECG T-shirt [19] and a reference Holter ECG BT12 (Corscience, Erlangen, Germany) while sitting still and relaxed at their desks. The dry ECG electrodes were located on the interior of the T-shirt. The three Goldberger leads were recorded for 5 min per recording. The data were post-processed on a PC with an Intel Core i5-2500 CPU @ 3.30 GHz (Intel Corporation, Santa Clara, CA, USA) and 8 GB RAM. Two of the new recordings (both from male participants) were discarded because of excessive noise. The data needed to be clean in order to train the models; artifacts were rejected afterwards. Table 1 lists the recordings that were used for this paper.
For the ECG, the differential leads are recorded and not the surface potential at each electrode. Since the equations for the ECG leads are linearly dependent, it is not possible to determine the surface potential directly at each electrode. The electrodes are denoted left arm (LA), right arm (RA) and left leg (LL), see Figure 1. The Einthoven leads are most frequently selected: I = LA − RA, II = LL − RA, III = LL − LA.
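The lead definitions above can be illustrated with a short sketch. The Einthoven formulas are taken from the text; the Goldberger formulas are the standard textbook definitions and are not spelled out in this paper, and the function names are hypothetical. The rank computation shows why the electrode potentials cannot be recovered from the leads alone.

```python
import numpy as np

def einthoven_leads(la, ra, ll):
    """Leads I, II, III as defined in the text."""
    return la - ra, ll - ra, ll - la

def goldberger_leads(la, ra, ll):
    """Augmented leads aVR, aVL, aVF (standard definitions, assumed here)."""
    avr = ra - (la + ll) / 2
    avl = la - (ra + ll) / 2
    avf = ll - (ra + la) / 2
    return avr, avl, avf

# Each row maps the potentials (LA, RA, LL) to one Einthoven lead.
LEAD_MATRIX = np.array([
    [1, -1, 0],   # I   = LA - RA
    [0, -1, 1],   # II  = LL - RA
    [-1, 0, 1],   # III = LL - LA
])

# Rank 2 < 3: the leads are linearly dependent (III = II - I), so the
# three electrode potentials cannot be determined from the leads.
LEAD_RANK = int(np.linalg.matrix_rank(LEAD_MATRIX))
```

Because only two of the three equations are independent, any common offset added to all three electrodes leaves every lead unchanged.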

Pre-Processing of the ECG Data
Data pre-processing was conducted prior to system identification (Figure 2). First, the raw data from the T-shirt ECG and the reference device were synchronized, as reported in [19]. The Goldberger leads aVL, aVF and aVR were selected for system identification. The data from each channel of the reference and uECG were filtered with a digital 4th-order high-pass zero-phase filter at 0.5 Hz to remove the DC component. The optimal frequency boundaries within which the ECG should not be distorted are between 0.5 and 40 Hz. After filtering, the data were resampled from 500 Hz to a lower sampling rate of 200 Hz with the MATLAB (The MathWorks, Natick, MA, USA) resample function, since only frequency components below 100 Hz were of interest. Subsequently, artifacts with large amplitudes were removed manually, because their presence would impede system identification, see Figure 3b. Also, because some of the experiments contained strong 50 Hz line noise (Figure 3a), a subtraction procedure was applied which removes power line interference but does not alter the ECG [21,22].
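The filtering and resampling steps can be sketched as follows. This is not the authors' MATLAB code: it uses SciPy instead, and the Butterworth design and the function name `preprocess` are assumptions, since the text only states the order, the cutoff and the zero-phase property. Manual artifact removal and the line-noise subtraction of [21,22] are not covered.

```python
from math import gcd

import numpy as np
from scipy import signal

FS_IN = 500   # original sampling rate in Hz (from the text)
FS_OUT = 200  # target sampling rate in Hz (from the text)

def preprocess(ecg, fs_in=FS_IN, fs_out=FS_OUT, cutoff_hz=0.5, order=4):
    """High-pass filter one channel without phase shift, then resample it."""
    # Butterworth is an assumption; the paper does not name the filter type.
    sos = signal.butter(order, cutoff_hz, btype="highpass",
                        fs=fs_in, output="sos")
    # Forward-backward filtering cancels the phase delay (zero-phase).
    # Note that it also applies the magnitude response twice.
    filtered = signal.sosfiltfilt(sos, np.asarray(ecg, dtype=float))
    # Rational resampling 500 Hz -> 200 Hz (up 2, down 5); resample_poly
    # applies its own anti-aliasing low-pass filter.
    g = gcd(fs_in, fs_out)
    return signal.resample_poly(filtered, fs_out // g, fs_in // g)
```

A 10 s recording at 500 Hz (5000 samples) yields 2000 samples at 200 Hz, with any constant offset removed.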

Blind System Identification
The approach is to use methods of black or grey box system identification. The assumption is that the underlying system is unknown and only the input and output are available, see Figure 4. The first method is to determine a finite impulse response (FIR) filter of Nth order

$$y[n] = \sum_{i=0}^{N} b_i \, x[n-i]$$

with input x[n], output y[n] and filter coefficients b_i. The input is the uECG and the output the estimated reference ECG. FIR filters have the advantage that they are always stable and are simple to realize. The feasibility of using an FIR filter to estimate the reference ECG from the cECG was demonstrated in [23,24]. The presented deconvolution method in particular made use of the fact that the convolution operation in the time domain becomes a simple multiplication in the frequency domain. It was demonstrated that the deconvolution of three capacitive ECG channels led to an improved estimate of the underlying reference ECG. The root-mean-square error and the Pearson correlation coefficient indicated an improvement compared to the single capacitive channels. However, the cECG leads were not recorded according to a known electrode placement system. Therefore, there was no correspondence between the reference ECG and the cECG sites RA, LA and LL, see Figure 1.
In the present case, simultaneously recorded data of two devices were available from the same ECG sites, allowing a direct comparison. Since the results of the previous attempts to use deconvolution for system identification were promising, the approach of using an FIR filter was followed for this work.
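As an illustration of identifying an FIR filter from paired input and output data, the following minimal sketch estimates the coefficients by ordinary least squares. This is a swapped-in technique for illustration only: the study used the MATLAB System Identification toolbox, whose estimator may differ (for example by regularizing the solution). The function names `fit_fir` and `apply_fir` are hypothetical.

```python
import numpy as np

def fit_fir(x, y, order):
    """Least-squares estimate of b_0..b_order so that y[n] ~ sum_i b_i x[n-i]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Column i holds the input delayed by i samples (zeros before the start).
    delayed = np.column_stack(
        [np.concatenate((np.zeros(i), x[:n - i])) for i in range(order + 1)]
    )
    b, *_ = np.linalg.lstsq(delayed, y, rcond=None)
    return b

def apply_fir(b, x):
    """Filter x with coefficients b, assuming zero initial conditions."""
    return np.convolve(np.asarray(x, dtype=float), b)[:len(x)]
```

In the setting of this paper, `x` would be the uECG training segment and `y` the aligned reference ECG; the fitted `b` is then applied to unseen uECG data.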
The training data were used as an input to the MATLAB System Identification toolbox (The MathWorks, Natick, MA, USA) graphical user interface. For this study, the models with the highest accuracy were selected. In addition to the FIR model, two other models were chosen for system identification: (1) a state space model (SSM); and (2) an autoregressive (AR) or autoregressive with exogenous terms (ARX) model. The structure of an SSM is as follows:

$$x(t + T_s) = A\,x(t) + B\,u(t) + K\,e(t)$$
$$y(t) = C\,x(t) + D\,u(t) + e(t)$$

with u(t) input, y(t) output, x(t) state vector, T_s sampling interval, A, B, C and D state matrices, e(t) noise and K noise matrix. The structure of an ARX model is as follows:

$$y(k) + a_1 y(k-1) + \dots + a_{n_a} y(k-n_a) = b_1 u(k-n_k) + \dots + b_{n_b} u(k-n_k-n_b+1) + e(k)$$

with u(k) input, y(k) output, n_k system delay, n_a AR order, n_b X order and e(k) noise.
[Figure 4: blind system identification (FIR, SSM, ARX)]
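To make the ARX structure concrete, the sketch below fits the coefficients by linear least squares on a regression matrix of past outputs and delayed inputs. It is an illustrative assumption of how such a model can be estimated, not the MATLAB `arx` implementation; `fit_arx` and `simulate_arx` are hypothetical names.

```python
import numpy as np

def fit_arx(u, y, na, nb, nk=0):
    """Estimate a_1..a_na and b_1..b_nb of the ARX structure by least squares."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max(na, nk + nb - 1)  # first sample with a complete history
    rows = []
    for k in range(start, len(y)):
        past_y = [-y[k - i] for i in range(1, na + 1)]  # moved to the right side
        past_u = [u[k - nk - j] for j in range(nb)]
        rows.append(past_y + past_u)
    theta, *_ = np.linalg.lstsq(np.array(rows), y[start:], rcond=None)
    return theta[:na], theta[na:]

def simulate_arx(a, b, u, nk=0):
    """Noise-free ARX output for input u with zero initial conditions."""
    y = np.zeros(len(u))
    for k in range(len(u)):
        acc = 0.0
        for i, a_i in enumerate(a, start=1):
            if k - i >= 0:
                acc -= a_i * y[k - i]
        for j, b_j in enumerate(b):
            if k - nk - j >= 0:
                acc += b_j * u[k - nk - j]
        y[k] = acc
    return y
```

Unlike an FIR filter, the fitted model feeds back its own past outputs, which is why it can become unstable.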

Training and Testing Technique
The recordings were divided into training and test data; the training data (5 heart beats) were used for the estimations for each subject, the remainder was test data. Input and output data were digitally filtered and aligned around zero for each channel separately.
The training phase was as follows:
• Short, clean segments of the data (5 QRS complexes) were manually selected as training data for each subject and per channel (aVR, aVL, aVF).
• The locations of the R-peaks of the uECG and reference ECG were aligned manually to obtain matching input and output data.
• System identification was performed to obtain FIR, SSM and ARX models, see Figure 5a.
For the test or validation phase, test data were used as an input to the identified models and transfer functions, see Figure 5b. The overall concept for uECG correction is shown in Figure 5.
Additionally, for the FIR filters, leave-one-out cross-validation was conducted for each lead separately. For each round of cross-validation one dataset (one subject) was the test set and the other datasets (all other subjects) were used as training set. Therefore, the filters were trained on the data of all other subjects and then applied to the data of the remaining subject.
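The leave-one-out scheme can be sketched as follows for a single lead. The data layout (a dictionary mapping a subject identifier to a pair of uECG and reference arrays), the plain least-squares FIR fit and the function names are assumptions made for illustration; they are not taken from the study's MATLAB workflow.

```python
import numpy as np

def delayed_copies(x, order):
    """Matrix whose column i is x delayed by i samples (zero-padded)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack(
        [np.concatenate((np.zeros(i), x[:len(x) - i])) for i in range(order + 1)]
    )

def leave_one_out(recordings, order):
    """recordings: dict subject -> (uecg, reference). Returns correlation per subject."""
    scores = {}
    for test_id, (u_test, r_test) in recordings.items():
        train = [pair for s, pair in recordings.items() if s != test_id]
        # Pool the training data of all other subjects into one regression.
        X = np.vstack([delayed_copies(u, order) for u, _ in train])
        Y = np.concatenate([np.asarray(r, dtype=float) for _, r in train])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        # Apply the pooled filter to the held-out subject.
        estimate = delayed_copies(u_test, order) @ b
        scores[test_id] = float(np.corrcoef(estimate, r_test)[0, 1])
    return scores
```

If all subjects shared the same distortion, the held-out correlations would be high; subject-specific distortions lower them.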

Results
First, the results of the training phase are presented in Section 3.1. Then, the resulting model orders for blind system identification are presented in Section 3.1.1, see Figure 4. General observations during the training phase are presented in Section 3.1.2. In Section 3.2, the results of system identification on test data are shown:
• Time-based analysis
• ECG morphology analysis
• Frequency analysis

Estimation of Blind System Identification Orders
For system identification, each channel was taken separately and used for estimation in a single input, single output structure. First, the model orders were determined by the MATLAB System Identification toolbox. The automatically selected filter length for FIR was N = 46, using the MATLAB impulseest function. The SSM was chosen as a 4th order model with the MATLAB n4sid function. Further, the ARX model was n_a = 4, n_b = 4, n_k = 0 with the MATLAB arx function. These fixed filter and model orders were chosen for this study.

Training Outcomes
During the training phase, estimations of differing quality were observed on visual inspection (Figure 6), which shows examples of estimation results with the errors marked in the graphs. In (b), a dent in the Q wave and an overshoot of the T wave occurred in the estimation of aVR. In (3) the P wave was deformed, the Q wave was flattened, the S wave had a dent, and the T wave had an overshoot.

Time-Based Analysis
Evaluation of the filters was conducted on the test (or validation) data. Table 2 shows the correlations ρ between the reference ECG channels and the other ECG modalities, with the uECG serving as a benchmark. Some of the models proved to be unstable during estimation and are indicated with an 'x' in Table 2. The mean was calculated from the absolute values, ignoring the missing values; the same was done for the standard deviation. The modality with the highest correlation coefficient is highlighted for each subject and for each channel. The best correlation was achieved by the stable FIR filter for all channels (ρ aVR = 0.84 ± 0.10, ρ aVL = 0.65 ± 0.24, ρ aVF = 0.88 ± 0.04) compared to the uECG (ρ aVR = 0.64 ± 0.21, ρ aVL = 0.41 ± 0.31, ρ aVF = 0.67 ± 0.26), an improvement of 0.20 for aVR, 0.24 for aVL and 0.21 for aVF. The other modalities (SSM and ARX) performed similarly to the uECG (ρ aVR = 0.65 ± 0.01, ρ aVL = 0.50 ± 0.04, ρ aVF = 0.77 ± 0.07). For the aVL lead, poor estimation results were obtained in all experiments. Additionally, leave-one-out cross-validation was performed with the FIR filters. Table 3 shows the results of the cross-validation. Using the filters trained on other subjects, no discernible improvement was achieved. Overall, the correlation coefficients were in the same range as for the uECG.
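The summary statistics described above (mean and standard deviation of the absolute correlation values, ignoring unstable models) can be sketched as follows. Representing a missing value as NaN and using the population standard deviation are assumptions; the paper does not state which estimator was used, and the function names are hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equally long signals."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def summarize(rhos):
    """Mean and standard deviation of |rho|, skipping NaN entries.

    NaN marks an unstable model (the 'x' entries in Table 2). The
    population standard deviation (ddof=0) is an assumption.
    """
    values = np.abs(np.asarray(rhos, dtype=float))
    return float(np.nanmean(values)), float(np.nanstd(values))
```

For example, `summarize([0.8, -0.6, float("nan")])` averages 0.8 and 0.6 and ignores the missing third entry.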

ECG Morphology Analysis
One of the morphological differences between the uECG and the reference ECG is R-peak flattening. In Table 4 the deviation of the R-peaks is evaluated using the mean square error measure (MSE) in arbitrary units (a.u.). The ECG waves were extracted using the algorithm from [19]. The flattening of the R-peaks for the uECG can be observed, especially in aVR (MSE aVR = 2.45 ± 2.85, MSE aVL = 0.97 ± 1.73, MSE aVF = 0.51 ± 0.49). An overall improvement occurred for the height of the R-peak, with only one exception (the system model estimation in aVF). Some experiments did not result in reasonable R-peak estimations, e.g., when the model was unstable (MSE infinite). Additionally, in some cases the segmentation failed because of irregular shapes of the ECG, e.g., if the T wave was higher than the R-peak. The FIR estimation performed exceptionally well in estimating the correct R-peaks (MSE aVR = 0.10 ± 0.10, MSE aVL = 0.14 ± 0.27, MSE aVF = 0.03 ± 0.02). For the other waves (P, Q, S and T), the estimations did not significantly improve the amplitude of the waves compared to the uECG, see Appendix A Tables A1-A4. Lastly, for SSM and ARX, no notable improvements were observed compared with the uECG.
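A minimal sketch of the R-peak error measure is given below. It assumes that the R-peak sample positions are already known (the study extracted the waves with the algorithm from [19], which is not reproduced here) and reads the peak amplitude as the sample with the largest magnitude in a small window, which also handles negative deflections. The window size and function names are hypothetical.

```python
import numpy as np

def peak_amplitudes(sig, peak_idx, half_window):
    """Signed amplitude of the largest deflection around each given index."""
    sig = np.asarray(sig, dtype=float)
    amps = []
    for p in peak_idx:
        lo = max(p - half_window, 0)
        hi = min(p + half_window + 1, len(sig))
        window = sig[lo:hi]
        amps.append(window[np.argmax(np.abs(window))])
    return np.array(amps)

def r_peak_mse(reference, estimate, peak_idx, half_window=2):
    """Mean square error between R-peak amplitudes of two signals."""
    ref = peak_amplitudes(reference, peak_idx, half_window)
    est = peak_amplitudes(estimate, peak_idx, half_window)
    return float(np.mean((ref - est) ** 2))
```

A flattened R-peak in the estimate shows up directly as a larger MSE, independent of small timing offsets within the window.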

Frequency Analysis
The spectral properties of the data were also analyzed. In Figure 7, the power spectrum of the uECG is compared to the reference ECG and the estimations. In this example, the FIR filter was used (subject 1, aVR). The power spectra of the FIR and ARX estimations resemble the power spectrum (0-40 Hz) of the reference ECG (Figure 7).
The coherence estimate and the cross-spectrum phase of the ECGs were also analyzed (Figure 8). In Figure 8, the coherence estimate C_xy is shown,

$$C_{xy}(f) = \frac{|P_{xy}(f)|^2}{P_{xx}(f)\,P_{yy}(f)}$$

with P_xx, P_yy the power spectral densities of the signals x and y, and P_xy the cross-power spectral density. Coherence close to one indicates equality of two given signals at a certain frequency, while a value close to zero indicates inequality. In our case, for all subjects the resulting coherence estimates were very similar for the reference and uECG, as well as for the reference and estimation ECG. However, the cross-spectrum phase provided additional information: if the coherence has a large value, the phase of the cross-spectrum can be determined, thus identifying phase lags. Although the coherence estimate did not reveal notable differences, the cross-spectrum phase pinpointed them. For the example presented in Figure 8, the cross-spectrum phase of the estimated ECG (FIR filter) with the reference had lower values than that of the uECG with the reference. Table 5 shows the mean absolute values of the cross-spectrum phase. Lower values indicate lower phase lags between the reference ECG and the other ECGs. The lowest values were achieved with the FIR estimations.
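The coherence and cross-spectrum phase can be estimated by averaging windowed spectra over overlapping segments (Welch's method). The sketch below is an illustrative NumPy version under assumed parameters (Hann window, 50% overlap, segment length 256); it is not the MATLAB routine used in the study, and the function names are hypothetical. Spectral scaling factors are omitted because they cancel in the coherence and do not affect the phase.

```python
import numpy as np

def _segment_spectra(x, nperseg):
    """FFT of Hann-windowed segments with 50% overlap."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)
    return np.array([np.fft.rfft(window * x[s:s + nperseg]) for s in starts])

def coherence_and_phase(x, y, fs, nperseg=256):
    """Return frequencies, magnitude-squared coherence and cross-spectrum phase."""
    X = _segment_spectra(x, nperseg)
    Y = _segment_spectra(y, nperseg)
    pxx = np.mean(np.abs(X) ** 2, axis=0)   # auto-spectrum of x
    pyy = np.mean(np.abs(Y) ** 2, axis=0)   # auto-spectrum of y
    pxy = np.mean(np.conj(X) * Y, axis=0)   # cross-spectrum
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coherence = np.abs(pxy) ** 2 / (pxx * pyy)
    phase = np.angle(pxy)                   # in radians
    return freqs, coherence, phase

def mean_absolute_phase(phase):
    """Summary value in the spirit of Table 5."""
    return float(np.mean(np.abs(phase)))
```

Identical signals give a coherence of one and zero phase at every frequency, while unrelated signals give a coherence near zero once enough segments are averaged.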

Discussion
This paper presents different approaches to determine a model usable for correction and compensation of capacitive coupling occurring in uECG signals. The approach is to blindly perform system identification using an FIR model, an SSM and an ARX model. These models were trained on short excerpts of five heartbeats and were later validated on test data.
The goal was to evaluate different methods to correct deformations in a uECG using system identification. After a short calibration phase using a reference ECG and uECG, a correction method was applied to the following uECG acquisition. The models do not take into account changes in the electrode-skin interface, e.g., loss of electrode contact, but take into account the changes in ECG morphology due to the ECG acquisition method.
The use of FIR filters proved to be a feasible correction method. For each lead and each subject, an individual FIR filter was estimated. The estimated ECG was in better agreement with the reference ECG than the uECG, which was supported by the high correlation coefficients (ρ aVR = 0.84 ± 0.10, ρ aVL = 0.65 ± 0.24, ρ aVF = 0.88 ± 0.04). The improvement of the correlation of the ECGs was consistent with our findings in [23]; analysis of the spectrum also supported this statement. Regarding morphology, the flattening of the R-peaks was successfully corrected (MSE aVR = 0.10 ± 0.10, MSE aVL = 0.14 ± 0.27, MSE aVF = 0.03 ± 0.02). The other ECG waves (P, Q, S and T) were comparable with the uECG. To summarize: FIR filters had the best properties for correcting the uECG recordings. It is possible to tune the models by selecting different filter lengths (FIR) or model orders. The discrepancy between aVL and the other leads was probably due to an unknown acquisition error. Several sources of acquisition disturbances exist; the most prominent are movement and high-frequency noise [25]. We assume that our problems were due to insufficient electrode contact with the skin, leading to more movement artifacts or baseline wander. Furthermore, the study took place in an office space with computers and electronic devices that cause high-frequency noise.
For the SSM and the ARX, in most experiments, the correlation coefficient did not indicate better agreement with the reference ECG than that of the uECG. Furthermore, a few of the estimated models were unstable for the SSM (aVR subject 4, aVL subject 3, aVF subject 3) and the ARX model (aVR subjects 1 and 5, aVL subject 3, aVF subject 3). For future applications, each model needs to be checked for stability before it is used for correction. The values of the cross-spectrum indicated a slightly better agreement with the reference ECG for aVL and a much higher agreement for aVF than the uECG. In the analysis of the ECG waves, the flattening of the R-peak was corrected by both models. The MSE of the other waves indicated inconsistent results that showed no significant improvement compared with the uECG.
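A stability check of the kind called for above can be sketched as follows, assuming discrete-time models: an ARX model is stable if all roots of its autoregressive polynomial lie inside the unit circle, and a state space model is stable if all eigenvalues of its state matrix A do. The function names are hypothetical and the check is not taken from the study.

```python
import numpy as np

def arx_is_stable(a):
    """a = [a_1, ..., a_na] of the polynomial 1 + a_1 z^-1 + ... + a_na z^-na."""
    poles = np.roots(np.concatenate(([1.0], np.asarray(a, dtype=float))))
    # Discrete-time stability: every pole strictly inside the unit circle.
    return bool(np.all(np.abs(poles) < 1.0))

def ssm_is_stable(state_matrix):
    """Discrete-time state space model: eigenvalues of A inside the unit circle."""
    eigenvalues = np.linalg.eigvals(np.asarray(state_matrix, dtype=float))
    return bool(np.all(np.abs(eigenvalues) < 1.0))
```

A model that fails the check would be excluded before correction, in the same way that unstable models are marked with an 'x' in Table 2.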
In future studies, additional issues of unobtrusive ECGs that are related to artifacts (based on motion or electric discharge) also need to be addressed. However, automatic correction of artifacts was not an aim of the present study; artifacts were selected and rejected manually. Meanwhile, different approaches have been proposed to remove and correct for ECG artifacts [26][27][28][29].
In the future, medical professionals should visually verify whether the corrected uECG can be used for purposes of diagnosis. Additional features of interest include: the PQ length, QRS width, ST length, and the P and T length. These features are not readily available and require more sophisticated computing or manual extraction by a medical professional. Although segmentation algorithms for ECG exist, they are reported to fail for, e.g., the capacitive ECG [30].
A short calibration procedure is necessary for the correction using the models. A recording procedure with the uECG is proposed as follows:
(1) Simultaneous recording of the uECG and the reference ECG for 30 s.
(2) Automated data processing:
(a) Alignment of the uECG and the reference ECG channel data.
(b) Performance of system identification to obtain models.
(c) Choice of best model for uECG correction.
(3) Removal of the reference ECG.
(4) Long-term uECG recording using the obtained model for correction.
The short calibration phase (steps 1-3) is added to the time needed to apply a standard ECG. In step 4, either a real-time correction can be implemented in the ECG software, or an offline correction is applied afterwards. The need for calibration for each subject or even each acquisition is known in clinical practice, e.g., in EEG acquisition, where artifacts from eye movements are removed by different methods, most of which require calibration before each EEG recording [31]. Instead of calibration, more general models might be found when a larger study group is available. However, individualized filters have the advantage that they correct more precisely for the given ECG setup, as demonstrated in the present leave-one-out cross-validation.

Conclusions
In conclusion, three different models for the correction of ECG deformations have been presented and compared. The method using an FIR filter performed particularly well. The corrected ECG curves had a significantly higher agreement with the reference ECG than the uncorrected ECG.
Acknowledgments: This study was funded by the Excellence Initiative of the German federal and state governments (OPBF074).
Author Contributions: A.B. and X.Y. designed the study; A.B. performed the study; A.B. analyzed the data; A.B. wrote the paper; S.L. and D.T. supervised the manuscript writing phase.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: