Data fusion of multivariate time series: Application to noisy 12-lead ECG signals

Fusing 12-lead ECG signals is crucial for further ECG signal processing. In this paper, a novel fusion data algorithm is proposed. In this method, the 12-lead ECG signals are converted to a single-lead physiological signal via the idea of the local weighted linear prediction algorithm. To effectively inherit the quality characteristics of the 12-lead ECG signals, a fuzzy inference system is designed to estimate the weighted coefficients in the algorithm. Experimental results indicate that the algorithm obtains desirable results on synthetic ECG signals, noisy ECG signals and realistic ECG signals.


Introduction
ECG records the physiological information of cardiac activity using electrodes placed at different positions on the body. Therefore, ECG recordings have been widely applied to clinical diagnosis and clinical monitoring. Nevertheless, the ECG recordings gathered in clinical settings are often contaminated by noise and artifacts.
Because noise and the ECG signal have overlapping frequency bands and similar morphologies [1,2], the characteristics of the ECG signal are often distorted, resulting in false alarms (FA) in the intensive care unit (ICU) and imprecise measurement of the ECG signal [2]. Thus, ECG signal quality assessment is a significant element of further ECG signal processing.
The quality of an ECG signal directly affects the accuracy of ECG analysis, and ECG quality assessment algorithms have therefore been developed [3,4]. The key to these algorithms is to extract the characteristics of the ECG signal appropriately. For instance, as a time-domain approach, Moody et al. adopted Karhunen-Loève basis functions to represent the QRS complex and employed the residual error of the reconstructed QRS complex to estimate the instantaneous noise of the original signal [3]. As a frequency-domain approach, the frequency content of a long-term ECG recording from a coronary care unit and the number of times the ECG exceeded a preset limit were employed to analyze the ECG quality [4].
PhysioNet, a resource for biomedical research sponsored by the National Institutes of Health (NIH), solicited algorithms in 2011 to assess ECG quality via mobile telephones [5]. Subsequently, many ECG quality assessment algorithms were proposed to estimate signal quality accurately [6-15]. Most of the proposed algorithms use either time-domain characteristics or frequency-domain characteristics alone. Jekova et al. employed time-frequency-domain characteristics for assessing the ECG quality, yielding a sensitivity of 81.8% and a specificity of 97.8% [9]. Clifford et al. presented several signal quality indices (SQIs) that involve both the time domain and the frequency domain and can partly reflect the condition of the ECG quality [10]. In [11], Li et al. developed four novel signal quality indices that extended the set of SQIs. Because the SQIs provide a great deal of physiological information, they have been extensively used for ECG quality assessment [2,11,12].
Extracting the ECG quality characteristics reasonably is clearly significant for a quality assessment algorithm. Nonetheless, the conventional methods can only assess the quality of a single-lead ECG signal. When facing 12-lead ECG signals, the signal quality of each lead must be assessed individually, which reduces the computational efficiency.
In order to reduce the amount of pending ECG data, Chen et al. employed the Dower transform to convert 12-lead ECG signals to a 3-lead vectorcardiogram (VCG), which is then analyzed by multiscale recurrence analysis at each scale [16]. Wavelet analysis is an effective tool for handling nonlinear and nonstationary signals. Nevertheless, the VCG signals are decomposed into a series of wavelet scales, which noticeably increases the amount of pending data. In other words, in [16], the application of multiscale recurrence analysis effectively weakened the original idea of the Dower transform.
Principal component analysis (PCA) is a significant signal processing technique derived from applied linear algebra. As a simple and non-parametric method, PCA has been applied to ECG analysis successfully [17]. The purpose of PCA is to map high-dimensional data to low-dimensional spaces, thereby realizing data compression. Nevertheless, in PCA the most interesting variances of the data set are associated with the first k principal components, while the other, less important components are discarded even though they can help reveal further dynamical information of the high-dimensional data [18]. Hence, some loss of information is inevitable when using PCA. The time series of ECG signals are essentially both nonlinear and nonstationary. In order to analyze ECG signals comprehensively, phase space reconstruction theory has also been applied [19,20]. According to this theory, the ECG signals are reconstructed in high-dimensional spaces with more complete characteristic information of the original signals. In this regard, approaches based on phase space reconstruction are preferable to PCA for 12-lead ECG signal fusion.
The 12-lead ECG signal contains abundant information on the state of the heart and can fully describe its electrical activity. Because the electrodes of the twelve leads are placed at different positions on the body, the signal amplitudes differ between the leads. This means that the time-domain characteristics of the twelve leads differ and the signal quality of each lead ought to be assessed separately. In order to analyze the quality of the 12-lead ECG signals from all directions, many characteristic parameters of each lead, e.g. SQIs, need to be calculated. For the quality assessment of 12-lead ECG signals, conventional algorithms are therefore relatively inefficient. Thus, how to comprehensively assess the quality of 12-lead ECG signals while effectively reducing the computational complexity is critical. The key solution lies in converting the 12-lead ECG signals into a single-lead physiological signal in which the quality characteristics of the original signals are inherited as much as possible.
The main objective of this study is to develop a novel fusion data algorithm (NFDA) for 12-lead ECG signals that can adequately integrate the quality characteristics of the 12-lead ECG signals into a single-lead signal. Previous studies implied that ECG signals possess nonlinear and nonstationary characteristics, that is, they are chaotic signals [19]. Currently, the analysis of deterministic chaos is an active field, and chaotic time series prediction is a fundamental issue of chaos theory [21,22]. Previous research has shown that the local
The outline of the rest of this paper is as follows. In Section 2, the local weighted linear prediction algorithm (LWLPA) is briefly discussed as a preliminary. Section 3 introduces NFDA, which is based on LWLPA. The performance of NFDA is evaluated on synthetic ECG signals and realistic ECG signals in Section 4. Section 5 contains the conclusion.
In the course of the signal quality assessment of 12-lead ECG signals, compressing the pending ECG data is an efficient way to further improve the efficiency of the assessment algorithms. Since cardiac signals reveal the possibility of deterministic chaos, the LWLPA algorithm, an important prediction method for chaotic time series, is used here to fuse 12-lead ECG signals. In this section, as a preliminary, we briefly review the algorithm, which is closely related to NFDA.

The Local Weighted Linear Prediction Algorithm
Due to the butterfly effect, the evolution of a chaotic system cannot be predicted over a relatively long time. However, predictability still exists over short periods because of the linear correlation of motions in chaotic systems. Takens [24] proved that if the embedding dimension and delay time are chosen appropriately, the regular evolutionary trajectory of a chaotic system can be completely reconstructed and revealed in an m-dimensional space.
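As an illustration of the reconstruction that Takens' theorem refers to, the following minimal sketch builds delay vectors from a scalar series. The function name `delay_embed` and the forward-delay convention X(i) = [x(i), x(i + tau), ..., x(i + (m - 1) tau)] are assumptions made for this example; the original text does not state which convention is used.

```python
def delay_embed(x, m, tau):
    """Return the m-dimensional delay vectors of the scalar series x.

    Assumed convention: X(i) = [x(i), x(i + tau), ..., x(i + (m - 1) * tau)].
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested embedding")
    return [[x[i + j * tau] for j in range(m)] for i in range(n_vectors)]


series = [0, 1, 2, 3, 4, 5, 6]
vectors = delay_embed(series, m=3, tau=2)
print(vectors)  # [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
```

Each row is one state point of the reconstructed trajectory; a larger m or tau shortens the trajectory because fewer complete delay vectors fit into the series.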
2) Calculate the Euclidean distances between X_k and its neighboring vectors. The weighted coefficient of each neighboring state vector X_ki is then computed from these distances, where d_min is the minimum distance and the regularization parameter is usually set to 1.
3) Estimate the future state X_{k+1} via the linear prediction model.
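The steps above can be sketched as follows. Because the equations are not reproduced in this text, the sketch assumes the form commonly used for weighted first-order local prediction: exponential weights exp(-lam * (d_i - d_min)) normalized to sum to one, and a scalar linear model X_{k+1} = a + b X_k fitted by weighted least squares over all components of the neighboring vectors. The names `lwlpa_weights`, `weighted_line_fit`, `predict_next` and the parameter `lam` are hypothetical.

```python
import math


def lwlpa_weights(distances, lam=1.0):
    """Normalized exponential weights; the nearest neighbor gets the largest weight."""
    d_min = min(distances)
    raw = [math.exp(-lam * (d - d_min)) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_line_fit(pairs, weights):
    """Fit following = a + b * current over all components by weighted least squares."""
    sw = su = sv = suu = suv = 0.0
    for (current, following), w in zip(pairs, weights):
        for u, v in zip(current, following):
            sw += w
            su += w * u
            sv += w * v
            suu += w * u * u
            suv += w * u * v
    denom = sw * suu - su * su
    if abs(denom) < 1e-12:
        # Degenerate case: all inputs equal, fall back to the weighted mean.
        return sv / sw, 0.0
    b = (sw * suv - su * sv) / denom
    a = (sv - b * su) / sw
    return a, b


def predict_next(vectors, k, n_neighbors=3, lam=1.0):
    """Predict the successor of vectors[k] from its nearest earlier neighbors."""
    if k < 1:
        raise ValueError("need at least one earlier vector with a known successor")
    center = vectors[k]
    ranked = sorted((math.dist(center, vectors[i]), i) for i in range(k))
    nearest = ranked[:n_neighbors]
    weights = lwlpa_weights([d for d, _ in nearest], lam)
    pairs = [(vectors[i], vectors[i + 1]) for _, i in nearest]
    a, b = weighted_line_fit(pairs, weights)
    return [a + b * c for c in center]


# Delay vectors of a linear ramp: every successor equals the current vector plus 1.
ramp = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
prediction = predict_next(ramp, k=4)
print(prediction)  # close to [5.0, 6.0]
```

On the ramp the fit recovers a = 1 and b = 1 up to rounding, so the prediction continues the ramp; on real data the weights make the nearest neighbors dominate the fit.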

The Novel Fusion Data Algorithm
As the core of this paper, the basic idea of the NFDA algorithm is introduced in subsection 3.1. The significance of the weighted coefficients in NFDA and how to obtain appropriate values for them are discussed in subsection 3.2.

Basic Idea of Novel Fusion Data Algorithm
How to significantly improve the efficiency of ECG quality assessment algorithms is a practical issue. Solving this problem becomes easier if the pending 12-lead ECG signals are effectively compressed, and the LWLPA algorithm can meet this requirement.
To illustrate the basic idea of our algorithm, consider an example. According to the phase space reconstruction theorem, consider the two reconstructed phase trajectories L_1 and L_2 shown in Fig. 1. Suppose that the trajectory L_F is the fused result of the trajectories L_1 and L_2, and that the state point X_F on the trajectory L_F satisfies the linear prediction model. How to obtain the parameters a and b in the linear prediction model is then a critical problem. From the viewpoint of LWLPA, the vectors X_1(p) and X_2(p) in Fig. 1 can be considered as the neighboring vectors of the current state X_F(p). With these two vectors, the parameters a and b can be calculated by (5), in which each state point is assigned a weighted coefficient that reflects the degree of impact of that state point on the fusion result. The fused state can then be calculated.
The fused trajectory L_F will be employed for assessing the quality of the original signal. This implies that, to some extent, the characteristic information of the original signal ought to be fused into the trajectory L_F. How to effectively pass this characteristic information on to the fused result is the key to NFDA. To solve this problem, the weighted coefficients of the state points on each evolutionary trajectory should be appropriately estimated.
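One possible reading of this fusion step is sketched below. It assumes that a and b are fitted by weighted least squares from the pairs (X_s(p), X_s(p+1)) of each trajectory s, with one weighted coefficient per trajectory, and that the next fused state is a + b X_F(p). The function name `fuse_step` and this exact formulation are assumptions, since the fusion equations are not reproduced in this text.

```python
def fuse_step(fused_state, states, next_states, weights):
    """Advance the fused state by one step.

    states[s] and next_states[s] are X_s(p) and X_s(p+1) of trajectory s;
    weights[s] is the weighted coefficient assigned to that trajectory.
    """
    sw = su = sv = suu = suv = 0.0
    for current, following, w in zip(states, next_states, weights):
        for u, v in zip(current, following):
            sw += w
            su += w * u
            sv += w * v
            suu += w * u * u
            suv += w * u * v
    denom = sw * suu - su * su
    if abs(denom) < 1e-12:
        # Degenerate fit: fall back to the weighted mean of the successors.
        a, b = sv / sw, 0.0
    else:
        b = (sw * suv - su * sv) / denom
        a = (sv - b * su) / sw
    return [a + b * c for c in fused_state]


# Two toy trajectories that both follow X(p+1) = 1 + 0.5 * X(p).
x_p = [[0.0, 2.0], [4.0, 6.0]]
x_next = [[1.0, 2.0], [3.0, 4.0]]
fused_next = fuse_step([2.0, 4.0], x_p, x_next, weights=[0.7, 0.3])
print(fused_next)  # close to [2.0, 3.0]
```

When the trajectories disagree, the trajectory with the larger weighted coefficient pulls the fitted a and b, and hence the fused state, toward its own dynamics.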

Weighted Coefficient Design for NFDA
The motivation of weighted coefficients design originated from the idea that, in
As an important application of fuzzy logic and fuzzy set theory [25], the fuzzy inference system (FIS) has been successfully applied in decision support tools and other fields. An FIS is useful for dealing with linguistic concepts and can realize nonlinear mappings between inputs and outputs. Thus, an FIS is well suited to estimating the weighted coefficient of a data point.
In this subsection, two simple fuzzy inference systems are devised: FIS_d, which is driven by distance, and a cosine-based FIS, which is driven by the cosine of an angle. FIS_d is composed of nine fuzzy rules. Its input variables are the distance D and the change rate D_r of the distance, and its output variable is O_d.
The two parameters D and D_r are calculated at each step, where D(p) and D_r(p) are the Euclidean distance and the change rate of X(p) at the p-step, respectively. FIS_d is applied to estimate the weighted coefficient.
Similarly, the cosine-based FIS consists of two inputs, one output and fifteen fuzzy rules. Its two input variables are the cosine of the angle between the vector V(p) and its neighboring vector, and the change rate of that cosine. In this study, the universe of the cosine input is set to [-1,1] and the universes of the other variables of the two FISs are uniformly set to [0,1]. The universes of all variables of the two FISs are divided into several fuzzy sets, shown in Fig. 2. According to the aforementioned relationship, the rules of the two FISs are devised, which are summarized in Table 1 and Table 2, respectively.
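The inputs of the two FISs can be illustrated with a short sketch. The defining equations are not reproduced in this text, so several choices here are assumptions: the distance is taken between a lead's state and a hypothetical reference state at the same step, the vectors V(p) are taken to be displacements X(p+1) - X(p), and the change rate is taken to be the absolute first difference. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / norm


def change_rate(values):
    """Absolute first difference of a sequence (assumed definition)."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def distances(trajectory, reference):
    """Euclidean distance between corresponding states of two trajectories."""
    return [math.dist(x, r) for x, r in zip(trajectory, reference)]


def turning_cosines(trajectory):
    """Cosine between consecutive displacement vectors of a trajectory."""
    steps = [[b - a for a, b in zip(x0, x1)]
             for x0, x1 in zip(trajectory, trajectory[1:])]
    return [cosine(v0, v1) for v0, v1 in zip(steps, steps[1:])]


staircase = [[0, 0], [1, 0], [1, 1], [2, 1]]
d = distances([[0, 0], [3, 4]], [[0, 0], [0, 0]])
print(d)                           # [0.0, 5.0]
print(change_rate(d))              # [5.0]
print(turning_cosines(staircase))  # [0.0, 0.0]
```

A cosine near 1 means the trajectory keeps its direction, while values near 0 or below indicate sharp turns of the kind that noise tends to produce. Any normalization of the inputs to the stated universes is left out of the sketch.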
With the FIS employed, the weighted coefficient is calculated from the rule outputs, where h is the number of rules of the FIS, y(q) is the output of the qth rule and each rule contributes according to its degree of activation. Based on FIS_d and the cosine-based FIS, the two corresponding parameters can be obtained by (13), and the weighted coefficient of the state X_l(p) at the p-step can then be computed by (14), (15).
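The rule-based computation described above can be sketched with a small two-input FIS. The sketch assumes weighted-average defuzzification, i.e. the output is the sum of y(q) times the activation of rule q, divided by the sum of the activations, over the h rules. The triangular fuzzy sets, the min operator for combining the two inputs, and the nine-rule table are hypothetical placeholders; they are not the sets of Fig. 2 or the rules of Table 1.

```python
def tri(x, left, peak, right):
    """Triangular membership; acts as a shoulder when peak equals left or right."""
    if x <= left:
        return 1.0 if left == peak else 0.0
    if x >= right:
        return 1.0 if peak == right else 0.0
    if x < peak:
        return (x - left) / (peak - left)
    if x > peak:
        return (right - x) / (right - peak)
    return 1.0


# Hypothetical fuzzy sets on the universe [0, 1].
SETS = {"low": (0.0, 0.0, 0.5), "mid": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
LEVEL = {"low": 0, "mid": 1, "high": 2}

# Hypothetical nine-rule table: small distance and small change rate give a large output.
RULES = [((sd, sr), 1.0 - (LEVEL[sd] + LEVEL[sr]) / 4.0)
         for sd in SETS for sr in SETS]


def infer(d, d_rate):
    """Weighted average of rule outputs y(q), weighted by rule activation."""
    num = den = 0.0
    for (sd, sr), y in RULES:
        activation = min(tri(d, *SETS[sd]), tri(d_rate, *SETS[sr]))
        num += activation * y
        den += activation
    return num / den if den > 0.0 else 0.0


print(infer(0.0, 0.0))  # 1.0
print(infer(0.5, 0.5))  # 0.5
print(infer(1.0, 1.0))  # 0.0
```

Between these anchor points the output varies smoothly, which is the nonlinear input-output mapping that makes an FIS convenient for turning distance and direction cues into a weighted coefficient.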
3) Compute the linear fitting parameters a and b by the least squares equation.
In this section, the NFDA algorithm for 12-lead ECG signals has been introduced. In the algorithm, the linear prediction equation is used to compute the fused state. Using the quality characteristics of the original signal, the weighted coefficient of each reconstructed trajectory is estimated through the FIS. In the next section, the algorithm will be applied to 12-lead ECG signals and its performance will be further illustrated.

Application of NFDA in 12-lead ECG Signals
The NFDA algorithm is proposed in order to estimate the quality of 12-lead ECG signals comprehensively and to further improve the computational efficiency.
In this section, synthetic ECG signals and realistic ECG signals are used in the experiments to assess the performance of NFDA. In subsection 4.1, we evaluate the validity of NFDA on synthetic ECG signals. Then, using three types of noise from the MIT-BIH Noise Stress Test Database (NSTDB) [26,27], the noise tolerance of the algorithm is analyzed in detail. In subsection 4.2, NFDA is executed on the database of the PhysioNet/Computing in Cardiology Challenge 2011 [5] to further illustrate the performance of the algorithm. It is worth mentioning that, in this study, the False Nearest Neighbors (FNN) algorithm and the Average Displacement (AD) algorithm are adopted to determine the optimal embedding dimension m_s and the delay time of the sth lead ECG signal. The two methods help to guarantee the objectivity and accuracy of the experiments.

Synthetic Signals Experiments
As realistic ECG signals are recorded in a clinical environment, they are inevitably contaminated by noise and artifacts of different magnitudes. For this reason, synthetic ECG signals have been widely applied in estimating the performance of algorithms. In [28], [29] and Clifford et al. [30], an improved dynamical model was developed that can generate 12-lead synthetic ECG signals. In the experiment, ideal VCG signals are obtained via the model, as shown in Fig. 3, and are employed for testing the performance of NFDA.
Realistic ECG signals may be contaminated with different types of noise, e.g., baseline wander (BW), electrode movement (EM) and muscle artifact (MA), which cannot be easily removed by simple filter algorithms. Hence, the trajectory fusion problem of noisy VCG signals is discussed. To ensure the objectivity of the experiment, realistic noises are adopted from NSTDB; the three types of realistic noise, BW, EM and MA, are shown in Fig. 5. These noises are added to clean synthetic VCG signals at different signal-to-noise ratios (SNR).
12-lead ECG signals can be transformed to 3-lead VCG signals via a linear transformation. This means that the signal quality of the 12-lead ECG is inherited to some degree: if one lead of the 12-lead ECG signals is contaminated by noise, the quality characteristics of that lead will also be reflected in the VCG signals. In the experiment, the lead V_x of the VCG signals is randomly chosen to be contaminated by the noise. The SNR levels are summarized in Table 3 [12]. Here the lead V_x is polluted by BW at several SNR magnitudes, including 12 dB and 6 dB. The clean V_x signal is likewise polluted by EM and MA noise with different magnitudes of SNR, and the reconstructed trajectories are shown in Fig. 7 and Fig. 8, respectively.
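Contaminating a clean lead at a prescribed SNR can be sketched as follows. The sketch assumes the power-based definition SNR = 10 log10(P_signal / P_noise), with power taken as the mean of squared samples; other conventions exist, and the text does not state which definition was used in the experiments. The names `power` and `add_noise_at_snr` are hypothetical, and the toy signal and noise stand in for a synthetic V_x lead and an NSTDB noise record.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(signal, noise, snr_db):
    """Scale noise so that the signal-to-noise power ratio equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(power(signal) / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]


# Toy stand-ins for a clean lead and a noise record.
clean = [math.sin(2.0 * math.pi * i / 50.0) for i in range(200)]
noise = [0.3 * (-1) ** i + 0.1 for i in range(200)]
noisy = add_noise_at_snr(clean, noise, snr_db=6.0)

residual = [n - c for n, c in zip(noisy, clean)]
achieved_db = 10.0 * math.log10(power(clean) / power(residual))
print(round(achieved_db, 6))  # 6.0
```

Lower SNR values correspond to a larger noise gain, so the same noise record can be reused to produce every contamination level of an experiment.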
A comprehensive analysis of the experimental results under the different conditions suggests that, with the NFDA algorithm, the reconstructed trajectories of the fused results can effectively describe the quality characteristics of noisy synthetic ECG signals. In order to test the performance of our method adequately, realistic ECG signals are used in subsection 4.2.

Realistic Signals Experiments
As an important database, the PhysioNet/Computing in Cardiology Challenge 2011 data set has been widely used for testing ECG quality assessment algorithms. In the database, each standard 12-lead ECG signal is sampled at 500 Hz and recorded for 10 seconds. There are 1000 12-lead ECG records employed as the training set (Set A), and the signal quality was quantified by a group of annotators with professional experience in ECG.
It is important to note that the length of each record in Set A is 10 seconds. In a clinical setting, ECG signals might be contaminated by noise only during certain time periods. In other words, for a long-term ECG signal, sometimes only parts of the signal are of unacceptable quality while the remainder is acceptable. Therefore, a long-term ECG signal needs to be divided into a number of short-term ECG signals whose quality can be estimated individually.
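Dividing a long-term recording into short-term signals can be sketched as below. The defaults mirror the 500 Hz sampling rate and 10-second record length mentioned above; the choice of non-overlapping windows and the dropping of a trailing partial window are assumptions, and `split_into_segments` is a hypothetical name.

```python
def split_into_segments(samples, fs=500, seconds=10):
    """Split a recording into consecutive, non-overlapping fixed-length segments."""
    length = int(fs * seconds)
    if length < 1:
        raise ValueError("segment length must be at least one sample")
    last_start = len(samples) - length
    return [samples[i:i + length] for i in range(0, last_start + 1, length)]


recording = list(range(12000))  # 24 seconds at 500 Hz
segments = split_into_segments(recording)
print(len(segments), len(segments[0]), segments[1][0])  # 2 5000 5000
```

Each segment can then be assessed on its own, so an unacceptable stretch does not condemn the whole recording.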
In order to analyze the ECG signal quality, the realistic ECG signals need to be pre-processed before quality assessment. Firstly, each lead of the 12-lead ECG signals is examined for constant segments. If a constant signal is found in any lead, the record need not be processed further and is identified as unacceptable. Otherwise, the 12-lead ECG signals are transformed into VCG signals by the inverse Dower transformation matrix D_inv [31].
In this subsection, NFDA is evaluated on realistic 12-lead ECG signals. Experimental results indicate that the fused trajectory can effectively inherit the quality characteristics of 12-lead ECG signals.
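The pre-processing described in this subsection (a constant-signal check followed by a linear lead transformation) can be sketched as follows. The minimum run length `min_run` and the tolerance `tol` are hypothetical parameters, since the text does not say how long a constant stretch must be. The transformation matrix is passed in as an argument: the coefficients of the inverse Dower matrix of [31] are not reproduced here, and the toy matrix in the usage example is only a stand-in.

```python
def has_constant_run(lead, min_run, tol=0.0):
    """True if the lead contains min_run consecutive samples that do not change."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    run = 1
    for prev, cur in zip(lead, lead[1:]):
        run = run + 1 if abs(cur - prev) <= tol else 1
        if run >= min_run:
            return True
    return False


def apply_lead_transform(matrix, leads):
    """Multiply a (rows x n_leads) matrix with leads given as n_leads sample lists."""
    n_samples = len(leads[0])
    return [[sum(row[j] * leads[j][t] for j in range(len(leads)))
             for t in range(n_samples)]
            for row in matrix]


def preprocess(leads, matrix, min_run):
    """Return the transformed signals, or None if any lead contains a constant run."""
    if any(has_constant_run(lead, min_run) for lead in leads):
        return None  # record is flagged as unacceptable
    return apply_lead_transform(matrix, leads)


toy_matrix = [[1, 1], [1, -1]]  # stand-in, not the inverse Dower matrix
good = preprocess([[1, 2, 3], [0, 1, 0]], toy_matrix, min_run=3)
bad = preprocess([[5, 5, 5], [0, 1, 0]], toy_matrix, min_run=3)
print(good)  # [[1, 3, 3], [1, 1, 3]]
print(bad)   # None
```

With a real transformation matrix, the order of the input leads must match the column order of that matrix.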

Conclusion
In this paper, the NFDA algorithm is proposed, which utilizes the idea of the LWLPA algorithm to fuse 12-lead ECG signals. In addition, two fuzzy inference systems are designed so that the characteristics of the original signals are effectively inherited. Synthetic ECG signals, noisy synthetic ECG signals and realistic ECG signals are employed to test the validity of the algorithm. Owing to the limited length of the paper, two 12-lead ECG signals are randomly selected from the 773 records of acceptable quality in Set A of the PhysioNet/Computing in Cardiology Challenge 2011. Analogously, two 12-lead ECG signals are randomly selected from the records tagged as unacceptable quality. Analysis of the remaining data in Set A shows that the quality characteristics of ECG signals are clearly exhibited by the reconstructed trajectories of the fused signals. The experimental results indicate that the NFDA algorithm can effectively compress the 12-lead ECG signals and fuse the quality characteristics of the original signals well.
Many problems remain to be solved. Although the quality characteristics of the fused signal can be observed easily, obtaining quantified characteristic parameters is a crucial problem if the fused signal needs to be analyzed further for ECG quality estimation. The recurrence quantification analysis (RQA) method is particularly suited to handling biological signals. Hence, RQA will be employed to quantitatively extract the quality characteristics of the fused signal in further research. Additionally, how to design an optimized FIS also needs to be addressed in future work.