The input of the proposed method is a two-channel ECG segment obtained after a preliminary segmentation step that isolates the heartbeats of each record. Given the R-peak positions, each heartbeat is isolated by retaining a samples to the left and b samples to the right of the R-peak. For each of the two leads, a vector x^h (h = 1, 2) is built with the values of the ECG (in Volts); its size is determined by a and b. The values of (a, b) are (160, 200) and (120, 136) for MIT-BIH and INCART, respectively. These values are given in Table 3 and are similar to those used in [20].
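To make the retention rule concrete, here is a minimal sketch of the heartbeat isolation step. The helper name, the boundary handling, and the pairing of (a, b) = (160, 200) are our assumptions for illustration, not part of the paper:

```python
import numpy as np

def segment_heartbeats(lead, r_peaks, a, b):
    """Isolate one heartbeat per R-peak by retaining `a` samples to the
    left and `b` samples to the right of the peak. Hypothetical helper:
    the paper only describes the retention rule, not this interface."""
    beats = []
    for r in r_peaks:
        if r - a >= 0 and r + b + 1 <= len(lead):  # skip beats truncated by record edges
            beats.append(lead[r - a : r + b + 1])
    return np.stack(beats)

# Toy example with MIT-BIH-style values a = 160, b = 200:
lead = np.random.randn(1000)
beats = segment_heartbeats(lead, r_peaks=[200, 500, 790], a=160, b=200)
# each isolated beat has a + b + 1 = 361 samples
```

Each row of `beats` is then processed independently as one heartbeat of one lead.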
The architecture of the whole process is shown in Figure 1a,b. The first stage is feature extraction. The input of the process is, for each lead, a vector x^h containing the raw values of the segmented heartbeat. Each lead x^h is normalized (see the "Normalization" module in Figure 1a) by a rescaling procedure, so that the resulting vector has an intensity that ranges from 0 to 1, as per the equation:

x̃_i^h = (x_i^h − min(x^h)) / (max(x^h) − min(x^h)).
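The rescaling is a standard min-max normalization; a minimal sketch follows (the guard for a constant segment is our assumption, as the paper does not specify this degenerate case):

```python
import numpy as np

def rescale_01(x):
    """Min-max rescaling of a segmented heartbeat to [0, 1].
    The constant-segment guard is an assumption, not from the paper."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x, dtype=float)
    return (x - lo) / (hi - lo)

x = np.array([-0.4, 0.0, 1.2, 0.8])   # raw lead values (Volts)
x_tilde = rescale_01(x)               # -> [0.0, 0.25, 1.0, 0.75]
```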
At the same time, hand-crafted features are extracted from each lead x^h: frequency-domain features and autoregression features. A single time-domain feature vector is also computed from both leads. The frequency-domain features, the autoregression features, and the normalized segmented heartbeat are concatenated to obtain the input vector of the neural module (see Figure 1a). This vector is processed by the neural module to produce a single output vector for the two leads (see Figure 1b). The output vector is then reduced in dimension by using Principal Component Analysis (PCA). The concatenation of the time-domain features and of the PCA-reduced vector is the input of a Support Vector Machine (SVM) classifier, whose output is the predicted heartbeat class. In the following subsections, the feature extraction and the neural module are discussed in more detail.
3.2. Neural Feature Extraction
To exploit the correlation between the two ECG leads, we use a one-dimensional variant of the canonical correlation analysis network (CCANet). First introduced in the field of image recognition by Yang et al. [33], CCANet has been employed in two-view image recognition tasks. Recently, CCANets, which are intrinsically two-dimensional, have been successfully employed in the signal processing field for the classification of two- and three-lead heartbeats [20]. A CCANet is usually composed of two cascaded convolutional layers and an output layer: (1) in the convolutional layers, the CCA technique is used to extract dual-lead filter banks; (2) in the output layer, the features extracted from the second convolutional layer are mapped into the final feature vector [20].
In this paper, with the aim of increasing performance, we design a new 1-D canonical correlation analysis network composed of four 1-D convolutional layers and an output layer. Contrary to CCANet, the filters are found by combining CCA with a singular value decomposition (SVD), and features are extracted after each layer. The use of 1-D convolutions instead of 2-D limits the computational cost, which allows the number of layers to be increased from two to four and, consequently, the performance to be improved.
The processing pipeline is shown in Figure 3. The input of the proposed 1-D CCANet-SVD is, for each lead, the concatenation of the autoregressive features, the spectrogram features, and the original normalized heartbeat, resulting in a vector of length m. The 1-D CCANet-SVD is trained with N two-lead heartbeats and then used as a neural feature extractor in combination with a linear SVM for heartbeat classification. The network is trained separately for the MIT-BIH and INCART databases.
3.2.1. First Convolutional Layer
We denote as x_i the i-th element (i = 1, …, m) of an input vector x. We select a series of segments of size k centered on each value x_i, obtaining the m segments s_1, …, s_m. The latter are then zero-centered and concatenated to build the matrix of segments S = [s_1, …, s_m] of size k × m. This procedure is performed on each of the N training heartbeats, and the resulting matrices of segments are finally concatenated to obtain, for each lead h, a matrix X^h of size k × Nm. Note that our network is simultaneously fed with all the training heartbeats in order to build the two matrices X^1 and X^2.
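The segment-extraction and zero-centering step can be sketched as follows. Border zero-padding (so that every sample gets a full segment) and an odd segment size are our assumptions:

```python
import numpy as np

def segment_matrix(x, k):
    """Build the k x m matrix of zero-centered segments of size k, one
    segment centered on each sample of x (odd k assumed). Zero-padding
    at the borders is an assumption, not specified by the paper."""
    m, half = len(x), k // 2
    xp = np.pad(x, half)                                   # zero-pad both ends
    S = np.stack([xp[i:i + k] for i in range(m)], axis=1)  # one column per segment
    return S - S.mean(axis=0, keepdims=True)               # zero-center each segment

X = segment_matrix(np.arange(10.0), k=5)
# X.shape == (5, 10) and every column has zero mean
```

Concatenating such matrices over all N training heartbeats gives the k × Nm matrices X^1 and X^2 used for filter extraction.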
Let us address the filter extraction stage. In [20], the filters are found with a CCA, i.e., by maximizing the correlation between pairs of projected variables. The first projection direction can be obtained by optimizing Equation (4):

(w^1_1, w^2_1) = argmax_{w^1, w^2} (w^1)^T C_{12} w^2,    (4)

with the constraints (w^1)^T C_{11} w^1 = 1 and (w^2)^T C_{22} w^2 = 1, where C_{gh} = X^g (X^h)^T, and w^1_1 and w^2_1 are the first canonical vectors for each of the two leads. The Lagrange multiplier technique shows that w^1_1 and w^2_1 are eigenvectors of M_1 = C_{11}^{-1} C_{12} C_{22}^{-1} C_{21} and M_2 = C_{22}^{-1} C_{21} C_{11}^{-1} C_{12}, respectively. Given the first l − 1 directions, the l-th projection direction can be calculated by solving problem (4) with the additional constraints (w^1)^T C_{11} w^1_j = 0 and (w^2)^T C_{22} w^2_j = 0, for j = 1, …, l − 1. In the end, the L_1 filters for the first lead (L_1 being the number of filters of the first layer) are built by taking the L_1 primary eigenvectors of M_1 (i.e., those associated with the L_1 biggest eigenvalues), whereas the L_1 filters for the second lead are built by considering the L_1 primary eigenvectors of M_2.
In this paper, we use a slightly different approach, referred to as the CCA-SVD filter extraction technique. We perform an SVD of both CCA matrices M_1 = C_{11}^{-1} C_{12} C_{22}^{-1} C_{21} and M_2 = C_{22}^{-1} C_{21} C_{11}^{-1} C_{12} (where C_{gh} = X^g (X^h)^T), as per M_1 = U_1 D_1 V_1^T and M_2 = U_2 D_2 V_2^T, where the U and V matrices are unitary, and the D matrices are diagonal with the singular values on their diagonals. Using an SVD allows us to retrieve the directions that explain most of the variance of M_1 and M_2. Since these two matrices derive from the CCA, they capture the correlations between the two leads. Therefore, we use the directions found by performing an SVD on them to best explain the correlation between the two leads. Consequently, the L_1 filters for the first lead are built by taking the columns of U_1 that are associated with the biggest singular values of M_1, whereas the L_1 filters for the second lead are built by considering the columns of U_2 that are associated with the biggest singular values of M_2. Such an approach yields better results than the traditional CCA filter extraction technique (see Experiments). We denote as W^1_l and W^2_l, l = 1, …, L_1, the filters of size k corresponding to the first and second lead, respectively.
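Under the usual covariance-product formulation of CCA, the CCA-SVD extraction can be sketched as below. The ridge term added before inversion is our assumption for numerical stability and is not part of the paper:

```python
import numpy as np

def cca_svd_filters(X1, X2, L, eps=1e-6):
    """CCA-SVD filter extraction (sketch). X1, X2: k x (N*m) segment
    matrices for the two leads. Returns L filters per lead, taken as
    the leading left singular vectors of the CCA matrices M1 and M2.
    The ridge `eps` is an assumption to keep the inversions stable."""
    C11, C22, C12 = X1 @ X1.T, X2 @ X2.T, X1 @ X2.T
    k = C11.shape[0]
    inv11 = np.linalg.inv(C11 + eps * np.eye(k))
    inv22 = np.linalg.inv(C22 + eps * np.eye(k))
    M1 = inv11 @ C12 @ inv22 @ C12.T
    M2 = inv22 @ C12.T @ inv11 @ C12
    U1, _, _ = np.linalg.svd(M1)   # columns ordered by decreasing singular value
    U2, _, _ = np.linalg.svd(M2)
    return U1[:, :L].T, U2[:, :L].T   # L filters of size k per lead

rng = np.random.default_rng(0)
W1, W2 = cca_svd_filters(rng.standard_normal((7, 300)),
                         rng.standard_normal((7, 300)), L=4)
# W1 and W2 each hold 4 filters of size 7
```

Replacing `np.linalg.svd(M1)` with an eigendecomposition of M1 recovers the traditional CCA filter extraction of [20].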
As for the convolutions, for each lead h, each input signal yields L_1 outputs y^h_l, l = 1, …, L_1, obtained by convolving the signal with the filters W^h_l. The lengths of the input and output signals are kept identical thanks to a zero-padding of the input.
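This length-preserving, zero-padded convolution corresponds to NumPy's "same" mode:

```python
import numpy as np

# One lead signal of length m filtered with a size-k filter; the output
# keeps length m thanks to implicit zero-padding ("same" convolution).
rng = np.random.default_rng(0)
x = rng.standard_normal(361)      # a normalized lead (toy length m = 361)
w = rng.standard_normal(11)       # one extracted filter, k = 11
y = np.convolve(x, w, mode="same")
assert y.shape == x.shape         # length preserved by zero-padding
```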
3.2.2. First Extraction Stage
The extraction stage follows the same steps as in [20]. First, for each heartbeat and each lead, the outputs of the first convolution are converted to a decimal one-dimensional signal as per

T = Σ_{l=1}^{L_1} 2^{l−1} H(y^h_l),

where H is the Heaviside step function. Therefore, the range of each component of T is [0, 2^{L_1} − 1]. T is then divided into B blocks. Each block can overlap with its neighbor, according to an overlapping proportion parameter. For each of these blocks, a histogram with 2^{L_1} bins is built. The values of the resulting histogram for each block are embedded in a 2^{L_1}-long vector, and the vectors provided by each block are then concatenated. The first feature vector for the heartbeat, f^1, is obtained by concatenating the histogram vectors of the two leads.
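A compact sketch of this binarize-and-histogram step for one lead follows. The block/overlap bookkeeping and treating H(0) as 0 are our assumptions:

```python
import numpy as np

def binary_hash_histograms(Y, block, o):
    """Convert L filter outputs (rows of Y, each of length m) into one
    decimal signal via the Heaviside step + binary weights, then build
    histograms over (possibly overlapping) blocks. Sketch only: the
    paper's exact block bookkeeping may differ.
    Y: L x m array; block: block size; o: overlap proportion in [0, 1)."""
    L, m = Y.shape
    T = (Y > 0).astype(int).T @ (2 ** np.arange(L))    # values in [0, 2^L - 1]
    step = max(1, int(round(block * (1 - o))))         # stride between blocks
    hists = [np.histogram(T[s:s + block], bins=2 ** L, range=(0, 2 ** L))[0]
             for s in range(0, m - block + 1, step)]
    return np.concatenate(hists)                       # B blocks -> B * 2^L values

rng = np.random.default_rng(0)
f1_lead = binary_hash_histograms(rng.standard_normal((3, 40)), block=10, o=0.5)
# 7 half-overlapping blocks of 10 samples, 2^3 = 8 bins each -> 56 values
```

Concatenating the outputs of this routine for the two leads gives the first feature vector of the heartbeat.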
3.2.3. Second Convolution Layer and Extraction Stage
The second layer is identical to the previous one, except for the fact that its input is different. Indeed, before the first convolution, each lead of a heartbeat was represented by a single vector of length m. After the first convolution, each lead is represented by L_1 vectors of length m (one per filter of the first layer). We now walk through the second layer with the notation used so far.
The vectors y^h_l, l = 1, …, L_1, produced by the first convolutional layer are the input of the second layer. Since we initially considered N training heartbeats, this layer has a total of N L_1 input vectors corresponding to lead 1 and N L_1 input vectors corresponding to lead 2. The same segmentation and zero-centering process as in the first layer gives Y^1 and Y^2, the matrices of the concatenated segments of all the input vectors, for each lead.
Applying the CCA-SVD filter extraction technique to Y^1 and Y^2 leads us to perform the SVD of the corresponding CCA matrices for the first and second lead, respectively. The filters are then found exactly as in the first convolutional layer, and we denote as V^1_l and V^2_l, l = 1, …, L_2, the L_2 filters of size k extracted for the first and second lead, respectively.
As for the convolutions, for each initial lead h and channel l, the signal y^h_l yields L_2 outputs z^h_{l,l′} = y^h_l * V^h_{l′}, l′ = 1, …, L_2. At this stage, each initial lead of a heartbeat is represented by L_1 L_2 vectors of size m.
The second extraction step is the same as after the first convolutional layer, except for a few points. First, for each heartbeat and each lead, the output of the second convolutional layer is converted to L_1 decimal signals as per T^h_l = Σ_{l′=1}^{L_2} 2^{l′−1} H(z^h_{l,l′}), l = 1, …, L_1. The second feature vector for the heartbeat, f^2, is obtained by concatenating the histogram vectors computed from these decimal signals. The histograms are built with a block size and an overlapping parameter equal to b_2 and o_2, respectively.
The third and fourth convolutional layers are built similarly. f^3 and f^4 refer to the third and fourth feature vectors extracted for a heartbeat after each of these layers. We denote as L_3 and L_4 the numbers of filters for the third and fourth layers, respectively, as b_3 and b_4 the block sizes used for the construction of f^3 and f^4 after the third and fourth convolutional layers, respectively, and as o_3 and o_4 the overlapping parameters for the last two layers.
3.2.4. Final Output and PCA
For a given heartbeat, the final output of the network is obtained by concatenating the four feature vectors, as per f = [f^1, f^2, f^3, f^4]. Given the significant size of this final feature vector, a PCA is carried out to reduce its dimensionality; the number of components is chosen such that the explained variance is over 99.99%. The final feature vector is obtained by concatenating the PCA-reduced vector with the vector of time-domain features corresponding to the heartbeat; it is a vector of size 1382 or 3020 for INCART or MIT-BIH heartbeats, respectively.
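The variance-thresholded PCA can be sketched with a plain SVD; the centering and the exact component-selection rule below are standard choices, stated here as our assumptions:

```python
import numpy as np

def pca_9999(F):
    """Reduce the rows of F (one feature vector per heartbeat) with PCA,
    keeping the smallest number of components whose cumulative explained
    variance exceeds 99.99% (a sketch of the reduction step)."""
    Fc = F - F.mean(axis=0)                     # center the features
    _, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative explained variance
    n = int(np.searchsorted(ratio, 0.9999) + 1) # first n reaching 99.99%
    return Fc @ Vt[:n].T                        # projected features

rng = np.random.default_rng(1)
F = rng.standard_normal((50, 20)) @ rng.standard_normal((20, 200))  # rank-20 data
P = pca_9999(F)
# at most 20 components are kept, one row per heartbeat
```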
The classification step is performed by a linear SVM with a regularization parameter C.