Data Independent Acquisition Based Bi-Directional Deep Networks for Biometric ECG Authentication

In this report, non-fiducial approaches for electrocardiogram (ECG) biometric authentication are examined, and several techniques are proposed and compared experimentally to identify the best approach for all classification tasks. Non-fiducial methods are designed to extract the discriminative information of a signal without annotating fiducial points. However, this process still requires peak detection to identify a heartbeat. Recent studies usually rely on heartbeat segmentation, which requires QRS detection, and this process can be complicated for ECG signals in which the QRS complex is absent. Thus, many studies conduct biometric authentication only on ECG signals with QRS complexes and are hindered by this limitation. To overcome this issue, we propose a data-independent acquisition method that enables highly generalizable signal processing and feature learning. This is achieved by enhancing random segmentation to avoid complicated fiducial feature extraction, together with auto-correlation to eliminate the phase difference caused by random segmentation. Subsequently, a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) cells is used to automatically learn the features of the signal and to perform the authentication task. The experimental results suggest that the proposed data-independent approach using a BLSTM network achieves relatively high classification accuracy on every dataset compared with the other techniques. Moreover, it achieved significantly higher accuracy in experiments using ECG signals without the QRS complex. The results also revealed that data-dependent methods perform well only for specific data types and data variations, whereas the presented approach could be generalized to other quasi-periodic biometric signal-based classification tasks in future studies.


Introduction
At present, we are experiencing increasing digitization in most aspects of our lives. Every day, we use online applications and services such as mobile banking, social networking, online stock exchange and trading, and email, with little apprehension about storing confidential personal information on our devices or on client servers. Sadly, the digital era also brings new attacks and exploits, as well as unauthorized access to sensitive information and devices by malicious viruses or hosts. It is remarkable that a large population of users still relies on various passwords, which have been used to authorize access since the earliest era of computing. In recent years, attention has shifted towards biometric security systems. These security applications identify an individual by their distinct biological characteristics instead of a numerical or alphabetical password. The most widespread techniques use fingerprint, iris, and facial recognition, and are normally found in smart devices [1]. The benefits of biometrics in mobile devices are apparent. Relying, at least in part, on a body part (e.g., a finger) that only the user possesses adds a degree of security; for instance, combining a standard password with a personal fingerprint enhances security. Beyond smartphone security, features such as touch IDs improve the convenience of interacting with a device: instead of manually entering their payment information, users simply swipe a finger across their device, which is simple and saves time. Biometric authentication also improves the security of information, processes, and establishments. Some organizations have implemented biometric scanning as a modern method of "punching in" to work. This ensures that employees are honest about the hours they have worked, which in turn saves the organization money.
However, there are still difficulties related to the usability and reliability of fingerprints. Current challenges in ECG biometric classification tasks [2][3][4] include extracting features from ECG signals so that a model can learn hidden patterns and generalize accurately, proving the stability of the biometric, and protecting against attacks. In this report, we propose using ECG signals as a biometric for human authentication with non-fiducial techniques.
The ECG signal usually consists of three components: the P wave, the QRS complex, and the T wave. The peak points of these components are known as fiducial points. Time-domain characteristics derived from them, such as amplitudes and intervals, are generally used as features of the individual signal. However, the shape of the ECG signal can vary depending on the location of the electrodes on the body during data acquisition, and the P-QRS-T complexes are not present in every type of ECG signal. To address these problems, we use a non-fiducial approach based on machine learning. With a non-fiducial method based on deep learning, fiducial extraction can be omitted from the pre-processing phase. In addition, unlike previous studies that relied on specific data for classification, a data-independent acquisition technique is extended to build a model that can handle various types of ECG input. To enable data-independent and highly generalizable signal processing and feature learning, we propose a wavelet-domain multi-resolution bidirectional LSTM network based on random segmentation. Specifically, it blindly selects a physiological signal segment for classification, avoiding the complicated extraction of fiducial characteristics and therefore not relying on QRS peak detection. It also removes the phase difference among randomly chosen signal segments using auto-correlation. Thus, ECG data with or without QRS peaks can be processed in the upstream data acquisition phase without modification. Because the scope of our research is non-fiducial approaches rather than fiducial techniques, our proposed models are examined only with non-fiducial approaches. The major contributions of this research are as follows:
1. Applying random segmentation and auto-correlation to various types of ECG input, independently of the data type, to produce a sufficient quantity of training data from a raw signal [5].
2. Designing 1D-CNN networks and bidirectional RNNs with both Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells, and comparing their generalization performance.
The report consists of four parts. The first part briefly reviews related work. In the second part, we present the research methodology and explain the data pre-processing in detail, with insights into the proposed method. In the third part, the research results and comparisons with other techniques are presented, and in the final section, the main conclusions are presented.

Fiducial Methods
ECG-based authentication applications typically follow one of two approaches: fiducial or non-fiducial methods. The fiducial approach uses a feature extraction process in which the points of interest lie within a heartbeat wave. The heartbeat wave usually consists of a P-QRS-T complex, which represents the activity of the beating heart and is acquired using electrodes placed on different parts of the body. These complexes are then used to extract latency and amplitude features [4,6]. Such approaches generally rely on robust heartbeat segmentation and fiducial peak detection [7][8][9][10]. Thus, manual feature engineering is necessary to capture a heartbeat from an original ECG signal, and precise estimation is important to annotate the peak locations of the P-QRS-T complex. After this fiducial information is detected, the amplitudes and time intervals between corresponding points are measured. The entire process of identifying the peak points of the QRS complex and calculating the time-interval features is considered the pre-processing component, or signal processing phase, of ECG-based biometric authentication applications, as shown in Figure 1.
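As a minimal illustration of the measurement step described above, the sketch below computes two common fiducial features, R-R intervals and R-peak amplitudes, from already-annotated R-peak indices. The function names `fiducial_features` and `heart_rate_bpm` are hypothetical, and the peak indices are assumed to come from a separate detector; a full fiducial pipeline would also locate the P, Q, S, and T points.

```python
def fiducial_features(signal, r_peaks, fs):
    """Return R-R intervals (seconds) and R-peak amplitudes.

    `r_peaks` is assumed to be a sorted list of sample indices produced by
    an R-peak detector; `fs` is the sampling rate in Hz.
    """
    # Time interval between consecutive R peaks, converted from samples to seconds
    rr_intervals = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    # Signal amplitude at each annotated R peak
    r_amplitudes = [signal[i] for i in r_peaks]
    return rr_intervals, r_amplitudes


def heart_rate_bpm(rr_intervals):
    """Mean heart rate in beats per minute from R-R intervals in seconds."""
    mean_rr = sum(rr_intervals) / len(rr_intervals)
    return 60.0 / mean_rr


# Usage with a synthetic 360 Hz signal that has three hand-placed R peaks
sig = [0.0] * 1000
for idx, amp in [(100, 1.2), (460, 1.1), (820, 1.3)]:
    sig[idx] = amp
rr, amps = fiducial_features(sig, [100, 460, 820], fs=360)
print(rr, amps, heart_rate_bpm(rr))  # [1.0, 1.0] [1.2, 1.1, 1.3] 60.0
```

Intervals and amplitudes like these are exactly the hand-engineered inputs that the non-fiducial approach in this report tries to avoid.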

Non-Fiducial Methods
Numerous recent studies in the field of biometric signal-based security systems have exploited deep learning methods [11][12][13][14][15][16][17][18]. Q. Zhang et al. [19] proposed a multi-resolution network based on a 1D-CNN for ECG-based human identification on smart devices. The method transforms raw input signals into multiple wavelet versions to improve the context representation of the signal. However, it requires auto-correlation of the segmented windows and a wavelet transformation. X. Zhang [20] proposed RNN models with various types of cell units in the hidden layers. The results suggested that the choice between LSTM and GRU gates did not significantly affect classification accuracy. M. Al Rahhal et al. [21] proposed a deep neural network that stacks denoising auto-encoders with sparsity constraints and applies a softmax layer on top of the hidden representation layer. M. Zihlmann et al. [22] proposed two deep neural network models: a CNN and a hybrid approach combining the CNN with an RNN using LSTM cell units. A similar approach was proposed by Warrick and Homosi [18], who used CNN and LSTM techniques to automatically learn the hidden characteristics of a signal and identify cardiac arrhythmias in ECG signals.

Convolutional Neural Networks (CNN)
Convolutional neural networks are a category of deep neural networks that have proven effective in image recognition and related classification tasks. CNNs have been successful in identifying faces [23], objects, and traffic signs. They use convolutional layers to filter the input data and obtain useful information, and a nonlinear activation function is applied to the result of each convolution operation. The convolution operation combines the inputs with a kernel, also known as a filter, to form a transformed feature map. After the pooling process, a fully connected layer is used for classification. The kernel extracts features by sliding over the original input matrix from left to right and from top to bottom. A CNN can also be viewed as a multi-layer neural network that improves on the error backpropagation network. CNNs are good at classifying images, especially large images. The CNN was first proposed by Y. LeCun and used for handwritten character recognition [24] (see Figure 2). The CNN technique has two components: a feature identifier and a fully connected layer. The feature identifier consists of convolutional and pooling layers, in which the features are learned automatically. In ECG-based classification problems, the fully connected component classifies the signal using the features learned by the feature identifier component.
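The convolution, activation, and pooling steps described above can be sketched in one dimension, which is the form used for ECG segments. This is a minimal pure-Python illustration with hypothetical helper names (`conv1d`, `relu`, `max_pool1d`); like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped) with no padding.

```python
def conv1d(x, kernel, stride=1, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, kernel not flipped)."""
    k = len(kernel)
    return [
        sum(x[start + j] * kernel[j] for j in range(k)) + bias
        for start in range(0, len(x) - k + 1, stride)
    ]


def relu(v):
    """Element-wise nonlinear activation max(0, v)."""
    return [max(0.0, a) for a in v]


def max_pool1d(v, size=2, stride=2):
    """Keep the maximum of each pooling window."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, stride)]


# A difference kernel responds positively to the falling edge of a toy "beat"
x = [0, 1, 2, 3, 2, 1, 0]
feature_map = conv1d(x, [1, 0, -1])
print(feature_map, max_pool1d(relu(feature_map)))
```

In a trained network the kernel values are learned rather than hand-set, and many kernels run in parallel to produce several feature maps per layer.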

Recurrent Neural Networks (RNN)
The RNN is a widely preferred method [25,26], especially for sequential data. The node at each time step receives an input from the node at the previous time step through a feedback loop. In an RNN, each node computes its current hidden state from the given input and the previous hidden state, and produces its output as follows:
$$h_t = f(W x_t + V h_{t-1} + b_h)$$
$$o_t = \mathrm{softmax}(W_o h_t + b_o)$$
where $h_t$ denotes the hidden state at time step $t$, $W$ and $V$ are the weights of the hidden layer applied to the input and to the previous hidden state, $b_h$ and $b_o$ are the biases of the hidden and output states, and $f$ is the activation function applied at each node throughout the network. $o_t$ is the output vector at time step $t$, which predicts the next output of the sequence; it is a softmax function of the hidden representation $h_t$ with its associated weights $W_o$ and bias $b_o$.
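A minimal sketch of this recurrence, assuming $f = \tanh$, plain Python lists for vectors and matrices, and a zero initial hidden state. The helper names (`matvec`, `softmax`, `rnn_forward`) are hypothetical, and the sketch is an illustration of the recurrence rather than the authors' implementation.

```python
import math


def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    e = [math.exp(a - top) for a in z]
    s = sum(e)
    return [a / s for a in e]


def rnn_forward(xs, W, V, b_h, W_o, b_o):
    """Run h_t = tanh(W x_t + V h_{t-1} + b_h) and o_t = softmax(W_o h_t + b_o)."""
    h = [0.0] * len(b_h)  # zero initial hidden state
    outputs = []
    for x_t in xs:
        pre = [a + c + d for a, c, d in zip(matvec(W, x_t), matvec(V, h), b_h)]
        h = [math.tanh(p) for p in pre]
        outputs.append(softmax([a + c for a, c in zip(matvec(W_o, h), b_o)]))
    return h, outputs
```

The same weights $W$, $V$, $W_o$ are reused at every time step; only the hidden state $h$ carries information forward.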

Long Short-Term Memory (LSTM)
Long short-term memory is a type of RNN designed to prevent the output of a neural network from either exploding or decaying as it passes through the feedback loops for a given input (the long-term dependency problem) [27]. These feedback loops make RNNs better at pattern recognition than other neural networks. Because they can learn long-term dependencies, LSTMs are applicable to numerous long-sequence learning problems, such as language modeling and machine translation. LSTM models place memory cells with several gates in the hidden layer, as shown in Figure 3a. Each hidden-layer block contains an LSTM cell unit whose three gate controllers function as follows:
• The forget gate $f_t$ decides which part of the long-term state $c_t$ should be dropped.
• The input gate $i_t$ controls which part should be added to the long-term state.
• The output gate $g_t$ determines which part of $c_t$ should be read and output to $h_t$ and $o_t$.
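The following equations give the long-term and short-term states of the cell and the output of each layer at time step $t$:
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$g_t = \sigma(W_{xg} x_t + W_{hg} h_{t-1} + b_g)$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = h_t = g_t \odot \tanh(c_t)$$
where $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, $W_{x\cdot}$ and $W_{h\cdot}$ are the input and recurrent weight matrices, $b_{\cdot}$ are the biases, and $\tilde{c}_t$ is the candidate long-term state.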
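The gate descriptions above can be sketched as a single LSTM time step. This is a minimal pure-Python illustration, assuming the standard LSTM formulation with a candidate state $\tilde{c}_t$ and keeping the text's name $g_t$ for the output gate (many references call it $o_t$). The parameter layout (a dict mapping each gate name to a tuple `(W_x, W_h, b)`) and all helper names are hypothetical.

```python
import math


def _affine(W_x, W_h, b, x, h):
    """W_x x + W_h h + b for list-based matrices and vectors."""
    return [
        sum(w * a for w, a in zip(row_x, x)) + sum(u * c for u, c in zip(row_h, h)) + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]


def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def _tanh(v):
    return [math.tanh(a) for a in v]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps 'f', 'i', 'g', 'c' to (W_x, W_h, b)."""
    f = _sigmoid(_affine(*p["f"], x_t, h_prev))        # forget gate: what to drop from c_{t-1}
    i = _sigmoid(_affine(*p["i"], x_t, h_prev))        # input gate: what to add to the long-term state
    g = _sigmoid(_affine(*p["g"], x_t, h_prev))        # output gate (g_t in the text)
    c_tilde = _tanh(_affine(*p["c"], x_t, h_prev))     # candidate long-term state
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [gt * math.tanh(ct) for gt, ct in zip(g, c)]   # short-term state, also the output o_t
    return h, c
```

With all weights and biases at zero, every gate equals 0.5 and the candidate is 0, so the cell simply halves its previous long-term state; a large forget-gate bias would instead preserve it, which is how LSTMs carry information across long sequences.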

Gated Recurrent Unit (GRU)
Both LSTM and GRU cell units are generally motivated by the need to avoid the vanishing gradient problem in deep neural networks. The GRU uses an update gate to decide whether to pass the previous hidden-layer output on to the next cell. The reset gate is an additional operation with its own set of weights; intuitively, it decides how to combine the new input with the previous memory. The update gate determines which part of the previous memory should be passed on through the network to calculate the new state. The structure of the GRU is shown in Figure 3b, and each gate and its output are calculated as follows:
$$r_t = \sigma(W_{x,r} x_t + W_{o,r} o_{t-1} + b_r)$$
$$z_t = \sigma(W_{x,z} x_t + W_{o,z} o_{t-1} + b_z)$$
$$\hat{o}_t = \tanh\big(W_{x,\hat{o}} x_t + W_{o,\hat{o}} (r_t \odot o_{t-1}) + b_{\hat{o}}\big)$$
$$o_t = z_t \odot o_{t-1} + (1 - z_t) \odot \hat{o}_t$$
where $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, $W_{x,r}$, $W_{x,z}$, and $W_{x,\hat{o}}$ denote the weight matrices for the connected input vector, $W_{o,r}$, $W_{o,z}$, and $W_{o,\hat{o}}$ represent the weight matrices of the previous time step, and $b_r$, $b_z$, and $b_{\hat{o}}$ are the biases.
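For comparison with the LSTM, one GRU time step can be sketched as follows. This minimal pure-Python version assumes the common GRU formulation with a reset gate $r_t$, an update gate $z_t$, and a candidate state $\hat{o}_t$, where $z_t$ weights the previous state; the dict-of-tuples parameter layout and the helper names are hypothetical.

```python
import math


def _affine(W_x, W_h, b, x, h):
    """W_x x + W_h h + b for list-based matrices and vectors."""
    return [
        sum(w * a for w, a in zip(row_x, x)) + sum(u * c for u, c in zip(row_h, h)) + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]


def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def _tanh(v):
    return [math.tanh(a) for a in v]


def gru_step(x_t, o_prev, p):
    """One GRU time step; p maps 'r', 'z', 'cand' to (W_x, W_h, b)."""
    r = _sigmoid(_affine(*p["r"], x_t, o_prev))        # reset gate
    z = _sigmoid(_affine(*p["z"], x_t, o_prev))        # update gate
    gated = [rt * op for rt, op in zip(r, o_prev)]     # reset-gated previous state
    cand = _tanh(_affine(*p["cand"], x_t, gated))      # candidate state
    # Blend: z_t keeps the previous state, (1 - z_t) admits the candidate
    return [zt * op + (1.0 - zt) * ct for zt, op, ct in zip(z, o_prev, cand)]
```

The GRU has no separate long-term state $c_t$, so it needs three weight sets instead of the LSTM's four, which is one reason it is often cheaper to train.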

Data Augmentation Process
In this research, all candidate models were evaluated separately on datasets from PhysioNet [34]: ECG-ID (ECGID), the MIT-BIH Arrhythmia Database (MIT-BIH ECG), STAFF-III, and LT-AF [28][29][30][31] for ECG signals with the QRS complex, and AFDB [32] and the AHA dataset [33] for ECG signals without the QRS complex. The signal processing phase can also be regarded as data pre-processing in ECG authentication applications. Figure 4 illustrates the framework of the data pre-processing phase of the proposed methodology. Raw ECG signals with the QRS complex go through the data augmentation process (data filtering, R-peak detection, and heartbeat segmentation) to obtain single-heartbeat windows before they are fed into the proposed deep networks. ECG signals without the QRS complex instead go through the extended data-independent acquisition process, in which the signals are randomly segmented without peak detection and auto-correlation is applied to transform them into two-dimensional data for the downstream process. The data augmentation process consists of three core operations: detrending, noise removal (filtering) [35], and R-peak detection, which annotates the indices of the R peaks along the signal. First, the original ECG signal is detrended to yield a better approximation over a specified segment length for signal analysis: the nonlinear trend in the signal is removed by fitting a low-order polynomial to the signal and subtracting it, with the polynomial order set to 6 [36]. Subsequently, a Butterworth bandpass filter with a passband from 5 Hz to 15 Hz is applied to eliminate baseline wander. Baseline wander is low-frequency noise that occurs during data acquisition, usually caused by perspiration that affects the electrode impedance, by respiration, and by body movements such as finger movements on the electrode.
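The polynomial detrending step can be sketched as an ordinary least-squares fit followed by subtraction. This minimal pure-Python version solves the normal equations by Gaussian elimination and maps sample indices to [-1, 1] to keep the order-6 fit well conditioned; the function names are hypothetical, and a production pipeline would more likely rely on a numerical library. For the bandpass step, one option is SciPy's `scipy.signal.butter` with `btype='bandpass'`, followed by zero-phase filtering with `scipy.signal.filtfilt`; the filter order is not stated in the text, so that step is not sketched here.

```python
def polyfit_lstsq(t, y, order):
    """Least-squares polynomial coefficients (lowest degree first)."""
    n = order + 1
    # Normal equations A c = rhs, with A[j][k] = sum t^(j+k) and rhs[j] = sum y t^j
    A = [[sum(ti ** (j + k) for ti in t) for k in range(n)] for j in range(n)]
    rhs = [sum(yi * ti ** j for ti, yi in zip(t, y)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    coef = [0.0] * n
    for j in range(n - 1, -1, -1):
        coef[j] = (rhs[j] - sum(A[j][k] * coef[k] for k in range(j + 1, n))) / A[j][j]
    return coef


def detrend_poly(x, order=6):
    """Subtract the least-squares polynomial trend of the given order."""
    N = len(x)
    # Map sample indices to [-1, 1] so the normal equations stay well conditioned
    t = [2.0 * i / (N - 1) - 1.0 for i in range(N)]
    coef = polyfit_lstsq(t, x, order)
    trend = [sum(c * ti ** j for j, c in enumerate(coef)) for ti in t]
    return [xi - tr for xi, tr in zip(x, trend)]
```

A slow drift that is well described by a low-order polynomial is removed almost entirely, while the much faster heartbeat oscillations are largely left in the residual.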
Figure 5 shows the original signal after detrending and filtering. The filtered signal is then normalized to the range [0, 1], after subtracting its mean, to balance the contributions of the inputs during training:
$$\bar{x} = x - \mu_x \tag{13}$$
$$\hat{x} = \frac{\bar{x} - \min(\bar{x})}{\max(\bar{x}) - \min(\bar{x})} \tag{14}$$
where $x$ and $\hat{x}$ denote the raw ECG signal and the resulting signal, respectively, and $\mu_x$ is the mean of $x$. Because the R peak is the most prominent peak for identifying a heartbeat within a signal, R-peak detection with the Pan-Tompkins algorithm [28,37] is still necessary to annotate the corresponding peak points throughout the signal, so that every heartbeat in the original signal can be extracted. After the R-peak indices are annotated, a suitable number of samples before and after each R peak is sliced out to form a heartbeat segment in vector form. For our fiducial approaches, we take 125 samples before and after the R peak to form a heartbeat for the datasets with the QRS complex, and 150 samples for the ECG-ID dataset. The other datasets are processed in the same way, with the segment length depending on their sampling rates, so that each vector represents one heartbeat. For each signal, approximately 45 to 50 heartbeat segments of 251 samples were extracted, whereas 51 heartbeats of 301 samples were extracted per signal from the ECG-ID dataset.
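The normalization and heartbeat-slicing steps can be sketched as below. This is a minimal illustration that assumes the R-peak indices have already been produced by a detector such as Pan-Tompkins (not implemented here) and that beats truncated by the record boundary are simply skipped; the function names are hypothetical. With 125 samples on each side of the peak, each segment has 2 × 125 + 1 = 251 samples, and with 150 samples it has 301.

```python
def normalize_01(x):
    """Subtract the mean, then min-max scale the signal to [0, 1]."""
    mu = sum(x) / len(x)
    centered = [v - mu for v in x]
    lo, hi = min(centered), max(centered)
    span = hi - lo
    if span == 0:
        return [0.0] * len(x)  # flat signal: nothing to scale
    return [(v - lo) / span for v in centered]


def segment_heartbeats(x, r_peaks, half_width=125):
    """Slice 2 * half_width + 1 samples centered on each R peak."""
    beats = []
    for r in r_peaks:
        start, stop = r - half_width, r + half_width + 1
        if start >= 0 and stop <= len(x):  # skip beats cut off by the record boundary
            beats.append(x[start:stop])
    return beats


# Usage: one usable beat; the peaks at 50 and 990 are too close to the edges
sig = normalize_01([float(i % 300) for i in range(1000)])
beats = segment_heartbeats(sig, [50, 300, 990])
print(len(beats), len(beats[0]))
```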

Extended Data Independent Acquisition
Unlike the data acquisition technique used in the previous section, we use random segmentation without QRS peak detection, which is applicable to any type of ECG signal, including datasets with no QRS complex. The original signal is blindly split into segments of equal length, namely 2-s windows (720 samples), so that each includes at least one heartbeat: the range of heart rates in a signal is from 40 to 280 beats per minute [25], so even at the slowest rate one beat lasts only 1.5 s. For each recording, 500 random windows were chosen, half of which were used for training and the other half for testing. Auto-correlation is applied to the segmented windows to remove the phase difference caused by random segmentation, thus providing the shift-invariant multi-resolution data representation used in [5]. It is defined as:
$$R_w(m) = \sum_{t=0}^{T-m-1} x_w(t)\, x_w(t+m), \qquad w = 1, \dots, W$$
where $x_w(t)$ is the $t$-th sample of the $w$-th window and, for a time lag $m$, the sum runs over $t = 0, \dots, T-m-1$. $T$ and $W$ are the number of samples in an ECG window (720) and the number of ECG windows (500), respectively.
The autocorrelation function calculates the correlation of a series with its delayed copy, i.e., the similarity between series as a function of the time lag between them [38]. Therefore, it can effectively discover repeating patterns in the quasi-periodic ECG signals even with different numbers and occurrence times of the heartbeats. After removing the phase difference, the multi-resolution data can now be fed to the networks for automatic feature learning and user identification. Figure 6 shows similar outputs when auto-correlation is applied to two different wavelet domain signal segments.
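Random segmentation and the auto-correlation step can be sketched together, which also makes the shift-invariance argument concrete. This minimal pure-Python version assumes the unnormalized auto-correlation sum over $t = 0, \dots, T-m-1$ and uses a pure sine wave as a stand-in for a quasi-periodic ECG; the names `random_segments` and `autocorrelation` are hypothetical, and the wavelet multi-resolution step is not included. Because computing all 720 lags in pure Python is slow, the sketch lets the caller limit the number of lags.

```python
import math
import random


def random_segments(x, num_windows=500, win_len=720, seed=None):
    """Blindly cut num_windows windows of win_len samples at random offsets."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(x) - win_len + 1) for _ in range(num_windows)]
    return [x[s:s + win_len] for s in starts]


def autocorrelation(w, max_lag=None):
    """R(m) = sum_{t=0}^{T-m-1} w[t] * w[t+m] for m = 0 .. max_lag-1."""
    T = len(w)
    max_lag = T if max_lag is None else max_lag
    return [sum(w[t] * w[t + m] for t in range(T - m)) for m in range(max_lag)]


if __name__ == "__main__":
    # Two windows of a periodic signal taken at different phases
    sig = [math.sin(2 * math.pi * t / 72) for t in range(2000)]
    a, b = sig[0:720], sig[37:757]
    ra, rb = autocorrelation(a, 100), autocorrelation(b, 100)
    print(max(abs(p - q) for p, q in zip(a, b)))            # raw windows differ strongly
    print(max(abs(p - q) for p, q in zip(ra, rb)) / ra[0])  # auto-correlations nearly agree
```

The two raw windows are almost inverted copies of each other, yet their auto-correlation sequences differ by only a few percent of $R(0)$, which is why the representation tolerates the arbitrary start points produced by random segmentation.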

Models Overview
In this section, we review several proposed techniques based on deep neural networks for the downstream stage. The first model is based on a 1-D convolutional neural network (1D-CNN), which learns hierarchical, discriminative features that form a high-level abstract representation of the input. The abstracted data are then fed into a classification layer, such as a fully connected layer, for authentication. The remaining proposed methods are based on RNNs with modified cell units, LSTM and GRU, and are investigated collectively (see Figure 7). We also consider the changes in their hidden states during training when they are deployed bidirectionally. In conventional RNN models, the hidden state at a given time step is calculated from a linear combination of the previous hidden state and the current input. Although GRU and LSTM networks share a similar structure, the update of the hidden state is more complex in both than in a conventional RNN. Figure 8 illustrates the proposed RNN-based models with the different cell units used in the experiments of this study.

Proposed 1-D CNN Model
Convolutional neural networks (CNNs) are neural networks built primarily to classify images, cluster images by similarity, and perform object recognition. They were developed in the 1980s. CNNs are designed to be trained robustly with the stochastic gradient descent algorithm applied to every layer, and they have been commonly used for feature learning and classification problems. In this report, a deep 1-D CNN is designed to perform ECG classification using a fiducial approach. The structural parameters of the model were selected based on previous work and extensive trials: all network parameters were tuned by trial and error, and the setting that yielded the best performance was chosen for each network. The weights of the models were initialized randomly at the start of training and progressively updated throughout the process. The detailed architecture of the proposed CNN model and its parameters are listed in Table 1. Four hidden layers are used for feature learning, followed by a fully connected layer with 40 neurons. A decision-making classification layer with a sigmoid function is then applied to produce the categorical distribution over the classes. The intuition behind the proposed CNN model is that it learns a function that differentiates the patterns and distinctive characteristics of all classes from their respective input signals. The ground truth is represented as a one-hot vector, whereas the input is a discrete sequence in which each data point is the sample at time $t$. The signals are segmented into windows of a defined length following the procedure presented in the first section of this report.
Each window captures at least one heartbeat waveform of the original signal. The parameter values, such as filter size, stride, and padding, are set for each layer from the first to the last according to Table 1. Convolution operations with nonlinear activation functions are applied at each layer. The first and second layers use 30 filters. Finally, the softmax function is used in the last layer to produce, for decision making, a vector with the distribution over the classes, with values in the range 0 to 1. The cross-entropy loss for the network's targets is calculated as
$$L = -\sum_{i} \hat{y}_i \log y_i$$
where $\hat{y}_i$ is the ground-truth target and $y_i$ is the output of our model for class $i$.
To obtain the output as a categorical distribution across all subjects, the outputs $y_1, \dots, y_i$ are calculated by applying a sigmoid function to the weighted sums of the activations of the previous layer.
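The output layer and the cross-entropy loss can be sketched as follows. The text mentions both softmax and sigmoid outputs; this minimal pure-Python sketch assumes a softmax over the subject classes with a one-hot target, and the logits are hypothetical values for three subjects.

```python
import math


def softmax(z):
    """Numerically stable softmax: a probability vector over the classes."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_true[i] * log(y_pred[i]); eps avoids log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


logits = [2.0, 0.5, -1.0]   # hypothetical scores for three subjects
target = [1, 0, 0]          # one-hot ground truth: subject 0
probs = softmax(logits)
print(probs, cross_entropy(target, probs))
```

With a one-hot target only the true class contributes to the sum, so the loss is simply the negative log-probability the model assigns to the correct subject.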

Proposed Bidirectional RNN Architectures
To investigate RNN methods for ECG classification, we propose models based on different types of RNN techniques. The training data can be written as $S = \{(X_n, O_n)\}$, $n = 1, \dots, N$, where each input $X_n$ contains the $m$ samples of a segment, that is, 251-sample segments obtained with the pre-processing procedure, or 301-sample segmented windows for the ECG-ID dataset, and $O_n^i$ denotes the ground truth of the $n$th input for subject $i$. The ground-truth value is 1 for the subject to whom the signal belongs and 0 for all other subjects. For a given input sequence, a classifier is trained to learn the probabilities of the N classes. The first proposed model for the non-fiducial approach is a bidirectional RNN with LSTM cell units in the hidden layer, called BLSTM, as shown in Figure 3a. The segmented signal inputs $x_1, \dots, x_T$ from the pre-processing section are fed into the network, one per time step $t$ ($t = 1, \dots, T$), for each LSTM cell. Each bidirectional cell unit consists of two parallel LSTM tracks, known as the forward and backward sequences, which capture the context from the past and the future. At the final time step, the outputs of the two parallel tracks are concatenated into a single vector. The initial forward state $h_0^f$ and backward state $h_0^b$ are initialized to zero for all $N$ layers. Given the input $x_t$ at time $t$ and the previous cell state $h_{t-1}$, the output $p_t^n$ of the $n$th layer at time $t$, for either the forward or the backward track with parameters $\theta^n$, can be defined as
$$p_t^n = \mathrm{LSTM}(x_t, h_{t-1}; \theta^n)$$
where $\theta^n$ denotes the parameters $(b, U, W)$ of the respective cell unit for layer $n$.
The next proposed model, BGRU (Figure 3b), differs from BLSTM only in the cell unit used in the hidden layers, which is a GRU. In addition, to address overfitting, one of the most important challenges in deep neural networks, a dropout layer is applied to each cell in all RNN-based methods. As in BLSTM, the outputs of the forward and backward tracks at the last layer are concatenated into a single vector (late fusion for bidirectional networks). This vector is followed by a softmax activation function to produce the N-dimensional output of the last layer. The overall model architecture is shown in Figure 7. Because the networks are bidirectional, the forward track processes the input from left to right, while the backward track processes it from right to left in both BLSTM and BGRU, which can be defined as follows:
$$\overrightarrow{h}_t = \mathcal{H}(x_t, \overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t = \mathcal{H}(x_t, \overleftarrow{h}_{t+1})$$
$$y = \mathrm{softmax}\big(W_y [\overrightarrow{h}_T ; \overleftarrow{h}_1] + b_y\big)$$
where $\mathcal{H}$ denotes the LSTM or GRU cell function, $[\cdot\,;\cdot]$ denotes concatenation, and $W_y$ and $b_y$ are the weights and bias of the output layer.
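The forward and backward tracks with late fusion can be sketched independently of the cell type. To keep the example short, this minimal pure-Python version uses a plain tanh cell (`simple_cell`, hypothetical) as a stand-in for the LSTM or GRU cell; in the proposed models each track would use its own LSTM or GRU parameters, and the concatenated vector would then feed the softmax output layer.

```python
import math


def simple_cell(x_t, h_prev, params):
    """Stand-in recurrent cell: h_t = tanh(W x_t + V h_{t-1} + b)."""
    W, V, b = params
    return [
        math.tanh(sum(w * a for w, a in zip(row_w, x_t))
                  + sum(v * c for v, c in zip(row_v, h_prev)) + bias)
        for row_w, row_v, bias in zip(W, V, b)
    ]


def bidirectional_encode(xs, fwd_params, bwd_params, cell=simple_cell):
    """Run both tracks over the sequence and concatenate their final states."""
    h_f = [0.0] * len(fwd_params[2])  # forward initial state, zeros
    for x_t in xs:                    # forward track: left to right
        h_f = cell(x_t, h_f, fwd_params)
    h_b = [0.0] * len(bwd_params[2])  # backward initial state, zeros
    for x_t in reversed(xs):          # backward track: right to left
        h_b = cell(x_t, h_b, bwd_params)
    return h_f + h_b                  # late fusion by concatenation
```

Because the backward track reads the segment from its end, the fused vector summarizes the whole window from both directions, which is the context the text attributes to the bidirectional design.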

Network Training
Training is usually a bottleneck for deep networks with many layers. To accelerate it, our proposed models are implemented in the TensorFlow deep learning library, which can run on a graphics processing unit (GPU). GPU training is commonly at least 5 to 10 times faster than training on a central processing unit (CPU). All our experiments were executed on a GeForce GTX 1080 GPU. During training, the 1D-CNN-based model learns hierarchical features by performing convolution and pooling operations with the parameters provided in Table 1. Stochastic gradient descent (SGD) is applied to accelerate training, passing one batch of training data through the network at each update. The batch size is set to 150 for all proposed methods, including the RNN-based networks, as a compromise between two considerations: a large batch shortens convergence time by reducing the variance of the stochastic gradient updates, whereas a small batch helps SGD escape shallow minima of the loss function. In practice, the network learns the hidden patterns of the input signal and converges at 14 epochs; the number of epochs is set to 50 to balance under-fitting and over-fitting.
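The mini-batch procedure can be sketched on a toy problem. This minimal pure-Python example reshuffles the data each epoch, splits it into batches of 150, and takes one SGD step per batch; it fits a straight line rather than a network, and the learning rate of 0.5 is a toy value chosen for this problem, not a setting from the text. The names `minibatches` and `sgd_fit_line` are hypothetical.

```python
import random


def minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches covering the data once (one epoch)."""
    idx = list(range(len(data)))
    rng.shuffle(idx)  # new order every epoch
    for s in range(0, len(idx), batch_size):
        yield [data[i] for i in idx[s:s + batch_size]]


def sgd_fit_line(data, batch_size=150, epochs=50, lr=0.5, seed=0):
    """Fit y = w x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            # Gradient of the batch mean squared error with respect to w and b
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w, b = w - lr * gw, b - lr * gb
    return w, b


# Usage: noise-free samples of y = 3x + 1 on [0, 1], 10 batches per epoch
data = [(i / 1499, 3 * i / 1499 + 1) for i in range(1500)]
print(sgd_fit_line(data))
```

Each epoch here performs 10 updates, so 50 epochs give 500 gradient steps; averaging the gradient over 150 samples is what reduces the variance of each step, as the text describes.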
For the RNN-based models, a batch size of 150 is selected because it yields better performance than the other settings, as previously indicated. The Adam optimizer was used with a learning rate of 0.001. The loss function is the categorical cross-entropy:
$$L = -\sum_{l} \tilde{o}_l \log o_l$$
where $\tilde{o}_l$ is the ground-truth vector and $o_l$ is the output vector of the model for class $l$. In our experiments on RNN-based methods, the optimal window length of the segmented signal was chosen based on previous work and trials with various lengths. The parameters of the proposed models were also examined over various settings, and the setting that yielded the best performance was selected. The weights of the proposed models were randomly initialized at the beginning of training and incrementally updated throughout the procedure. A dropout rate of 0.2 was applied to the outputs of the first layer and the inputs of the last layer to avoid the over-fitting problem typically encountered in training deep neural networks. The top block shows the training cross-entropy loss, which converges at 70 epochs when 50% of the subjects are used for training.
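The Adam update used for the RNN-based models can be sketched for a list of scalar parameters. Only the learning rate (0.001) comes from the text; the moment decay rates and epsilon below are the commonly used defaults and are an assumption here. The class name `Adam` is a hypothetical stand-alone illustration, not TensorFlow's optimizer, although `tf.keras.optimizers.Adam(learning_rate=0.001)` is the corresponding built-in.

```python
import math


class Adam:
    """Adam optimizer for a list of scalar parameters."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimates
        self.v = None  # second-moment (uncentered variance) estimates
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.b1 * self.m[k] + (1 - self.b1) * g
            self.v[k] = self.b2 * self.v[k] + (1 - self.b2) * g * g
            m_hat = self.m[k] / (1 - self.b1 ** self.t)  # bias-corrected moments
            v_hat = self.v[k] / (1 - self.b2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


# Usage: minimize (w - 3)^2 starting from w = 0
opt = Adam(lr=0.1)
w = [0.0]
for _ in range(1000):
    w = opt.step(w, [2 * (w[0] - 3.0)])
print(w)
```

Because the step is scaled by the running gradient magnitude, the first update moves each parameter by roughly the learning rate regardless of the raw gradient size.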

System Evaluation
For our experiments, the ECG-ID, MIT-BIH ECG, STAFF III (STAFF-III), and Long-Term AF (LT-AF) databases were obtained from PhysioNet. The ECG-ID dataset includes 310 ECG recordings digitized at 500 Hz and obtained from 90 subjects (10,000 samples), while the MIT-BIH ECG dataset contains 168 short recordings from 47 subjects, selected to pose a variety of challenges for ECG compression methods. The STAFF III database was acquired during 1995-1996 and contains standard 12-lead ECG recordings from 108 patients. The LT-AF dataset consists of ECG recordings from 84 subjects with paroxysmal or sustained atrial fibrillation (AF), digitized at 128 Hz, with durations of 24 to 25 hours. Two further datasets, AFDB with 20 subjects and the AHA dataset with two subjects (without the QRS complex), were examined with our extended data-independent acquisition approach. These datasets may have been acquired with different lead configurations. Moreover, they were collected from healthy or quasi-healthy participants as well as from patients with severe heart conditions such as ST depression or elevation, arrhythmia, atrial fibrillation, and malignant ventricular ectopy. Because the datasets were not acquired at the same sampling rate, all ECG recordings were resampled to 360 Hz to allow a fair comparison of performance. To train the RNN-based networks, the training data of each dataset were divided into batches of several heartbeats, and the weights were updated after each batch. The input data were propagated forward through the network, and the error was back-propagated through the network unfolded in time. This method, called backpropagation through time (BPTT), was used with the Adam optimizer and a learning rate of 0.001.
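Resampling every recording to 360 Hz can be sketched with linear interpolation. The text does not state which resampling method was used, so this minimal pure-Python version (`resample_linear`, hypothetical) is only one simple option and applies no anti-aliasing filter; with SciPy, `scipy.signal.resample_poly(x, 18, 25)` would be a filtered alternative for 500 Hz to 360 Hz, and `scipy.signal.resample_poly(x, 45, 16)` for 128 Hz to 360 Hz.

```python
def resample_linear(x, fs_in, fs_out=360):
    """Resample a signal by linear interpolation (no anti-aliasing filter)."""
    n_out = int(round(len(x) * fs_out / fs_in))
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out  # position of output sample k on the input grid
        i = int(pos)
        if i >= len(x) - 1:
            out.append(x[-1])     # hold the last value at the right edge
        else:
            frac = pos - i
            out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out


# Usage: a 20-s ECG-ID-length record (10,000 samples at 500 Hz) becomes 7200 samples
ramp = [i / 500 for i in range(10000)]  # value equals time in seconds
y = resample_linear(ramp, fs_in=500)
print(len(y), y[360])
```

After resampling, a 2-s window always spans 720 samples regardless of the source dataset, which is what allows a single window length to be used throughout.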
The batch size was set to 150, which yielded higher performance for all the methods, namely the traditional RNN, the RNN with LSTM, and the RNN with GRU. The number of epochs was set to 150 to limit over-fitting. Furthermore, a dropout rate [39] of 0.4 was applied to the outputs of the first layer and the inputs of the last layer to overcome the overfitting problem. For the evaluation, a total of 5066 segmented windows from all datasets were split into training and test sets. The proposed models were evaluated by classification accuracy, one of the most common and intuitive metrics for evaluating machine learning models in classification problems, which can be determined from the confusion matrix shown in Table 2. The terms associated with a confusion matrix are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). A TP is a case in which the model correctly classifies a data point as belonging to the given class. TNs are cases in which the model correctly predicts that a data point does not belong to the given class. FPs are cases in which the model incorrectly predicts the given class as positive. FNs are cases in which the ground truth is positive but the model's prediction is negative. Accuracy is the number of correct predictions (TP and TN) divided by the total number of predictions made by the model, as calculated in (24):
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{24}$$
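For a multi-subject task, TP, TN, FP, and FN are counted one class at a time (one-vs-rest). The minimal pure-Python sketch below shows that counting for a single class label and computes accuracy from the counts as in (24); the labels and predictions are hypothetical, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive):
    """One-vs-rest TP, TN, FP, FN for the class label `positive`."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly predicted as the class
        elif p == positive:
            fp += 1  # predicted as the class, but belongs to another class
        elif t == positive:
            fn += 1  # belongs to the class, but predicted as another class
        else:
            tn += 1  # correctly predicted as not belonging to the class
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical subject labels and predictions
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "A"]
counts = confusion_counts(y_true, y_pred, positive="A")
print(counts, accuracy(*counts))
```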
According to Table 3, the overall classification accuracy of the proposed method is higher than that reported in all previous studies for all datasets. Before investigating the performance of our proposed method, we compared it with conventional RNN-based methods, namely, traditional RNN, RNN with LSTM gates, and RNN with GRU gates, on the four datasets with the QRS complex. The optimal input sequence length, that is, the number of successive heartbeats, was selected based on previous experiments and on trials with widely used lengths: a single heartbeat, three heartbeats, six heartbeats, and nine heartbeats. Because the single-heartbeat input yielded lower accuracy than the other groups, and the six-heartbeat group did not yield significantly higher accuracy than the three-heartbeat group, we set the number of successive heartbeats to three and nine for further experiments. The entropy loss curves are shown in Figure 9; the proposed multi-resolution bidirectional LSTM outperforms the others as training reaches 150 epochs on the ECG-ID dataset. Thus, our proposed bidirectional LSTM-based model was chosen for further experiments comparing our new data-independent acquisition-based method with other recent studies. To evaluate classification performance, we examined the networks using four statistical evaluation metrics: accuracy (Acc), sensitivity (Sen), specificity (Spc), and positive predictivity (Ppr).
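The equations for these four metrics did not survive text extraction. Assuming the standard definitions, which agree with the descriptions of TP, TN, FP, and FN given for the confusion matrix, they read:

```latex
\begin{aligned}
\mathrm{Acc} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{Sen} &= \frac{TP}{TP + FN}, \\
\mathrm{Spc} &= \frac{TN}{TN + FP}, \\
\mathrm{Ppr} &= \frac{TP}{TP + FP}.
\end{aligned}
```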
The terms TP, TN, FP, and FN in the preceding equations denote true positive, true negative, false positive, and false negative, respectively. F1 and Fowlkes-Mallows index (FM) scores are also computed from Sen and Ppr as F1 = 2 × Sen × Ppr / (Sen + Ppr) and FM = √(Sen × Ppr). We compared the proposed method with previous RNN-based networks on six ECG datasets according to these evaluation metrics. Our proposed multi-resolution bidirectional LSTM outperformed the others in terms of F1 and FM scores on the ECG-ID dataset. The accuracy of the proposed algorithm for both datasets nearly matches that of Mostayed et al. [24], who used a bidirectional LSTM network for 12-lead ECG signals. It should be noted that the F1 score is a more informative metric than accuracy, because it accounts for both false positives and false negatives. Q. Zhang et al. [5] used a multi-resolution parallel CNN-based network that deploys multiple versions of wavelets to improve the contextual representation of the signal for generalization purposes. On the ECG-ID dataset, the accuracy of all techniques was quite similar, and the proposed method outperformed the other methods in F1 score, reaching 98.84%. On the MIT-BIH ECG dataset, Fan Liu et al. [40] achieved a higher accuracy than the other techniques [24,40,41]; however, our proposed method significantly outperformed theirs. Moreover, their method still requires identifying each heartbeat first, which demands considerable algorithm engineering compared with our random signal segmentation. On the STAFF-III dataset, the proposed method achieved a high accuracy of 97%, together with higher F1 and FM scores. On the LT-AF dataset, despite obtaining an accuracy similar to that of A. Mostayed et al. [24] (99%), our proposed method significantly outperformed the others in terms of F1 and FM scores, reaching 99.5%.
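As a sketch of how the reported scores relate to each other, the function below (a hypothetical helper in plain Python) computes Acc, Sen, Spc, and Ppr from one class's TP, TN, FP, and FN counts, then F1 as the harmonic mean and FM as the geometric mean of Sen and Ppr. How per-class scores were averaged across subjects is not stated in the text; macro-averaging over classes would be one option.

```python
import math

def evaluation_metrics(tp, tn, fp, fn):
    """Acc, Sen, Spc, Ppr, F1 and Fowlkes-Mallows (FM) from one class's counts."""
    sen = tp / (tp + fn)          # sensitivity (recall)
    spc = tn / (tn + fp)          # specificity
    ppr = tp / (tp + fp)          # positive predictivity (precision)
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": sen,
        "Spc": spc,
        "Ppr": ppr,
        "F1": 2 * sen * ppr / (sen + ppr),   # harmonic mean of Sen and Ppr
        "FM": math.sqrt(sen * ppr),          # geometric mean of Sen and Ppr
    }

scores = evaluation_metrics(tp=8, tn=85, fp=2, fn=5)
print({name: round(value, 4) for name, value in scores.items()})
```

The example counts are invented for illustration. Note how Acc stays high (0.93) while F1 is much lower, because the many true negatives of a one-vs-rest split inflate accuracy but do not enter F1 or FM.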
However, on the two datasets whose signals lack a QRS complex, the classification accuracy of the other compared techniques, which rely on R-peak detection for heartbeat segmentation, decreased significantly. The highest F1 score among the compared techniques did not reach 90.5%, whereas our data-independent acquisition approach achieved 97.94% and 97.3% on the AFDB and AHA datasets, respectively. Overall, the proposed method therefore offers clear advantages and a sufficiently high classification capability compared with recent studies across the different signal types. Figures 10-12 show the training process for all six trials. Interestingly, our proposed method had the fastest convergence and the lowest training entropy loss after 120 epochs on the ECG-ID dataset, and it converged after 90 epochs on the MIT-BIH ECG dataset, while the number of epochs required decreased to 60-90 for both the STAFF-III and LT-AF datasets, which are considered to have better signal quality, with less noise and fewer signal shifts. However, on both datasets without the QRS complex, AFDB and AHA, none of the other compared methods that use QRS peak detection achieved fast convergence, even after 170 epochs. This shows that data-dependent methods perform well only for specific data types and data variations. Figure 13 shows the receiver operating characteristic (ROC) analysis of the proposed bidirectional LSTM network on the different ECG datasets, and the corresponding sensitivity and specificity for each dataset are listed in Table 4. The analysis suggests that the proposed model yields better classification on the MIT-BIH and ECG-ID datasets, which contain the QRS complex, than on the remaining datasets. The convergence behavior described above is consistent with findings that poor local minima are rarely a problem in deep neural networks with many layers and a large number of parameters.
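ROC curves such as those in Figure 13 are obtained by sweeping a decision threshold over the classifier's scores. A minimal NumPy sketch for one subject in a one-vs-rest setting is given here, assuming the score is the network's softmax probability for that subject; `roc_points` and `auc` are hypothetical helpers, and the area under the curve is computed with the trapezoidal rule.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (FPR, TPR) points of the ROC curve for binary labels (1 = positive)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="mergesort")   # highest score first
    s, y = scores[order], labels[order]
    # One point per distinct threshold: last index of each run of tied scores
    cut = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tps = np.cumsum(y)[cut]
    fps = np.cumsum(1 - y)[cut]
    tpr = np.r_[0.0, tps / y.sum()]                 # sensitivity
    fpr = np.r_[0.0, fps / (len(y) - y.sum())]      # 1 - specificity
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

fpr, tpr = roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc(fpr, tpr))  # 0.75
```

Each ROC point pairs a sensitivity (TPR) with 1 - specificity (FPR) at one threshold, which is how the per-dataset Sen and Spc values relate to the curves.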
Instead, the landscape of the objective function contains many valleys whose local minima typically have similar values. Therefore, the randomness in the Adam-based parameter optimization process often causes only small fluctuations in the training convergence curves.
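For reference, the Adam update used in training can be sketched as follows, with the learning rate of 0.001 stated earlier. The moment decay rates beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the common defaults and are assumed here, since the text does not list them; the `Adam` class and the toy quadratic objective are illustrative only. Because each step divides a running mean of the gradient by a running estimate of its magnitude, a single update moves a parameter by roughly the learning rate.

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer for a NumPy parameter array (illustrative sketch)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # first-moment (mean) estimate of the gradient
        self.v = None   # second-moment (uncentered variance) estimate
        self.t = 0      # step counter used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * grads ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected moments
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy objective f(w) = (w - 3)^2 with gradient 2 (w - 3)
opt = Adam(lr=0.001)
w = np.array([0.0])
history = []
for _ in range(100):
    w = opt.step(w, 2 * (w - 3))
    history.append(w[0])
print(history[0])   # about 0.001: the first step has magnitude close to lr
```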

Discussion
Several aspects still require further effort before ECG can match current well-known biometric techniques such as fingerprint identification [42]. For instance, although more and more ECG datasets are now available thanks to great advances in data acquisition systems, the amount of data is still not comparable to that of fingerprint data. Generally, hundreds or thousands of fingerprint records can be obtained, which is much larger than the dataset sizes used in this study. Moreover, different heart health conditions introduce additional diversity, which poses further challenges to the generalization ability of identification algorithms [43,44]. Our proposed method uses random segmentation for data enrichment as a non-fiducial approach, which avoids complicated, data-dependent signal preprocessing such as QRS peak detection and segmentation. It also removes the phase difference among randomly chosen signal segments by means of autocorrelation. However, further data representation methods and deep neural network techniques should be considered to explore the interesting connection between data representation, network topology, and feature learning ability. Our experimental results also suggest that the heart reacts physiologically to stimuli such as danger and threat, and that the corresponding ECG signal changes owing to the cardiac defense response (CDR). Therefore, identifying these stimulus-related ECG patterns in future work is expected to further improve user identification performance.
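The random segmentation and autocorrelation steps can be illustrated with a short NumPy sketch. It assumes a fixed window length, uniformly random start positions, and the common normalized autocorrelation R[m] = sum_i x[i] x[i+m] / R[0] over lags 0..M; the window length, lag range, and normalization actually used by the method are not given here, and the helper names are hypothetical. On a synthetic periodic signal, two windows that start at different phases differ strongly sample by sample, while their autocorrelations nearly coincide.

```python
import numpy as np

def random_segments(signal, window, n_segments, rng):
    """Cut n_segments windows of fixed length at uniformly random start positions."""
    starts = rng.integers(0, len(signal) - window + 1, size=n_segments)
    return np.stack([signal[s:s + window] for s in starts])

def normalized_autocorrelation(segment, max_lag):
    """R[m] = sum_i x[i] x[i+m] for m = 0..max_lag, normalized so that R[0] = 1."""
    n = len(segment)
    r = np.array([np.dot(segment[:n - m], segment[m:]) for m in range(max_lag + 1)])
    return r / r[0]

# Synthetic stand-in for a quasi-periodic ECG: a sine with a 100-sample period
signal = np.sin(2 * np.pi * np.arange(5_000) / 100)
rng = np.random.default_rng(0)
segments = random_segments(signal, window=1_000, n_segments=8, rng=rng)
features = np.stack([normalized_autocorrelation(s, max_lag=200) for s in segments])

# Raw windows differ by their random phase; their autocorrelations almost agree
a, b = signal[17:1_017], signal[58:1_058]
print(np.max(np.abs(a - b)) > 1.0)   # True
print(np.max(np.abs(normalized_autocorrelation(a, 200)
                    - normalized_autocorrelation(b, 200))) < 0.05)  # True
```

Because no R-peak has to be located, the same code applies unchanged to recordings without a QRS complex, which is the property the data-independent approach relies on.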

Conclusions and Future Work
We proposed a multi-resolution bidirectional LSTM network with a random segmentation technique that uses an autocorrelation method to increase the dimension of the input ECG data for biometric identification. By applying random segmentation with autocorrelation for data-independent acquisition, the time-frequency representation of the original signal was improved, and the classification accuracy under data variations increased. According to the experimental results, the learning procedure performed better than other RNN-based and hybrid approaches. The experimental outcomes showed that the proposed algorithm outperformed most RNN-based networks by adopting the bidirectional learning method, considerably improving classification performance through more contextualized, distinctive features. In future work, we will consider further data representation techniques and deep learning methods for better feature learning capability, and we will further characterize the differences between traditional RNN networks and the proposed BLSTM method. The proposed BLSTM-RNN model can also be considered for generalization to other periodic waveforms in biometric signal-based user authentication applications.
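To make the bidirectional idea concrete, the sketch below runs one LSTM forward in time and a second LSTM over the reversed sequence, then concatenates their hidden states, which is the core of a BLSTM layer. It is a single-resolution, untrained NumPy illustration with randomly initialized weights and a softmax read-out over subjects; the multi-resolution design, layer sizes, and read-out of the proposed network are not reproduced, and all names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(rng, n_in, hidden, scale=0.1):
    """Random weights for one LSTM direction; gate order: input, forget, cell, output."""
    return (rng.normal(0.0, scale, (4 * hidden, n_in)),    # input weights W
            rng.normal(0.0, scale, (4 * hidden, hidden)),  # recurrent weights U
            np.zeros(4 * hidden))                          # bias b

def lstm_pass(x, W, U, b, hidden):
    """Run an LSTM over x with shape (T, n_in); return hidden states (T, hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = np.zeros((x.shape[0], hidden))
    for t in range(x.shape[0]):
        z = W @ x[t] + U @ h + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state
        states[t] = h
    return states

def blstm(x, fwd, bwd, hidden):
    """Concatenate forward-in-time and backward-in-time hidden states: (T, 2 * hidden)."""
    h_fwd = lstm_pass(x, *fwd, hidden)
    h_bwd = lstm_pass(x[::-1], *bwd, hidden)[::-1]   # re-align to original time order
    return np.concatenate([h_fwd, h_bwd], axis=1)

def classify(x, fwd, bwd, hidden, w_out):
    """Softmax over subjects from the last forward and last backward states."""
    h = blstm(x, fwd, bwd, hidden)
    summary = np.concatenate([h[-1, :hidden], h[0, hidden:]])
    logits = w_out @ summary
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(1)
n_in, hidden, n_subjects = 1, 8, 5
fwd = init_lstm(rng, n_in, hidden)
bwd = init_lstm(rng, n_in, hidden)
w_out = rng.normal(0.0, 0.1, (n_subjects, 2 * hidden))

x = rng.normal(size=(30, n_in))        # e.g. 30 time steps of a 1-D feature sequence
probs = classify(x, fwd, bwd, hidden, w_out)
print(probs.shape, round(float(probs.sum()), 6))   # (5,) 1.0
```

The backward pass lets every time step see future context as well as past context, which is the property credited above for the more contextualized features; in practice the weights would be trained with BPTT rather than left random.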