A Deep Learning Technique for Biometric Authentication Using ECG Beat Template Matching

Abstract: An electrocardiogram (ECG) is a unique representation of a person's identity, similar to fingerprints, and its rhythm and shape differ from person to person. Cloning and tampering with ECG-based biometric systems are very difficult, so ECG signals have been used successfully in a number of biometric recognition applications where security is a top priority. The major challenges in the existing literature are (i) the noise components in the signals, (ii) the inability to automatically extract the feature set, and (iii) the performance of the system. This paper proposes a beat-based template matching deep learning (DL) technique to address these problems with traditional techniques. ECG beat denoising, R-peak detection, and segmentation are done in the pre-processing stage of the proposed methodology. The noise-free ECG beats are converted into grey-scale images and applied to the proposed deep learning technique. A customized activation function is also developed in this work for faster convergence of the deep learning network. The proposed network can extract features automatically from the input data. The network performance is tested with the publicly available ECGID biometric database, and the proposed method is compared with the existing literature. The comparison shows that the proposed modified Siamese network achieves an accuracy of 99.85%, a sensitivity of 99.30%, a specificity of 99.85%, and a positive predictivity of 99.76% for biometric authentication. The experimental results show that the proposed method works better than the state-of-the-art techniques.


Introduction
Nowadays, advancements in computer applications such as smartphones, healthcare, banking, airports, and websites are raising the demand for high security [1][2][3]. Traditional biometrics like fingerprints, voices, faces, and irises, as well as passwords, have been utilised in various applications because they are easy to use and convenient. Still, these methods provide only a limited level of protection and expose the user to the risk of attack from third parties [4,5]. Artificial masks, fingerprint replication, iris falsification through contact lenses, and vocal mimicry are all examples of spoofing. As a result, biometric recognition with liveness detection has been taken into account to counter various threats and unauthorised user access to the systems [6][7][8].
ECG-based biometric recognition with machine learning techniques has been studied for the last decade. These techniques provide accurate results only for small data sizes. They also face different problems, such as over-fitting, inconsistent accuracy for unbalanced datasets, and limited tuning-parameter optimisation [43]. To overcome these limitations, deep learning-based techniques have been applied to ECG signals to improve recognition performance [42,43]. The convolutional neural network (CNN), a major deep learning algorithm, is gaining tremendous attention due to its ability to learn the intrinsic patterns of data automatically. This can reduce manual feature extraction and improve efficiency. ECG-based biometric recognition on a large database using 2D convolutional neural networks was performed by Hong et al. [43]; ECG beats were converted into images, and a consistent accuracy of 98.10% was reported. ECG-based biometric recognition on a large database using 2D convolutional neural networks was also performed in [44]. Zhang et al. [45] considered a large data size of 90 subjects from the ECGID database, used a CNN as a classifier, and reported an accuracy of 97%. To improve the generalisation capability of the biometric recognition system, Dalal et al. [46] introduced a CNN on short-segment ECG signals for biometric recognition. Many researchers are still working on ECG biometrics; Table 1 summarises state-of-the-art methods for a better understanding of ECG analysis, methods, and results. Using deep learning algorithms, namely a CNN and a long short-term memory (LSTM) network with a unique activation function, Prakash et al. [47] built an ECG-based biometric system. The authors analysed the suggested model utilising both on-person and off-person databases and then computed standard performance parameters. According to the authors' findings, the accuracy on the ECGID database utilising the CNN-LSTM structure was 99.42%.
The significant contributions of this work are:

1. A retrieval-based algorithm is proposed instead of classification to identify the person; hence the system is secure.
2. Image-based beat authentication is used to extract in-depth information and make the system resilient to noise.
3. The proposed customised deep learning model is tested with different beat combinations in a single frame image. This combination allows more features to be extracted from the subject data.
4. Converting ECG signals into image data makes it possible to take advantage of recent developments in computer vision for image-related tasks, since images are easier to analyse than signal data.
5. A customised activation function is developed in this work to design a fast-converging deep learning architecture.
6. To assess the viability of the proposed scheme, comprehensive comparison analyses are conducted utilising a variety of measurement parameters, including sensitivity, specificity, positive predictivity, and area under the curve (AUC).
The rest of the paper is organised as follows: Section 2 describes the ECG database, Section 3 explains the structure of the entire proposed system, Section 4 discusses the experimental results, and Section 5 presents the conclusions.

ECG Database Description
The standard publicly available open-source ECGID database [60] from PhysioNet is utilized to estimate the effectiveness of the proposed method. Lead-I configuration was used to acquire records from 90 people ranging in age from 13 to 75 years old, with 44 men and 46 women represented in the ECGID database. The proposed deep learning model required two different inputs at a time during the training and testing phase. Therefore a new dataset is created with the help of the existing ECGID dataset. The ECGID dataset contains 90 persons' individual ECG records. We considered 20 individual ECG beats from each record of the person. The ECG beats are segmented based on the R-peak location of the ECG signal, and after segmentation, the beats are converted into grey-scale images.
In this study, a frequency-time-based signal processing algorithm [61] was utilized for R-peak detection. Individual beats were then extracted from the ECG signal based on the positions of the R-peaks, each beat being characterised by the 128 samples adjacent to the left and right of the R-peak [62]. For experimentation purposes, three different datasets are developed; they combine single, dual, and triple ECG beat images of the same and different persons. For the first dataset, the image of a single beat from each person is considered, giving 17,100 image pairs belonging to the same person and 17,100 image pairs belonging to different persons. For the second dataset, two consecutive beats from each person are treated as a single image. The 20 beats of each person yield 171 same-person pairs, and each pair consists of two images of two consecutive beats each; together with the different-person pairs, this gives a total of 30,690 pairs. Finally, for the third dataset, three consecutive beats from each person are treated as a single image, and a total of 27,470 pairs are generated from the 90 persons.
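The same-person pair counts can be checked with a short calculation. The sketch below assumes that each multi-beat image is a sliding window of consecutive beats (so 20 beats give 19 two-beat images and 18 three-beat images) and that every unordered pair of a person's images forms one same-person pair. This reading is an assumption, chosen because it reproduces the counts reported for the three datasets; the function name is illustrative.

```python
from math import comb

SUBJECTS = 90            # persons in the ECGID database (from the text)
BEATS_PER_SUBJECT = 20   # beats taken from each person (from the text)


def same_person_pairs(beats_per_image: int) -> int:
    """Count same-person image pairs over all subjects.

    Assumption: images are overlapping windows of consecutive beats,
    so a person with 20 beats has 20 - k + 1 images of k beats each.
    """
    images_per_subject = BEATS_PER_SUBJECT - beats_per_image + 1
    pairs_per_subject = comb(images_per_subject, 2)  # unordered pairs
    return SUBJECTS * pairs_per_subject


for k in (1, 2, 3):
    print(f"{k} beat(s) per image: {same_person_pairs(k)} same-person pairs")
```

Under these assumptions the two-beat case gives 171 pairs per person, which matches the figure quoted in the text.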

Proposed System for Biometric Authentication
This section gives a detailed explanation of the proposed methodology utilised for biometric authentication. The architecture of the Siamese network and the training and testing phases of the network are also explained. The proposed biometric authentication system follows three important stages: (i) pre-processing and beat segmentation, (ii) database preparation, and (iii) biometric authentication based on deep learning techniques. The detailed block diagram of the biometric authentication system is shown in Figure 1.

Pre-Processing and Beat Segmentation
Initially, R-peak locations are identified in the ECG signal, and all beats are segmented individually for further processing. Each beat is segmented over a window from −380 ms to +380 ms around the R-peak. The beats suffer from baseline wander noise; hence, a simple band-stop filter (0 to 0.7 Hz) is used to denoise them. The denoised beat is shown in Figure 2. These denoised beats are converted into grey-scale images of size 112 × 112 for better extraction of deep features and to increase the classifier's efficiency. These grey-scale images are used to prepare a balanced dataset for the proposed deep learning classifier.
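A minimal sketch of the windowing step is given below. It assumes a sampling rate of 500 Hz and that R-peak positions are already available as sample indices. The function name and the edge-handling rule (dropping beats whose window would run past the record) are illustrative choices, not the authors' implementation. Filtering and rendering to 112 × 112 images are omitted.

```python
def segment_beats(signal, r_peaks, fs=500, half_window_ms=380):
    """Cut a fixed window around each R-peak.

    signal: sequence of ECG samples.
    r_peaks: sample indices of detected R-peaks.
    fs: sampling rate in Hz (assumed value).
    half_window_ms: window extent on each side of the R-peak.
    Peaks too close to either end of the record are skipped.
    """
    half = round(fs * half_window_ms / 1000)  # 190 samples at 500 Hz
    beats = []
    for r in r_peaks:
        start, stop = r - half, r + half
        if start >= 0 and stop <= len(signal):
            beats.append(list(signal[start:stop]))
    return beats


# Usage with a synthetic record of 1000 samples and three hypothetical peaks.
record = [0.0] * 1000
beats = segment_beats(record, r_peaks=[100, 400, 900])
print(len(beats), "beat(s) kept, each", len(beats[0]), "samples long")
```

Each retained beat has the same length, which keeps the later image conversion uniform across beats.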

Database Preparation
The balanced dataset is prepared with a combination of different-person and same-person ECG beat images. If the two beats in a pair are from the same person, the pair is labelled 1; otherwise, it is labelled 0. The proposed methodology is tested with three different types of datasets. With single-beat images, 17,100 image pairs belonging to the same person (label 1) and 17,100 image pairs belonging to different persons (label 0) are created. For images consisting of two consecutive beats, the 20 beats from each person in the ECGID database make 171 pairs; hence, a total of 15,390 (171 × 90) image pairs are created for label 1. A total of 15,300 pairs for label 0 are created from different people to keep the dataset balanced. Likewise, 13,770 label-1 pairs and 13,700 label-0 pairs are created for the images containing three consecutive beats.
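The labelling scheme can be sketched as follows. This is a simplified illustration under stated assumptions: positives are all unordered pairs of one person's images, and negatives are drawn by uniform random sampling from other people. The sampling rule, the function name `build_pairs`, and the parameter `negatives_per_person` are hypothetical and do not reproduce the exact selection used for the reported datasets.

```python
import random
from itertools import combinations


def build_pairs(images_by_person, negatives_per_person, seed=0):
    """Return (image_a, image_b, label) triples.

    label 1: both images come from the same person.
    label 0: the images come from different persons.
    """
    rng = random.Random(seed)
    pairs = []
    for person, images in images_by_person.items():
        # Positives: every unordered pair of this person's images.
        pairs += [(a, b, 1) for a, b in combinations(images, 2)]
        # Negatives: pair this person's images with randomly chosen others.
        others = [p for p in images_by_person if p != person]
        for _ in range(negatives_per_person):
            other = rng.choice(others)
            pairs.append(
                (rng.choice(images), rng.choice(images_by_person[other]), 0)
            )
    rng.shuffle(pairs)
    return pairs


# Usage with placeholder image identifiers for three hypothetical persons.
data = {p: [f"{p}_{i}" for i in range(4)] for p in ("a", "b", "c")}
pairs = build_pairs(data, negatives_per_person=6)
print(len(pairs), "pairs,", sum(label for _, _, label in pairs), "positive")
```

Choosing the number of negatives per person equal to the number of positives per person keeps the two labels balanced.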

Biometric Authentication Network Based on Deep Learning Technique
Deep CNNs are now used in the majority of image classification applications [7]. Suppose an image needs to be classified into one of three classes. The conventional paradigm processes the image through a succession of layers, and the result is a probability distribution over the three classes. In other words, the algorithm generates three probabilities, one per class, each expressing how likely it is that the image belongs to that class. While the procedure appears basic and straightforward, there are a few factors to consider. The first issue concerns the training procedure, which requires a large number of images for each class. The second issue is the system's inability to recognize any class other than the three it was trained on. To teach the system to distinguish a newly added class, multiple images of the new class would have to be obtained and the model retrained. Both of these challenges are common in real-world applications.
Consider an organization that deploys a facial recognition system for its thirty people. In the traditional method, the system must initially be trained using a variety of images of each person. When a person is to be identified, the algorithm then outputs thirty probabilities corresponding to the odds of the input image belonging to each of the people. In another case, the system may deal with a few hundred or perhaps a thousand people instead of thirty. Furthermore, people may be fired, relocated, or hired at any moment, resulting in a change in the number of classes and, consequently, the necessity to retrain the model. These two elements frequently result in unjustified expenditure, necessitating a different solution. In this work, a customized deep learning technique, the Siamese network, is used for biometric identification based on ECG beat images. A Siamese neural network (also known as a twin neural network) is an artificial neural network that employs the same weights to generate comparable output vectors from two different input vectors [63]. This network is very helpful in identifying the similarity between two vectors or images [64]. A Siamese neural network consists of twin networks that accept two different inputs and are joined at the top by an energy function. This function calculates a metric between the highest-level feature representations on each side. The parameters of the twin networks are tied. Because each network computes the same function, weight tying ensures that two extremely similar images cannot be mapped to very different locations in feature space by their respective networks. Also, because the network is symmetric, whenever two distinct images are presented to the twin networks, the top conjoining layer computes the same metric as if the same two images were presented to the opposite twins.
Siamese neural networks [63] use one-shot classification instead of the typical classification technique discussed earlier, requiring only one training example per class. These networks learn a similarity function rather than learning how to classify an input as one of the organization's thirty employees [65]. This similarity function takes two images as input and returns a similarity score that indicates how similar the images are. The image acquired in real time is compared to a reference image present in the database. The outcome is a similarity score, which ranges from 0 to 1, with 0 denoting no resemblance and 1 denoting total similarity. To interpret this result, a certain threshold can be chosen. Siamese neural networks help resolve the two challenges outlined in the preceding paragraphs and offer a further benefit. First, a good model can be constructed with only a few images in the training phase. Second, the database needs to be updated with only a single new image whenever a new person is added. Instead of retraining the entire model, this new image can be utilized as a reference, exactly like the previous ones, for the system to calculate the similarity. Third, Siamese neural networks have been shown to produce excellent results even when one of the classes used for training has far fewer samples than the others. These benefits have made Siamese neural networks useful in a variety of fields. The proposed Siamese network is trained using single, dual, and triple beat images. In Figure 3, six different pairs of ECG beats are prepared with the different beat combinations. The training and testing phases of the network are shown in Figure 4 and Figure 5. The structure of the individual CNN used in this work is shown in Figure 6.
Figure 3. (a-f) Pairs of ECG beats from the same and different persons used to train the Siamese network; (a-c) are from subject-1 ECG data, and the rest are from subject-2 ECG data.
Figure 5. Block diagram of the proposed biometric authentication system during the test phase.
Figure 6. CNN used in the Siamese deep learning technique (labels in the diagram: ECG beat images; feature maps 64@26x26; flatten 89856; dense 90x90).
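The weight-sharing and symmetry properties described in this section can be illustrated with a toy example. The sketch below is not the CNN of Figure 6: the single linear layer with a tanh non-linearity, the weight values, and the mapping from distance to a similarity score in (0, 1] via exp(−d) are all assumptions made only to keep the example small.

```python
import math


def embed(image, weights):
    """Toy shared branch: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, image))) for row in weights]


def siamese_distance(image_a, image_b, weights):
    """Pass both inputs through the same weights and return the
    Euclidean distance between the two feature vectors."""
    return math.dist(embed(image_a, weights), embed(image_b, weights))


def similarity(distance):
    """Map a distance to a score in (0, 1]; exp(-d) is an assumed choice."""
    return math.exp(-distance)


# Hypothetical weights and flattened "images" with three pixels each.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
a, b = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]

d_ab = siamese_distance(a, b, W)
d_ba = siamese_distance(b, a, W)
print("distance a,b:", d_ab, "distance b,a:", d_ba)
print("similarity of an image with itself:", similarity(siamese_distance(a, a, W)))
```

Because both branches use the same weights, swapping the inputs leaves the distance unchanged, and identical inputs always map to a distance of zero.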

Customised Activation Function of the Proposed Method
In most cases, a well-chosen activation function speeds up convergence while simultaneously increasing accuracy and processing efficiency. Ideally, activation functions should be monotonic, differentiable, and rapidly converge to a constant in terms of the weights [66]. To improve upon the standard sigmoid activation function, the authors designed a new, more generalized activation function that converges much more quickly than conventional activation functions such as the sigmoid. The equation of our activation function is as follows:
The characteristics of the customized activation function are compared with those of the standard sigmoid activation in Figure 7. From the figure, we can see that our activation function is flexible and converges more quickly than the sigmoid function [66].
For the training of the Siamese network, two images, each of size 112 × 112 pixels, are applied in parallel; the CNN extracts the features from each image and represents them in the form of a feature vector. The feature vectors are then used to calculate the Euclidean distance. The system uses a contrastive loss function to calculate the error and train the model. The loss function trains the network such that beats belonging to the same person have a smaller Euclidean distance than beats of different persons. The detailed layers and sizes of the Siamese network are shown in Table 2. The contrastive loss takes the network's output for a positive example, calculates its distance to an example of the same class, and contrasts that with the distance to negative examples.
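The paper does not state the exact form of the contrastive loss, so the sketch below uses a commonly cited formulation as an assumption: for a same-person pair (label 1) the loss is the squared Euclidean distance, and for a different-person pair (label 0) it is the squared shortfall of the distance below a margin. The margin value of 1.0 and the function name are illustrative.

```python
import math


def contrastive_loss(feat_a, feat_b, label, margin=1.0):
    """Contrastive loss for one pair of feature vectors.

    label 1 (same person): penalise any distance between the features.
    label 0 (different persons): penalise only distances below the margin.
    """
    d = math.dist(feat_a, feat_b)
    if label == 1:
        return d ** 2
    return max(margin - d, 0.0) ** 2


# Same-person pair that is far apart: large loss pulls the features together.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], label=1))
# Different-person pair already beyond the margin: no loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], label=0))
# Different-person pair inside the margin: loss pushes the features apart.
print(contrastive_loss([0.0, 0.0], [0.6, 0.0], label=0))
```

Minimising this quantity over many pairs drives same-person beats towards small distances and different-person beats towards distances of at least the margin, which is the behaviour described in the text.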

Experimental Results
The Siamese network is trained using the TensorFlow backend on a system with a 10th-generation i7 processor, an RTX 5000 GPU (8 GB), and 16 GB of RAM. The network uses the Keras early-stopping function with a patience parameter, which automatically stops training if the accuracy reaches a plateau, i.e., if consecutive values of accuracy are equal, the model stops training. In this work, the patience parameter is set to 10. The proposed system is based on information retrieval, so after training is complete, a database is used during testing to verify the identity of a person. The incoming person's ECG beat image is compared with each reference image in the database, which holds the reference images of all people, and the similarity index is computed for each comparison. The highest similarity index corresponds to the person with whom the incoming person is matched. To make the system more resilient, a 75% threshold is added: if the best-matching similarity index is less than 75%, the system outputs that no match was found. The dataset for single ECG beats in the frame is described in Table 3. As the Siamese network is trained with image pairs, we created a total of 17,100 image pairs from 90 people for label 1. To balance the dataset for label 0, we paired the first five single-beat images of each person with the first beat image of 38 random people, creating a total of 17,100 pairs. Similarly, the dataset for dual beats in the frame is shown in Table 4. The hyper-parameters of the Siamese model are reported for the different numbers of beats in the frame. For one beat in the frame, the model reaches an accuracy of 91% with a learning rate of 0.01 and stops training after 550 epochs, as the accuracy does not change in further epochs.
Similarly, for two beats in the frame, we created a total of 15,390 image pairs from 90 people for label 1. To balance the dataset for label 0, we paired the first five two-beat images of each person with 34 random people, creating a total of 15,300 pairs. The detailed analysis of the summarized supporting parameters is reported in Table 5. Compared with one beat in the frame, the Siamese model reaches a higher accuracy of 99.85% with a learning rate of 0.01 and only 370 epochs; that is, it achieves higher accuracy with fewer epochs. Finally, we created a dataset consisting of three beats in the frame, as shown in Table 6. We created a total of 13,770 image pairs from 90 people for label 1, and to balance the dataset for label 0, we paired the first five three-beat images of each person with 30 random people, creating a total of 13,700 pairs.
Table 6. Description of the number of training and testing patterns with three beats as a single image.
With three beats in the frame, the model achieves an accuracy of 99.90% with a learning rate of 0.01 and 450 epochs. The accuracy is nearly the same as that of the two-beat model, but the model takes longer to train. The dataset is divided in an 80:20 ratio, with 80% for training and validation and 20% for testing. The effectiveness of the proposed Siamese network was analysed using 10-fold cross-validation. Table 7 reports the average validation results (Avg. ± SD) to allow for comparison. The proposed network is tested by applying the testing beat image as input. The applied beat is compared with all reference beats, and the distance between the feature vectors is calculated. Finally, based on the output probability of the network at the output nodes, the corresponding class is identified. Record-I from the ECGID database is applied to the network for testing purposes; the detailed probabilities of the output nodes are shown in Table 8.
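The retrieval step used at test time, in which the highest similarity index is selected and rejected if it falls below the 75% threshold, can be sketched as follows. The function name, the dictionary layout, and the example scores are hypothetical; the similarity scores are assumed to have been computed already by the trained network.

```python
def identify(similarities, threshold=0.75):
    """Return the best-matching person or None if no match is found.

    similarities: dict mapping person id -> similarity index in [0, 1],
    one entry per reference image in the database.
    threshold: minimum similarity required to accept the best match.
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    return best if similarities[best] >= threshold else None


# Hypothetical scores for an incoming beat against three enrolled persons.
print(identify({"person_1": 0.88, "person_2": 0.15, "person_3": 0.05}))
# An unknown person whose best score stays below the threshold is rejected.
print(identify({"person_1": 0.40, "person_2": 0.35, "person_3": 0.10}))
```

Enrolling a new person under this scheme only requires adding one more reference entry, without retraining the network.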
All the independent event probabilities of the network except P1 are in the range of 0 to 0.20, whereas the value of P1 is 0.91. Hence, the applied image is detected as person 1. The performance of the suggested method is calculated as in [16], and various supporting performance parameters such as accuracy, sensitivity, specificity, F1-score, Matthews correlation coefficient (MCC), area under the ROC curve, and positive predictivity are calculated for the proposed Siamese network. With the triple beat, the network reaches an accuracy of 99.90% in 450 epochs, as shown in Table 9, while with the dual beat it provides 99.85% accuracy in 370 epochs. Hence, the authors concluded that the proposed method provides the best balance of accuracy and training time with a dual beat as a single image.
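For reference, the performance parameters listed above can be computed from the counts of a binary confusion matrix using their standard definitions. The sketch below is a generic illustration; the function name is hypothetical, and the counts in the usage lines are made-up numbers, not results from this work.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard metrics from true/false positive/negative counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    positive_predictivity = tp / (tp + fp)    # precision
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictivity": positive_predictivity,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts only.
for name, value in binary_metrics(tp=90, tn=80, fp=20, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Reporting MCC alongside accuracy is useful because it accounts for all four cells of the confusion matrix.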

Discussions
The proposed Siamese network is compared with recent state-of-the-art techniques in this section. Patro et al. [67] acquired 20 ECG signals from the MIT-BIH and ECGID databases for a minimum of six months for experimental purposes. A high-dimensional (N = 72) set of ECG features is extracted from the data. These features are then given to different algorithms, which reduce the number of features by grouping the most important ones together and eliminating random, correlated, and over-fit features to make the prediction more accurate. With the combination of K-NN and LASSO, the authors reported the highest overall accuracy of 99.13%. A novel LSTM-based framework for person identification utilising ECG signals has been developed by Jyotishi et al. [68]. The suggested approach determines the underlying temporal representation of an ECG signal by taking intra-beat and inter-beat fluctuations into account. The authors showed that the LSTM model captures intra-beat fluctuations more accurately for smaller ECG segments and reported an overall accuracy of 93.11% for the ECGID database. Ciocoiu et al. [69] proposed a convolutional neural network with four different types of ECG signal spatial representations as input. The techniques utilised to transform the initial time series into 2D and 3D images are based on a modified version of the Continuous Wavelet Transform (S-Transform). Extensive experiments have been conducted utilising the UofT and CYBHi datasets, including recordings made on the fingers and palm of the hand throughout a variety of activity situations. The wavelet-based method produced the most accurate findings, with an accuracy of 98.60% on the CYBHi database. Lynn et al. [70] presented a deep recurrent neural network (RNN) based on bidirectional gated recurrent units (BGRU) for person identification via ECG-based biometrics using time-series sequential data.
In addition, GRU cells in RNNs deploy only an update gate and a reset gate in a hidden layer. Because they have fewer gates, GRU cells are more computationally efficient than typical LSTM networks. The BGRU model of Lynn et al., which combines an RNN with GRU cell units in a bidirectional way, obtained a high classification accuracy of 98.55% in the reported experiments. The proposed approach with the customized activation function is evaluated with different blocks of the network in the ablation study reported in Table 10. The ablation study was conducted by assessing various models with distinct combinations of activation functions for dual-beat ECG input images. The suggested work is evaluated using three variations of the database structure derived from the standard ECGID database. The authors convert ECG beats to images to identify significant morphological changes. There is essentially no discernible difference in the proposed network's accuracy with dual-beat and triple-beat images. Therefore, it was concluded that the most effective method for ECG biometric authentication is to use a dual-beat ECG image processed by a Siamese network. In this study, experiments were conducted with over 28,000 beats, and the proposed network may yield significant performance with fine-tuned parameters even when the number of beats increases. Table 11 shows that the proposed method outperforms the state-of-the-art methods, supporting the value of the approach.

Conclusions
In this work, a novel biological-signal-based biometric authentication algorithm has been proposed using a custom Siamese neural network. The algorithm is tested on three custom datasets derived from the publicly available ECGID Database v1.0.0. The paper provides a detailed study of the effect of the number of beats used in an image on the performance of the proposed authentication algorithm, and of the advantage of the proposed method over traditional algorithms. An accuracy of 99.85% is obtained for the ECGID dataset, which is better than the state-of-the-art algorithms existing in the literature. In future work, we intend to deploy the algorithm to the cloud for real-time industrial applications and to implement Android and standalone applications for securing smartphones and laptops. In addition, the authors want to extend the same authentication approach to off-person databases.