Interval-Based LDA Algorithm for Electrocardiograms for Individual Verification

Abstract: This paper presents an interval-based LDA (Linear Discriminant Analysis) algorithm for individual verification using the ECG (electrocardiogram). In this algorithm, unwanted noise and power-line interference are first removed from the ECG signal. Then, the autocorrelation profile (ACP) of the ECG signal, a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals, is calculated. Finally, the interval-based LDA algorithm is applied to extract unique individual feature vectors that represent distance and angle characteristics on short ACP segments. These feature vectors are used during the enrollment and verification processes of individual identification. To validate our algorithm, we conducted experiments using the MIT-BIH ECG database and achieved an EER (Equal Error Rate) of 0.143%, showing that the proposed algorithm is practically effective and robust in verifying an individual's identity.


Introduction
Since the invention of electrocardiogram (ECG) recording in 1903 [1], a considerable number of studies using the electrocardiogram have been performed in the fields of diagnosing cardiac-related diseases [2][3][4][5], detecting sleep apnea [6], monitoring driver drowsiness [7], and measuring blood pressure [8], because of its advantage of noninvasive convenience. Recently, researchers started to use the ECG biometrics to verify individual identity [9]. The ECG is a suitable tool because the signal itself constantly changes depending on the sympathetic/parasympathetic activities and it is resistant to attacks and duplication [10][11][12].
According to the latest US Federal Trade Commission report published in March 2019 [13], identity theft and cybercrime ranked third among the issues most complained about. Such kinds of identity fraud and hacking are becoming seriously dangerous. Unfortunately, they are unavoidable as long as we use password-based identity authentication systems. Therefore, new types of biometrics such as facial characteristics [14], gait [15], fingerprints and iris scans [16], vein pattern [17], etc., have been developed. Nevertheless, References [18,19] show that they are still vulnerable to replay/falsification attacks even though they perform well with very low identification errors. To avoid problems related to such attacks, a biometric should be variable, so that it cannot be stolen, duplicated, or shared. Thus, many studies show great interest in the ECG signal as a promising alternative biometric [20][21][22][23].
The ECG signal is non-stationary and thus continuously changes over time. The two limbs of the autonomic nervous system (i.e., sympathetic and parasympathetic nerves) are the most important factors determining the changes in the heart rate. This type of variability is called intra-subject variability [24][25][26], that is, the changes occur for the same individual over time. There is also inter-subject variability found in the ECG's waveform that depends on the position, size, and anatomy of the heart, age, gender, relative body weight, chest configuration, and various other factors [27,28]. Therefore, the ECG signal varies from person to person, implying that we can use it as a biometric for the verification of an individual's identity.
There have been numerous attempts at applying ECG inter-subject variation for individual verification/identification [20][21][22][23][29][30][31][32], which can be classified into two categories: the fiducial dependent approach [20][21][22][23] and the fiducial independent approach [29][30][31][32]. The fiducial dependent approach relies on local features, such as time duration and amplitude differences between specific points of interest in an ECG, so it is less affected by heart rate variations. However, it has the possibility of missing the overall morphological information that might be useful in identifying individuals. Many studies have focused on fiducial dependent features such as time intervals, angles, and amplitudes between specific fiducial points on the QRS complex [20,[33][34][35].
On the contrary, the fiducial independent approach is to find the feature of the ECG signal in the frequency domain rather than the time domain, or to extract the whole morphological features that appear to offer excellent discrimination information among individuals. Many algorithms based on frequency transformation have been proposed for extracting fiducial independent features from ECG signals, in which they use wavelet transform or DCT (Discrete Cosine Transform) to convert ECG signals in the time domain to the frequency domain [36][37][38][39][40]. The autocorrelation method also has been gaining considerable attention because of its excellent performances reported in many studies. It eliminates the need for fiducial point localization in the ECG signal by using an autocorrelation (AC) of the segmented ECG signal followed by DCT or LDA [41][42][43]. More recently, Safie [44] introduced a new feature extraction method, known as Pulse Active Ratio (PAR), that is implemented on electrocardiogram signals for biometric authentication. The Pulse Active Ratio uses pulse width modulation (PWM) to generate new ECG feature vectors and thereby can adapt to changes in heart rate. However, such fiducial independent approaches require tremendous computational effort because the parameters should be generated and compared for every person listed in the database during the verification stage. Furthermore, the database is often updated with intra-subject variability, in which the computational cost increases exponentially as the size of the database increases. This computational cost problem can be mitigated significantly by combining the above-mentioned two approaches.
In this paper, we propose an interval-based Linear Discriminant Analysis (LDA) algorithm, a type of hybrid algorithm that combines the complementary strengths of both fiducial dependent and fiducial independent features, and thereby is able to achieve higher accuracy of identification. The main characteristics of the proposed algorithm are as follows: (i) the use of an autocorrelation profile (ACP) of ECG signal; (ii) the use of short segments of the ACP to extract the interval-based feature vectors, which can be updated every 5 s, showing higher adaptability to intra-subject variation; (iii) the enrollment and verification of the individual's interval-based feature vectors; and (iv) the update of interval-based feature vectors when the user's identity claim is accepted.
The remainder of the paper is organized as follows. Section 2 describes the proposed LDA algorithm, which determines feature vectors on an interval basis from short ACP segments, together with the enrollment process, the verification process, and the algorithm to update the feature vectors. In Section 3, the experimental results are compared with existing ECG identification algorithms based on the same public database. Finally, the conclusion is drawn in Section 4.

Materials and Methods
The ECG signal reflects the electrical activity of the heart. A basic ECG signal cycle is shown in Figure 1 with a single lead ECG sensor. The signal cycle consists of the P wave, followed by the QRS complex, and the T wave [1,33]. In the preprocessing stage, the raw ECG signal's noise is removed or suppressed. For biometric authentication, every user can use his/her ECG sensor in one of the two operation modes, that is, enrollment or verification mode.
In the enrollment mode/stage, a distinctive feature set is extracted from the ACP (autocorrelation profile) of the ECG on an interval basis to create an association between the user and his/her biometric characteristics. Subsequently, the extracted feature set is stored in a template. This process is called training in the enrollment mode. The template is updated over time, mainly to handle the intra-subject variation of cardiac signals. Longer training of the ECG signal leads to lower chances of false rejection (FR). In this study, the enrollment is performed for 2 min to provide a stable signature.
Similarly, during the verification mode/stage, the newly acquired unknown input is compared with the template stored in the database to verify a claimed identity. The verification decision is made through a matching process based on a feature vector threshold. The obtained matching score is finally compared with the threshold to either accept or reject the target user's identity claim.


Preprocessing
We used the ECG data from the MIT-BIH public database to test the proposed algorithm. The MIT-BIH database (MITDB) [34] is the standard ECG database universally used for ECG analyses. It contains 48 records of heartbeats collected from 47 subjects; each record was sampled at 360 Hz and is approximately 30 min long. The subjects were 25 men with ages ranging from 32 to 89 years and 22 women with ages ranging from 23 to 89 years. The data were obtained with a two-channel ambulatory Holter monitor at the Beth Israel Deaconess Medical Center from 1975 to 1979. In most database records, the upper signal is a modified limb lead II (MLII) obtained by placing the electrode on the chest, and the lower signal is usually V1 (occasionally V2, V4, or V5). Normal QRS complexes are usually prominent in the upper signal; thus, we only used the single-lead data (the upper signal, MLII) in the experiment.
To minimize the negative effect of random noise, we applied the scheme proposed by Sornmo [35], in which noisy ECG signals are passed through a bandpass filter (BPF) spanning from 0.5 to 200 Hz to reject the DC component and high-frequency noise. Subsequently, we implemented a notch filter and a short-time Fourier transform (STFT) to eliminate the power-line interference and its harmonics. Figure 3 shows the preprocessing procedures.
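The preprocessing chain above can be sketched as follows. This is a minimal SciPy sketch, not the authors' implementation: it assumes a 60 Hz mains frequency, omits the STFT harmonic-removal step, and applies only the 0.5 Hz high-pass edge of the stated 0.5–200 Hz band, since 200 Hz exceeds the 180 Hz Nyquist limit of the 360 Hz MITDB recordings.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 360  # MIT-BIH sampling rate (Hz)

def preprocess_ecg(x, fs=FS, powerline=60.0):
    """Remove baseline/DC drift and power-line interference from an ECG."""
    # High-pass at 0.5 Hz rejects the DC component and baseline wander.
    # (The paper's 200 Hz upper band edge exceeds the 180 Hz Nyquist
    # frequency of 360 Hz MITDB data, so no low-pass is applied here.)
    b, a = butter(2, 0.5 / (fs / 2.0), btype="highpass")
    y = filtfilt(b, a, x)
    # A notch filter suppresses the power-line fundamental; the paper
    # additionally removes harmonics via an STFT step, omitted here.
    bn, an = iirnotch(powerline / (fs / 2.0), Q=30.0)
    return filtfilt(bn, an, y)
```

Zero-phase filtering (`filtfilt`) is used so the QRS morphology, on which the later feature extraction relies, is not distorted by filter phase delay.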

Feature Extraction
Let the noise-filtered ECG signal be x(n), with time index n. The proposed method strongly relies on small pieces of x(n), called classes, and calculates the ACP (autocorrelation profile) of each class. Then, each calculated ACP is sliced into short-time ACP segments; each segment is called an ACPlet. By dividing each ACP into multiple ACPlets, we can easily capture the temporal pattern of the ACP in detail. The features we use are mainly based on the amplitude and angle patterns of the ACPlets, which identify the similarities between an unknown ACP input and the reference profile. As mentioned in the previous section, during the enrollment mode, the 2 min long ECG signal received from the ECG sensor is considered to be the training set x(n). The training set x(n) is then divided into several classes of length N. Each class of x(n) is labeled x_c(n) with the class index "c", as follows:

x_c(n) = x(n + cN), for 0 ≤ n ≤ N − 1

Subsequently, the ACP Φ_c(m) of x_c(n) can be computed by the following Equation (2a):

Φ_c(m) = Σ_{n=0}^{N−m−1} x_c(n) x_c(n + m), for 0 ≤ m ≤ Q − 1 and a given class c (2a)

where x(n + m) is the time-shifted version of the ECG with a time lag m and Q is the length of the QRS complex. Figure 4 shows how the signals x_c(n) and x_c(n + m) are defined to obtain the ACP, that is, Φ_c(m).
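The class slicing and ACP computation described above can be sketched as follows, assuming the unnormalized autocorrelation form (a sum of lagged products over each class); the helper names are illustrative, not from the paper.

```python
import numpy as np

def split_classes(x, n_samples):
    """Slice the training signal x(n) into C classes of n_samples each."""
    n_classes = len(x) // n_samples
    return [x[c * n_samples:(c + 1) * n_samples] for c in range(n_classes)]

def acp(x_c, q):
    """Autocorrelation profile of one class for time lags m = 0..q-1."""
    n = len(x_c)
    return np.array([np.dot(x_c[:n - m], x_c[m:]) for m in range(q)])

def reference_profile(classes, q):
    """Average the per-class ACPs over all C classes."""
    return np.mean([acp(x_c, q) for x_c in classes], axis=0)
```

With a 2 min enrollment signal and 5 s classes at 360 Hz, `split_classes` yields C = 24 classes of N = 1800 samples each, and q would be set to the subject's detected QRS length Q.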
Figure 4. The two-minute-long training set x(n) and its time-shifted version x(n + m). The training set x(n) is divided into C classes, each of which is N samples long. For example, x_1(n) and x_1(n + m) are the 1st class of x(n) and x(n + m), respectively.
Based on previous research, the QRS complex is considered to be the most prominent waveform of the ECG for individual verification [41,45,46]. Hence, the time shift parameter m of Φ_c(m) was chosen between 0 and Q − 1. In this paper, the length Q, which depends on the individual, was calculated using the QRS detection method based on the Pan-Tompkins algorithm [47]. This individual-dependent length Q helps achieve the high identification accuracy and robustness of the proposed algorithm.
Based on the assumption that the training set x(n) includes C classes, we define a set of ACPs for the C classes as follows:

{Φ_c(m) | c = 1, 2, . . . , C}

Subsequently, the reference profile Φ(m) is defined as the average of all ACPs contained in the training set x(n), given by:

Φ(m) = (1/C) Σ_{c=1}^{C} Φ_c(m)

where C is the total number of classes in the 2 min training set x(n). Figure 5 shows an example of ACPs for two different subjects (subject A and subject B). We assumed that each subject has C classes and those C classes are displayed together within the same time period Q.

From Figure 5, intervals (a) and (c) are where the two subjects are difficult to distinguish using only the amplitude patterns of Φ_c(m), but they are more easily distinguished using the angle patterns of Φ_c(m). It is obvious from the two profiles in (a) and (c) that their angle patterns are significantly different, where the angle pattern is defined as the angles between the horizontal x-axis and the ACPlet. On the other hand, (b) and (d) are the intervals where the two subjects can be easily distinguished using the amplitude patterns of Φ_c(m).

Interval-Based Feature Extraction
These results lead to the idea of an interval-based feature vector. Accordingly, to facilitate the identification between different subjects, each class needs to be sliced into short-time ACP segments, i.e., the ACPlets. Therefore, in this work, the two features of amplitude and angle patterns of ACP produced for every ACPlet are used to distinguish the ECG signals.

Deciding the Optimal Interval Size for Interval-Based Feature Vector
In Section 2.2.2, we realized that dividing ACPs with a short interval and applying different feature vectors to each ACPlet yields a better identification; however, the interval size should be decided appropriately. The optimal interval size is determined based on the maximum interval that satisfies a mean square error (MSE) criterion between the ACPlets and their linearly fitted lines, such that each ACPlet provides clear angle information.
Based on the assumption of the optimal interval size L, the reference profile Φ(m) can be equally divided into Q/L profiles f_i(v), defined as follows:

f_i(v) = Φ(iL + v)

where NI = Q/L is the number of intervals, 0 ≤ i ≤ NI − 1, and 0 ≤ v ≤ L − 1. Here, let us define Φ̂(m) as the set of profiles g_i(v), where g_i(v) is the linear (least-squares) fit of f_i(v):

Φ̂(m) = {g_i(v) | 0 ≤ i ≤ NI − 1}, 0 ≤ v ≤ L − 1

Thus, the optimal interval L is the maximum length that satisfies the following MSE (ζ) criterion between the two profiles Φ(m) and Φ̂(m):

L = max L  such that  ζ = Σ_{i=0}^{NI−1} Σ_{v=0}^{L−1} (f_i(v) − g_i(v))² ≤ ε, for 2 ≤ NI ≤ Q/2
where ε is the minimum tolerance of the MSE and NI (= Q/L) is the number of intervals. The symbols used in the above process are illustrated in Figure 6.
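The interval-size selection described above can be sketched as follows: fit a line to each candidate ACPlet, measure the fit MSE, and take the largest L whose MSE stays within the tolerance. The sketch assumes ε is expressed as a fraction of the NI = 2 baseline MSE, following the Figure 7 convention that ζ is 100% at NI = 2.

```python
import numpy as np

def fit_mse(profile, L):
    """MSE between the length-L ACPlets f_i(v) of a reference profile and
    their least-squares line fits g_i(v)."""
    ni = len(profile) // L
    v = np.arange(L)
    err = 0.0
    for i in range(ni):
        f = profile[i * L:(i + 1) * L]
        slope, intercept = np.polyfit(v, f, 1)   # linear fit g_i(v)
        err += np.sum((f - (slope * v + intercept)) ** 2)
    return err / (ni * L)

def optimal_interval(profile, eps_ratio=0.20):
    """Largest L whose fit MSE is within eps_ratio of the NI = 2 baseline,
    searching from the coarsest slicing (NI = 2) downward."""
    q = len(profile)
    base = fit_mse(profile, q // 2)              # zeta taken as 100% at NI = 2
    for L in range(q // 2, 1, -1):
        if fit_mse(profile, L) <= eps_ratio * base:
            return L
    return 2
```

Shorter intervals always fit better (a 2-point ACPlet is fit exactly by a line), so the search terminates; the criterion picks the longest interval that still yields clear angle information.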
In Figure 7, the MSE ζ represents a ratio based on the assumption that ζ is 100% when NI is set to 2, where (a) represents the minimum tolerance ε, which is experimentally set to 20%; (b) is the NI value satisfying (a), which corresponds to Q/L; and (c) is the identification accuracy when the NI value corresponds to (b).
Figure 7 shows that the MSE ζ decreases as NI increases; a higher NI is proportional to the identification accuracy η but requires more computations for the verification process. Accordingly, there is a trade-off between NI and computational cost. We also observed that the identification accuracy improves with increasing NI; however, the improvement is not significant. For instance, as NI increases from Q/L to 50, an improvement of less than 0.2% is obtained. In conclusion, the performance of the proposed algorithm is not highly sensitive to NI as long as it is greater than Q/L. Finally, NI is set to Q/L, the value at which the MSE ζ corresponds to ε, since there is no big difference in the final performance as long as NI is greater than Q/L.


Threshold Values for Classification
Our algorithm requires two types of interval-based thresholds for the robust verification of intra-subject variation. One threshold is for the amplitude feature vector and another is for the angle feature vector, denoted TH φ (i, v) and TH θ (i), respectively, where the variable i indicates the interval index. These thresholds are used to check the similarity between the amplitude and angle feature vectors of the newly incoming ACPlets to those stored in the template.

Threshold of Amplitude Similarity: TH φ (i, v)
Step 1: Consider a subject A for the training set and define the ACP as Φ_c^A(m), consisting of the ACPlets φ_c^A(i, v). In matrix form, the rows correspond to the C classes and the columns to the intervals.

Step 2: Calculate the average in a vertical way (over the C classes) to obtain the reference profile φ^A(i, v):

φ^A(i, v) = (1/C) Σ_{c=1}^{C} φ_c^A(i, v), for a given i and v (10)

Step 3: Compute the amplitude distance between each class and the reference profile:

∆φ_c^A(i, v) = |φ_c^A(i, v) − φ^A(i, v)|

where the distance ∆φ_c^A(i, v) is large if the descriptors are further apart in the descriptor space and small if they are closer together.
We obtain the amplitude thresholds TH_φ(i, v) and TH_φ^max(i, v) from the distribution of the intra-subject distances ∆φ_c^A(i, v) over the C classes. Note that the amplitude distance ∆φ_c^A(i, v) is small, and thus TH_φ(i, v) is most likely smaller than TH_φ^max(i, v), since this parameter reflects the intra-subject distance.
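Steps 1–3 can be sketched as follows. The exact threshold formulas are not given above, so this sketch makes the labeled assumption that TH_φ(i, v) is the mean and TH_φ^max(i, v) the maximum of the per-class distances, which is consistent with TH_φ(i, v) being most likely smaller than TH_φ^max(i, v).

```python
import numpy as np

def amplitude_thresholds(acplets):
    """acplets: array of shape (C, NI, L) holding the per-class ACPlets
    phi_c(i, v). Returns the reference profile, distances, and thresholds."""
    ref = acplets.mean(axis=0)          # Step 2: average over the C classes
    dist = np.abs(acplets - ref)        # Step 3: amplitude distance
    th = dist.mean(axis=0)              # TH_phi(i, v): mean distance (assumed)
    th_max = dist.max(axis=0)           # TH_phi_max(i, v): max distance (assumed)
    return ref, dist, th, th_max
```

Because both thresholds are computed per interval i and per sample v, the verification stage can weigh each ACPlet independently rather than thresholding the whole profile at once.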

Threshold of Angle Similarity: TH θ (i)
To obtain TH_θ(i), we calculated the scatter within class, representing the degree of dispersion in a class (intra-subject variability), and the scatter between classes, representing the degree of dispersion between different classes (inter-subject variability) [43].
Step 1: Consider a data set of (x, y) and (x, r) for a given i_o ∈ i.

Step 2: Calculate the within-class scatter S_W and the between-class scatter S_B of the datasets (x, y) and (x, r), where u_y and u_r indicate the means of y and r, respectively.
Step 3: Calculate the weight vector matrix in terms of S_W and S_B, and use the eigenvector with the highest eigenvalue as the most discriminant projection vector:

S_W^{-1} S_B V = λV

where V is the eigenvector corresponding to the highest eigenvalue λ. Let us define θ_c^A(i_o) as the angle θ from the x-axis to the most discriminant projection vector V of the datasets (x, y) and (x, r).
Step 4: Define θ A c (i o ) as the angle distance per interval between dataset (x, y) and (x, r) for a given c and i o .
We obtain the interval-based angle thresholds TH_θ1(i_o), TH_θ2(i_o), TH_θmin(i), and TH_θmax(i) from the distribution of the angle distances θ_c^A(i_o) over the C classes. Similar to the amplitude distance ∆φ_c^A(i, v), the angle distance θ_c^A(i_o) lies mostly between TH_θ1(i_o) and TH_θ2(i_o) and almost always between TH_θmin(i) and TH_θmax(i). Based on this process and the subsequent updating process, the range of the intra-subject variation can be determined and the advantage of the constantly changing ECG signal can be reaped, which is beneficial for verification using the ECG.

Verification
In addition to the biometric data for subject A, the template database includes information about the identifier related to subject A. Here, the term "biometric data" refers to the φ^A(i, v), ∀i ∈ training set, stored along with TH_φ(i, v) and TH_θ(i) in the template database.
During the verification stage, the system validates the identity claimed by an unknown person by checking the similarity of the person's biometric data to the enrolled data retrieved by the identifier input by subject A.
Consider an unknown subject B to be verified. If an unknown person B is detected by the biometric system during the verification, first, a biometric input is extracted from the person by calculating the feature vector, φ B c (i, v) ∀i, c ∈ verification mode . Subsequently, the system calculates the difference between the unknown input and templates, resulting in the amplitude and angle distances in the same manner as during enrollment. Next, the system calculates the similarity to determine how close the subjects A and B are. The computation is carried out by applying a similarity threshold to the distances and normalizing the resulting value to the range [0,1], where 0 means "no match at all" and 1 means "perfect match". In this paper, two similarity measures are considered, and those are the similarities of the amplitude and angle distances, denoted as S φ and S θ , respectively.
Subsequently, we define a matching score S to quantify the match between the unknown and template subject by concatenating the two similarities into S = 0.5S φ + 0.5S θ , where both similarities are equally weighted. Given a matching score S, the system checks whether the score S is above a certain threshold S TH for the signature classification. A detailed explanation of the verification is given below.
Step 1: First, consider an unknown subject B to be verified and let its ACP be Φ_c^B(m), which belongs to class c. It can be represented by the set of its ACPlets φ_c^B(i, v). The amplitude distances between these ACPlets and the template are then compared with TH_φ(i, v), in the same manner as during enrollment, to obtain the amplitude similarity S_φ. For the angle similarity, if the angle distance θ_c^B(i) lies within [TH_θmin(i), TH_θmax(i)] but outside [TH_θ1(i), TH_θ2(i)], the corresponding ACP interval is counted as "partially accepted", that is, the similarity score is Λ(i) = TH_θ(i)/θ_c^B(i), which has the range [0,1]; if θ_c^B(i) < TH_θmin(i) or θ_c^B(i) > TH_θmax(i), the corresponding ACP interval is counted as "declined", that is, the similarity score is Λ(i) = 0; otherwise, if TH_θ1(i) ≤ θ_c^B(i) ≤ TH_θ2(i), the corresponding ACP interval is counted as "accepted" and the similarity score is given by Λ(i) = 1. The similarity of the angle (S_θ) is determined as the average of the similarity scores calculated on an interval-by-interval basis.
Step 6: Calculate the matching score. For every incoming class, we define a matching score S = 0.5S_φ + 0.5S_θ with a range from 0 to 1, where 1 means a perfect match. Given a matching score S, the system finally determines whether the score S is above a certain threshold S_TH, called the matching threshold, for signature identification.
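The per-interval angle scoring and the final matching score can be sketched as below. Two labeled assumptions: the "partially accepted" band is taken as the region inside [TH_θmin, TH_θmax] but outside [TH_θ1, TH_θ2] (the complement implied by the three cases), and the ratio TH_θ(i)/θ is clipped to 1 so the score stays in [0, 1]; angle distances are assumed positive.

```python
def angle_interval_score(theta, th1, th2, th_min, th_max, th):
    """Similarity score for one ACP interval; assumes positive angle
    distances with th_min <= th1 <= th2 <= th_max."""
    if theta < th_min or theta > th_max:
        return 0.0                       # declined
    if th1 <= theta <= th2:
        return 1.0                       # accepted
    return min(th / theta, 1.0)          # partially accepted (clipped to [0, 1])

def matching_score(s_phi, s_theta):
    """Step 6: equally weighted matching score S = 0.5*S_phi + 0.5*S_theta."""
    return 0.5 * s_phi + 0.5 * s_theta
```

S_θ is then the mean of `angle_interval_score` over all intervals of the incoming class, and the claim is accepted when `matching_score(s_phi, s_theta)` exceeds S_TH.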

Updating Template
As mentioned in Section 1, an individual ECG signal changes over time due to intra-subject variability. Therefore, if a subject uses a fixed template for a long time, the subject may be falsely rejected during the verification. To solve this problem, we applied an updating process by continuously performing the training process for every class. For instance, for a new C th class incoming shortly after the training process of classes c ∈ {0, 1, . . . , C − 1}, if the matching score S is larger than the threshold S_TH, the new C th class is accepted as a new member for the template update. Subsequently, the newly accepted class is appended to the training set, and thus the latest accepted C classes, c ∈ {1, 2, . . . , C}, are considered to be the new training set for the update. The template update for subject A is described below.
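This sliding-window acceptance rule can be sketched as follows. The window size of 24 follows from the 2 min enrollment divided into 5 s classes; the threshold value passed in is illustrative, not taken from the paper.

```python
from collections import deque

def update_template(training_set, new_class, score, s_th, max_classes=24):
    """Sliding-window template update: if the matching score exceeds the
    threshold, append the accepted class and drop the oldest one so the
    template always holds the latest C classes (24 here: 2 min / 5 s)."""
    if score > s_th:
        training_set.append(new_class)
        if len(training_set) > max_classes:
            training_set.popleft()
    return training_set
```

After each accepted class, the ACPs, reference profile, and thresholds are recomputed from the updated window, so the template tracks the subject's intra-subject drift.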

Calculating the Amplitude and Angle Distances
The latest accepted C classes are re-indexed using c ∈ {0, 1, . . . , C − 1}. (i) Use the C classes as the new training set for subject A and find the ACP Φ^A_c(m), which consists of the ACPlets φ^A_c(i, v). (ii) Next, find the profile Φ^A(m), defined for a given i and v. (iii) Finally, compute the amplitude distance Δφ^A_c(i, v) and the angle distance θ^A_c(i), where V is the most discriminant projection vector between φ^A(i, v) and φ^A_c(i, v).
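Steps (i)-(iii) can be sketched as follows. This is only an illustration under assumed shapes: `acplets` is a hypothetical array of shape (C, I, V) holding the ACPlets of the C training classes, and, since the paper's discriminant vector V is not reproduced here, the sketch substitutes the leading singular direction of the deviations as a stand-in projection.

```python
import numpy as np

def amplitude_and_angle_distances(acplets):
    """Sketch of steps (i)-(iii), assuming `acplets` has shape (C, I, V):
    C classes, I ACP intervals, V samples per ACPlet. The LDA projection
    vector is replaced by the leading singular direction of the deviations,
    for illustration only."""
    mean_profile = acplets.mean(axis=0)       # profile Phi^A, averaged over classes
    amp_dist = acplets - mean_profile         # amplitude distance Delta-phi^A_c(i, v)

    angle_dist = np.empty(acplets.shape[:2])  # angle distance theta^A_c(i)
    for i in range(acplets.shape[1]):
        # stand-in projection vector for interval i (leading right-singular vector)
        dev = amp_dist[:, i, :]
        _, _, vt = np.linalg.svd(dev, full_matrices=False)
        proj = vt[0]
        for c in range(acplets.shape[0]):
            # angle between the class ACPlet and the projection direction, in degrees
            cosang = np.dot(acplets[c, i], proj) / (
                np.linalg.norm(acplets[c, i]) * np.linalg.norm(proj) + 1e-12)
            angle_dist[c, i] = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return amp_dist, angle_dist
```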

Updating the Threshold of Amplitude Similarity

Updating the Threshold of Angle Similarity
By updating the thresholds of the amplitude and angle similarities using the above process, the ECG changes due to intra-subject variation can be continuously reflected in the template. In this study, the verification/template-updating cycle is repeated for every class, and each class is specified to be 5 s long, which is fast enough for real-life applications.

Experiment and Results
An experiment was carried out using a public open database, available on the PhysioNet website [34], to evaluate the proposed algorithm.
The MIT-BIH arrhythmia database (MITDB) contains 48 records of heartbeats collected from 47 subjects (records 201 and 202 are from the same male subject). Each record is approximately 30 min long and was sampled at 360 Hz. The subjects are 25 men aged between 32 and 89 years and 22 women aged between 23 and 89 years. The data were obtained using a two-channel ambulatory Holter monitor at the Beth Israel Deaconess Medical Center between 1975 and 1979. This is the most frequently cited database in arrhythmia studies, and it was used here to evaluate the efficiency of the proposed algorithm. Two 2-min intervals were randomly selected from each record: the first interval was used for training and the second for verification. Only single-lead data were used in the experiment.
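The data split can be sketched as below. Note the hedge: the paper selects the two 2-min intervals randomly, while for simplicity this sketch takes the first two consecutive intervals of a record; `split_record` and its argument are our own names.

```python
FS = 360                 # MITDB sampling rate in Hz
TWO_MIN = 2 * 60 * FS    # number of samples in a 2-minute interval

def split_record(signal):
    """Take two 2-minute intervals from one single-lead MITDB record:
    the first for enrollment (training), the second for verification.
    The paper picks the intervals randomly; this sketch simply takes
    the first two consecutive ones."""
    assert len(signal) >= 2 * TWO_MIN, "record shorter than 4 minutes"
    train = signal[:TWO_MIN]
    verif = signal[TWO_MIN:2 * TWO_MIN]
    return train, verif
```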
The error rate can be calculated for each subject, as shown in Figure 8. FR (False Rejection) means that the subject was rejected despite being the enrolled person; FA (False Acceptance) means that an imposter was accepted as the enrolled person. The distance threshold was chosen at the intersection of the FR and FA curves for each subject. If there is exactly one intersection point, the threshold and EER are determined at that point; if there are two or more intersection points, the threshold is set to the average of the largest and smallest intersecting values, in which case the EER is 0, as shown in Figure 8.
Figure 9a shows an example of obtaining the EER for each individual, and a histogram of the EERs (Equal Error Rates) is shown in Figure 9b. The average EER, defined in Equation (32), is 0.143%, which indicates the efficiency of the overall algorithm:

EER_avg = (1/47) Σ_{Sub=1}^{47} EER(Sub),   (32)

where Sub indicates the subject number. Only a single user was enrolled at any point in time; in other words, at any point in time, we had one client and 46 imposters.
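The per-subject EER computation at the FR/FA intersection can be sketched generically as follows. This is a standard threshold-sweep sketch, not the paper's exact procedure, and the function name and arguments are ours.

```python
import numpy as np

def eer_from_scores(genuine, imposter, n_steps=1000):
    """Equal Error Rate from genuine and imposter matching scores.

    Sweeps the threshold, computes FRR (genuine scores below threshold)
    and FAR (imposter scores at or above threshold), and returns the
    threshold where the two rates are closest -- the FR/FA intersection
    used per subject in Figure 8. A generic sketch only."""
    genuine = np.asarray(genuine, dtype=float)
    imposter = np.asarray(imposter, dtype=float)
    lo = min(genuine.min(), imposter.min())
    hi = max(genuine.max(), imposter.max())
    best_th, best_eer, best_gap = lo, 1.0, np.inf
    for th in np.linspace(lo, hi, n_steps):
        frr = np.mean(genuine < th)      # false rejection rate
        far = np.mean(imposter >= th)    # false acceptance rate
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer, best_th = gap, (frr + far) / 2, th
    return best_th, best_eer
```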
To demonstrate the efficiency of our proposed algorithm, we compared it with related works that used the same database (MIT-BIH), as shown in Table 1. Table 1 shows that the EER (Equal Error Rate) in Reference [48] is higher than that of the proposed algorithm despite using ECG data from two leads. Most recent ECG wearable devices, such as smart watches, use only one lead; thus, the proposed algorithm is more suitable for real-life applications. In Reference [48], a fixed shifting size was used for the autocorrelation, while the proposed algorithm uses a flexible shifting size equal to the individual's QRS width, which requires less computational effort. Although fused information from two ECG leads was used in Reference [49], its IDA (Identification Accuracy) is lower than that of the proposed algorithm; moreover, its enrollment time is 5 min, which is longer than that of our algorithm. A fiducial-point-dependent method was used in References [28,50-52], which is vulnerable to noise. Only 10-20 s of data were used for enrollment and verification in References [52,53], whereas we used two 2-min intervals for enrollment and verification to verify the stability of the proposed algorithm. Only direct-contact sensors were used in References [28,48-53] to validate those algorithms; in contrast, we used a non-contact single-lead sensor suited to real-life applications. In general, a fiducial-dependent approach has the advantage of identifying a subject in a relatively short time with less computational effort, but it is easily affected by external noise; a fiducial-independent approach is less affected by external noise but requires more computational effort and a longer authentication time.
To balance the trade-offs between these two approaches, the computational load was reduced through the ACP-based amplitude feature, and the identification accuracy was improved through the interval-based angle feature.
However, in this study, the matching score was calculated using equal weights for the two features, which is a weakness under varying environmental conditions. Therefore, in future research, we intend to explore variable weights to design an algorithm that is robust to environmental variables.

Conclusions
In this paper, a novel interval-based LDA algorithm was proposed. Experiments show higher identification accuracy compared with existing ECG-based identification algorithms that use either a fiducial-point-dependent or a fiducial-independent method. The proposed algorithm complements the disadvantages of each method by combining a fiducial-point-dependent feature (amplitude distance, computed point by point) with a fiducial-point-independent feature (angle distance, computed per feature vector). In addition, it maintains identification accuracy by tracking the ECG as it changes in real time through the template-updating process.
This algorithm was tested on a public ECG database (MIT-BIH), and its performance was compared with related studies using the same database. The results show that the proposed algorithm is more reliable than the other algorithms in terms of EER (Equal Error Rate). However, the algorithm has the disadvantage of requiring 5 s for each identification. Therefore, in future research, we intend to reduce the identification time and move toward commercialization through multimodal identification combined with other biometrics.