Human Identification by Cross-Correlation and Pattern Matching of Personalized Heartbeat: Influence of ECG Leads and Reference Database Size

Human identification (ID) is a biometric task that compares a single input sample to many stored templates to identify an individual in a reference database. This paper aims to present the perspectives of the personalized heartbeat pattern for reliable ECG-based identification. The investigations use a database with 460 pairs of 12-lead resting electrocardiograms (ECG) with 10-s durations recorded at time-instants T1 and T2 > T1 + 1 year. Intra-subject long-term ECG stability and inter-subject variability of personalized PQRST (500 ms) and QRS (100 ms) patterns are quantified via cross-correlation, amplitude ratio and pattern matching between T1 and T2 using 7 features × 12 leads. Single- and multi-lead ID models are trained on the first 230 ECG pairs. Their validation on 10, 20, ... 230 reference subjects (RS) from the remaining 230 ECG pairs shows: (i) two best single-lead ID models using lead II for a small population RS = (10–140) with identification accuracy AccID = (89.4–67.2)% and aVF for a large population RS = (140–230) with AccID = (67.2–63.9)%; (ii) better performance of the 6-lead limb vs. the 6-lead chest ID model: (91.4–76.1)% vs. (90.9–70)% for RS = (10–230); (iii) best performance of the 12-lead ID model: (98.4–87.4)% for RS = (10–230). The tolerable reference database size, keeping AccID > 80%, is RS = 30 in the single-lead ID scenario (II); RS = 50 (6 chest leads); RS = 100 (6 limb leads); RS > 230, the maximal population in this study (12-lead ECG).


Introduction
Biometric identity recognition systems using human body measurement data are vital for the security of financial transactions, access control, travel, etc. The various biometrics that are currently being adopted in identity recognition scenarios, such as fingerprint, iris, face and speech recognition, have operational trade-offs in terms of performance, measurability, circumvention and liveness detection [1]. These limitations have led to the development of smart solutions such as: (i) improving voice recognition by sophisticated pre-processing, followed by heuristic selection of significant features in time and frequency domains [2]; (ii) integrating face- and body-related soft biometric traits [3]; (iii) providing security and flexibility by simultaneous application of numeric password data and biometric fingerprint information [4]; (iv) assuring liveness detection via optical sensors, which establish the presence of pulse, the variations of optical characteristics caused by pressure changes and skin reaction to illumination with different wavelengths [5].
For more than a decade the electrocardiogram (ECG) has been intensively investigated in the context of biometric applications. The attention is attracted by its ability to provide an unambiguous

• Person verification, when the system accepts or rejects someone's identity based on a single comparison to the data for a given ID in the reference database (one-to-one scenario);
• Person identification, when the system finds the identity of an individual based on a search in a reference database for the subject whose data best matches the input data (one-to-many scenario).
In terms of feature extraction methods, identification techniques based on ECG analysis can be assigned to three categories:
- Fiducial-based approaches, which analyse:
• Temporal, amplitude, area and angle features over single lead [9] or 12-lead ECG [16];
• Only temporal features measured in a single lead that are hypothesized to be invariant to sensor placement [12,17,18].
The accuracy of these methods strongly depends on the correct localization of the waves' boundaries within the P-QRS-T segment.
- Non-fiducial methods based on calculation of:
• Autocorrelation functions and their post-processing via discrete cosine transform [14,19];
• Linear discriminant analysis [20] for feature space reduction and further identification;
• Assessment of correlation coefficients between beats with known and unknown identity [21-23];
• Pattern matching and estimation of normalized Euclidean distance between the tested and the target heartbeat [24];
• Features describing the dynamic characteristics and chaotic behaviour in the ECG time series [25].
- Hybrid human identification methods combining both fiducial and non-fiducial strategies, which include:
• Hierarchical identification scheme for temporal and amplitude measurements based on fiducial points detection and estimation of waveforms' similarity via Euclidean distance [26];
• Fiducial-based measurements for reduction of the matching candidates' number followed by wavelet transform and application of the coefficients for template matching [8].
Numerous studies are focused on general-purpose practical applications limited to the use of a single ECG lead, usually from the limbs [10,17,18,23,24]. Others aim at effective channel combination schemes by involving 12-lead ECG [16,20-22] in the analysis, in view of feasible applications in healthcare scenarios. The merit of multi-lead ECG is controversial: on the one hand, the identification accuracy improves with an increasing number of analysed ECG leads in a fiducial-independent application [22]; on the other hand, the same level of identification accuracy is reported for all 12 leads, only limb, only chest and only lead I in a fiducial-based application [16].
Although the majority of the studies have used public [9,10,19,20,23,26] or proprietary [10,16-18,21-24] clinical databases, some authors have addressed the application of different non-invasive portable single-channel ECG sensors which support the acquisition of ECG signals for human identification purposes. The reported identification accuracies are:
- ~99% for 30 ECG signals acquired from the individual via a biosensor integrated into mobile devices [10];
- ~90% for 30 ECG test patterns acquired from the palms of 9 subjects via a self-developed two-pole ECG signal acquisition system [25];
- ~94% for ECG recordings acquired from the fingers of 16 individuals via a self-developed ECG sensor [27].
The use of such ECG sensors facilitates the practical application of ECG biometrics in areas such as cyber-security [25,27], cloud data security or remote healthcare systems [10].
The majority of the cited methods report high identification accuracy. However, certain limitations are observed, such as:
- Small population size (<30 individuals) [16-22,25-27] with ECG recordings acquired in a very short temporal interval [16,21,22] or even during the same session [9,17,18,20,26];
- Application of one and the same database for both training and testing.
This is far from real-world biometric conditions. Significant accuracy degradation could be expected [28] with these methods, due to the influence of different unpredictable factors such as long-term ECG changes [13], noisy environment [14], electrode misplacement, overtraining of the identification model, etc.
The aim of this paper is to present the 12-lead ECG as a biometric modality by describing three methods for comparison between input-to-template waveforms of personalized beat patterns based on cross-correlation, amplitude relations and a new method for binary pattern matching. We further investigate their potential for human identification within an uncommonly large population with one-year distant collection of the reference and test datasets. Unbiased statistical analysis on independent validation set is presented for different perspectives of the human identification problem, considering: (i) the choice of the optimal single and multi-lead ECG set; (ii) the influence of the reference database size, providing evidence about the reasonable size of the population that could be tolerated in ECG biometric studies.

ECG Database
The study uses a proprietary clinical ECG database (Schiller AG, Baar, Switzerland) provided for research purposes and for the investigation of the ECG's potential for human biometrics on a large population observed over time. It contains two 10-s sessions of standard 12-lead resting ECGs from 460 non-cardiac patients (235/225 male/female, 18-106 years old) admitted to the emergency department of the Basel University Hospital between 2004 and 2009. The ECGs were recorded via a commercial ECG device (SCHILLER AT-110 (500 Hz, 2.5 µV/LSB, 0.05 Hz-150 Hz bandwidth), Schiller AG, Baar, Switzerland) at different time-points, i.e. reference point T1, corresponding to the enrolment phase of the authentication task, and remote point T2 > T1 + 1 year, corresponding to the matching/login phase. The 460 ECG couples were equally divided into two datasets from non-overlapping populations for independent training and validation of the designed human identification models:
- Training dataset, including the ECG couples from 1 to 230;
- Validation dataset, including the ECG couples from 231 to 460.

Methods
The identification concept (Figure 1) is based on a comparison between a tested subject with an unknown identity (IDT) from dataset (T2) and a set of N subjects from a previously recorded reference dataset (T1) with known identities (IDR). The purpose of the identification task is to answer the question: "Who is this person?"
The identification accuracy (AccID) is calculated as follows:
$$AccID = \frac{TrueID}{NSubjects} \times 100\ (\%)$$
where the true identifications (TrueID) are the number of all correctly decided cases for which the identity of IDT(T2) exactly corresponds to the identity of IDR(T1), and NSubjects is the number of tested subjects in the reference database.
The block-diagram of the module designed for human identification is presented in Figure 2. The ECG pre-processing and the methods for extraction of cross-correlation and amplitude features are inherited from our recent study on ECG-based human verification [29]. The novelty in this identification model lies in the brand-new method for QRS pattern matching and in the decision making, based on the features' common similarity index. We further describe the methods in detail to facilitate their reproduction.
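The accuracy measure can be illustrated with a minimal sketch. The function name `acc_id` and the toy identity labels are hypothetical and serve only to show the ratio of true identifications to tested subjects.

```python
def acc_id(predicted_ids, true_ids):
    """Identification accuracy in percent: 100 * TrueID / NSubjects."""
    if len(predicted_ids) != len(true_ids) or not true_ids:
        raise ValueError("need two non-empty lists of equal length")
    # TrueID: cases where the decided identity equals the real one.
    true_id_count = sum(p == t for p, t in zip(predicted_ids, true_ids))
    return 100.0 * true_id_count / len(true_ids)

# Hypothetical example: 4 tested subjects, 3 of them identified correctly.
accuracy = acc_id(["s1", "s2", "s7", "s4"], ["s1", "s2", "s3", "s4"])
```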

ECG pre-processing
All ECG recordings are processed by a certified commercial ECG measurement and interpretation module (ETM, Schiller AG, Baar, Switzerland) for the extraction of a 12-lead average PQRST pattern with duration of 500 ms. This pattern, further referred as personalized pattern, provides higher signal-to-noise ratio and greater robustness against respiration-induced morphology changes in ECG as compared to single heartbeats.
Inter-subject standardization of 12-lead PQRST patterns is applied by:
- Detrending (removal of DC offset and linear trend);
- Synchronization of the cardiac depolarization process by time-alignment of the personalized PQRST pattern to a reference pattern by maximal cross-correlation in lead aVR. The reference pattern was selected at the beginning of the identification study. It represents a PQRST segment in lead aVR with normal morphology (negative P, R, T waves) belonging to a healthy subject from the reference database (T1);
- Extraction of the QRS pattern (100 ms), synchronously in all 12 leads within a window of 30 ms before and 70 ms after the fiducial point, aligned to the R-peak of the reference pattern. The additional investigations of this short pattern are motivated by the findings in our previous study [30], which highlight the biometric potential of the R and S-waves and reject the verification capability of the P, ST, T parts due to their low intra-subject reproducibility and low inter-subject variability;
- Calculation of the heart rate corrected ST-T interval by means of Bazett's formula $QTc = QT/\sqrt{RR}$, where QTc is the corrected QT interval and RR is the RR interval.
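Two of the standardization steps can be sketched in a few lines. This is a minimal illustration, not the commercial module's implementation: the least-squares line fit used for detrending, the convention that RR is given in seconds, and the function names `detrend` and `bazett_qtc` are all assumptions.

```python
import math

def detrend(samples):
    """Remove the DC offset and the least-squares linear trend (assumed method)."""
    n = len(samples)
    t_mean = (n - 1) / 2.0
    y_mean = sum(samples) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (y - y_mean) for i, y in enumerate(samples))
    slope = sxy / sxx if sxx else 0.0
    return [y - (y_mean + slope * (i - t_mean)) for i, y in enumerate(samples)]

def bazett_qtc(qt_ms, rr_s):
    """Bazett's correction QTc = QT / sqrt(RR), with RR assumed to be in seconds."""
    if rr_s <= 0:
        raise ValueError("RR must be positive")
    return qt_ms / math.sqrt(rr_s)

flat = detrend([1.0, 2.0, 3.0, 4.0])  # a pure ramp leaves no residual
qtc = bazett_qtc(400.0, 1.0)          # at RR = 1 s the interval is unchanged
```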

Features extraction
Comparison between the waveforms of the personalized PQRST and QRS patterns of a tested vs. reference subject (IDT vs. IDR) is performed independently for each of the 12 leads by three feature extraction methods: cross-correlation analysis of QRS/PQRST and amplitude measurements over the QRS (both applied also in [29]), and a brand-new method for assessment of QRS pattern matching between IDT and IDR.
(1) Cross-correlation analysis of QRS and PQRST patterns (COR-QRS and COR-PQRST), based on the cross-correlation function r(lag) between the patterns of IDT and IDR, where Pattern denotes QRS or PQRST with pattern duration PD = 100 ms or 500 ms, respectively. The lag value is changed in the range [−PD; PD]. Two correlation coefficients are calculated in the normalized scale [0; 100]%:
• Maximum correlation r(max), representative of the best matching of the waveforms;
• Zero-lag correlation r(lag0), representative of the non-synchronized similarity between the ECG patterns.
Reduced values of both correlation coefficients are expected for subjects with different ID (IDT ≠ IDR) due to the different spatio-temporal dynamics of the cardiac vector between subjects.
(2) Amplitude measurements of the QRS patterns (AMP-QRS), estimating the feature ratioQRS: the ratio of the minimal-to-maximal QRS amplitude, representative of the peak-to-peak amplitude equality between the QRS patterns in the studied lead, comparing IDT and IDR recordings. A ratio close to 100% is expected for equal identity subjects (IDT = IDR), considering the same recording conditions in the T1 and T2 sessions (e.g. posture, electrode placement, recording ECG device, etc.) and therefore similar QRS amplitudes.
(3) QRS pattern matching (MATCH-QRS), applying amplitude normalization, time-amplitude approximation, and extraction of features for assessment of QRS pattern matching between IDT and IDR.
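The correlation and amplitude features of methods (1) and (2) can be sketched as follows. This is a hedged illustration under stated assumptions: the cross-correlation is normalized by the energies of both patterns, negative coefficients are clipped to 0 to stay in the [0; 100]% scale, lags are counted in samples, and ratioQRS is taken as the ratio of the smaller to the larger peak-to-peak amplitude. The exact formulas of the study may differ, and all function names are hypothetical.

```python
import math

def xcorr(x, y, lag):
    """Energy-normalized cross-correlation of two equal-length patterns at one lag."""
    n = len(x)
    num = sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)
    den = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    return num / den if den else 0.0

def cor_features(x, y):
    """Return (r_max, r_lag0) in percent; negatives are clipped to 0 (assumption)."""
    n = len(x)
    r_lag0 = xcorr(x, y, 0)
    r_max = max(xcorr(x, y, lag) for lag in range(-(n - 1), n))
    return 100.0 * max(r_max, 0.0), 100.0 * max(r_lag0, 0.0)

def ratio_qrs(x, y):
    """Min-to-max ratio of the peak-to-peak amplitudes of two patterns, in percent."""
    pp_x, pp_y = max(x) - min(x), max(y) - min(y)
    hi = max(pp_x, pp_y)
    return 100.0 * min(pp_x, pp_y) / hi if hi else 100.0

qrs_a = [0.0, 0.1, 1.0, -0.4, 0.0, 0.0]
qrs_b = [0.0, 0.0, 0.1, 1.0, -0.4, 0.0]  # same shape, delayed by one sample
r_max, r_lag0 = cor_features(qrs_a, qrs_b)
amp = ratio_qrs(qrs_a, [2.0 * v for v in qrs_a])
```

In this toy pair the delayed copy reaches full similarity only at a non-zero lag, so r(max) stays high while r(lag0) drops, which is the kind of desynchronization the zero-lag coefficient is meant to expose.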

Amplitude normalization
The amplitudes of the QRS patterns in any pair (IDT, IDR) and any lead are linearly scaled to fit in the range [−1; 1] in order to use one and the same computational range for all individuals and all leads, regardless of their signal amplitudes. This is achieved by dividing each sample of the QRS patterns in a particular ECG lead by the maximal absolute amplitude observed in this lead among QRS_IDT and QRS_IDR.
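A minimal sketch of this scaling for one lead is shown below. The function name `normalize_pair` and the sample values are hypothetical.

```python
def normalize_pair(qrs_t, qrs_r):
    """Scale both QRS patterns of one lead by their common maximal absolute amplitude."""
    peak = max(abs(v) for v in list(qrs_t) + list(qrs_r))
    if peak == 0:
        return list(qrs_t), list(qrs_r)  # flat patterns are left unchanged
    return [v / peak for v in qrs_t], [v / peak for v in qrs_r]

# Hypothetical amplitudes in microvolts for the tested and the reference pattern.
norm_t, norm_r = normalize_pair([0.0, 800.0, -200.0], [0.0, 1600.0, -400.0])
```

Because both patterns share one divisor, their amplitude ratio is preserved and only the common range changes.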

Time-amplitude approximation
The QRS waveforms of IDT and IDR are approximated by a binary transform (binQRS_ID), which converts the QRS pattern into a 2-dimensional matrix.
The equality in the time scale (EQUT) ranges between 0% (null time coincidence, i.e. patterns do not overlap for any binQRS entry over the complete pattern length) and 100% (full-time coincidence, i.e. patterns overlap for all entries over the complete pattern length). The largest EQUT values are expected for equal identity subjects (IDT = IDR).
The equality in the amplitude scale (EQUA) is based on DIFA (difference in the amplitude scale), which represents the area enclosed between the non-overlapping amplitudes of both QRS patterns after binary element-wise multiplication and inversion (NAND operation). The integration interval in the amplitude scale is enclosed between the minimal and maximal QRS amplitudes among the IDT and IDR patterns measured at each specific time index ti, i.e., [amin(ti); amax(ti)]. DIFA ranges between 0% (full-amplitude coincidence, i.e., patterns overlap for all binQRS entries over the complete pattern length) and 100% (null amplitude coincidence, corresponding to the largest pattern differences that cover the full amplitude range). Therefore, minimal DIFA and inversely maximal EQUA values are expected for equal identity subjects (IDT = IDR).
The redundant input feature space is a matrix FEAT{Fi, Li}, where:
- Fi = (1-7) is the index in the feature vector {r(max)PQRST, r(lag0)PQRST, r(max)QRS, r(lag0)QRS, ratioQRS, EQUT, EQUA}, including the option to enable/disable any feature from analysis and thus to test ECG identification scenarios with different feature sets;
- Li = (1-12) is the index of the lead, including the option to enable/disable any lead from analysis and thus to test different single- and multi-lead configurations for ECG identification.

Decision Making
For the purpose of human identification, the best matching between IDT and IDR is found by maximization of the common similarity index IDix of the features.
Optimal identification (ID) models with non-redundant features are trained by forward stepwise feature selection until the accuracy AccID on the training dataset is maximized. The observed accuracy on the independent validation set can be considered an unbiased assessment of the human identification models.
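The decision rule and the training loop can be sketched as follows. This is an illustration under a strong assumption: the similarity index is taken here as the plain mean of the enabled FEAT{Fi, Li} entries, which may differ from the index used in the study. The function names, the greedy stopping rule and the toy data are hypothetical.

```python
def similarity_index(feat, selected):
    """Mean of the selected FEAT[Fi][Li] entries (assumed form of the index)."""
    return sum(feat[f][l] for f, l in selected) / len(selected)

def identify(feat_by_ref, selected):
    """Return the reference ID with the maximal similarity index."""
    return max(feat_by_ref, key=lambda ref: similarity_index(feat_by_ref[ref], selected))

def forward_select(cases, candidates):
    """Greedy forward selection of (feature, lead) pairs while AccID improves.

    cases: list of (true_id, feat_by_ref), one entry per tested subject.
    """
    def accuracy(selected):
        hits = sum(identify(fbr, selected) == true_id for true_id, fbr in cases)
        return 100.0 * hits / len(cases)

    selected, best = [], -1.0
    while True:
        scored = [(accuracy(selected + [c]), c) for c in candidates if c not in selected]
        if not scored:
            break
        acc, choice = max(scored, key=lambda s: s[0])
        if acc <= best:  # stop when no candidate improves the training accuracy
            break
        selected, best = selected + [choice], acc
    return selected, best

# Toy data: 2 features x 1 lead, 2 reference subjects "A" and "B".
cases = [
    ("A", {"A": [[90.0], [50.0]], "B": [[80.0], [95.0]]}),
    ("B", {"A": [[70.0], [60.0]], "B": [[85.0], [90.0]]}),
]
selected, best = forward_select(cases, [(0, 0), (1, 0)])
```

In the toy data the second feature favours subject "B" regardless of the tested identity, so the selection keeps only the first feature; this mirrors how a redundant or misleading feature is left out of a non-redundant model.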
The software package Matlab (The Mathworks Inc., Natick, MA, USA) was used for the management of the signal processing and statistical study. This includes training and validation of the forward stepwise ID models (reported as mean ± confidence interval (CI)) as well as for comparison of continuous feature distributions (represented as mean ± standard deviation (std)) via paired Student's T-Test. A value of p ≤ 0.05 is considered statistically significant.

Results
An example of a real human identification scenario based on 12-lead ECG is presented in Figure 3. In all leads we observe close matching between the personalized PQRST and QRS patterns of the same individual at two different time points (red vs. blue trace). In contrast, apparent deviations of the waveforms across different subjects are visible (red vs. grey traces), regardless of the overlapping of certain waveforms. The remainder of this section presents the training and validation of ID models which deal with inter-subject waveform differences and long-term intra-subject waveform stability in 12-lead ECGs of a large population.


Statistical Study
The distributions of all features are estimated on the training dataset for 230 equal (IDT = IDR) and 230 × 229 = 52670 different (IDT ≠ IDR) identity pairs (Table 1). They prove the biometric potential of all features, with significantly higher mean values for the group of equal vs. different subjects (82-96% vs. 60-86%, p < 0.0001). This is a precondition to expect a distinguishable similarity index IDix involving all or a selected set of features used in the next phase of training of optimal ID models.
Table 1. Statistical distributions (mean ± std) of the features' values averaged for 12-lead ECG in the groups of equal (IDT = IDR) and different (IDT ≠ IDR) individuals. Statistically significant differences are observed for all features.

The results in Table 2 indicate poor AccID if only one feature is included in the ID model (not exceeding 41%). The best identification capability is observed for COR-PQRST and COR-QRS single features in limb leads, while the worst performance (<1%) is observed for AMP-QRS features in all leads.
Table 2. Training AccID (%) of each feature in each lead. The colour map distinguishes five AccID levels for visual ranking of the feature's ability for human identification: the worst, ≤5%, is given in red; 5-15% in orange; higher levels start at 15-25%.
More complex ID models involve all features from the same feature extraction method. Table 3 indicates the best identification capability for MATCH-QRS with 15 features (83%), followed by COR-PQRST with 18 features (77%), COR-QRS with 9 features (74%) and the worst, AMP-QRS, with 7 features (24%). The non-redundant feature sets in the ID models include features that are equally selected from limb and chest leads.
The next type of ID models involves all features in a single lead set (1 lead, 6 limb leads, 6 chest leads, 12 leads). The purpose is to assess the intrinsic identification capability of that specific lead set (Table 4). The identification capability of single limb leads is the lowest in III, aVR (about 47% with up to 5 features) and the highest in I, II (about 57-58% with 5 features). It is increased by about 20 percentage points when 6 limb leads are used in the ID model (about 77% with 19 features). The identification capability of single chest leads is the lowest in V3 (about 27% with 3 features) and the highest in V1 (about 44% with 4 features). It is increased by about 20 percentage points when 6 chest leads are used in the ID model (about 64% with 18 features). The most complex ID model with 12 leads and 27 features improves AccID to 91.3%, which is about 14 and 27 percentage points better than 6 limb and 6 chest leads, respectively.
Table 4. Optimal ID models for single- and multi-lead sets. The non-redundant features included in the models are marked: '+' for single leads, '*' for limb leads, 'o' for chest leads, '#' for 12-lead ECG. Additionally, the last 3 rows belonging to the multi-lead models show the number of leads per feature involved in the model. The training AccID (mean ± CI) of each model is reported in the last column. The following colour map is applied: the features involved in the single lead models and the achieved AccID are highlighted in yellow; the features involved in the model using the 6 chest leads and the respective AccID are highlighted in orange; the features involved in the model using the 6 limb leads and its AccID are highlighted in light green; and the features involved in the 12-lead model and the achieved AccID are highlighted in dark green.

Validation of ID Models on 230 Subjects (ID = 231 to 460)
The identification accuracy strongly depends on the number of reference subjects: the more subjects there are, the higher the random match probability is, i.e. the probability of finding a subject with better similarity than the tested subject's own reference. Therefore, AccID worsening is expected for a reference database of larger size. We observe this trend on the independent validation set in order to find the number of reference subjects that can be managed by the trained ID models within different AccID tolerance ranges. For this purpose, we consider different numbers of reference subjects and test the ID models with all possible combinations of 10, 20, 30, ..., 210, 220, 230 subjects within the total validation set of 230 subjects.
Table 5 and Figure 4 compare the validation performance of the optimal ID models for different feature extraction methods using 12-lead ECG. Accepting a confident AccID threshold (>90%), we found that it is satisfied for 140, 90, 30 and 20 reference subjects by the ID models based on all features, MATCH-QRS, COR-QRS and COR-PQRST, respectively. The AMP-QRS model is limited to a maximum of 56%, achieved just for 10 reference subjects.
Table 5. Validation AccID (mean ± CI) of the optimal ID models for different feature extraction methods using 12-lead ECG (Table 3). AccID is evaluated for different numbers of reference subjects (10 to 230 subjects included in the validation set). The colour map is provided for visual identification of the ID models' ability to keep the mean AccID within different tolerance ranges: ≥90% in dark green, [80-90)% in light green, [70-80)% in yellow, [50-70)% in orange, <50% in grey.
Figure 4. Mean value of the validation AccID in function of the number of reference subjects (data in Table 5). The graphs are presented in respect of the feature extraction methods.
Table 6 and Figure 5 compare the validation performance of the optimal ID models for different lead sets.
The accepted AccID threshold (>90%) is not satisfied for any single lead, even for just 10 reference subjects. The most prominent single limb and chest leads fulfil reduced requirements for AccID thresholds. The top-ranked limb lead II achieves AccID > 80% for 30 reference subjects and AccID > 70% for 100 reference subjects. The top-ranked chest lead V1 achieves only the minor threshold AccID > 70% for 20 reference subjects. The multi-lead sets are more powerful and they satisfy higher AccID requirements. Identification accuracy AccID > 90% is achieved for 10 reference subjects using either 6 chest leads or 6 limb leads and 140 reference subjects when using 12-lead ECG. A lower threshold of AccID > 80% is satisfied for 50 reference subjects using 6 chest leads, 100 reference subjects using 6 limb leads and more than 230 reference subjects (maximal population in this study) using 12-lead ECG.
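The dependence of AccID on the reference database size can be illustrated with a small sketch. It assumes a precomputed matrix `sim[t][r]` holding the similarity index between tested subject t (T2) and reference subject r (T1), and it averages over randomly drawn subsets rather than enumerating combinations, which is a simplification introduced here. The names `acc_for_subset` and `mean_acc` and the toy similarity values are hypothetical.

```python
import random

def acc_for_subset(sim, subset):
    """AccID (%) when only the subjects in `subset` form the reference database."""
    hits = sum(max(subset, key=lambda r: sim[t][r]) == t for t in subset)
    return 100.0 * hits / len(subset)

def mean_acc(sim, size, trials=200, seed=0):
    """Mean AccID over randomly drawn reference subsets of the given size."""
    rng = random.Random(seed)
    subjects = range(len(sim))
    accs = [acc_for_subset(sim, rng.sample(subjects, size)) for _ in range(trials)]
    return sum(accs) / trials

# Toy similarity matrix: subject 3 resembles subject 0 more than its own reference.
sim = [
    [95, 60, 55, 50],
    [58, 92, 61, 57],
    [52, 63, 90, 59],
    [88, 54, 56, 85],
]
full = acc_for_subset(sim, [0, 1, 2, 3])
small = mean_acc(sim, 2)
```

With all four subjects enrolled, the confusable pair is always present and one identification fails; with only two enrolled subjects the pair is drawn less often, so the mean accuracy is higher. This is the random match effect described in the text.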

Figure 5. Mean value of the validation AccID in function of the number of reference subjects (data in Table 6). The graphs are presented in respect of the ECG lead set implemented in the model: Blue lines: Single limb leads (dashed lines) and all limb leads (solid line); Red lines: Single chest leads (dashed lines) and all chest leads (solid line); Black lines: 12-leads.
Table 6. Validation AccID (mean ± CI, %) of the optimal ID models for single- and multi-lead sets (Table 4). AccID is evaluated for different numbers of reference subjects (10 to 230 subjects included in the validation set). The colour map is provided for visual identification of the ID models' ability to keep the mean AccID within different tolerance ranges: ≥90% in dark green, [80-90)% in light green, [70-80)% in yellow, [50-70)% in orange, <50% in grey. The bolded values correspond to the top ranked AccID across limb leads (II for 1-140 subjects, aVF for 140-230 subjects), chest leads (V1) and multi-lead sets (12-leads).

Discussion
Although being a part of one general biometric task, the aims of human "identification" and human "verification" are quite different, i.e. to recognize an individual in a database by comparing a single input sample to many stored templates and to verify/reject person's identity by comparing a single input sample to a single stored template. The same features for comparison between input-to-template waveforms could be used in both sub-tasks, however, the decision-making process is different. Usually, the decision for "identification" is performed via the best similarity assessment approach, while the "verification" relies on threshold based techniques. In our recent study [29] we solve the "verification" problem via 6 cross-correlation and 2 amplitude features of QRS and PQRST patterns, involved in a linear discriminant threshold decision function. In this study, we use the same large population to solve the "identification" task adopting 4 cross-correlation and 1 amplitude features and adding 2 more QRS pattern features that measure the similarity in the time and amplitude scales. They are calculated with a brand-new method for binary template matching of short-duration QRS patterns (100 ms). The presented identification model applies a normalized scale for representation of all 7 features (0-100%), where 0% corresponds to the least matching and 100% to the best matching between input and template waveforms. The optimization of the feature selection scheme involved in the calculation of the common similarity index is presented by iterative maximization of the identification accuracy AccID.
The questions of greatest concern for the human identification task are: "Is there an individual whose personalized beat pattern would be more similar to another's than to his own on a long-term basis?" and "What is the probability of finding such an individual within an increasing population of reference subjects?" A quantitative answer to these questions is presented in this study, following a straightforward list of training and validation tests which aim to reveal specific aspects of the current human ID investigation.
(1) Is there a statistical justification for using the defined features for waveform similarity? Table 1 presents the statistics of the features' distributions in the training dataset, proving that all of them exhibit significantly higher similarity of the waveforms measured across the same individuals on a long-term basis (>1 year) than between different individuals (mean value in the range of about 82-96% vs. 60-86%, p < 0.0001). This indicates that the similarity index IDix defined for finding the best match between subjects could include all features on the same basis.
(2) Which are the most reliable QRS and PQRST features for human ID?
This aspect is investigated by training optimal ID models with different configurations of features using a large dataset of 230 individuals. The simplest model (one feature in one lead, Table 2) singles out r(lag0) in leads aVL, aVF and III, with maximal AccID in the range of 35 to 41%. This is about 10 percentage points better than the other prominent feature, r(max), for which AccID ranges between 24 and 35%. The advantage of both correlation coefficients is due to the different spatial dynamics of the cardiac vector in different subjects. This dissimilarity is enhanced in the zero-lag correlation, which additionally reflects the temporal desynchronization of the depolarization-repolarization process in different individuals. We suggest that the latter effect is compensated in aVR, because this lead is used to synchronize each subject's average beat to the reference pattern, which explains the remarkably low AccID in aVR (11-15%) with respect to the other limb leads. The joint analysis of r(lag0) and r(max) in 12-lead ECG increases AccID by about 40 percentage points, reaching 73.5% (COR-QRS) and 77.4% (COR-PQRST), see Table 3. These results indicate that the QRS pattern (100 ms) carries the essential subject-specific information, while the P and T waves in the PQRST pattern (500 ms) contribute only a slight improvement in ID accuracy of about 4 percentage points.
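To make the distinction between r(lag0) and r(max) concrete, the following sketch computes both for two beat patterns. It is an illustration under stated assumptions, not the study's implementation: the Pearson normalization, the lag range (`max_lag`) and the function names are hypothetical choices.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length segments."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def r_lag0_and_max(x, y, max_lag=10):
    """Return (r at zero lag, maximal r over lags within +/- max_lag samples).
    Assumption: overlapping segments are correlated at each lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r0 = pearson(x, y)
    r_best = r0
    for lag in range(1, max_lag + 1):
        r_best = max(r_best,
                     pearson(x[lag:], y[:-lag]),   # y delayed relative to x
                     pearson(x[:-lag], y[lag:]))   # x delayed relative to y
    return r0, r_best

# Two identical synthetic pulses, one shifted by 3 samples
t = np.arange(100)
x = np.exp(-((t - 50) / 5.0) ** 2)
y = np.exp(-((t - 53) / 5.0) ** 2)
r0, rmax = r_lag0_and_max(x, y)
print(r0 < rmax)  # the shift lowers r(lag0) but not r(max)
```

In this toy case the shift penalizes only the zero-lag coefficient, which is consistent with the observation above that r(lag0) additionally captures temporal desynchronization.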
The analysis of AMP-QRS shows definitively that the QRS peak-to-peak range is a poorly discriminating feature across individuals. It provides AccID < 1% for all single leads (Table 2), which increases up to 24% for the optimal ID model (Table 3) including the 7 ECG leads that are the most distinguishable in amplitude (I, II, III, V1, V2, V3, V5).
The MATCH-QRS method is not powerful when its features are analysed in single leads (maximal accuracy of 26.5% for EQUA in aVF, Table 2). This can be attributed to the methodological approach of approximating the QRS waveform in the binQRS matrix [100 × 80], which gives indistinguishably high waveform similarity for many subjects whose QRS waveforms match within the pre-set time and amplitude tolerances of ±1 ms and ±5%, respectively. The similarity accumulated over 12 leads, however, stays high for same-identity subjects and decreases for different-identity subjects. Thus, 15 MATCH-QRS measures are selected in the best ID model (AccID = 83.5%, Table 3), which represent the best similarity for 10 leads in the amplitude scale (I, II, III, aVR, aVF, V1, V2, V4, V5, V6) and for 5 leads in the time scale (I, aVR, aVL, V1, V3).
(3) Which are the most reliable leads for human ID?
This aspect is investigated by training optimal ID models with different configurations of leads on a large dataset of 230 individuals (Table 4). In a single-lead ID scenario, we highlight limb leads I and II (maximal AccID of 58%), followed by aVF (55%) and aVL (50%), each including 3 to 5 features from all feature extraction methods. Compared to the best limb leads (I and II), the single chest leads are 14 to 30 percentage points less powerful for the purposes of human ID. They are ranked in the following order: V1 (maximal AccID of 44%), followed by V6, V5, V2 (37-35%) and V3, V4 (27-30%). Such a drop in accuracy, especially in the anterior leads, could be explained by the proximity of V1-V6 to the heart, so that small electrode misplacements across different recording sessions of the same individual result in considerable changes of the ECG morphology seen in chest leads.
Multi-lead ID models gain a considerable increase in accuracy (about 20 percentage points) when comparing 6 limb leads to the best single limb lead II (77% vs. 58%) and 6 chest leads to the best single chest lead V1 (64% vs. 44%). The optimal ID models are built with a prevalent selection of features from leads I, II (6 features), V1, V6 (5 features), aVR, V5 (4 features). The complete ID model, selecting features from all 12 leads, reaches a maximal AccID of 91.3%. This is better than the 6 limb leads, the 6 chest leads and the best single lead (II) by about 14, 27 and 33 percentage points, respectively.
(4) What is the tolerated reference database size for searching human ID?
This question arises from our previous experience with human identification [23], showing that the time elapsed between the collection of the reference and the test dataset, as well as the number of subjects enrolled in the reference dataset, have a severe impact on the ID performance. There, AccID was found to drop by 15 percentage points (from 93% to 78%) when the population increased by 35 subjects (from 14 to 49). The latter is in agreement with [28], reporting an AccID drop of 10 percentage points (from 99% to 89%) for a population increase of 40 subjects (from 10 to 50). Naturally, worsening of AccID could be expected for a larger reference database, since the more subjects there are, the higher the random match probability is, i.e. the probability of finding another subject with better similarity than the subject's own template. According to a comparative study [1], the majority of the published investigations on human ID have been conducted on small populations (a few dozen subjects). Considering the missing research information about the influence of the reference database size on the ID accuracy, we present a consistent validation of our optimal ID models on an independent dataset by increasing its size from 10 to 230 subjects. Our validation results definitively confirm the expected trend of AccID dropping as the number of reference subjects increases; the drop, however, is non-linear and depends on the particular ID model (Figures 4 and 5). We consider as good models those with higher AccID and lower AccID drop with respect to the number of reference subjects.
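The effect of the reference database size on AccID can be illustrated with a small sketch. It assumes a precomputed similarity matrix `sim`, where `sim[i, j]` is the similarity index between the test ECG of subject i and the reference template of subject j, and it assumes that smaller reference sets are the first n subjects of the larger one. Both the function names and the toy values are hypothetical and are not taken from the study.

```python
import numpy as np

def acc_id(sim, n_ref):
    """Identification accuracy (%) for the first n_ref reference subjects:
    share of subjects whose best-matching template is their own."""
    sub = sim[:n_ref, :n_ref]
    own_is_best = np.argmax(sub, axis=1) == np.arange(n_ref)
    return 100.0 * float(np.mean(own_is_best))

def acc_curve(sim, sizes):
    """AccID evaluated for a list of growing reference database sizes."""
    return {n: acc_id(sim, n) for n in sizes}

# Toy 3-subject example (hypothetical similarities in %).
# Subject 0 matches own template (90) until subject 2 is enrolled (95).
sim = np.array([[90.0, 50.0, 95.0],
                [40.0, 80.0, 30.0],
                [20.0, 10.0, 70.0]])
print(acc_curve(sim, [2, 3]))  # accuracy is 100% for 2 subjects, about 66.7% for 3
```

The toy case shows the mechanism discussed above: a subject correctly identified in a small reference set can be misidentified once a newly enrolled subject happens to present a more similar pattern.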
The differences are relatively small, which is a sign of confident training of the ID models with 12-lead ECG, capable of adequately evaluating an independent database without bias. We observed that the minimal set of 2 MATCH-QRS features is capable of providing human ID with almost the same accuracy (AccID > 90%) as the maximal set of all 7 features. However, this is valid only for limited reference datasets of up to 100 subjects. Larger populations are best identified by the complete ID model including all features, which maintains relatively stable performance with a drop of 11 percentage points (from 98% to 87%) for a population increase of 220 subjects (from 10 to 230). These results outperform those cited in [23,28], which report the same level of AccID drop (10-15 percentage points) for an about 6 times smaller population increase (35-40 subjects). The best performance of our complete ID model could be explained by the fact that larger datasets increase the probability of finding similar characteristics across different subjects and thus, more and more robust features are required to distinguish their individuality.
In a single-lead ID scenario (Figure 5, Table 6) we can distinguish two best ID models, based on analysis of lead II and aVF for small (10-140) and large (140-230) numbers of reference subjects, respectively. Overall, even for the best limb leads (II, aVF), we observe a limited ID capability, not exceeding 70% for more than 100 reference subjects and 80% for more than 30 reference subjects. Single chest leads are not usable for human ID. The majority of them (V2-V6) do not exceed 50% for more than 50 subjects. Only V1 stands out, with AccID above 70% for fewer than 20 subjects. We recommend multi-lead ID scenarios for human search in large populations. Both ID models based on 6 chest and 6 limb leads pass the AccID > 70% threshold for the maximal reference dataset (230 subjects) and AccID > 80% for up to 100 subjects (limb leads) and up to 50 subjects (chest leads). As expected, 12-lead ECG provides the most robust beat patterns for human ID, with AccID > 90% for up to 140 subjects, reaching 98.4% for 10 subjects.
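The notion of a tolerable reference database size can be expressed as a simple rule applied to a validation curve. The sketch below is one possible reading, assuming that the tolerable size is the largest size up to which AccID stays above the threshold without interruption. The function name is hypothetical, and the curve values are made-up placeholders, not results from Tables 5 and 6.

```python
def tolerable_size(acc_by_size, threshold=80.0):
    """Largest reference size for which AccID exceeds the threshold
    at that size and at every smaller evaluated size.
    Returns None if even the smallest size fails."""
    best = None
    for n in sorted(acc_by_size):
        if acc_by_size[n] > threshold:
            best = n
        else:
            break  # stop at the first size that falls to or below the threshold
    return best

# Placeholder curve (size -> AccID in %), for illustration only
curve = {10: 89.0, 20: 85.0, 30: 81.0, 40: 79.0, 50: 80.5}
print(tolerable_size(curve))        # 30
print(tolerable_size(curve, 95.0))  # None
```

Stopping at the first failure is a deliberately conservative choice: an isolated rebound above the threshold at a larger size (50 in the placeholder curve) is not counted.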
The performance of our best ID model (12-leads, 27 features, trained in Table 4, validated in Tables 5 and 6; Figures 4 and 5) is compared to other published studies on human identification (Table 7, Figure 6).
The published studies do not report training and validation of their ID models on independent datasets (as used in this study); therefore, a biased AccID is probable due to overtraining. For example, such an effect is suspected for studies with accuracy close to 100%, especially on small databases [9,16,18-20,26]. Overall, the ID studies with enrolment and test phases in a single day report higher AccID [9,10,18,26,28]. However, they could not consider longitudinal ECG morphology changes due to physiological or technical sources (e.g. electrode displacement across different sessions). There are two ID studies using multiple sessions on different days in large populations with 74 [24] and 89 subjects [10] that report less than 5 percentage points better performance than our model. This might be due either to a superior model algorithm or to the lack of independent validation, considering that the differences between our training and validation accuracies are in the same range. Figure 6 illustrates the validation results of this study, which extends the knowledge on the ID model performance over the largest span, from 10 to 230 reference subjects. The observed decrease of AccID could be explained by the appearance of new subjects in the reference dataset who present PQRST waveforms similar to already existing ones. Thus, some of the subjects that exactly match their own pattern in a reference dataset of e.g. 100 subjects show higher correlation and pattern matching with a different subject in the reference dataset that contains 230 individuals. This could be due to time-related ECG changes and possible small differences in the electrode placement during the acquisition of the test dataset. A possible approach to keep AccID high for larger reference datasets is to include additional robust features in the identification model. However, this should be done very carefully and the designed models should be validated on an independent dataset in order to avoid overtraining.
Another option to avoid assigning a wrong identity to the tested subject is to keep the size of the reference dataset as small as possible for the particular application and to update it with current PQRST patterns on a regular basis.
Table 7. Comparative study of AccID for our best ID model (validation results) and the results reported by other authors over databases with different numbers of subjects (the databases' abbreviations are cited according to their reference in the source publication). The information about single or multiple recordings per patient (srpp or mrpp) and the interval between T1 and T2 (in case of mrpp) is provided.
Figure 6. [...] Table 7. The published studies (from No = 1 to No = 15 in Table 7) are presented in two groups, according to the time distance between the recording sessions, representative of ECG changes on a short-term (single day) or long-term (different days) basis. For the purpose of adequate comparison, the identification accuracy of our method is illustrated for different numbers of reference subjects (from 10 to 230 subjects included in the validation set).

Conclusions
This paper presents a methodology for evaluating the biometric potential of PQRST pattern waveforms in 12-lead ECG. The focus is on reliable human identification in a large-population database containing two ECGs per patient recorded 1 to 2 years apart. The benefits of using such a database are its representativeness of physiologically related long-term ECG changes and of the effects of possible multi-session technical differences. The investigations demonstrate the identification ability of cross-correlation, amplitude and pattern matching, applied to a personalized heartbeat pattern in single and multi-lead ECG scenarios. The detailed analysis of the identification accuracy highlights the ID model based on cross-correlation and pattern matching of 12-lead ECG as the best one, providing 91.3% identification accuracy over a large training dataset of 230 subjects. The independent validation of this ID model on datasets of different sizes (10-230 subjects) confirms its reliability for human identification, with accuracy between 98.4% and 87.4%, and extends the knowledge on the ID model performance over a large span of reference subjects.

Future Work
Up to this moment, our experience in human verification/identification involves the application of fiducial-based and fiducial-independent approaches to databases collected with clinical ECG devices. Our future work will focus on the design of hybrid methods for human verification/identification and their implementation in an identification ECG device prototype working with a pre-selected minimal lead set.