This section gives an overview of previous research closely related to ours: work on human performance in latent fingerprint minutiae extraction and matching, and work on the performance of fingerprint matching algorithms when the input is modified or distorted.
2.1. Research on the Performance of Experts in Latent Fingerprint Analysis
In this subsection, we present works that study the performance of human latent fingerprint examiners when marking or matching latent fingerprints. These works show that even experts can make mistakes, and they motivate our research on the effects of such mistakes.
In 2011, Ulery et al. studied the accuracy and reliability of forensic latent fingerprint decisions using empirical approaches [26]. In particular, they studied how the quantity and quality of image features relate to the level of consensus among examiners and to their decisions. The study focused on determining the frequency of false-positive and false-negative errors, the extent of consensus among examiners, and the factors contributing to variability in results. A total of 169 latent print examiners participated, who were generally highly experienced ( of the participants were certified as latent print examiners). The data included 356 latent fingerprints, from 165 distinct fingers of 21 people, and 484 impressions, making up 744 distinct latent-impression image pairs. Ulery et al. balanced the number of fingerprint pairs used in their study and the number of examiners assigned to each pair; as a result, 100 of the 744 image pairs were assigned to each participating examiner.
The results of Ulery et al. showed that, on the one hand, the false-positive error rate (FPR) was , and no two examiners committed an error on the same comparison. These results indicate that blind verification (between two examiners) should be highly effective at detecting this type of error, which demonstrates the potential of information fusion approaches for this problem [33]. On the other hand, false-negative errors were much more frequent ( of mated comparisons). An in-depth analysis revealed that of the examiners committed at least one false-negative error. Blind verification would detect most false-negative errors; however, it is not generally practiced in operational procedures. After studying the quality of the latent fingerprints (quantity of features, distortion, background, among others) [18], the study could not identify print features associated with false-positive or false-negative errors.
Later on, in 2015, Ulery et al. analyzed the changes in latent fingerprint examiners’ markup [27]. The authors collected 320 image pairs (231 mated and 89 nonmated) constructed from 301 latent fingerprints and 319 impressions. Each of the 170 volunteer latent print examiners was randomly assigned 22 pairs of fingerprints to analyze. Of the 170 participants, were qualified as latent print examiners.
The experiments of Ulery et al. showed that, of the 41,774 minutiae labeled, were added; see Figure 2. From these reports, the authors concluded that the comparison rates for inclusion or exclusion of minutiae ranged from . A detailed report showed that most of the latent fingerprint examiners ( ) included or excluded minutiae in the majority of their comparisons. When examiners worked individually (without verification by a second examiner), they changed their minutiae markup in most comparisons. In addition, examiners included minutiae less frequently when the fingerprint image pair was nonmated than when it was mated; a possible explanation is that comparison with a mated exemplar draws attention to additional corresponding features in the latent. The authors attributed the high variation in rates to participants’ unfamiliarity with the tested tools and instructions, as well as to habits brought from casework. Next, in 2016, Ulery et al. analyzed the interexaminer variation of minutia markup on latent fingerprints [28]. Similar to [27], Ulery et al. [28] used 170 latent print examiners, of whom were qualified as latent print examiners. The dataset comprised 320 fingerprint image pairs: 231 mated (from the same finger and person) and 89 nonmated (from different fingers or individuals). The image pairs were made from 301 latent fingerprints and 319 impressions. Each examiner was randomly assigned 17 mated and 5 nonmated fingerprint image pairs. The results were based on the analysis of 3730 fingerprint image pairs in total among all 170 latent print examiners, with a median of 12 examiners assigned to each fingerprint image pair.
Ulery et al. [28] also analyzed whether examiners agree on the inclusion or exclusion of minutiae by measuring the minor variations in minutia location. To do so, they used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [36] to group nearby minutiae marked by different examiners as the same minutia on the latent fingerprint. In this way, as shown in Figure 3, clusters are labeled as singleton (marked by only one examiner), minority (<50% of examiners), majority (50–90%), or supermajority (≥90%).
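The clustering and labeling scheme described above can be sketched as follows. This is an illustrative reconstruction, not the authors’ code: the minutia coordinates, the number of examiners, the `eps` radius, and the exact agreement thresholds are assumptions chosen for the example.

```python
# Sketch: cluster minutiae marked by multiple examiners with DBSCAN,
# then label each cluster by the fraction of examiners who marked it.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
n_examiners = 10

# Simulated markups: each examiner marks the "same" two minutiae with
# small positional noise; one examiner also adds a stray mark.
true_minutiae = np.array([[120.0, 85.0], [200.0, 150.0]])
marks, who = [], []
for e in range(n_examiners):
    for m in true_minutiae:
        marks.append(m + rng.normal(0.0, 2.0, size=2))  # ~2 px jitter
        who.append(e)
marks.append(np.array([300.0, 40.0]))  # singleton marked by one examiner
who.append(0)
marks = np.asarray(marks)

# eps: max distance (px) for two marks to count as the same minutia.
labels = DBSCAN(eps=8.0, min_samples=1).fit_predict(marks)

def agreement(cluster_label):
    """Fraction of examiners who marked a minutia in this cluster."""
    examiners = {who[i] for i in np.flatnonzero(labels == cluster_label)}
    return len(examiners) / n_examiners

for c in sorted(set(labels)):
    frac = agreement(c)
    category = ("singleton" if frac <= 1 / n_examiners else
                "minority" if frac < 0.5 else
                "majority" if frac < 0.9 else
                "supermajority")
    print(c, round(frac, 2), category)
```

With these settings, the two simulated true minutiae come out as supermajority clusters (marked by all examiners) and the stray mark as a singleton.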
The results of Ulery et al. showed that of minutiae were reproduced by most examiners, of the singleton clusters were in unclear areas of the fingerprints, and of supermajorities were in clear areas. Their analysis identified several factors affecting whether several examiners reproduce the same minutia in a fingerprint: clarity, region of interest, feature type, and location. The authors also noted that differences in minutiae markup may merely reflect differences in how examiners document an interpretation.
Recently, in 2020, Kukucka et al. studied the impact of evidence lineups on fingerprint expert decisions [29]. Their experiment involved 43 latent fingerprint examiners, with an average age of 44 years, most of them affiliated with a large laboratory. In the experiment, each examiner viewed four fingerprints, some of which belonged to a suspect. The fingerprints came from a crime scene visited by the examiners, as the underlying hypothesis of Kukucka et al. is that an examiner could associate each latent fingerprint with a crime scene. Their results showed that examiners made a positive identification of of the displayed images; by contrast, of the images were classified as inconclusive. The main drawback of [29] is that the examiners were aware of being part of a monitored study, which may explain the unusually high rate of inconclusive judgments. The study of Kukucka et al. is oriented more toward the implementation of evidence lineups than toward analyzing fingerprints from a dactyloscopy point of view; consequently, fingerprint features such as minutiae, pores, and cores were not analyzed.
In summary, existing research has focused on studying human errors during the latent fingerprint markup. However, it has not considered the impact of missing minutiae on the score of matching algorithms or on the ranking results.
2.2. Impact of Fingerprint Variations in Automatic Fingerprint Recognition
In this subsection, we analyze the performance of fingerprint identification systems in the presence of different distortions.
Studying the impact of fingerprint variations on fingerprint identification has attracted attention over the last two decades, including position variability [35] and other distortions [10]. The most up-to-date and comprehensive work in this regard may be that of Grosz et al. [37], who performed a white-box evaluation of three fingerprint matchers to assess how different perturbations of fingerprints affect the matchers’ scores. One of these algorithms, SourceAFIS, is open-source; the other two are unnamed, minutiae-based commercial off-the-shelf (COTS) matchers. The authors evaluated the robustness of these matchers against controlled perturbations of the fingerprints, specifically perturbations of minutiae sets. The fingerprint dataset used in this experimentation consists of fingerprint impressions synthetically generated with SFinGe [38]. It is composed of different “master” fingerprints, each of which was used to generate two impressions. For performance evaluation, Grosz et al. also generated a ground-truth minutiae set for each fingerprint.
The authors ran experiments with the following perturbations of fingerprint impressions: (i) moving and rotating minutiae; (ii) adding spurious minutiae and removing ground-truth minutiae; (iii) applying a non-linear distortion to the minutiae sets; and (iv) combining the displacement and rotation of minutiae with the addition of spurious minutiae and the removal of ground-truth minutiae. The first two perturbation types were generated using a multivariate Gaussian distribution whose parameters were chosen to model realistic perturbations. The non-linear perturbations were generated using a distortion model learned from distorted fingerprints. Using these distortions, the authors performed two experiments: (a) an analysis of the uncertainty resulting from realistic amounts of perturbation and distortion; and (b) an evaluation of the recognition performance of each minutiae-based matcher under increasing levels of perturbation and distortion.
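Perturbation types (i) and (ii) above can be sketched as follows, assuming a minutia is represented as a triple (x, y, θ); the noise magnitudes, removal probability, and image size are illustrative assumptions, not the parameters used by Grosz et al.

```python
# Sketch: Gaussian displacement/rotation of minutiae, random removal of
# ground-truth minutiae, and addition of spurious minutiae.
import numpy as np

rng = np.random.default_rng(42)

def perturb_minutiae(minutiae, sigma_xy=3.0, sigma_theta=0.1,
                     p_remove=0.1, n_spurious=2, img_size=(400, 400)):
    """Displace/rotate minutiae, drop some at random, add spurious ones."""
    m = np.asarray(minutiae, dtype=float).copy()
    # (i) displacement and rotation noise
    m[:, :2] += rng.normal(0.0, sigma_xy, size=(len(m), 2))
    m[:, 2] = (m[:, 2] + rng.normal(0.0, sigma_theta, size=len(m))) % (2 * np.pi)
    # (ii) remove ground-truth minutiae with probability p_remove
    m = m[rng.random(len(m)) >= p_remove]
    # (ii) add spurious minutiae at uniformly random positions/angles
    spurious = np.column_stack([
        rng.uniform(0, img_size[0], n_spurious),
        rng.uniform(0, img_size[1], n_spurious),
        rng.uniform(0, 2 * np.pi, n_spurious),
    ])
    return np.vstack([m, spurious])

minutiae = np.array([[100.0, 120.0, 0.5],
                     [210.0, 90.0, 2.1],
                     [150.0, 300.0, 4.0]])
print(perturb_minutiae(minutiae).shape)
```

Increasing `sigma_xy`, `p_remove`, or `n_spurious` corresponds to the increasing perturbation levels used in the second experiment.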
To perform the uncertainty analysis, Grosz et al. generated 100 perturbations per perturbation type per generated fingerprint, obtained the corresponding similarity scores, and used them to calculate a global uncertainty score for each perturbation type. In the second experiment, the authors increased the perturbation parameters over eight iterations. The results of this experiment are reported as the true acceptance rate at a fixed false acceptance rate of . They also calculated impostor scores for each perturbation type at each iteration. The authors found that the non-linear distortion reduced the mean of the similarity scores the most, followed by the combined perturbation and then the removal of ground-truth minutiae. The experiments suggest that fingerprint identification algorithms are robust to high percentages of missing minutiae: the large number of minutiae in these experiments allows the matchers to maintain competitive identification performance after minutiae are removed.
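The reported metric, true acceptance rate (TAR) at a fixed false acceptance rate (FAR), is computed from the genuine (mated) and impostor (nonmated) score distributions. A minimal sketch of the standard computation, using synthetic Gaussian scores as a stand-in for real matcher scores:

```python
# Sketch: TAR at a fixed FAR. Threshold at the (1 - FAR) quantile of the
# impostor scores, then report the fraction of genuine scores accepted.
import numpy as np

def tar_at_far(genuine, impostor, far=0.01):
    """True acceptance rate at the threshold that yields the given FAR."""
    threshold = np.quantile(impostor, 1.0 - far)
    return float(np.mean(np.asarray(genuine) >= threshold))

rng = np.random.default_rng(7)
genuine = rng.normal(0.8, 0.05, 1000)   # mated-pair scores (assumed)
impostor = rng.normal(0.3, 0.05, 1000)  # nonmated-pair scores (assumed)
print(tar_at_far(genuine, impostor, far=0.01))
```

A perturbation that shifts the genuine distribution toward the impostor one lowers the TAR at the fixed FAR, which is how the matchers’ robustness is quantified across iterations.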