Impact of Minutiae Errors in Latent Fingerprint Identification: Assessment and Prediction

Abstract: We study the impact of minutiae errors on the performance of latent fingerprint identification systems. We perform several experiments in which we remove ground-truth minutiae from latent fingerprints and evaluate the effects on matching score and rank-n identification using two different matchers and the popular NIST SD27 dataset. We observe that missing even one minutia from a fingerprint can have a significant negative impact on identification performance. Our experimental results show that a fingerprint with a top rank can be demoted to a bottom rank when two or more minutiae are missed. We also noticed that some minutiae are more critical than others for correctly identifying a latent fingerprint. Based on this finding, we created a dataset to train several machine learning models that try to predict the impact of each minutia on the matching score of a fingerprint identification system. Finally, our best-trained model can successfully predict whether a minutia will increase or decrease the matching score of a latent fingerprint.


Introduction
In the last two decades, machine learning has attracted increasing interest from a research community eager to automate processes in different application areas [1,2]. One of these areas is biometrics [3-9], which aims to use unique characteristics of the human body to automatically verify or identify people. Among the physiological traits of the human body, fingerprints are considered the most reliable due to their uniqueness, permanence, acceptability, and collectability [2,3,5,10,11]. Fingerprints are considered genetically unique; up to now, there have been no reports of two individuals having the same fingerprint. Therefore, this biometric feature is widely used in several application fields, such as criminology, banking, and healthcare [2,12-15].
Fingerprints can be classified into impressions and latent fingerprints [6,7,16,17]. Fingerprint impressions are acquired under controlled conditions and have high quality. By contrast, latent fingerprints are unintentionally left by someone when manipulating objects and have low quality [18] (see Figure 1). Similarly, the methodologies for comparing fingerprints are clustered into two groups. On the one hand, fingerprint verification aims to verify an individual's identity from two given impressions in a fully automated process. On the other hand, fingerprint identification, usually related to forensic applications, aims to search a background database for the impressions most similar to a given latent fingerprint query [12,19-22].

Figure 1. Example of a latent fingerprint and its corresponding impression, taken from the NIST SD27 database [23]. Notice how the latent fingerprint is a low-quality image containing background noise that interrupts the ridge flow. On the contrary, the impression is a clear, high-quality image taken under controlled conditions.
In 2006, in response to the misidentification of a latent fingerprint in a high-profile case, the Federal Bureau of Investigation commissioned a review committee to evaluate the state of forensic latent fingerprint identification technologies and processes and to better understand the bases of this discipline [24]. The committee's results showed that there are scientific areas where practice can be improved, particularly regarding validation, the analysis, comparison, evaluation, and verification (ACE-V) process, and data collection.
Fingerprint verification technologies based on digital impressions are dominated by fully automatic systems (e.g., smartphone authentication sensors). However, forensic latent fingerprint identification is usually performed through semi-automatic processes where human participation is still critical (e.g., minutiae labeling). Latent fingerprints are important evidence in thousands of real prosecutions all around the world. Together with DNA, the latent fingerprint is the most important biometric identification trait in these prosecutions. The main stages of latent fingerprint analysis are (a) reveal, (b) capture, (c) feature extraction, and (d) identification or matching. In today's forensic practice, stages (a) to (c) are usually performed by human experts [25], while (d) is performed by automatic algorithms. These semi-automatic processes are error-prone; errors may originate either from a human or from an algorithm. The analysis of these errors is critical to understanding the performance of these systems. There are studies analyzing the likelihood of human errors in stages (a) to (c) and showing that these errors are habitual [26-29]. Nevertheless, there are no studies evaluating how these errors affect the identification stage, (d).
Minutiae extraction is a critical stage for any latent fingerprint identification system [13]. Usually, Automated Fingerprint Identification Systems (AFIS) work adequately for extracting minutiae from impressions but not from latent fingerprints. As a result, human fingerprint examiners are needed to extract minutiae from latent fingerprints [21,22]. However, failing to mark minutiae, or marking spurious minutiae, can lead to misidentifying latent fingerprints, releasing criminals, or, even worse, leading to the apprehension of innocent people [21]. Recent studies [21,30] have shown that automatic fingerprint identification methods based on minutiae are often readily applicable and commonly outperform other types of methods. However, latent fingerprint identification rates are as low as 71.32% for Rank-1 when using medium-size databases (100,000 fingerprints) [30]. Consequently, latent fingerprint identification is an area of constant evolution, which aims to find new methods that help obtain better identification rates.
Based on the above discussion, three main research questions arise:
• What is the impact of human errors on the performance of Automatic Fingerprint Identification Systems?
• Are there minutiae that have a higher impact on identification performance when missed?
• If so, is there any feature of a minutia that can help determine its importance?
Aiming to develop methods that can help latent fingerprint examiners make better decisions with automatic methods as well as respond to the research questions mentioned above, in this paper, we propose to study how human error during minutiae extraction can affect the identification of latent fingerprints. By understanding the effect these mistakes have on matcher performance, we can identify research areas related to developing methods that seek to aid the latent fingerprint examiners.
The main contributions of this paper are summarized as follows:
• To the best of our knowledge, this is the first study of matching algorithms' performance when minutiae are missed from latent fingerprints.
• We quantify and discuss the impact of missing minutiae in the latent fingerprint on the performance of two latent fingerprint matching algorithms, by removing manually marked minutiae from latent fingerprints, calculating their matching score and rank, and comparing them to the original ground-truth latent fingerprints.
By removing minutiae from latent fingerprints, we simulate human mistakes. We show that even a single mistake can be fatal to the process. The variation in matching score when removing different minutiae leads us to believe that some minutiae are more important than others for latent fingerprint identification [13]. If we were able to determine the specific features of minutiae that have a high impact on the matching score, we could develop methods to aid latent fingerprint examiners by marking regions [31] where a mistake in feature selection would have a high negative impact on matching, in the manner of a fingerprint quality map [18,32].
The remainder of this paper is organized as follows. In Section 2, we analyze previous works. In Section 3, we describe the role of human errors in forensic fingerprint analysis. In Section 4, we define our experimental setup. In Section 5, we evaluate the impact of minutiae errors on latent fingerprint identification. In Section 6, we experiment with predicting the utility of given minutiae for latent fingerprint identification. Finally, in Section 7, we present our conclusions and future work.

Previous Works
This section gives an overview of previous research that is closely related to ours. This includes the work done related to human performance in latent fingerprint minutiae extraction and matching, as well as the work related to the performance of fingerprint matching algorithms when the input is modified or distorted.

Research on the Performance of Experts in Latent Fingerprint Analysis
In this sub-section, we present works that study the performance of human latent fingerprint examiners when marking or matching latent fingerprints. These works are intended to show that even experts can make mistakes and serve as the motivation for our research on the effects of mistakes in this area.
In 2011, Ulery et al. studied the accuracy and reliability of forensic latent fingerprint decisions using empirical approaches [26]. In particular, they studied how the quantity and quality of image features relate to the level of consensus among examiners and their decisions. The study of Ulery et al. focused on determining the frequency of false-positive and false-negative errors, the extent of consensus among examiners, and factors contributing to variability in results. In the study, 169 latent print examiners participated, who were generally highly experienced (83% of the participants were certified as latent print examiners). The data included 356 latent fingerprints, from 165 distinct fingers of 21 people, and 484 impressions, together making up 744 distinct latent-impression image pairs. Ulery et al. balanced the number of fingerprint pairs used in their study and the number of examiners assigned to each pair. As a result, 100 of the 744 image pairs were assigned to each participating examiner.
The results of Ulery et al. showed that, on the one hand, the false-positive error rate (FPR) was 0.1%, and no two examiners committed an error on the same comparison. These results indicate that blind verification (between two examiners) should be highly effective at detecting this type of error, which demonstrates the potential of information fusion approaches for this problem [33,34]. On the other hand, false-negative errors were much more frequent (7.5% of mated comparisons). An in-depth analysis found that 85% of the examiners committed at least one false-negative error. Blind verification would detect most of the false-negative errors; however, it is not generally practiced in operational procedures. After studying the quality of the latent fingerprints (quantity of features, distortion, background, among others) [18,35], this study could not identify print features associated with false-positive or false-negative errors.
Later on, in 2015, Ulery et al. analyzed the changes in latent fingerprint examiners' markup [27]. The authors collected 320 (231 mated and 89 nonmated) image pairs of fingerprints constructed from 301 latent fingerprints and 319 impressions. Each one of the 170 volunteer latent print examiners was randomly assigned to analyze 22 pairs of fingerprints. From the 170 participants of this study, 89% were qualified as latent print examiners.
The experiments executed by Ulery et al. showed that, of 41,774 minutiae labeled, 87% were retained, 6% were moved, 7% were deleted, and 17% were added (see Figure 2). From these reports, the authors concluded that the rates of inclusion or exclusion of minutiae per comparison ranged from 14% to 54%.

Figure 2 (taken from [27]): on each latent fingerprint image, the retained minutiae are in yellow, moved in blue, deleted in red, and added in green.
A detailed report of Ulery et al. showed that most of the latent fingerprint examiners (86%) included or excluded minutiae in the majority of their comparisons. When examiners worked individually (without checking by a second examiner), they changed their markup of minutiae in most comparisons. In addition, examiners included minutiae less frequently when the fingerprint image pair was nonmated than when it was mated; a possible explanation is that comparison with a mated exemplar draws attention to additional corresponding features in the latent. The authors claimed that the high variation in rates could be due to participants' unfamiliarity with the tested tools and with the instructions, and to examiners bringing in casework habits.

Next, in 2016, Ulery et al. analyzed the interexaminer variation of minutia markup on latent fingerprints [28]. Similar to [27], Ulery et al. [28] used 170 latent print examiners, of whom 89% held the qualification of latent print examiner. The dataset collected 320 fingerprint image pairs: 231 mated (from the same finger and person) and 89 nonmated (from different fingers or individuals). The image pairs were made with 301 latent fingerprints and 319 impressions. Each examiner was randomly assigned 17 mated fingerprint image pairs and 5 nonmated fingerprint image pairs. Their results were based on the analysis of 3730 fingerprint image pairs in total among all 170 latent print examiners, with a median of 12 examiners assigned to each fingerprint image pair.
Ulery et al. [28] also proposed to analyze whether examiners agree on the inclusion or exclusion of minutiae by measuring minor variations in minutia location. To do so, they used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [36] to group minutiae on the latent fingerprint such that more than two examiners agreed they are the same. In this way, as shown in Figure 3, clusters are labeled as singleton (marked by only one examiner), minority (<50% of examiners), majority (50-90%), or supermajority (≥90%).

Figure 3 (taken from [28]): two examples of latent fingerprints marked by several examiners. Marked minutiae are shown as small black dots inside color-coded clusters: singleton (marked by only one examiner), minority (<50% of examiners), majority (50-90%), and supermajority (≥90%).
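As an illustration of this consensus analysis, note that DBSCAN with min_samples=1 reduces to finding the connected components of the eps-neighborhood graph. The following minimal pure-Python sketch groups per-examiner minutia marks this way and labels each cluster with the consensus categories above; the eps tolerance and the data layout are our own assumptions, not values from [28].

```python
import numpy as np

def consensus_clusters(marks, n_examiners, eps=8.0):
    """Group minutia marks from many examiners (DBSCAN with min_samples=1,
    i.e., connected components of the eps-neighborhood graph) and label
    each cluster by the fraction of examiners that reproduced it.

    marks: sequence of (x, y, examiner_id) tuples.
    eps:   spatial tolerance in pixels (illustrative value).
    """
    pts = np.asarray(marks, dtype=float)
    n = len(pts)
    # pairwise distances between mark positions
    d = np.linalg.norm(pts[:, None, :2] - pts[None, :, :2], axis=2)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack = [i]
        labels[i] = cur
        while stack:  # flood-fill one connected component
            j = stack.pop()
            for k in np.where(d[j] < eps)[0]:
                if labels[k] < 0:
                    labels[k] = cur
                    stack.append(k)
        cur += 1
    clusters = []
    for c in range(cur):
        examiners = {int(e) for e in pts[labels == c, 2]}
        frac = len(examiners) / n_examiners
        kind = ("singleton" if len(examiners) == 1 else
                "minority" if frac < 0.5 else
                "majority" if frac < 0.9 else "supermajority")
        clusters.append((kind, frac))
    return clusters
```

For instance, four examiners all marking roughly the same position yields one supermajority cluster, while a mark placed by a single examiner far from the rest yields a singleton.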
The results of Ulery et al. showed that 66% of minutiae were reproduced by most examiners, 67% of the singleton clusters were in unclear areas of the fingerprints, and 98% of supermajorities were in clear areas. After further analysis, the authors identified several factors that affect whether several examiners reproduce the same minutia in a fingerprint: clarity, region of interest, feature type, and location. The authors also claimed that a difference in minutiae markup could be due merely to a difference in how an examiner documents an interpretation.
Recently, in 2020, Kukucka et al. studied the impact of evidence lineups on fingerprint expert decisions [29]. Their experiment involved 43 latent fingerprint examiners, with an average age of 44 years, most of them affiliated with a large laboratory. In the experiment, each examiner viewed four fingerprints, some of which belonged to a suspect. The fingerprints came from a crime scene visited by the examiners, as the underlying hypothesis of Kukucka et al. was that an examiner could associate each latent fingerprint with a crime scene. Their results showed that examiners made a positive identification for 37.8% of the displayed images; by contrast, 25.6% of the images were classified as inconclusive. The main drawback of [29] is that the examiners were aware of being part of a monitored study. This may explain why the study produced an unusually high rate of inconclusive judgments, attributable to the examiners' awareness that they were under assessment. The study of Kukucka et al. is oriented more toward the implementation of evidence lineups than toward analyzing fingerprints from a dactyloscopic point of view. As a consequence, fingerprint features such as minutiae, pores, and cores were not analyzed.
In summary, existing research has focused on studying human errors during the latent fingerprint markup. However, it has not considered the impact of missing minutiae on the score of matching algorithms or on the ranking results.

Impact of Fingerprint Variations in Automatic Fingerprint Recognition
In this section, we present an analysis of the performance of fingerprint identification systems in the presence of different distortions.
Studying the impact of fingerprint variations on fingerprint identification has attracted some attention in the last two decades, including position variability [35] and other distortions [10].
The most up-to-date and comprehensive work in this regard may be that of Grosz et al. [37], who performed a white-box evaluation of three fingerprint matchers to assess how different perturbations in fingerprints affect the matchers' scores. Of these algorithms, one, called SourceAFIS, was open-source, and the other two were unnamed, minutiae-based commercial-off-the-shelf (COTS) matchers. The authors evaluated the robustness of these matchers against controlled perturbations of the fingerprints, specifically perturbations of the minutiae sets. The fingerprint dataset used in this experimentation consists of fingerprint impressions synthetically generated with SFinGe [38]. It is composed of 5000 different "master" fingerprints, each of which was used to generate two different impressions. For performance evaluation, Grosz et al. also generated a ground-truth minutiae set for each fingerprint.
The authors ran experiments where fingerprint impressions involved the following perturbations: (i) moving and rotating minutiae; (ii) adding spurious minutiae and removing ground-truth minutiae; (iii) applying a non-linear distortion to the minutiae sets; and (iv) combining the displacement and rotation of minutiae with the addition of spurious minutiae and the removal of ground-truth minutiae. The first two perturbations were generated using a multivariate Gaussian distribution model whose parameters the researchers chose to model realistic perturbations. The non-linear perturbations were generated using a non-linear distortion model learned from distorted fingerprints. Using these distortions, the authors performed two different experiments: (a) an analysis of the uncertainty resulting from realistic amounts of perturbation and distortion; and (b) an evaluation of the recognition performance of each minutiae-based matcher at increasing levels of perturbation and distortion.
To perform the uncertainty analysis, Grosz et al. generated 100 perturbations per perturbation type per generated fingerprint. They obtained similarity scores and used them to calculate a global uncertainty score for each perturbation type. For the second experiment, the authors increased the perturbation parameters throughout eight iterations. The results of this experiment are reported as the true acceptance rate at a fixed false acceptance rate of 0.01%. They also calculated impostor scores for each perturbation type at each iteration. The authors found that the non-linear distortion is the perturbation that reduced the mean of the similarity scores the most, followed by the combined perturbation and, after that, the removal of ground-truth minutiae. The experiments suggested that fingerprint identification algorithms are robust to high percentages of missing minutiae: the large number of minutiae in these experiments makes it possible to maintain competitive identification performance even after minutiae are missed.
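As an illustration of the kind of minutiae-set perturbations evaluated by Grosz et al., the sketch below applies Gaussian displacement and rotation, random removal of ground-truth minutiae, and addition of spurious minutiae. All parameter values and names here are our own illustrative choices, not those of [37].

```python
import numpy as np

def perturb_minutiae(minutiae, rng, sigma_xy=3.0, sigma_theta=5.0,
                     p_remove=0.1, n_spurious=2, img_size=(512, 512)):
    """Perturb a ground-truth minutiae set, loosely following the
    perturbation types in Grosz et al.: (a) Gaussian displacement of
    position and angle, (b) random removal of ground-truth minutiae,
    (c) addition of spurious minutiae.

    minutiae: array of shape (n, 3) with columns (x, y, theta_degrees).
    rng:      a numpy random Generator, e.g. np.random.default_rng().
    """
    m = np.asarray(minutiae, dtype=float).copy()
    # (a) displace positions and rotate angles with Gaussian noise
    m[:, :2] += rng.normal(0.0, sigma_xy, size=m[:, :2].shape)
    m[:, 2] = (m[:, 2] + rng.normal(0.0, sigma_theta, size=len(m))) % 360
    # (b) drop each ground-truth minutia with probability p_remove
    keep = rng.random(len(m)) >= p_remove
    m = m[keep]
    # (c) add spurious minutiae uniformly over the image
    spur = np.column_stack([rng.uniform(0, img_size[0], n_spurious),
                            rng.uniform(0, img_size[1], n_spurious),
                            rng.uniform(0, 360, n_spurious)])
    return np.vstack([m, spur])
```

Repeating this with fresh random draws (e.g., 100 times per fingerprint, as in the uncertainty analysis) yields a distribution of perturbed minutiae sets whose match-score spread can then be measured.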

Forensic Fingerprint Analysis: Role of Human Errors
This section introduces the main errors a human may incur before and during the fingerprint identification process, possibly yielding misidentifications (see Figure 4). Latent fingerprint identification is composed of four main stages, namely: reveal, capture, feature extraction, and identification. The most common errors observed at each of these stages are as follows [16]:

Reveal: in this stage, an expert should use specific powders, chosen according to the surface and shape of the object, to reveal the latent fingerprint. The errors one may incur at this stage are:
• Using a revealing powder that does not correspond to the surface or shape of the object holding the latent fingerprint, which could wrongly reveal the ridges of the latent fingerprint.
• Using more revealing powder than necessary, creating a filling among the fingerprint ridges, and consequently, making it hard to be of use on the next stages.
Capture: in this stage, an expert must be careful to capture the latent fingerprint under the best possible conditions. Errors typically observed at this stage are:
• While lifting the latent fingerprint, deformations might be created in the ridges, creating false bifurcations or false ridge endings. As a result, errors are induced that may severely affect the feature extraction stage.
• Producing an out-of-focus or low-resolution photo of the latent fingerprint.
• Failing to place a ruler next to the latent fingerprint, which would enable one to estimate the real scale of the fingerprint.
Feature extraction: during this stage, the examiners extract all features necessary for the identification stage. The most common flaws at this stage are:
• False features can be added due to errors originating at the capture stage.
• Actual features can be missed by examiners.
• The position or angle of some features can be shifted due to human perception.
Identification: any flaw committed at previous stages will affect identification performance. However, the main errors that may show up at this stage are:
• The ranking provides the corresponding impression, but the expert overlooks this match.
• The ranking does not provide the corresponding impression due to human errors at the previous stages.
• The expert issues a positive identification when it is actually a false positive.
All the human errors described above negatively affect the identification rate. One of the most important features for fingerprint identification is minutiae. A minutia is a minute detail on the ridges of a fingerprint, often a ridge ending or bifurcation. Minutiae are one of the most widely used features for fingerprint identification [5,21,22]. Hence, we will analyze the impact of minutiae on fingerprint identification and on matching algorithms when human errors appear.

Materials and Methods
This section describes the database used and its characteristics. Then, the two tested matching algorithms are presented. Finally, we describe our experimental setup.
We used the database NIST SD27 [23], which is a public database widely used in latent fingerprint studies [13,39]. NIST SD27 contains images of 258 latent crime scene fingerprints with their manually marked minutiae, as well as their matched tenprint impressions with their extracted minutiae. The average number of minutiae in latent fingerprint images in NIST SD27 is 20, while the average number of minutiae in the corresponding impressions is 106.
Additionally, for some experiments, we used the NIST SD4 [40] as a background database. This database contains images of 2000 rolled fingerprints.
For determining the impact of our study on fingerprint identification, we used two matching algorithms that have demonstrated highly competitive performance in latent fingerprint identification:
• Minutia Cylinder-Code (MCC) [41]: a matching algorithm based on a three-dimensional representation constructed using basic minutiae features such as angles and distances to other minutiae.
• Deformable Minutiae Clustering (DMC) [19]: a matching algorithm independent of minutiae descriptors, based on the use of clustering to improve robustness to non-linear transformations. In this case, we use Minutia Cylinder-Code as the minutiae descriptor.
We used CMC [42] curves as an evaluation measure, following the ISO/IEC 19795-1 standard, which indicates that CMC curves should be used for closed-set identification. As numerical measures of the CMC curve, we use Rank-1 to Rank-100 identification rates. Notice that these ranks quantify the ratio of correct identifications in the first position and within the first 100 positions, respectively, of the ranked list returned by an identification algorithm.
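To make the rank-n notion concrete, a CMC curve can be computed from a matrix of query-by-gallery similarity scores; the minimal sketch below is our own illustration (the function name and data layout are assumptions, not prescribed by ISO/IEC 19795-1).

```python
import numpy as np

def cmc_curve(scores, true_ids, max_rank=100):
    """Compute a CMC curve for closed-set identification.

    scores:   (n_queries, n_gallery) similarity matrix, higher = better.
    true_ids: for each query, the gallery index of its true mate.
    Returns identification rates for ranks 1..max_rank.
    """
    scores = np.asarray(scores, dtype=float)
    n_q = scores.shape[0]
    # rank of the true mate = number of gallery entries scoring
    # strictly higher than it, plus one
    ranks = np.array([1 + np.sum(scores[q] > scores[q, true_ids[q]])
                      for q in range(n_q)])
    # Rank-r identification rate = fraction of queries whose mate
    # appears within the first r positions
    return np.array([np.mean(ranks <= r) for r in range(1, max_rank + 1)])
```

For example, with two queries where one mate is ranked first and the other second, the curve is 0.5 at Rank-1 and 1.0 from Rank-2 onward.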
Our protocol consisted of three sets of experiments, in which we removed minutiae from fingerprints and evaluated the impact of such removal on the two selected matching algorithms, plus a final prediction experiment:
1. The first set of experiments consists of randomly removing minutiae from latent fingerprints in the database and comparing the resulting CMC curves in a closed-set comparison. We ran 35 experiments; the first 20 consisted of removing a fixed number of minutiae from each latent fingerprint, from 1 to 20. The next 15 consisted of removing a proportion of the minutiae of each latent fingerprint, ranging from 0.1 to 0.75. Each experiment was run 10 times with different randomly selected removed minutiae, and the results of the 10 runs were then averaged.
2. The second set of experiments aims to measure the negative impact on the matching algorithms when minutiae are removed. For this set of experiments, we removed every possible combination of one, two, and three minutiae from every fingerprint (more than 1,100,000 combinations for all fingerprints in NIST SD27). Every new fingerprint (with the removed minutiae) was tested with the two selected matching algorithms to obtain its matching score, which was compared with the matching score obtained from the original fingerprint (without removed minutiae).
3. The third set of experiments consists of determining the combinations of minutiae that decreased the score the least and those that decreased it the most. Using the results of our second experiment, we selected six combinations of minutiae for each fingerprint and each matching algorithm to see how the change in score is reflected in the CMC curve and the rank-100 identification. The three combinations that lowered the score the most are considered the "lower" class; the three combinations that lowered the score the least are considered the "higher" class.
4. Finally, we perform one last experiment aimed at determining whether it is possible to predict if removing a minutia will have a positive or negative effect on the matching score.
It is important to highlight that, for all experiments, minutiae were not removed from a fingerprint if it would be left with fewer than six minutiae, following what INTERPOL recommends as a minimum for an AFIS [43].
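The removal protocol above (random fixed-number removal and exhaustive removal of one-, two-, and three-minutiae combinations, never leaving a fingerprint with fewer than six minutiae) can be sketched as follows. The matcher call is omitted, and all function names are our own.

```python
import itertools
import random

# floor suggested by the INTERPOL recommendation cited in the text
MIN_MINUTIAE = 6

def random_removals(minutiae, n_remove, n_trials=10, seed=0):
    """Yield n_trials random variants of a minutiae list with n_remove
    minutiae deleted, skipping removals that would leave fewer than
    MIN_MINUTIAE minutiae (first experiment set)."""
    rng = random.Random(seed)
    if len(minutiae) - n_remove < MIN_MINUTIAE:
        return
    for _ in range(n_trials):
        drop = set(rng.sample(range(len(minutiae)), n_remove))
        yield [m for i, m in enumerate(minutiae) if i not in drop]

def combination_removals(minutiae, max_size=3):
    """Yield every variant obtained by deleting each combination of
    1..max_size minutiae (the exhaustive second experiment set)."""
    for k in range(1, max_size + 1):
        if len(minutiae) - k < MIN_MINUTIAE:
            break
        for combo in itertools.combinations(range(len(minutiae)), k):
            drop = set(combo)
            yield combo, [m for i, m in enumerate(minutiae) if i not in drop]
```

Each yielded variant would then be scored by the matcher and compared against the score of the unmodified fingerprint.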

Evaluating the Impact of Minutiae Errors
This section shows the experimental results obtained when removing fixed and variable numbers of random minutiae from each latent fingerprint.
For our first set of experiments, Figure 5 shows CMC curves for the random removal of a fixed number of minutiae using both MCC and DMC as matching algorithms. From this figure, we can see how both matching algorithms degrade as the number of minutiae removed increases. In addition, notice that MCC obtains worse identification rates than DMC for every CMC curve plotted. From these results, we can see that, no matter the matching algorithm used, the identification rate is affected when minutiae are randomly removed from latent fingerprints. The drop in performance varies depending on the matching algorithm, but it is consistently high, with rank-n identification degradation higher than 5% for each minutia removed.

Figure 6 shows the same results as Figure 5 in a different representation. From Figure 6, we notice that the identification rate decays proportionally to the number of minutiae removed. We can also see that rank one is more affected when minutiae are removed than the remaining ranks (from 2 to 6). In addition, we can notice that MCC's identification rates are more affected than those of DMC across the numbers of minutiae removed and the ranks analyzed. From our first set of experiments, we noted that the identification rate depends on both the number of removed minutiae and the total number of minutiae. Consequently, another essential experiment is to test removing a proportion of the minutiae of each latent fingerprint.
Figure 7 shows CMC curves for the random proportional removal of minutiae using both MCC and DMC as matching algorithms. From this figure, notice that the identification rate decays very fast, with rank-n identification rate differences of 12% for a ratio of 0.2 missed minutiae. In addition, from Figure 7, we can see that the identification rate obtained by MCC (Figure 7a) is worse than that obtained by DMC (Figure 7b). These results suggest that the characteristics of the matching algorithm can partially alleviate this drop in performance.

From our second set of experiments, we noted that some minutiae affected the matching scores differently than others. On the one hand, 2/3 of the minutiae lowered the matching score when removed. On the other hand, 1/3 of the minutiae had a null effect or even slightly improved the matching score when removed, compared to the base score (no minutiae removed).
From the results obtained in our second set of experiments, we concluded that some minutiae are more critical for identification, but we still do not know why they are more important than others. The next experiment is performed to characterize minutiae that lower the matching score and compare them to the remaining minutiae to determine the reasons for this negative impact.
Using the results of our second set of experiments, we selected the six minutiae combinations of sizes one, two, and three that had the largest impact on the matching score for each latent fingerprint and each matching algorithm, to see how the change in score is reflected in the CMC curve and in the rank-1 to rank-100 identification.
The six combinations for each fingerprint and matching algorithm were determined by taking the three combinations of minutiae that lowered the score the most (the "lower" class) and the three combinations that raised it the most (the "higher" class). Figure 8 shows a latent fingerprint taken from NIST SD27 [23], with the minutiae labeled as "higher" class in green, those labeled as "lower" class in red, and the remaining minutiae in blue. From this figure, we can see that the minutiae labeled as "lower" class are well separated from those labeled as "higher" class. In addition, we can notice that the sum of the scores of all "lower" class minutiae (−0.1728) affects the identification score more negatively than the sum of the scores of all "higher" class minutiae (0.0665).
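Assuming the per-combination score differences from the second experiment set are available as a mapping from removed-minutiae combination to score delta, the "lower"/"higher" labeling reduces to selecting the extremes of that mapping; a minimal sketch (names are ours):

```python
def label_extreme_combinations(score_deltas, k=3):
    """Given {combination: score_delta}, where score_delta is
    (score after removal) - (original score), return the k combinations
    whose removal lowered the score the most ("lower" class) and the k
    whose removal lowered it the least or raised it ("higher" class)."""
    ordered = sorted(score_deltas.items(), key=lambda kv: kv[1])
    lower = [c for c, _ in ordered[:k]]    # most negative deltas
    higher = [c for c, _ in ordered[-k:]]  # least negative / positive deltas
    return lower, higher
```

Applied per fingerprint and per matcher, this yields the six labeled combinations used in the third experiment set.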

Figure 8. Example of a latent fingerprint taken from NIST SD27 [23]. Minutiae labeled as "higher" class are in green, those labeled as "lower" class are in red, and the remaining minutiae are in blue. For both the "higher" and "lower" classes, each minutia is annotated with the score difference it introduces to the final score when that particular minutia is removed.

Table 1 shows the percentages of cases in which there was a rank drop considering only those minutiae labeled as "higher" class. Table 2 shows the percentages of cases in which there was a rank drop considering only those minutiae labeled as "lower" class. Notice that, in both tables, MCC obtained worse identification rates than DMC. In addition, as Table 2 shows, the identification rates were most affected when minutiae labeled as "lower" class were removed.

Our third set of experiments shows how the identification rate is affected when minutiae labeled as "lower" class are removed. Figure 9 shows a histogram of the rank loss of MCC when removing the minutiae of the "lower" class. From this figure, we can see how removing minutiae labeled as "lower" class can change the MCC rank by up to 2000 positions in a ranked list. This means that a fingerprint ranked in the top 50 positions can easily drop to the top 1000. These results demonstrate the high impact of these errors on the performance of fingerprint identification systems. In this figure, we can also see, as expected, that the more minutiae are removed, the higher the rank loss. Notice that when only two or three minutiae are missed, the drop is severe for a large number of fingerprints.

Figure 10 shows a histogram of rank loss for DMC when "lower" class minutiae are removed. Contrasting the results of Figures 9 and 10, we can see that DMC is less affected than MCC when minutiae labeled as "lower" class are removed. However, notice that when two or three minutiae are removed, the histogram is still highly affected.

Predicting the Impact of Minutiae Errors
In the previous section, we showed that removing minutiae can decrease or increase the score of fingerprint matching algorithms. If we could automatically predict the impact of each minutia on the score before performing latent fingerprint identification, we could: (1) help forensic experts improve their minutiae labeling process, maximizing the probability of finding minutiae that lead to successful latent fingerprint identification [16], and (2) improve automated algorithms for latent fingerprint processing and recognition using quality-based processing [44] and quality-based information fusion [45][46][47]. With these goals in mind, we developed a framework to determine whether a given minutia will have a positive or negative effect on the fingerprint matching score. With such a prediction, we could develop tools for forensic examiners that automatically classify regions by their impact on the performance of the identification algorithm, as well as quality-based algorithms for latent fingerprint processing [45,46]. This framework was created using the results of removing each single minutia from a fingerprint, as described in the previous section for the DMC matcher. We created features based on the location of each minutia and the distances to neighboring minutiae. We designed two types of features: (1) the first type counts the number of minutiae within a given radius of the specified minutia, and (2) the second type is the distance from the specified minutia to each of its n closest minutiae. Finally, the class values positive and negative were assigned to minutiae that, after being removed, increased or decreased the matching score, respectively. In the end, each data point in the created dataset has the following information:
• d1–d6: each of these features holds the distance from the minutia to its closest minutia (d1), its second closest minutia (d2), and so on.
• r15, r30, r45, r60, r75, r90: each of these features counts how many minutiae lie within a radius (of 15 pixels, 30 pixels, and so on, up to 90 pixels) of the specified minutia.
• class: Either a positive or negative impact on the matching score.
In total, this dataset has 12 features and a class feature. It contains 5215 instances, of which 1991 are positive examples (minutiae that increase the matching score when removed) and 3224 are negative examples (minutiae that lower the matching score when removed).
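The 12 features described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; it assumes Euclidean pixel distances and the function and parameter names are ours.

```python
import math

def minutia_features(target, others, radii=(15, 30, 45, 60, 75, 90), k=6):
    """Compute the 12 features for one minutia.

    target: (x, y) of the minutia under analysis.
    others: list of (x, y) for the remaining minutiae in the fingerprint.
    Returns [d1..d6, r15, r30, r45, r60, r75, r90]:
    distances to the k nearest neighbors, then neighbor counts per radius.
    """
    dists = sorted(math.dist(target, m) for m in others)
    d = dists[:k]                                        # d1..d6
    r = [sum(dist <= rad for dist in dists) for rad in radii]  # r15..r90
    return d + r

# Toy example: six neighboring minutiae around a minutia at the origin.
feats = minutia_features((0, 0),
                         [(10, 0), (0, 20), (30, 40), (50, 0), (0, 60), (100, 0)])
# feats = [d1..d6, r15, r30, r45, r60, r75, r90]
```

Each feature vector is then paired with its positive/negative class label to form one instance of the dataset.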
We tested eight different supervised classifiers. We selected popular classifiers of different types to obtain varied results. For each algorithm, we performed five-fold distribution-balanced stratified cross-validation [48], which is designed for imbalanced class problems. We evaluated each algorithm with the area under the receiver operating characteristic curve (AUC) [49]. The AUC is calculated as:

AUC = (1 + TPR − FPR) / 2,

where TPR is the true positive rate and FPR is the false positive rate.
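For a single operating point, this AUC can be computed directly from the confusion matrix. A minimal sketch, assuming the single-point formulation AUC = (1 + TPR − FPR) / 2 (function name ours):

```python
def auc_single_point(tp, fn, fp, tn):
    """AUC from one confusion matrix: (1 + TPR - FPR) / 2."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)  # false positive rate
    return (1 + tpr - fpr) / 2

print(auc_single_point(tp=80, fn=20, fp=30, tn=70))  # → 0.75
```

Because TPR and FPR are normalized per class, this measure does not reward a classifier that simply favors the majority (negative) class, which matters given the 1991/3224 class imbalance of our dataset.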
The eight supervised classifiers that we used [54] included PBC4cip [50], FLDA, and Random Forest [51], among others. Figure 11 shows a bar graph comparing the average AUC of each supervised classifier when using five-fold distribution-balanced stratified cross-validation. From Figure 11, we can see that PBC4cip [50] achieves the best performance, with an AUC of 0.608, closely followed by FLDA and Random Forest [51]. This AUC is still low, but these results suggest that there is information that can be exploited to predict, better than at random, whether a minutia will increase or decrease the matching score of a fingerprint. These results open a new research line on automatic algorithms that help forensic experts improve the minutiae labeling process in latent fingerprints.

Conclusions and Future Work
Aiming to develop automatic methods that can help latent fingerprint examiners make better decisions and that can enable quality-based fingerprint processing, we carried out a study of how minutiae errors affect the identification of latent fingerprints. Both human experts [28] and automatic algorithms [13] make errors when labeling latent fingerprint minutiae. The literature already offers some knowledge about the sources and factors behind those errors, for both human experts [26][27][28] and algorithms [13,21], but there is a lack of understanding of how those errors impact fingerprint identification systems. The present work is a step forward in understanding the impact of those minutiae errors.
From our experimental results, we conclude that missing even one minutia from a fingerprint can have a very high negative impact on identification performance. In addition, some minutiae are more important than others for correctly identifying a latent fingerprint. We also conclude that a fingerprint ranked in the first places of the candidate list can be demoted to the last places when two or more minutiae (labeled as "lower" class) are removed. This variation in minutiae utility, or minutiae quality in terms of biometric utility [18], can enable specific processing for different minutiae [13] in quality-based conditional processing [44] or quality-based information fusion schemes [45][46][47].
In future work, we plan to deepen our study of which features make a minutia, or a combination of minutiae, more important for identification performance. In addition, ongoing work focuses on determining critical areas of a latent fingerprint image in which missing a minutia could make the matching algorithms fail to identify the fingerprint correctly.

Short Biography of Authors
Octavio Loyola-González received his Ph.D. degree in Computer Science from the National Institute for Astrophysics, Optics, and Electronics, Mexico, in 2017. He has won several awards from different institutions for his research work on applied projects; consequently, he is a Member of the National System of Researchers in Mexico (Rank 1). He worked as a distinguished professor and researcher at Tecnologico de Monterrey, Campus Puebla, in the undergraduate and graduate programs of Computer Science. Currently, he is responsible for running the Machine Learning and Artificial Intelligence practice at Altair Management Consultants Corp., where he is involved in the development and implementation of projects using analytics and data mining in the Altair Compass department. He has outstanding experience in the fields of big data, pattern recognition, cloud computing, IoT, and analytical tools, applying them in sectors in which he has worked, such as Banking and Insurance, Retail, Oil and Gas, Agriculture, Cybersecurity, Biotechnology, and Dactyloscopy. From these applied projects, Dr. Loyola-González has published several books and papers in well-known journals, and he has several ongoing patents as a manager and researcher in Altair Compass.