Fingerprint Classification Based on Deep Learning Approaches: Experimental Findings and Comparisons

Abstract: Biometric classification plays a key role in fingerprint characterization, especially in the identification process. In fact, reducing the number of comparisons in biometric recognition systems is essential when dealing with large-scale databases. Fingerprint classification achieves this by splitting fingerprints into different categories. The general approach to fingerprint classification requires pre-processing techniques that are usually computationally expensive. Deep Learning is emerging as a leading field that has been successfully applied to many areas, such as image processing. This work shows the performance of three pre-trained Convolutional Neural Networks (CNNs), namely AlexNet, GoogLeNet, and ResNet, tested on two fingerprint databases (PolyU and NIST), and compares it with other results presented in the literature in order to establish which type of classification yields the best performance in terms of precision and model efficiency. We present the first study that extensively compares the most used CNN architectures by classifying fingerprints into four, five, and eight classes. The experimental results show that all the tested CNN architectures achieved their best performance on the PolyU database, owing to the higher quality of its samples. To confirm the reliability of our study and of the results obtained, a statistical analysis based on the McNemar test was performed.


Introduction
Traditionally, in an Automated Fingerprint Identification System (AFIS), each fingerprint (or the derived biometric characteristics) must be compared with all the others contained in the database. If the database is large, a high number of comparisons is required, resulting in long processing and latency times. Conversely, a system capable of classifying fingerprints before the identification phase reduces the 'search set', as well as the response times of any biometric identification system in real-time applications. When the identification process is performed on partial fingerprint images [1], it is especially important to rely on a division into classes to reduce the computational cost of searching for a fingerprint image.
Biometric systems, like all security systems, have vulnerabilities. A vulnerability in biometric security results in incorrect recognition or in a failure to correctly recognize individuals. The vulnerability concept includes methods to wrongly accept an individual (i.e., spoofing), to affect the overall system performance (i.e., denial of service), or to attack another system via leaked data (i.e., identity theft) [2]. Exactly in this context and in order to
• a thorough analysis using three different pre-existing CNN-based architectures for fingerprint classification (into four, five, and eight classes), which can improve fingerprint recognition when integrated into an AFIS;
• a considerable number of tests performed on two distinct large-scale fingerprint databases, aimed at gaining further insights into the performance achieved on databases with heterogeneous characteristics;
• a user-friendly tool offering immediate interaction and performance assessment via a GUI;
• statistical validation of the obtained findings and comparisons by means of the McNemar test.
In order to confirm the soundness of the approaches and of the related results, the p-values of the McNemar statistical test on the pairwise comparisons of the classification results achieved by the investigated CNNs in the described experiments were calculated and discussed. This paper is structured as follows. Section 2 summarizes the relevant background and principles of fingerprint classification. Section 3 outlines the main approaches to fingerprint classification in the literature. Section 4 illustrates the databases used and the Deep Learning methods exploited to compare the CNN architectures. Section 5 shows the experiments and the obtained classification performance. Finally, Section 6 provides a discussion and draws conclusions.
Symmetry 2021, 13, 750 3 of 21

Fingerprint Classification
The identification process of a person (for instance, in an AFIS) requires the comparison of a fingerprint with all the others stored in a specific archive. Since these archives can grow to a substantial size, millions of fingerprint comparisons may be required.
Typically, the first phase of any large-scale fingerprint recognition system (both software and hardware) is fingerprint classification, which provides an indexing mechanism for reducing the number of comparisons. In fact, for efficient one-to-many fingerprint recognition, fingerprint classification is an important step, because comparing the target fingerprint only with fingerprints of the same type saves search time and reduces the total time spent on one-to-one matching. Therefore, fingerprint classification can reduce the execution time of the whole recognition procedure without affecting its performance [15].
Using a fingerprint classification system, it is possible to considerably reduce the number of comparisons, by processing only the fingerprints belonging to the same class as the fingerprint to be identified, and thus to efficiently improve the AFIS performance.
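As a minimal sketch of this indexing idea, the Python snippet below groups enrolled templates by class label and restricts the one-to-one search to the query's class. The function names are hypothetical and the class priors in the usage example are illustrative, not figures from this study. Under the simplifying assumptions of error-free classification and queries drawn from the same class distribution as the database, the expected fraction of the database searched per query is the sum of the squared class probabilities.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def build_index(templates: Iterable[Tuple[str, Hashable]]) -> Dict[Hashable, List[str]]:
    """Group (template_id, class_label) pairs by class label."""
    index: Dict[Hashable, List[str]] = defaultdict(list)
    for template_id, label in templates:
        index[label].append(template_id)
    return dict(index)


def candidate_set(index: Dict[Hashable, List[str]], query_label: Hashable) -> List[str]:
    """Templates to be matched one-to-one against a query of the given class."""
    return index.get(query_label, [])


def expected_search_fraction(priors: Dict[Hashable, float]) -> float:
    """Expected fraction of the database searched per query (sum of squared priors).

    Assumes error-free classification and queries that follow the same class
    distribution as the enrolled database.
    """
    return sum(p * p for p in priors.values())


if __name__ == "__main__":
    enrolled = [("t1", "Whorl"), ("t2", "Arch"), ("t3", "Whorl"), ("t4", "Left Loop")]
    index = build_index(enrolled)
    print(candidate_set(index, "Whorl"))  # ['t1', 't3']
    # Hypothetical priors: five equally likely classes
    print(expected_search_fraction({label: 0.2 for label in range(5)}))
```

With five equally likely classes, each query is compared with about one fifth of the database. Unbalanced class frequencies increase this fraction (the sum of squares is smallest for uniform priors), and a misclassified query may miss its true match unless other classes are also searched.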

Types of Fingerprint Classification
Since the 19th century, many researchers have been interested in classifying fingerprints into different types: in 1823, Jan Evangelista Purkinje [16] discussed a thesis in which fingerprints were classified into nine different types.
A significant turning point in fingerprint classification came with Francis Galton [17,18] at the end of the 19th century, who described for the first time the so-called "deltas" (i.e., points where three different areas of the fingerprint converge) by determining the conformation of a ridge. Galton proposed a classification of fingerprints into three categories.
A few years later, Edward Henry proposed a classification system with five classes [19], which is still frequently applied. Inspired by Henry's classification model, the Federal Bureau of Investigation (FBI) published the book "The Science of Fingerprints" [20] in 1985, in which eight fingerprint types were described.

Fundamental Elements for Fingerprint Classification
Three types of points can be identified in a fingerprint (Figure 1), and their symmetric properties can be used to classify fingerprint images [8,9]:
• Core: approximately coincides with the center of the ridge pattern. Each fingerprint has only one core point. In practice, it is the point of greatest curvature of the innermost ridge that forms the spiral.
• Delta: represents a divergence point and identifies a stretch where two ridges, which follow an almost parallel path, separate. The delta is also defined as the point in the center of a triangular region, usually found in the lower right or left corner, where the ridges converge from different directions.
• Loop: is characterized by several ridges that cross the imaginary line between the core point and the delta point, thus forming a "U" pattern; the loops return approximately in the direction from which they originated, since they are characterized by having exactly one loop point and one core.
Based on these three characteristics (core, delta, and loop), the various types of fingerprints can be distinguished.
Figure 1. Example of delta and core points within a fingerprint that define symmetric characteristics among similar images. Moreover, the "U" pattern of the loop is highlighted with a dashed line.

Fingerprint Classes
Relying upon a careful analysis of the three elements described above (i.e., core, delta, and loop) and of their symmetric characteristics, it is possible to assign a fingerprint to one of several classes:
• Arch: the ridges enter from one side, rise forming a small protuberance, and exit from the opposite side. Arches have neither loops nor deltas.
• Tented Arch: if an arch has at least one ridge showing a high curvature, and a loop and a delta are present, it can be classified as a Tented Arch.
• Left Loop: the ridges that form it enter from the left side, curve, and return towards the left. It has a loop and a delta (core positioned to the left of the delta).
• Right Loop: one or more ridges enter from the right side, curve, and exit from the same side. It has a loop and a delta (core positioned to the right of the delta).
• Whorl: characterized by the presence of at least two delta points. Some of its ridges tend to form a circular pattern; fingerprints of this class have at least one ridge that makes a complete 360° turn around the fingerprint center.
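The rules above can be condensed into a toy decision procedure. The Python sketch below only illustrates these definitions and is not the classification method evaluated in this work (which relies on CNNs). It assumes that core and delta coordinates have already been detected, and the `high_curvature` flag is a hypothetical input standing in for the Tented Arch curvature test. Coordinates are (x, y) in pixels, with x increasing to the right.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates, x grows to the right


def henry_class(cores: List[Point], deltas: List[Point], high_curvature: bool = False) -> str:
    """Toy five-class assignment from singular points (illustrative only)."""
    if len(deltas) >= 2:
        return "Whorl"            # at least two deltas
    if not cores and not deltas:
        return "Arch"             # no loops and no deltas
    if len(cores) == 1 and len(deltas) == 1:
        if high_curvature:
            return "Tented Arch"  # loop and delta with a highly curved ridge
        core_x, delta_x = cores[0][0], deltas[0][0]
        # Core to the left of the delta -> Left Loop, otherwise Right Loop
        return "Left Loop" if core_x < delta_x else "Right Loop"
    return "Unclassified"         # configurations these simple rules do not cover


if __name__ == "__main__":
    print(henry_class([], []))                                 # Arch
    print(henry_class([(120, 90)], [(180, 200)]))              # Left Loop
    print(henry_class([(200, 90)], [(60, 210), (250, 215)]))   # Whorl
```

Real images rarely yield such clean singular points (noise, partial prints, missing singularities), which is one reason learned classifiers are attractive.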
Each fingerprint class has a different frequency of occurrence: the most common class is Whorl, whereas the Arch class has the fewest occurrences. Figure 2 shows an example of each of the five classes according to Henry's classification. Fingerprints belonging to the Whorl class can be divided into further subclasses:

Fingerprint Classification Using Deep Learning Approaches
Traditional methods for fingerprint classification rely on morphological and symmetric characteristics based on local fingerprint substructures [22]. Since these substructures are difficult to detect and some are very similar to each other, most systems use only the two most prominent ones [23]: Ridge Ending and Ridge Bifurcation (i.e., minutiae), which correspond to the terminations and bifurcations of the ridges, respectively. This requires an algorithm capable of detecting minutiae while taking into account various factors, such as the fingerprint acquisition method or the type of sensor used. These factors make it necessary to adapt the identification algorithm to different circumstances, considerably affecting management costs and processing times.
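To make the minutiae concept concrete, the Python/NumPy sketch below implements the crossing number, a common textbook technique computed on a thinned (one-pixel-wide) binary ridge skeleton. It is not taken from the works cited here. The skeleton is assumed to be already available, and the spurious-minutiae filtering that real systems need is omitted. For each ridge pixel, the crossing number is half the number of 0/1 transitions around its eight neighbours: a value of 1 marks a ridge ending, and a value of 3 marks a bifurcation.

```python
import numpy as np

# 8-neighbourhood as (row, col) offsets, visited in circular order starting east
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def crossing_number(skel: np.ndarray, r: int, c: int) -> int:
    """Half the number of 0/1 transitions around pixel (r, c)."""
    vals = [int(skel[r + dr, c + dc]) for dr, dc in OFFSETS]
    return sum(abs(vals[k] - vals[(k + 1) % 8]) for k in range(8)) // 2


def find_minutiae(skel):
    """Return (ridge endings, bifurcations) as lists of (row, col) tuples."""
    skel = (np.asarray(skel) > 0).astype(np.uint8)
    endings, bifurcations = [], []
    rows, cols = skel.shape
    for r in range(1, rows - 1):          # skip the border: it lacks 8 neighbours
        for c in range(1, cols - 1):
            if skel[r, c] == 0:
                continue                  # only ridge pixels can be minutiae
            cn = crossing_number(skel, r, c)
            if cn == 1:
                endings.append((r, c))
            elif cn == 3:
                bifurcations.append((r, c))
    return endings, bifurcations


if __name__ == "__main__":
    # A "T"-shaped skeleton: three ridge endings meeting at one bifurcation
    skel = np.zeros((7, 7), dtype=np.uint8)
    skel[2, 1:6] = 1
    skel[3:6, 3] = 1
    print(find_minutiae(skel))
```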
CNN-based classifiers do not require hand-crafted feature extraction or classification rules, since both are learned directly from the training images; this drastically reduces design time and cost and ensures greater flexibility.
The goal of this work is to classify fingerprints into four, five, and eight classes, by making use of CNN-based architectures, comparing them and providing guidelines on the best architecture according to the fingerprint database and the classification type.

Related Work
Generally, the first phase of any large-scale fingerprint recognition system (both software and hardware) [24,25] is fingerprint classification [26]. Classifying a fingerprint image is crucial and represents a very difficult morphological structure recognition problem, due to the variability of inter-class and intra-class features and to the presence of noise. Over the last thirty years, many techniques have been devised to overcome these problems, such as genetic algorithms [27,28], support vector machines [29], Wavelet and Fourier transforms [30], neural networks [31,32], and statistical information [33]. However, the main proposed approaches can be classified as follows: (i) based on heuristics or singularity points, (ii) structure- or morphology-based, and (iii) neural- and CNN-based. The main approaches are outlined in what follows.

Approaches Based on Heuristics/Singularity Points
Solutions based on heuristics generally focus on fingerprint ridge structures and singularity points. In [34], the authors proposed a classification approach exploiting singular points (extracted by means of a modified Poincaré index) and the orientation field for reliable and fast fingerprint classification. The algorithm was tested on two sources: (i) the Fingerprint Verification Competition (FVC) 2002 [35] and FVC2004 [36] databases from the Biometric System Laboratory, University of Bologna, Italy, and (ii) the VeriFinger_Sample_DB from the Neurotechnologija website [37]. The whole set of 730 fingerprint images was classified into five different classes, with an overall classification rate of 96.1%. Among all images, the algorithm failed six times during core point detection, whereas all the remaining images were correctly classified. Unfortunately, singularity points are not always present in a fingerprint image, either because of an incorrect acquisition process (e.g., the acquired fingerprint might be partial) or because the fingerprint belongs to particular classes (e.g., arch). For this reason, a new set of points, called pseudo-singularity points, was defined and used to enable classification as the first phase of an identification system [38]. As a result, fingerprint processing and fingerprint matching involve few feature-comparison steps, with performance comparable to that of conventional minutiae-based systems. The experiments were performed on several official FVC databases. On the FVC2002 DB2-A database, the identification system achieved a False Acceptance Rate (FAR) of 1.22% and a False Rejection Rate (FRR) of 9.23%, showing a satisfactory effectiveness of the proposed approach [35]. The best results were obtained on the FVC2000 DB1-B [39] database, with FAR = 0.26% and FRR = 7.36%.
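The Poincaré index can be sketched as follows. This Python/NumPy snippet implements the standard (unmodified) index, not the modified variant of [34]. It sums the orientation differences, wrapped modulo π, along the closed path formed by the eight neighbours of a block in an orientation field `theta` (angles in radians), then divides by 2π. Values close to +1/2, -1/2, and +1 indicate a core, a delta, and a whorl-like center, respectively, with the sign convention fixed by the traversal order of `PATH`. The synthetic 3x3 fields are assumptions made for illustration.

```python
import numpy as np

# Closed path through the 8 neighbours of a block, as (row, col) offsets;
# along it, arctan2(d_row, d_col) increases monotonically.
PATH = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def wrap_orientation_diff(d: float) -> float:
    """Map a difference into (-pi/2, pi/2]: orientations are defined modulo pi."""
    if d > np.pi / 2:
        return d - np.pi
    if d <= -np.pi / 2:
        return d + np.pi
    return d


def poincare_index(theta: np.ndarray, r: int, c: int) -> float:
    """Standard Poincare index at block (r, c) of an orientation field."""
    angles = [float(theta[r + dr, c + dc]) for dr, dc in PATH]
    total = sum(wrap_orientation_diff(angles[(k + 1) % 8] - angles[k]) for k in range(8))
    return total / (2 * np.pi)


def synthetic_field(kind: str) -> np.ndarray:
    """3x3 orientation field around a core, a delta, or a whorl center (illustrative)."""
    rows, cols = np.mgrid[0:3, 0:3]
    phi = np.arctan2(rows - 1, cols - 1)   # angle of each block seen from the center
    if kind == "core":
        return (phi / 2) % np.pi
    if kind == "delta":
        return (-phi / 2) % np.pi
    return (phi + np.pi / 2) % np.pi       # whorl: concentric ridges


if __name__ == "__main__":
    for kind in ("core", "delta", "whorl"):
        print(kind, round(poincare_index(synthetic_field(kind), 1, 1), 3))
```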
The authors of [40] developed an efficient estimation approach to increase the accuracy of the extracted directional field. From the improved directional field, singular points and the related macro-features were extracted to generate the input features of the fingerprint classifier. After encoding these features, a fuzzy wavelet neural network (FWNN)-based classifier was applied to perform a five-class fingerprint classification. On the NIST-4 database, the FWNN-based classifier achieved an overall accuracy of 92.4%.
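For context, the directional (orientation) field is commonly estimated from image gradients with a block-wise least-squares formula. The estimator of [40] is not detailed here, so the Python/NumPy sketch below shows only this standard baseline. For each block, the dominant gradient direction is 0.5·atan2(2ΣGxGy, Σ(Gx² - Gy²)), and the ridge orientation is perpendicular to it. The 16-pixel block size is an illustrative assumption. Angles are in radians in [0, π), measured in image coordinates (rows growing downwards).

```python
import numpy as np


def orientation_field(img: np.ndarray, block: int = 16) -> np.ndarray:
    """Block-wise ridge orientation in radians, in [0, pi), image coordinates."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    n_rows, n_cols = img.shape[0] // block, img.shape[1] // block
    theta = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            win = (slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block))
            bx, by = gx[win], gy[win]
            vx = np.sum(bx ** 2 - by ** 2)   # doubled-angle cosine component
            vy = np.sum(2.0 * bx * by)       # doubled-angle sine component
            # Dominant gradient direction, rotated by 90 degrees: ridges run
            # perpendicular to the intensity gradient.
            theta[i, j] = (0.5 * np.arctan2(vy, vx) + np.pi / 2) % np.pi
    return theta


if __name__ == "__main__":
    x = np.arange(32)
    vertical_stripes = np.tile(np.sin(2 * np.pi * x / 8), (32, 1))  # intensity varies along x
    print(orientation_field(vertical_stripes))    # vertical ridges: close to pi/2
    print(orientation_field(vertical_stripes.T))  # horizontal ridges: close to 0
```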

Structure/Morphology-Based Approaches
Approaches based on morphological information generally focus on the structure and shape information of fingerprints. Nain et al. [41] used the Sobel operator and a Gabor filter to extract the region of high ridge curvature. Subsequently, the ridges within this region were traced in both directions, starting from any point on the ridge, and vectors were drawn at the end points of the ridge in order to determine the class to which the fingerprint image belongs. The algorithm was very efficient because it avoided core point detection and relied on simple ridge-flow connectivity; the overall classification accuracy without rejection was 98.75%. The authors of [42] proposed a three-stage method to enhance fingerprint image quality. The fingerprint image was first denoised with a procedure based on the wave atom transform. Afterwards, image augmentation based on morphological operations was exploited to improve the classification performance, with dilation and area opening as the morphological operators considered for refinement. To evaluate the approach, the authors used an adaptive genetic neural network and the FVC2000 databases [39]. A fingerprint classification approach based on histogram of oriented gradients (HOG) descriptors was proposed in [43]. The computed orientation field, adapted to the ridge patterns and incorporated into the proposed descriptor, represented a fingerprint more robustly than the non-adapted field. An Extreme Learning Machine (ELM) with a Radial Basis Function (RBF) kernel was used as the classifier. The experimental tests were conducted on the FVC2004 [36] database, obtaining a mean accuracy of 98.70%. In [44], the authors proposed a neural network model using as input the local ridge orientation information and the corresponding locations of the fingerprint singular points.
The local ridge orientation was obtained by filtering the raw fingerprint to reduce errors due to noise in the scanned fingerprint. From the resulting image, singular points were extracted, and their positions were used as the input to the Sim-Net unsupervised neural network model. The number of singular points and the distances between them can be used to adjust the number of classes.

Neural-and CNN-Based Approaches
In recent years, Deep Learning has achieved outstanding results in many application fields, such as image processing and computer vision. A CNN-based approach to the fingerprint classification problem was attempted in [45]. For the four-class problem on the NIST-DB4 database, choosing only the orientation field as the classification feature, the authors achieved 91.4% accuracy using stacked sparse auto-encoders with three hidden layers. Afterwards, two classification probabilities were considered for fuzzy classification, which can affect classification performance. By adjusting only the probability threshold, they obtained a classification accuracy of 96.1% (threshold 0.85), 97.2% (threshold 0.90), and 98.0% (threshold 0.95) with a single-layer architecture. The authors of [46] proposed a fingerprint classification algorithm that can classify raw fingerprint images. Its low computational complexity was obtained by using transfer learning rather than training a deep CNN architecture from scratch. The algorithm was tested on the NIST-4 database, where it achieved 94.7% and 96.2% accuracy for the five-class and four-class classification problems, respectively. Hamdi et al. [47] investigated the use of the conic Radon transform as a feature extractor, combined with a Deep Learning technique, to solve fingerprint classification tasks. The conic Radon transform (an extension of the classical Radon transform over conic sections) enabled the extraction of global fingerprint characteristics that are invariant to geometrical transformations, such as translations and rotations. This approach was tested on NIST-SD4, achieving a recognition rate of 96.5%. In [47,48], the authors proposed Res-FingerNet, a deep CNN to tackle the classification task.
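The effect of a probability threshold can be illustrated with a simple reject option: a prediction is accepted only when its highest class probability reaches the threshold, and accuracy is measured on the accepted samples only, at the cost of rejecting the rest. The Python/NumPy sketch below assumes this max-probability rule; the exact fuzzy-classification rule of [45] is not described here, and the probabilities in the usage example are made up.

```python
import numpy as np


def accuracy_with_rejection(probs, labels, threshold: float):
    """Accuracy on accepted samples and rejection rate for a max-probability reject rule."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)       # highest class probability per sample
    predicted = probs.argmax(axis=1)
    accepted = confidence >= threshold   # reject low-confidence predictions
    rejection_rate = 1.0 - float(np.mean(accepted))
    if not accepted.any():
        return float("nan"), rejection_rate
    accuracy = float(np.mean(predicted[accepted] == labels[accepted]))
    return accuracy, rejection_rate


if __name__ == "__main__":
    # Made-up softmax outputs for four samples and two classes
    probs = [[0.95, 0.05], [0.60, 0.40], [0.30, 0.70], [0.92, 0.08]]
    labels = [0, 1, 1, 0]
    for thr in (0.5, 0.9):
        print(thr, accuracy_with_rejection(probs, labels, thr))
```

Raising the threshold usually trades a higher accuracy on accepted samples for a higher rejection rate, so accuracies obtained at different thresholds are comparable only together with the corresponding rejection rates.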
Moreover, to reduce the intra-class variance and increase the inter-class variance of the fingerprints, the authors of [47,48] used a center loss in the network training phase, so that the learned deep features were more discriminative for fingerprint classification tasks. Using the center loss increased the classification accuracy by about 1.5%. The approach was evaluated on the NIST-DB4 database, achieving a classification accuracy of 97.9%. In [49], the authors assessed the performance of two pre-trained CNN architectures, namely VGG-F and VGG-S, fine-tuned on the NIST SD4 database. The results showed an accuracy of 94.4% using VGG-F, with a testing time of 39 ms per image, and an accuracy of 95.05% using VGG-S, with a testing time of 77 ms per image. The work in [50] proposed an architecture comprising a pre-processing stage based on histogram equalization, Gabor filter enhancement, and ridge thinning; the pre-processed fingerprints were then fed to a CNN-based classifier. This approach, tested on a proprietary database collected using a Futronics FS88 scanner, achieved 98.21% classification accuracy with a loss of 0.9. The database contained 10 images acquired from each of 56 users (560 samples), of which 280 (56 users × 5 images per user) were considered in that study. Finally, Conti et al. [51] proposed an efficient embedded fingerprint classifier node based on the fusion of a Weightless Neural Network architecture and the Virtual Neuron technique. The classifier targeted devices with limited resources, allowing for resource-efficient hardware implementations (e.g., on FPGA devices). Experimental results, based on a 10-fold cross-validation strategy, showed an average classification rate of 90.08% on the official FVC2002DB2 database.
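The center loss can be sketched in Python/NumPy using the formulation originally introduced for deep face recognition (Wen et al., 2016). Whether [47,48] use exactly this variant is an assumption. For a mini-batch of deep features x_i with labels y_i and class centers c_j, the loss is 0.5·Σ_i ‖x_i - c_{y_i}‖², and each center is moved towards the features of its class with rate α. During training, this term is added to the softmax (cross-entropy) loss with a weighting factor, which is omitted here.

```python
import numpy as np


def center_loss(features: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """0.5 * sum_i ||x_i - c_{y_i}||^2 over a mini-batch."""
    diffs = features - centers[labels]   # distance of each feature from its class center
    return 0.5 * float(np.sum(diffs ** 2))


def update_centers(features: np.ndarray, labels: np.ndarray, centers: np.ndarray,
                   alpha: float = 0.5) -> np.ndarray:
    """Move each center of a class present in the batch towards that class's features."""
    new_centers = centers.astype(float).copy()
    for j in np.unique(labels):
        mask = labels == j
        # delta_c_j = sum over class-j samples of (c_j - x_i), divided by (1 + n_j)
        delta = np.sum(centers[j] - features[mask], axis=0) / (1 + mask.sum())
        new_centers[j] = centers[j] - alpha * delta
    return new_centers


if __name__ == "__main__":
    feats = np.array([[2.0, 2.0], [0.0, 1.0]])
    labs = np.array([0, 1])
    cents = np.zeros((2, 2))
    print(center_loss(feats, labs, cents))
    print(update_centers(feats, labs, cents))
```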
In our work, the most widely used CNNs were investigated in order to compare extensively the results reported in the literature with those obtained in this work, and to provide some guidelines for choosing the best CNN for classification in terms of precision and execution time.

Materials and Methods
This section describes the details of the fingerprint databases and of the CNN architectures considered. Moreover, the development environment, the CNN training, and the implemented tool are described.
The analysed fingerprint images belong to two distinct public databases: PolyU [10] and NIST [11]. By means of transfer learning, the three investigated CNN architectures (AlexNet, GoogLeNet, and ResNet) were trained to classify fingerprints into four, five, and eight classes. Subsequently, to make the tool easy to use, a GUI was implemented. Using this interface, it is possible to carry out an accurate analysis of the classification results obtained after training, as well as a comparison of the CNN performance on the different databases.

Fingerprint Databases
The CNN training was conducted using two fingerprint databases, publicly available for research and non-commercial purposes, for a total data set of 7800 images, divided as follows:
Table 1 shows the characteristics of the PolyU and NIST fingerprint databases.

Investigated Deep Learning Architectures
CNNs are well suited to image recognition and, more generally, to data with spatial correlation. To optimize the performance of a CNN architecture in a specific application scenario, it must be trained effectively. In particular, our work used an approach called transfer learning, which consists of fine-tuning the deepest CNN layers on a new data set: starting from an existing, pre-trained network, new data containing previously unknown classes are fed to it. Once retrained, the network can carry out a new task, such as fingerprint classification in our case.

AlexNet
AlexNet is a CNN proposed by Krizhevsky et al. [12] that won the 2012 ImageNet image recognition contest. Compared with LeNet, proposed years earlier by Yann LeCun [52], on which it builds, AlexNet has a deeper neural structure, composed of a higher number of convolution and pooling layers. AlexNet uses Rectified Linear Units (ReLUs) as activation functions.

GoogLeNet
GoogLeNet is a CNN [13] released by Google Inc. in 2014, the year in which it took part in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [53], winning first place with a top-5 error rate of 6.66%, close to the human error rate.
The network's name pays homage to LeNet, but its key element is the inception module introduced in [13], which groups convolutions with different kernel sizes (1×1, 3×3, and 5×5), together with pooling, in parallel within a single layer. This improves the results, because hierarchical information at different scales can be captured at the same level.
Unlike other networks that use momentum to accelerate stochastic gradient descent, GoogLeNet uses Root Mean Square Propagation (RMSProp), a method with an adaptive learning rate.
GoogLeNet relies on very small convolutions to reduce the number of parameters: although it has 22 layers, it has far fewer parameters than AlexNet (four million versus 60 million).
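The parameter saving can be checked with a little arithmetic. The Python sketch below counts the weights and biases of a k×k convolution (k·k·C_in·C_out + C_out) and compares a direct 5×5 convolution with the same convolution preceded by a 1×1 channel reduction, as done inside inception modules. The channel sizes (192 input channels, reduced to 16, with 32 output channels) are illustrative and assumed to mirror the 5×5 branch of GoogLeNet's first inception module.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def branch_params(c_in: int, c_out: int, reduce_to: int = 0) -> int:
    """5x5 branch, optionally preceded by a 1x1 reduction to `reduce_to` channels."""
    if reduce_to:
        return conv_params(1, c_in, reduce_to) + conv_params(5, reduce_to, c_out)
    return conv_params(5, c_in, c_out)


if __name__ == "__main__":
    direct = branch_params(192, 32)                  # 153632
    reduced = branch_params(192, 32, reduce_to=16)   # 15920
    print(direct, reduced, round(direct / reduced, 1))
```

Under these assumptions, the 1×1 reduction cuts the parameters of this branch by almost an order of magnitude.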

ResNet
The Residual Network (ResNet) was released in 2015 by Microsoft Research, employing 152 layers [14]. GoogLeNet and AlexNet follow the archetypal scheme of classic CNNs: a series of convolution, pooling, and activation layers, followed by fully connected layers. ResNet has a distinctly different architecture: its developers investigated why increasing the number of layers does not improve CNN performance (the so-called degradation problem). This problem had usually been attributed to adjacent feature maps leading to unsatisfactory results, for the reasons discussed in the previous paragraph.
To address this problem, ResNet relies on residual learning: shortcut connections add the input of a group of layers to its output, so that these layers learn only the difference, or "residual", between the desired mapping and their input, which is then used to correct the input. At the end of the network, a global average pooling (i.e., an average calculated over each entire feature map) is applied to the output of the last convolution, followed by a single fully connected layer. Table 2 summarizes the most relevant characteristics of the AlexNet, GoogLeNet, and ResNet CNN-based architectures.
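A residual block can be sketched in a few lines of Python/NumPy. To keep the example short, the two layers are fully connected rather than convolutional, and batch normalization is omitted. This is therefore a simplified illustration of the idea in [14], not ResNet's actual layer configuration. The output is ReLU(F(x) + x), where F is the learned residual. If the residual branch outputs zero (here, all-zero weights), the block passes a non-negative input through unchanged, which makes extra blocks easy to add without degrading the network.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, b1: np.ndarray,
                   w2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """y = ReLU(F(x) + x), with F(x) = W2 ReLU(W1 x + b1) + b2."""
    residual = w2 @ relu(w1 @ x + b1) + b2   # F(x): what the block learns
    return relu(residual + x)                # identity shortcut adds x back


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    x = rng.random(d)  # non-negative input, e.g. the output of a previous ReLU
    zeros_w, zeros_b = np.zeros((d, d)), np.zeros(d)
    y = residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b)
    print(np.allclose(y, x))  # True: a zero residual gives the identity mapping
```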

Implementation Details: Fine-Tuning
Transfer learning makes it possible to exploit a neural network previously trained on a specific data set to solve problems in "similar" contexts. This is done by retraining only some specific layers of the model. For example, the final layers of AlexNet are designed to distinguish 1000 different categories; since we want to classify fingerprints into no more than eight classes, appropriately modifying these layers drastically reduces the computational cost.
The benefits of this type of approach therefore lie in an effective reduction in the resources used. For example, GoogLeNet was trained on a database of over one million images, which makes high-performance processors and graphics cards essential. Through transfer learning, on the other hand, the network can be retrained to carry out new classification tasks using a much smaller number of images and a computational platform with more limited performance.
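As a rough illustration of the saving at the output layer alone, the Python snippet below counts the weights and biases of a fully connected layer. It assumes, as in the original AlexNet, that the last layer receives a 4096-dimensional feature vector; the class counts are those considered in this work.

```python
def fc_params(in_features: int, n_classes: int) -> int:
    """Weights plus biases of a fully connected output layer."""
    return in_features * n_classes + n_classes


if __name__ == "__main__":
    for n in (1000, 4, 5, 8):
        print(n, fc_params(4096, n))  # 1000 classes: 4097000; 8 classes: 32776
```

Replacing the 1000-way layer with an eight-way one shrinks it from about 4.1 million to about 33 thousand parameters.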

Fingerprint Image Pre-Processing
To maximize the classification performance, some data pre-processing steps are needed. The fingerprint images were resized to the input size of each CNN (Table 2) and converted from grayscale to RGB (the pixel matrix is replicated three times). Furthermore, the images underwent a data-augmentation procedure using a random scale factor between 0.9 and 1.1 (on both the x and y axes), in order to increase the variability of the input data.
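The three pre-processing steps can be sketched in Python/NumPy as follows. This is an illustrative re-implementation, not the MATLAB code used in this work. Nearest-neighbour resizing is used for simplicity, and the 224×224 default size is only an example (the actual sizes are those of Table 2). The scale factors are drawn independently for x and y from a uniform distribution in [0.9, 1.1], and scaling is applied about the image center while keeping the image size.

```python
import numpy as np


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize (a simple stand-in for an interpolating resize)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[np.ix_(rows, cols)]


def gray_to_rgb(img: np.ndarray) -> np.ndarray:
    """Replicate a single-channel image three times along a new channel axis."""
    return np.repeat(img[..., np.newaxis], 3, axis=-1)


def random_scale(img: np.ndarray, rng: np.random.Generator,
                 low: float = 0.9, high: float = 1.1):
    """Scale about the image center by independent x/y factors, keeping the size."""
    sx, sy = rng.uniform(low, high, size=2)
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # Inverse mapping: each output pixel samples the nearest source pixel.
    rows = np.rint((np.arange(h) - cy) / sy + cy).astype(int)
    cols = np.rint((np.arange(w) - cx) / sx + cx).astype(int)
    valid_r = (rows >= 0) & (rows < h)
    valid_c = (cols >= 0) & (cols < w)
    out = np.zeros_like(img)  # pixels mapped from outside the image stay black
    out[np.ix_(valid_r, valid_c)] = img[np.ix_(rows[valid_r], cols[valid_c])]
    return out, (float(sx), float(sy))


def preprocess(gray: np.ndarray, size=(224, 224), rng=None):
    """Resize, convert grayscale to RGB, then apply random scale augmentation."""
    rng = np.random.default_rng() if rng is None else rng
    rgb = gray_to_rgb(resize_nearest(gray, *size))
    return random_scale(rgb, rng)


if __name__ == "__main__":
    fingerprint = np.random.default_rng(1).integers(0, 256, (300, 300), dtype=np.uint8)
    image, factors = preprocess(fingerprint, rng=np.random.default_rng(0))
    print(image.shape, factors)
```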

CNN Architecture Adaptations
To enable result repeatability, we decided to keep the pre-existing CNN architectures unchanged for the training phase. By acting on the layers of the CNNs, it is possible to adapt the network to the classification task at hand and reduce processing times accordingly. This is achieved through layer freezing: the weights of the frozen layers are not modified during training (basically, the learning rate of these layers is zero). As a consequence, the gradients of the frozen layers do not need to be calculated, which significantly accelerates the network training process.
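The effect of freezing can be sketched with a toy two-layer linear model in Python/NumPy. The model, the squared-error loss, and the names `toy_grads` and `sgd_step` are hypothetical illustrations, not the MATLAB training code used here. Gradients are computed only for the trainable parameters, and frozen parameters keep their values, which is equivalent to giving them a zero learning rate.

```python
from typing import Callable, Dict, Iterable, Set

import numpy as np

Params = Dict[str, np.ndarray]


def toy_grads(params: Params, names: Iterable[str], x: np.ndarray, t: np.ndarray) -> Params:
    """Gradients of 0.5 * ||W2 W1 x - t||^2, computed only for the requested names."""
    wanted = set(names)
    h = params["W1"] @ x
    err = params["W2"] @ h - t
    grads: Params = {}
    if "W2" in wanted:
        grads["W2"] = np.outer(err, h)
    if "W1" in wanted:  # skipped entirely when W1 is frozen
        grads["W1"] = np.outer(params["W2"].T @ err, x)
    return grads


def sgd_step(params: Params, grads_fn: Callable[[Params, Iterable[str]], Params],
             lr: float, frozen: Set[str]) -> Params:
    """One SGD step that leaves frozen parameters untouched."""
    trainable = [name for name in params if name not in frozen]
    grads = grads_fn(params, trainable)
    return {name: (value - lr * grads[name] if name in grads else value)
            for name, value in params.items()}


if __name__ == "__main__":
    x, t = np.array([1.0, 0.0]), np.zeros(2)
    params = {"W1": np.eye(2), "W2": np.eye(2)}
    new = sgd_step(params, lambda p, n: toy_grads(p, n, x, t), lr=0.1, frozen={"W1"})
    print(np.array_equal(new["W1"], params["W1"]))  # True: W1 was frozen
    print(new["W2"])
```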
A preliminary analysis showed that performance remained essentially unaffected whether no layers were frozen or all layers (except the input and output ones) were frozen, but training took much longer in the former case (i.e., complete training without freezing). For this reason, we chose to apply only a few small changes to the architectures, properly configuring the input and output layers by adapting them (i) to the input image size and (ii) to the number of output classes (four, five, or eight), respectively. This allowed us to keep the classification performance consistent while also reducing the training times. Table 3 provides the details of the layers that were modified.
Table 3. CNN architecture adaptation details.

Development Environment
The development environment used to train and test the CNNs was MATLAB R2019a (The MathWorks, Natick, MA, USA), installed on a Linux Ubuntu 18.10 platform equipped with an Intel Core i7-980 CPU @ 3.2 GHz, 16 GB of RAM, and a solid-state drive.
MATLAB can use the Graphics Processing Unit (GPU) to accelerate processing. For this reason, an NVIDIA GeForce GT710 GPU with 2 GB of DDR5 RAM, supporting CUDA 3.0, was leveraged; the use of the GPU reduced the training time. Moreover, we aimed to exploit MATLAB's potential for Deep Learning by also leveraging the Graphical User Interface Development Environment (GUIDE). From a practical point of view, MATLAB facilitates the implementation of user-friendly tools with a GUI while still allowing developers to work with the classifier models built on the underlying Deep Learning architectures.

Experimental Setup
Before training the three CNN architectures, the fingerprints of each database had to be assigned to four, five, and eight classes. Each prepared database was randomly split into two partitions: (i) a training set (70% of the whole database) and (ii) a test set (30% of the whole database). Depending on the specific network being trained or tested, the images were resized to meet the input-size requirements in Table 2. These data allowed us to train each CNN to classify each database according to the three taxonomies illustrated in Section 2.3. In total, 18 different training sessions were carried out (3 CNN architectures × 2 fingerprint databases × 3 classification tasks), corresponding to about 61 h of training (Figures 4 and 5).
Since avoiding false positives is more important for a biometric system, Precision was used as the metric to assess classification performance. It is obtained by dividing the number of fingerprints correctly classified as class C by the number of all the fingerprints classified as class C. Letting TP and FP denote the True Positives and the False Positives, respectively, Precision is defined according to Equation (1):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)$$
Finally, to confirm and corroborate the comparisons among the investigated CNNs and databases, the McNemar test on the paired classification results was performed [54]. This non-parametric test on paired nominal data assesses whether the precision achieved by two classification models is statistically different. It uses the following null hypothesis: the predicted class labels from the two compared models have equal precision in predicting the ground-truth class labels. Since this statistical analysis involves multiple comparisons, the p-values were adjusted using the Bonferroni-Holm method [55]. In all the tests, a significance level of α = 0.05 was used.
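The 70/30 random split and the per-class Precision of Equation (1) can be sketched in Python as follows. The class codes and labels are hypothetical, and the sketch returns one value per class; how per-class values are aggregated into the figures reported in the tables is not shown here.

```python
import numpy as np

def split_train_test(n_samples, rng, train_fraction=0.7):
    """Random split of sample indices into training (70%) and test (30%) sets."""
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]

def precision_per_class(y_true, y_pred, classes):
    """Equation (1) for each class C: TP / (TP + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    result = {}
    for c in classes:
        predicted_c = y_pred == c
        tp = int(np.sum(predicted_c & (y_true == c)))
        fp = int(np.sum(predicted_c & (y_true != c)))
        result[c] = tp / (tp + fp) if tp + fp else float("nan")
    return result

rng = np.random.default_rng(0)
train_idx, test_idx = split_train_test(100, rng)

# Hypothetical labels for a four-class task (A, L, R, W are illustrative codes).
y_true = ["L", "L", "R", "W", "W", "A"]
y_pred = ["L", "R", "R", "W", "L", "A"]
print(precision_per_class(y_true, y_pred, ["A", "L", "R", "W"]))
# {'A': 1.0, 'L': 0.5, 'R': 0.5, 'W': 1.0}
```

Precision for L is 0.5 because one of the two images predicted as L is actually a W (a false positive), which is exactly the kind of error the metric penalizes.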

Fingerprint Classification Tool
The development and implementation work led to a user-friendly application, equipped with a GUI, that allows fingerprints to be classified with the three CNNs. This software application facilitated various operations: analysing network performance, comparing results across the databases used, and evaluating the performance of the CNNs on individual fingerprints. In particular, the GUI makes it easy to use the implemented classifiers and to analyse the classification results for each of the 18 combinations obtained by selecting the CNN architecture, the fingerprint database, and the number of classes. The GUI is displayed in Figure 6. It offers three panels through which the planned activities can be carried out.
Figure 6. GUI of the implemented fingerprint classification tool.
On the left side of the GUI there are three panels, each of which groups a specific set of functionalities: (i) Preloaded Trained Neural Net, (ii) Single Image Classification by Database, and (iii) Single Image Classification by Neural Net.
The first panel, "Preloaded Neural Net", gives access to the training results stored at the end of each training session. The GUI provides a complete overview of the performance of each CNN (for each of the two fingerprint databases and the three classification types). The tool generates a confusion matrix that represents the statistical classification performance (Figure 7).
The confusion matrix allows us to assess the performance of the network and how it responds to each class. The abscissa axis ("Actual Classes") indicates the real class of the fingerprint, while the ordinate axis ("Predicted Classes") denotes the class assigned by the CNN. Correctly classified elements lie on the main diagonal of the matrix and are highlighted by the green cells; for each class, the corresponding diagonal cell displays the number of correctly classified images.
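The confusion-matrix layout described above (predicted classes on the ordinate, i.e. rows; actual classes on the abscissa, i.e. columns) can be reproduced with a short NumPy function. The labels and the function name are hypothetical. With this layout, each row sum is the number of images predicted as that class, so dividing the diagonal by the row sums gives the per-class Precision of Equation (1).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Counts with rows = predicted class (ordinate) and columns = actual
    class (abscissa); correctly classified samples fall on the main diagonal."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[index[predicted], index[actual]] += 1
    return cm

classes = ["A", "L", "R", "W"]          # illustrative class codes
y_true = ["L", "L", "R", "W", "W", "A"]
y_pred = ["L", "R", "R", "W", "L", "A"]
cm = confusion_matrix(y_true, y_pred, classes)
print(cm)
print("correct per class:", np.diag(cm))                      # diagonal cells
print("per-class precision:", np.diag(cm) / cm.sum(axis=1))   # row-wise
```

Note that many libraries use the transposed convention (actual classes on rows), so the axis roles should be checked before reading precision off the rows.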
The "Single Image Classification by Database" and "Single Image Classification by Neural Net" panels allow us to classify a single image, letting the user choose the database, the CNN, and the classification type (Figure 8). In the first case (database selection), the results are: (i) the performance that each neural network obtained for the loaded image, (ii) the neural network that obtained the highest accuracy value, (iii) the selected classification type, and (iv) the class of membership along with the chosen database. In the second case (selection of the neural network and the classification type), the application autonomously chooses the best CNN and shows: (i) the performance achieved on the two databases, (ii) the database that obtained the higher performance, and (iii) the class assigned to the fingerprint.

Experimental Results
This section provides a detailed comparison of the results obtained and corroborated by the statistical analysis based on McNemar tests. The performance of each CNN classifier is shown as the database and the number of classes vary.

Classification Evaluation
Each CNN was tested on the different classification types. Table 5 shows the performance on four-, five-, and eight-class classification. In four-class classification, the highest precision on the NIST database was achieved by AlexNet, with 96.85%; on the PolyU database, the best result was obtained by ResNet, with 99.6528%. In five-class classification, as in the four-class case, the best network on NIST is AlexNet, with 96.0500%, whereas on PolyU the highest precision, 99.7917%, was reached by both AlexNet and GoogLeNet. Finally, in eight-class classification, AlexNet prevailed on the NIST database with 93.75%, while GoogLeNet obtained the best result on the PolyU database with 99.5833%.
Table 6 shows the p-values of the McNemar test on the pairwise comparisons of the classification results achieved by the investigated CNNs, considering the experiments described in Section 4.4. Although AlexNet consistently achieved the best performance on the NIST database, its advantage over GoogLeNet and ResNet was statistically significant only in eight-class classification. In four-class and five-class classification, AlexNet and GoogLeNet showed comparable performance and significantly outperformed ResNet. On the PolyU database, no statistically significant difference was observed in any of the experiments performed.
The average precision values of the networks were also computed. In Table 7, the precision of each network is averaged, for each of the three classification types, over a single data set composed of the NIST and PolyU databases; on this combined data set, AlexNet is the most accurate on all three classification types. Table 8 reports the average precision values obtained by the CNNs on each database, averaged over all three classification types.
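The pairwise comparisons summarized in Table 6 can be illustrated with the McNemar test followed by the Holm (Bonferroni-Holm) adjustment. This is a sketch under assumptions: it uses the chi-square form with continuity correction (an exact binomial form also exists, and the variant used in the study is not stated here), the paired predictions are hypothetical, and the function names are illustrative.

```python
import math
import numpy as np

def mcnemar(y_true, pred_a, pred_b):
    """McNemar chi-square test with continuity correction on paired predictions.
    b: A correct and B wrong; c: A wrong and B correct. Returns (statistic, p)."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    correct_a, correct_b = pred_a == y_true, pred_b == y_true
    b = int(np.sum(correct_a & ~correct_b))
    c = int(np.sum(~correct_a & correct_b))
    if b + c == 0:
        return 0.0, 1.0                       # the two models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2.0))      # chi-square survival function, 1 dof
    return stat, p

def holm_adjust(pvalues):
    """Holm (step-down Bonferroni) adjusted p-values, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

# Hypothetical paired predictions on 30 test images (label 0 is the true class).
y_true = np.zeros(30, dtype=int)
pred_a = y_true.copy()
pred_a[10:12] = 1                             # A wrong on 2 images
pred_b = y_true.copy()
pred_b[0:10] = 1                              # B wrong on 10 other images
stat, p = mcnemar(y_true, pred_a, pred_b)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
print(holm_adjust([p, 0.20, 0.01]))           # e.g. three pairwise comparisons
```

Only the discordant pairs (b and c) enter the statistic, which is why two networks with very high and nearly identical precision, as on PolyU, rarely yield a significant difference.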
Analysing the precision values, it is worth noting that AlexNet and GoogLeNet reach the highest precision on NIST and PolyU, respectively. The variations in the performance of the three investigated architectures were then evaluated as the classification type and the database varied.
In particular, we evaluated the loss of average precision on both databases when switching from one classification type to another. Going from four-class to five-class classification, the network that loses the fewest percentage points is ResNet, with a loss of 0.26%. Considering the variation in precision from five-class to eight-class classification, ResNet again shows the smallest difference, equal to 1.29%. The transition from four-class to eight-class classification introduces a loss of 0.88% with GoogLeNet, the lowest of the three networks.
Similarly, we evaluated the loss of average precision over all three classification types when moving from one database to another. In the transition from the PolyU database to the NIST database, ResNet shows the smallest loss, 4.06%, followed by AlexNet with 4.61%.

Training Times
To compare the performance of the CNNs, training times were also evaluated. AlexNet has a simpler architecture, which results in shorter training times. GoogLeNet and ResNet, having a higher number of layers, require more time to process the data; to keep processing times and performance consistent, some layers were frozen (as discussed in Section 4.3) and the training options were tuned. Table 9 reports the training times required by each neural network on each database for four-, five-, and eight-class classification. As expected, AlexNet is the network that requires the lowest training times for every classification type. GoogLeNet and ResNet achieve different results, which vary according to the classification type and the database: in four-class classification, ResNet always ranks after GoogLeNet, regardless of the database being processed. This also happens when classifying CASIA fingerprints into five and eight classes. For the PolyU and NIST fingerprints classified into five and eight classes, GoogLeNet ranks after ResNet in terms of training time.

Discussion and Conclusions
The aim of this work was to extensively compare the performance of different CNN architectures on the fingerprint classification problem and to provide indications regarding the most suitable network for classifying fingerprints coming from different databases according to different well-established taxonomies (i.e., four-, five-, and eight-class classification).
Considering the average performance across the different classification types (Tables 7 and 8), our experimental findings can be summarized as follows:
• AlexNet and ResNet achieved equivalent performance in four- and five-class classification on the NIST database, significantly outperforming GoogLeNet;
• AlexNet performed significantly better than the other networks in eight-class classification;
• all the investigated CNN-based architectures achieved comparable performance on high-quality images (such as those of the PolyU database).
From a computational perspective, AlexNet requires the least training time, with an approximately 3× reduction compared to GoogLeNet and ResNet. This aspect is important considering that latest-generation GPUs are not always available to train CNN architectures.
Considering the p-values obtained by the McNemar test (Table 6) for the pairwise comparison of the classification results achieved by the investigated CNNs, we can argue that AlexNet is overall the best performing network.
In [49], the performance of two versions of VGG Net was analysed for the four-class classification of the NIST database fingerprints, reaching an accuracy of 95.05% with VGG-S Net. The accuracy obtained in this work in the same context is 91.67%. The difference of 3.38 percentage points is mainly attributable to the different training options: in particular, in the aforementioned publication, the number of epochs was set to 140, which considerably increased the training time (30 h compared to about 5 h on average with the networks chosen for this paper).
Extending the work carried out in [49], we tackled the problem of fingerprint classification using two distinct databases (PolyU and NIST) and three different neural networks (namely, AlexNet, GoogLeNet, and ResNet). The three analysed CNNs reached excellent performance: average precision values higher than 98% were achieved on the PolyU database (Table 8).
Analysing the obtained results, it is clear that CNN performance is strongly influenced by image quality. The PolyU images were acquired with an innovative technique that does not require contact between the finger and the sensor, yielding higher-resolution images; therefore, the results on this database should be considered decisive in determining whether it is convenient to perform a classification into eight, five, or four classes.
In more detail, on the PolyU database, AlexNet achieved the same precision, 99.61%, in the four- and eight-class classifications (Table 5). The other two CNNs followed almost the same trend. Considering the whole data set (PolyU and NIST databases) processed with AlexNet, moving from four-class to five-class classification decreased the performance by 0.10%, while eight-class classification reduced the precision by 5.5%.
For ResNet and GoogLeNet, the results are once again comparable to those just described, with minor variations attributable to the more complex architectures of these two networks. These architectures were simplified, eliminating some of the least significant layers, to make training and classification times as homogeneous as possible with those of AlexNet, which is composed of fewer layers.
Following these considerations, we could argue that AlexNet is the most suitable network for solving eight-class fingerprint classification problems. In fact, a classification of this type does not significantly reduce the precision compared to a classification into a smaller number of classes. When fingerprint recognition has to be performed, for example in a real-time application, adopting eight-class rather than four- or five-class classification yields a greater reduction in the number of comparisons to be carried out. Given that the extension from four to eight classes is mainly performed through a sub-classification of Whorl fingerprints, which occur with a frequency of about 28%, the fingerprint recognition time would be considerably reduced.
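To see why more classes reduce the search effort, consider the expected fraction of the database that must be compared against a probe, the sum of p_i² over all classes, where p_i is the frequency of class i. The sketch assumes error-free classification and a search restricted to the probe's class; apart from the roughly 28% whorl share stated above, the class frequencies and the eight-class split are hypothetical, and `expected_penetration` is an illustrative name.

```python
def expected_penetration(class_freqs):
    """Expected fraction of the database compared against a probe, assuming
    error-free classification and a search restricted to the probe's class:
    the sum over classes of p_i * p_i."""
    if abs(sum(class_freqs) - 1.0) > 1e-9:
        raise ValueError("class frequencies must sum to 1")
    return sum(p * p for p in class_freqs)

# Illustrative frequencies; only the ~28% whorl share comes from the text.
four_class = [0.07, 0.34, 0.31, 0.28]   # arch, left loop, right loop, whorl
# Hypothetical eight-class split: arch halved, whorl divided into four equal sub-classes.
eight_class = [0.035, 0.035, 0.34, 0.31, 0.07, 0.07, 0.07, 0.07]
print(f"four classes:  {expected_penetration(four_class):.4f}")
print(f"eight classes: {expected_penetration(eight_class):.4f}")
```

Under these assumed frequencies, splitting the whorl class lowers the expected share of the database to be searched; in practice, misclassifications would force some searches into neighbouring classes and reduce the gain.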
For future developments, it would be interesting to train the CNNs from scratch, without freezing layers, and to investigate how the "simplified" transfer learning approach has affected the classification capacity of the architectures. In this way, CNN-based classification could achieve even higher performance. However, this would likely be possible only by relying upon large-scale fingerprint classification databases. A high-performance computer equipped with a GPU supporting at least CUDA 7.5 would provide greater computing resources, allowing the networks to be retrained in a short time while leaving the CNN structures unchanged.
In this work, we aimed to show how off-the-shelf CNNs can perform fingerprint classification by considering a high number of experiments. Hyper-parameter tuning by means of grid search or metaheuristics over the various settings (e.g., filter size, strides, activation functions) would be interesting for future work [56]. In the present study, we preferred to keep the CNN hyper-parameters and settings as in the original implementations, focusing instead on a considerable number of experiments covering all the combinations of three CNNs, two datasets, and three distinct classification tasks; an exhaustive hyper-parameter optimization for each configuration would have compromised the comparability of the results across the different experimental configurations. Regarding filter size, obtaining empirical evidence involves an expensive training cost, especially for particularly deep networks [57]. Owing to the fine-grained details of fingerprints (e.g., minutiae [58]), we did not consider strided convolutions, in order to preserve the information conveyed by the original input images. Finally, it could be interesting to use other fingerprint databases to further assess the classification and generalization capabilities of CNN architectures.
As a further improvement of this study, other data-augmentation techniques could be considered, including other geometric transformations (e.g., cropping, rotation, stretching) and histogram-based operations. Moreover, Generative Adversarial Network (GAN)-based data augmentation might be investigated, in terms of both image enhancement [59] and geometrical transformations [60], for classification and detection tasks [61]. Adversarial examples, prepared by applying small modifications to the original images so that they lie close to the decision boundaries learned by a classifier [62], might also affect Deep Learning-based fingerprint recognition [63] and authentication [64] systems.