Automatic Fingerprint Classification Using Deep Learning Technology (DeepFKTNet)

Fingerprints are gaining in popularity, and fingerprint datasets are becoming increasingly large. They are often captured using a variety of sensors embedded in smart devices such as mobile phones and personal computers. One of the primary issues with fingerprint recognition systems is their high processing complexity, which is exacerbated when fingerprints are gathered using several sensors. One way to address this issue is to categorize fingerprints in a database to condense the search space. Deep learning is effective in designing robust fingerprint classification methods. However, designing the architecture of a CNN model is a laborious and time-consuming task. We propose a technique for automatically determining the architecture of a CNN model adapted to fingerprint classification; it automatically determines the number of filters and the number of layers using the Fukunaga–Koontz transform and the ratio of the between-class scatter to the within-class scatter. It helps to design lightweight CNN models, which are efficient and speed up the fingerprint recognition process. The method was evaluated on two public-domain benchmark datasets, FingerPass and FVC2004, which contain noisy, low-quality fingerprints obtained using live scan devices and cross-sensor fingerprints. The designed models outperform the well-known pre-trained models and the state-of-the-art fingerprint classification techniques.


Introduction
A person can be recognized in security systems by a unique username and password, but these can be readily stolen [1]. The fingerprint is one of the first imaging modalities of biometric identification. It is more accurate and less expensive than other biometric modalities [2,3]. A fingerprint's surface has ridges and valleys, which do not change during a lifetime [4]. Fingerprint recognition can be used for verification or identification purposes. In verification, the fingerprint is compared to the templates of a particular subject in the database, but in identification, the unknown fingerprint is compared to the templates of all subjects in the database to ascertain the subject's identity [5]. Fingerprints are gaining in popularity and their datasets are becoming increasingly large. They are recorded using a variety of low-cost sensors embedded in smart devices such as smartphones and computers. The high processing complexity of a fingerprint identification system is one of its primary drawbacks. One way to address this issue is to categorize fingerprints in a database to condense the search space. The existing classification methods are effective when fingerprints are recorded using the same sensor. However, when fingerprints are collected using various sensors (referred to as the cross-sensor or sensor interoperability problem), classification performance deteriorates; even verification of the same person's finger is degraded [6][7][8]. While considerable research has been conducted on cross-sensor fingerprint verification [8][9][10][11][12], there has been no study on cross-sensor fingerprint classification, which motivates us to work on this topic.
To this end, we determine the architecture of a CNN model using the fingerprints dataset. To begin, we use the LGDBP descriptor of Saeed et al. [36] and the K-medoids clustering algorithm [37] to choose representative fingerprints, and then we derive the filters of the layers using the Fukunaga-Koontz Transform (FKT) [38].
To control the depth of a CNN model, we compute the ratio between the traces of the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$.
The proposed fingerprint CNN classification system was evaluated against the state-of-the-art fingerprint classification schemes utilizing the benchmark multi-sensor datasets FingerPass and FVC2004. Specifically, the contributions of this work are as follows:
• We developed an efficient automatic method for classifying cross-sensor fingerprints based on a CNN model.
• We proposed a technique for the custom-designed building of a CNN model, which automatically determines the architecture of the model using the class discriminative information from fingerprints. The layers and their respective filters of an adaptive CNN model are customized using FKT, and the ratio of the traces of the between-class scatter matrix, and the within-class scatter matrix.
• We thoroughly evaluated the proposed method on two datasets. The proposed fingerprint classification scheme is quick, accurate, and performs well with noisy fingerprints obtained using live scan devices as well as cross-sensor fingerprints.
The rest of the paper is organized as follows. Section 2 presents the details of the proposed technique. The experimental results have been given in Section 3. Section 4 discusses the performance of the proposed method in detail. Section 5 concludes the article.

Proposed Method
The convolutional neural network (CNN) is one of the most widely used and popular deep learning networks [39]. Its general structure comprises different types of layers, including the CONV layer with different filters, pooling layer, activation function layer, fully connected layer, and loss function [40]. It has been used for a wide range of tasks, including image and video recognition [41], classification of images [42], medical image analysis [43], computer vision [44], and natural language processing [45].
Many advancements in CNN learning methods and architectures have taken place, allowing the network to handle larger, more diverse, more complicated, and multiclass problems [46]. Following AlexNet's outstanding performance on the ImageNet dataset in 2012, many applications adopted CNNs [47]. A layer-wise representation of CNN reversed the trend toward extraction of features at low spatial resolution in deep architecture, as achieved in VGG [48]. Most modern architectures follow VGG's simple and homogeneous topology idea. The Google deep learning group introduced the divide, transform, and merge concept with the inception block. The inception block introduced the concept of branching within a layer, allowing for feature abstraction at various spatial scales [49]. Skip connections, developed by ResNet [50] for deep CNN training, gained popularity in 2015. Other models, like Wide ResNet, explore the influence of multilevel transformations on a CNN's learning capacity by increasing cardinality or widening the network [51]. As a result, research turned from parameter optimization to network architecture design, and new architectural concepts like channel boosting, spatial and feature-map exploitation, and attention-based information processing emerged [52]. The main issue in the design of CNN models is to tune the architecture of a CNN for a specific application.

Problem Formulation
The fingerprints are categorized into four types: arch, left loop, right loop, and whorl. Identifying the type of a fingerprint is a multiclass classification problem. Let there be N subjects, and K fingerprints are captured from each subject with M different sensors; these fingerprints form the set $F = \{F^{s}_{ij},\ i = 1, \ldots, K,\ j = 1, \ldots, N,\ s = 1, \ldots, M\}$, where $F^{s}_{ij}$ represents the ith fingerprint of the jth subject captured with the sth sensor. Let $\mathcal{C} = \{1, 2, \ldots, C\}$, where C is the number of classes, be the set of fingerprint labels (classes). The problem of predicting the type of a fingerprint $F^{s}_{ij}$ is to build a function $\psi : F \rightarrow \mathcal{C}$ that takes a fingerprint $F^{s}_{ij} \in F$ and assigns it a label $c \in \mathcal{C}$, i.e., $\psi(F^{s}_{ij}; \theta) = c$, where $\theta$ are the parameters. We design the function $\psi$ using a CNN model; in this case, $\theta$ represents the weights and biases of the model. The model is built adaptively. Its design process is shown in Figure 1, and the details are given in the rest of the section.
Mathematics 2022, 10, 1285

Adaptive CNN Model
The main constituent of a CNN model is a convolutional (CONV) layer. It extracts discriminative features from the input signal by applying the convolution operation with filters of fixed size. CONV layers are stacked in a CNN model to extract a hierarchy of features. The number of filters in each CONV layer and the number of CONV layers in a CNN model are hyper-parameters, and finding the best configuration of a model for a specific application is a hard optimization problem; it entails the search of a huge parameter space. In addition, the initialization of the learnable parameters of a CNN model has a significant effect on the performance of the model when it is trained with an iterative optimization algorithm such as Adam. Leveraging the discriminative content of fingerprints, we propose a simple method to find the best configuration of the model adaptively. Initially, we select the representative fingerprints from each type to guide the design process of a CNN model. The discriminative information in these fingerprints is used to determine the width (the number of filters) of each CONV layer and the depth (the number of CONV layers) of the model; it is also used for data-dependent initialization of the filters of CONV layers. An overview of the design process is shown in Figure 1. We employ clustering to select the representative fingerprints, the Fukunaga-Koontz Transform (FKT) [38], which exploits class-discriminative information, to determine the number of filters in a CONV layer, and the ratio of the traces of the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ to adjust the depth (i.e., the number of CONV layers) of the CNN model. Finally, to minimize the number of learnable parameters and avoid overfitting, global pooling layers are introduced. By decreasing the resolution of the feature maps, the pooling layer seeks to achieve shift-invariance, and the pooling layer's feature map is linked directly to SoftMax [53].
The design process is worked out in detail and discussed in the following subsections.

Selection of Representative Fingerprints
We extract discriminative information from fingerprints to specify the CONV layers and the depth of a CNN model adaptively. To do this, we cluster the training set to identify the most representative fingerprints of each class. To determine the representative fingerprints, discriminative features are extracted from the fingerprints using the LGDBP descriptor [36]. K-medoids [37] is used for clustering since it selects actual instances as cluster centers and is therefore suitable for finding a representative subset of the training set. The fingerprints corresponding to the cluster centers are chosen as the representative subset. The number of clusters for each class in the K-medoids algorithm is specified using silhouette analysis [54]. Using this procedure, we select the set $X = \{X_1, X_2, \ldots, X_C\}$, where $X_i = \{RF_j,\ j = 1, 2, 3, \ldots, n_i\}$ is the set of representative fingerprints of the ith class.
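The selection step for a single class can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors stand in for LGDBP descriptors, and the function names, the farthest-point initialisation, the alternating update scheme, and the candidate range for the number of clusters are all assumptions made for the example.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def init_medoids(D, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    return np.array(medoids)

def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix D."""
    medoids = init_medoids(D, k)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # The medoid is the member with the smallest total distance to its cluster.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

def mean_silhouette(D, labels):
    """Mean silhouette coefficient computed from a distance matrix."""
    clusters = np.unique(labels)
    if clusters.size < 2:
        return -1.0
    scores = np.zeros(len(D))
    for i in range(len(D)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters keep a silhouette of 0
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

def select_representatives(features, k_candidates=(2, 3, 4, 5)):
    """Indices of the medoids for the candidate k with the best mean silhouette."""
    D = pairwise_dist(features)
    best_score, best_medoids = -np.inf, None
    for k in k_candidates:
        medoids, labels = k_medoids(D, k)
        score = mean_silhouette(D, labels)
        if score > best_score:
            best_score, best_medoids = score, medoids
    return np.sort(best_medoids)
```

Calling `select_representatives` once per class on that class's descriptor matrix would give the index set playing the role of $X_i$; because medoids are actual samples, the returned indices point directly at fingerprints in the training set.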

Design of the Main DeepFKTNet Architecture
The architectures of the state-of-the-art CNN models are usually not drawn from the data and are fixed and highly complex. In contrast, we define a data-dependent architecture of DeepFKTNet. Its primary architecture is based on the answers to two questions: (i) how many CONV layers should be in the model and (ii) how many filters must be in each layer. These questions are addressed by an iterative algorithm that computes the number of filters in a CONV layer, adds the layer to the model, and terminates when a criterion is satisfied. We use the discriminative structural information embedded in fingerprints to determine the number of filters in a CONV layer and their initialization. The details are given in Algorithm 1. We discuss the algorithm and its motivation in the following paragraphs.
Initially, the set $X = \{X_1, X_2, \ldots, X_C\}$ is used to determine the number of filters of the first CONV layer and to initialize them. Inspired by the filter size of the first CONV layer in state-of-the-art CNN models like ResNet [50], DenseNet [55], and Inception [49], we fixed the filter size of the first layer to 7 × 7. We extract patches of size w × h from the representative fingerprints (steps 2-3 of Algorithm 1) and formulate the problem of determining the filters $f_i$, $i = 1, 2, \ldots, N$, as finding the optimal projection direction vectors $u_i$, $i = 1, 2, \ldots, d$, which are determined by solving an optimization problem defined on $S_b$ and $S_w$, the between-class and within-class scatter matrices (computed in step 4 of Algorithm 1). According to Fukunaga-Koontz discriminant analysis (FKT) [38], the optimal projection direction vectors $u_i$ are the eigenvectors of $\hat{S}_b$, i.e.,

$$\hat{S}_b u = \lambda u, \tag{2}$$

where $\hat{S}_b = P^T S_b P$, $P = Q D^{-1/2}$, and $Q$ and $D$ are obtained by the diagonalization of the sum $S_b + S_w$, i.e., $S_b + S_w = Q D Q^T$ (steps 5-6 of Algorithm 1). Equation (2) gives the optimal vectors, which simultaneously maximize $\operatorname{tr}(U^T S_b U)$ and minimize $\operatorname{tr}(U^T S_w U)$, where the columns of $U$ are the vectors $u_i$. Unlike Linear Discriminant Analysis (LDA) [56], this approach does not need the inversion of $S_w$, so it can tackle very high-dimensional data. Additionally, this approach seeks optimal vectors that are orthogonal. Since the dimension of the patch vectors $b_i$ related to the intermediate CONV layers is usually very high, and we need filters that are independent, this approach is suitable for our design process. The problem of selecting the number of filters in the convolutional layer is to select the eigenvectors $u_k$, $k = 1, 2, \ldots, L$, so that the ratio

$$\gamma_k = \frac{\operatorname{Trace}(SF_b)}{\operatorname{Trace}(SF_w)}$$

attains its maximum value.
Here the between-class scatter matrix $SF_b$ and the within-class scatter matrix $SF_w$ are computed for each $u_k$ by projecting all activations $a^{i}_{j}$ onto the space spanned by $u_k$ (steps 7-8 of Algorithm 1). This ensures that the selected filters extract discriminative features. After selecting $u_k$, $k = 1, 2, \ldots, L$, the CONV block with L filters $f_k$, $k = 1, 2, \ldots, L$, initialized with $u_k$ is introduced in DeepFKTNet. Then, a pooling layer is added if needed (steps 8-10 of Algorithm 1).
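The filter derivation described above can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions, not the authors' code: the scatter matrices use the standard size-weighted definitions, the eigenvectors of $\hat{S}_b$ are mapped back to the patch space through $P$, and the function names, the tolerance, and the number of retained filters `L` passed by the caller are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class and within-class scatter of row vectors X with labels y."""
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    S_b = np.zeros((dim, dim))
    S_w = np.zeros((dim, dim))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    return S_b, S_w

def fkt_directions(S_b, S_w, tol=1e-10):
    """Eigen-directions of S_b_hat = P^T S_b P with P = Q D^{-1/2}."""
    evals, Q = np.linalg.eigh(S_b + S_w)      # S_b + S_w = Q D Q^T
    keep = evals > tol * evals.max()          # drop numerically null directions
    P = Q[:, keep] / np.sqrt(evals[keep])     # whitening matrix P = Q D^{-1/2}
    lam, V = np.linalg.eigh(P.T @ S_b @ P)    # S_b_hat v = lambda v
    order = np.argsort(lam)[::-1]
    # Assumption: directions are expressed in the original patch space as P v.
    return lam[order], P @ V[:, order]

def select_filters(X, y, patch_shape, L):
    """Keep the L directions with the largest ratio gamma_k and reshape to filters."""
    S_b, S_w = scatter_matrices(X, y)
    _, U = fkt_directions(S_b, S_w)
    # For a single direction u, the projected scatter traces are u^T S u.
    num = np.einsum('ik,ij,jk->k', U, S_b, U)
    den = np.einsum('ik,ij,jk->k', U, S_w, U)
    gamma = num / np.maximum(den, 1e-12)
    top = np.argsort(gamma)[::-1][:L]
    filters = U[:, top].T.reshape(len(top), *patch_shape)
    return filters, gamma[top]
```

No inverse of $S_w$ appears anywhere in the sketch; only the sum $S_b + S_w$ is diagonalised. With the plain definitions used here, $S_b$ has rank at most C - 1, so only that many directions carry a non-zero $\gamma_k$; how the paper obtains larger filter banks is not recoverable from the text and is not modelled.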
Using the current architecture of DeepFKTNet, the set of activations $Z = \{Z_1, Z_2, \ldots, Z_C\}$ of $X = \{X_1, X_2, \ldots, X_C\}$ is computed. These activations are used to determine whether to add more layers to the net. This is decided by calculating the trace ratio $TR = \operatorname{Trace}(S_b) / \operatorname{Trace}(S_w)$, where $S_b$ and $S_w$ are the between-class and within-class scatter matrices of the activations Z. If TR is greater than the previous TR (PTR), the addition of the current block of layers has increased the discriminative potential of the network. This criterion ensures that the features generated by DeepFKTNet have large inter-class variation and small intra-class scatter. To add another CONV block, steps 3-10 are repeated with Z. To reduce the size of feature maps for computational effectiveness, pooling layers are added after the first and second CONV blocks.
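The depth criterion can be illustrated with a short sketch. It assumes the standard size-weighted scatter definitions, for which the traces reduce to sums of squared distances, so the scatter matrices never need to be formed explicitly; the function names are hypothetical, and the comparison follows the PTR ≤ TR test of Step 12 in Algorithm 1.

```python
import numpy as np

def trace_ratio(Z, y):
    """TR = Trace(S_b) / Trace(S_w) for flattened activations Z with labels y."""
    mean_all = Z.mean(axis=0)
    tr_b = 0.0
    tr_w = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        mean_c = Zc.mean(axis=0)
        # Trace(S_b): class-size-weighted squared distance of class means to the global mean.
        tr_b += len(Zc) * np.sum((mean_c - mean_all) ** 2)
        # Trace(S_w): squared distance of samples to their own class mean.
        tr_w += np.sum((Zc - mean_c) ** 2)
    return tr_b / tr_w

def should_add_block(prev_tr, curr_tr):
    """Keep growing the network while the trace ratio does not decrease."""
    return prev_tr <= curr_tr
```

For example, the one-dimensional activations 0, 2 (class 0) and 10, 12 (class 1) give Trace(S_b) = 100 and Trace(S_w) = 4, hence TR = 25; moving the class means further apart raises TR, which is the behaviour the stopping rule relies on.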
As the kernels and their number are determined from the fingerprint images, each layer can have a different number of filters.
It is to be noted that the eigenvectors $u_k$, which are used to specify the kernels of a CONV layer, have the maximum $\gamma_k$ and capture most of the variability in the input fingerprint images without redundancy, in the form of independent features. The depth of a CNN model (number of layers) and the number of kernels for each layer are important factors that determine the model complexity.
Step 7 of Algorithm 1 determines the best kernels that ensure the preservation of maximum energy of the input image, and step 8 initializes these kernels to be suitable for the fingerprint domain. The selected kernels extract the features from fingerprint images so that the variability of the structures in fingerprint images is maximally preserved. It is also important that the features be discriminative (i.e., have large inter-class variance and small intra-class scatter as we go deeper in the network). This is ensured using the trace ratio $TR = \operatorname{Trace}(S_b) / \operatorname{Trace}(S_w)$: the larger the value of the trace ratio, the larger the inter-class variance and the smaller the intra-class scatter [57].
Step 11 in Algorithm 1 allows adding CONV layers as long as TR is increasing and determines the data-dependent depth of DeepFKTNet, as shown in Figure 2.

Algorithm 1
Input: The set $X = \{X_1, X_2, \ldots, X_C\}$, where $X_i = \{RF_j,\ j = 1, 2, 3, \ldots, n_i\}$ is the set of representative fingerprints of the ith class.
Output: The main DeepFKTNet architecture.
Step 6: Compute the eigenvectors $u_k$, $k = 1, 2, \ldots, D$, of $\hat{S}_b$ such that $\hat{S}_b u = \lambda u$.
Step 7: For each eigenvector $u_k$ ... compute the between-class scatter matrix $SF_b$ and the within-class scatter matrix $SF_w$ from Y.
Step 10: If m = 1 or 2, add a max pool layer with pooling operation of size 2 × 2 and stride 2 to DeepFKTNet.
Step 11: Compute ... for each $RF_j \in X_i$.
Step 12: Using ... If PTR ≤ TR, set PTR = TR, w = 3, h = 3, d = L and go to Step 3; otherwise stop.

Addition of Global Pool and Softmax Layers
The activation of the last CONV block has dimension h × w × L; if it is flattened and fed to FC layers, the number of parameters is huge and leads to overfitting. To reduce the number of parameters and the spatial dimensions of the last CONV block activation, we feed it to global average pooling (GAP) and global max-pooling (GMP) layers [58]. GAP averages all h × w values of each feature map, whereas GMP takes into account only the contributions of the neurons of maximum response; the number of inputs to the FC layer is h × w × L without global pooling, and it is reduced to 1 × 1 × L when only GMP or GAP is introduced. We concatenate the outputs of the GMP and GAP layers to overcome the shortcomings of each and then feed the result to the FC layer, followed by the SoftMax layer.
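The effect of the two global pooling layers on the feature dimension can be checked with a small NumPy sketch. The function name and the channels-last layout (h, w, L) are assumptions made for the illustration.

```python
import numpy as np

def global_pool_concat(activation):
    """Concatenate global max and global average pooling of an (h, w, L) activation."""
    gmp = activation.max(axis=(0, 1))    # one maximum per feature map -> (L,)
    gap = activation.mean(axis=(0, 1))   # one average per feature map -> (L,)
    return np.concatenate([gmp, gap])    # descriptor of length 2L

# Example: h = 2, w = 3, L = 4 gives 24 values when flattened but only 8 after pooling.
activation = np.arange(24, dtype=float).reshape(2, 3, 4)
features = global_pool_concat(activation)
```

With C output classes, a fully connected layer on the flattened activation would need h × w × L × C weights, whereas the concatenated descriptor needs 2L × C, independent of the spatial size of the last feature maps.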

Fine-Tuning the Model
The DeepFKTNet model is evaluated using the challenging multi-sensor FingerPass dataset [59], and it is compared to the well-known deep models ResNet [50] and DenseNet [55], pre-trained on the ImageNet dataset and fine-tuned using the same dataset as DeepFKTNet. For further validation, we evaluated our method using the challenging FVC2004 dataset [60] and compared it to the state-of-the-art methods. For each dataset, we select the most representative fingerprint images from the training set using K-medoids and the LGDBP descriptor and then build its adaptive DeepFKTNet architecture using Algorithm 1.

Datasets and the Adaptive Architectures
To verify the performance of the DeepFKTNet model on benchmark datasets, we used the FingerPass and FVC2004 datasets. FingerPass is a multi-sensor dataset; it was collected using nine different optical and capacitive sensors and two interaction types, i.e., press and sweep. The FingerPass dataset is separated into nine subsets based on sensors; each subset contains 12 impressions of 8 fingers from 90 persons.
The FVC2004 dataset contains noisy images acquired by live scan devices. It has 4 sets: DB1 collected using the optical V300 sensor, DB2 collected using the optical U.are.U 4000 sensor, DB3 collected using a thermal sweeping sensor, and DB4, a synthetic fingerprint dataset. Each one contains 880 fingerprint images [60]. We categorized the FVC2004 fingerprints into four categories: arch, left loop, right loop, and whorl. We merged the 4 sets of FVC2004 into one set of four classes, which makes it a multi-sensor fingerprint dataset.
The hyperparameter optimization framework Optuna [61] is used to select the best hyperparameters for fine-tuning each DeepFKTNet model. Using Algorithm 1, the DeepFKTNet architecture obtained for the FVC2004 dataset consists of 5 CONV blocks, as shown in Figure 3a, whereas the architecture constructed for the FingerPass dataset has 11 blocks, as depicted in Figure 3b. The number of filters for each CONV block and the depth of each model for each fingerprint dataset are determined using Algorithm 1. Using Optuna, we tuned the hyperparameters over three optimizers (Adam, SGD, and RMSprop), learning rates between $1 \times 10^{-1}$ and $1 \times 10^{-5}$, patch sizes (5, 10, 15, 20, 30, 50), activation functions (ReLU, Leaky ReLU, and Sigmoid), and dropout between 0.25 and 0.50. After training for 10 epochs, the best hyper-parameters for each dataset are shown in Table 1.

Evaluation Procedure
For evaluation, we manually separated the FingerPass dataset into four classes (arch, left loop, right loop, and whorl). We divided the FingerPass dataset into three sets (80% training, 10% validation, and 10% testing) using two different scenarios. In scenario-1, the fingers from each sensor were divided into training, validation, and test sets. In scenario-2, fingers in the training, validation, and test sets are from different sensors.
For the FVC2004 dataset, we divided the dataset into training (80%), validation (10%), and testing (10%) sets, keeping the classes balanced. For performance evaluation, we used four commonly used metrics: accuracy (ACC), true positive rate (TPR), true negative rate (TNR), and Kappa [62][63][64][65]. The overall average of the metrics has been computed. The metrics [66,67] used to evaluate the proposed system are:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

$$\mathrm{TNR} = \frac{TN}{TN + FP}$$

$$\mathrm{Kappa} = \frac{P_0 - P_e}{1 - P_e}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives; $P_0$ and $P_e$ are calculated from the confusion matrix; the details are given in [68]. To compute TP, TN, FP, and FN, each class, in turn, is taken as positive, the other classes are taken as negative, and the TPR and TNR are calculated. Finally, the mean TPR and TNR are calculated by averaging TPR and TNR over all classes. In the results, the mean TPR and TNR are reported.
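The one-vs-rest computation of these metrics from a multiclass confusion matrix can be sketched as follows. The layout (rows are true classes, columns are predicted classes), the function name, and the use of the standard Cohen's kappa definitions for $P_0$ (observed agreement) and $P_e$ (chance agreement) are assumptions of this example; the confusion matrix is made-up illustration data, not a result from the paper.

```python
import numpy as np

def classification_metrics(cm):
    """Mean one-vs-rest ACC, TPR, TNR and Cohen's kappa from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true class c, predicted as something else
    fp = cm.sum(axis=0) - tp          # predicted c, true class is something else
    tn = total - tp - fn - fp
    p0 = tp.sum() / total             # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance agreement
    return {
        "ACC": float(np.mean((tp + tn) / total)),
        "TPR": float(np.mean(tp / (tp + fn))),
        "TNR": float(np.mean(tn / (tn + fp))),
        "Kappa": float((p0 - pe) / (1 - pe)),
    }

# Hypothetical 3-class confusion matrix used only to exercise the formulas.
example_cm = [[8, 1, 1],
              [2, 6, 2],
              [0, 1, 9]]
metrics = classification_metrics(example_cm)
```

Each class is treated as the positive class in turn, exactly as in the description above, and the per-class values are then averaged.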

Experimental Results
This section presents the experimental results of the DeepFKTNet models designed for the two datasets.
We designed the DeepFKTNet model for each dataset and fine-tuned it using the training sets. We validated its performance on the FingerPass and FVC2004 datasets and compared it with the widely used CNN models ResNet [50] and DenseNet [55], which were pre-trained on the ImageNet dataset and fine-tuned on the same training set that was used for the DeepFKTNet model. In the rest of the paper, we name the DeepFKTNet models designed for the FingerPass and the FVC2004 datasets DeepFKTNet-11 and DeepFKTNet-5, respectively.
The results of the three models DeepFKTNet-11, ResNet152, and DenseNet121 for scenario-1 are shown in Figure 4a and Table 2a. The DeepFKTNet-11 model generated adaptively on the FingerPass dataset outperforms the state-of-the-art ResNet152 and DenseNet121 models in terms of all metrics. Though DenseNet121 is not better than DeepFKTNet-11, it outperforms ResNet152 in terms of all metrics. Figure 4b and Table 2b show the results for scenario-2 on the FingerPass dataset. In this scenario, the results obtained with DeepFKTNet-11 are almost the same as those obtained in scenario-1, and DeepFKTNet-11 again outperforms ResNet152 and DenseNet121. Figure 5 illustrates the confusion matrices for both scenarios. These give insights into the system performance for different classes.
The DeepFKTNet-5 model was adaptively designed for the challenging FVC2004 dataset; it was evaluated using the above evaluation procedure. We fine-tuned the developed DeepFKTNet-5 model and the pre-trained models ResNet152 and DenseNet121 using the same dataset. The results are shown in Figure 6; the DeepFKTNet-5 model outperforms the state-of-the-art ResNet152 and DenseNet121 models in terms of all metrics. Figure 7 illustrates the confusion matrices for the FVC2004 dataset. These give insights into the system performance for different classes.

Discussions
We addressed the multi-sensor fingerprint classification problem and proposed a novel method for automatically generating a custom-designed DeepFKTNet model from the target fingerprint dataset. The number of layers and the number of filters for each layer are not specified randomly; they are determined from the best representative fingerprints selected from the fingerprint datasets using the K-medoids clustering algorithm and the LGDBP descriptor.
The generated DeepFKTNet models are shallower than the state-of-the-art models, are robust, involve a small number of learnable parameters, and are suitable for fingerprint classification.
The results of the DeepFKTNet models on the FingerPass and FVC2004 datasets (Figures 4 and 6) indicate that they outperform the well-known deep models ResNet152 and DenseNet121, which were pre-trained on the ImageNet dataset and fine-tuned using the same fingerprint datasets. The architecture of a DeepFKTNet model is drawn directly from the dataset; the internal structures of the data determine its design. For this reason, the DeepFKTNet model has a compact size and yields better classification results. Further, it involves a small number of learnable parameters (see Table 4), which is comparable with the number of training examples. If the number of learnable parameters is huge compared to the number of training examples, the overfitting problem cannot be avoided. The training and testing accuracies shown in Table 3 indicate that the models do not suffer from overfitting. In addition, DeepFKTNet models are trained using the available training data, and pre-training is not needed, unlike ResNet152 and DenseNet121. The space complexity of a CNN model is measured in terms of the number of learnable parameters, whereas the number of FLOPs determines its time complexity. Table 4 gives the statistics of the space and time complexities of the models. Overall, the DeepFKTNet model achieved competitive performance with fewer layers and parameters. The DeepFKTNet models designed for the two datasets have a small number of parameters, in the thousands against millions in the ResNet152 and DenseNet121 models. DeepFKTNet-5 and DeepFKTNet-11 have fewer FLOPs than ResNet152 and DenseNet121 and better performance. DeepFKTNet-11 is relatively more complex than DeepFKTNet-5; the reason is that the FingerPass dataset involves a larger number of sensors than the FVC2004 dataset and contains a greater variety of patterns, so a richer structure is needed to encode the discriminative patterns.
Further, to investigate which features the DeepFKTNet models focus on for decision making, we employed GradCam [69]. Figure 8 shows some heat maps generated with GradCam for DeepFKTNet-11. The fingerprint images from the arch class and their GradCam visualizations are shown in Figure 8a.
For a fair comparison, DeepFKTNet-5 has been compared with the state-of-the-art fingerprint classification methods that were validated on the public benchmark FVC2004 dataset; the comparison results are given in Table 5.
The DeepFKTNet-5 model outperforms the state-of-the-art methods (handcrafted and CNN-based) on the same dataset in terms of accuracy. The method of Jeon et al. [70], despite being a complex ensemble of CNN models, achieved an accuracy of 97.2%, which is lower than that of DeepFKTNet-5. Zia et al. [33] employed B-DCNNs with five convolution layers and two FC layers (with 1024 and 512 neurons) for fingerprint classification and validated them on the FVC2004 dataset; their accuracy is lower than that of DeepFKTNet-5 (95.3% vs. 98.89%), and their complexity is higher, with more FLOPs (0.65 G vs. 0.5 G) and more learnable parameters (38.66 M vs. 58.456 k). Nguyen et al. [34] employed a two-stage CNN model, first for enhancement and then for training and prediction. They used the LBCNN [71] method in the first stage, which has 0.352 M learnable parameters, and then employed a three-ternary model for training and prediction. They achieved an accuracy of 96.1% on FVC2004 (three classes), which is lower than that of DeepFKTNet-5. Nahar et al. [35] used a modified LeNet-5 model for fingerprint classification; they achieved 99.1% accuracy, but only on a subset (DB1) of FVC2004, whereas the DeepFKTNet-5 model was evaluated on the combined multi-sensor dataset formed from the four FVC2004 datasets (DB1, DB2, DB3, and DB4). The LeNet-5 model also has more learnable parameters and FLOPs (19.25 M and 1.42 G vs. 58.456 k and 0.5 G for DeepFKTNet-5). The reason for the better performance and lower complexity of DeepFKTNet-5 is that it is custom-designed, keeping in view the internal discriminative structures of fingerprints.
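The complexity figures quoted above can be put on a common scale by expressing each competing model relative to DeepFKTNet-5. The short sketch below does only that arithmetic, using the parameter and FLOP counts stated in the text. The dictionary layout and the helper name `relative_cost` are introduced here for illustration. The methods of Jeon et al. [70] and Nguyen et al. [34] are left out because the text gives no complete parameter and FLOP figures for them.

```python
# (learnable parameters, FLOPs) as quoted in the comparison text.
reported = {
    "DeepFKTNet-5": (58.456e3, 0.5e9),
    "Zia et al. [33]": (38.66e6, 0.65e9),
    "Nahar et al. [35]": (19.25e6, 1.42e9),
}


def relative_cost(baseline, other):
    """Return (parameter ratio, FLOP ratio) of `other` with respect to `baseline`."""
    base_params, base_flops = baseline
    other_params, other_flops = other
    return other_params / base_params, other_flops / base_flops


baseline = reported["DeepFKTNet-5"]
ratios = {
    name: relative_cost(baseline, cost)
    for name, cost in reported.items()
    if name != "DeepFKTNet-5"
}

for name, (param_ratio, flop_ratio) in ratios.items():
    print(f"{name}: {param_ratio:.0f}x parameters, {flop_ratio:.2f}x FLOPs")
```

With these figures, the model of Zia et al. has roughly 661 times as many parameters and 1.3 times the FLOPs of DeepFKTNet-5, and the model of Nahar et al. roughly 329 times the parameters and 2.84 times the FLOPs. The gap in parameters is far larger than the gap in FLOPs.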

Conclusions
We introduced a technique for automatically creating a custom-designed CNN model for multi-sensor fingerprint classification. Since CNN models typically contain a large number of parameters and their architectures are usually designed by trial and error, we used the FKT approach to build a low-cost, high-speed CNN model tailored to the target fingerprint dataset. The resulting DeepFKTNet model is data-dependent, with a distinctive architecture for each fingerprint dataset. DeepFKTNet-11 for the FingerPass dataset and DeepFKTNet-5 for the FVC2004 dataset outperform the pre-trained deep ResNet152 and DenseNet121 models on identical datasets and evaluation protocols. The complexity and number of parameters of the DeepFKTNet models are substantially lower than those of ResNet152 and DenseNet121. Compared to the state-of-the-art techniques on the FVC2004 dataset, the DeepFKTNet-5 model is simpler in terms of complexity and parameter count and achieves comparable performance. In future work, we will enhance DeepFKTNet to address the problem of cross-sensor fingerprint verification.