Deep Region of Interest and Feature Extraction Models for Palmprint Veriﬁcation Using Convolutional Neural Networks Transfer Learning

Abstract: Palmprint verification is one of the most significant and popular approaches for personal authentication due to its high accuracy and efficiency. We propose a novel approach that exploits convolutional neural networks (CNNs) along with transfer learning for deep region of interest (ROI) and feature extraction in palmprint verification. The extracted palmprint ROIs are fed to the final verification system, which is composed of two modules: (i) a pre-trained CNN architecture as a feature extractor and (ii) a machine learning classifier. In order to evaluate our proposed model, we computed the intersection over union (IoU) metric for ROI extraction along with accuracy, receiver operating characteristic (ROC) curves, and equal error rate (EER) for the verification task. The experiments demonstrated that the ROI extraction module reliably located the appropriate palmprint ROIs and that the verification results were highly precise; this was confirmed across the different databases and classification methods employed in our model. In comparison with existing approaches, our model was competitive with state-of-the-art methods that rely on hand-crafted descriptors. We achieved an IoU score of 93% and an EER of 0.0125 using a support vector machine (SVM) classifier for the contact-based Hong Kong Polytechnic University Palmprint (HKPU) database. All code is open-source and can be accessed online.


Introduction
Biometric-based authentication has been discussed in a wide range of state-of-the-art research in the context of security applications. The increasing amount of industrial and governmental funding for this topic makes it a rapidly growing field. There are several biometric characteristics (e.g., DNA, face, and palmprint) that can be exploited in authentication systems [1]. Each one has its own pros and cons, and therefore there is no single optimal characteristic that satisfies all application requirements [2]. Consequently, based on the constraints and conditions of the operational mode, one or more biometric characteristics can be applied. Several advantages make palmprint authentication particularly appropriate for real-world applications [3][4][5][6]: (i) low-cost image acquisition devices; (ii) applicability to both low- and high-resolution images; (iii) discriminative features, including ridges and creases; (iv) a wide region of interest; (v) unique and reliable properties; (vi) robustness to aging; and (vii) high user acceptance.
Among the existing approaches for palmprint verification, convolutional neural networks (CNNs) outperform the state-of-the-art techniques. CNNs are a class of deep learning models whose architectures are inspired by the structure and function of the brain, in the tradition of artificial neural networks. Well-known CNN architectures such as AlexNet [7], VGGNet [8], and ResNet [9] have shown excellent performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions [10]. Although CNN-based approaches effectively represent biometric and perceptual features from the input data [11,12], there are some challenges in applying this concept to palmprint verification. On one hand, the number of samples in current palmprint databases is a challenging issue, since common CNN techniques need a large amount of input data for the training phase. On the other hand, CNN performance strongly depends on the chosen deep architecture. Due to these issues, using a CNN architecture may result in overfitting in the case of small databases. It should be mentioned that data augmentation techniques can hardly reduce this overfitting because of the small intra-class variability among palmprint images.
Because a pre-trained CNN can be reused to solve new problems faster, a paradigm called transfer learning has emerged [13]. Transfer learning is useful when one wants to train a CNN on a dataset for which there is insufficient training data to train a full network from scratch. It refers to the process of taking a pre-trained CNN, replacing the fully connected layers (and potentially the last convolutional layer), and training those layers on the pertinent dataset. By freezing the weights of the convolutional layers, the CNN can still extract discriminative image features such as edges, and the fully connected layers can take this information and use it to classify the data in a way that is pertinent to the problem. Using this concept, we can significantly mitigate the challenges of CNN approaches, including the high computational complexity of the training phase and the overfitting caused by small palmprint databases. The issue that still remains a challenge in palmprint verification using CNN transfer learning is to track and detect the region of interest (ROI), which is commonly a square region in the center of the palmprint.
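The freeze-then-fine-tune idea can be sketched with a minimal numpy example. This is a toy stand-in, not the actual network used in this paper: a fixed "pre-trained" layer produces features, and gradient descent updates only the new task-specific head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy transfer-learning sketch: a frozen "pre-trained" layer plus a
# trainable linear head. Only the head receives gradient updates.
W_frozen = rng.normal(size=(8, 16))   # stand-in for pre-trained weights
w_head = np.zeros(16)                 # new layer trained on the target task

X = rng.normal(size=(64, 8))          # toy input data
features = X @ W_frozen               # output of the frozen extractor
y = features @ rng.normal(size=16)    # toy regression targets

W_before = W_frozen.copy()
mse_start = np.mean((features @ w_head - y) ** 2)
for _ in range(500):
    residual = features @ w_head - y
    grad = features.T @ residual / len(X)  # gradient w.r.t. the head only
    w_head -= 0.01 * grad                  # the frozen layer is never touched
mse_end = np.mean((features @ w_head - y) ** 2)
```

The training loop lowers the loss while leaving the frozen weights bit-for-bit unchanged, which is exactly the property transfer learning relies on when data is scarce.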

Contributions
Motivated by the aforementioned considerations, we address the problem of palmprint verification using a transfer learning approach. Our approach uses low-resolution images, and the main contributions are summarized as follows:

•
To the best of our knowledge, this is the first study that extracts palmprint ROIs by convolutional neural networks. The lack of sample images and small intra-class variability are addressed using transfer learning.

•
We apply a pre-trained CNN to extract discriminative features, followed by a machine learning classifier that measures similarity for one-to-one matching.

•
We achieve an intersection over union (IoU) score of 93% for palmprint ROI extraction and an equal error rate (EER) of 0.0125 for the verification task using the support vector machine (SVM) classifier on the contact-based Hong Kong Polytechnic University Palmprint (HKPU) database. This can be attributed to the superiority of discriminative deeply learned features over hand-crafted features.

Paper Organization
The remainder of this paper is organized as follows. Section 2 presents a comprehensive literature review. Section 3 presents the problem definition and related assumptions, and overviews the proposed architecture and its main components. The preliminaries are discussed in Section 3.1, and an outline of the proposed model for deep ROI and feature extraction is provided in Section 3.2. Sections 3.3-3.5 detail the proposed ROI extraction, feature representation, and matching modules, respectively. The obtained results are detailed in Section 4. Finally, Section 5 concludes the paper with some final remarks and outlines open research problems.

Related Work
In the following, we briefly discuss the main literature on palmprint verification. We divide the related works into two main categories: (i) ROI extraction and (ii) feature extraction and matching.

ROI Extraction
The ROI extraction phase plays an important role in palmprint verification, since the accuracy of the whole process is highly dependent on it; several approaches have been discussed in previous work [14][15][16][17][18][19][20][21][22][23][24][25]. In [18], ROI extraction consists of three steps: (i) skin-color thresholding, (ii) hand valley detection, and (iii) finding the palm region. The authors used Chang and Robles's skin-color model in the thresholding phase. In the second and third phases, they proposed the competitive hand valley detection (CHVD) algorithm to locate valley key-points, whose output was then exploited to locate the ROI. ROI extraction based on skin-color segmentation is also used in [19,20]. Michael et al. [19] extended the CHVD algorithm to simultaneously track palmprint and knuckle-print ROIs. The most challenging issue in these approaches is poor segmentation for images with a skin-colored background. To address this problem, statistical segmentation models were employed: Reference [21] exploited the active shape model (ASM), while [22,23] based their methods on the active appearance model (AAM). These approaches risk losing important features due to the small extracted ROIs.
Zhang et al. [14] presented an ROI extraction technique comprising several steps. First, they applied a low-pass filter to the original image. Thereafter, the boundaries of the gaps between the fingers were obtained using a boundary tracking algorithm. Finally, based on the resulting coordinate system, a fixed-size sub-image was extracted. Similarly, the methods of Connie [15], Han [16], and Badrinath [17] used the finger gaps to extract the ROI. Although these methods are well known and widely used, they depend on the gaps between the fingers as reference points to determine the coordinate system, which means that all fingers must be spread and the hand should be facing toward the camera. To tackle this problem, Ito et al. [24] proposed a five-step method that can be summarized as follows: (i) binarization of the input image, (ii) combination of the binarized image and edges, (iii) key-point candidate detection using the radial-distance function, (iv) optimal key-point selection, and (v) palm region extraction. This approach also risks losing important features due to the small extracted ROIs.

Feature Extraction and Matching
So far, many approaches have been proposed for the feature extraction and matching tasks; these can be categorized as hand-crafted and deep learning descriptors. Comprehensive studies of feature extraction approaches are provided in [26][27][28][29], which divide them into three main categories: holistic-based, local-based, and hybrid methods.
Among the available local feature-based approaches, some have been shown to perform effectively on low-resolution palmprint images. To extract principal lines (PriLine), Malik et al. [30] applied a Radon filter as an edge detector and provided two levels of authentication in order to increase accuracy. Moreover, there are coding-based methods, including competitive code (CompCode) [31], double orientation code (DOC) [32], and extended binary orientation co-occurrence vector (E-BOCV) [33], in which a bank of Gabor filters is applied to represent the orientation features of palmprint images. Local line directional patterns (LLDP) [34], a local texture descriptor, also extract local line directional features by convolving line filters with a palmprint image; variants include the modified finite Radon transform (LLDP-MFRAT) and LLDP-Gabor descriptors.
Given these limitations, instead of designing algorithms for hand-crafted descriptors, which can be very time-consuming, features can be obtained automatically by deep learning descriptors [50]. This automatic feature mapping results in high-level features that can hardly be obtained via hand-crafted approaches [51]. Another advantage of deep learning descriptors over hand-crafted ones is that they can be trained with a large number of inputs to become robust against illumination, distortion, translation, and rotation variances. Automatic feature engineering exploits the hierarchical architecture to effectively learn complex models that are highly appropriate for unconstrained conditions (e.g., image background or palm position in the input image) [51].
Motivated by the aforementioned considerations, the CNN, as the most notable deep learning descriptor, has attracted researchers' attention in recent years [5,[52][53][54][55][56]. The work of Kumar and Wang [52] explored the possibility of matching left and right palmprint images; several algorithms were investigated, and the best results were obtained by a CNN. Although they focused on an interesting topic, the error rate of their method is too high for forensic applications. The authors of [5] used a principal component analysis network (PCANet) model [57] for feature extraction and an SVM classifier for palmprint identification. In this way, each spectral band was represented by features extracted by a deep learning technique. Since they used the ROI extraction method described in [14], their approach lacks robustness against illumination, distortion, translation, and rotation variances in input images. For the same reason, the approaches in References [53,54] are not robust against input image deformation. Minaee et al. [55] exploited a scattering network (a convolutional network whose architecture and filters are predefined wavelet transforms) for feature extraction, but no learning was involved in the ROI extraction. In [56], the input images were resized to 28 × 28 pixels without applying any ROI extraction technique, which led to missing important features.

Proposed Model
Here the proposed model is discussed in detail. Basic preliminary facts about the CNN architecture used in this paper are presented first, followed by an outline of the proposed model.

Background
Chatfield et al. [58] described a fast CNN architecture inspired by the AlexNet [7] model. They showed that reducing the number of kernels in the convolutional layers of their architecture does not impact the output performance in comparison to AlexNet. Figure 1 depicts this architecture.

Outline
The outline of the proposed model is illustrated in Figure 2; it is composed of three modules: (i) the ROI extraction module (REM), (ii) the feature extraction module (FEM), and (iii) the matching module (MM). In the first module, palmprint ROIs are extracted using the bounding box concept. A pre-processing procedure is applied to the input images, and the locations of the boxes on the palmprint images are then specified using a CNN transfer learning approach. The output of this module is the set of palmprint ROIs, which are the inputs of the feature extraction module. The FEM employs a pre-trained CNN architecture to represent features. Finally, the resulting feature vector feeds the MM, which performs the verification task using a machine learning classifier.

ROI Extraction Module
We propose an architecture for ROI extraction based on the one discussed in Section 3.1, replacing the last fully connected layer, FC8, with a four-neuron layer (depicted in Figure 3). Since Chatfield's architecture expects a fixed input size, a pre-processing procedure is required for the input images. In brief, the REM consists of two steps: (i) pre-processing and (ii) palmprint localization. In the following, each of these parts is discussed in detail.

Pre-Processing
As the network needs inputs of 224 × 224 × 3 pixels, the pre-processing step maps gray-scale palmprint images to images suitable to be fed into our CNN architecture. To this end, the inputs are rescaled to square images by removing the extra pixels from the left and right sides. These gray-scale images are then converted to the RGB color space by the weighted method and resized to 224 × 224 × 3 using nearest-neighbor interpolation (http://en.wikipedia.org/wiki/Nearest-neighbor_interpolation). Figure 3 illustrates an example of the final pre-processed image.
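The pre-processing pipeline can be sketched in numpy as below. Note that the exact gray-to-RGB weighting used in the paper is not specified, so plain channel replication is used here as a stand-in; the crop and nearest-neighbour resize follow the steps described above.

```python
import numpy as np

def preprocess(gray, size=224):
    """Center-crop a gray-scale image to a square, resize it with
    nearest-neighbour interpolation, and replicate it to 3 channels.
    (Sketch only; channel replication stands in for the paper's
    unspecified gray-to-RGB conversion.)"""
    h, w = gray.shape
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    square = gray[top:top + s, left:left + s]        # drop extra side pixels
    idx = (np.arange(size) * s // size).astype(int)  # nearest-neighbour indices
    resized = square[np.ix_(idx, idx)]
    return np.stack([resized] * 3, axis=-1)          # shape (size, size, 3)
```

Feeding a 384 × 284 HKPU-sized array through this function yields a 224 × 224 × 3 image with identical channels.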

Palmprint Localization
Palmprint localization is the process of finding the location of the ROI in a palmprint image by drawing a bounding box around the position of the palmprint. Since CNN-based approaches have shown encouraging performance in object localization [59], we propose a CNN architecture for palmprint localization based on Chatfield's fast CNN (Section 3.1).
As shown in Figure 3, we keep the first seven layers of Chatfield's architecture and replace the last layer with a four-neuron layer that outputs a bounding box. The four numbers parameterize the bounding box, determining the extracted palmprint ROI. In this architecture, the localization is specified by four parameters (b_x, b_y), b_w, and b_h, representing the center point, width, and height of the box containing the palmprint ROI, respectively.
Given that there are not enough palmprint images in the available databases, we employ the transfer learning concept. To this end, the network weights of the first seven layers are pre-trained on ImageNet using the AlexNet model, and the final layer is trained on our database. In other words, we freeze the parameters in all network layers except the last one, and then we train and fine-tune the parameters associated with our bounding boxes (i.e., b_x, b_y, b_w, and b_h). The palmprint ROIs can be extracted by applying these parameters to our palmprint images.
Considering ω as the set of learning parameters and b_x(ω), b_y(ω), b_w(ω), b_h(ω) as the network output values, we define the cost function C(ω) as

C(ω) = (1/m) Σ_{i=1}^{m} [ (b_x(ω) − b_x^i)^2 + (b_y(ω) − b_y^i)^2 + (b_w(ω) − b_w^i)^2 + (b_h(ω) − b_h^i)^2 ],

where m is the number of samples and b_x^i, b_y^i, b_w^i, b_h^i are the target values for the ith sample. The cost function specifies the difference between the target values and the predicted ones. The objective is to determine a set of ω minimizing C(ω).
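A direct reading of this cost as code is given below (a sketch; `pred` and `target` are assumed to be m × 4 arrays holding the predicted and target box parameters, respectively):

```python
import numpy as np

def box_cost(pred, target):
    """C(omega): mean over the m samples of the summed squared differences
    in the four box parameters (b_x, b_y, b_w, b_h)."""
    pred = np.asarray(pred, dtype=float)      # shape (m, 4)
    target = np.asarray(target, dtype=float)  # shape (m, 4)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))
```

For a single sample whose prediction is off by one unit in b_x only, the cost is 1.0, and a perfect prediction yields 0.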

Feature Extraction Module
The main goal of the FEM, as shown in Figure 4, is to represent distinctive features of the palmprint ROIs. Because the output of the REM is a 143 × 143 × 3 palmprint ROI but the network input should be 224 × 224 × 3 pixels, a pre-processing step is needed: the images are resized based on the network requirements. In the next step, a pre-trained network is used for the feature extraction task. As mentioned, when using transfer learning in the feature extractor, the learning process starts from the weights learned on the ImageNet database, and we fine-tune these pre-learned parameters using our limited palmprint ROI images. We use a CNN architecture similar to Figure 1, remove the last layer (FC8), and then initialize the network with the learned weights to implement parameter transfer. The advantage of this pre-trained architecture is that it retains the representational capacity of the fully trained network. We then fine-tune this pre-trained architecture by passing palmprint ROIs through the pre-trained layers. It should be noted that the fine-tuning process is similar to the training process, except that not all weight parameters should be fine-tuned because of the lack of training data. An effective way to fine-tune the pre-trained weights is to optimize only the last fully connected layer and keep the parameters of the other layers frozen. To adjust these weights automatically during training, a suitable optimizer is needed. The Adam optimizer [60] is a first-order gradient-based algorithm for optimizing stochastic objective functions; it updates exponential moving averages controlled by the learning rate and the exponential decay rates β_1 and β_2.
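The Adam update rule can be sketched in a few lines of numpy. This is a self-contained illustration of the moving averages and bias correction described above, not the TensorFlow implementation actually used in the paper:

```python
import numpy as np

def adam_step(theta, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update [60]: exponential moving averages of the gradient
    (m) and its square (v), with bias correction."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Demo: minimise f(x) = x^2 starting from x = 3
x = np.array(3.0)
state = (0.0, 0.0, 0)
for _ in range(5000):
    x, state = adam_step(x, 2 * x, state)
```

Because the first-moment estimate is divided by the square root of the second, each step has roughly the magnitude of the learning rate, which makes Adam insensitive to gradient scaling.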
In order to evaluate this module and analyze the fine-tuning process, we calculate the cost of fine-tuning, which minimizes the difference between the target output and the predicted output. To prevent overfitting, we use a logistic regression loss L, so the cost function can be expressed as

C(Ω) = (1/m) Σ_{i=1}^{m} L(Ω_i, Ω̂_i),

where Ω is the set of parameters that needs to be optimized during the fine-tuning process, Ω̂ is the set of predicted parameters, and m is the number of samples. It should be mentioned that the output is a fixed-length 4096-D vector for the different databases.

Matching Module
Two approaches are possible for the matching process: combining the MM and FEM as a single CNN block, or using a separate classifier such as SVM, K-nearest neighbor (KNN), or random forest (RF). The main difference between the MM and the REM is the number of available samples per class. In the REM, all input images belong to one class (palmprint), and therefore all database images can be used for learning that class. On the other hand, there are several classes, one per individual, in the matching phase. To explain further, consider n as the number of images in the pertinent database, p as the number of individuals, and x as the number of available samples per class. There are n samples for the single class in the REM (x_REM = n), while there are n/p per class for the MM (x_MM = n/p). As an example, in the typical contact-based Hong Kong Polytechnic University Palmprint (HKPU) database (http://www4.comp.polyu.edu.hk/~biometrics/), x_REM = 7752 and x_MM ≈ 20. Due to the lack of training samples per individual, combining the MM and FEM as a CNN block may lead to overfitting, even when exploiting transfer learning.
Based on the aforementioned considerations, we chose a separate classifier for the MM. Three different classifiers were evaluated: support vector machines (SVMs), K-nearest neighbor (KNN), and random forests (RFs). SVMs are supervised learning models with associated learning algorithms that analyze data for classification. Given a set of training examples, each marked as belonging to one class, the SVM training algorithm builds a model that assigns new examples to one of the classes. SVMs represent the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a class based on which side of the gap they fall. In more detail, an SVM constructs a set of hyperplanes in a high-dimensional space to classify the input features. The main advantages of SVMs are [61]: (i) effectiveness in high-dimensional spaces; (ii) effectiveness in cases where the number of samples is lower than the number of dimensions; (iii) efficient memory usage (because the decision function uses only a subset of the training points, called support vectors); and (iv) flexibility (different kernel functions can be used for the decision function).
KNN is a non-parametric technique that classifies new cases based on the majority vote of their k nearest neighbors, where the votes are weighted according to the distances between testing and training samples. RF is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control overfitting.

Experimental Results
In this section, we first describe the pertinent databases, the exploited framework, the optimizer, and the parameters set in our experiment. Thereafter, since the output of the REM is a region in the input image, it is evaluated by computing the IoU metric. The FEM is analyzed using the cost function explained in Section 3.4 during the fine-tuning process. The final output is evaluated via accuracy, receiver operating characteristic (ROC) curves, and EER.

Settings
In this paper, we used low-resolution palmprint images from both contact-based and contact-less databases. The typical HKPU version 2 was employed for contact-based palmprint images, and the 2D contact-free HKPU [62] and the Indian Institute of Technology Delhi (IITD) Touchless Palmprint Database [63] were used as the contact-less ones.
The typical contact-based HKPU database contains 7752 gray-scale images corresponding to 193 subjects or 386 palms. The subjects were 131 males and 62 females. The minimum number of samples per palm was 10, collected at two different time intervals. To have a consistent number of samples assigned to each class, we utilized 10 samples per palm and ignored the extra ones. Although the database owner states that there are approximately 18 samples for each person, it should be noted that the 150th individual has only 11 samples.
In the 2D contact-free HKPU database, each individual contributed five hand images in the first session, followed by another five in the second session. The database currently contains 3540 hand images.
The IITD palmprint image database includes hand images from 230 users; all the images are in bitmap (*.bmp) format. The subjects were in the age group of 12-57 years, and five to six images exist for each person's palm.
The simulation was implemented in the Python programming language, and the TensorFlow [64] framework was used to implement the CNN methods. Based on the documentation, it is recommended to leave the network parameters at their default values, except sometimes for the learning rates. In our experiment, the Adam optimizer was used with the default settings recommended for tested machine learning problems, i.e., β_1 = 0.9 and β_2 = 0.999, in the CNN architectures of the REM and FEM. In addition, according to the documentation, the recommended learning rates are 1, 0.01, 0.001, and 0.0001. We used 70% of the palmprint images as the training set and the remainder as the test set for palmprint ROI extraction.

Evaluation of ROI Extraction Module
As mentioned in Section 3.3.1, a pre-processing step is needed before extracting the bounding boxes. Considering the contact-based HKPU database, the pre-processing step maps gray-scale palmprint images of 384 × 284 pixels to images suitable to be fed into our CNN architecture. To this end, the inputs were rescaled to 284 × 284 square images and then represented in RGB. Finally, the images were resized to 224 × 224 × 3 as the input of the network for the ROI extraction task.
In the REM, we set the learning rate in two modes: a constant value of L_r = 10^−4, and an adaptive one that starts at a predefined value (10^−3) and decays by a factor of 0.96 over the training steps. Experimental results showed the same performance for both settings; therefore, we report the results for the constant learning rate.

Training Phase
In order to evaluate this module, we used the cost function described in Section 3.3. In our experiment, the training procedure was considered complete after 100 epochs. Figure 5 illustrates the predicted boxes during different epochs of the training procedure. As can be seen, the bounding box progressively fit the intended ROI over the epochs. The learning parameters of each epoch were used as the inputs of the cost function depicted in Figure 6. Based on this plot, the cost of palmprint localization dramatically decreased during the early steps, while it started to become steady after approximately 100 epochs.

Testing Phase
In this phase, we applied the remaining images to our network as the test set. The results of the testing phase, illustrated in Figure 7, show that the REM performed very well on the test set. To investigate the performance of the proposed REM, we used the IoU metric [65], which measures the overlap between the predicted ROI and the ground-truth (target) ROI. Considering the ground-truth ROIs as R_g and the corresponding ROIs predicted by our model as R_p, the IoU can be calculated by Formula (4):

IoU = area(R_g ∩ R_p) / area(R_g ∪ R_p). (4)

Figure 8 presents a visual sample of the ground-truth ROI versus the one predicted by our model. It should be noted that the ground-truth ROIs are those we produced with bounding boxes in our own work. The final IoU score is the mean taken over all classes, and was 93% in our experiment. In order to validate the effectiveness of our proposed ROI extraction model, we compared it to the Han model [16], which performs well even in unconstrained scenes.
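For axis-aligned boxes in the (b_x, b_y, b_w, b_h) center/size parameterization used by the REM, the IoU can be computed as follows (an illustrative sketch, not the evaluation script used in the paper):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (b_x, b_y, b_w, b_h) center/size tuples."""
    def corners(b):
        x, y, w, h = b
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2
    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union
```

Two identical boxes yield an IoU of 1.0, and partially overlapping boxes yield the ratio of the intersection area to the union area.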
Figure 9 shows the learned features for the palmprint localization task in our model and in the Han method. The Han method selects Haar features in order to represent the shape features of the hand, while our model automatically extracts discriminative features that are easier to learn than the complicated Haar features. In our ROI extraction module, the first convolutional layer extracts basic features such as edges and lines, and the subsequent convolutional layers automatically represent high-level features. As depicted in Figure 10, our proposed model could effectively extract the palmprint ROI using deep features in comparison to the Han method, especially in segmenting the background from the palmprint and keeping the important regions of the palm.

Evaluation of Feature Extraction and Matching Modules
To extract palmprint ROI features, we passed the palmprint ROI images obtained by the REM through the network and obtained a final n × 4096 feature matrix, where n is the overall number of images in each database. In other words, the values of each row correspond to the features of one individual's image. For example, considering n = 3480 for the contact-based HKPU database, the first 18 of the 3480 images are related to the first person, the second batch of 18 images to the second person, and so on. A learning rate of L_r = 10^−5 was fixed for the FEM network. The cost function of the fine-tuning process in the FEM is depicted for the three databases in Figure 11. As shown, the network parameters were well optimized during the fine-tuning process. Moreover, the cost function had the lowest value for the contact-based HKPU database because of the more discriminative features in its images. To give a better perspective, we visualize the feature vector of the FC7 layer in the FEM as an image in Figure 12; in other words, we reshaped the 4096-D vector into a 64 × 64 image. These extracted deep features were used as the input of the SVM classifier in the verification task. To evaluate the matching module, we compared the performance of different methods via verification accuracy, ROC curves, and EER using four main concepts: true positives (TPs, data items correctly predicted as positive), false positives (FPs, data items incorrectly predicted as positive), true negatives (TNs, data items correctly predicted as negative), and false negatives (FNs, data items incorrectly predicted as negative). Furthermore, the verification performance of each classifier was calculated by matching palmprint ROIs from the two sessions. The first classifier was a linear SVM with parameters C = 1 and a maximum of 1000 iterations. We also applied KNN, which classifies feature vectors using 3 neighbors via the Minkowski metric. The last was the RF classifier, for which the number of estimators was 200.
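With scikit-learn, the three classifiers with the stated parameters can be configured as below. The 4096-D deep features are replaced here by small, well-separated synthetic two-class features, so this is only a sketch of the configuration, not a reproduction of the experiments:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the extracted deep features (2 separable classes)
X = np.vstack([rng.normal(0, 1, (50, 32)), rng.normal(3, 1, (50, 32))])
y = np.array([0] * 50 + [1] * 50)

classifiers = {
    "SVM": SVC(kernel="linear", C=1, max_iter=1000),   # linear SVM, C = 1
    "KNN": KNeighborsClassifier(n_neighbors=3, metric="minkowski"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
scores = {name: clf.fit(X, y).score(X, y) for name, clf in classifiers.items()}
```

On this well-separated toy data, all three classifiers fit the training set essentially perfectly; the real experiments instead report test-set accuracy on held-out session images.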

Accuracy
As mentioned, we calculated the accuracy of the three classifiers for the assessment of the verification task. The accuracy is defined by the following formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN).

Although the main focus was on evaluating our proposed model, we also validated the verification performance for different training sample sizes.
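As code, the accuracy computation from the four confusion-matrix counts is simply:

```python
def accuracy(tp, tn, fp, fn):
    """Verification accuracy: correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, 40 true positives and 50 true negatives out of 100 decisions give an accuracy of 0.9.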
The impact of the different classification methods on the matching stage is illustrated in Figures 13-15. As can be seen in Figure 13, the linear SVM outperformed the other classification methods and achieved a verification accuracy of 1 for the contact-based HKPU database. Additionally, the SVM showed the best performance in terms of verification accuracy for the 2D contact-free HKPU and IITD touchless databases.
Tables 1-3 show the accuracy results obtained using the SVM, KNN, and RF classifiers with the extracted deep features as input on the three databases. The overall trend shows that the larger the training sample size, the higher the accuracy. It can be seen that the classifiers in our proposed model required few training samples to achieve good performance. The receiver operating characteristic (ROC) curve is a graphical plot of the trade-off between the genuine acceptance rate (GAR), as the y coordinate, and the false acceptance rate (FAR), as the x coordinate. The equations for GAR and FAR are as follows:

GAR = TP / (TP + FN), FAR = FP / (FP + TN).

A robust way to compare the performance of different classifiers is to measure the area under the ROC curve (AUC). Figures 16-18 show the ROC curves and AUC values for the three databases. As can be seen, the SVM had the highest AUC value among the classifiers.
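A minimal numpy sketch of the ROC/AUC computation from raw match scores is shown below; the scores and 0/1 genuine labels are hypothetical inputs, not data from the experiments:

```python
import numpy as np

def roc_auc(scores, labels):
    """Sweep the decision threshold over the sorted scores, accumulate
    GAR (true positive rate) and FAR (false positive rate), and integrate
    the resulting curve with the trapezoidal rule."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    gar = np.cumsum(labels) / labels.sum()            # GAR per threshold
    far = np.cumsum(1 - labels) / (1 - labels).sum()  # FAR per threshold
    gar = np.concatenate([[0.0], gar])                # start curve at (0, 0)
    far = np.concatenate([[0.0], far])
    auc = float(np.sum(np.diff(far) * (gar[1:] + gar[:-1]) / 2))
    return far, gar, auc
```

Perfectly separated genuine and impostor scores give an AUC of 1.0, while perfectly inverted scores give 0.0.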

Equal Error Rate
The equal error rate (EER) corresponds to the error rate at which (1 − GAR) and FAR are equal. To verify the effectiveness of our model, we compared it with state-of-the-art hand-crafted methods in terms of EER as the performance indicator. These methods included the principal line (PriLine) [30], competitive code (CompCode) [31], double orientation code (DOC) [32], extended binary orientation co-occurrence vector (E-BOCV) [33], local line directional patterns-modified finite radon transform (LLDP-MFRAT) [34], and local line directional patterns-Gabor (LLDP-Gabor) [34] approaches described in Section 2.2.
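In practice, the EER can be located by sweeping the decision threshold until the FAR and the false rejection rate (1 − GAR) cross. The sketch below illustrates this on hypothetical match scores:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return the error rate at the threshold where FAR and 1 - GAR are
    closest; their average is reported at that point."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    candidates = []
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[labels == 0])    # impostors accepted
        frr = np.mean(~accept[labels == 1])   # genuines rejected (1 - GAR)
        candidates.append((abs(far - frr), (far + frr) / 2))
    return min(candidates)[1]                 # smallest FAR/FRR gap
```

When genuine and impostor scores are perfectly separated, the two error rates cross at zero, giving an EER of 0.0.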
Table 4 presents the EER results of the aforementioned methods and our proposed model in the verification task for the contact-based HKPU database. To avoid presenting an overly large table, we report only the more recent state-of-the-art methods that were demonstrated to outperform older ones in our results. It should be pointed out that the verification performance of each algorithm was obtained by matching palmprints from different sessions: the images from the first session were used as the training set and the images from the second session were used as the test set. We also performed verification experiments on the IITD touchless database. The EERs obtained using the different methods are summarized in Table 5. In these experiments, the training set included the first four images and the remaining images were used as the test set. As shown in Tables 4 and 5, the results corroborate that using the linear SVM classifier in our matching module with the extracted deep features as input achieved the best EER among the state-of-the-art methods, which can be attributed to the discriminative, deeply learned features. In other words, the CNN features extracted by our model were far more discriminative than the line-only features of PriLine and the orientation-based features of the coding methods, so our model can address the shortcomings of hand-crafted features. Moreover, as local feature-based methods, hand-crafted approaches generally demand higher-resolution palmprint images and have a slower matching speed than our proposed model. Although LLDP-based methods have shown good performance as palmprint texture descriptors, they suffer from computational complexity and slow matching speed.
Furthermore, we can see from the comparative results that the overall EER for the IITD touchless database was higher than that for the contact-based HKPU database, because the illumination, distortion, translation, and rotation variances in contact-less palmprint images lead to less discriminative features. Since our proposed model was tested on two palmprint image databases, we are fairly confident about its generalizability.

Conclusions and Future Work
In this paper, we focused on palmprint verification using deep learning approaches, defining three main modules: (i) an ROI extraction module (REM), (ii) a feature extraction module (FEM), and (iii) a matching module (MM), which is a machine learning classifier. To the best of our knowledge, this is the first study to extract palmprint ROIs using CNN transfer learning. For the FEM, we utilized a pre-trained network to obtain a discriminative feature vector. Moreover, to find the classification method best suited to our proposed REM and FEM, we investigated several classification methods. Experimental results showed that the linear SVM classifier had the best performance. Additionally, the superiority of our proposed model over state-of-the-art hand-crafted approaches demonstrates that deeply learned features are more discriminative than hand-crafted features.
In future work, we will collect a mobile palmprint database and apply our proposed model to it, in addition to evaluating new architectures such as generative adversarial networks and capsule networks for online and reliable personal authentication.
Figure 1 .
Figure 1. The fast CNN architecture. This architecture comprises eight layers: five convolutional layers (Conv. Layers) and three fully connected layers (FC Layers). The input image is 224 × 224 × 3, and the five convolutional layers are structured as follows: (i) 64 kernels of size 11 × 11 × 3 in the first layer (Conv. 1 in Figure 1); (ii) 256 kernels of size 5 × 5 × 64 in the second layer (Conv. 2); and (iii) 256 kernels of size 3 × 3 × 256 in each of the next three layers (Conv. 3, Conv. 4, and Conv. 5). Each of the first two fully connected layers (FC6 and FC7) has 4096 neurons. The last fully connected layer, referred to as the softmax layer, has a 1000-dimensional output (FC8).
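The spatial sizes through such a network follow the standard convolution output-size formula; note that the stride and padding values below are illustrative assumptions, since the text above does not specify them:

```python
def conv_out(size, kernel, stride=1, pad=0):
    # standard formula: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

# Conv. 1: 224x224 input, 11x11 kernel; stride 4 and no padding are assumed
print(conv_out(224, 11, stride=4))  # 54
```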

Figure 2 .
Figure 2. Our proposed model for palmprint verification. ROI: region of interest.

Figure 3 .
Figure 3. ROI extraction module. B x and B y : center point; B w : width and B h : height of the box containing the palmprint ROI.

Figure 5 .
Figure 5. Examples of predicted ROIs for different epochs.

Figure 7 .
Figure 7. Examples of extracted palmprint ROIs in the testing phase.

Figure 8 .
Figure 8. A visual sample of the intersection over union (IoU) metric in our proposed model.

Figure 9 .
Figure 9. Comparison of learned features in our ROI extraction model with the Han method [16].

Figure 10 .
Figure 10. Comparison of extracted palmprint regions with the ground truth ROI.

Figure 11 .
Figure 11. Fine-tuning cost versus epoch in the feature extraction module for the contact-based Hong Kong Polytechnic University (HKPU), 2D contact-free HKPU, and Indian Institute of Technology Delhi (IITD) touchless palmprint databases.

Figure 12 .
Figure 12. Visualization of the extracted deep features for two individuals.

Figure 14 .
Figure 14. Accuracy curves of the SVM, RF, and KNN classifiers in the 2D contact-free HKPU database.

Figure 15 .
Figure 15. Accuracy curves of the SVM, RF, and KNN classifiers in the IITD touchless database.

Figure 16 .
Figure 16. Receiver operating characteristic (ROC) curves of our proposed model for the contact-based HKPU database. AUC: area under the ROC curve.

Figure 17 .
Figure 17. ROC curves of our proposed model for the 2D contact-free HKPU database.

Figure 18 .
Figure 18. ROC curves of our proposed model for the IITD touchless database.

Table 1 .
Verification accuracy obtained using different classifiers on the contact-based HKPU database. MM: matching module.

Table 2 .
Verification accuracy obtained using different classifiers on the 2D contact-free HKPU database.

Table 3 .
Verification accuracy obtained using different classifiers on the IITD touchless database.

Table 4 .
Comparison with state-of-the-art methods in terms of EER value for the contact-based HKPU database. CompCode: competitive code; DOC: double orientation code; E-BOCV: extended binary orientation co-occurrence vector; PriLine: principal line.

Table 5 .
Comparison with state-of-the-art methods in terms of EER value for the IITD touchless database. LLDP-MFRAT: local line directional patterns-modified finite radon transform.