Contactless Palmprint Recognition Using Binarized Statistical Image Features-Based Multiresolution Analysis

In recent years, palmprint recognition has gained increased interest and has been a focus of significant research as a trustworthy personal identification method. The performance of any palmprint recognition system depends mainly on the effectiveness of the feature extraction approach it uses. In this paper, we propose a three-step approach to address the challenging problem of contactless palmprint recognition: (1) a pre-processing step, based on median filtering and contrast limited adaptive histogram equalization (CLAHE), is used to remove potential noise and equalize the images’ lighting; (2) a multiresolution analysis is applied to extract binarized statistical image features (BSIF) at several discrete wavelet transform (DWT) resolutions; (3) a classification stage is performed to categorize the extracted features into the corresponding class using a K-nearest neighbors (K-NN)-based classifier. The feature extraction strategy is the main contribution of this work: we used multiresolution analysis to extract the pertinent information from several image resolutions as an alternative to the classical method based on multi-patch decomposition. The proposed approach was thoroughly assessed using two contactless palmprint databases: the Indian Institute of Technology—Delhi (IITD) and the Chinese Academy of Sciences Institute of Automation (CASIA). The results compare favorably with current state-of-the-art methods: the Rank-1 recognition rates are 98.77% and 98.10% for the IITD and CASIA databases, respectively.


Introduction
Biometric recognition methods are now the most widely used means of identifying or verifying people. Biometrics has supplanted traditional authentication techniques such as passwords or badges, which may be easily forgotten or lost [1]. Biometric traits, in contrast, cannot be stolen or forgotten since they are unique to each individual [2]. Biometric modalities can be classified into behavioral, chemical, and physical traits [3].
The palmprint as a biometric recognition modality has recently gained more popularity than other biometric characteristics; the palmprint modality is considered among the most effective tools for increasing the security of a person's authentication [4,5]. The palmprint refers to the inner surface of the hand, which contains unique characteristics such as principal lines, wrinkles, ridges, and textures, such that even twins have different patterns [6]. Palmprint features remain stable and permanent throughout a person's life [7]. Biometric recognition based on palmprints is regarded as among the most practical and reliable authentication approaches compared to other physiological traits [8]. It has several advantages: low cost, ease of access, acceptability by users, and high distinctiveness [9].
Images of palmprints can be acquired in contact and non-contact (i.e., contactless) modes. In a contact-based palmprint acquisition technique, the subject places their hand in touch with a sensor fitted with pegs to ensure that the hand is correctly positioned when taking images. In contrast, contactless acquisition is possible with commercial off-the-shelf cameras and under unconstrained conditions. The latter mode provides several advantages over contact-based methods, such as increased user-friendliness and greater confidentiality, and it avoids hygiene risks [10].
The feature extraction stage is the most critical and delicate process in a biometric recognition system. If the algorithm used in this phase performs poorly, the whole system is inevitably flawed; this is why the feature extraction stage is the focus of the presented work.
Local texture descriptors have gained significant research interest in the last few years. They describe images in small local patches and have been proven more efficient than global or geometric image descriptors; this is why we have chosen the local texture descriptor binarized statistical image features (BSIF) [11] in this research. In many computer vision applications, the BSIF descriptor has outperformed several comparable descriptors, such as local binary patterns (LBP) [12], histograms of oriented gradients (HOG) [13], local phase quantization (LPQ) [14], and patterns of oriented edge magnitudes (POEM) [15]. Its principle consists of using independent component analysis (ICA) [16] to train a collection of filters from real-world images.
The trained filters can be utilized to represent each pixel of the provided image as a binary string code by simply computing a convolution operation between its initial value and a learned filter. The binary code corresponding to the pixel may be considered a local descriptor of the picture intensity pattern in the pixel's neighborhood. Finally, the histogram of pixel code values can be used to characterize texture features within picture sub-regions (i.e., multi-patch).
This study proposes a feature extraction approach for contactless palmprint recognition based on discrete wavelet transform (DWT) [17] multiresolution analysis and multiscale BSIF description. The wavelet-based multiresolution analysis is a helpful tool that permits the analysis of the image and the extraction of pertinent information at multiple scales or resolutions. The BSIF descriptor is then applied to the original image in addition to the approximation coefficients (i.e., sub-images) extracted from each DWT level. It is well-known that the application of BSIF descriptor generates a feature vector in the form of a histogram representing the image.
As the BSIF descriptor is applied to the original image in addition to several DWT levels, i.e., each level will generate a corresponding histogram. In our approach, we have concatenated all histograms extracted from the same image (i.e., multi-level histograms) and generated a global histogram representing the image at several scales. So, we have performed a multiresolution image analysis and extracted the pertinent information from several resolutions.
For the classification stage, we implemented and tested the performance of our approach with two classifiers, namely K-nearest neighbors (K-NN) and centroid displacement-based K-nearest neighbors (CDNN). In addition, we conducted a deeper experimental analysis to find the best-performing parameters, i.e., those yielding the highest recognition accuracy, using two benchmark databases: the Indian Institute of Technology-Delhi (IITD) [18] and the Chinese Academy of Sciences Institute of Automation (CASIA) [19]. Compared against recently published state-of-the-art approaches, our method presents competitive performance, outperforming most contemporary approaches.
In addition, it has lower algorithmic complexity, i.e., a lower computational cost than deep-learning-based approaches, which take much time and require high-performance hardware. In summary, the main contributions of our paper can be synthesized as follows:
- We proposed a feature extraction approach for contactless palmprint recognition based on DWT multiresolution analysis and multi-scale BSIF description.
- We substituted the classical approach of multi-block (i.e., multi-patch) decomposition with DWT multiresolution analysis.
- We conducted deeper experiments and analyses on the pre-processing, feature extraction, and classification stages by testing several configurations to find the best-performing parameters that maximize the recognition rate.
The remaining paper is structured as follows: Section 2 includes a synopsis of the related work. Section 3 explains the proposed methodology and each technique used in this work. Section 4 displays our experimental results and a comparative study. The conclusion is introduced in Section 5.

Related Work
The following methodologies are the most often employed in the research on palmprint recognition: coding-based methods, texture-descriptor-based methods, and deep-learning-based methods.

Coding-Based Methods
These methods use filters to extract characteristics from palmprint images, which are subsequently encoded into digital codes. More precisely, they convert the responses of a bank of filters into bitwise codes and employ various distance metrics to compare and match palmprint images.
Kong and Zhang [20] proposed the competitive code, a method for palmprint identification based on a competitive coding scheme and angular matching. The authors applied the real part of neurophysiology-based 2D Gabor filters to the palmprint image to derive orientation information. In a framework proposed by Sun et al. [19], ordinal measures were used to solve the representation problem. The authors introduced a different representation strategy called orthogonal line ordinal features that unifies several existing palmprint algorithms into the proposed framework.
The method produces a one-bit feature code by qualitatively comparing two elongated line-like image regions that are orthogonal. Tens of thousands of ordinal feature codes make up a palmprint pattern. Zhang et al. [21] analyzed the fragile bits phenomena in the palmprint's binary orientation co-occurrence vector (BOCV) representation. If the value of a bit fluctuates in code maps made from many images of the same palmprint, that bit is said to be fragile. Next, they extended BOCV to E-BOCV by appropriately integrating information about fragile bits.
Fei et al. [22] tackled the problems of orientation feature extraction and effective matching of palmprint images. A new dual orientation code (DOC) scheme was developed to describe the palmprint's orientation features. A good nonlinear angular match score was created to evaluate the similarity of DOCs. Xu et al. [23] proposed a new approach for palmprint authentication based on discriminative and reliable competitive coding and using a more precise dominant orientation representation of palmprint images. The authors suggested weighting the orientation data of a nearby region to enhance the robust and dominant discriminating orientation code's correctness and stability.

Texture-Descriptors-Based Methods
These methods decompose the encoded palmprint image into several blocks, apply the texture descriptor to each block, and extract their histograms. Then, they concatenate the extracted histograms of each block to obtain the feature vector representing the palmprint image. Finally, they compare and match palmprint images using various distance metrics.
Motivated by the public desire for clean and non-intrusive biometric technologies, Michael et al. [24] proposed a touchless palmprint model for palmprint biometrics using a low-resolution web camera to acquire pictures. The authors used a skin-color thresholding approach to derive the region of interest (ROI). Then, a valley detection process was utilized to locate the valleys of the fingers as the proper points to find the palmprint area. The discriminative palmprint features are obtained by applying the LBP texture descriptor to the directional gradient responses of the palmprint.
Morales et al. [25] analyzed the difficulties of two palmprint techniques used for contactless biometric authentication, namely the orthogonal line ordinal features (OLOF) and scale-invariant feature transform (SIFT) features. The authors evaluated their performance in the presence of large-scale, rotation, occlusion, and translation fluctuations.
Hammami et al. [26] focused on areas of the image containing the most discriminating characteristics for identification. The authors located, extracted, and preprocessed the ROI. Then, the ROI was divided into sub-regions before applying the LBP texture descriptor for feature extraction. Finally, LBP histograms are concatenated in the form of a global histogram. Wu et al. [27] suggested a SIFT-based method to extract and match features for contactless palmprint identification. They utilized two steps in the matching stage to eliminate the mismatched SIFT points. In the first stage, SIFT has been coupled with RANSAC (i.e., random sample consensus) method, and in the second stage, LPDs (i.e., local palmprint descriptors) were used.
Another contactless palmprint recognition approach has been proposed by Luo et al. [28] that is based on SIFT feature extraction. First, palmprint images are preprocessed with an isotropic filter before detecting and matching SIFT points. Next, mismatched points that do not satisfy the topological relations are eliminated using the iterative-RANSAC (I-RANSAC) algorithm. LPDs are retrieved for SIFT to eliminate the mismatched data. Wu et al. [29] proposed a variation of the LBP texture descriptor in the local line-geometry space for palmprint recognition called local line directional patterns (LLDP). By examining directional line data, LLDP encodes a neighborhood's structure. Consequently, using the modified finite random transform (MFRAT) or Gabor filters, the line responses in the neighborhood are calculated in 12 dissimilar directions.

Deep Learning-Based Methods
These approaches often use convolutional neural networks (CNN). CNNs are composed of convolutional, pooling, and fully connected layers that can simultaneously be trained and utilized for feature extraction and classification.
The authors in [30] proposed a heuristic palmprint identification approach that extracts three types of palmprint features from the image. First, they extracted the texture, gradient, and direction features to be encoded in triple-type feature codes. Then, they created the triple feature descriptors for palmprint representation using the block-wise histograms of the triple-type feature codes. The similarity between two matched triple-type feature descriptor palmprint images was then determined using a weighted matching-score level fusion.
In a recent study by Fei et al. [5], the authors summarized the feature extraction and recognition of different palmprint images using a unified framework to categorize palmprints into four categories. Then, they analyzed the motivations and theories of the representative extraction and matching methods. Among these methods, the authors analyzed the effectiveness of the deep-learning-based models on palmprint recognition.
Zhao and Zhang [31] suggested a new deep discriminative representation (DDR) strategy. DDR is a palmprint recognition technique based on learning discriminative deep convolutional networks (DDCN). DDCNs are trained using constrained data to derive deep discriminative patterns (global and local) from palmprint images. They utilized CRC (i.e., a collaborative representation-based classifier) for the classification stage. Table 1 recapitulates the works synthesized in this section by approach, databases employed, and experimental protocols. Motivated by the successes of the second class, "Texture-descriptors-based methods", we propose in this work a practical approach called "binarized statistical image features-based multiresolution analysis" to handle the problem of palmprint recognition. Furthermore, most methods of the second class are based on multi-block image decomposition to extract the pertinent information from several levels. We propose an alternative solution to the classical multi-block image decomposition strategy in this work. Our solution adopts multiresolution analysis with the help of the DWT, which effectively extracts pertinent information from several levels.

Proposed Approach
This paper proposes a three-step approach to address the challenging problem of contactless palmprint recognition: (1) a pre-processing step, based on median filtering and contrast limited adaptive histogram equalization (CLAHE), is used to remove potential noise and equalize the images' lighting; (2) a multiresolution analysis is applied to extract BSIF features at several DWT scales; (3) a classification stage is performed to categorize the extracted features into the corresponding class using the K-NN or CDNN classifiers. This section details the fundamental concepts of these approaches.

Pre-Processing
We used the pre-processing phase before the feature extraction and classification processes, which significantly influences the performance and robustness of the recognition rate. We used median filtering and CLAHE for the pre-processing stage.
Median filtering is a simple implementation of a nonlinear filter that is essential in image processing since it is widely recognized for preserving picture borders during noise reduction, particularly "Salt & Pepper" noise [32]. The median filter replaces each pixel's gray level with the median value of the gray levels in the pixels' neighborhood [33].
CLAHE is a local contrast enhancement pre-processing tool. CLAHE involves conducting histogram equalization on non-overlapping picture sub-areas and applying interpolation to repair irregularities across boundaries [34].
The technique includes the following steps: (1) CLAHE divides the input picture into non-overlapping sub-blocks, (2) it generates the histograms for each sub-block, (3) the clip limit notion is then used; the clip limit is a multiple of the histogram's average content (each histogram is redistributed to ensure its height does not exceed a predetermined "clip limit"), (4) the cumulative histogram is calculated to perform the equalization, and (5) finally, bilinear interpolation is used between the blocks to eliminate block distortions [35].
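The pre-processing pipeline described above can be sketched as follows. This is a minimal illustration, assuming `scipy` and `scikit-image` are available; the filter size and clip limit are placeholder values, not the paper's tuned parameters.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.exposure import equalize_adapthist

def preprocess_palmprint(img, median_size=3, clip_limit=0.01):
    """Denoise with a median filter, then equalize lighting with CLAHE.

    `img` is a 2-D grayscale array; values are rescaled to [0, 1]
    because equalize_adapthist expects a float image in that range.
    """
    img = img.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    denoised = median_filter(img, size=median_size)  # suppresses salt & pepper noise
    equalized = equalize_adapthist(denoised, clip_limit=clip_limit)  # CLAHE
    return equalized

# Example on a synthetic noisy image
rng = np.random.default_rng(0)
noisy = rng.random((64, 64))
out = preprocess_palmprint(noisy)
```

The output is a float image in [0, 1], ready for the feature extraction stage.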

Feature Extraction
In this section, we first describe the theoretical foundations of the methods used in this study and then present the working procedure of our proposed feature extraction approach.

A. Discrete Wavelet Transform (DWT)
Wavelets have been used in many applications, including feature extraction, compression, denoising, and contour detection. DWT divides a given signal into several frames, each of which is a time series of coefficients characterizing the signal's temporal evolution in the appropriate frequency band [36,37]. The DWT works with many mother wavelets, including Haar, Daubechies, Coiflets, symlet, etc. [36,38].
The theoretical framework for applying a DWT to a 2D image is expressed mathematically as follows. Suppose $\psi(t)$ is the mother wavelet function. The wavelet function family $\psi_{s,p}(t)$ can be obtained as [39,40]:

$$\psi_{s,p}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t - p}{s}\right),$$

where $s$ is the scale parameter, $t$ is a time instant, and $p$ is the position parameter. Let $f(x, y)$ be an image of size $M \times N$. The 2D-DWT is expressed as follows:

$$W_{\phi}(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\,\phi_{j_0,m,n}(x, y),$$

$$W^{i}_{\psi}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\,\psi^{i}_{j,m,n}(x, y),$$

where $i = \{\text{horizontal}, \text{vertical}, \text{diagonal}\}$, $W_{\phi}(j_0, m, n)$ are the coefficients that define an approximation of $f(x, y)$ at scale $j_0$, $W^{i}_{\psi}(j, m, n)$ are the coefficients that add horizontal, vertical, and diagonal details for scales $j \geq j_0$, $\phi$ denotes the scaling function, and $\psi$ denotes the wavelet function.
In a picture, several resolutions are represented by repeating cycles of scaling (low pass) and wavelet transform (high pass), as depicted in Figure 1. The scaling catches the image's low-frequency information, while the wavelet collects the image's high-frequency information. At each cycle of the wavelet transform, a low-resolution picture and a fine details image, each half the size of the original image, are generated. As the information in fine details images is frequently limited, the lower-resolution version captures most of the information in the original image, resulting in high image representation efficiency [41,42].

The 2D wavelet transform divides a picture into four sub-band images: Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH), as displayed in Figure 1. The low-frequency sub-band contains the approximation coefficients, while the high-frequency sub-bands contain the detail coefficients (horizontal, vertical, and diagonal).
In the following example, DWT frequency components are generated for two distinct scale values, and the low-frequency sections of the resulting components (LL, LLLL) are used in subsequent processing.
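The repeated extraction of the LL sub-band can be sketched with the PyWavelets library. This is a minimal illustration of the two-level decomposition above; the Haar wavelet and a 64 × 64 input are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
import pywt

def multilevel_approximations(img, wavelet="haar", levels=2):
    """Return the LL (approximation) sub-band at each DWT level.

    Each dwt2 call halves the image dimensions; the LL sub-band is fed
    back in to obtain the next, coarser resolution (LL, then LLLL, ...).
    """
    approximations = []
    current = img
    for _ in range(levels):
        current, (lh, hl, hh) = pywt.dwt2(current, wavelet)  # keep only LL
        approximations.append(current)
    return approximations

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
ll1, ll2 = multilevel_approximations(img, levels=2)
# Haar halves each dimension per level: 64 -> 32 -> 16
```

The detail sub-bands (LH, HL, HH) are computed but discarded here, matching the observation that most of the pertinent information is concentrated in the approximation coefficients.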

B. Binarized Statistical Image Features (BSIF)
The binarized statistical image features (BSIF) descriptor introduced by Kannala and Rahtu [11] was used for texture description and classification, which is based on local binary patterns (LBP) [12] and local phase quantization (LPQ) [14]. The fundamental concept behind this descriptor is to employ learned filters with independent component analysis (ICA) [16] rather than handcrafted filters to calculate a binary code string for each pixel in an image, which can then be used to maintain an efficient histogram representation.
The BSIF approach generates a binary code string from image pixels, with the code value serving as a local descriptor of the picture intensity model in the pixel's neighbors. Histograms of pixel code values can also quantify texture qualities within picture subspaces.
By binarizing the response of a linear filter with a zero threshold, the value of each bit in a binary code string is determined [43]. Each bit is associated with a unique filter, and the number of filters used is determined by the appropriate length of the bit string. The filters are trained using a training set of natural image patches. The learning process is carried out by maximizing the statistical independence of the filter responses [44,45].
For a given image patch $X$ of size $l \times l$ pixels and a linear filter $W_i$ of the same size, the filter response $s_i$ is determined as follows [11]:

$$s_i = \sum_{u,v} W_i(u, v)\,X(u, v) = \mathbf{w}_i^{\top}\mathbf{x},$$

where $(u, v)$ denotes the spatial coordinates, $i$ indicates the index of the filter, and the vectors $\mathbf{w}_i$ and $\mathbf{x}$ contain the pixels of $W_i$ and $X$, respectively. The binarized feature $b_i$ is produced by:

$$b_i = \begin{cases} 1 & \text{if } s_i > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Given $n$ linear filters $W_i$, which can be stacked into a matrix $W$ of size $n \times l^2$, all responses can be computed at once as $\mathbf{s} = W\mathbf{x}$; the bit string $\mathbf{b}$ is obtained by binarizing each element $s_i$ of $\mathbf{s}$ as described above.
Lastly, the BSIF features are obtained by treating each individual pixel as a combination of the $n$ binary values determined from the $n$ linear filters. The BSIF encoded features $\beta$ are obtained as follows:

$$\beta = \sum_{i=1}^{n} 2^{\,i-1}\, b_i.$$

Figure 2 depicts the entire BSIF extraction technique using an 11 × 11 BSIF filter with a bit string length of 8. Figure 2a illustrates the input ROI of a palmprint image, Figure 2b depicts pre-learned filters with a dimension of 11 × 11 and a bit string length of 8, Figure 2c represents the BSIF features derived by convolution of the ROI image with the BSIF filters, and the final BSIF encoded features are shown in Figure 2d. Figure 3 depicts several BSIF results with varying filter sizes and bit string lengths.
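The encoding and histogram steps can be sketched as follows. Note the hedge: real BSIF uses ICA-learned filters; the random zero-mean filters below are placeholders used only to illustrate the thresholding and bit-packing mechanics.

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_like_encode(img, filters):
    """Encode each pixel as an n-bit code by thresholding filter responses.

    `filters` is an (n, l, l) stack. For each filter, the response map is
    binarized at zero (b_i = 1 iff s_i > 0) and packed into bit i of the code.
    """
    n = filters.shape[0]
    code = np.zeros(img.shape, dtype=np.int64)
    for i in range(n):
        s = convolve2d(img, filters[i], mode="same", boundary="symm")
        code += (s > 0).astype(np.int64) << i
    return code

def bsif_histogram(code, n_bits):
    """Histogram of code values: the image's feature vector."""
    return np.bincount(code.ravel(), minlength=2 ** n_bits)

rng = np.random.default_rng(1)
img = rng.random((32, 32))
filters = rng.standard_normal((8, 11, 11))
filters -= filters.mean(axis=(1, 2), keepdims=True)  # zero-mean, like learned filters
hist = bsif_histogram(bsif_like_encode(img, filters), n_bits=8)
```

With 8 filters, each pixel receives an 8-bit code, so the histogram has 256 bins and sums to the number of pixels.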
The filters $W_i$ are learned by maximizing the statistical independence of the responses $s_i$ using ICA. The BSIF descriptor has two parameters: the filter size $l$ and the bit string length $n$ [43]. The initial filters presented by [11,46] were learned by randomly sampling 50,000 image patches from 13 distinct natural photographs.

Proposed Feature Extraction Approach-Multiresolution Analysis
The feature extraction stage, which is the main contribution of this study, is based on a multiresolution analysis of the input palmprint image. As shown in the graphical flowchart of Figure 4, the concept consists firstly of applying the BSIF descriptor to the original image and extracting its corresponding histogram, which defines the feature vector of the original image. Secondly, we apply the DWT decomposition to the original image, which decomposes it into four sub-band images: Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH), as explained in Section 3.2.1. This decomposition represents the first level (or resolution) of the DWT decomposition. It is well-known that the pertinent information is concentrated in the approximation coefficients (i.e., the LL sub-band), while the detail coefficients, which are high frequencies, generally contain little useful information (mostly noise or contours). In addition, the generated LL image replaces and compresses the original image, permitting access to more compact representations of the original image.
For the first level of decomposition, we also apply the BSIF descriptor on the L1-LL (i.e., Level 1 of Low-Low) sub-band and generate the corresponding histogram. Similarly, the process of DWT decomposition will be applied on the L1-LL sub-band to obtain the L2-LL (i.e., Level 2 of Low-Low) representation containing more useful and compressed information, i.e., the approximations coefficients of resolution 2. This operation is followed by a BSIF application to generate the histogram of the second resolution. The process of DWT decomposition is repeated for N decomposition levels depending on the size of the image. Finally, the histograms generated from each level are merged to create a global feature vector representing the palmprint at several levels. In other words, we have applied a multiresolution analysis on the palmprint image, as shown in Figure 4.
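The pipeline above can be sketched end-to-end. For brevity, a simple intensity histogram stands in for the BSIF histogram; only the structure of the method (descriptor on the original image, then on each successive LL sub-band, with all histograms concatenated) is what this sketch illustrates.

```python
import numpy as np
import pywt

def describe(img, bins=256):
    """Stand-in for the BSIF histogram: any per-image histogram descriptor."""
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return hist

def multiresolution_features(img, wavelet="haar", levels=2):
    """Concatenate descriptors of the original image and each LL sub-band."""
    hists = [describe(img)]
    current = img
    for _ in range(levels):
        current, _details = pywt.dwt2(current, wavelet)  # keep only LL
        hists.append(describe(current))
    return np.concatenate(hists)  # one global multi-level feature vector

img = np.random.default_rng(2).random((64, 64))
feat = multiresolution_features(img, levels=2)
# 3 histograms of 256 bins each -> a feature vector of length 768
```

Swapping `describe` for a BSIF histogram yields the proposed global multi-level feature vector; the number of levels is bounded by the image size, since each DWT level halves the dimensions.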

Classification
For the classification stage, we have implemented, tested, and compared the performance of two classifiers, namely K-NN and CDNN.


K-Nearest Neighbors (K-NN) Classifier
The K-nearest neighbors (K-NN) classifier is an easy-to-understand and simple-to-implement classification approach. The K-NN algorithm is widely used for classification, pattern recognition, and image processing. This method has three main components: a set of labeled items, a distance function that assesses the difference or similarity between two items, and the value of K, i.e., the number of nearest neighbors. To categorize an unlabeled object, the distances between it and the labeled items are first calculated to determine its K nearest neighbors, and then the item is classified based on the majority class of those neighbors [47,48]. We have employed two distance metrics in this study: Euclidean and City Block.
The Euclidean distance $D(A, B)$ between two samples $A$ and $B$ is formally defined as [49]:

$$D(A, B) = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2}.$$

The City Block distance is calculated as follows:

$$D(A, B) = \sum_{i=1}^{N} |a_i - b_i|,$$

where $D(A, B)$ is the distance between test sample $A$ and a given training sample $B$, $a_i$ are the features of the test sample $A$, $b_i$ are the features of the training sample $B$, and $N$ is the total number of features.
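A minimal K-NN sketch with both distance metrics, illustrating the majority-vote classification described above. The toy 2-D gallery is purely illustrative.

```python
import numpy as np

def knn_predict(query, gallery, labels, k=3, metric="euclidean"):
    """Classify `query` by majority vote among its k nearest gallery samples."""
    diff = gallery - query
    if metric == "euclidean":
        d = np.sqrt((diff ** 2).sum(axis=1))
    else:  # "cityblock"
        d = np.abs(diff).sum(axis=1)
    nearest = np.argsort(d)[:k]          # indices of the k nearest neighbors
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]     # majority class among the neighbors

gallery = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])
pred = knn_predict(np.array([4.8, 5.2]), gallery, labels, k=3)
# the two nearest neighbors belong to class 1 -> predicted class 1
```

In our setting, `gallery` would hold the concatenated multiresolution BSIF histograms of the training images and `query` the histogram of a test image.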

Centroid Displacement-Based K-Nearest Neighbor (CDNN) Classifier
We utilized CDNN for classification, which is adaptable to noise and class distribution. CDNN is a modified K-NN method published by Nguyen et al. [50] that addresses the issue of the majority vote in the K-NN algorithm: if the distance between the test sample and its neighbors varies greatly, the closer neighbors should more strongly influence the predicted class label. The CDNN algorithm works on a basic principle: once the list $D_k^x$ of the K nearest neighbors of the test sample $x$ is obtained, the nearest neighbors are organized into sets sharing the same class label, where $C$ is the set of class labels, $x_t = (x_{t1}, x_{t2}, \dots, x_{tN})$ is the feature vector of instance $t$, and $c_t$ is the class label of $x_t$.
The centroid of each set, and its displacement if the test sample $x$ is added to the set, are computed as follows. Let $S_j^x = \{x_t \in D_k^x \mid c_t = c_j,\ c_j \in C\}$ denote the set of neighbors of $x$ with class label $c_j$. The centroid of $S_j^x$ is calculated as:

$$\mu_j = \frac{1}{|S_j^x|} \sum_{x_t \in S_j^x} x_t.$$

The new centroid if $x$ is inserted into $S_j^x$ is calculated as:

$$\mu_j' = \frac{x + \sum_{x_t \in S_j^x} x_t}{|S_j^x| + 1}.$$

The centroid displacement is calculated as:

$$\Delta_j = \lVert \mu_j' - \mu_j \rVert.$$

Finally, the test sample is assigned to the class with the lowest centroid displacement.
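The centroid displacement rule can be sketched as follows. This is a minimal illustration of the idea, operating directly on a small set of already-retrieved neighbors.

```python
import numpy as np

def cdnn_predict(x, neighbors, neighbor_labels):
    """Assign x to the class whose neighbor-set centroid moves least when x joins it."""
    best_label, best_shift = None, np.inf
    for label in np.unique(neighbor_labels):
        members = neighbors[neighbor_labels == label]
        centroid = members.mean(axis=0)                          # mu_j
        new_centroid = (members.sum(axis=0) + x) / (len(members) + 1)  # mu_j'
        shift = np.linalg.norm(new_centroid - centroid)          # displacement
        if shift < best_shift:
            best_label, best_shift = label, shift
    return best_label

# x sits right next to the class-0 neighbors, so adding it barely moves
# that class's centroid, while it would drag the class-1 centroid far.
neighbors = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0]])
neighbor_labels = np.array([0, 0, 1])
pred = cdnn_predict(np.array([0.1, 0.1]), neighbors, neighbor_labels)
```

In a full implementation, `neighbors` would be the K nearest gallery samples found by a standard K-NN search.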

Experimental Analysis
The proposed method was assessed using the IITD touchless palmprint (version 1.0) [18] and CASIA palmprint [19] datasets. This section discusses the specifications of each employed dataset and its assessment protocol. Moreover, we examine the results achieved from applying our suggested method and compare the Rank-1 recognition rates with other recent state-of-the-art approaches.

IITD Touchless Palmprint Database (Version 1.0)
The IITD (i.e., Indian Institute of Technology-Delhi) touchless palmprint database [18] was created in the biometrics research laboratory from January 2006 to July 2007. The database was collected from 230 users, with several images from both hands of each user. The age of the participants ranges from 12 to 57 years. The database employed user pegs to reduce hand-pose and image-scale variations. The resolution of the original images is 800 × 600 pixels, and the cropped and normalized palmprint images are 150 × 150 pixels. Figure 5 shows some sample images from the IITD database.

Setups
In this study, all our experiments are conducted using the identification mode. Biometric identification attempts to identify the label of a query sample by comparing its feature vector to a collection of labeled samples (i.e., feature vectors) stored in a referential database. As reported in many recently published state-of-the-art papers, the evaluation protocol used in the identification experiments consists of selecting the first N images for each person to create the gallery set. The remaining examples of each person are used as query samples. Then, the Rank-1 identification rate is calculated to measure the performance of our approach and make a comparison against the related approaches.
The identification rate is calculated using the following formula: Identification rate (%) = (number of correctly identified query samples / total number of query samples) × 100. As noted earlier, 230 persons participated in the creation of the IITD database using both the left and right hands, i.e., the database includes 460 classes because each hand is considered an independent class. As the number of images per class varies between 5 and 6, the N value considered in this experiment is equal to 3, i.e., the first 3 images are used for training, and the remaining images (2 or 3) are used for testing.
Similarly, the CASIA dataset was generated with the participation of 312 persons who used both hands, i.e., the database holds 624 classes because each subject has 8 left-hand images and 8 right-hand images. The number of images per person in this database is constant and equal to 8. The N value considered in this experiment is equal to 4, i.e., the first 4 images of each subject are used for training, and the remaining 4 images are used for testing.
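The evaluation protocol above (first N images per class as the gallery, the rest as queries, Rank-1 rate as the score) can be sketched as follows. The helper names `first_n_split` and `rank1_rate` are hypothetical, and the feature vectors are assumed to be already extracted:

```python
import numpy as np

def first_n_split(features, labels, n):
    """Hypothetical helper: the first n images of each class form the
    gallery; the remaining images become query samples."""
    gallery, g_lab, query, q_lab = [], [], [], []
    seen = {}
    for f, c in zip(features, labels):
        seen[c] = seen.get(c, 0) + 1
        if seen[c] <= n:
            gallery.append(f)
            g_lab.append(c)
        else:
            query.append(f)
            q_lab.append(c)
    return (np.asarray(gallery), np.asarray(g_lab),
            np.asarray(query), np.asarray(q_lab))

def rank1_rate(gallery, gallery_labels, queries, query_labels):
    """Rank-1 identification rate: percentage of query samples whose
    nearest gallery sample (Euclidean distance here) carries the
    correct class label."""
    correct = 0
    for q, true_label in zip(queries, query_labels):
        dists = np.linalg.norm(gallery - q, axis=1)
        if gallery_labels[np.argmin(dists)] == true_label:
            correct += 1
    return 100.0 * correct / len(queries)
```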

Experiment #1 (Effects of the BSIF Parameters)
In the first experiment, we evaluated the suggested approach by experimenting with different BSIF settings to find the optimum configuration, that is, the one that produces the highest recognition accuracy. The BSIF operator depends on two parameters: the bit string length n and the filter kernel size l × l, as described in Section 3.2.1. In this experiment, no pre-processing was applied; the BSIF descriptor was applied directly to the original image (i.e., without any image decomposition or DWT multiresolution analysis), and the classical K-NN with the Euclidean distance was used as the classifier. Tables 2 and 3 show the detailed results of this first experiment on both employed databases. The highest results are marked in bold. Tables 2 and 3 show that the recognition rates increase as the filter kernel size l × l or the bit string length n grows. In addition, the configuration l × l = 17 × 17 and n = 12 is the best performing; the best recognition rates achieved using this configuration are 96.35% and 95.49% for the CASIA and IITD databases, respectively.
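As a rough illustration of how the two BSIF parameters enter the computation, the sketch below builds an n-bit code per pixel from the signs of n filter responses of size l × l and histograms the resulting codes. Note that the published BSIF descriptor uses filters pre-learned with ICA on natural-image patches; the random filters used in the example are only stand-ins:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bsif_histogram(image, filters):
    """BSIF sketch: `filters` is an (n, l, l) stack of linear filters
    (l odd).  Each filter response is thresholded at zero to give one
    bit of an n-bit code per pixel; the normalized histogram of the
    2**n codes is the feature vector.  Real BSIF filters are
    pre-learned with ICA; callers must supply them."""
    n, l, _ = filters.shape
    pad = l // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    patches = sliding_window_view(padded, (l, l))        # (h, w, l, l)
    codes = np.zeros(image.shape, dtype=np.int64)
    for i, f in enumerate(filters):
        response = np.einsum("hwkl,kl->hw", patches, f)  # filter response
        codes |= (response > 0).astype(np.int64) << i    # bit i of the code
    hist, _ = np.histogram(codes, bins=2 ** n, range=(0, 2 ** n))
    return hist / hist.sum()
```

This makes the cost trade-off of the experiment visible: increasing n doubles the histogram length per extra bit (2^n bins), while increasing l enlarges every convolution window.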

Experiment #2 (Effects of the Multiresolution Analysis)
In the previous experiment, the feature extractor BSIF was applied only to the original image, without any image decomposition or multiresolution analysis. This second experiment aims to check whether exploiting multiresolution information improves recognition performance. We answer this question by testing and assessing the recognition performance at different DWT decomposition levels; several levels of the multiresolution analysis were therefore tested and compared.
The BSIF configuration considered in this experiment is a bit string length n = 12 bits and a filter kernel size l × l = 17 × 17 pixels (the best-performing configuration found in the previous experiment); the Haar wavelet is the DWT family considered, no image pre-processing is applied, and the classical K-NN with the Euclidean distance is used as the classifier. Figure 7 shows the detailed results of this experiment on both databases.
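A minimal sketch of the multiresolution decomposition tested here, assuming the Haar wavelet and keeping only the LL (approximation) sub-band at each level; in practice a library such as PyWavelets would normally be used:

```python
import numpy as np

def haar_ll(image):
    """One level of the 2-D Haar DWT, keeping only the LL
    (approximation) sub-band: each 2x2 block is reduced to its scaled
    average, halving both image dimensions."""
    img = image[: image.shape[0] // 2 * 2,
                : image.shape[1] // 2 * 2].astype(float)
    return (img[0::2, 0::2] + img[0::2, 1::2]
            + img[1::2, 0::2] + img[1::2, 1::2]) / 2.0

def multiresolution_bands(image, levels=2):
    """Original image plus the LL sub-band at each decomposition
    level, i.e., the set of resolutions on which the texture
    descriptor would then be applied."""
    bands = [image]
    current = image
    for _ in range(levels):
        current = haar_ll(current)
        bands.append(current)
    return bands
```

With two levels, a 150 × 150 palmprint image yields three representations (roughly 150 × 150, 75 × 75, and 37 × 37), and a feature histogram is extracted from each.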

Figure 7 shows that going deeper with the DWT decomposition yields higher results until convergence at level 2. We can conclude that the DWT decomposition plays a prominent role in extracting useful information that improves recognition accuracy. We consider level 2 the best-performing configuration because it achieves the best recognition rates for both databases, and its algorithmic complexity is lower than that of the 3rd level (due to less computation). Compared to the previous experiment, the recognition rates improved from 95.49% to 96.23% and from 96.35% to 96.92% on the IITD and CASIA databases, respectively.

Experiment #3 (Effects of the Wavelet Family)
Starting from the optimal parameters determined in the previous experiments, i.e., l × l = 17 × 17, n = 12, and the 2nd DWT level, in this third experiment we assessed the recognition performance by testing and comparing several wavelet families. The most famous wavelet families are Haar, Daubechies (db), Biorthogonal (bior), Coiflets (coif), Symlets (sym), Fejer-Korovkin filters (fk), Reverse Biorthogonal (rbio), and Discrete Meyer. For more details on wavelet families, see [51]. Tables 4 and 5 show the detailed results of this third experiment on both tested databases. Note that this experiment used the classical K-NN with the Euclidean distance as the classifier and no image pre-processing.
We can observe from Table 4 that the Coiflets wavelets coif2 and coif4, as well as the Daubechies wavelet db4, achieve the best performance with a recognition rate of 96.39%; the recognition rate thus improved from 96.23% to 96.39% on the IITD database. On the other hand, Table 5 shows that the Reverse Biorthogonal wavelet performs best, with a recognition rate of 97.12%, an improvement from 96.92% to 97.12% on the CASIA database. For the following experiments, we retain the Coiflets wavelet coif4 as the optimal choice, as it presents a good compromise between the two databases in terms of recognition rate.

Experiment #4 (Effects of the Pre-Processing)
The objective of this experiment is to show whether image pre-processing can influence or increase the performance of our palmprint identification system. It is well-known in the image processing field that pre-processing covers two types of operations: filtering (i.e., eliminating noise or irrelevant information) and restoration (i.e., enhancing or equalizing the image's lighting). In the first attempt, we applied median filtering before the feature extraction phase; median filtering is considered among the best image filtering operators, as it eliminates impulse noise while preserving the image's content. In the second attempt, we applied CLAHE to improve and restore the lighting of the images (see Section 3.1 for more details about CLAHE pre-processing).
The identification results are presented in Table 6, where the best parameters found in the previous experiments are taken into consideration. We can observe from Table 6 that using median filtering as pre-processing harms the identification rate; the latter decreased from 96.39% to 95.99% on the IITD database and from 97.12% to 96.60% on the CASIA database. Therefore, a filtering operation based on median filtering is not suitable for our system. Table 6 also records the results of applying the CLAHE pre-processing to the IITD and CASIA databases. With CLAHE, the recognition rates improved from 96.39% to 97.13% and from 97.12% to 97.21% on the IITD and CASIA databases, respectively. In contrast to median filtering, CLAHE pre-processing positively impacts the identification performance.
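The two pre-processing operations compared in this experiment can be sketched as follows. The equalization shown is plain global histogram equalization, a simplified stand-in for CLAHE, which additionally operates on local tiles and clips the histogram before equalizing:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(image, size=3):
    """Median filtering: each pixel is replaced by the median of its
    size x size neighbourhood, suppressing impulse noise while largely
    preserving edges."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    patches = sliding_window_view(padded, (size, size))
    return np.median(patches, axis=(2, 3)).astype(image.dtype)

def equalize(image):
    """Plain global histogram equalization of an 8-bit image; a
    simplified stand-in for CLAHE (no tiling, no clip limit)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    span = max(cdf.max() - cdf.min(), 1.0)   # guard against flat images
    lut = (cdf - cdf.min()) / span * 255.0
    return lut[image].astype(np.uint8)
```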

Experiment #5 (Effects of Classification)
In the last experiment of this work, we adjusted and assessed the classification parameters. More precisely, we considered two classifiers: K-NN and CDNN (see Section 3.3 for more details about these classifiers). In addition, K-NN requires a distance measure; for this reason, we tested and considered two distance measures: Euclidean and City Block (explained in Section 3.3). Finally, choosing the K-value is essential, as it determines how many nearest neighbors take part in the majority vote. For this reason, several K-values were tested, considered, and compared.
Table 7 records the results of this experiment, obtained by varying the classifier, the distance measure, and the K-value on both databases, while using the best-performing parameters found in the previous experiments. It can be observed that increasing the K-value causes a decrease in the recognition rates; K = 1 is the best choice for both classifiers and both distances. In addition, the classification results of K-NN surpass those of CDNN for K > 1, but both classifiers yield the same result with K = 1, which is considered the best. Finally, the City Block distance performs better than the Euclidean distance on both databases. In summary, adjusting the classification parameters improved the recognition rates from 97.13% to 98.77% and from 97.21% to 98.10% on the IITD and CASIA databases, respectively.
Table 8 presents the final classification accuracies on the IITD and CASIA datasets using our proposed BSIF + DWT approach, together with a comparison against recently published approaches. The same experimental protocol was used for all tested techniques to ensure a fair comparison. Our approach outperforms existing deep learning-based methods (e.g., FEM [30] and ResNet-50 [5]), texture-based methods (e.g., SIFT_IRANSAC_OLOF [27]), and other approaches. It achieves the highest accuracy, with 98.77% and 98.10% on the IITD and CASIA palmprint datasets, respectively, except for the work of Zhao and Zhang [31], named DDR, which achieved 99.41% on the CASIA palmprint dataset.
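The classifier configurations compared in Table 7 can be sketched as a plain K-NN with a selectable distance. With K = 1 (the best setting found), K-NN and CDNN coincide: the query simply takes the label of its single nearest neighbor:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=1, metric="cityblock"):
    """Plain K-NN with the two distance measures compared above."""
    if metric == "cityblock":
        dists = np.abs(X_train - x).sum(axis=1)      # L1 / City Block
    else:
        dists = np.linalg.norm(X_train - x, axis=1)  # L2 / Euclidean
    idx = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[idx], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote
```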

Comparison
The best results obtained using the BSIF-based multiresolution analysis on both IITD and CASIA datasets could be explained by using the powerful feature extractor BSIF on various DWT decomposition levels and adjusting the classification parameters.
In summary, the superiority of our approach is justified by the following points:
- The texture extractor analyzes the image pixel by pixel, i.e., we exploit the advantages of local information.
- The image is analyzed at several levels, i.e., we exploit multi-level information.
- The occurrences extracted from each level are collected in a histogram, i.e., we operate with global information.

Conclusions
Palmprint-based biometric recognition is a highly challenging problem that we addressed by introducing a feature extraction strategy based on multiresolution analysis, as an alternative to the classical multi-patch decomposition strategy. More precisely, we first applied the DWT to the original palmprint image, which generates several image representations (i.e., sub-bands) at different resolutions. Then, we applied the local texture descriptor BSIF to the original image as well as to the generated low-low (LL) sub-bands. Finally, the histograms extracted at each level are merged to form the final feature vector, which represents the input image at several resolutions.
We validated the performance of the proposed feature extraction approach by conducting extensive experiments on the IITD and CASIA palmprint databases, using the recognition rate as the evaluation measure. The best Rank-1 recognition rates achieved by our approach are 98.77% and 98.10% for the IITD and CASIA databases, respectively; these results are very competitive and surpass the performance of many recently published state-of-the-art approaches.
In the future, we intend to (1) implement the multiresolution analysis with a deep learning model as an alternative to the local texture descriptor BSIF and (2) improve the quality of training and make it more semantic by using a deep unsupervised active learning strategy.