Finger Vein Recognition Using DenseNet with a Channel Attention Mechanism and Hybrid Pooling

Abstract: This paper proposes SE-DenseNet-HP, a novel finger vein recognition model that integrates DenseNet with a squeeze-and-excitation (SE)-based channel attention mechanism and a hybrid pooling (HP) mechanism. To clearly separate the finger vein patterns from their background, original finger vein images are preprocessed using region-of-interest (ROI) extraction, contrast enhancement, median filtering, adaptive thresholding, and morphological operations. The preprocessed images are then fed to SE-DenseNet-HP for robust feature extraction and recognition. The DenseNet-based backbone improves information flow by enhancing feature propagation and encouraging feature reuse through feature map concatenation. The SE module utilizes a channel attention mechanism to emphasize the important features related to finger vein patterns while suppressing less important ones. The HP architecture used in the transitional blocks of SE-DenseNet-HP concatenates the average pooling method with a max pooling strategy to preserve both the most discriminative and contextual information. SE-DenseNet-HP achieved recognition accuracy of 99.35% and 93.28% on the good-quality FVUSM and HKPU datasets, respectively, surpassing the performance of existing methodologies. Additionally, it demonstrated better generalization performance on the FVUSM, HKPU, UTFVP, and MMCBNU_6000 datasets, achieving low equal error rates (EERs) of 0.03%, 1.81%, 0.43%, and 1.80%, respectively.


Introduction
With the increasing preference for secure recognition systems, the use of biometric traits for human recognition has received a great deal of attention in recent years. Biometric techniques, as opposed to traditional authentication methods such as secret keys and passwords, are less vulnerable to theft or duplication. These techniques utilize physiological and/or behavioral characteristics to authenticate a person through partially or completely automated methods [1,2]. Biometric characteristics can be divided into two categories: extrinsic characteristics and intrinsic characteristics. Biometric systems that use extrinsic characteristics usually rely on external features of the human body, such as fingerprints, faces, and irises [3]. Even though biometric systems that utilize extrinsic characteristics are less susceptible to theft than conventional methods, they are vulnerable in practice to forged input, raising privacy and security concerns [4]. In contrast, systems based on intrinsic characteristics rely on features such as finger veins and palm veins, which are hidden beneath the skin and are therefore difficult to forge [5][6][7]. In particular, finger vein authentication systems have demonstrated promising applications in consumer electronics, banking, and airports [8].
Finger vein recognition is an intrinsic biometric technique that uses human finger vein patterns captured under near-infrared (NIR) illumination as the basis for biometric authentication [9]. During the image acquisition process, a finger is placed between the NIR light source and a camera [10]. When NIR light is directed at the finger, it is either absorbed or reflected depending on the thickness of the skin and the blood flow in the finger. As hemoglobin in the blood flowing through finger veins absorbs more NIR light than other tissues, the amount of NIR light transmitted through the blood vessels is reduced. Hence, the finger veins appear as dark lines in the acquired image. Even though finger vein recognition has been researched for many years, it is still a challenging task. During real-time acquisition, finger vein images are susceptible to quality degradation. Degradation occurs for many reasons: light scattering when imaging the finger tissue, uneven muscle thickness within the finger, and physiological changes like inaccurate placement of the finger [11]. Hence, finger vein images need to be preprocessed using multiple image processing methods to acquire significant information [12].
The existing finger vein recognition models are categorized as nonlearning models and learning models [13]. Nonlearning models [14][15][16][17], which rely on rule-based approaches, mainly depend on input image quality and texture deletion. Additionally, these methods use handcrafted features and fixed hyperparameters, which may not generalize well to data acquired in different environmental settings, leading to lower recognition accuracy [18]. To address these issues and enhance generalization performance, researchers started to develop finger vein recognition algorithms based on learning models, which are further classified into machine learning (ML) [19][20][21][22] and deep learning (DL) [23][24][25][26]. Conventional ML-based methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), transform an image into a low-dimensional space before classification [27]. Finger vein feature extraction using these techniques is often not robust to local appearance variations that are caused by rotation, scale, skin scattering, uneven illumination, and other factors [28]. DL methods based on convolutional neural networks (CNNs) overcome these limitations by automatically learning discriminative features from finger vein images [29]. However, a more advanced DL structure, such as DenseNet [30], and additional processing steps are required to precisely learn the detailed information of finger veins that represent complex patterns.
In this paper, we propose the SE-DenseNet-HP architecture, which integrates DenseNet with a squeeze-and-excitation (SE)-based channel attention mechanism and a hybrid pooling (HP) mechanism for robust feature extraction and recognition of finger veins. As acquired finger vein images contain unsatisfactory illumination and noise distortions, a preprocessing stage is required before feeding input images into a DL network. To observe clear vein patterns in finger images, our preprocessing stage consists of region-of-interest (ROI) extraction, contrast enhancement, median filtering, adaptive thresholding, and morphological operations. The preprocessed image is then fed into the SE-DenseNet-HP architecture for robust feature extraction. The DenseNet backbone in SE-DenseNet-HP improves finger vein information flow and the gradients between neural network layers by combining the feature maps from all preceding levels that have the same spatial dimension. The channel attention mechanism based on the SE module [31] enhances the feature extraction process of the SE-DenseNet-HP architecture by putting more emphasis on the feature map channels that contain discriminative characteristics of the finger vein patterns. The HP process in SE-DenseNet-HP optimizes the feature extraction performance of the architecture by combining average pooling and max pooling to acquire generalized global information as well as detailed local information. From the simulation results, we show that the proposed SE-DenseNet-HP model can achieve recognition accuracies of up to 99.35% and 99.25%, respectively, for good- and poor-quality images of the Finger Vein University Sains Malaysia (FVUSM) [32] dataset and 93.28% and 88.16%, respectively, for good- and poor-quality images of the Hong Kong Polytechnic University (HKPU) [16] dataset. In addition, it is able to achieve an equal error rate (EER) of 0.03%, 1.81%, 0.43%, and 1.80% for the FVUSM, HKPU, University of Twente Finger Vascular Pattern (UTFVP) [33], and MMCBNU_6000 [34] datasets, respectively.
The motivation for the proposed combination of image preprocessing techniques is to create a preprocessing pipeline that can adapt to various environmental conditions for the acquisition of finger vein images. In addition, the specific adaptation and combination of DenseNet, the SE block, and the HP architecture in the context of finger vein recognition have not been thoroughly explored in the existing literature; thus, we conducted extensive simulations with various datasets to design a DNN architecture with these components, resulting in significant enhancements in accuracy and generalization performance. Overall, the major contributions of this study to improve finger vein recognition are as follows:
• To successfully separate vein patterns from the background in finger images, we develop an image preprocessing stage that consists of ROI extraction, contrast enhancement, median filtering, adaptive thresholding, and morphological operations. The quality of the preprocessed image is compared with conventional preprocessing;
• To enhance the performance of the DenseNet backbone model, the SE-based channel attention mechanism and the HP strategy are integrated into the DenseNet network structure. The SE module emphasizes the important features related to finger vein patterns while suppressing less important ones. The HP process used in the transitional blocks of SE-DenseNet-HP concatenates the average pooling method with a max pooling strategy to preserve the most discriminative and contextual information;
• To show the novelty of our proposed system, we compare the finger vein recognition performance of SE-DenseNet-HP with existing recognition approaches in terms of recognition accuracy, the receiver operating characteristic (ROC) curve, and the EER.
The rest of this paper is organized as follows: Section 2 describes previous works related to finger vein recognition. Section 3 provides a detailed explanation of our SE-DenseNet-HP model. Section 4 presents the experiments and results of our work. Finally, Section 5 provides the conclusion of our study.

Related Works
A finger vein recognition system is composed of four components: image acquisition, image preprocessing, feature extraction, and matching [35]. After acquiring vein images with NIR optical imaging techniques, quality enhancement is accomplished through several preprocessing steps, such as image filtering, ROI extraction, and image normalization. Then, feature extraction is performed on the enhanced images to extract discriminative features from individual vein images. Biometric recognition is then performed by applying a feature-matching technique to the extracted vein features. Several studies have been conducted in this regard to develop finger vein recognition systems using rule-based methods, ML, and DL.
Rule-based methods often extract finger vein patterns using handcrafted feature extraction methods, and they utilize matching techniques based on extracted finger vein features. Miura et al. used a repeated line tracking (RLT) [14] feature extraction method to recognize finger veins. In that method, finger vein patterns were extracted using randomly varying start points. The same authors later proposed a new method for extracting vein patterns by detecting local maximum curvatures in the cross-sectional profiles of finger vein images with varying widths and brightness levels [15]. Kumar et al. [16] demonstrated a method for extracting finger veins using Gabor filters. Lee et al. [17] proposed a feature fusion method for finger veins that employs simple binarization, a local binary pattern (LBP), and a local derivative pattern (LDP). Features are extracted based on binary codes acquired by comparing the intensity value of the center pixel to the neighboring pixels. After extracting the vein features, the above approaches employ distance-based feature-matching algorithms, such as Hamming distance, Euclidean distance, cross-correlation matching, and template matching. However, finger vein recognition based on the rule-based approach can work well only with specific NIR images and related system parameters [18]. As various NIR imaging devices and image-capture environments should be considered in biometric applications, rule-based approaches show limitations in generalization performance.
Several ML algorithms have been used for finger vein recognition. He et al. [19] used PCA to extract key features of finger veins, such as shape and orientation, and then constructed a multilayer neural network classifier for finger vein recognition. Khellat-Kihel et al. [20] proposed a support vector machine (SVM)-based finger vein recognition system. In their paper, they utilized a Gabor filtering method for feature extraction and an SVM algorithm for both classification and recognition. A finger vein recognition algorithm combining kernel principal component analysis (KPCA) and a weighted k-nearest centroid neighbor (WKNCN) was proposed by Mobarakeh et al. [21]. Kumar et al. [22] proposed integrated responses of texture (IRT) for feature extraction and a multi-SVM for classification. However, these methods are still inferior to DL methods because they are less robust to variations in acquired finger vein patterns caused by rotation, scale, skin scattering, uneven illumination, and other factors [26].
Recently, DL techniques have had a huge impact in various fields, including biometric recognition tasks such as finger vein authentication. DL can learn the finger vein patterns from acquired images in a reliable manner, even with variations in the shape and orientation of the acquired finger vein images. Wenming et al. [23] used a generative adversarial network (GAN) for finger vein extraction and recognition. Qin and El-Yacoubi [24] utilized a deep neural network (DNN) for finger vein quality assessment and recognition. Das et al. [25] used a CNN to produce steady and highly accurate results when dealing with finger vein images of varying quality. However, the CNN they used had only five convolutional layers, which is not sufficient for robust feature extraction. To obtain better recognition accuracy, FVR-Net (a CNN with an HP technique) was used by Tamang and Kim [26]. The HP strategy used in their study exploits max pooling and average pooling in parallel. Max pooling generates sharp features that identify high activation values within the feature map. In contrast, average pooling is a more generalized computation that encourages networks to identify the complete extent of their input volume. However, the image preprocessing steps used in their study cannot guarantee high-quality vein patterns. Additionally, FVR-Net lacks sufficient learning capabilities to recognize complex patterns in the finger vein images owing to its shallow network depth. Therefore, advanced preprocessing methods and a DNN architecture need to be introduced to enhance the performance of finger vein recognition.

Data Preprocessing
Due to the restrictions of imaging environments, captured finger vein images easily degrade due to low contrast and image blur. This degradation occurs because of light scattering when imaging finger tissue, uneven muscle thickness within the finger, and physiological changes such as incorrect finger placement [28]. As a result, acquired images contain unsatisfactory illumination or strong noise distortions. Therefore, the finger vein images must be preprocessed before being sent to the neural network for training and testing [26]. The preprocessing stage is designed to remove any nonideal entities that appear in the vein images, such as noise, shadows, and low contrast, which are caused by rotational or translational variation of the finger or a faulty acquisition device.
A block diagram of the image preprocessing stage used in the proposed scheme is illustrated in Figure 1. First, ROI extraction is carried out by identifying the pixel areas where finger vein patterns are located. By obtaining the ROI that contains the most salient finger vein information from the input image and eliminating unnecessary regions such as the background, finger vein recognition systems can increase the accuracy of pattern recognition [36]. The contrast between finger vein patterns and the background is then improved using the contrast-limited adaptive histogram equalization (CLAHE) method [37], making the finger vein pattern more distinct from the background and easier to extract [26]. The contrast-enhanced ROI image is then subjected to a median blur filter, which eliminates unnecessary salt-and-pepper noise that might be present in finger vein images [38,39]. In the subsequent image preprocessing step, adaptive thresholding [40] is applied to the noise-free enhanced ROI image to separate the finger vein patterns from the background. The extraction of vein patterns from finger vein images can be challenging due to uneven lighting and shadows created during the image acquisition process. The adaptive thresholding technique overcomes this difficulty by determining the threshold value locally for each pixel or region, which enables the threshold to adapt dynamically to illumination fluctuations present in various regions of the image [41]. Hence, adaptive thresholding facilitates a more precise separation of finger vein patterns from the background. Finally, morphological closing, which incorporates dilation followed by erosion, is applied to the binarized image to enhance and refine the acquired finger vein patterns. These processes aid in enhancing the quality of the binary image produced by the adaptive thresholding method and make it easier to extract vein features with more reliability and accuracy.
The ROI extraction process to remove unnecessary background from a sample image obtained during image acquisition is shown in Figure 2. The original finger vein image is first subjected to a Sobel edge detector, which identifies the ROI in the image of the finger. A global thresholding technique is then used to convert the detected edges into a binary image format. This binarization of the detected edges provides a clearer separation of edge and nonedge regions. The finger vein patterns may become distorted or misaligned when finger vein images are acquired at various rotation angles during the image acquisition process. Due to this misalignment, the accuracy of finger vein recognition can be reduced. Hence, in the subsequent process, rotation correction is applied to the binarized image to standardize the orientation of finger vein patterns and minimize the impact of rotational variations during finger vein recognition. Finally, the rotated image is cropped to a dimension of 100 × 300 pixels for extracting the ROI.
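The edge detection, global thresholding, and cropping steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the threshold ratio of 0.5, the centering rule for the crop, and the reading of 100 × 300 as height × width are all assumptions made for illustration, and the rotation correction step is omitted.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (edge-replicated border)."""
    img = np.asarray(img, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def binarize_edges(mag, ratio=0.5):
    """Global threshold: keep pixels above a fixed fraction of the peak response.
    The ratio is a hypothetical choice, not a value from the paper."""
    return mag > ratio * mag.max()

def crop_roi(img, edges, out_h=100, out_w=300):
    """Crop an out_h x out_w window centered between the detected finger edges.
    Assumes a horizontally oriented finger (rotation correction not shown)."""
    rows = np.flatnonzero(edges.any(axis=1))
    center = (rows.min() + rows.max()) // 2 if rows.size else img.shape[0] // 2
    top = int(np.clip(center - out_h // 2, 0, max(img.shape[0] - out_h, 0)))
    left = max((img.shape[1] - out_w) // 2, 0)
    return img[top:top + out_h, left:left + out_w]

# Synthetic example: a bright horizontal "finger" band on a dark background.
image = np.full((200, 400), 20.0)
image[60:140, :] = 200.0
roi = crop_roi(image, binarize_edges(sobel_magnitude(image)))
```

On real images, the threshold ratio and the crop placement would need to be tuned to the acquisition device.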
The extracted ROI is further preprocessed to obtain clear finger vein patterns, as shown in Figure 3. After extracting the ROI from the original input image, the contrast between the finger vein patterns and surrounding background is increased using CLAHE. A median filter is then applied to the contrast-enhanced image, using the median value of adjacent pixels to remove unnecessary salt-and-pepper noise that may be present in the finger vein images. The median filter preserves edge features and fine details in images better than a mean filter because it replaces each pixel value with the median value of the neighboring pixels rather than the mean [38]. Subsequently, we apply an adaptive thresholding method to the noise-free image. Based on changes in local image intensity, adaptive thresholding divides images into foreground and background areas [40,41]. This is particularly useful in finger vein recognition systems because it separates the finger vein patterns from their surroundings. As it computes the threshold value locally at each pixel based on neighboring pixel values, it can adapt to local variations in the illumination and contrast seen in the finger vein images. Following adaptive thresholding, finger vein images might have small gaps in the connectivity of finger vein patterns. A morphological closing operation closes these gaps by extending veins to cover the gaps and then contracting them to the previous size. This enhances the finger vein patterns obtained from adaptive thresholding by enhancing connectivity in the patterns. Hence, the binarized finger vein patterns obtained after adaptive thresholding are subsequently enhanced using morphological closing, which incorporates dilation followed by erosion. After the completion of the image preprocessing stage, the obtained finger vein images are passed to the SE-DenseNet-HP model for robust feature extraction of the preprocessed finger vein patterns.
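The median filtering, adaptive thresholding, and morphological closing steps can be sketched in plain NumPy as follows. This is an illustrative sketch under stated assumptions rather than the pipeline used in the paper: the window sizes, the offset `c`, the mean-based local threshold, and the convention that veins are the darker pixels are all hypothetical choices, and the CLAHE step is left out for brevity.

```python
import numpy as np

def _windows(img, k, pad_mode="edge"):
    """Stack every k x k neighborhood (k odd); the result has shape (H, W, k*k)."""
    r = k // 2
    p = np.pad(img, r, mode=pad_mode)
    h, w = img.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)], axis=-1)

def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighborhood."""
    return np.median(_windows(img, k), axis=-1)

def adaptive_threshold(img, k=15, c=2.0):
    """Mark a pixel as vein (True) when it is darker than its local mean minus c."""
    img = np.asarray(img, dtype=np.float64)
    local_mean = _windows(img, k).mean(axis=-1)
    return img < local_mean - c

def dilate(mask, k=3):
    """Binary dilation with a k x k square; pixels outside the image count as False."""
    return _windows(mask, k, pad_mode="constant").any(axis=-1)

def erode(mask, k=3):
    """Binary erosion via duality: the complement of the dilated complement."""
    return ~dilate(~mask, k)

def closing(mask, k=3):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, k), k)

def preprocess(roi):
    """Median filter -> adaptive threshold -> closing, applied to an ROI image."""
    return closing(adaptive_threshold(median_filter(roi)))
```

Because the threshold is computed from each pixel's own neighborhood, a vein that is darker than its surroundings is still detected when the overall brightness drifts across the image, which is the property the paragraph above relies on.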

SE-DenseNet-HP Architecture
Due to the limitations on the generalization performance of rule-based and ML approaches, we adopted a DL-based finger vein recognition system. For feature extraction and classification, we use the backbone network of a DenseNet169 architecture [30] that incorporates an HP strategy and channel attention mechanism. DenseNet optimizes the flow of information and gradients between neural network layers by concatenating feature maps from all preceding layers with the same spatial dimension. The SE [31] module-based channel attention mechanism enables the network to adaptively recalibrate the relevance of different feature channels, allowing it to focus on more discriminative characteristics of finger vein patterns. HP combines the average pooling method, which achieves improved feature localization, with the max pooling strategy, which activates the most discriminative features of the input.
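The two additions to the backbone named above, channel recalibration and hybrid pooling, can be sketched with NumPy. This is a simplified illustration, not the authors' network code: the two-layer excitation with a ReLU and a sigmoid follows the general SE formulation in [31], while the random weights (standing in for learned parameters), the reduction to two hidden units, and the 2 × 2 non-overlapping pooling window are assumptions.

```python
import numpy as np

def se_recalibrate(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map.
    w1 has shape (C // r, C) and w2 has shape (C, C // r) for a reduction ratio r."""
    z = x.mean(axis=(1, 2))                  # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)              # excitation, part 1: linear layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # excitation, part 2: linear layer + sigmoid
    return x * s[:, None, None], s           # rescale each channel by its weight

def hybrid_pool(x, k=2):
    """Concatenate k x k average pooling and max pooling along the channel axis."""
    c, h, w = x.shape
    blocks = x[:, :h - h % k, :w - w % k].reshape(c, h // k, k, w // k, k)
    return np.concatenate([blocks.mean(axis=(2, 4)), blocks.max(axis=(2, 4))], axis=0)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 6, 6))    # 8 channels, 6 x 6 spatial size
w1 = rng.standard_normal((2, 8))             # hypothetical weights, reduction ratio 4
w2 = rng.standard_normal((8, 2))
recalibrated, weights = se_recalibrate(features, w1, w2)
pooled = hybrid_pool(recalibrated)           # shape (16, 3, 3): channels are doubled
```

Note that concatenating the two pooled outputs doubles the channel count while halving each spatial dimension, so both the averaged context and the peak activations are passed on.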
In the DenseNet structure, each layer receives additional input from all the layers that precede it and passes on its own feature maps to all subsequent layers with the same spatial dimension. Consider an input image, $x_0$, that passes through a neural network of $n$ layers. The $l$th layer in the network applies a nonlinear transformation, $H_l(\cdot)$, which can include operations such as pooling, batch normalization (BN), a rectified linear unit (ReLU), or convolution. In traditional convolutional feed-forward networks, the output of the $(l-1)$th layer, denoted as $x_{l-1}$, is connected as an input to the $l$th layer, as shown in (1):
$$x_l = H_l(x_{l-1}) \qquad (1)$$
ResNet [42] bypasses the nonlinear transformation with an identity function by incorporating a skip connection. In ResNet, the gradient can flow directly from later layers to earlier layers through the identity function, as shown in (2):
$$x_l = H_l(x_{l-1}) + x_{l-1} \qquad (2)$$
However, summation is used to combine the identity function and the output of $H_l$, which may impede information flow in the network. To improve information flow between layers even further, DenseNet utilizes direct connections from any layer to all subsequent layers. As a result, the feature maps of all preceding layers, $x_0, \ldots, x_{l-1}$, are passed to the $l$th layer as the input, which is shown in (3):
$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}]) \qquad (3)$$
where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced in layers $0, \ldots, l-1$. In DenseNet, the number of channels generated by feature map concatenation in deeper layers increases significantly if the size of the feature maps remains the same. This might result in an increase in the computational and memory costs of the network. Hence, to address this issue, downsampling layers are used in DenseNet to decrease the size of feature maps. The downsampling layers divide the DenseNet architecture into numerous dense blocks, each consisting of feature maps of different sizes.
Figure 4 illustrates feature map concatenation within a single dense block. As shown in Figure 4a, each layer inside a dense block is connected to its preceding layers to improve information flow between layers and resolve the vanishing gradient problem. However, as the number of layers within a dense block increases, the number of channels in the concatenated feature maps significantly increases. This can result in a network that is both computationally expensive and memory-intensive. To solve this issue, a bottleneck layer is introduced in the dense block. The bottleneck layer consists of a BN, a ReLU, and a 1 × 1 convolution, followed by another BN, ReLU, and 3 × 3 convolution, as shown in Figure 4. First, the input feature maps are fed into the BN layer, and nonlinearity is added to the normalized input by the ReLU activation function. Then, the 1 × 1 convolution operation is used to reduce the number of input feature maps and improve computational efficiency. This is followed by another BN layer and ReLU activation. The 3 × 3 convolution operation applied to the input feature maps helps in the extraction of high-level features from the normalized input data.
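The dense connectivity rule, in which each layer takes the concatenation of all preceding feature maps as input, and the resulting channel growth can be illustrated with a toy NumPy dense block. This is only a sketch: a random 1 × 1 convolution followed by a ReLU stands in for the full BN-ReLU-convolution bottleneck, and the names `growth_rate` and `dense_block` are illustrative rather than taken from the paper.

```python
import numpy as np

def dense_block(x0, num_layers, growth_rate, rng):
    """Toy dense block on a (C, H, W) input.
    Layer l receives the concatenation [x_0, x_1, ..., x_{l-1}] as its input."""
    features = [x0]                                    # x_0
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)         # [x_0, ..., x_{l-1}]
        # Stand-in for H_l: a random 1x1 convolution followed by a ReLU.
        w = rng.standard_normal((growth_rate, inp.shape[0])) / np.sqrt(inp.shape[0])
        out = np.maximum(np.einsum("oc,chw->ohw", w, inp), 0.0)
        features.append(out)                           # x_l adds growth_rate channels
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 8, 8))
block_out = dense_block(x0, num_layers=6, growth_rate=32, rng=rng)
# Channels grow linearly with depth: 16 + 6 * 32 = 208.
```

With 16 input channels and six layers that each add 32 channels, the block output has 208 channels, which shows why bottleneck and downsampling layers are needed to keep the computational cost manageable.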
The SE module in Figure 4c is utilized after the bottleneck layer to recalibrate the importance of different feature map channels. The reason for using an SE module lies in the fact that different channels within feature maps carry different amounts of meaningful information. Standard DNNs treat all channels equally and independently, and they may be unable to capture and highlight the most discriminating elements of feature maps. By explicitly modeling the channel dependencies and recalibrating their relative importance, the SE module shows robust performance in distinguishing important characteristics of finger vein patterns. The SE module's initial squeeze operation, which uses global average pooling, compresses each feature channel's spatial dimensions to a single value. The channel descriptor, which is the result of this pooling operation, gathers channel-specific information. Hence, each feature channel's significance or relevance inside a feature map is described by the channel descriptor. The excitation mechanism takes the channel descriptor and learns a set of adaptive weights that represent the importance of each channel. These weights capture the relevance of each channel in relation to the others. The original feature map is then rescaled using the adaptive weights that were generated during the excitation mechanism. Hence, by using the SE module inside a dense block, the SE-DenseNet-HP architecture selectively emphasizes the differentiating characteristics of finger vein patterns while suppressing less important ones during the feature extraction process by multiplying each channel by its appropriate weight. Lastly, the input feature maps of all preceding layers are concatenated with the output feature maps of each layer inside a dense block, as shown in Figure 4. This feature map concatenation improves gradient flow through the network, resulting in higher recognition accuracy of the finger vein recognition system.
As the depth of the DenseNet-based architecture is large, the number of channels generated by feature map concatenation is very large, resulting in an increase in the computational and memory cost of the network. Hence, transitional blocks are placed between the dense blocks in SE-DenseNet-HP to reduce the size of feature maps obtained from the dense blocks. Figure 5 illustrates a block diagram of the transitional blocks used in SE-DenseNet-HP. The output of the 
prior dense block is fed as an input to the transitional block.First, the BN layer and ReLU activation function are used to normalize the input and provide nonlinearity, respectively.Then, an HP layer is followed by a 1 × 1 convolution layer to reduce the size of the feature maps.The lower resolution output of the transitional block is fed to the subsequent dense block.The original DenseNet [30] uses 2 × 2 average pooling layers for the pooling strategy in transitional layers.However, average pooling uses the average of all the values in each pooling region [41], which can lead to the loss of discriminative information in finger vein patterns.Therefore, in our research, we propose an HP method for the pooling strategy in transitional layers.
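To make the motivation for transitional blocks concrete, the following minimal sketch tracks feature map shapes through a dense block and a transitional block. The growth rate, layer count, input size, and output channel count are hypothetical values chosen for illustration (the actual configuration is given in Table 1), and the sketch assumes that HP uses a 2 × 2 window and concatenates the average-pooled and max-pooled outputs along the channel axis.

```python
def dense_block_channels(c_in, num_layers, growth_rate):
    """Channels leaving a dense block when every layer adds growth_rate
    channels that are concatenated with all earlier feature maps."""
    return c_in + num_layers * growth_rate


def transition_shape(h, w, c, out_channels):
    """Shape (H, W, C) after BN -> ReLU -> HP -> 1x1 convolution."""
    # BN and ReLU leave the shape unchanged.
    h, w, c = h // 2, w // 2, 2 * c  # assumed HP: 2x2 pooling, avg and max concatenated
    # The 1x1 convolution keeps H and W and sets the channel count.
    return h, w, out_channels


# Hypothetical numbers, not taken from the paper.
channels = dense_block_channels(c_in=64, num_layers=6, growth_rate=32)
shape = transition_shape(56, 56, channels, out_channels=128)
```

With these illustrative values, the dense block grows from 64 to 256 channels, and the transitional block halves the spatial resolution while the 1 × 1 convolution brings the channel count back down.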
HP is a feature pooling strategy that integrates the advantages of both max pooling and average pooling. In contrast to average pooling, which takes the average of all values in each pooling region, max pooling takes the maximum value in each pooling region [43]. Thus, by combining both pooling techniques, the HP approach can capture both the overall structure of the input feature maps and their most discriminative features. In our experiment, the outputs of the average pooling and max pooling layers are concatenated and then passed to the subsequent layer.
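The hybrid pooling idea can be illustrated with a small framework-free sketch. It assumes a non-overlapping 2 × 2 window and concatenation along the channel axis; the window size, stride, and concatenation axis are assumptions made for illustration, and the function names are hypothetical rather than part of the authors' implementation.

```python
def pool2x2(channel, reduce_fn):
    """Apply a non-overlapping 2x2 pooling window to one 2D channel."""
    rows, cols = len(channel), len(channel[0])
    return [
        [
            reduce_fn([channel[r][c], channel[r][c + 1],
                       channel[r + 1][c], channel[r + 1][c + 1]])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def mean(values):
    return sum(values) / len(values)


def hybrid_pool(feature_map):
    """Concatenate average-pooled and max-pooled channels.

    feature_map is a list of channels, each a 2D list (H x W).
    The output has twice as many channels and half the spatial size.
    """
    avg_channels = [pool2x2(ch, mean) for ch in feature_map]
    max_channels = [pool2x2(ch, max) for ch in feature_map]
    return avg_channels + max_channels


# One 2 x 4 channel with made-up activations.
fmap = [[[1.0, 3.0, 0.0, 0.0],
         [5.0, 7.0, 0.0, 8.0]]]
pooled = hybrid_pool(fmap)
```

In the second window, a single strong response (8.0) is diluted to 2.0 by average pooling but preserved by max pooling, which is the kind of discriminative detail HP is meant to retain alongside the averaged context.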
Using the above-mentioned dense block and transitional block, the overall architecture of SE-DenseNet-HP is presented in Figure 6. As can be seen, the proposed network consists of four dense blocks, three transitional blocks, and a classification layer. First, the input image passes through a 5 × 5 convolution layer, followed by an HP operation. The first convolution operation creates an initial feature map representation of the input finger vein image, and the HP operation reduces the dimension of the resulting feature maps. The output from the HP operation is used as an input for the first dense block. Dense blocks are the main blocks of the SE-DenseNet-HP architecture. Each dense block consists of multiple bottleneck layers and SE modules. The bottleneck layer reduces the number of input feature maps and improves computational efficiency. During the feature extraction process, the SE module selectively emphasizes distinctive characteristics of finger vein patterns while suppressing less significant ones. Finally, each layer's output feature maps obtained after the SE operation are concatenated with the input feature maps of every subsequent layer inside a dense block to improve network performance by enhancing information flow between these layers. Transitional blocks in the SE-DenseNet-HP architecture are used as connectors between dense blocks, changing the size of feature maps and managing the flow of information throughout the network. After Dense Block 4, the feature map is fed into the classification layer to generate a one-dimensional vector as output. The output vector consists of the probabilities that the input data belong to each of the S classes. Then, the model selects the class with the highest probability from the output vector as the prediction result for finger vein recognition. The detailed architecture of SE-DenseNet-HP is presented in Table 1.
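The SE recalibration applied inside each dense block (squeeze, excitation, and rescaling) can be sketched without any deep learning framework. The sketch below assumes the commonly used SE design in which the excitation step is a two-layer bottleneck with ReLU and sigmoid activations; this layer layout, the hand-picked weights, and all function names are illustrative assumptions, not the authors' implementation or trained parameters.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one descriptor value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def dense(vector, weights, biases):
    """Fully connected layer: weights[j][i] maps input i to output j."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def excite(descriptor, w1, b1, w2, b2):
    """Bottleneck MLP (ReLU, then sigmoid) giving a weight in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def se_module(feature_map, w1, b1, w2, b2):
    """Rescale every channel of feature_map by its channel weight."""
    scales = excite(squeeze(feature_map), w1, b1, w2, b2)
    return [[[v * s for v in row] for row in ch]
            for ch, s in zip(feature_map, scales)]


# Two 2x2 channels; hand-picked (not trained) weights with a 2 -> 1 -> 2 bottleneck.
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[1.0, 0.0]], [0.0]          # hidden unit copies the channel-0 descriptor
w2, b2 = [[4.0], [-4.0]], [0.0, 0.0]  # boosts channel 0, suppresses channel 1
out = se_module(fmap, w1, b1, w2, b2)
```

With these weights, channel 0 is passed through almost unchanged while channel 1 is scaled close to zero, which mirrors how the module emphasizes informative channels and suppresses the others. In a trained network, the weights are learned jointly with the rest of the model.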

Experiments and Results
In this section, we evaluate the finger vein recognition performance of the proposed model on four publicly available datasets: HKPU [16], FVUSM [32], UTFVP [33], and MMCBNU_6000 [34]. As the number of images per class is not enough to train the model, we used a data augmentation technique to increase the number of training images. First, we measured recognition accuracy on good- and poor-quality finger vein images from HKPU and FVUSM, as in [26]. To show that our proposed model generalizes better to additional datasets than the model used in [26], we used UTFVP and MMCBNU_6000, along with the HKPU and FVUSM datasets, for computing the EER and receiver operating characteristic (ROC) curve. All experiments were conducted using TensorFlow 2.10.1 in Python on a Windows 10 operating system with an Nvidia GeForce RTX 3080 Ti GPU and 32 GB of memory.

Experiment Datasets
The effectiveness of our proposed finger vein model was tested on the HKPU, FVUSM, MMCBNU_6000, and UTFVP datasets. The HKPU dataset comprises finger vein images from male and female volunteers obtained with a contactless imaging device. The dataset was largely compiled between April 2009 and March 2010. It contains 3132 images from 156 classes, with the first 105 classes captured in two separate sessions. The shortest, longest, and average times between sessions were one month, more than six months, and 66.8 days, respectively. Each of the 105 participants gave six samples of their index and middle fingers. The images of the remaining 51 classes were captured in a single session. In our study, we used all 156 classes (a single session from the first 105 classes in addition to the single session from the remaining 51 classes), yielding a total of 1872 images. Each finger was treated as a separate class, yielding a total of 312 classes. Each class consists of six finger vein images. Figure 7a,b illustrate sample original HKPU images along with their preprocessed images, respectively.
The FVUSM dataset contains finger vein images from 123 participants with ages ranging from 20 to 52 years. Each person provided images of four fingers (left index, left middle, right index, and right middle) for a total of 492 finger classes. Each finger was imaged six times in one session, and each person took part in two sessions separated by more than two weeks. From the two sessions, 5904 images from 492 finger classes were obtained. In our study, we used only one session with a total of 2952 images. Sample original images, along with their corresponding preprocessed images, are shown in Figure 7c,d, respectively.
A total of 1440 finger vascular pattern images are included in the UTFVP dataset. These images were gathered from 60 volunteers at the University of Twente. The images were captured in two separate sessions, with an average time gap of 15 days. The vascular patterns of the index, ring, and middle fingers of both hands were collected twice during each session, so that two images were collected for each finger in each session. The captured images had a resolution of 672 × 380 pixels. In this study, all 1440 images from both sessions were used, resulting in a total of 360 finger classes, with each class having 4 image samples. Figure 7e,f display the original sample images alongside their corresponding preprocessed images, respectively, for the UTFVP dataset.
The MMCBNU_6000 dataset comprises finger vein images that were captured from a group of 100 volunteers originating from 20 different countries. Each participant was requested to provide images of their index finger, middle finger, and ring finger from both hands during the image-capturing process. To ensure an adequate number of samples, each of a participant's six fingers was imaged ten times, resulting in a total of ten finger vein images per finger. Consequently, the finger vein database contains a total of 6000 images. Each image had dimensions of 480 × 640 pixels. In this study, we used all 6000 images from the MMCBNU_6000 database, resulting in 600 finger classes (100 participants with six fingers each). Figure 7g,h illustrate the sample original images with their corresponding preprocessed images for MMCBNU_6000.
The quality of finger vein patterns is crucial for obtaining high accuracy from a finger vein recognition system. To analyze the performance of SE-DenseNet-HP against finger vein images of varied quality, the HKPU and FVUSM datasets were further divided into good-quality and poor-quality images based on the image quality of the captured finger vein image and the preprocessed image. Out of 312 classes from the HKPU dataset, 66 were classified as good quality, and the rest were classified as poor quality. Similarly, based on the visual quality of the preprocessed images from the FVUSM dataset, 231 classes were classified as good quality, and the rest were classified as poor quality. As shown in Figure 8, good-quality and poor-quality images from both datasets had different levels of contrast between finger vein patterns and their surroundings. The finger veins in good-quality original images from both datasets were more distinctive from their surroundings compared to poor-quality original images, as shown in Figure 8a,c,e,g. Therefore, the finger vein patterns can be more clearly separated from their background in the preprocessing stage for good-quality original images compared to poor-quality original images, as seen in Figure 8b,d,f,h.

Training, Validation, and Test Datasets
In all datasets, there is a limited number of images per class, which is insufficient for training the proposed SE-DenseNet-HP model. To increase the number of image samples per class in the training stage, an intraclass data augmentation technique was used. The primary goal of data augmentation is to expand and enrich the examples of finger vein images so that the deformations of vein patterns that occur in real-world scenarios are reflected in the training data. First, four original images per class from HKPU and FVUSM, three original images per class from UTFVP, and five original images per class from MMCBNU_6000 were selected for training, and the rest were set aside for testing. Then, we applied width and height pixel shifts of 0.01 (left to right and up to down) to the training images. To cross-validate the proposed model performance during the training stage, the training data were further divided into training and validation sets at a ratio of 7:3.
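The augmentation and split described above can be sketched as follows. This is an illustrative sketch only: it assumes that the value 0.01 is a fraction of the image width and height, that shifts are drawn at random within that range, and that exposed pixels are filled with a constant. The fill mode, the random seed, and the helper names are hypothetical choices that the text does not specify.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Translate a 2D image by (dx, dy) pixels, filling exposed pixels with fill."""
    h, w = len(image), len(image[0])
    return [
        [image[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill
         for c in range(w)]
        for r in range(h)
    ]


def random_shift(image, fraction=0.01, rng=random):
    """Shift by a random offset of at most fraction of the width and height."""
    h, w = len(image), len(image[0])
    max_dx, max_dy = round(fraction * w), round(fraction * h)
    return shift_image(image,
                       rng.randint(-max_dx, max_dx),
                       rng.randint(-max_dy, max_dy))


def split_train_val(samples, train_ratio=0.7, seed=0):
    """Shuffle samples and split them into training and validation lists (7:3)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


shifted = shift_image([[1, 2], [3, 4]], dx=1, dy=0)
train, val = split_train_val(list(range(10)))
```

In a TensorFlow pipeline the same effect is typically obtained with built-in image augmentation utilities. The pure-Python version is shown only to make the operation explicit.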

Training Stage
To train the model, the preprocessed finger vein images for both datasets were fed into the SE-DenseNet-HP architecture. A cross-entropy loss function was used to calculate the dissimilarity between model predictions and actual ground truth labels. The loss function was minimized using the Adam optimizer with an initial learning rate of 0.001 to achieve faster convergence. A batch size of 16 and 50 epochs were used during training. Furthermore, early stopping with a patience of 10 steps was used to stop training when the validation loss saturated.
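The early stopping rule can be made explicit with a small framework-free sketch. It assumes that the patience of 10 is counted in epochs and that "saturation" means no new minimum of the validation loss. Both are interpretations, and the configuration dictionary and function name are hypothetical.

```python
# Hyperparameters stated in the text; the dictionary itself is illustrative.
TRAINING_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 0.001,
    "batch_size": 16,
    "max_epochs": 50,
    "patience": 10,
}


def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has not reached a new
    minimum for `patience` consecutive epochs; otherwise it runs
    through all recorded epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Hypothetical loss curve: the minimum is reached at epoch 2, then the loss plateaus.
plateau_losses = [1.0, 0.5] + [0.6] * 20
stop_epoch = early_stopping_epoch(plateau_losses, TRAINING_CONFIG["patience"])
```

For this made-up curve, training halts ten epochs after the best validation loss, well before the 50-epoch limit.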

Testing Stage
To test the recognition performance of SE-DenseNet-HP, we compared its recognition accuracy, ROC curve, and EER with those of existing works conducted on the HKPU, FVUSM, UTFVP, and MMCBNU_6000 datasets. We also performed an ablation study to assess how incorporating attention mechanisms and custom pooling strategies into the DenseNet model affects finger vein recognition performance. For this, we compared the recognition accuracy of SE-DenseNet-HP with that of the original DenseNet and of DenseNet-HP, which consists of DenseNet with HP but without SE. Among the various components that can be considered in an ablation study, the SE block is considered to have a decisive impact on finger vein recognition performance. The SE mechanism introduces channel attention that selectively highlights features relevant to finger vein patterns while suppressing those that are less important. This leads to a more distinct representation of finger vein patterns and contributes to the enhanced recognition performance of our model. In addition, the recognition accuracy of SE-DenseNet-HP was compared with existing works including a DNN [24], FVR-Net [26], P-SVM [44], and NN [45], as shown in Table 2. Recognition accuracy values for P-SVM, NN, DNN, and FVR-Net were adapted from [26].
As shown in Table 2, SE-DenseNet-HP has an overall better performance than DenseNet-HP and DenseNet, especially for the HKPU dataset. With a recognition accuracy of 93.28% for good-quality vein images in HKPU, it outperforms DenseNet (86.57%) and DenseNet-HP (92.53%). It also outperforms DenseNet (80.20%) and DenseNet-HP (84.08%) in recognition accuracy on the poor-quality images of HKPU. These results highlight that the integration of the SE and HP mechanisms improves the overall performance of the original DenseNet. The channel attention mechanism based on the SE module enhances the feature extraction process of the SE-DenseNet-HP architecture by putting more emphasis on the feature map channels that contain discriminative characteristics of the finger vein patterns. The HP architecture used in the transitional blocks of SE-DenseNet-HP concatenates the average pooling method with the max pooling strategy to preserve both the most discriminative and contextual information. For good-quality vein patterns from the FVUSM dataset, SE-DenseNet-HP obtained 32.87%, 37.00%, 30.02%, and 1.51% higher accuracy than P-SVM, NN, DNN, and FVR-Net, respectively. With the HKPU dataset, SE-DenseNet-HP gained increments of 12.71%, 11.84%, 8.69%, and 0.10% in recognition accuracy compared with the above-mentioned schemes. For poor-quality vein patterns from the FVUSM dataset, SE-DenseNet-HP outperformed the previously mentioned baseline models and obtained 33.52%, 44.23%, 30.66%, and 2.01% higher recognition accuracy. The higher recognition accuracy obtained by SE-DenseNet-HP can be attributed to its robust image preprocessing and feature extraction mechanism.
Figure 9 compares the preprocessed images from SE-DenseNet-HP and FVR-Net [26]. The original sample images corresponding to three classes in the HKPU dataset are shown in Figure 9a(i,ii,iii). These images illustrate the raw input data prior to any image preprocessing techniques being applied. Figure 9b,c show the finger vein patterns obtained after preprocessing the raw input images corresponding to Figure 9a in SE-DenseNet-HP and FVR-Net, respectively. Similarly, Figure 9d-f show the original sample images for three classes from the FVUSM dataset and the corresponding finger vein patterns obtained after preprocessing in SE-DenseNet-HP and FVR-Net. As shown in Figure 9b,c,e,f, the finger vein patterns obtained after preprocessing in SE-DenseNet-HP have more detailed information compared to preprocessing in FVR-Net for both the HKPU and FVUSM datasets. This shows that the preprocessing steps in SE-DenseNet-HP are better at extracting the vein patterns compared to the preprocessing method employed in FVR-Net, resulting in enhanced finger vein recognition.
To construct the ROC curve and calculate the EER, the true acceptance rate (TAR), defined as TAR = 1 − FRR, where FRR is the false rejection rate, is plotted against the false acceptance rate (FAR) at different threshold settings. Note that the decision threshold in a finger vein recognition system is the boundary that determines whether a sample finger vein pattern matches the previously stored pattern obtained from a person's finger veins. The FRR is the percentage of authorized users that the system rejects, and the FAR is the percentage of imposters that the system accepts. The EER is the error rate at the operating point where the FRR and FAR are equal, that is, where the system achieves an equal balance of false positives and false negatives. When the decision threshold is lower than the threshold at this operating point, a finger vein recognition system has a low level of security owing to a high FAR. However, when the decision threshold is higher, the FRR increases as more authorized users are identified as imposters. Therefore, the EER represents a trade-off between security and acceptability and indicates a balanced decision threshold for the system. For a finger vein recognition application, lower EER values are preferable because they indicate that the system is more accurate at distinguishing between genuine and imposter finger vein patterns. To compare the EER obtained from the proposed SE-DenseNet-HP model with that of FVR-Net, we used the same image preprocessing technique as described above.
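The relationship between the decision threshold, FAR, FRR, and EER can be illustrated with a short sketch. It assumes similarity scores where a higher score means a better match and a sample is accepted when its score is at least the threshold. The scores are made up, and the simple threshold scan is one possible approximation rather than the evaluation code used in this study.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a score >= threshold is accepted as a match."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: scan the observed scores as candidate thresholds
    and take the point where |FAR - FRR| is smallest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


genuine = [0.9, 0.8, 0.75, 0.4]   # hypothetical scores for the same finger
impostor = [0.1, 0.3, 0.5, 0.2]   # hypothetical scores for different fingers
eer = equal_error_rate(genuine, impostor)
low_far, low_frr = far_frr(genuine, impostor, 0.2)    # permissive threshold
high_far, high_frr = far_frr(genuine, impostor, 0.8)  # strict threshold
```

Lowering the threshold raises the FAR and lowers the FRR, and raising it does the opposite. The EER is read off where the two rates meet.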
Figure 10 illustrates the ROC curve depicting the TAR against the FAR for the FVUSM, HKPU, UTFVP, and MMCBNU_6000 datasets. A curve that is positioned nearer to the upper-left corner of the ROC graph signifies the superior performance of a system. As seen in Figure 10, SE-DenseNet-HP has higher TAR values compared to FVR-Net for FAR values less than 10⁻¹. The ROC curves suggest that the proposed SE-DenseNet-HP model performed better than FVR-Net for all datasets. This is further supported by the EER obtained with all datasets. For example, the proposed model was able to obtain an EER of 0.03% for FVUSM, which is significantly lower than the EER obtained with FVR-Net (0.68%).
To further observe the trade-off between security and acceptability, we compared the EERs with variations of the proposed SE-DenseNet-HP model and existing works. First, we conducted another ablation study that attempts to further enhance the SE-DenseNet-HP architecture by incorporating the group convolution (cardinality) used in ResNeXt [46] and by reducing the skip connections of DenseNet, as in Log-DenseNet [47], inside the dense block. The SE-DenseNet-HP with reduced skip connections is named Log-SE-DenseNet-HP. By doing this, we can identify the most effective way to enhance the feature extraction performance of finger veins using DenseNet with an SE mechanism and HP strategy. In the ablation study, the SE-DenseNet-HP variants with cardinality 16 and 32 are named SE-DenseNet-HP-16 and SE-DenseNet-HP-32, respectively. Table 3 presents the EER comparison of the SE-DenseNet-HP variants obtained by modifying the cardinality and the skip connections. As shown in Table 3, SE-DenseNet-HP-16 and SE-DenseNet-HP-32 achieved lower EERs of 0.16% and 0.32%, respectively, compared to SE-DenseNet-HP (0.43%) on the UTFVP dataset. Conversely, for the HKPU, FVUSM, and MMCBNU_6000 datasets, SE-DenseNet-HP-16 shows an increase in EER values of 0.07%, 0.3%, and 0.37%, while SE-DenseNet-HP-32 shows increases of 0.07%, 0.86%, and 0.54% compared to SE-DenseNet-HP. These results show that SE-DenseNet-HP performs better than SE-DenseNet-HP-16 and SE-DenseNet-HP-32 on three of the four datasets. Moreover, SE-DenseNet-HP exhibits superior performance compared to Log-SE-DenseNet-HP, Log-SE-DenseNet-HP-16, and Log-SE-DenseNet-HP-32. In the FVUSM dataset, SE-DenseNet-HP outperformed Log-SE-DenseNet-HP, Log-SE-DenseNet-HP-16, and Log-SE-DenseNet-HP-32, with a lower EER of 0.03% compared to EERs of 0.13%, 0.06%, and 0.07%, respectively. In the HKPU and MMCBNU_6000 datasets, SE-DenseNet-HP exhibits EERs of 1.81% and 1.80%, which are 0.45% and 0.15% lower than those of Log-SE-DenseNet-HP. In addition, Log-SE-DenseNet-HP-16 and Log-SE-DenseNet-HP-32 have higher EERs of 2.59% and 2.56% than SE-DenseNet-HP (1.81%) for the HKPU dataset. In the MMCBNU_6000 dataset, SE-DenseNet-HP has a lower EER of 1.80% compared to Log-SE-DenseNet-HP-16 (2.29%) and Log-SE-DenseNet-HP-32 (2.34%). The obtained results show that SE-DenseNet-HP has superior performance compared to the Log-SE-DenseNet-HP variants. SE-DenseNet-HP's dense connections allow each layer to receive direct input from all preceding layers. In finger vein recognition, this might be beneficial for the improved feature extraction of finger vein patterns with different variations in thickness and orientation.
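To illustrate what the two ablation variants change, the sketch below contrasts dense connectivity with a power-of-two ("log") connection pattern and shows how cardinality reduces the weight count of a grouped convolution. The exact connection pattern used in Log-SE-DenseNet-HP is not specified here, so the power-of-two rule is an assumption based on the general Log-DenseNet idea. The layer indices and channel counts are hypothetical.

```python
def dense_inputs(i):
    """Earlier layers feeding layer i under dense connectivity (all of them)."""
    return list(range(i))


def log_inputs(i):
    """Earlier layers i - 2^k >= 0: a logarithmic subset of the skip connections."""
    inputs, step = [], 1
    while i - step >= 0:
        inputs.append(i - step)
        step *= 2
    return inputs


def grouped_conv_weights(c_in, c_out, kernel, groups):
    """Weights of a kernel x kernel convolution split into `groups` groups
    (bias terms ignored); groups=1 is an ordinary convolution."""
    return kernel * kernel * (c_in // groups) * (c_out // groups) * groups


dense_links = dense_inputs(8)      # layer 8 sees layers 0..7
log_links = log_inputs(8)          # layer 8 sees layers 7, 6, 4, 0
plain = grouped_conv_weights(64, 64, 3, groups=1)
card16 = grouped_conv_weights(64, 64, 3, groups=16)
```

Under this reading, the log pattern gives layer 8 four direct inputs instead of eight, and a cardinality of 16 divides the convolution weights by 16. Both reduce cost, but the reported EERs suggest that for most datasets the full dense connectivity without grouping extracts finger vein features more effectively.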
Table 4 presents an EER comparison of SE-DenseNet-HP with FVR-Net [26], deep segmentation (DS) [4], BACS-LBP [6], Lu et al. [7], and BMSU-LBP [9]. EER values for DS, BACS-LBP, Lu et al., and BMSU-LBP were adapted from [4,6,7,9], respectively. When evaluated on the HKPU dataset, SE-DenseNet-HP surpasses the other methods by achieving an EER of 1.81%. Similarly, it achieves lower EERs on the FVUSM (0.03%), UTFVP (0.43%), and MMCBNU_6000 (1.80%) datasets. Note that the existing methodologies exhibit varying degrees of EERs across different datasets, whereas SE-DenseNet-HP consistently stands out with lower EER values across all datasets, validating the effectiveness of our proposed finger vein recognition system. The proposed combination of image preprocessing techniques used in this study creates a preprocessing pipeline that can adapt to different environmental conditions specifically for finger vein recognition. By combining the image preprocessing methods and integrating DenseNet, the SE block, and the HP architecture, our system achieves an enhanced feature representation of finger veins, leading to improved accuracy and generalization performance across various datasets.

Conclusions
In this paper, we proposed a DNN-based network called SE-DenseNet-HP that makes use of DenseNet with HP and an SE-based channel attention mechanism for robust feature extraction and recognition of finger veins. Preprocessed finger vein images are fed into SE-DenseNet-HP for feature extraction and recognition. The feature map concatenation in DenseNet improves the information flow between network layers. The SE module added inside the dense blocks uses the channel attention mechanism to emphasize channels containing important finger vein feature information while suppressing other channels that contain less information. HP improves feature extraction performance by acquiring both global and local information from the preprocessed finger vein images. Our experiments show that the SE-DenseNet-HP model can achieve recognition accuracy of up to 99.35% and 93.28% on good-quality vein patterns in the FVUSM and HKPU datasets, respectively, and outperforms existing work. In addition, it achieves EERs of 0.03%, 1.81%, 0.43%, and 1.80% for the FVUSM, HKPU, UTFVP, and MMCBNU_6000 datasets, respectively, demonstrating its robustness in finger vein recognition compared to previous studies.

Figure 1. Block diagram of the image preprocessing stage used in finger vein recognition.

Figure 2. ROI image extraction to remove unnecessary background in a sample image from the FVUSM dataset.

Figure 3. Image preprocessing steps to obtain clear finger vein patterns from an original finger image in the HKPU dataset.

Figure 4. Dense block architecture inside SE-DenseNet-HP. (a) Feature map concatenation inside a dense block of the SE-DenseNet-HP architecture. (b) Detailed structure of the bottleneck layer. (c) Detailed structure of the SE module.

Figure 5. Block diagram of the transitional block used in the SE-DenseNet-HP architecture.

Electronics 2024 , 20 Figure 6 .
Figure 6. SE-DenseNet-HP network architecture for finger vein extraction and recognition. The colored circles represent feature map concatenation between subsequent layers inside a dense block.


Figure 8. Comparison of good-quality and poor-quality finger vein patterns obtained from the HKPU and FVUSM datasets: (a) good-quality original HKPU images; (b) preprocessing of good-quality original HKPU images; (c) poor-quality original HKPU images; (d) preprocessing of poor-quality original HKPU images; (e) good-quality original FVUSM images; (f) preprocessing of good-quality original FVUSM images; (g) poor-quality original FVUSM images; (h) preprocessing of poor-quality original FVUSM images. The labels (i) and (ii) denote images selected from two separate classes within the good-quality and poor-quality categories of the HKPU and FVUSM datasets, respectively.

Figure 10. Comparison of EERs (%) obtained from SE-DenseNet-HP and FVR-Net.

To further observe the trade-off between security and acceptability, we compared the EERs of variations of the proposed SE-DenseNet-HP model and of existing works. First, we conducted another ablation study to further enhance the SE-DenseNet-HP architecture by incorporating the group convolution (cardinality) used in ResNeXt [46] and by reducing the skip connections inside the dense block, as in Log-DenseNet [47]. The SE-DenseNet-HP with reduced skip connections is named Log-SE-DenseNet-HP. By doing this, we can identify the most effective way to enhance the feature extraction performance

Table 2. Recognition accuracy comparison of SE-DenseNet-HP with existing methods.